# Using BERT

In this notebook, we will learn how to use a pre-trained, fine-tuned BERT model with [🤗 Hugging Face](https://huggingface.co/). 🤗 Hugging Face is an initiative that aims to standardize the use of Transformer-based models. It provides a set of packages for training, fine-tuning, and deploying this family of models, along with data-processing tools for training and evaluation. It also hosts the Hugging Face Hub, where users can upload models and datasets for many tasks so that others can reuse them in their own training and fine-tuning. Finally, Hugging Face offers a free online course covering its tools and the main concepts behind them.

## 0. Understanding the Model

In this example, we will use the *bhadresh-savani/bert-base-go-emotion* model, available on the Hugging Face Hub. It was fine-tuned from a BERT model for emotion prediction, a multiclass variant of sentiment analysis that aims to identify the main emotion conveyed by a natural-language text. Because it is built on the original BERT model, which was pre-trained on English text, it is designed to process English only.
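
Concretely, a multiclass classifier like this one outputs one raw score (logit) per emotion; a softmax turns those scores into probabilities that sum to 1, and the predicted emotion is the one with the highest probability. The plain-Python sketch below assumes a softmax output, which is what the 🤗 `text-classification` pipeline applies by default to a single-label model with several labels. The four labels and their logits are hypothetical, not the model's real output.

```python
import math

# Hypothetical subset of emotion labels with made-up logits, for illustration only.
labels = ["love", "approval", "remorse", "optimism"]
logits = [3.1, 0.4, -0.2, 1.0]

def softmax(xs):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability before exponentiating
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
# The predicted emotion is the label with the highest probability (argmax).
best = max(range(len(probs)), key=probs.__getitem__)
predicted = labels[best]
print(predicted, round(probs[best], 3))
```

Because the softmax is monotonic, the argmax of the probabilities is always the argmax of the logits; the softmax only matters when you want calibrated-looking scores such as the ones the pipeline prints.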

## 1. Installing the Hugging Face Packages

In [1]:
!pip install transformers tokenizers datasets
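
Some arguments used later in this notebook have changed between 🤗 Transformers releases, so it can help to record which versions `pip` actually installed. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is illustrative and not part of any Hugging Face package.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print(installed_versions(["transformers", "tokenizers", "datasets"]))
```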



## 2. Instantiating the Tokenizer

In [2]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bhadresh-savani/bert-base-go-emotion")
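
BERT tokenizers use WordPiece: each word is split greedily into the longest prefix found in the vocabulary, and every non-initial piece is marked with a `##` prefix. The plain-Python sketch below illustrates only that matching step, assuming lowercasing and whitespace/punctuation splitting have already been done. The function `wordpiece` and the toy vocabulary are hypothetical; the real vocabulary is much larger (in the output below, for instance, "extraordinary" is a single token).

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single pre-split word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary (hypothetical), far smaller than BERT's real one.
toy_vocab = {"extra", "##ord", "##inary", "look", "##s", "the"}
print(wordpiece("extraordinary", toy_vocab))
print(wordpiece("looks", toy_vocab))
```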

## 3. Experimenting with the Tokenizer

In [3]:
text="This was mine. How extraordinary! And it looks the same as it did the last time I saw it."
token_ids = tokenizer.encode(text)
decoded_ids = tokenizer.decode(token_ids)
tokens = [tokenizer.decode(tid) for tid in token_ids]

print(token_ids)
print(decoded_ids)
print(tokens)

[101, 2023, 2001, 3067, 1012, 2129, 9313, 999, 1998, 2009, 3504, 1996, 2168, 2004, 2009, 2106, 1996, 2197, 2051, 1045, 2387, 2009, 1012, 102]
[CLS] this was mine. how extraordinary! and it looks the same as it did the last time i saw it. [SEP]
['[CLS]', 'this', 'was', 'mine', '.', 'how', 'extraordinary', '!', 'and', 'it', 'looks', 'the', 'same', 'as', 'it', 'did', 'the', 'last', 'time', 'i', 'saw', 'it', '.', '[SEP]']
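
By default, `encode` adds BERT's special tokens: `[CLS]` (id 101) at the start and `[SEP]` (id 102) at the end, as the output above shows. Passing `add_special_tokens=False` to `encode` skips them, and `tokenizer.convert_ids_to_tokens` maps ids back to token strings without going through `decode`. The sketch below reproduces the stripping step in plain Python on the ids printed above; the helper `strip_special` is hypothetical.

```python
# Token ids printed by tokenizer.encode(text) above.
token_ids = [101, 2023, 2001, 3067, 1012, 2129, 9313, 999, 1998, 2009, 3504,
             1996, 2168, 2004, 2009, 2106, 1996, 2197, 2051, 1045, 2387, 2009,
             1012, 102]

CLS_ID, SEP_ID = 101, 102  # ids of [CLS] and [SEP], taken from the output above

def strip_special(ids, special=(CLS_ID, SEP_ID)):
    """Drop special-token ids, mimicking encode(..., add_special_tokens=False) for this text."""
    return [i for i in ids if i not in special]

content_ids = strip_special(token_ids)
print(len(token_ids), len(content_ids))
```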


In [4]:
tokenizer.encode_plus(text)

{'input_ids': [101, 2023, 2001, 3067, 1012, 2129, 9313, 999, 1998, 2009, 3504, 1996, 2168, 2004, 2009, 2106, 1996, 2197, 2051, 1045, 2387, 2009, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
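
The `attention_mask` is all ones here because a single sentence needs no padding. When sequences of different lengths are batched (for example with `tokenizer(batch, padding=True)`), shorter ones are padded with the `[PAD]` id, which is 0 for this vocabulary (note `padding_idx=0` in the model printout), and the mask marks those positions with 0 so that attention ignores them. A plain-Python sketch of right-padding, using a hypothetical helper `pad_batch` and ids taken from the output above ("this was mine." and "how extraordinary!"):

```python
PAD_ID = 0  # [PAD] id; matches padding_idx=0 of the word embeddings

def pad_batch(batch_ids, pad_id=PAD_ID):
    """Right-pad id lists to a common length and build matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 2023, 2001, 3067, 1012, 102],   # [CLS] this was mine . [SEP]
                   [101, 2129, 9313, 999, 102]])         # [CLS] how extraordinary ! [SEP]
print(batch["attention_mask"])
```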

## 4. Instantiating the Model

In [5]:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bhadresh-savani/bert-base-go-emotion")

In [6]:
print(model)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,
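
The printout makes the size of each component easy to compute: a `Linear(in, out, bias=True)` layer holds `in*out + out` parameters, an `Embedding(n, d)` holds `n*d`, a `LayerNorm` over 768 features holds a 768-element weight and a 768-element bias, and `Dropout` has none. The sketch below applies those formulas to the shapes visible above (the embedding block and the four attention projections of one layer); because the printout is cut off, it does not attempt a full-model count.

```python
def linear_params(n_in, n_out, bias=True):
    """Parameters of nn.Linear: weight matrix plus optional bias vector."""
    return n_in * n_out + (n_out if bias else 0)

def embedding_params(num_embeddings, dim):
    """Parameters of nn.Embedding: one dim-sized vector per entry."""
    return num_embeddings * dim

def layernorm_params(dim):
    """Parameters of nn.LayerNorm with elementwise_affine=True: weight + bias."""
    return 2 * dim

hidden = 768
embeddings = (embedding_params(30522, hidden)   # word_embeddings
              + embedding_params(512, hidden)   # position_embeddings
              + embedding_params(2, hidden)     # token_type_embeddings
              + layernorm_params(hidden))       # LayerNorm (dropout adds nothing)

# query, key, value and the output dense layer of one BertLayer's attention block
attention_proj = 4 * linear_params(hidden, hidden)

print(embeddings)      # 23837184
print(attention_proj)  # 2362368
```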

## 5. Using the Model Pipeline

In [7]:
from transformers import pipeline

classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer)
phrases=["I love you!", "I agree with you.",
         "Sorry but i'm not undertsanding it. Are they equal our different?",
         "There is no hope in this world."]

for phrase in phrases:
  print(phrase)
  print(classifier(phrase))
  print('___________________________')

I love you!
[{'label': 'love', 'score': 0.929985523223877}]
___________________________
I agree with you.
[{'label': 'approval', 'score': 0.8416864275932312}]
___________________________
Sorry but i'm not undertsanding it. Are they equal our different?
[{'label': 'remorse', 'score': 0.4549950063228607}]
___________________________
There is no hope in this world.
[{'label': 'optimism', 'score': 0.6909884214401245}]
___________________________
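
By default the pipeline returns only the highest-scoring label. In recent 🤗 Transformers releases, `return_all_scores=True` is deprecated in favour of `top_k=None` (e.g. `pipeline(..., top_k=None)`), which returns every label sorted by descending score, while an integer `top_k` keeps only the best k. The sketch below shows that ranking step in plain Python on a few scores for the third phrase, copied from this notebook's outputs; `top_k_labels` is a hypothetical helper, not the pipeline's own code.

```python
# A few (label, score) pairs for the third phrase, copied from the outputs in this notebook.
scores = {
    "remorse": 0.4549950063228607,
    "curiosity": 0.15274281799793243,
    "confusion": 0.06595852226018906,
    "embarrassment": 0.031151730567216873,
    "admiration": 0.002495299093425274,
}

def top_k_labels(score_map, k=None):
    """Sort labels by descending score; keep all when k is None, else only the best k."""
    ranked = sorted(score_map.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked if k is None else ranked[:k]
    return [{"label": label, "score": score} for label, score in ranked]

print(top_k_labels(scores, k=3))
```

Note that the `return_all_scores=True` output below lists labels in the model's label order (alphabetical here), not by score, which is why the top label is not at the head of that list.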


In [8]:
classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer,
                      return_all_scores=True)
classifier("Sorry but i'm not undertsanding it. Are they equal our different?")



[[{'label': 'admiration', 'score': 0.002495299093425274},
  {'label': 'amusement', 'score': 0.002659096149727702},
  {'label': 'anger', 'score': 0.007812724448740482},
  {'label': 'annoyance', 'score': 0.014419985003769398},
  {'label': 'approval', 'score': 0.007302277255803347},
  {'label': 'caring', 'score': 0.012592120096087456},
  {'label': 'confusion', 'score': 0.06595852226018906},
  {'label': 'curiosity', 'score': 0.15274281799793243},
  {'label': 'desire', 'score': 0.002141708042472601},
  {'label': 'disappointment', 'score': 0.02836877666413784},
  {'label': 'disapproval', 'score': 0.018133215606212616},
  {'label': 'disgust', 'score': 0.0059063308872282505},
  {'label': 'embarrassment', 'score': 0.031151730567216873},
  {'label': 'excitement', 'score': 0.0038021565414965153},
  {'label': 'fear', 'score': 0.0019308759365230799},
  {'label': 'gratitude', 'score': 0.024720745161175728},
  {'label': 'grief', 'score': 0.005955155473202467},
  {'label': 'joy', 'score': 0.0032954646