# __Introduction to BERT and Transformers Library__
- BERT stands for Bidirectional Encoder Representations from Transformers.
- BERT is pre-trained on a large corpus of unlabeled text: English Wikipedia (that's 2,500 million words!) and BooksCorpus (800 million words).
- BERT is based on the Transformer architecture; specifically, it uses only the Transformer's encoder stack.
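At the core of each Transformer encoder layer is scaled dot-product self-attention, softmax(QKᵀ / √d_k)·V. Because the encoder applies no causal mask, every token can attend to tokens on both its left and its right, which is what makes BERT's representations bidirectional. The NumPy sketch below is a minimal single-head version that assumes random projection matrices in place of learned weights; real BERT layers also use multiple heads, feed-forward sublayers, residual connections, and layer normalization, all omitted here.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no causal mask.

    x: (seq_len, d_model) token representations.
    w_q, w_k, w_v: (d_model, d_k) projections (random here; learned in BERT).
    Returns attended outputs (seq_len, d_k) and attention weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # every token scores every other token
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (5, 4) (5, 5)
```

Nonzero weights above the diagonal are the "bidirectional" part: a decoder such as GPT-2 would zero them out with a causal mask.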

## Steps to be followed:
1. Import the required libraries
2. Analyze sentiment using the Transformers pipeline
3. Generate text
4. Perform named entity recognition (NER)
5. Perform masked language modeling using a model and a tokenizer

### Step 1: Import Required Libraries
- The statement `from transformers import pipeline` gives easy access to pre-trained models and lets you run common NLP tasks with the Transformers library in a few lines of code.



In [1]:
from transformers import pipeline



In [2]:
import os
# Disable TensorFlow's oneDNN custom operations, which can produce slightly different numerical
# results due to floating-point round-off. This only takes effect if set before TensorFlow is imported.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

In [3]:
from transformers import pipeline
import torchvision
# Silence torchvision's warning that its beta transforms API may change.
torchvision.disable_beta_transforms_warning()

### Step 2: Analyze Sentiment Using Transformer Pipeline

- Import the pipeline function from the Transformers library, which enables easy access to pre-trained NLP models
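- The snippet creates a sentiment analysis pipeline and uses it to classify the sentiment of the input text **I hate you**. Because no model is specified, the pipeline falls back to a default checkpoint (`distilbert/distilbert-base-uncased-finetuned-sst-2-english`, as the log below shows)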
- The result, including the sentiment label and score, is then printed
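Internally, the sentiment pipeline gets one raw score (logit) per class from the model, applies a softmax to turn them into probabilities, and reports the most probable class with its probability as `score`. The plain-Python sketch below reproduces only that post-processing step. The logit values are made up for illustration, and the mapping `{0: "NEGATIVE", 1: "POSITIVE"}` is an assumption modeled on the default SST-2 checkpoint; with a real model, read it from `model.config.id2label`.

```python
import math

def postprocess(logits, id2label):
    """Turn raw class logits into a pipeline-style {'label', 'score'} dict."""
    # Softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Report the most probable class and its probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical label mapping and logits, for illustration only.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
result = postprocess([-3.2, 3.8], id2label)
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
```

Because only the difference between the two logits matters for a two-class softmax, a gap of 7 already yields a score above 0.999, which is why scores close to 1 are common for clear-cut sentences.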

In [4]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


label: NEGATIVE, with score: 0.9991


- Perform sentiment analysis on the text **I love you**.
- Print the sentiment analysis result.

In [5]:
result = classifier("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9999


In [6]:
result = classifier("The food was not bad.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9989


**Observation**
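- The model is highly confident in the clear-cut cases: **I hate you** is labeled negative (score 0.9991) and **I love you** positive (score 0.9999).
- It also handles the negation in **The food was not bad.**, labeling it positive with a score of 0.9989.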

### Step 3: Generate Text
- Create a text generation pipeline with the `pipeline` function from the Transformers library. No model is specified, so it defaults to GPT-2 (`openai-community/gpt2`).
- Generate a continuation of the prompt **As far as I am concerned, I will**, producing at most 15 new tokens (`max_new_tokens=15`) with sampling disabled (`do_sample=False`). Without sampling, decoding is greedy, so the output is deterministic.
- The generated text is then printed.
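With `do_sample=False` (and, assuming the default, a single beam), generation is greedy: at each step the model scores every vocabulary token, the highest-scoring token is appended, and the loop stops after `max_new_tokens` steps or earlier at an end-of-sequence token. The toy sketch below shows only that control flow. The `NEXT_TOKEN_SCORES` table is hypothetical and stands in for GPT-2, which in reality conditions on the whole prefix rather than just the last word.

```python
# Hypothetical next-token scores keyed by the previous word (a stand-in for GPT-2).
NEXT_TOKEN_SCORES = {
    "I":     {"will": 2.0, "am": 1.5},
    "will":  {"be": 3.0, "not": 1.0},
    "be":    {"the": 2.5, "happy": 2.0},
    "the":   {"first": 1.8, "last": 1.2},
    "first": {"<eos>": 2.2, "to": 2.0},
}
EOS = "<eos>"

def greedy_generate(prompt_tokens, max_new_tokens):
    """Append the highest-scoring next token until max_new_tokens or EOS."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not scores:
            break  # the toy table has no continuation for this word
        next_token = max(scores, key=scores.get)  # greedy: always the argmax
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens

print(" ".join(greedy_generate(["I"], max_new_tokens=15)))  # I will be the first
print(" ".join(greedy_generate(["I"], max_new_tokens=2)))   # I will be
```

Since no randomness is involved, rerunning the cell gives the same text every time, matching the deterministic GPT-2 output in the notebook.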

In [11]:
text_generator = pipeline("text-generation")
print(text_generator("As far as I am concerned, I will",
      max_new_tokens=15, do_sample=False, truncation=True))

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of'}]
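The log line about `temperature` is consistent with greedy decoding: temperature only affects sampling, so it has no effect when `do_sample=False`. When sampling is enabled, the logits are divided by the temperature before the softmax; values below 1 sharpen the distribution toward the top token and values above 1 flatten it. The sketch below illustrates that formula with made-up logits and a seeded random generator; it is not the Transformers implementation.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature, rng):
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["be", "not", "always", "never"]
logits = [3.0, 1.5, 1.0, 0.5]  # hypothetical scores for illustration

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)  # fixed seed so the draws are reproducible
print([sample_token(vocab, logits, 1.0, rng) for _ in range(5)])
```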


### Step 4: Perform Named Entity Recognition (NER)
- Create an NER pipeline with the `pipeline` function from the Transformers library. No model is specified, so it defaults to `dbmdz/bert-large-cased-finetuned-conll03-english`.

- Apply the pipeline to the provided sequence. The default model tags persons, organizations (such as **Hugging Face Inc.**), locations (such as **New York City**), and miscellaneous names. The sequence used here is a pizza order with no such entities, so the pipeline returns an empty list and the loop in the next cell prints nothing.
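The NER model predicts one tag per token, using the CoNLL-2003 entity types (PER, ORG, LOC, MISC) with a `B-` (beginning) or `I-` (inside) prefix, and `O` for tokens outside any entity. Merging consecutive tagged tokens into whole entities is what the pipeline's `aggregation_strategy="simple"` option is for. The plain-Python sketch below shows the idea on hand-written, hypothetical word-level tags; it is a simplified grouping rule, not the library's exact algorithm, which also has to stitch WordPiece subwords back together.

```python
def group_entities(tokens, tags):
    """Merge per-token B-/I- tags into (entity_text, entity_type) spans.

    A token starts a new entity if its tag begins with 'B-' or its type differs
    from the entity being built; an 'I-' tag of the same type continues it;
    'O' closes any open entity.
    """
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], entity_type
        else:
            current_words.append(token)
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

# Hypothetical word-level tags, written by hand for illustration.
tokens = ["Hugging", "Face", "Inc.", "is", "based", "in", "New", "York", "City", "."]
tags = ["I-ORG", "I-ORG", "I-ORG", "O", "O", "O", "I-LOC", "I-LOC", "I-LOC", "O"]
print(group_entities(tokens, tags))
# [('Hugging Face Inc.', 'ORG'), ('New York City', 'LOC')]
```

A sentence whose tokens are all tagged `O`, like the pizza order, yields an empty list.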

In [20]:
ner_pipe = pipeline("ner")
sequence = """Please get me a pepperoni pizza, medium size with pine-apple and cheese """

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


- Run named entity recognition on the sequence and print each detected entity

In [22]:
for entity in ner_pipe(sequence):
    print(entity)


### Step 5: Perform Masked Language Modeling Using a Model and a Tokenizer

- Masked Language Modeling Using a Model and a Tokenizer
  - Masked language modeling (MLM) is the task of predicting tokens that have been replaced by a mask token, using the context on both sides of the mask. It is also the objective BERT is pre-trained with, which is how the model learns bidirectional representations of language.

- The process includes the following steps:
  - Instantiate a tokenizer and a model from the checkpoint name.
  - Define a sequence with a masked token, using `tokenizer.mask_token` in place of a word.
  - Encode that sequence into a list of IDs and find the position of the masked token in that list.
  - Retrieve the predictions at the index of the masked token.
  - Retrieve the top 5 tokens using PyTorch's `torch.topk` or TensorFlow's `tf.math.top_k`.
  - Replace the masked token with each predicted token and print the results.
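The inference steps above fill in a mask that we placed ourselves. During pre-training, the BERT paper creates such inputs automatically: about 15% of token positions are chosen as prediction targets, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged, with the loss computed only at the chosen positions. The plain-Python sketch below illustrates that corruption step on word-level tokens. The tiny vocabulary, the seeds, the per-token selection probability, and the use of `None` for "no target" are choices made for this example (Transformers' own data collator marks ignored positions with the label `-100`).

```python
import random

MASK = "[MASK]"

def bert_mask(tokens, vocab, rng, mask_prob=0.15):
    """Corrupt tokens BERT-style and return (inputs, labels).

    labels[i] holds the original token at selected positions and None elsewhere,
    so a training loss would only be computed where labels[i] is not None.
    """
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected as a prediction target
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
sentence = "the cat sat on the mat".split()
inputs, labels = bert_mask(sentence, vocab, random.Random(42))
print(inputs)
print(labels)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot assume that only `[MASK]` positions need predicting, which matters because `[MASK]` never appears in real fine-tuning inputs.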

### Masked Language Modeling
- Import the necessary modules from the transformers library and torch
- Load the pre-trained tokenizer and model
- Define the input sequence with a masked token
- Tokenize the input sequence and convert it to PyTorch tensors
- Find the index of the masked token and generate token predictions using the model
- Get the IDs of the top 5 predicted tokens and print the sequence with each one substituted for the mask

In [23]:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

sequence = (
    "Distilled models are smaller than the models they mimic. Using them instead of the large "
    f"versions would help {tokenizer.mask_token} our carbon footprint."
)

inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

In [24]:
print(sequence)

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help [MASK] our carbon footprint.


In [25]:
print(inputs)

{'input_ids': tensor([[  101, 12120,  2050,  8683,  1181,  3584,  1132,  2964,  1190,  1103,
          3584,  1152, 27180,   119,  7993,  1172,  1939,  1104,  1103,  1415,
          3827,  1156,  1494,   103,  1412,  6302,  2555, 10988,   119,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1]])}


In [26]:
print(inputs["input_ids"])

tensor([[  101, 12120,  2050,  8683,  1181,  3584,  1132,  2964,  1190,  1103,
          3584,  1152, 27180,   119,  7993,  1172,  1939,  1104,  1103,  1415,
          3827,  1156,  1494,   103,  1412,  6302,  2555, 10988,   119,   102]])


In [27]:
print(tokenizer.mask_token_id)

103


In [28]:
print(torch.where(inputs["input_ids"] == tokenizer.mask_token_id))

(tensor([0]), tensor([23]))


In [29]:
print(mask_token_index)

tensor([23])
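With only a condition argument, `torch.where` is identical to `torch.nonzero(condition, as_tuple=True)`: it returns one index tensor per dimension, which is why the output of `In [28]` is a pair (batch indices first, sequence positions second) and why `[1]` selects the positions. NumPy's `np.where` follows the same convention, so the indexing can be checked without PyTorch. The token IDs below are a shortened, hypothetical batch of two sequences that reuses the mask ID 103 printed above.

```python
import numpy as np

MASK_ID = 103  # mask_token_id printed above for distilbert-base-cased

# Hypothetical batch of two sequences, each containing one mask token.
input_ids = np.array([
    [101, 7993, 1172, 103, 1412, 102],
    [101, 103, 2555, 10988, 119, 102],
])

batch_idx, positions = np.where(input_ids == MASK_ID)
print(batch_idx, positions)  # [0 1] [3 1]

# Pair each mask position with the sequence it belongs to.
for b, p in zip(batch_idx, positions):
    print(f"sequence {b}: mask at position {p}")
```

For a single sequence, keeping only `[1]` is safe; for a batch, the first index array is needed to know which sequence each position belongs to.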


<br>

__Predict the five most likely words at the mask position__

In [30]:
token_logits = model(**inputs).logits
mask_token_logits = token_logits[0, mask_token_index, :]

In [31]:
print(token_logits)

tensor([[[ -6.6732,  -6.6450,  -6.7923,  ...,  -5.5930,  -5.2783,  -5.6559],
         [ -6.3221,  -5.6379,  -5.8990,  ...,  -4.6864,  -4.1499,  -5.3507],
         [ -5.9863,  -6.0991,  -5.8089,  ...,  -5.2297,  -4.3015,  -6.5971],
         ...,
         [ -7.8892,  -7.6719,  -7.6357,  ...,  -6.9083,  -5.5853,  -6.2459],
         [-14.7710, -14.2714, -14.1642,  ..., -11.4769, -12.1692, -13.1041],
         [-14.3695, -13.9839, -13.6330,  ..., -11.2066, -11.6754, -12.7083]]],
       grad_fn=<ViewBackward0>)


In [32]:
print(mask_token_logits)

tensor([[-5.5502, -5.6790, -5.3256,  ..., -5.4807, -4.5107, -4.2441]],
       grad_fn=<IndexBackward0>)


In [33]:
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

In [34]:
print(top_5_tokens)

[4851, 2773, 9711, 18134, 4607]
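The cells above apply `torch.topk` to raw logits, not probabilities. Because softmax is strictly increasing, ranking logits and ranking probabilities give the same top 5; a softmax over the vocabulary axis is only needed to report the candidates as probabilities. The NumPy sketch below demonstrates this on a hypothetical 8-word vocabulary with made-up logits rather than the notebook's actual tensors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()

def top_k(scores, k):
    """Indices of the k largest scores, highest first (like torch.topk(...).indices)."""
    return np.argsort(scores)[::-1][:k].tolist()

vocab = ["cut", "raise", "lower", "halve", "fix", "the", "a", "cat"]  # hypothetical
mask_logits = np.array([9.1, 8.4, 8.0, 7.2, 6.9, 1.3, 0.8, -2.0])    # hypothetical

probs = softmax(mask_logits)
print(top_k(mask_logits, 5) == top_k(probs, 5))  # True: same ranking
for i in top_k(probs, 5):
    print(f"{vocab[i]:>6}  {probs[i]:.3f}")
```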


In [35]:
for token in top_5_tokens:
    print(tokenizer.decode([token]))
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
    print(" ")

reduce
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
 
increase
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
 
decrease
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
 
offset
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
 
improve
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
 


**Observation**
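- The output shows the five highest-scoring candidates for the masked position, each substituted into the sentence. All five fit grammatically, but not all preserve the intended meaning: **increase** ranks second even though it contradicts the point of the sentence. This shows that the model scores tokens by how well they fit the surrounding context, not by whether the resulting statement is true.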