# __Introduction to BERT and Transformers Library__
- BERT stands for Bidirectional Encoder Representations from Transformers.
- BERT is pre-trained on a large corpus of unlabeled text: English Wikipedia (about 2,500 million words) and BooksCorpus (about 800 million words).
- BERT is based on the Transformer architecture.

## Steps to Be Followed:
1. Importing required libraries
2. Analyzing sentiment using transformer pipeline
3. Creating text generation
4. Creating named entity recognition (NER)
5. Generating masked language model using a model and a tokenizer

### Step 1: Importing Required Libraries
- The statement `from transformers import pipeline` imports the `pipeline` function, which gives easy access to pre-trained models and simplifies running NLP tasks with the Transformers library.

In [None]:
from transformers import pipeline

### Step 2: Analyzing Sentiment Using Transformer Pipeline

- Import the pipeline function from the Transformers library, which enables easy access to pre-trained NLP models
- The snippet creates a sentiment analysis pipeline using the pre-trained model and uses it to classify the sentiment of the input text **I hate you**
- The result, including the sentiment label and score, is then printed

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


label: NEGATIVE, with score: 0.9991
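
The warning in the output above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the model name and revision hash printed in the warning are the ones you want to pin. `PINNED_SENTIMENT` and `load_pinned_classifier` are hypothetical names introduced here for illustration:

```python
# Model name and revision are copied from the warning printed by the default pipeline.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def load_pinned_classifier():
    """Build the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so the configuration can be inspected without loading the library.
    from transformers import pipeline

    return pipeline(**PINNED_SENTIMENT)
```

Calling `load_pinned_classifier()` should return a classifier that can be used like `classifier` above, with the model choice recorded in the code instead of left to the library default.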


- Perform sentiment analysis on the text **I love you**.
- Print the sentiment analysis result.

In [None]:
result = classifier("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9999


**Observation**
- The sentiment analysis model is highly confident that the sentiment of the text **I love you** is positive, with a score of 0.9999.
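
The score is the probability the model assigns to the predicted label. For a two-class classifier this is typically a softmax over the model's two output logits. The sketch below illustrates that relationship with hypothetical logit values (they are not taken from the actual model):

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.3, 4.7]  # hypothetical values, chosen only for illustration

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print(f"label: {labels[best]}, with score: {round(probs[best], 4)}")
```

A large gap between the two logits pushes the winning probability close to 1, which is what a score such as 0.9999 reflects.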

### Step 3: Creating Text Generation
- Create a text generation pipeline using the `pipeline` function from the Transformers library.
- Generate text that continues the prompt **As far as I am concerned, I will**, with a maximum length of 50 tokens (prompt included) and with sampling disabled (`do_sample=False`), so the output is deterministic.
- Print the generated text.

In [None]:
text_generator = pipeline("text-generation")
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))


No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
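
With `do_sample=False` the model picks the most likely next token at every step (greedy decoding), which is why the output is repeatable and why it can drift into repeated phrasing such as "I think that the idea" above. A toy sketch of the idea, using a hypothetical hand-written table of next-token probabilities in place of a real language model:

```python
# Hypothetical next-token probabilities; a real model computes these from the full context.
NEXT_TOKEN_PROBS = {
    "I": {"will": 0.6, "think": 0.4},
    "will": {"be": 0.7, "go": 0.3},
    "be": {"there": 0.55, "fine": 0.45},
    "there": {"<eos>": 0.9, "I": 0.1},
}


def greedy_generate(prompt_tokens, max_length):
    """Extend the prompt by always choosing the highest-probability next token."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:  # max_length counts the prompt tokens too
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # no known continuation for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break  # end-of-sequence marker stops generation
        tokens.append(next_token)
    return tokens


print(" ".join(greedy_generate(["I"], max_length=10)))
```

Because no randomness is involved, calling `greedy_generate` twice with the same prompt returns the same tokens.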


### Step 4: Creating Named Entity Recognition (NER)
- Create an NER pipeline using the `pipeline` function from the Transformers library.

- Define the input sequence, a text containing named entities. When applied to it, the pipeline tags each token that belongs to a named entity, such as the organization **Hugging Face Inc** and the locations **New york** and **Manhattan**. The tagged tokens are printed in the next cell.

In [None]:
ner_pipe = pipeline("ner")
sequence = """Hugging Face Inc. is a company based in New york city. Manhattan bridge is visible from the window."""

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


- Print the entities found by running named entity recognition on the sequence.

In [None]:
for entity in ner_pipe(sequence):
    print(entity)

{'entity': 'I-ORG', 'score': 0.99954754, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.99008596, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9977804, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-ORG', 'score': 0.99955183, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}
{'entity': 'I-LOC', 'score': 0.99441695, 'index': 11, 'word': 'New', 'start': 40, 'end': 43}
{'entity': 'I-LOC', 'score': 0.86684936, 'index': 12, 'word': 'yo', 'start': 44, 'end': 46}
{'entity': 'I-LOC', 'score': 0.9872277, 'index': 13, 'word': '##rk', 'start': 46, 'end': 48}
{'entity': 'I-LOC', 'score': 0.9989699, 'index': 16, 'word': 'Manhattan', 'start': 55, 'end': 64}
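
The pipeline reports one entry per word piece (`Hu`, `##gging`, and so on), so a single entity is spread over several rows. The sketch below merges consecutive pieces that share a label into one span, using the `index`, `start`, and `end` fields from the output above. `group_entities` is a hypothetical helper written for this illustration; the pipeline also accepts an `aggregation_strategy` argument that performs grouping inside the library.

```python
sequence = (
    "Hugging Face Inc. is a company based in New york city. "
    "Manhattan bridge is visible from the window."
)

# Token-level results copied from the output above (scores rounded for brevity).
tokens = [
    {"entity": "I-ORG", "score": 0.9995, "index": 1, "start": 0, "end": 2},
    {"entity": "I-ORG", "score": 0.9901, "index": 2, "start": 2, "end": 7},
    {"entity": "I-ORG", "score": 0.9978, "index": 3, "start": 8, "end": 12},
    {"entity": "I-ORG", "score": 0.9996, "index": 4, "start": 13, "end": 16},
    {"entity": "I-LOC", "score": 0.9944, "index": 11, "start": 40, "end": 43},
    {"entity": "I-LOC", "score": 0.8668, "index": 12, "start": 44, "end": 46},
    {"entity": "I-LOC", "score": 0.9872, "index": 13, "start": 46, "end": 48},
    {"entity": "I-LOC", "score": 0.9990, "index": 16, "start": 55, "end": 64},
]


def group_entities(tokens, text):
    """Merge adjacent word pieces with the same label into entity spans."""
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-")[-1]  # "I-ORG" -> "ORG"
        last = groups[-1] if groups else None
        if last and last["label"] == label and tok["index"] == last["last_index"] + 1:
            # Same label and directly following token: extend the current span.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
            last["score"] = min(last["score"], tok["score"])
        else:
            groups.append({"label": label, "start": tok["start"], "end": tok["end"],
                           "last_index": tok["index"], "score": tok["score"]})
    # Recover the surface text of each span from the character offsets.
    return [(g["label"], text[g["start"]:g["end"]], g["score"]) for g in groups]


for label, span, score in group_entities(tokens, sequence):
    print(label, span, score)
```

On the output above this yields three spans: `ORG Hugging Face Inc`, `LOC New york`, and `LOC Manhattan`. The words `city` and `bridge` do not appear because the model did not tag them.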



### Step 5: Generating Masked Language Model Using a Model and a Tokenizer

- Masked language modeling is a task in which a model fills in masked tokens in a sequence, predicting each missing token from the context of the surrounding words. Training on this task is how BERT-style models learn their language representations.

- The process includes the following steps:
  - Instantiate a tokenizer and a model from the checkpoint name.
  - Define a sequence with a masked token, placing `tokenizer.mask_token` in place of a word.
  - Encode that sequence into a list of IDs and find the position of the masked token in that list.
  - Retrieve the predictions at the index of the masked token.
  - Retrieve the top 5 tokens using the PyTorch `topk` or TensorFlow `top_k` methods.
  - Replace the masked token with each of these tokens and print the results.
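
The selection logic in these steps can be sketched without a model. The example below uses a hypothetical six-word vocabulary and hand-picked logits for the masked position; only the mechanics (locate the mask, take the top candidates, substitute them) mirror the real procedure.

```python
MASK = "[MASK]"
tokens = ["models", "help", MASK, "our", "footprint"]

# Hypothetical vocabulary and logits for the masked position (not real model output).
vocab = ["reduce", "increase", "paint", "decrease", "offset", "banana"]
mask_logits = [9.1, 7.4, 0.3, 7.0, 6.2, -2.5]

# Step 1: find the position of the masked token.
mask_index = tokens.index(MASK)

# Step 2: rank vocabulary entries by logit and keep the top k.
k = 3
top_ids = sorted(range(len(vocab)), key=lambda i: mask_logits[i], reverse=True)[:k]

# Step 3: substitute each candidate into the sequence.
candidates = []
for token_id in top_ids:
    filled = tokens[:mask_index] + [vocab[token_id]] + tokens[mask_index + 1:]
    candidates.append(" ".join(filled))

print("\n".join(candidates))
```

In the real workflow the logits come from the model's output at the masked position and the vocabulary lookup is done by the tokenizer.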

### Masked Language Modeling
- Import the necessary modules from the transformers library and torch.
- Load the pre-trained tokenizer and model.
- Define the input sequence with a masked token.
- Tokenize the input sequence and convert it to tensors.
- Find the index of the masked token and generate token predictions using the model.
- Get the indices of the top 5 predicted tokens.
- Print the sequence with each of the top 5 predicted tokens in the masked position.

In [None]:
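from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

# Load the tokenizer and the masked language model from the same checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

# Build a sequence that contains the tokenizer's mask token
sequence = (
    "Distilled models are smaller than the models they mimic. Using them instead of the large "
    f"versions would help {tokenizer.mask_token} our carbon footprint."
)

# Encode the sequence and locate the position of the mask token
inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

# Run the model and keep only the logits at the masked position
token_logits = model(**inputs).logits
mask_token_logits = token_logits[0, mask_token_index, :]

# Take the IDs of the 5 highest-scoring vocabulary tokens
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

# Print the sequence once per candidate, with the mask replaced by the decoded token
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))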

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.


**Observation**
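- The output shows the sequence completed with each of the model's five most likely predictions for the masked token (reduce, increase, decrease, offset, improve). All five fit the sentence grammatically, but they do not agree in meaning: the model ranks tokens by how plausible they are in context, not by whether the resulting statement is true.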