Prereq: Run `download-models.ipynb` before this notebook to download the required models.

We use the [pipeline()](https://huggingface.co/docs/transformers/v4.46.3/en/main_classes/pipelines) function from the [transformers](https://huggingface.co/docs/transformers/index) library by Hugging Face (HF) to run inference with BERT. `pipeline()` hides most of the library's complex code, which makes common tasks easy to run.

We also set the environment variable `HF_HUB_OFFLINE` to `1` so that inference runs offline, using only the models stored locally. Otherwise, the transformers library may download the model from the HF servers and store it in its cache.

In [1]:
import os

# Set HF_HUB_OFFLINE before importing transformers: huggingface_hub reads it
# once at import time, so setting it afterwards has no effect
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import pipeline

In the next cell, we create a callable `pipeline` object for question answering with BERT, passing three arguments:
- `"question-answering"` specifies the task type. `pipeline()` supports many predefined tasks, such as text classification, summarization, and fill-mask.
- `model` is the path to the locally saved BERT model; change it to match where your copy is stored.
- `device="mps"` runs inference on the GPU of Apple M-series chips (through PyTorch's Metal Performance Shaders backend) for better performance.
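Hard-coding `device="mps"` fails on machines without Apple silicon. A minimal sketch of a fallback, assuming you want to prefer MPS, then CUDA, then CPU; `pick_device` is a hypothetical helper, not part of transformers:

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Return a device string for pipeline(device=...), preferring MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```

In the notebook you would pass PyTorch's real availability checks, e.g. `pick_device(torch.backends.mps.is_available(), torch.cuda.is_available())`, and hand the result to `pipeline(..., device=...)`.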

#### Why question-answering?
BERT is limited in what it can do because it is an encoder-only model: unlike GPT-2, it was not trained to generate text. It works well for tasks that analyze a given input, such as finding the answer within a context (question answering), text classification, fill-mask, and token classification.

In this notebook, we use BERT for question answering, so we define two variables, `context` and `question`, and pass them to the pipeline object.

In [2]:
# Load the question-answering pipeline with a pre-trained BERT model on the mps device
qa_pipeline = pipeline("question-answering", model="../../models/huggingface/bert-large-uncased-whole-word-masking-finetuned-squad_qa", device = "mps")

# Define the context and question
context = "Here is the definition taught to us: Instead of forces, Lagrangian mechanics uses the energies in the system. The central quantity of Lagrangian mechanics is the Lagrangian, a function which summarizes the dynamics of the entire system. Overall, the Lagrangian has units of energy, but no single expression for all physical systems. Any function which generates the correct equations of motion, in agreement with physical laws, can be taken as a Lagrangian.\n\n But doesn't this have circular reasoning? You are saying a given L is a correct lagrangian if it produces the correct equations of motion."
question = "What sort of reasoning does the user claim the definition to have?"

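Next, we call the pipeline object with the question and context as arguments. This model was fine-tuned on SQuAD for extractive question answering: instead of generating new text, it picks the answer out of the `context` string.

How does it work?
- The context is split into tokens, and for every token the model outputs two scores (logits): one for the token being the start of the answer and one for it being the end.
- A softmax turns these scores into start and end probabilities.
- The pipeline picks the (start, end) token pair with the highest product of start and end probabilities, where the end does not come before the start, and maps those tokens back to character positions in `context`. The substring between them is the answer.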
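The span-selection step can be sketched in plain Python. This is a simplified illustration, not the pipeline's actual code: it assumes the per-token start and end logits are already available, scores each candidate span as p_start × p_end, and only keeps spans whose end does not precede the start and whose length is at most `max_answer_len` (the QA pipeline exposes a parameter with the same name). The tokens and logits are made up.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the most likely span; end is inclusive."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        # Only consider ends at or after the start, within the length limit
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy example with made-up logits for a 6-token context
tokens = ["this", "has", "circular", "reasoning", "you", "say"]
start_logits = [0.1, 0.2, 4.0, 1.0, 0.0, 0.0]
end_logits = [0.0, 0.1, 3.5, 2.0, 0.0, 0.1]
i, j, score = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]), round(score, 3))
```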

In [3]:
# Get the answer
result = qa_pipeline(question=question, context=context)

# Output the result
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']}")
print(f"Index of starting character: {result['start']}")
print(f"Index of ending character (excluded): {result['end']}")


Answer: circular
Score: 0.809292733669281
Index of starting character: 482
Index of ending character (excluded): 490
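`start` and `end` are character offsets into `context`, with `end` excluded, so `context[result['start']:result['end']]` recovers the answer string. A small helper (hypothetical, not part of transformers) that shows the answer with some surrounding text makes it easier to sanity-check a prediction:

```python
def answer_in_context(context: str, start: int, end: int, window: int = 30) -> str:
    """Return the answer span wrapped in [brackets] with up to `window` characters on each side."""
    left = context[max(0, start - window):start]
    answer = context[start:end]
    right = context[end:end + window]
    return f"...{left}[{answer}]{right}..."

ctx = "But doesn't this have circular reasoning?"
start = ctx.find("circular")
print(answer_in_context(ctx, start, start + len("circular"), window=10))
```

In the notebook this would be called as `answer_in_context(context, result['start'], result['end'])`.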


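Next, we use a `fill-mask` pipeline with BERT.
- `"fill-mask"` is a predefined task that predicts the masked word in a given text.
- `[MASK]` is the tokenizer's mask token (available as `fill_mask_pipeline.tokenizer.mask_token`), which marks the position to fill in.


Calling `fill_mask_pipeline(text)` predicts the masked word and returns the top 5 candidates by default (configurable with `top_k`). The `for` loop prints each prediction: a dictionary with its `score` (the model's probability for that token), the `token` id, the `token_str`, and the completed `sequence`.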
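Conceptually, the fill-mask pipeline takes the model's logits at the `[MASK]` position, applies a softmax over the whole vocabulary, and returns the `top_k` most probable tokens. A toy sketch of that step, using a made-up five-word vocabulary and made-up logits (a real BERT vocabulary has about 30,000 entries):

```python
import math

def top_k_fill(vocab, logits, k=5):
    """Softmax the mask-position logits over the vocabulary and return the k most likely tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return [{"token_str": tok, "score": p} for tok, p in ranked[:k]]

template = "The Milky Way is a [MASK] galaxy."
vocab = ["spiral", "massive", "large", "distant", "banana"]
logits = [6.0, 2.5, 2.0, 1.8, -3.0]  # made-up values
for pred in top_k_fill(vocab, logits, k=3):
    print(pred["token_str"], round(pred["score"], 3), template.replace("[MASK]", pred["token_str"]))
```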

In [4]:
fill_mask_pipeline = pipeline("fill-mask", model="../../models/huggingface/bert-large-uncased-whole-word-masking_mlm", device="mps")
text = "The Milky Way is a [MASK] galaxy."

for prediction in fill_mask_pipeline(text):
    print(prediction)

BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.


{'score': 0.8601288795471191, 'token': 12313, 'token_str': 'spiral', 'sequence': 'the milky way is a spiral galaxy.'}
{'score': 0.026198457926511765, 'token': 5294, 'token_str': 'massive', 'sequence': 'the milky way is a massive galaxy.'}
{'score': 0.014514694921672344, 'token': 2312, 'token_str': 'large', 'sequence': 'the milky way is a large galaxy.'}
{'score': 0.012708946131169796, 'token': 6802, 'token_str': 'distant', 'sequence': 'the milky way is a distant galaxy.'}
{'score': 0.009919708594679832, 'token': 9233, 'token_str': 'compact', 'sequence': 'the milky way is a compact galaxy.'}
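Because the scores are softmax probabilities over the entire vocabulary, the five returned scores do not sum to 1; the remaining probability mass is spread over all other tokens. Summing the scores printed above:

```python
# Scores copied from the fill-mask output above
scores = {
    "spiral": 0.8601288795471191,
    "massive": 0.026198457926511765,
    "large": 0.014514694921672344,
    "distant": 0.012708946131169796,
    "compact": 0.009919708594679832,
}
top5 = sum(scores.values())
print(f"top-5 mass: {top5:.3f}, remaining: {1 - top5:.3f}")  # top-5 mass: 0.923, remaining: 0.077
```

So roughly 7.7% of the probability goes to tokens outside the top 5; passing a larger `top_k` to the pipeline call surfaces more of them.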


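Additional code:  
BERT was not built for text generation. However, Hugging Face's `text-generation` pipeline still accepts a BERT model loaded as `BertLMHeadModel` with `is_decoder=True` (but not T5, which is an encoder-decoder model and is meant for the separate `"text2text-generation"` task).  
I tried to run inference with it (code below) and the results are not pretty.

To test it yourself, remove the triple quotes around the code below and run it.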
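One likely reason the output is poor: BERT was pre-trained with bidirectional self-attention, where every token attends to every other token, whereas loading it with `is_decoder=True` applies a causal mask so each token only sees earlier positions, a setup the pre-trained weights never saw. The two mask shapes, sketched for a 4-token sequence (1 = may attend, 0 = masked):

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """Every position may attend to every position (BERT's pre-training setup)."""
    return [[1] * n for _ in range(n)]

def causal_mask(n: int) -> list[list[int]]:
    """Position i may attend only to positions 0..i (decoder-style generation)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for name, mask in [("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))]:
    print(name)
    for row in mask:
        print(" ", row)
```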

In [5]:
'''
# Load BERT as a decoder (is_decoder=True) in a text-generation pipeline on the mps device
from transformers import BertLMHeadModel, AutoTokenizer

mlm_path = "../../models/huggingface/bert-large-uncased-whole-word-masking_mlm"
txtgen_pipeline = pipeline(
    "text-generation",
    model=BertLMHeadModel.from_pretrained(mlm_path, is_decoder=True),
    tokenizer=AutoTokenizer.from_pretrained(mlm_path),
    device="mps",
)

# Define the prompt for the model to continue
prompt = "Once upon a time, there"

# Run the inference pipeline
result = txtgen_pipeline(prompt)

print(result[0]["generated_text"])
'''


Note: You don't need to run this cell if you've already run the `download-models.ipynb` notebook. If you need to download the models again, remove the triple quotes and run the cell once to download the files. Each `bert-large` model is ~1.3 GB, and this cell downloads two of them. Afterward, restore the triple quotes or remove this cell.

In [6]:
'''
from transformers import BertForQuestionAnswering, AutoTokenizer, BertForMaskedLM

# Define the directory to save the model and tokenizer
qa_model = "bert-large-uncased-whole-word-masking-finetuned-squad"
mlm_model = "bert-large-uncased-whole-word-masking"
qa_directory = "../../models/huggingface/" + qa_model + "_qa"
mlm_directory = "../../models/huggingface/" + mlm_model + "_mlm"

# Download/save the tokenizer for the BertForQuestionAnswering model
tokenizer = AutoTokenizer.from_pretrained(qa_model)
tokenizer.save_pretrained(qa_directory)

# Download/save the BertForQuestionAnswering model
model = BertForQuestionAnswering.from_pretrained(qa_model)
model.save_pretrained(qa_directory)

# Download/save the tokenizer for the BertForMaskedLM model
tokenizer = AutoTokenizer.from_pretrained(mlm_model)
tokenizer.save_pretrained(mlm_directory)

# Download/save the BertForMaskedLM model
model = BertForMaskedLM.from_pretrained(mlm_model)
model.save_pretrained(mlm_directory)
'''
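Before enabling the download cell above, you can check whether the model directories already exist. A minimal sketch, assuming that a completed `save_pretrained` run leaves a `config.json` (from the model) and a `tokenizer_config.json` (from the tokenizer) in each directory; `missing_models` is a hypothetical helper:

```python
import os

REQUIRED_FILES = ("config.json", "tokenizer_config.json")

def missing_models(directories):
    """Return the directories that are absent or lack a saved model config or tokenizer config."""
    missing = []
    for directory in directories:
        if not all(os.path.isfile(os.path.join(directory, name)) for name in REQUIRED_FILES):
            missing.append(directory)
    return missing

model_dirs = [
    "../../models/huggingface/bert-large-uncased-whole-word-masking-finetuned-squad_qa",
    "../../models/huggingface/bert-large-uncased-whole-word-masking_mlm",
]
print(missing_models(model_dirs) or "All models found locally; no download needed.")
```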
