# 🤖 BERT-Based Question Answering with Transformers

This notebook demonstrates the application of a **pre-trained BERT model**
for **extractive question answering** using the Transformers library.

A BERT Large model fine-tuned on the **SQuAD dataset** is used to:
- Encode a question and its supporting context  
- Predict the start and end positions of an answer span  
- Extract the answer directly from the source text  
- Visualize model confidence across input tokens  

This work highlights practical understanding of Transformer architectures,
tokenization mechanics, and model interpretability.


In [None]:
from transformers import BertForQuestionAnswering, BertTokenizer
import torch

---

## Model Background

BERT (Bidirectional Encoder Representations from Transformers) is a Transformer-based
language model pre-trained on large unlabeled text corpora, including:

- BooksCorpus (~11,000 books)  
- Wikipedia articles  

### Key characteristics
- Bidirectional contextual understanding  
- Deep semantic representations  
- Large-scale parameterization  
  - **BERT Base**: ~110M parameters  
  - **BERT Large**: ~340M parameters
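
The parameter counts above can be roughly reproduced from the architecture alone. The sketch below is a back-of-the-envelope count, not code from the Transformers library: `bert_encoder_params` is a hypothetical helper, and it assumes the published uncased configuration (30,522-token vocabulary, 512 positions, 2 segment types, feed-forward width of 4x the hidden size).

```python
def bert_encoder_params(hidden, layers, vocab=30522, max_pos=512, type_vocab=2):
    """Approximate weight count of a BERT encoder (hypothetical helper)."""
    layer_norm = 2 * hidden  # scale and bias
    # Token, position, and segment embeddings, followed by a LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> 4*hidden -> hidden), then LayerNorm
    ffn_size = 4 * hidden
    feed_forward = (hidden * ffn_size + ffn_size) + (ffn_size * hidden + hidden) + layer_norm
    # Dense pooler applied to the [CLS] representation
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler

base = bert_encoder_params(hidden=768, layers=12)
large = bert_encoder_params(hidden=1024, layers=24)
print(f"BERT Base:  {base:,}")
print(f"BERT Large: {large:,}")
```

Under these assumptions the totals come out near 109M and 335M, in line with the rounded figures usually quoted for the two models.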


In [None]:
# Load pre-trained BERT model fine-tuned on SQuAD
model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'

In [None]:
model = BertForQuestionAnswering.from_pretrained(model_name)  # load the fine-tuned QA model

In [None]:
tokenizer = BertTokenizer.from_pretrained(model_name)  # load the matching tokenizer

---

## Question–Context Encoding

The model receives:
- A natural language question  
- A supporting context passage containing the answer  

Both inputs are encoded together into a single sequence using
model-specific tokenization and special boundary tokens.
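
As a rough illustration of that layout, the sketch below assembles a paired sequence by hand. It is a toy, not the tokenizer's implementation: `layout_pair` is a hypothetical helper, and the inputs are assumed to be already split into tokens.

```python
def layout_pair(question_tokens, context_tokens):
    """Arrange a pre-tokenized pair as [CLS] question [SEP] context [SEP]."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question, and the first [SEP];
    # segment 1 covers the context and the final [SEP].
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids

toy_tokens, toy_segments = layout_pair(
    ["when", "was", "it", "released", "?"],
    ["it", "was", "released", "in", "1997", "."],
)
for tok, seg in zip(toy_tokens, toy_segments):
    print(seg, tok)
```

The segment ids are what let the model tell the question apart from the context once both sit in one sequence.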


In [None]:
# Example question; the context passage below contains the answer
question = "When was the first DVD released?"


In [None]:
answer_document = "The first DVD (Digital Versatile Disc) was released on March 24, 1997. It was a movie titled 'Twister' and was released in Japan. DVDs quickly gained popularity as a replacement for VHS tapes and became a common format for storing and distributing digital video and data."

In [None]:
encoding = tokenizer.encode_plus(text=question, text_pair=answer_document)  # tokenize the question and context as one paired sequence

In [None]:
print(encoding)

In [None]:
inputs = encoding['input_ids']  # vocabulary ids for every token in the pair
sentence_embeddings = encoding['token_type_ids']  # segment ids: 0 = question, 1 = context
tokens = tokenizer.convert_ids_to_tokens(inputs)  # map the ids back to token strings

---

## Model Inference and Answer Extraction

The question answering head predicts:
- **Start logits** indicating where the answer begins  
- **End logits** indicating where the answer ends  

The most likely token span is selected as the final answer.
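
Taking the highest start logit and the highest end logit independently can occasionally yield an end position that precedes the start. One common remedy, sketched here under assumptions, is to score every valid pair jointly and keep the best one. `best_span` and its `max_answer_len` cap are hypothetical names for illustration, and the logits are made-up values rather than model output.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest summed logit, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = s_logit + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best, best_score

# Toy logits: the independent argmax picks start=2 and end=1, which is not a valid span
toy_start = [0.1, 0.2, 3.0, 0.5, 2.5]
toy_end = [0.3, 4.0, 0.1, 2.0, 1.0]
span, score = best_span(toy_start, toy_end)
print(span, score)
```

The nested loop is quadratic in sequence length, which is acceptable for a single short passage; the length cap keeps it bounded on longer inputs.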


In [None]:
tokenizer.decode(101)  # id 101 is [CLS], which opens every sequence

In [None]:
tokenizer.decode(102)  # id 102 is [SEP], which closes each segment

In [None]:
with torch.no_grad():  # inference only, so skip gradient tracking
    output = model(input_ids=torch.tensor([inputs]), token_type_ids=torch.tensor([sentence_embeddings]))

In [None]:
start_index = torch.argmax(output.start_logits)
end_index = torch.argmax(output.end_logits)

print(start_index)
print(end_index)

In [None]:
answer = ' '.join(tokens[start_index:end_index + 1])  # join the tokens in the predicted span
print(answer)
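
Because BERT uses WordPiece tokenization, a span joined with spaces can contain fragments such as `##ization` whenever the answer includes a word that was split into pieces. A minimal sketch of gluing those pieces back together follows; `merge_wordpieces` is a hypothetical helper written for illustration and the token list is invented.

```python
def merge_wordpieces(pieces):
    """Join WordPiece tokens, attaching '##' continuations to the previous word."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]  # drop the '##' marker and append to the last word
        else:
            words.append(piece)
    return " ".join(words)

merged = merge_wordpieces(["token", "##ization", "of", "dvd", "##s"])
print(merged)
```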

---

## Token-Level Confidence Visualization

Start and end logits are plotted per token to show which positions the
model scores highest as span boundaries. Logits are unnormalized scores,
not probabilities, so bar heights are comparable only within one plot.
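
If probability-like values are preferred for reading the plots, a softmax over the sequence turns logits into a distribution that sums to 1. The sketch below uses plain Python and made-up logits, not the model's actual output; `softmax` is written out by hand for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [-2.0, 0.5, 6.0, 1.0]
toy_probs = softmax(toy_logits)
for logit, prob in zip(toy_logits, toy_probs):
    print(f"{logit:>5.1f} -> {prob:.4f}")
```

A gap of a few logit units already concentrates most of the probability mass on one token, which is why a single tall bar in the logit plot usually signals a confident prediction.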


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
s_scores = output.start_logits.detach().numpy().flatten()
e_scores = output.end_logits.detach().numpy().flatten()

In [None]:
tokens_labels = []   #we want token labels as a list of strings with token and its index
for (i, token) in enumerate(tokens):
    tokens_labels.append('{:} - {:>2}'.format(token, i))

In [None]:
ax = sns.barplot(x=tokens_labels, y=s_scores)  # start logit per token
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center")
ax.grid(True)

In [None]:
ax = sns.barplot(x=tokens_labels, y=e_scores)  # end logit per token
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center")
ax.grid(True)

---

## Summary

This notebook demonstrates the practical application of BERT for
**extractive question answering**, including:

- Transformer-based tokenization  
- Question–context encoding  
- Span-based answer prediction  
- Token-level confidence visualization  

The implementation reflects an applied understanding of modern
NLP architectures and their real-world usage.
