# First QA Model

For our first QA model, we will set up a simple question-answering pipeline using Hugging Face Transformers and a pretrained BERT model. We will test it on our SQuAD data, so let's load that first.

In [1]:
import json

with open('../../data/squad/dev.json', 'r') as f:
    squad = json.load(f)

As usual, we initialize our transformer tokenizer and model. This time, we will use a BERT model that has already been fine-tuned for question answering on the SQuAD dataset. Because the model saw the training split during fine-tuning, we evaluate on the SQuAD validation set rather than the training set.

In [2]:
from transformers import BertTokenizer, BertForQuestionAnswering

modelname = 'deepset/bert-base-cased-squad2'

tokenizer = BertTokenizer.from_pretrained(modelname)
model = BertForQuestionAnswering.from_pretrained(modelname)


Transformers comes with a useful function called [`pipeline`](https://huggingface.co/transformers/main_classes/pipelines.html), which lets us set up easy-to-use pipelines for common tasks.

One of those pipelines is the `question-answering` pipeline, which takes a dictionary containing a `'question'` and a `'context'` and returns an answer. We initialize it like so:

In [3]:
from transformers import pipeline

qa = pipeline('question-answering', model=model, tokenizer=tokenizer)

Now we can begin asking questions. Let's take a few examples from our `squad` data.

In [4]:
squad[:2]

[{'question': 'In what country is Normandy located?',
  'answer': 'France',
  'context': 'The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.'},
 {'question': 'When were the Normans in Normandy?',
  'answer': '10th and 11th centuries',
  'context': 'The Normans (Norman: Nourmands; French: Normands; Latin: No

In [7]:
# we will initialize a list for answers
answers = []

for pair in squad[:5]:
    # pass in our question and context to return an answer
    ans = qa({
        'question': pair['question'],
        'context': pair['context']
    })
    print(ans)
    # append the predicted and true answers to the answers list
    answers.append({
        'predicted': ans['answer'],
        'true': pair['answer']
    })

{'score': 0.9995271563529968, 'start': 159, 'end': 166, 'answer': 'France.'}
{'score': 0.7168666124343872, 'start': 94, 'end': 117, 'answer': '10th and 11th centuries'}
{'score': 0.7168666124343872, 'start': 94, 'end': 117, 'answer': '10th and 11th centuries'}
{'score': 0.08210879564285278, 'start': 256, 'end': 283, 'answer': 'Denmark, Iceland and Norway'}
{'score': 0.5019515156745911, 'start': 308, 'end': 314, 'answer': 'Rollo,'}
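
Besides the `'answer'` text and a confidence `'score'`, each result carries `'start'` and `'end'` values. Judging by the output above, these look like character offsets into the context, with `'end'` exclusive. The following is a minimal sketch that checks this reading against two of the printed predictions; it uses only the first sentence of the context (offsets are unchanged because it is a prefix), and the names `predictions` and `spans` are just for illustration.

```python
# First sentence of the context printed above, copied verbatim. Because it is
# a prefix of the full context, the character offsets are unchanged.
context = (
    "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
    "were the people who in the 10th and 11th centuries gave their name "
    "to Normandy, a region in France."
)

# Two of the predictions printed above (scores omitted).
predictions = [
    {'start': 159, 'end': 166, 'answer': 'France.'},
    {'start': 94, 'end': 117, 'answer': '10th and 11th centuries'},
]

# Slice the context with each (start, end) pair; 'end' is treated as exclusive.
spans = [context[p['start']:p['end']] for p in predictions]

for p, span in zip(predictions, spans):
    # Print the recovered span and whether it equals the reported answer.
    print(repr(span), span == p['answer'])
```

Both slices reproduce the reported answers, including the trailing period in `'France.'`, which suggests the punctuation comes from the span boundaries the pipeline selected rather than from any later formatting.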


In [6]:
answers

[{'predicted': 'France.', 'true': 'France'},
 {'predicted': '10th and 11th centuries', 'true': '10th and 11th centuries'},
 {'predicted': '10th and 11th centuries',
  'true': 'in the 10th and 11th centuries'},
 {'predicted': 'Denmark, Iceland and Norway',
  'true': 'Denmark, Iceland and Norway'},
 {'predicted': 'Rollo,', 'true': 'Rollo'}]

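So we can see that the predictions are exact or near-exact matches: two differ from the true answer only by trailing punctuation (`'France.'`, `'Rollo,'`), and one omits the leading "in the". Next, we'll take a look at how we can begin quantifying these results.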
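
As a rough preview of why raw string equality would undercount these results, here is a minimal sketch that compares the predicted and true answers after light normalization. The `normalize` helper is hypothetical (it is not part of Transformers), and lowercasing plus punctuation stripping is only one assumed choice of normalization; the `answers` list is copied from the output above.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed scheme)."""
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    return ' '.join(text.split())


# Predicted and true answers, copied from the output above.
answers = [
    {'predicted': 'France.', 'true': 'France'},
    {'predicted': '10th and 11th centuries', 'true': '10th and 11th centuries'},
    {'predicted': '10th and 11th centuries',
     'true': 'in the 10th and 11th centuries'},
    {'predicted': 'Denmark, Iceland and Norway',
     'true': 'Denmark, Iceland and Norway'},
    {'predicted': 'Rollo,', 'true': 'Rollo'},
]

# Compare each pair before and after normalization.
raw_matches = [a['predicted'] == a['true'] for a in answers]
matches = [normalize(a['predicted']) == normalize(a['true']) for a in answers]

print('raw exact matches:       ', sum(raw_matches), 'of', len(answers))
print('normalized exact matches:', sum(matches), 'of', len(answers))
```

Under this assumed normalization the punctuation-only differences disappear, while the third pair still fails because the true answer includes "in the". Partial overlaps like that one are the reason a softer score than exact match is useful.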