# Question Answering
Build a question answering system. In particular, you should do the following:
- Download an annotated question answering dataset, such as [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/).
- Load and fine-tune a pre-trained question answering model using a library, such as `Simple Transformers`.
- Evaluate your question answering model.
- Run it on some example questions and check the answers. 

## Importing Modules

In [1]:
import os
import json
import random
import sklearn.model_selection
import simpletransformers.question_answering

2021-12-02 11:06:44.486508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-02 11:06:44.486535: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## Downloading and Loading the Dataset
We load the SQuAD dataset. As we can see below, each entry of the dataset consists of the following:
- `context`: The paragraph or text from which the question is asked.
- `qas`: A list of questions and answers.
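
To make the structure concrete, here is a minimal sketch that walks through one entry and checks that each `answer_start` offset points at the answer text inside `context`. The entry is a hypothetical miniature example written for illustration (it is not taken from SQuAD), and `flatten_entry` is a hypothetical helper name.

```python
# Hypothetical miniature entry in the same shape as a SQuAD v2.0 paragraph.
entry = {
    "context": "Department stores sell clothing, furniture and toys.",
    "qas": [
        {
            "question": "What do department stores sell?",
            "id": "example-0",
            "answers": [{"text": "clothing, furniture and toys", "answer_start": 23}],
            "is_impossible": False,
        },
        {
            "question": "Who founded the first department store?",
            "id": "example-1",
            "answers": [],
            "is_impossible": True,
        },
    ],
}


def flatten_entry(entry):
    """Return (question, answer_text, span_matches) rows for one entry."""
    rows = []
    for qa in entry["qas"]:
        if qa["is_impossible"]:
            # Unanswerable questions carry no answer span.
            rows.append((qa["question"], None, True))
            continue
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            # The character offset should reproduce the answer text exactly.
            span_matches = entry["context"][start:end] == answer["text"]
            rows.append((qa["question"], answer["text"], span_matches))
    return rows


rows = flatten_entry(entry)
for row in rows:
    print(row)
```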

In [2]:
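# Uncomment to download the SQuAD v2.0 training set.
# !wget -O train-v2.0.json https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
with open("train-v2.0.json", "r") as f:
    data = json.load(f)
# Flatten the topics into a single list of paragraphs, each with `context` and `qas`.
data = [item for topic in data["data"] for item in topic["paragraphs"]]
# Keep a random sample of 100 paragraphs to keep the run small.
data = random.sample(data, 100)
# Split the sampled paragraphs into training and validation sets.
train_data, validation_data = sklearn.model_selection.train_test_split(data)
data[0]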

{'qas': [{'question': 'What sorts of departments might one see in a major department store?',
   'id': '5726494f708984140094c16f',
   'answers': [{'text': 'clothing, furniture, home appliances, toys, cosmetics, gardening, toiletries, sporting goods',
     'answer_start': 63}],
   'is_impossible': False},
  {'question': "Who started influencing department stores in the 1970's? ",
   'id': '5726494f708984140094c170',
   'answers': [{'text': 'discounters', 'answer_start': 608}],
   'is_impossible': False},
  {'question': 'What has begun pressuring department stores in more recent years?',
   'id': '5726494f708984140094c171',
   'answers': [{'text': 'online stores such as Amazon.', 'answer_start': 681}],
   'is_impossible': False},
  {'question': 'Other than at the check-out lanes at the front of a store, where may customers check out?',
   'id': '5726494f708984140094c172',
   'answers': [{'text': 'at sales counters within each department.',
     'answer_start': 425}],
   'is_impossible': 

## Fine-Tuning the Pre-Trained Model

In [3]:
train_args = {
    "learning_rate": 5e-5,  # assumed value: the key and value were garbled in the source
    "num_train_epochs": 1,
    "max_seq_length": 384,
    "doc_stride": 128,
    "overwrite_output_dir": True,
    "reprocess_input_data": False,
    "train_batch_size": 2,
    "n_best_size": 5,
    "gradient_accumulation_steps": 8,   
    "save_model_every_epoch": False, 
    "early_stopping_consider_epochs": True,
    "use_early_stopping": True,   
}
model = simpletransformers.question_answering.QuestionAnsweringModel("roberta", "roberta-base", args=train_args, use_cuda=False)
model.train_model(train_data)

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForQuestionAnswering: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use 

Epoch:   0%|          | 0/1 [00:00<?, ?it/s]

Running Epoch 0 of 1:   0%|          | 0/266 [00:00<?, ?it/s]

(33, 4.573533609509468)
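
As a rough sanity check, the tuple returned by `train_model` can be related to the training arguments. The sketch below assumes that the first element is the number of optimizer steps and that one optimizer step is taken after every `gradient_accumulation_steps` batches; the batch count of 266 is read from the progress bar above.

```python
train_batch_size = 2
gradient_accumulation_steps = 8
num_train_epochs = 1
batches_per_epoch = 266  # taken from the "Running Epoch" progress bar

# Gradients from 8 batches of 2 examples are accumulated before each update.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: leftover batches that do not fill a full accumulation cycle
# do not trigger an additional optimizer step.
optimizer_steps = (batches_per_epoch // gradient_accumulation_steps) * num_train_epochs

print(effective_batch_size, optimizer_steps)
```

Under these assumptions the effective batch size is 16 and the step count is 33, which agrees with the first element of the returned tuple.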

## Evaluating the Fine-Tuned Model
We can now evaluate the fine-tuned model on the validation set. The following metrics are calculated by default:
- `correct`: Number of predicted answers matching the true answer exactly.
- `similar`: Number of predicted answers that are a substring of the true answer or vice versa.
- `incorrect`: Number of predicted answers that meet neither the `correct` nor the `similar` criterion.
- `eval_loss`: Cross Entropy Loss for eval_data.
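
The three counting metrics can be sketched as a small function that follows the definitions in the list above. This is an illustration only, not the library's implementation; `categorize` is a hypothetical name, and stripping surrounding whitespace before comparing is an assumption.

```python
def categorize(truth, predicted):
    """Label a prediction as 'correct', 'similar' or 'incorrect'."""
    truth, predicted = truth.strip(), predicted.strip()
    if predicted == truth:
        return "correct"
    # Substring match in either direction counts as similar.
    if predicted in truth or truth in predicted:
        return "similar"
    return "incorrect"


print(categorize("discounters", "discounters"))
print(categorize("online stores such as Amazon.", "Amazon"))
print(categorize("discounters", "Fashoda"))
print(categorize("discounters", ""))
```

Note that in Python the empty string is a substring of every string, so under this definition an empty prediction (or an empty true answer) is labelled `similar` rather than `incorrect`.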

In [4]:
result, texts = model.eval_model(validation_data)
result

convert squad examples to features: 100%|████████████████████████████████████████████████████████████████████████████████| 184/184 [00:01<00:00, 174.17it/s]
add example index and unique id: 100%|████████████████████████████████████████████████████████████████████████████████| 184/184 [00:00<00:00, 464351.35it/s]


Running Evaluation:   0%|          | 0/23 [00:00<?, ?it/s]

{'correct': 75,
 'similar': 109,
 'incorrect': 0,
 'eval_loss': -3.8359746725662895}

## Predicting Answers of Questions

In [9]:
new_questions = [
    {
        "context": """With French, Belgian and Portuguese activity in the lower Congo River region undermining orderly incursion of tropical Africa, the Berlin Conference of 1884–85 was held to regulate the competition between the European powers in what was called the "Scramble for Africa" by defining "effective occupation" as the criterion for international recognition of territorial claims. The scramble continued into the 1890s, and caused Britain to reconsider its decision in 1885 to withdraw from Sudan. A joint force of British and Egyptian troops defeated the Mahdist Army in 1896, and rebuffed a French attempted invasion at Fashoda in 1898. Sudan was nominally made an Anglo-Egyptian Condominium, but a British colony in reality.""",
        "qas": [
            {
                "question": "Where did France try to invade in 1898?",
                "id": "0",
            }
        ],
    }
]

predictions, raw_outputs = model.predict(new_questions, n_best_size=1)
predictions

convert squad examples to features: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 52.08it/s]
add example index and unique id: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15477.14it/s]


Running Prediction:   0%|          | 0/1 [00:00<?, ?it/s]

[{'id': '0', 'answer': ['empty']}]
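
To ask several questions about one passage, the input list can be built programmatically. The helper names below (`build_prediction_input`, `split_answered`) are hypothetical and not part of Simple Transformers. The sketch relies only on the input and output shapes shown above, and it assumes that the string `'empty'` marks a question for which no answer was returned. The context is a shortened paraphrase of the passage above, and the sample predictions are made up for illustration.

```python
def build_prediction_input(context, questions):
    """Wrap one context and several questions in the list-of-dicts format used above."""
    return [
        {
            "context": context,
            "qas": [
                {"question": question, "id": str(index)}
                for index, question in enumerate(questions)
            ],
        }
    ]


def split_answered(predictions):
    """Return the ids of predictions with and without a non-empty answer."""
    answered, unanswered = [], []
    for prediction in predictions:
        answers = [a for a in prediction["answer"] if a and a != "empty"]
        (answered if answers else unanswered).append(prediction["id"])
    return answered, unanswered


to_predict = build_prediction_input(
    "British and Egyptian troops rebuffed a French attempted invasion at Fashoda in 1898.",
    ["Where did France try to invade in 1898?", "Who rebuffed the invasion?"],
)

# Hypothetical predictions in the same shape as the output above.
sample_predictions = [
    {"id": "0", "answer": ["empty"]},
    {"id": "1", "answer": ["British and Egyptian troops"]},
]
answered, unanswered = split_answered(sample_predictions)
print(answered, unanswered)
```

A list built this way could be passed to `model.predict` in place of `new_questions`, and the ids in `unanswered` point at the questions worth inspecting by hand.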