## Import the models

In [12]:
from luqa import DocumentReader, DocumentRetriever, LookupQA

### Document Reader: BERT and ALBERT

In [3]:
reader_bert = DocumentReader(pretrained_path='./reader/bert-base-uncased/', model_out='reader/bert-base-uncased')
reader_albert = DocumentReader(pretrained_path='./reader/albert-base-v2/', model_out='reader/albert-base-v2')
reader = {'bert-base-uncased': reader_bert, 'albert-base-v2': reader_albert}

### Document Retriever: Wikipedia API 
We examine two retriever configurations. Given a question, each one queries the Wikipedia API for related articles and returns:
1. The full text of the first article.
2. Only the summary of the first article.

Note that the number of Wikipedia pages retrieved is controlled by the n_pages argument of DocumentRetriever(), while the boolean summary argument selects between the article summary (True) and the full page (False).

In [13]:
retriever_1_full = DocumentRetriever(n_pages=1, summary=False)
retriever_1_summary = DocumentRetriever(n_pages=1, summary=True)
retriever = {'1_full' : retriever_1_full, 
             '1_summary' : retriever_1_summary}
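
The internals of DocumentRetriever are not shown in this notebook, so the following is only a minimal sketch of what the two arguments are described as controlling. The function `retrieve_context` and the `search` and `fetch` callables are hypothetical stand-ins for a Wikipedia client, and the in-memory pages exist only so the sketch runs without network access.

```python
def retrieve_context(question, search, fetch, n_pages=1, summary=False):
    """Join the text of the top `n_pages` search hits into one context string.

    Assumptions (not taken from luqa): `search(question)` returns a ranked list
    of page titles, and `fetch(title)` returns a dict with 'content' and
    'summary' keys.
    """
    field = 'summary' if summary else 'content'
    titles = search(question)[:n_pages]
    return '\n\n'.join(fetch(title)[field] for title in titles)


# Tiny in-memory stand-in for the Wikipedia API (illustrative data only).
fake_pages = {
    'Page A': {'summary': 'Short A.', 'content': 'Short A. Long body of A.'},
    'Page B': {'summary': 'Short B.', 'content': 'Short B. Long body of B.'},
}


def fake_search(question):
    # Pretend every page matches, ranked in insertion order.
    return list(fake_pages)


def fake_fetch(title):
    return fake_pages[title]


full_context = retrieve_context('any question', fake_search, fake_fetch,
                                n_pages=1, summary=False)
summary_context = retrieve_context('any question', fake_search, fake_fetch,
                                   n_pages=1, summary=True)
```

With summary=True the reader sees a much shorter context, which is the difference between the two configurations compared in this notebook.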

#### Sample question

In [6]:
question = 'What discipline did Winkelmann create?'

### Full end-to-end model: LookupQA
We now run the full pipeline on one specific question, using BERT and ALBERT as the Document Reader and retrieving the context through the Wikipedia API.

In [14]:
luqa_bert_1_full = LookupQA(retriever=retriever['1_full'], reader=reader['bert-base-uncased'])
luqa_bert_1_full.get_answer(question=question)

'art history'

In [15]:
luqa_albert_1_full = LookupQA(retriever=retriever['1_full'], reader=reader['albert-base-v2'])
luqa_albert_1_full.get_answer(question=question)

'archaeology.'

In [16]:
luqa_bert_1_summary = LookupQA(retriever=retriever['1_summary'], reader=reader['bert-base-uncased'])
luqa_bert_1_summary.get_answer(question=question)

'art history'

In [17]:
luqa_albert_1_summary = LookupQA(retriever=retriever['1_summary'], reader=reader['albert-base-v2'])
luqa_albert_1_summary.get_answer(question=question)

'scientific archaeology'
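
The four cells above repeat the same two calls for every reader and retriever pair. As a possible shortcut, the helper below loops over all combinations and collects the answers in one dictionary. The name `answer_grid` is hypothetical; the only thing it assumes about the pipeline class is the constructor and `get_answer` usage already shown above.

```python
from itertools import product


def answer_grid(readers, retrievers, question, build_pipeline):
    """Return {(reader_name, retriever_name): answer} for every combination.

    `build_pipeline` is called as build_pipeline(retriever=..., reader=...) and
    must return an object with a get_answer(question=...) method, which is how
    LookupQA is used in the cells above.
    """
    answers = {}
    for reader_name, retriever_name in product(readers, retrievers):
        pipeline = build_pipeline(retriever=retrievers[retriever_name],
                                  reader=readers[reader_name])
        answers[(reader_name, retriever_name)] = pipeline.get_answer(question=question)
    return answers


# In this notebook the call would be:
# answers = answer_grid(reader, retriever, question, LookupQA)
```

Keying the result by (reader, retriever) keeps the comparison in the analysis easy to read and extends naturally if more readers or retriever settings are added.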

### Analysis
- BERT models:

Both pipelines that use BERT as the Document Reader returned 'art history', which differs from the reference answer 'scientific archaeology' in the SQuAD 2.0 dataset. However, after examining the question and the Wikipedia article about Winkelmann, we observed that 'art history' is not necessarily wrong. We suspect this behaviour occurred because the model weighted 'art history' more heavily than 'scientific archaeology' in this particular context.

- ALBERT models:

The full-page pipeline using ALBERT returned 'archaeology', which is also partially correct, while the summary pipeline returned the reference answer 'scientific archaeology'. We suspect that a narrower, more precise context helps the model produce a more accurate answer.
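
To make "partially correct" concrete, the four answers can be scored against the reference with exact match and token-level F1. The sketch below follows the normalization commonly used for SQuAD-style evaluation (lowercasing, removing punctuation and English articles); the helper names are our own, and the answers are copied from the outputs above rather than recomputed.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def exact_match(prediction, reference):
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction, reference):
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match, one empty as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = 'scientific archaeology'
answers = {
    ('bert-base-uncased', '1_full'): 'art history',
    ('albert-base-v2', '1_full'): 'archaeology.',
    ('bert-base-uncased', '1_summary'): 'art history',
    ('albert-base-v2', '1_summary'): 'scientific archaeology',
}

# Map each (reader, retriever) pair to (exact match, F1 rounded to 2 decimals).
scores = {key: (exact_match(ans, reference), round(token_f1(ans, reference), 2))
          for key, ans in answers.items()}
for key, (em, f1) in scores.items():
    print(key, em, f1)
```

Under this metric 'archaeology.' shares one of the two reference tokens (F1 of about 0.67), 'art history' shares none (F1 of 0), and only the ALBERT summary pipeline is an exact match. The metric measures token overlap only, so it cannot capture the observation that 'art history' may still be a reasonable answer.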