# Open Domain Question Answering based on Explain Anything Like I'm Five Blog Post

This notebook is a step-by-step implementation of the open domain question answering system
described in the blog post https://yjernite.github.io/lfqa.html. You should read that first, as there is a lot 
of good detail there on the hows and whys (particularly the model training) which I won't be replicating here. 

The purpose of this notebook is to work through all the details of setting up the QA system, assuming we already have pre-trained models. There are a lot of details, and I wanted to go through the exercise and understand each
step. 

No training of models is done. I used the pre-trained models available on the Huggingface Hub.  The code in this notebook is largely a refactoring and commenting of the original code that you can find here: https://github.com/huggingface/notebooks/blob/master/longform-qa/lfqa_utils.py.

# Preliminaries

In [18]:
# Do some imports. There are not many.

import time
import numpy as np 
import pandas as pd

from datasets import load_dataset # The blog used the nlp library, which has since been superseded by datasets

import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM

import faiss # For doing fast similarity search: https://github.com/facebookresearch/faiss

# add the src path
import sys, os
sys.path.append("../src") 

### Import the dataset we will need

To set up the QA system we actually only need the wiki_snippets dataset. This is what we use for making the dense index.  It's basically Wikipedia cut up into 100-word chunks.  

The eli5 dataset is actually only used for model training. It is however pretty interesting and worth examining. I won't do it here though.
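
To make the "100-word chunks" idea concrete, here is a minimal sketch of that kind of splitting. It is not the script that built wiki_snippets (the real dataset also records paragraph and character offsets for each passage, which this ignores); the function name `split_into_passages` and the toy article are made up for illustration.

```python
def split_into_passages(text, words_per_passage=100):
    """Split text into consecutive passages of at most `words_per_passage` words."""
    words = text.split()
    return [
        " ".join(words[start:start + words_per_passage])
        for start in range(0, len(words), words_per_passage)
    ]

# toy "article" of 250 words (hypothetical data)
article = " ".join(f"word{i}" for i in range(250))
passages = split_into_passages(article)

# two full passages and one shorter remainder
print(len(passages), [len(p.split()) for p in passages])  # 3 [100, 100, 50]
```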

In [3]:
# the wiki_snippets dataset only has a training data partition 
# It was made by dicing up the wiki40b dataset

# load the dataset and grab the training partition
wiki40b_snippets = load_dataset("wiki_snippets", name = "wiki40b_en_100_0")["train"]

Reusing dataset wiki_snippets (/home/rob/.cache/huggingface/datasets/wiki_snippets/wiki40b_en_100_0/1.0.0/d152a0e6a420c02b9b26e7f75f45fb54c818cae1d83e8f164f0b1a13ac7998ae)


In [4]:
# Check the number of rows/elements in this dataset
print("number of rows:", wiki40b_snippets.num_rows)
print()
# And take a look at an element
wiki40b_snippets[123456]

number of rows: 17553713



{'_id': '{"datasets_id": 22758, "wiki_id": "Q1199856", "sp": 6, "sc": 1727, "ep": 6, "ec": 2363}',
 'article_title': 'The Tears',
 'datasets_id': 22758,
 'end_character': 2363,
 'end_paragraph': 6,
 'passage_text': 'reinvigorated by each other\'s company. Anderson talks excitedly of Tears songs like the ballad Asylum, inspired by his father\'s struggle with depression, as having moved away from ‘Suede cliches or Brett Anderson cliches ... it\'s not, you know, opiated fop territory.’"\nFrom the start, Anderson was insistent that the band would not be playing any songs by Suede. Things would change over time, however, as the band ended up playing the B-side, "The Living Dead", to an enthusiastic reception, during an encore for their show at the Sheffield Leadmill in April. In April 2005, the band\'s first single, "Refugees", was released.',
 'section_title': 'History',
 'start_character': 1727,
 'start_paragraph': 6,
 'wiki_id': 'Q1199856'}

There's a lot of info in here, but we will only be using the 'passage_text.'

# Get the RetriBERT Model and Tokenizer to make the passage text embeddings

This QA system works by taking a question, embedding it using a BERT-type model, and then comparing that embedding with a large set of previously calculated embeddings over all passages in the wiki_snippets dataset. The results of this similarity search are used as context for a BART-type sequence-to-sequence model that generates the answer. Note that the blog also discusses sparse retrieval using Elasticsearch; I'm not going to do that. I'm only going to make the dense index. 

The dense embeddings are made using a so-called "RetriBERT" model, which is a small BERT model that has two different embedding heads. One head is for embedding questions, the other is for embedding passages that contain context for answers. Otherwise the BERT weights are identical. 

This model is trained on the ELI5 dataset. As explained in the blog, the assumption is made that the ELI5 answers are similar to the wiki_snippets passages. Thus, by training the RetriBERT model so that ELI5 questions and answers are close in the embedding space, we can assume that questions will also be close to contextually appropriate passages of wiki_snippets.  

Also note that the embedding space is only 128 dimensions, which is relatively small.
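
To get a feel for what "close in the embedding space" means as a training signal, here is a small NumPy sketch of an in-batch contrastive loss: question i should score highest against answer i and lower against every other answer in the batch. This is my own simplified illustration of that general idea, not the training code from the blog, and the random vectors stand in for real embeddings.

```python
import numpy as np

def in_batch_contrastive_loss(q_emb, a_emb):
    """Cross-entropy over question-answer dot products; pair i is the correct match for i."""
    scores = q_emb @ a_emb.T                      # (batch, batch) similarity matrix

    def row_cross_entropy(s):
        s = s - s.max(axis=1, keepdims=True)      # stabilise the softmax
        log_probs = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))       # correct matches sit on the diagonal

    # average the question->answer and answer->question directions
    return 0.5 * (row_cross_entropy(scores) + row_cross_entropy(scores.T))

rng = np.random.default_rng(0)
answers = rng.normal(size=(4, 128)).astype("float32")   # toy 128-dim "embeddings"

aligned = in_batch_contrastive_loss(answers, answers)          # each question matches its answer
shuffled = in_batch_contrastive_loss(answers[::-1], answers)   # every pair is mismatched
print(aligned < shuffled)  # True
```

The loss is small when matching pairs have the largest dot product and large when they do not, which is exactly the property the retrieval step relies on.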

In [5]:
# first check that the GPU is working. I always like to do this in case some update has messed up my PyTorch
if torch.cuda.is_available():
    default_device = "cuda:0"
    print("GPU available. default_device set to cuda:0")
else:
    default_device = 'cpu'
    print("GPU not available. default_device set to cpu")

GPU available. default_device set to cuda:0


In [6]:
# now download the RetriBERT tokenizer from the Huggingface Hub
qar_tokenizer = AutoTokenizer.from_pretrained('yjernite/retribert-base-uncased')

# and the RetriBERT model
qar_model = AutoModel.from_pretrained('yjernite/retribert-base-uncased')

Some weights of RetriBertModel were not initialized from the model checkpoint at yjernite/retribert-base-uncased and are newly initialized: ['bert_query.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [7]:
# it's interesting to look at the model config. This is a pretty small embedding model ... 
# which makes it really neat that the embedding works so well!

print(qar_model.config)

RetriBertConfig {
  "_name_or_path": "yjernite/retribert-base-uncased",
  "architectures": [
    "RetriBertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "retribert",
  "num_attention_heads": 12,
  "num_hidden_layers": 8,
  "pad_token_id": 0,
  "projection_dim": 128,
  "share_encoders": true,
  "transformers_version": "4.4.2",
  "type_vocab_size": 2,
  "vocab_size": 30522
}



# Play with the embedding model a bit to understand how to use it.

I played a bit with embedding passages and questions and comparing the resulting embedding vectors directly. 
To figure out how to do this, I looked in the model code (specifically the RetrievalQAEmbedder class in the lfqa_utils.py file). In addition to the forward method you'll also see two methods: embed_questions and embed_answers. These allow you to embed questions and answers independently. The relevant bits 
of these methods differ only in the embedding head used. (Note that the forward method embeds both questions and answers and only returns the loss for training. So you don't want to use that for embedding the knowledge base.)  

Another thing to note is that the Huggingface tokenizer class has changed a bit in how it can be called, so some of the code in the blog (explicitly changing to LongTensors, using batch_encode_plus, etc.) is no longer needed. You can just call the tokenizer directly.

In [8]:
# so let's grab some random passages and make up some questions based on them to see if the embeddings work
idxs = [0,5000,10000,100000,1000000]
test_passages = [p for p in wiki40b_snippets[idxs]["passage_text"]]
df_passages = pd.DataFrame(test_passages)
df_passages.style.set_properties(**{'text-align': 'left'})

Unnamed: 0,0
0,"Ági Szalóki Life She started singing as a toddler, considering Márta Sebestyén a role model. Her musical background is traditional folk music; she first won recognition for singing with Ökrös in a traditional folk style, and Besh o droM, a Balkan gypsy brass band. With these ensembles she toured around the world from the Montreal Jazz Festival, through Glastonbury Festival to the Théatre de la Ville in Paris, from New York to Beijing. Since 2005, she began to pursue her solo career and explore various genres, such as jazz, thirties ballads, or children's songs. Until now, three of her six released albums"
1,"C English In English orthography, ⟨c⟩ generally represents the ""soft"" value of /s/ before the letters ⟨e⟩ (including the Latin-derived digraphs ⟨ae⟩ and ⟨oe⟩, or the corresponding ligatures ⟨æ⟩ and ⟨œ⟩), ⟨i⟩, and ⟨y⟩, and a ""hard"" value of /k/ before any other letters or at the end of a word. However, there are a number of exceptions in English: ""soccer"" and ""Celt"" are words that have /k/ where /s/ would be expected. The ""soft"" ⟨c⟩ may represent the /ʃ/ sound in the digraph ⟨ci⟩ when this precedes a vowel, as in the words 'delicious' and 'appreciate', and also in the"
2,"17th century writers. Later 19th century mumming Old Father Christmas continued to make his annual appearance in Christmas folk plays throughout the 19th century, his appearance varying considerably according to local custom. Sometimes, as in Hervey's book of 1836, he was portrayed (below left) as a hunchback. One unusual portrayal (below centre) was described several times by William Sandys between 1830 and 1852, all in essentially the same terms: ""Father Christmas is represented as a grotesque old man, with a large mask and comic wig, and a huge club in his hand."" This representation is considered by the folklore scholar Peter"
3,"evidently been employed; I will only tell you, that we passed by winding paths, over acres and acres, with a constant varying surface, where on all sides were growing every variety of shrubs and flowers, with more than natural grace, all set in borders of greenest, closest turf, and all kept with consummate neatness. Birkenhead Park was used as a template for the creation of Sefton Park, which opened in Liverpool in 1872. Points of interest The Grand Entrance is at the northeast entrance to Birkenhead Park. It consists of three arches flanked by lodges and is in Ionic style."
4,"one way or another. The teams Each team in the NCBL recruits its own players, and raises its own funds however they wish to. Some teams have sponsors while other teams don't. 2006 Tier I Championship Game 3 Bottom of the 7th, the Nepean Brewers complete their sweep of Marc Sports. 2006 NCBL Tier I Championship Game 3 on YouTube"


In [9]:
# Now make up some questions based on each of these
test_questions = ["Who  did Ági Szalóki tour with?", 
             "Does a digraph precede a vowel?", 
             "where did Father Christmas make his appearance?",
            "What was used for the creation of Sefton park?",
            "How are players recruited?"]

df_questions = pd.DataFrame(test_questions)
df_questions.style.set_properties(**{'text-align': 'left'})

Unnamed: 0,0
0,Who did Ági Szalóki tour with?
1,Does a digraph precede a vowel?
2,where did Father Christmas make his appearance?
3,What was used for the creation of Sefton park?
4,How are players recruited?


In [10]:
# Next tokenize the questions and the passages. Note that Huggingface tokenizers assume you are
# passing in a list of strings to be tokenized. So we can pass in all the passages and questions
# at the same time.
#
# Also note that some of the syntax regarding padding has changed since the blog was written. 

tokenized_test_questions = qar_tokenizer(test_questions,
                                   max_length=128, 
                                   padding="max_length", 
                                   truncation = True, 
                                   return_tensors='pt')

tokenized_test_passages = qar_tokenizer(test_passages,
                                   max_length=128, 
                                   padding="max_length", 
                                   truncation = True, 
                                   return_tensors='pt')

In [11]:
# Now embed the passage and questions 

# First put the embedding model on the GPU and set it to eval mode
# Note the GPU isn't really necessary for a couple passages ... it is necessary later on when we are making
# the entire index
qar_model.to(default_device).eval()

# Next embed the questions. Use torch.no_grad() so we don't calculate gradients and have to detach them
with torch.no_grad():
    embedded_test_questions = qar_model.embed_questions(tokenized_test_questions["input_ids"].to(default_device), 
                                                      tokenized_test_questions["attention_mask"].to(default_device))

# put back on the cpu and convert to numpy. Note we don't have to detach gradients because we used .no_grad()
embedded_test_questions = embedded_test_questions.cpu().numpy()

# and the passages ... note the different method used
with torch.no_grad():
    embedded_test_passages = qar_model.embed_answers(tokenized_test_passages["input_ids"].to(default_device), 
                                                      tokenized_test_passages["attention_mask"].to(default_device))

embedded_test_passages = embedded_test_passages.cpu().numpy()

# and print the shape of the resultant numpy arrays
print("Size of embedded questions", np.shape(embedded_test_questions))
print("Size of embedded passages", np.shape(embedded_test_passages))


Size of embedded questions (5, 128)
Size of embedded passages (5, 128)


In [12]:
# And finally, let's find the dot product of these two matrices. This *should* be largest on the diagonal

test_scores = np.dot(embedded_test_questions,embedded_test_passages.T)
print(test_scores)

[[21.93355    7.2254443 10.973549  10.100238  13.696762 ]
 [ 2.0702991 17.253456   4.4251814  2.614605  -3.751867 ]
 [10.714686   0.5131312 24.815298   9.250219  -2.1042044]
 [ 6.839991   3.0273747  5.2295    15.840267   9.030561 ]
 [ 5.577931   8.142668  -1.9331653  1.7066447 18.850311 ]]


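OK, that seems to work. The diagonal holds the largest element in each row and in each column, so every question scores highest against its own passage and every passage scores highest against its own question. (Note that the matrix itself is not symmetric.)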
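
Rather than eyeballing the matrix, the same check can be done with argmax. This sketch just copies the scores printed above into an array; nothing new is computed by the model here.

```python
import numpy as np

# scores copied from the output above: rows are questions, columns are passages
test_scores = np.array([
    [21.93355,   7.2254443, 10.973549,  10.100238,  13.696762],
    [ 2.0702991, 17.253456,  4.4251814,  2.614605,  -3.751867],
    [10.714686,  0.5131312, 24.815298,   9.250219,  -2.1042044],
    [ 6.839991,  3.0273747,  5.2295,    15.840267,   9.030561],
    [ 5.577931,  8.142668,  -1.9331653,  1.7066447, 18.850311],
])

best_passage_per_question = test_scores.argmax(axis=1)
best_question_per_passage = test_scores.argmax(axis=0)

print(best_passage_per_question)                  # [0 1 2 3 4]
print(best_question_per_passage)                  # [0 1 2 3 4]
print(np.allclose(test_scores, test_scores.T))    # False: the matrix is not symmetric
```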

# Embed the entire Wikipedia data set


To make the embeddings for the entire knowledge base we essentially repeat the above steps but looping over the entire dataset and save the results. Code for doing this is in the src/qa_utils.py file, specifically the embed_passages and embed_passage_batch functions. 

You'll probably want to run this overnight. **It took me about 14 hours on an NVIDIA 2070 Super GPU.** You can do a shorter test run by setting the n_batches argument of embed_passages to a small integer, timing it, and then extrapolating to estimate how long the full run will take. 
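
The extrapolation is just linear scaling. Here is a small helper for it; the function name, the batch size of 512 and the timing numbers are all hypothetical placeholders, so substitute whatever batch size your run actually uses and the time you actually measure.

```python
def estimate_full_runtime_hours(test_seconds, test_batches, batch_size, total_rows):
    """Linearly extrapolate a short timed run to the full dataset."""
    total_batches = -(-total_rows // batch_size)     # ceiling division
    seconds_per_batch = test_seconds / test_batches
    return total_batches * seconds_per_batch / 3600

# hypothetical measurement: 10 batches of 512 passages took 12 seconds
hours = estimate_full_runtime_hours(test_seconds=12.0, test_batches=10,
                                    batch_size=512, total_rows=17553713)
print(f"estimated full run: {hours:.1f} hours")
```

The first few batches can include one-off warm-up costs, so timing a slightly longer test run gives a better estimate.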

Also note that the entire embedded dataset takes up a little over 8GB of memory. This is actually relatively small because RetriBERT only uses a 128-dimensional embedding. For reference, DPR/RAG uses BERT base, which has a 768-dimensional embedding, and its knowledge base takes around 70GB to store.
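
The size follows directly from rows × dimensions × 4 bytes per float32. A quick check, using the row count printed earlier in this notebook (the 768-dimensional line is only a what-if for the same number of passages, not a reproduction of the DPR figure, which is for a different passage set):

```python
n_passages = 17553713      # number of rows in wiki40b_snippets, from the output above
bytes_per_float32 = 4

def embedding_size_gib(n_vectors, dim, bytes_per_value=bytes_per_float32):
    """Size of a dense (n_vectors, dim) float32 array in GiB."""
    return n_vectors * dim * bytes_per_value / 2**30

print(round(embedding_size_gib(n_passages, 128), 2))   # 8.37
print(round(embedding_size_gib(n_passages, 768), 2))   # 50.22
```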

In [19]:
# NOTE: MAKE SURE YOU WANT TO ACTUALLY RUN THIS BECAUSE IF YOU AREN'T CAREFUL YOU MAY OVERWRITE YOUR PREVIOUSLY
# CALCULATED RESULTS!!!!!

from qa_utils import embed_passages # the code is fully commented

test_filename = "my_wiki_embeddings.dat" # change this to the full file path where you want to store the index

#Note, if the file exists, running embed passages will overwrite it. Be careful and don't do that!
if not os.path.isfile(test_filename): # here as a safety measure
    print("starting to embed")
    embed_passages(wiki40b_snippets, 
                   qar_tokenizer, 
                   qar_model, 
                   n_batches = 2, # remove this line if you want to embed the entire dataset
                   index_file_name = test_filename)
else:
    print("file already exists are you sure you want to overwrite and re-embed?")

file already exists are you sure you want to overwrite and re-embed?
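
If you don't want to open src/qa_utils.py, the basic pattern of embedding in batches and writing straight to a disk-backed array looks roughly like the sketch below. This is not the actual `embed_passages` code: `fake_embed_batch` is a hypothetical stand-in for the tokenizer plus model, and `embed_to_memmap` is a made-up name. Note that opening a memmap in `"w+"` mode creates or overwrites the file, which is why the safety check above matters.

```python
import os
import tempfile
import numpy as np

def fake_embed_batch(texts, dim=128):
    """Stand-in for the real model: returns one float32 vector per text."""
    rng = np.random.default_rng(len(texts))
    return rng.normal(size=(len(texts), dim)).astype("float32")

def embed_to_memmap(passages, file_name, batch_size=4, dim=128):
    """Embed passages batch by batch, writing each batch straight to disk."""
    # mode="w+" creates the file, or overwrites it if it already exists
    out = np.memmap(file_name, dtype="float32", mode="w+", shape=(len(passages), dim))
    for start in range(0, len(passages), batch_size):
        batch = passages[start:start + batch_size]
        out[start:start + len(batch)] = fake_embed_batch(batch, dim)
    out.flush()   # make sure everything is written to the file
    return out.shape

passages = [f"passage {i}" for i in range(10)]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_embeddings.dat")
    shape = embed_to_memmap(passages, path)
    # reading back needs the same dtype and shape, since the file stores raw values only
    reloaded = np.memmap(path, dtype="float32", mode="r", shape=shape)
    reloaded_shape = reloaded.shape
    del reloaded   # release the file before the temporary directory is removed

print(reloaded_shape)  # (10, 128)
```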


# Create a Faiss Index

Faiss is a library out of Facebook that allows for super efficient vector similarity search on the scale of searching through billions of vectors. It does both exact search as well as various approximate searches that are quicker and applicable to extremely large sets of vectors. The tutorials at https://github.com/facebookresearch/faiss/wiki/Getting-started are a pretty quick and easy read, and there are a bunch of linked papers.

Search algorithms use the concept of an "index" which as far as I can tell is a fancy lookup table. They are easy to make with faiss. Here we just use the simplest index which does exact search. It's still pretty fast, even on CPU!  That's good because I only have an 8GB GPU so I can't fit the index there.
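
For intuition, exact inner-product search is nothing more than "score the query against every stored vector and keep the top k". The NumPy sketch below does that by brute force on random toy vectors. It only illustrates the computation; it is not how faiss is implemented, and faiss will be far faster on the real index.

```python
import numpy as np

def exact_inner_product_search(queries, vectors, k):
    """Return the top-k scores and row indices for each query, best first."""
    scores = queries @ vectors.T                      # every query against every vector
    top_idx = np.argsort(-scores, axis=1)[:, :k]      # indices sorted by descending score
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_scores, top_idx

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 128)).astype("float32")   # toy "knowledge base"
query = vectors[42:43] * 2.0       # a query pointing the same way as vector 42

D, I = exact_inner_product_search(query, vectors, k=5)
print(I[0][0])  # 42
```

The return values mirror the `D` (scores) and `I` (indices) naming used when querying the faiss index later in this notebook.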

In [20]:
# CHANGE THIS TO THE FILE THAT YOU STORED ALL THE EMBEDDINGS IN!
embedding_filename =  "../kb_index/wiki_index_embed.dat"

# make a numpy memmap of the embeddings in *read* mode. memmap is just a way of storing large numpy
# arrays on disk and accessing them as if they were in RAM
# you need to pass the filename, datatype, mode and the shape of the stored embeddings
wiki40b_passage_embedding = np.memmap(embedding_filename,
                                      dtype='float32', 
                                      mode='r',
                                      shape=(wiki40b_snippets.num_rows, 128)
                                     )

# Create the index on the CPU
wiki40b_index_flat = faiss.IndexFlatIP(128)
wiki40b_index_flat.add(wiki40b_passage_embedding)

# OR create the index on the GPU (if you have enough VRAM, probably 10GB or above)
# faiss_res = faiss.StandardGpuResources()
# wiki40b_index_flat = faiss.IndexFlatIP(128)
# wiki40b_gpu_index = faiss.index_cpu_to_gpu(faiss_res, 0, wiki40b_index_flat)
# wiki40b_gpu_index.add(wiki40b_passage_embedding[0:10000,:])

Now we want to query the index. Here's a function to do that:

In [21]:
def query_index(question, embedding_model, tokenizer, wiki_dataset, kb_index, 
                n_results=10, max_length=128, min_passage_length=20, device="cpu"):
    
    """ This is a refactoring of blog function 'query_qa_dense_index'"""
    
    embedding_model.to(device)
    
    # embed the question
    tokenized_question = tokenizer([question],
                                   max_length=max_length, 
                                   padding="max_length", 
                                   truncation = True, 
                                   return_tensors='pt')
    
    
    with torch.no_grad():
        embedded_question = embedding_model.embed_questions(tokenized_question["input_ids"].to(device), 
                                                      tokenized_question["attention_mask"].to(device))
    
    # now move to the cpu and convert to numpy so we can use faiss (a no-op if it is already on the cpu)
    embedded_question = embedded_question.cpu().numpy()
    
    # now query the index, using faiss. getting more than we need to make sure the text is long enough
    D, I = kb_index.search(embedded_question, 2* n_results)
   
    # get the results of the query
    all_wikidata = [wiki_dataset[int(k)] for k in I[0]]
    all_passages = "<P> " + " <P> ".join([p["passage_text"] for p in all_wikidata])
    
    # this is just to make a list of dictionaries we can look at in pandas
    # attach each faiss score before filtering so it stays paired with its own passage
    res_list = []
    for p, sc in zip(all_wikidata, D[0]):
        res = {k: p[k] for k in wiki_dataset.column_names}
        res["score"] = float(sc)
        res_list.append(res)
    # drop passages that are too short and keep the top n_results
    res_list = [res for res in res_list if len(res["passage_text"].split()) > min_passage_length][:n_results]
    
    
    return all_passages, res_list


In [22]:
# and we can ask a question and return some contextual passages

test_question = "How did the second world war end?"

all_passages, res_list = query_index(test_question, qar_model, qar_tokenizer, 
                   wiki40b_snippets, wiki40b_index_flat, device='cpu')

df = pd.DataFrame({
    'Article': ['---'] + [res['article_title'] for res in res_list],
    'Sections': ['---'] + [res['section_title'] if res['section_title'].strip() != '' else res['article_title']
                 for res in res_list],
    'Text': ['--- ' + test_question] + [res['passage_text'] for res in res_list],
})
df.style.set_properties(**{'text-align': 'left'})

Unnamed: 0,Article,Sections,Text
0,---,---,--- How did the second world war end?
1,Europe,20th century to the present,"was to encircle Germany and cut the Germans off from Scandinavian resources. Around the same time, Germany moved troops into Denmark. The Phoney War continued. In May 1940, Germany attacked France through the Low Countries. France capitulated in June 1940. By August Germany began a bombing offensive on Britain, but failed to convince the Britons to give up. In 1941, Germany invaded the Soviet Union in Operation Barbarossa. On 7 December 1941 Japan's attack on Pearl Harbor drew the United States into the conflict as allies of the British Empire and other allied forces. After the staggering Battle of Stalingrad in 1943,"
2,Military strategy,European Allies,"invasion of French North-Africa), Sicily and southern Italy were invaded, leading to the defeat of Fascist Italy. Churchill especially favoured a Southern strategy, aiming to attack the ""soft underbelly"" of Axis Europe through Italy, Greece and the Balkans in a strategy similar to the First World War idea of ""knocking out the supports"". Roosevelt favoured a more direct approach through northern Europe, and with the Invasion of Normandy in June 1944, the weight of Allied effort shifted to the direct conquest of Germany. From 1944, as German defeat became more and more inevitable, the shape of post-war Europe assumed greater"
3,Battle of Arras (1917),Home fronts,"the war. Hundreds of thousands of casualties had been suffered at the battles of Gallipoli, the Somme and Verdun, with little prospect of victory in sight. The British Prime Minister, H. H. Asquith, resigned in early December 1916 and was succeeded by David Lloyd George. In France, Prime Minister Aristide Briand, along with Minister of Defence Hubert Lyautey were politically diminished and resigned in March 1917, following disagreements over the prospective Nivelle Offensive. The United States was close to declaring war on Germany; American public opinion was growing increasingly incensed by U-boat attacks upon civilian shipping, which had begun with"
4,Germany–United Kingdom relations,World War II & Occupation,"Germany, but the United States greatly funded and supplied the British. In December 1941, United States entered the war against Germany and Japan after the attack on Pearl Harbor by Japan, which also later overwhelmed British outposts in the Pacific from Hong Kong to Singapore. The Allied invasion of France on D-Day in June 1944 as well as strategic bombing and land forces all contributed to the final defeat of Germany. Occupation As part of the Yalta and Potsdam agreements, Britain took control of its own sector in occupied Germany. It soon merged its sector with the American and French sectors,"
5,Italian invasion of France,Italian imperial ambitions & Battle of France,"impression of weakness"". Germany supplied Italy with about one million tons of coal a month beginning in the spring of 1940, an amount that even exceeded Mussolini's demand of August 1939 that Italy receive six million tons of coal for its first twelve months of war. Battle of France On 1 September 1939, Germany invaded Poland. Following a month of war, Poland was defeated. A period of inaction, called the Phoney War, then followed between the Allies and Germany. On 10 May 1940, this inactivity ended as Germany began Fall Gelb (Case Yellow) against France and the neutral nations of"
6,Military history of Germany,First World War (1914–18) & Weimar Republic and the Third Reich (1918–39),"the British tank attack at the Battle of Cambrai. In March 1918 the German army Spring Offensive began an impressive advance creating a salient in the allied line. The offensive stalled as the British and French fell back and then counterattacked. The Germans did not have the airpower or tanks to secure their battlefield gains. The Allies, invigorated by American manpower, money, and food, counterattacked in late summer and rolled over the depleted German lines, as the German navy rebelled and support for the war on the homefront evaporated. Weimar Republic and the Third Reich (1918–39) The treaty of Versailles imposed"
7,Battle of France,Occupation,"docked at ports in Vichy France and North Africa and use them in an invasion of Britain (Operation Sea Lion). Within a month, the Royal Navy attacked the French naval forces stationed in North Africa in the Attack on Mers-el-Kébir. The British Chiefs of Staff Committee had concluded in May 1940 that if France collapsed, ""we do not think we could continue the war with any chance of success"" without ""full economic and financial support"" from the United States. Churchill's desire for American aid led in September to the Destroyers for Bases agreement that began the wartime Anglo-American partnership. The occupation"
8,Spanish Civil War,Francoist repression after the war and Republican exile & International relations,"returned. The third wave occurred after the War, at the end of March 1939, when thousands of Republicans tried to board ships to exile, although few succeeded. International relations The political and emotional repercussions of the War transcended the National scale, becoming a precursor to World War II. The war has frequently been described as the ""prelude to"" or the ""opening round of"" the Second World War, as part of an international battle against fascism. However Stanley Payne suggests this isn't accurate, arguing that the international alliance that was created in December 1941, once the United States entered WW2, was"
9,World War II,Allies gain momentum (1943–44),"named the Italian Social Republic, causing an Italian civil war. The Western Allies fought through several lines until reaching the main German defensive line in mid-November. German operations in the Atlantic also suffered. By May 1943, as Allied counter-measures became increasingly effective, the resulting sizeable German submarine losses forced a temporary halt of the German Atlantic naval campaign. In November 1943, Franklin D. Roosevelt and Winston Churchill met with Chiang Kai-shek in Cairo and then with Joseph Stalin in Tehran. The former conference determined the post-war return of Japanese territory and the military planning for the Burma Campaign, while the latter"
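
One detail of `query_index` worth seeing in isolation is the "fetch twice as many results, then drop the short ones" step. Here is a toy version with made-up passages and scores; `filter_results` is a hypothetical helper, not part of the notebook code. Pairing each passage with its score before filtering keeps the two aligned.

```python
def filter_results(passages, scores, n_results=2, min_passage_length=3):
    """Keep the first n_results passages longer than min_passage_length words, with their scores."""
    kept = [
        {"passage_text": text, "score": float(score)}
        for text, score in zip(passages, scores)
        if len(text.split()) > min_passage_length
    ]
    return kept[:n_results]

# toy search output, already sorted best first, fetched at 2 * n_results
passages = ["too short",
            "the first passage that is long enough",
            "tiny",
            "the second passage that is long enough"]
scores = [9.0, 8.0, 7.0, 6.0]

results = filter_results(passages, scores)
for r in results:
    print(r["score"], r["passage_text"])
# 8.0 the first passage that is long enough
# 6.0 the second passage that is long enough
```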


# Now that we have the index, we can ask a question and get a generated response.

The contextual passages are inputs into a BART-large sequence-to-sequence model. Essentially the question and contextual passages are concatenated together, tokenized, and then truncated to 1024 tokens. This is then passed through the BART model to create the response. The blog post goes through the fine-tuning procedure.
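
The concatenation is plain string formatting: passages are joined with `<P>` markers and the question is put in front. A minimal sketch using the same format strings as the code in this notebook (the helper name `build_s2s_input` and the two toy passages are made up; truncation to 1024 tokens happens later, in the tokenizer):

```python
def build_s2s_input(question, passages):
    """Join passages with <P> markers and prepend the question."""
    support_doc = "<P> " + " <P> ".join(passages)
    return "question: {} context: {}".format(question, support_doc)

example = build_s2s_input("How do fish breathe?",
                          ["Fish have gills.", "Gills extract oxygen."])
print(example)
# question: How do fish breathe? context: <P> Fish have gills. <P> Gills extract oxygen.
```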

Here are the steps that you can take to answer a list of questions.

In [23]:
# First load up the BART model 
qa_s2s_tokenizer = AutoTokenizer.from_pretrained('yjernite/bart_eli5')
qa_s2s_model = AutoModelForSeq2SeqLM.from_pretrained('yjernite/bart_eli5')

In [24]:
# put the BERT embedding model and the BART seq-2-seq model where you want them
# I put the embedding model on my cpu (because the index is large) and the bart model on my GPU
# also put them in eval mode

torch.cuda.empty_cache()

qar_device = "cpu"
qa_s2s_device = "cuda"

qar_model.to(qar_device).eval()
qa_s2s_model.to(qa_s2s_device).eval();

In [25]:
# come up with a list of questions:
questions = ["How do birds fly by flapping their wings?",
             "How do fish breath water?",
             "How did the polynesians travel across the ocean?",
            "When did we learn to cook food?",
            "Will be ever be able to live on Mars? Why or why not?"]

In [26]:
# now loop through these and answer them

answers = []

for question in questions:
    # create support document with the dense index
    
    doc, res_list = query_index(question, qar_model, qar_tokenizer, 
                   wiki40b_snippets, wiki40b_index_flat, device=qar_device)
    
    # concatenate question and support document into BART input
    question_doc = "question: {} context: {}".format(question, doc)
    
    # tokenize this with the BART tokenizer. Note we are truncating the total input at max_length=1024 tokens
    s2s_inputs = qa_s2s_tokenizer([question_doc],
                                       max_length=1024, 
                                       padding="max_length", 
                                       truncation = True, 
                                       return_tensors='pt').to(qa_s2s_device)

    # now feed this into the BART sequence to sequence model and generate answers
    generated_ids = qa_s2s_model.generate(input_ids=s2s_inputs["input_ids"],
                                              attention_mask=s2s_inputs["attention_mask"],
                                              min_length=32,
                                              max_length=128,
                                              do_sample=False,
                                              early_stopping=True,
                                              num_beams=8,
                                              temperature=1.0,
                                              top_k=None,
                                              top_p=None,
                                              eos_token_id=qa_s2s_tokenizer.eos_token_id,
                                              no_repeat_ngram_size=3,
                                              num_return_sequences=1,
                                              decoder_start_token_id=qa_s2s_tokenizer.bos_token_id)[0]

    # and make the list of answers by decoding the generated ids
    answer = qa_s2s_tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

    answers += [answer]

# And make a pretty dataframe to show the results
df = pd.DataFrame({
    'Question': questions,
    'Answer': answers,
})
df.style.set_properties(**{'text-align': 'left'})

Unnamed: 0,Question,Answer
0,How do birds fly by flapping their wings?,"Flapping their wings is just a way for the bird to generate lift. It's the same way you can fly by throwing a ball at a wall. The bird flaps its wings to create lift, and then uses that lift to fly."
1,How do fish breath water?,"Fish don't ""breath"" water. Their gills absorb the oxygen in the water and exhale it as gaseous oxygen. It's the same way we exhale CO2."
2,How did the polynesians travel across the ocean?,"The Polynesians didn't travel across the ocean. They traveled across the Polynesian Triangle, which is a huge stretch of ocean between the islands of Hawaii and New Caledonia."
3,When did we learn to cook food?,"Cooking has been around for as long as humans have been around. It's just a matter of getting it right the first time you do it. If you don't know how to cook, you won't be able to do it the next time."
4,Will be ever be able to live on Mars? Why or why not?,"Yes, we will be able to live on Mars in the future. The problem is that we don't have the technology to do so right now. We don't even know how to get there."


And that's basically it. I found it impressive that one could do so well with such a simple implementation using relatively small models. One can think of all sorts of tricks that could make this better.