Question Answering can be used in a variety of use cases. A very common one is navigating complex knowledge bases or long documents (the "search setting").

A "knowledge base" could for example be your website, an internal wiki or a collection of financial reports. 
In this tutorial we will work on a slightly different domain: "Game of Thrones".   



In [None]:
# Install the latest release of Haystack in your own environment 
#! pip install farm-haystack

# Install the latest master of Haystack
!pip install git+https://github.com/deepset-ai/haystack.git
!pip install urllib3==1.25.4
!pip install torch==1.6.0+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html


In [1]:
from haystack import Finder
from haystack.preprocessor.cleaning import clean_wiki_text
from haystack.preprocessor.utils import convert_files_to_dicts, fetch_archive_from_http
from haystack.reader.farm import FARMReader
from haystack.reader.transformers import TransformersReader
from haystack.utils import print_answers

## Document Store

Haystack finds answers to queries within the documents stored in a `DocumentStore`. The current implementations of `DocumentStore` include `ElasticsearchDocumentStore`, `FAISSDocumentStore`,  `SQLDocumentStore`, and `InMemoryDocumentStore`.

**Here:** We recommend Elasticsearch as it comes preloaded with features like [full-text queries](https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html), [BM25 retrieval](https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25), and [vector storage for text embeddings](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/dense-vector.html).

**Alternatives:** If you are unable to set up an Elasticsearch instance, follow [Tutorial 3](https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb), which uses the SQL/InMemory document stores.

**Hint**: This tutorial creates a new document store instance with Wikipedia articles on Game of Thrones. However, you can configure Haystack to work with your existing document stores.

### Start an Elasticsearch server
You can start Elasticsearch on your local machine using Docker. If Docker is not readily available in your environment (e.g., in Colab notebooks), you can manually download and run Elasticsearch from the release archive.
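
Whichever way the server is started, it needs some time before it accepts requests. As an alternative to waiting a fixed number of seconds, a readiness check can poll the HTTP endpoint. The following is a minimal sketch using only the standard library; the function name `wait_for_http` and its defaults are hypothetical, and it assumes the server answers plain HTTP on `http://localhost:9200` without authentication.

```python
import http.client
import time
import urllib.request


def wait_for_http(url="http://localhost:9200", timeout_s=60.0, interval_s=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` seconds elapse.

    Hypothetical helper: the URL and timings are assumptions, not Haystack defaults.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timed out, or an incomplete reply:
            # the server is not ready yet, so try again.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_http()` before connecting the document store returns `True` as soon as the server responds and `False` if it never does within the timeout.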

In [2]:
# Recommended: Start Elasticsearch using Docker
#! docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.2

In [3]:
# In Colab / No Docker environments: Start Elasticsearch from source
'''! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.6.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.6.2

import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-7.6.2/bin/elasticsearch'],
                   stdout=PIPE, stderr=STDOUT,
                   preexec_fn=lambda: os.setuid(1)  # as daemon
                  )
# wait until ES has started
! sleep 30'''

In [4]:
# Connect to Elasticsearch

from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

05/11/2021 22:22:44 - INFO - elasticsearch -   PUT http://localhost:9200/document [status:200 request:0.729s]
05/11/2021 22:22:44 - INFO - elasticsearch -   PUT http://localhost:9200/label [status:200 request:0.250s]


## Preprocessing of documents

Haystack provides a customizable pipeline for:
 - converting files into texts
 - cleaning texts
 - splitting texts
 - writing them to a Document Store

In this tutorial, we download Wikipedia articles about Game of Thrones, apply a basic cleaning function, and index them in Elasticsearch.
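
The cleaning step is just a function that takes a `str` and returns a `str`. As an illustration of what a custom one could look like, here is a minimal sketch; the name `clean_text` and the footer pattern it removes are hypothetical and not part of Haystack.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaning function: takes a str, returns a str."""
    # Drop lines that look like page footers such as "Page 3 of 10" (assumed pattern).
    lines = [
        line for line in text.splitlines()
        if not re.fullmatch(r"\s*Page \d+ of \d+\s*", line)
    ]
    text = "\n".join(lines)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A function like this could be passed as `clean_func` to `convert_files_to_dicts` in place of `clean_wiki_text`.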

In [5]:
# Let's first get some documents that we want to query
# Here: 517 Wikipedia articles for Game of Thrones
# This run reads the articles from a local folder. To download them instead,
# uncomment the two lines below; they fetch the archive into doc_dir.
doc_dir = "D:\\Train\\"
#s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip"
#fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert files to dicts
# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers)
# It must take a str as input, and return a str.
dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)

# We now have a list of dictionaries that we can write to our document store.
# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself.
# The default format here is:
# {
#    'text': "<DOCUMENT_TEXT_HERE>",
#    'meta': {'name': "<DOCUMENT_NAME_HERE>", ...}
#}
# (Optionally: you can also add more key-value-pairs here, that will be indexed as fields in Elasticsearch and
# can be accessed later for filtering or shown in the responses of the Finder)

# Let's have a look at the first 3 entries:
print(dicts[:3])

# Now, let's write the dicts containing documents to our DB.
document_store.write_documents(dicts)

05/11/2021 22:22:47 - INFO - haystack.preprocessor.utils -   Converting D:\Train\0_Game_of_Thrones__season_8_.txt
05/11/2021 22:22:47 - INFO - haystack.preprocessor.utils -   Converting D:\Train\101_Titties_and_Dragons.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\102_The_Princess_and_the_Queen.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\10_Beyond_the_Wall__Game_of_Thrones_.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\118_Dark_Wings__Dark_Words.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\119_Walk_of_Punishment.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\11_The_Dragon_and_the_Wolf.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\120_And_Now_His_Watch_Is_Ended.txt
05/11/2021 22:22:48 - INFO - haystack.preprocessor.utils -   Converting D:\Train\121_The_Be

05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\331_Bran_Stark.txt
05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\332_Sansa_Stark.txt
05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\334_Rickon_Stark.txt
05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\33_David_Benioff.txt
05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\340_Roose_Bolton.txt
05/11/2021 22:22:49 - INFO - haystack.preprocessor.utils -   Converting D:\Train\341_Ned_Stark.txt
05/11/2021 22:22:50 - INFO - haystack.preprocessor.utils -   Converting D:\Train\342_Theon_Greyjoy.txt
05/11/2021 22:22:50 - INFO - haystack.preprocessor.utils -   Converting D:\Train\343_Catelyn_Stark.txt
05/11/2021 22:22:50 - INFO - haystack.preprocessor.utils -   Converting D:\Train\345_A_Game_of_Thrones__comics_.txt
05/11/2021 22:22:50 - INFO - haystack.preprocessor.utils -   Converting 

05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\504_List_of_A_Song_of_Ice_and_Fire_video_games.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\506_Game_of_Thrones_Theme.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\508_A_Game_of_Thrones__Second_Edition__card_game_.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\511_After_the_Thrones.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\512_Home__Game_of_Thrones_.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\513_Oathbreaker__Game_of_Thrones_.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\514_Book_of_the_Stranger.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -   Converting D:\Train\515_The_Door__Game_of_Thrones_.txt
05/11/2021 22:22:52 - INFO - haystack.preprocessor.utils -

[{'text': "The eighth and final season of the fantasy drama television series ''Game of Thrones'', produced by HBO, premiered on April 14, 2019, and concluded on May 19, 2019. Unlike the first six seasons, which consisted of ten episodes each, and the seventh season, which consisted of seven episodes, the eighth season consists of only six episodes.\nThe final season depicts the culmination of the series' two primary conflicts: the Great War against the Army of the Dead, and the Last War for control of the Iron Throne. The first half of the season involves many of the main characters converging at Winterfell with their armies in an effort to repel the Night King and his army of White Walkers and wights. The second half of the season resumes the war for the throne as Daenerys Targaryen assaults King's Landing in an attempt to unseat Cersei Lannister as the ruler of the Seven Kingdoms.\nThe season was filmed from October 2017 to July 2018 and largely consists of original content not foun

05/11/2021 22:22:55 - INFO - elasticsearch -   POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:2.314s]
05/11/2021 22:22:57 - INFO - elasticsearch -   POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:1.151s]
05/11/2021 22:22:58 - INFO - elasticsearch -   POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:1.106s]
05/11/2021 22:22:59 - INFO - elasticsearch -   POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:1.145s]
05/11/2021 22:23:00 - INFO - elasticsearch -   POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:1.096s]
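
If the texts come from a source other than files, the dictionaries in the `{'text': ..., 'meta': {'name': ...}}` format described in the cell above can be built by hand. The sketch below assumes a hypothetical CSV export with `title` and `body` columns; the column names and the helper `rows_to_dicts` are illustrative only.

```python
import csv
import io


def rows_to_dicts(rows):
    """Map rows with 'title' and 'body' keys to the document dict format."""
    return [
        {"text": row["body"], "meta": {"name": row["title"]}}
        for row in rows
        if row["body"].strip()  # skip rows without any text
    ]


# Hypothetical CSV content standing in for a database or spreadsheet export.
raw_csv = io.StringIO(
    "title,body\n"
    "Arya Stark,Arya is the younger daughter of Ned Stark.\n"
    "Empty page,\n"
)
custom_dicts = rows_to_dicts(csv.DictReader(raw_csv))
print(custom_dicts)
```

The resulting list has the same shape as the output of `convert_files_to_dicts` and could be passed to `document_store.write_documents` in the same way.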


## Initialize Retriever, Reader, & Finder

### Retriever

Retrievers help narrow down the scope for the Reader to smaller units of text where a given question could be answered.
They use simple but fast algorithms.

**Here:** We use Elasticsearch's default BM25 algorithm.

**Alternatives:**

- Customize the `ElasticsearchRetriever` with custom queries (e.g. boosting) and filters
- Use `TfidfRetriever` in combination with a SQL or InMemory Document store for simple prototyping and debugging
- Use `EmbeddingRetriever` to find candidate documents based on the similarity of embeddings (e.g. created via Sentence-BERT)
- Use `DensePassageRetriever` to use different embedding models for passage and query (see Tutorial 6)
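
To give an intuition for the BM25 ranking mentioned above, here is a minimal sketch of the textbook BM25 formula in plain Python. It is not Elasticsearch's implementation: tokenization is a simple whitespace split, and the function name `bm25_scores` is hypothetical. The parameter values `k1=1.2` and `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (a non-empty list of str) against `query`."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "arya stark is the daughter of ned stark",
    "the dothraki language was created by david peterson",
    "sansa stark is the sister of arya",
]
scores = bm25_scores("dothraki language", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Only the second document contains the query terms, so it is the only one with a non-zero score.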

In [6]:
from haystack.retriever.sparse import ElasticsearchRetriever
retriever = ElasticsearchRetriever(document_store=document_store)

In [7]:
# Alternative: an in-memory TfidfRetriever based on Pandas dataframes, for building quick prototypes with a SQL (e.g. SQLite) document store.

# from haystack.retriever.sparse import TfidfRetriever
# retriever = TfidfRetriever(document_store=document_store)

### Reader

A Reader scans the texts returned by the Retriever in detail and extracts the k best answers. Readers are based
on powerful, but slower, deep learning models.

Haystack currently supports Readers based on the frameworks FARM and Transformers.
With both you can either load a local model or one from Hugging Face's model hub (https://huggingface.co/models).

**Here:** a medium-sized RoBERTa QA model using a Reader based on FARM (https://huggingface.co/deepset/roberta-base-squad2)

**Alternatives (Reader):** TransformersReader (leveraging the `pipeline` of the Transformers package)

**Alternatives (Models):** e.g. "distilbert-base-uncased-distilled-squad" (fast) or "deepset/bert-large-uncased-whole-word-masking-squad2" (good accuracy)

**Hint:** You can adjust the model to return "no answer possible" with the `no_ans_boost` parameter. Higher values mean the model prefers "no answer possible".

#### FARMReader

In [8]:
# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

05/11/2021 22:23:08 - INFO - farm.utils -   device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None
05/11/2021 22:23:08 - INFO - farm.infer -   Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...
Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
	 We guess it's an *ENGLISH* model ... 
	 If not: Init the language model by supplying the 'language' param.
05/11/2021 22:24:07 - INFO - farm.utils -   device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None
05/11/2021 22:24:07 - INFO - farm.infer -   Got ya 7 parallel workers to do inference ...
05/11/2021 22:24:08 - INFO - farm.infer -    0    0    0    0    0    0    0 
05/11/2021 2

#### TransformersReader

In [9]:
# Alternative:
# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)

### Finder

The Finder combines the Reader and the Retriever in a pipeline to answer our actual questions.
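
The two-stage idea can be sketched with toy components. This is only an illustration of the retrieve-then-read flow, not Haystack's `Finder`: the functions `retrieve`, `read`, and `get_answers_sketch` are hypothetical, and word overlap stands in for both BM25 and the neural reader.

```python
def retrieve(question, documents, top_k):
    """Cheap first stage: rank whole documents by word overlap with the question."""
    q_terms = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def read(question, candidates, top_k):
    """Stand-in for the expensive second stage: score each sentence of each candidate."""
    q_terms = set(question.lower().split())
    answers = []
    for doc in candidates:
        for sentence in doc["text"].split(". "):
            score = len(q_terms & set(sentence.lower().split()))
            answers.append({"answer": sentence, "score": score, "meta": doc["meta"]})
    return sorted(answers, key=lambda a: a["score"], reverse=True)[:top_k]


def get_answers_sketch(question, documents, top_k_retriever=2, top_k_reader=1):
    """Run the retriever first, then let the reader look only at its candidates."""
    candidates = retrieve(question, documents, top_k_retriever)
    return {"question": question, "answers": read(question, candidates, top_k_reader)}


toy_documents = [
    {"text": "Arya is the daughter of Ned. She has a sister named Sansa", "meta": {"name": "arya.txt"}},
    {"text": "Dothraki was created by David Peterson", "meta": {"name": "dothraki.txt"}},
    {"text": "Winter is coming", "meta": {"name": "winter.txt"}},
]
toy_result = get_answers_sketch("who created dothraki", toy_documents)
print(toy_result["answers"][0]["meta"]["name"])
```

The reader only ever sees the documents the retriever passed on, which is why the retriever's `top_k` bounds both the quality and the cost of the second stage.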

In [10]:
finder = Finder(reader, retriever)

## Voilà! Ask a question!

In [11]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever tends to give better answers, but makes queries slower.
import time
top_k_retriever = 5  # should be >= top_k_reader
top_k_reader = 5
start = time.time()
prediction = finder.get_answers(question="father of Arya Stark", top_k_retriever=top_k_retriever, top_k_reader=top_k_reader)
end = time.time()
print(end - start)

05/11/2021 22:24:21 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.186s]
05/11/2021 22:24:21 - INFO - haystack.finder -   Got 5 candidates from retriever
05/11/2021 22:24:21 - INFO - haystack.finder -   Reader is looking for detailed answer in 4151 chars ...
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.13s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.29s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.03 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.15s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.17 Batches/s]


71.70610094070435
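
The start/stop timing pattern used around each query in this notebook can be wrapped in a small helper. This is a minimal sketch; `timed` is a hypothetical name, and `time.perf_counter` is used here instead of `time.time` because it is intended for measuring short durations.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Example with a built-in function; in this notebook the call could be
# timed(finder.get_answers, question="...", top_k_retriever=5, top_k_reader=5).
total, seconds = timed(sum, [1, 2, 3])
print(total, seconds)
```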


In [12]:
# prediction = finder.get_answers(question="Who created the Dothraki vocabulary?", top_k_reader=5)
# prediction = finder.get_answers(question="Who is the sister of Sansa?", top_k_reader=5)

In [14]:
print_answers(prediction, details="minimal")

[   {   'answer': 'Lord Eddard Stark',
        'context': 'ark daughters.\n'
                   'During the Tourney of the Hand to honour her father Lord '
                   'Eddard Stark, Sansa Stark is enchanted by the knights '
                   'performing in the event.'},
    {   'answer': 'Eddard "Ned" Stark',
        'context': '=\n'
                   "After Varys tells him that Sansa Stark's life is also at "
                   'stake, Eddard "Ned" Stark agrees to make a false '
                   'confession and swear loyalty to King Joffr'},
    {   'answer': 'Joffrey',
        'context': 'laying with one of his wooden toys.\n'
                   "After Eddard discovers the truth of Joffrey's paternity, "
                   'he tells Sansa that they will be heading back to Winterfe'},
    {   'answer': 'Robb',
        'context': 'allow the army to cross the river and to commit his troops '
                   'in return for Robb and Arya Stark marrying two of his '
        

In [15]:
print(prediction['question'])

father of Arya Stark


In [16]:
print(str(prediction['question']+'\n'), end='')

father of Arya Stark


In [17]:
for i, answer in enumerate(prediction['answers'][:5], start=1):
    print(f"Answer {i} : {answer['answer']}\n"
          f"Probability : {answer['probability']}\n"
          f"Reader Score : {answer['score']}\n"
          f"File Name : {answer['meta']['name']}\n"
          f"Context : {answer['context']}\n"
          '------------------------------------------------------------------')

Answer 1 : Lord Eddard Stark
Probability : 0.8016192392167827
Reader Score : 11.171564102172852
File Name : 332_Sansa_Stark.txt
Context : ark daughters.
During the Tourney of the Hand to honour her father Lord Eddard Stark, Sansa Stark is enchanted by the knights performing in the event.
------------------------------------------------------------------
Answer 2 : Eddard "Ned" Stark
Probability : 0.68251427376839
Reader Score : 6.122804641723633
File Name : 450_Baelor.txt
Context : =
After Varys tells him that Sansa Stark's life is also at stake, Eddard "Ned" Stark agrees to make a false confession and swear loyalty to King Joffr
------------------------------------------------------------------
Answer 3 : Joffrey
Probability : 0.5400726989370918
Reader Score : 1.2850825786590576
File Name : 332_Sansa_Stark.txt
Context : laying with one of his wooden toys.
After Eddard discovers the truth of Joffrey's paternity, he tells Sansa that they will be heading back to Winterfe
----------

In [17]:
prediction

{'question': 'Who is the father of Arya Stark?',
 'no_ans_gap': 10.62626028060913,
 'answers': [{'answer': 'Ned',
   'score': 13.965742111206055,
   'probability': 0.8514118724143873,
   'context': "\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before their departure, Arya's half-brother Jon Snow gifts A",
   'offset_start': 46,
   'offset_end': 49,
   'offset_start_in_doc': 46,
   'offset_end_in_doc': 49,
   'document_id': '1f0175a9-7903-434a-b8f5-e9a1296ee61f',
   'meta': {'name': '43_Arya_Stark.txt'}},
  {'answer': 'Lord Eddard Stark',
   'score': 13.761412620544434,
   'probability': 0.8481515792507891,
   'context': 'ark daughters.\nDuring the Tourney of the Hand to honour her father Lord Eddard Stark, Sansa Stark is enchanted by the knights performing in the event.',
   'offset_start': 67,
   'offset_end': 84,
   'offset_start_in_doc': 659,
   'offset_end_in_doc': 676,
   'document_id': '82486106-301e-47af-b74b-ae7fd0f90750',
   'meta

In [24]:
start = time.time()
prediction1 = finder.get_answers(question="Who is the sister of Sansa?", top_k_retriever=top_k_retriever, top_k_reader=top_k_reader)
end = time.time()
print(end - start)

05/11/2021 17:13:06 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.037s]
05/11/2021 17:13:06 - INFO - haystack.finder -   Got 5 candidates from retriever
05/11/2021 17:13:06 - INFO - haystack.finder -   Reader is looking for detailed answer in 13447 chars ...
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.69s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.99 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.14s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.37s/ Batches]

11.525610208511353





In [20]:
print_answers(prediction1, details="minimal")

[   {   'answer': 'Arya',
        'context': "denotes weakness. She doesn't have cool swordplay skills "
                   "like her sister Arya; she isn't a smart seductress like "
                   'Margaery Tyrell or a fierce queen lik'},
    {   'answer': 'Arya',
        'context': 'n contrast to "her universally (and rightly) adored tomboy '
                   'little sister Arya", stating that Sansa "arguably gets a '
                   'disproportionate amount of fan hat'},
    {   'answer': 'Arya Stark',
        'context': ' at the beginning of the series. She was raised with a '
                   'younger sister Arya Stark, two younger brothers Rickon '
                   'Stark and Bran Stark, as well as an olde'},
    {   'answer': 'Margaery',
        'context': "hom she hopes will ask for her hand. His sister, Joffrey's "
                   'new fiancé, Margaery, is also kind to her and takes her to '
                   'dine with her grandmother, Olenn'},
    {   'answe

In [21]:
for i, answer in enumerate(prediction1['answers'][:5], start=1):
    print(f"Answer {i} : {answer['answer']}\n"
          f"Probability : {answer['probability']}\n"
          f"Reader Score : {answer['score']}\n"
          f"File Name : {answer['meta']['name']}\n"
          f"Context : {answer['context']}\n"
          '------------------------------------------------------------------')

Answer 1 : Arya
Probability : 0.8302002114387067
Reader Score : 12.696374893188477
File Name : 332_Sansa_Stark.txt
Context : denotes weakness. She doesn't have cool swordplay skills like her sister Arya; she isn't a smart seductress like Margaery Tyrell or a fierce queen lik
------------------------------------------------------------------
Answer 2 : Arya
Probability : 0.8129394055716899
Reader Score : 11.753791809082031
File Name : 332_Sansa_Stark.txt
Context : n contrast to "her universally (and rightly) adored tomboy little sister Arya", stating that Sansa "arguably gets a disproportionate amount of fan hat
------------------------------------------------------------------
Answer 3 : Arya Stark
Probability : 0.7988619658918419
Reader Score : 11.033574104309082
File Name : 332_Sansa_Stark.txt
Context :  at the beginning of the series. She was raised with a younger sister Arya Stark, two younger brothers Rickon Stark and Bran Stark, as well as an olde
--------------------------

In [22]:
import time

In [32]:
start = time.time()
prediction2 = finder.get_answers(question="Who created the Dothraki vocabulary?",  top_k_retriever=top_k_retriever, top_k_reader=top_k_reader)
end = time.time()
print(end - start)


03/30/2021 21:10:01 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.022s]
03/30/2021 21:10:01 - INFO - haystack.finder -   Got 5 candidates from retriever
03/30/2021 21:10:01 - INFO - haystack.finder -   Reader is looking for detailed answer in 3973 chars ...
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.16 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.45 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.49 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.43 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.37 Batches/s]

4.305827856063843





In [25]:
print_answers(prediction2, details="minimal")

[   {   'answer': 'David J. Peterson',
        'context': "age for ''Game of Thrones''\n"
                   'The Dothraki vocabulary was created by David J. Peterson '
                   'well in advance of the adaptation. HBO hired the Language '
                   'Creatio'},
    {   'answer': 'David J. Peterson',
        'context': '\n'
                   '===Valyrian===\n'
                   'David J. Peterson, who created the Dothraki language for '
                   'the first season of the show, was entrusted by the '
                   'producers to design a new '},
    {   'answer': 'David J. Peterson',
        'context': ' but not developed beyond a few words. For the TV series, '
                   'linguist David J. Peterson created the High Valyrian '
                   'language, as well as the derivative lan'},
    {   'answer': 'Ghiscari',
        'context': 'anguages, descended from High Valyrian with the substrate '
                   'of the local Ghiscari language

In [26]:
for i, answer in enumerate(prediction2['answers'][:5], start=1):
    print(f"Answer {i} : {answer['answer']}\n"
          f"Probability : {answer['probability']}\n"
          f"Reader Score : {answer['score']}\n"
          f"File Name : {answer['meta']['name']}\n"
          f"Context : {answer['context']}\n"
          '------------------------------------------------------------------')

Answer 1 : David J. Peterson
Probability : 0.8093757182427538
Reader Score : 11.567670822143555
File Name : 214_Dothraki_language.txt
Context : age for ''Game of Thrones''
The Dothraki vocabulary was created by David J. Peterson well in advance of the adaptation. HBO hired the Language Creatio
------------------------------------------------------------------
Answer 2 : David J. Peterson
Probability : 0.8056866904115011
Reader Score : 11.377784729003906
File Name : 87_Valar_Dohaeris.txt
Context : 
===Valyrian===
David J. Peterson, who created the Dothraki language for the first season of the show, was entrusted by the producers to design a new 
------------------------------------------------------------------
Answer 3 : David J. Peterson
Probability : 0.4723144998461879
Reader Score : -0.8868430852890015
File Name : 213_Valyrian_languages.txt
Context :  but not developed beyond a few words. For the TV series, linguist David J. Peterson created the High Valyrian language, as well

In [23]:
start = time.time()
prediction4 = finder.get_answers(question="Who is the sister of Arya Stark?", top_k_retriever=top_k_retriever, top_k_reader=top_k_reader)
end = time.time()
print(end - start)

05/11/2021 17:12:45 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.044s]
05/11/2021 17:12:45 - INFO - haystack.finder -   Got 5 candidates from retriever
05/11/2021 17:12:45 - INFO - haystack.finder -   Reader is looking for detailed answer in 5398 chars ...
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.62 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.18s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.73 Batches/s]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.13s/ Batches]
Inferencing Samples: 100%|█████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.06s/ Batches]

7.141717910766602





In [26]:
prediction

{'question': 'father of Arya Stark',
 'no_ans_gap': 4.670291900634766,
 'answers': [{'answer': 'Lord Eddard Stark',
   'score': 11.171564102172852,
   'probability': 0.8016192392167827,
   'context': 'ark daughters.\nDuring the Tourney of the Hand to honour her father Lord Eddard Stark, Sansa Stark is enchanted by the knights performing in the event.',
   'offset_start': 67,
   'offset_end': 84,
   'offset_start_in_doc': 659,
   'offset_end_in_doc': 676,
   'document_id': '62455470-513d-4c52-ac9d-0f4e79bf0ac8',
   'meta': {'name': '332_Sansa_Stark.txt'}},
  {'answer': 'Eddard "Ned" Stark',
   'score': 6.122804641723633,
   'probability': 0.68251427376839,
   'context': '=\nAfter Varys tells him that Sansa Stark\'s life is also at stake, Eddard "Ned" Stark agrees to make a false confession and swear loyalty to King Joffr',
   'offset_start': 66,
   'offset_end': 84,
   'offset_start_in_doc': 89,
   'offset_end_in_doc': 107,
   'document_id': 'e30945ef-cbc1-454b-b775-74e6b0d9246f',
   'm

In [27]:
prediction1

{'question': 'Who is the sister of Sansa?',
 'no_ans_gap': 7.843256950378418,
 'answers': [{'answer': 'Arya',
   'score': 12.696374893188477,
   'probability': 0.8302002114387067,
   'context': "denotes weakness. She doesn't have cool swordplay skills like her sister Arya; she isn't a smart seductress like Margaery Tyrell or a fierce queen lik",
   'offset_start': 73,
   'offset_end': 77,
   'offset_start_in_doc': 2601,
   'offset_end_in_doc': 2605,
   'document_id': '77c6e201-e2eb-419e-853d-57d7d7b87c83',
   'meta': {'name': '332_Sansa_Stark.txt'}},
  {'answer': 'Arya',
   'score': 11.753791809082031,
   'probability': 0.8129394055716899,
   'context': 'n contrast to "her universally (and rightly) adored tomboy little sister Arya", stating that Sansa "arguably gets a disproportionate amount of fan hat',
   'offset_start': 73,
   'offset_end': 77,
   'offset_start_in_doc': 1402,
   'offset_end_in_doc': 1406,
   'document_id': '77c6e201-e2eb-419e-853d-57d7d7b87c83',
   'meta': {'name': 

In [28]:
prediction4

{'question': 'Who is the sister of Arya Stark?',
 'no_ans_gap': 9.28957462310791,
 'answers': [{'answer': 'Sansa',
   'score': 11.76846694946289,
   'probability': 0.8132181995598411,
   'context': 'k series.  She has five siblings: an older brother Robb, an older sister Sansa, two younger brothers Bran and Rickon, and an older illegitimate half-b',
   'offset_start': 73,
   'offset_end': 78,
   'offset_start_in_doc': 215,
   'offset_end_in_doc': 220,
   'document_id': '5ad029e3-e220-40ab-9238-bc2ab161f630',
   'meta': {'name': '43_Arya_Stark.txt'}},
  {'answer': 'Sansa',
   'score': 10.562017440795898,
   'probability': 0.7892236596594573,
   'context': "\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before their departure, Arya's half-brother Jon Snow gifts A",
   'offset_start': 65,
   'offset_end': 70,
   'offset_start_in_doc': 65,
   'offset_end_in_doc': 70,
   'document_id': '169962e4-0dba-46c7-977d-9224095fa7d3',
   'meta': {'name': '
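
The `score` and `probability` values printed in this notebook are consistent with a logistic mapping of the form `probability = 1 / (1 + exp(-score / 8))`. The sketch below checks this against three pairs copied from the outputs above. The divisor 8 is inferred from these printed numbers, not taken from the Haystack documentation, so treat it as an assumption; `score_to_probability` is a hypothetical helper name.

```python
import math


def score_to_probability(score, scale=8.0):
    """Logistic mapping from a raw answer score to a value between 0 and 1.

    The scale of 8 is an assumption inferred from the outputs in this notebook.
    """
    return 1.0 / (1.0 + math.exp(-score / scale))


# (score, probability) pairs copied from the answers printed in this notebook.
observed = [
    (11.171564102172852, 0.8016192392167827),
    (6.122804641723633, 0.68251427376839),
    (-0.8868430852890015, 0.4723144998461879),
]
for score, probability in observed:
    print(score_to_probability(score), probability)
```

Under this mapping a score of 0 corresponds to a probability of 0.5, which helps when reading answers with negative scores.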