# Retriever, ranker, question answering

This notebook builds a simple pipeline for extractive question answering. The pipeline chains a retriever, a ranker, and a question-answering model. The retriever acts as a fast first filter. The ranker then filters the retrieved documents again, based on the semantic similarity between the question and each document. Finally, the question-answering model extracts the answer from the documents that remain.

Extractive question answering is slow, even after the documents have been filtered, so running this kind of pipeline on a GPU is advantageous.

In [1]:
from cherche import data, rank, retrieve, qa
from sentence_transformers import SentenceTransformer
from transformers import pipeline

We can use the `towns` corpus for this example: we can ask questions about the cities of Bordeaux, Toulouse, Paris and Lyon.

In [2]:
documents = data.load_towns()
documents[:4]

[{'id': 0,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris (French pronunciation: \u200b[paʁi] (listen)) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles).'},
 {'id': 1,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': "Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts."},
 {'id': 2,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'},
 {'id': 3,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The Paris Region had 

We start by creating a retriever, whose job is to filter the documents quickly. The `on` parameter tells it to match against both the title and the content of each article.

In [3]:
retriever = retrieve.TfIdf(
    key="id", on=["title", "article"], documents=documents, k=100
)
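
To make the retrieval step more concrete, here is a minimal, standard-library sketch of TF-IDF scoring. It is not Cherche's implementation: the whitespace tokenizer, the smoothed IDF formula, and the names `tokenize`, `tfidf_vectors`, `retrieve_top_k` and `toy_texts` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real tokenizers are richer.
    return text.lower().split()


def tfidf_vectors(texts):
    """Return (vectors, idf): one L2-normalised {token: weight} dict per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    # Document frequency: in how many texts does each token appear?
    df = Counter(token for tokens in tokenized for token in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {tok: c * idf[tok] for tok, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()})
    return vectors, idf


def retrieve_top_k(query, texts, k):
    """Return up to k (index, score) pairs, best first, dropping zero scores."""
    vectors, idf = tfidf_vectors(texts)
    # Query tokens unseen in the corpus carry no information and are ignored.
    q = {tok: c * idf[tok] for tok, c in Counter(tokenize(query)).items() if tok in idf}
    q_norm = math.sqrt(sum(w * w for w in q.values())) or 1.0
    scored = []
    for index, vec in enumerate(vectors):
        score = sum(w * vec.get(tok, 0.0) for tok, w in q.items()) / q_norm
        if score > 0:
            scored.append((index, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical three-sentence corpus, not the `towns` dataset.
toy_texts = [
    "Paris is the capital of France",
    "Toulouse is known as the pink city",
    "Bordeaux is a world capital of wine",
]
hits = retrieve_top_k("wine capital", toy_texts, k=2)
print(hits)
```

The sentence about Bordeaux matches both query terms and ranks first. The sentence about Paris matches only "capital", and the Toulouse sentence is dropped because it shares no term with the query.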

We then add a ranker to the pipeline to filter the results according to the semantic similarity between the query and the retriever's output documents. The ranker encodes the fields listed in `on`, here the title and the content of each article.

In [4]:
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
    k=30,
)
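
The re-ranking idea can be sketched without any model: given a query embedding and candidate embeddings, sort the candidates by similarity and keep the top `k`. The sketch below assumes cosine similarity and uses hand-written 3-dimensional vectors. The names `cosine`, `rerank`, `toy_query` and `toy_candidates` are hypothetical, and real embeddings would come from the sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_embedding, candidates, k):
    """candidates: list of (doc_id, embedding). Return the k most similar, best first."""
    scored = [
        {"id": doc_id, "similarity": cosine(query_embedding, embedding)}
        for doc_id, embedding in candidates
    ]
    scored.sort(key=lambda doc: doc["similarity"], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a real encoder returns hundreds of dimensions.
toy_query = [1.0, 0.0, 1.0]
toy_candidates = [
    (20, [0.9, 0.1, 0.8]),
    (21, [0.1, 1.0, 0.2]),
    (40, [0.5, 0.5, 0.5]),
]
top = rerank(toy_query, toy_candidates, k=2)
print([doc["id"] for doc in top])
```

Because the ranker only scores what the retriever passes on, its `k` (30 here) is smaller than the retriever's `k` (100), so each stage narrows the candidate set.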

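The last component is the question-answering model, a Hugging Face `question-answering` pipeline that reads the `article` field of each document.

In [5]: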
question_answering = qa.QA(
    model=pipeline(
        "question-answering",
        model="deepset/roberta-base-squad2",
        tokenizer="deepset/roberta-base-squad2",
    ),
    on="article",
)

We initialise the pipeline, then ask the retriever to index the documents and the ranker to pre-compute the document embeddings. This step can take some time with a large corpus; a GPU speeds it up. The question-answering model also needs the document fields, not only the ids, so we add `documents` to the pipeline to map ids back to documents.

In [6]:
search = retriever + ranker + documents + question_answering
search.add(documents)

Encoder ranker: 100%|████████| 2/2 [00:02<00:00,  1.34s/it]


TfIdf retriever
	key      : id
	on       : title, article
	documents: 105
Encoder ranker
	key       : id
	on        : title, article
	normalize : True
	embeddings: 105
Mapping to documents
Question Answering
	on: article

Paris Saint-Germain is the biggest football club in Paris. The question-answering pipeline returns two scores: `similarity`, which comes from the ranker, and `score`, which comes from the question-answering model. The higher the `score`, the more likely the answer. Answers are sorted from most to least likely.

In [7]:
search("What is the name of the football club of Paris?")

Question answering: 100%|████| 1/1 [00:01<00:00,  1.70s/it]


[{'id': 20,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.',
  'similarity': 0.7104821,
  'score': 0.9848365783691406,
  'start': 18,
  'end': 37,
  'answer': 'Paris Saint-Germain',
  'question': 'What is the name of the football club of Paris?'},
 {'id': 21,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.',
  'similarity': 0.46161494,
  'score': 0.8121969103813171,
  'start': 16,
  'end': 31,
  'answer': 'Stade de France',
  'question': 'What is the name of the football club of Paris?'},
 {'id': 40,
  'title': 'Toulouse',
  'url': 'https://en.wikipedia.org/wiki/Toulouse',
  'article': 'The city\'s unique architecture made of pinkish terracotta bricks has earned Toulouse the nickna
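
The fields in each result can be used directly. The sketch below copies the first two results from the output above and checks that `start` and `end` are character offsets into `article`, then keeps only the answers above a score threshold. The helper `confident_answers` and the `0.9` threshold are assumptions chosen for illustration, not part of the library.

```python
# First two results, copied from the output above (unused fields omitted).
results = [
    {
        "id": 20,
        "article": "The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.",
        "similarity": 0.7104821,
        "score": 0.9848365783691406,
        "start": 18,
        "end": 37,
        "answer": "Paris Saint-Germain",
    },
    {
        "id": 21,
        "article": "The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.",
        "similarity": 0.46161494,
        "score": 0.8121969103813171,
        "start": 16,
        "end": 31,
        "answer": "Stade de France",
    },
]

# `start` and `end` delimit the answer span inside `article`.
spans = [r["article"][r["start"]:r["end"]] for r in results]
print(spans)


def confident_answers(results, min_score):
    """Keep the answers whose question-answering score reaches min_score."""
    return [r["answer"] for r in results if r["score"] >= min_score]


# Hypothetical threshold; tune it on your own questions.
confident = confident_answers(results, min_score=0.9)
print(confident)
```

A threshold like this is one simple way to discard plausible-looking but wrong spans, such as "Stade de France" for a question about a football club.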

Toulouse in France is known as "The Pink City".

In [8]:
search("What is the color of Toulouse?")

Question answering: 100%|████| 1/1 [00:01<00:00,  1.57s/it]


[{'id': 40,
  'title': 'Toulouse',
  'url': 'https://en.wikipedia.org/wiki/Toulouse',
  'article': 'The city\'s unique architecture made of pinkish terracotta bricks has earned Toulouse the nickname La Ville Rose ("The Pink City").',
  'similarity': 0.640329,
  'score': 0.5257010459899902,
  'start': 39,
  'end': 46,
  'answer': 'pinkish',
  'question': 'What is the color of Toulouse?'},
 {'id': 64,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': 'Crossed by the Garonne River and bordering the Atlantic Coast, the metropolis, a perfect example of the Age of Enlightment, has been showcasing since the 18th century its blond and golden facades, its courtyards and monumental squares, as well as its lively streets accompanied by its French-style gardens.',
  'similarity': 0.41179663,
  'score': 0.36229389905929565,
  'start': 171,
  'end': 176,
  'answer': 'blond',
  'question': 'What is the color of Toulouse?'},
 {'id': 35,
  'title': 'Toulouse',
  'ur

Bordeaux is known worldwide for its wine.

In [9]:
search("What is the speciality of Bordeaux ?")

Question answering: 100%|████| 1/1 [00:01<00:00,  1.61s/it]


[{'id': 65,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': "Bordeaux is a world capital of wine, with its castles and vineyards of the Bordeaux region that stand on the hillsides of the Gironde and is home to the world's main wine fair, Vinexpo.",
  'similarity': 0.64157647,
  'score': 0.7739061713218689,
  'start': 31,
  'end': 35,
  'answer': 'wine',
  'question': 'What is the speciality of Bordeaux ?'},
 {'id': 74,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': 'Bordeaux is also ranked as a Sufficiency city by the Globalization and World Cities Research Network.',
  'similarity': 0.5668111,
  'score': 0.7677939534187317,
  'start': 29,
  'end': 40,
  'answer': 'Sufficiency',
  'question': 'What is the speciality of Bordeaux ?'},
 {'id': 68,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': 'The link with aviation dates back to 1910, the year the first airplane flew ov

Every year there is a silk festival in Lyon.

In [10]:
search("What is the speciality of Lyon ?")

Question answering: 100%|████| 1/1 [00:01<00:00,  1.58s/it]


[{'id': 52,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.',
  'similarity': 0.69942987,
  'score': 0.6367450952529907,
  'start': 41,
  'end': 48,
  'answer': 'banking',
  'question': 'What is the speciality of Lyon ?'},
 {'id': 53,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'The city contains a significant software industry with a particular focus on video games; in recent years it has fostered a growing local start-up sector.',
  'similarity': 0.5620864,
  'score': 0.5953128933906555,
  'start': 77,
  'end': 88,
  'answer': 'video games',
  'question': 'What is the speciality of Lyon ?'},
 {'id': 48,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "The city is recognised for its cuisine and gastronomy, as well as historical and architectural landmarks; as such, the dis