# Generative QA with "Retrieval-Augmented Generation"


While extractive QA highlights the span of text that answers a query,
generative QA can return a novel text answer that it has composed.
In this tutorial, you will learn how to set up a generative system using the
[RAG model](https://arxiv.org/abs/2005.11401), which conditions the
answer generator on a set of retrieved documents.


## Preparing the Colab Environment

- [Enable GPU Runtime](https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration#enabling-the-gpu-in-colab)


## Installing Haystack

To start, let's install the latest release of Haystack with `pip`:

In [None]:
%%bash

pip install --upgrade pip
pip install farm-haystack[colab,faiss]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pip
  Downloading pip-23.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 22.9 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-23.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting farm-haystack[colab,faiss]
  Downloading farm_haystack-1.15.1-py3-none-any.whl (681 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 681.0/681.0 kB 10.9 MB/s eta 0:00:00
Collecting azure-ai-formrecognizer>=3.2.0b2 (from farm-haystack[colab,faiss])
  Downloading azure_ai_formrecognizer-3.3.0b1-py3-none-any.whl (299 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 299.9/299.9 kB 36.0 MB/s eta 0:00:00
Collecting boilerpy3 (from farm-haysta

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.12.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
tensorflow-metadata 1.13.1 requires protobuf<5,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
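
Since the install log above is long and ends with a dependency warning, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; `installed_version` is a helper name introduced here for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "farm-haystack" is the distribution name used in the pip command above.
print("farm-haystack:", installed_version("farm-haystack"))
print("protobuf:", installed_version("protobuf"))
```

If `protobuf` reports a version older than the one TensorFlow asks for in the warning, that matches the conflict pip printed.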


### Enabling Telemetry
Knowing you're using this tutorial helps us decide where to invest our efforts to build a better product, but you can always opt out by commenting out the following line. See [Telemetry](https://docs.haystack.deepset.ai/docs/telemetry) for more details.

In [None]:
from haystack.telemetry import tutorial_running

tutorial_running(7)

## Logging

We configure how log messages are displayed and which log level is used before importing Haystack.
Example log message:
`INFO - haystack.utils.preprocessing -  Converting data/tutorial1/218_Olenna_Tyrell.txt`
The default log level in `basicConfig` is WARNING, so the explicit `level` parameter is not strictly necessary, but keeping it makes the level easy to change:

In [None]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)
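
To see what this format string produces without importing Haystack, here is a small standalone sketch. It sends records to an in-memory buffer; the logger name `demo.preprocessing` and the message text are made up for the illustration.

```python
import io
import logging

# Route records to an in-memory buffer so the formatted line can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s - %(name)s -  %(message)s"))

# "demo.preprocessing" is a made-up logger name, used only for this illustration.
demo_logger = logging.getLogger("demo.preprocessing")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo out of the root logger's output

demo_logger.debug("hidden: below the INFO threshold")
demo_logger.info("Converting example.txt")

formatted = buffer.getvalue()
print(formatted, end="")
```

Only the INFO record reaches the buffer, which mirrors how setting the `haystack` logger to INFO hides its DEBUG messages.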

Let's download a CSV file containing some sample text and preprocess the data.


In [None]:
import pandas as pd

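# Load the dataset and keep only the rows flagged as natural-language text
DATASET_URL = 'https://gitlab.com/tangibleai/nlpia2/-/raw/main/src/nlpia2/data/nlpia_lines.csv'
df = pd.read_csv(DATASET_URL)
df = df[df['is_text']].copy()  # copy so new columns are added to an independent frame

# Create the "title" and "text" columns (both taken from the line itself)
df['title'] = df['line_text']
df['text'] = df['line_text']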

print(df.head())

    Unnamed: 0  \
14          14   
16          16   
17          17   
18          18   
19          19   

                                                         line_text  \
14                                             This chapter covers   
16                                   * The power of human language   
17     * How natural language processing (NLP) is changing society   
18          * The kinds of NLP tasks that machines can now do well   
19  * Why unleashing the NLP genie is profitable ... and dangerous   

    line_number  \
14           14   
16           16   
17           17   
18           18   
19           19   

                                                         filename  is_text  \
14  Chapter-01_Machines-that-can-read-and-write-NLP-overview.adoc     True   
16  Chapter-01_Machines-that-can-read-and-write-NLP-overview.adoc     True   
17  Chapter-01_Machines-that-can-read-and-write-NLP-overview.adoc     True   
18  Chapter-01_Machines-that-can-read-and-w
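
The filtering step can be tried offline on a tiny table. The sketch below uses an in-memory CSV with invented rows in place of the real file, and shows the boolean-mask filter followed by `.copy()` before new columns are added.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the real file; the rows are invented.
csv_text = """line_text,is_text
This chapter covers,True
----,False
* The power of human language,True
"""

toy = pd.read_csv(io.StringIO(csv_text))

# Boolean-mask filtering keeps only the rows flagged as natural-language text.
# .copy() makes the result an independent frame, so adding columns is unambiguous.
text_rows = toy[toy["is_text"]].copy()
text_rows["title"] = text_rows["line_text"]
text_rows["text"] = text_rows["line_text"]

print(text_rows[["title", "text"]])
```

Note that the filtered frame keeps the original row labels, which is why the real output above starts at index 14 rather than 0.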

We can cast our data into Haystack `Document` objects.
Alternatively, we can just use dictionaries with "content" and "meta" fields.

In [None]:
from haystack import Document


# Use data to initialize Document objects
titles = list(df["line_text"].values)
texts = list(df["line_text"].values)
documents = []
for title, text in zip(titles, texts):
    documents.append(Document(content=text, meta={"name": title or ""}))
documents[0]

<Document: {'content': 'This chapter covers', 'content_type': 'text', 'score': None, 'meta': {'name': 'This chapter covers'}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '77f5f4db2fc7e2ea9ccaa3ce7c9570dd'}>
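
For the dictionary alternative, a sketch might look like the following. It assumes the Haystack 1.x convention where the text lives under a `content` key (matching the `Document(content=...)` argument used above); check this against your installed version. The sample lines are invented stand-ins for `df["line_text"]`.

```python
# Invented sample rows; in the tutorial these would come from df["line_text"].
lines = ["This chapter covers", "* The power of human language"]

# Dictionary form of a document: the text under "content", extras under "meta".
dict_documents = [{"content": line, "meta": {"name": line or ""}} for line in lines]

print(dict_documents[0])
```

A list like `dict_documents` would then be passed to the document store in place of the `documents` list.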

Here we initialize the `FAISSDocumentStore`, `DensePassageRetriever` and `RAGenerator`.
FAISS is chosen here because it is a vector store optimized for similarity search.

In [None]:
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import RAGenerator, DensePassageRetriever


# Initialize FAISS document store.
# Set `return_embedding` to `True`, so generator doesn't have to perform re-embedding
document_store = FAISSDocumentStore(faiss_index_factory_str="HNSW", return_embedding=True)

# Initialize DPR Retriever to encode documents, encode question and query documents
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    # use_gpu=True,
    embed_title=True,
)

# Initialize RAG Generator
generator = RAGenerator(
    model_name_or_path="facebook/rag-token-nq",
    # use_gpu=True,
    top_k=1,
    max_length=15,
    min_length=5,
    embed_title=True,
    num_beams=3,
)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
  return self.fget.__get__(instance, owner)()
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizer'.
The tokenizer class you load f
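
To make the retriever's job concrete, here is a toy sketch of dense retrieval by inner product. It is not Haystack's or FAISS's implementation: the 3-dimensional vectors are made up (real DPR embeddings have 768 dimensions), and the search is exhaustive, whereas an HNSW index trades a little accuracy for speed by searching approximately.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec: List[float], passage_vecs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k highest inner products."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings: one vector per passage, plus one for the query.
passages = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]

print(top_k(query, passages, k=2))
```

The passages whose vectors point in roughly the same direction as the query vector are ranked first, which is the property the question and passage encoders are trained to produce.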

We write documents to the DocumentStore by first deleting any remaining documents and then calling `write_documents()`.
The `update_embeddings()` method uses the retriever to create an embedding for each document.


In [None]:
# Delete existing documents in the document store
document_store.delete_documents()

# Write documents to the document store
document_store.write_documents(documents)

# Add document embeddings to the index
document_store.update_embeddings(retriever=retriever)

Writing Documents:   0%|          | 0/6258 [00:00<?, ?it/s]

INFO:haystack.document_stores.faiss:Updating embeddings for 6198 docs...


Updating Embedding:   0%|          | 0/6198 [00:00<?, ? docs/s]

Create embeddings:   0%|          | 0/6208 [00:00<?, ? Docs/s]
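
The progress bars above report 6258 documents written but 6198 embedded. One plausible explanation (an assumption, not confirmed by this output) is that document ids are derived from the content, as the `'id_hash_keys': ['content']` field in the `Document` printed earlier suggests, so identical lines collapse into a single entry. The sketch below illustrates that idea with `hashlib.md5` as a stand-in hash; the hash function and the helper names are hypothetical.

```python
import hashlib
from typing import Dict, List


def content_id(content: str) -> str:
    # Stand-in hash: md5 is an assumption here, not necessarily what Haystack uses.
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def deduplicate(contents: List[str]) -> Dict[str, str]:
    """Map content-derived id -> content, keeping the first copy of each text."""
    unique: Dict[str, str] = {}
    for text in contents:
        unique.setdefault(content_id(text), text)
    return unique


# Invented lines with repeats, similar to headings that recur across chapters.
lines = ["This chapter covers", "Summary", "This chapter covers", "Summary"]
unique_docs = deduplicate(lines)

print(len(lines), "lines ->", len(unique_docs), "unique documents")
```

Running a similar check on `df["line_text"]` (for example with `df["line_text"].nunique()`) would show whether repeated lines account for the difference.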

Here are our questions:

In [71]:
QUESTIONS = [
  "how Word2vec compares to LSA", 
  "what is an intent", 
  "what is an embedding", 
  "who discovered LDiA", 
  "what are the most advanced NLP models", 
  "what are RNNs good for"
]

Now let's run our system!
The retriever will pick out a small subset of documents that it finds relevant.
These are used to condition the generator as it generates the answer.
What it should return, then, are novel text spans that form an answer to your question!

In [59]:
# Run the retriever and generator together using the Pipeline class
from haystack.pipelines import GenerativeQAPipeline
from haystack.utils import print_answers

pipe = GenerativeQAPipeline(generator=generator, retriever=retriever)
for question in QUESTIONS:
    res = pipe.run(query=question, params={"Generator": {"top_k": 1}, "Retriever": {"top_k": 5}})
    print(res)
    query = res['query']
    answer = res['answers'][0].answer
    context = ('\n').join(res['answers'][0].meta['content'])
    print(f'Query:{query}')
    print(f'Answer:{answer}')
    print(f'Context:{context}\n\n')

{'query': 'how Word2vec compares to LSA', 'answers': [<Answer {'answer': ' four times more accurate', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_ids': ['307689de4509f7a27a2c6459966f0b59', '8279a94278e5fa84a57d1ec7bbe0eca3', '67a51b6af48fcb0629db65aa1a7321ca', '1aff30118a46f0fe278585812e9a8112', '2bd0ded6ce6fee52c19b98735439152a'], 'meta': {'doc_scores': [0.6825931242164534, 0.6808143416594381, 0.6808041951368268, 0.6794595810068014, 0.6780510771618863], 'content': ['Word2Vec high dimensionality and continuous values for each dimension enable it to capture the full range of meaning for any given word.', 'Both LSA and Word2Vec algorithms allow adding new documents to your corpus and adjusting your existing word vectors to account for the co-occurrences in the new documents.', "Word2Vec embeddings were four times more accurate (45%) compared to equivalent LSA models (11%) at answering analogy questions like thos
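
The retrieve-then-generate flow can be sketched without any models. In the toy below, word overlap stands in for dense similarity, and "conditioning" is shown simply as pairing each retrieved passage with the question; the ` // ` separator, the helper names, and the shortened passages are all assumptions made for illustration, not the RAG model's actual input format.

```python
from typing import List


def overlap_score(question: str, passage: str) -> int:
    """Count shared lowercase words; a crude stand-in for dense similarity."""
    return len(set(question.lower().split()) & set(passage.lower().split()))


def retrieve(question: str, passages: List[str], top_k: int) -> List[str]:
    """Return the top_k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]


def build_generator_inputs(question: str, contexts: List[str]) -> List[str]:
    """Pair each retrieved passage with the question, one input per passage."""
    return [f"{context} // {question}" for context in contexts]


passages = [
    "Word embeddings are vectors that represent the meaning of words.",
    "RNNs are great for tagging and writing text.",
    "Word2Vec embeddings were more accurate than LSA models at analogy questions.",
]
question = "how Word2vec compares to LSA"

contexts = retrieve(question, passages, top_k=1)
inputs = build_generator_inputs(question, contexts)
print(inputs[0])
```

A real generator would read inputs like these and compose the answer text; here the point is only that the generator never sees the whole corpus, just the few passages the retriever selected.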

In [None]:
from haystack.nodes import EmbeddingRetriever

# Initialize a sentence-transformers based retriever on the same document store
extractive_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)

# Re-embed the documents in the index with the new model (this replaces the DPR embeddings)
document_store.update_embeddings(retriever=extractive_retriever)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


Downloading (…)ce_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1


Downloading (…)16ebc/.gitattributes:   0%|          | 0.00/737 [00:00<?, ?B/s]

Downloading (…)_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Downloading (…)b6b5d16ebc/README.md:   0%|          | 0.00/8.65k [00:00<?, ?B/s]

Downloading (…)b5d16ebc/config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

Downloading (…)ce_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

Downloading (…)ebc/data_config.json:   0%|          | 0.00/25.5k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/438M [00:00<?, ?B/s]

Downloading (…)nce_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

Downloading (…)16ebc/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading (…)6ebc/train_script.py:   0%|          | 0.00/13.9k [00:00<?, ?B/s]

Downloading (…)b6b5d16ebc/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)5d16ebc/modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

  return self.fget.__get__(instance, owner)()
INFO:haystack.document_stores.faiss:Updating embeddings for 6198 docs...


Updating Embedding:   0%|          | 0/6198 [00:00<?, ? docs/s]

Batches:   0%|          | 0/194 [00:00<?, ?it/s]
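
The `Batches: ... 0/194` progress bar is consistent with the 6198 documents being encoded in batches of 32. The batch size is an assumption here, since it is not printed in the log; the arithmetic is:

```python
import math

num_docs = 6198   # from the "Updating embeddings for 6198 docs" log line
batch_size = 32   # assumed batch size; not shown in the log itself

# The last batch is only partially filled, so round up.
num_batches = math.ceil(num_docs / batch_size)
print(num_batches)  # 194
```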

In [None]:
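# Note: this pipeline still queries with the DPR `retriever`, while the index now holds
# embeddings from `extractive_retriever`; that mismatch may explain the degraded answer below.
pipe = GenerativeQAPipeline(generator=generator, retriever=retriever)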
for question in QUESTIONS:
    res = pipe.run(query=question, params={"Generator": {"top_k": 1}, "Retriever": {"top_k": 10}})
    print_answers(res, details='medium')

'Query: how Word2vec compares to LSA'
'Answers:'
[   <Answer {'answer': ', the, the', 'type': 'generative', 'score': None, 'context': None, 'offsets_in_document': None, 'offsets_in_context': None, 'document_ids': ['bbea66846de914267528a41997194821', 'aa141332722516d0ca3e823a914e1046', '4c4f9065b7a82b9acda387b44aa41c7a', '267d25166d07b866def2f04e940fb06d', '4cee7333469b80f6d9b20de3d8be8e', 'cc4dc4956cb0be1d1352503fc4ed03c5', 'fa437d19070a4527c2277010dad96ff1', 'c06189a9a67a329ae451645f877fa632', '95002469a98945d02a02e3dbbfce5f75', '5c3f2516de24a14641f60b1cab73a094'], 'meta': {'doc_scores': [0.5167562917607736, 0.515840420410238, 0.514013829627047, 0.5127316985439093, 0.5125569815839454, 0.5124796096405463, 0.5121842683957405, 0.5121568450591585, 0.5114822009568091, 0.5113401500936152], 'content': ['No matter how long your text is, a CNN or feedforward neural network will have to do the exact same number of multiplications to compute the output prediction.', 'That way convolution can wor

## Extractive QA with Haystack

In [67]:
from haystack.nodes import EmbeddingRetriever, FARMReader

# Start from a clean document store
document_store.delete_documents()

# Write documents to the document store
document_store.write_documents(documents)

extractive_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    use_gpu=True,
)

# Embed the documents with the same model that will embed the queries
document_store.update_embeddings(retriever=extractive_retriever)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

Writing Documents:   0%|          | 0/6258 [00:00<?, ?it/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1
  return self.fget.__get__(instance, owner)()
INFO:haystack.document_stores.faiss:Updating embeddings for 6198 docs...


Updating Embedding:   0%|          | 0/6198 [00:00<?, ? docs/s]

Batches:   0%|          | 0/194 [00:00<?, ?it/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


In [74]:
from haystack.nodes import TransformersReader
reader = TransformersReader(model_name_or_path="deepset/roberta-base-squad2")


INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


In [75]:
from haystack.pipelines import ExtractiveQAPipeline
p = ExtractiveQAPipeline(reader, extractive_retriever)
for question in QUESTIONS:
    res = p.run(query=question, params={"Reader": {"top_k": 1}, "Retriever": {"top_k": 10}})
    print_answers(res, details='minimum')

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: how Word2vec compares to LSA'
'Answers:'
[   {   'answer': 'four times more accurate',
        'context': 'Word2Vec embeddings were four times more accurate (45%) '
                   'compared to equivalent LSA models (11%) at answering '
                   'analogy qu'}]


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: what is an intent'
'Answers:'
[   {   'answer': 'to carry out some task or command',
        'context': 'Another intent might be to carry out some task or command, '
                   'such as a "translate" command or the query "How do I say '
                   "'Hello' in "}]


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: what is an embedding'
'Answers:'
[   {   'answer': 'vectors that represent the meaning (semantics) of words',
        'context': 'Word embeddings are vectors that represent the meaning '
                   '(semantics) of words.'}]


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: who discovered LDiA'
'Answers:'
[   {   'answer': 'geneticists in the UK',
        'context': 'The LDiA approach was developed in 2000 by geneticists in '
                   'the UK to help them "infer population structure" from '
                   'sequences of genes.foo'}]


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: what are the most advanced NLP models'
'Answers:'
[   {   'answer': 'CNNs',
        'context': 'And for NLP, CNNs are still the most efficient models for '
                   'many advanced natural languag'}]


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

'Query: what are RNNs good for'
'Answers:'
[   {   'answer': 'tagging and writing text',
        'context': 'So RNNs are great not only for reading text, but also for '
                   'tagging and writing text.'}]
