# Load and Prepare Dataset

Our source data will be taken from the Wiki Snippets dataset, which contains over 17 million passages from Wikipedia. But, since indexing the entire dataset may take some time, we will only utilize 50,000 passages in this demo that include "History" in the "section title" column. If you want, you may utilize the complete dataset. Pinecone vector database can effortlessly manage millions of documents for you.

In [2]:
from datasets import load_dataset

# load the dataset from huggingface in streaming mode and shuffle it
wiki_data = load_dataset(
    'vblagoje/wikipedia_snippets_streamed',
    split='train',
    streaming=True
).shuffle(seed=960)

We are loading the dataset in the streaming mode so that we don't have to wait for the whole dataset to download (which is over 9GB). Instead, we iteratively download records one at a time.

In [3]:
# show the contents of a single document in the dataset
next(iter(wiki_data))

{'wiki_id': 'Q7649565',
 'start_paragraph': 20,
 'start_character': 272,
 'end_paragraph': 24,
 'end_character': 380,
 'article_title': 'Sustainable Agriculture Research and Education',
 'section_title': "2000s & Evaluation of the program's effectiveness",
 'passage_text': "preserving the surrounding prairies. It ran until March 31, 2001.\nIn 2008, SARE celebrated its 20th anniversary. To that date, the program had funded 3,700 projects and was operating with an annual budget of approximately $19 million. Evaluation of the program's effectiveness As of 2008, 64% of farmers who had received SARE grants stated that they had been able to earn increased profits as a result of the funding they received and utilization of sustainable agriculture methods. Additionally, 79% of grantees said that they had experienced a significant improvement in soil quality though the environmentally friendly, sustainable methods that they were"}

In [4]:
history = wiki_data.filter(lambda x: x['section_title'] == 'History')


In [5]:
first_history = next(iter(history))


In [6]:
first_history

{'wiki_id': 'Q2644349',
 'start_paragraph': 10,
 'start_character': 397,
 'end_paragraph': 10,
 'end_character': 534,
 'article_title': 'Taupo District',
 'section_title': 'History',
 'passage_text': 'was not until the 1950s that the region started to develop, with forestry and the construction of the Wairakei geothermal power station.'}

Let's iterate through the dataset and apply our filter to select the 50,000 historical passages. We will extract `article_title`, `section_title` and `passage_text` from each document.

In [7]:
from tqdm.auto import tqdm  # progress bar

total_doc_count = 10
counter = 0
docs = []

# iterate through the dataset and apply our filter
for d in tqdm(history, total=total_doc_count):
    # extract the fields we need - article, section, and passage
    doc = {
        'article': d['article_title'],
        'section': d['section_title'],
        'passage': d['passage_text']
    }
    docs.append(doc)
    
    # increase the counter on every iteration
    counter += 1
    
    # Stop after collecting total_doc_count documents
    if counter >= total_doc_count:
        break

  0%|          | 0/10 [00:00<?, ?it/s]

In [8]:
import pandas as pd
df = pd.DataFrame(docs)


In [9]:
df

Unnamed: 0,article,section,passage
0,Taupo District,History,was not until the 1950s that the region starte...
1,The Bishop Wand Church of England School,History,The Bishop Wand Church of England School Histo...
2,Surface Hill Uniting Church,History,in perpetual reminder that work and worship go...
3,The Electras (band),History,"as its B-side. However, copies of the single, ..."
4,Swanton House,History,it. Lane provided funds for restoration by the...
5,Takashinohama Line,History,Takashinohama Line The Takashinohama Line (高師浜...
6,Tamil Methodist Church,History,Tamil Methodist Church History The church was ...
7,Star Music,History,in order to strengthen its production base and...
8,Terai,History,timber reserves.\nIndian immigration increased...
9,Te Atatū (New Zealand electorate),History,first openly gay member of Parliament.\nWith t...


# Initialize Pinecone Index

The Pinecone index stores vector representations of our historical passages which we can retrieve later using another vector (query vector). To build our vector index, we must first establish a connection with Pinecone. For this, we need an API from Pinecone. You can get one for free from [here](https://app.pinecone.io/), and after that, we initialize the connection as follows:

In [14]:
pip install pinecone-client


Collecting pinecone-client
  Downloading pinecone_client-5.0.1-py3-none-any.whl.metadata (19 kB)
Collecting pinecone-plugin-inference<2.0.0,>=1.0.3 (from pinecone-client)
  Downloading pinecone_plugin_inference-1.1.0-py3-none-any.whl.metadata (2.2 kB)
Collecting pinecone-plugin-interface<0.0.8,>=0.0.7 (from pinecone-client)
  Downloading pinecone_plugin_interface-0.0.7-py3-none-any.whl.metadata (1.2 kB)
Downloading pinecone_client-5.0.1-py3-none-any.whl (244 kB)
Downloading pinecone_plugin_inference-1.1.0-py3-none-any.whl (85 kB)
Downloading pinecone_plugin_interface-0.0.7-py3-none-any.whl (6.2 kB)
Installing collected packages: pinecone-plugin-interface, pinecone-plugin-inference, pinecone-client
Successfully installed pinecone-client-5.0.1 pinecone-plugin-inference-1.1.0 pinecone-plugin-interface-0.0.7
Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [48]:
from dotenv import load_dotenv
load_dotenv("C:/Users/adete/OneDrive/Desktop/ironhack bootcamp files/lab-abstractive-question-answering/.env")


True

In [52]:
from dotenv import load_dotenv

# Load the .env file
load_dotenv(dotenv_path)

# Access variables
api_key = os.getenv('PINECONE_API_KEY')



In [53]:
import os
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'

# configure client
pc = Pinecone(api_key=api_key)

Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [57]:
from pinecone import ServerlessSpec

cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'

spec = ServerlessSpec(cloud=cloud, region=region)

Now we create a new index. We will name it "abstractive-question-answering" — you can name it anything we want. We specify the metric type as "cosine" and dimension as 768 because the retriever we use to generate context embeddings is optimized for cosine similarity and outputs 768-dimension vectors.

In [58]:
index_name = "wiki-ironhack-qa"

In [59]:
api_key = os.environ.get('PINECONE_API_KEY')


In [60]:
import time

# check if index already exists (it shouldn't if this is first time)
if index_name not in pc.list_indexes():
    # Create the index
    pc.create_index(
        name=index_name,
        metric="cosine",
        dimension=768,  # for bert-base models
        spec=spec
    )
    # Wait for index to be ready
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# Connect to the index
index = pc.Index(index_name)

# Verify index is empty
stats = index.describe_index_stats()
print(f"Index statistics: {stats}")

Index statistics: {'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}


# Initialize Retriever

Next, we need to initialize our retriever. The retriever will mainly do two things:

- Generate embeddings for all historical passages (context vectors/embeddings)
- Generate embeddings for our questions (query vector/embedding)

The retriever will create embeddings such that the questions and passages that hold the answers to our queries are close to one another in the vector space. We will use a SentenceTransformer model based on Microsoft's MPNet as our retriever. This model performs quite well for comparing the similarity between queries and documents. We can use Cosine Similarity to compute the similarity between query and context vectors generated by this model (Pinecone automatically does this for us).

In [72]:
!pip install sentence-transformers



Collecting sentence-transformers
  Downloading sentence_transformers-3.3.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.3.1-py3-none-any.whl (268 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.3.1



[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [76]:
!pip install tf-keras



Collecting tf-keras
  Downloading tf_keras-2.18.0-py3-none-any.whl.metadata (1.6 kB)
Collecting tensorflow<2.19,>=2.18 (from tf-keras)
  Using cached tensorflow-2.18.0-cp312-cp312-win_amd64.whl.metadata (3.3 kB)
Collecting tensorflow-intel==2.18.0 (from tensorflow<2.19,>=2.18->tf-keras)
  Using cached tensorflow_intel-2.18.0-cp312-cp312-win_amd64.whl.metadata (4.9 kB)
Collecting tensorboard<2.19,>=2.18 (from tensorflow-intel==2.18.0->tensorflow<2.19,>=2.18->tf-keras)
  Using cached tensorboard-2.18.0-py3-none-any.whl.metadata (1.6 kB)
Downloading tf_keras-2.18.0-py3-none-any.whl (1.7 MB)
   ---------------------------------------- 0.0/1.7 MB ? eta -:--:--
   ------------------------ --------------- 1.0/1.7 MB 6.3 MB/s eta 0:00:01
   ---------------------------------------- 1.7/1.7 MB 5.2 MB/s eta 0:00:00
Using cached tensorflow-2.18.0-cp312-cp312-win_amd64.whl (7.5 kB)
Using cached tensorflow_intel-2.18.0-cp312-cp312-win_amd64.whl (390.3 MB)
Using cached tensorboard-2.18.0-py3-none-any

  You can safely remove it manually.

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [78]:
!pip install keras==2.11


Collecting keras==2.11
  Downloading keras-2.11.0-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading keras-2.11.0-py2.py3-none-any.whl (1.7 MB)
   ---------------------------------------- 0.0/1.7 MB ? eta -:--:--
   ------------------------ --------------- 1.0/1.7 MB 7.1 MB/s eta 0:00:01
   ---------------------------------------- 1.7/1.7 MB 7.6 MB/s eta 0:00:00
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: keras 3.5.0
    Uninstalling keras-3.5.0:
      Successfully uninstalled keras-3.5.0
Successfully installed keras-2.11.0


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-intel 2.18.0 requires keras>=3.5.0, but you have keras 2.11.0 which is incompatible.

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [81]:
!pip install tensorflow==2.11 keras==2.11
!pip install transformers --upgrade



ERROR: Could not find a version that satisfies the requirement tensorflow==2.11 (from versions: 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, 2.17.0rc1, 2.17.0, 2.17.1, 2.18.0rc0, 2.18.0rc1, 2.18.0rc2, 2.18.0)

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: No matching distribution found for tensorflow==2.11


Collecting transformers
  Downloading transformers-4.47.1-py3-none-any.whl.metadata (44 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-0.21.0-cp39-abi3-win_amd64.whl.metadata (6.9 kB)
Downloading transformers-4.47.1-py3-none-any.whl (10.1 MB)
   ---------------------------------------- 0.0/10.1 MB ? eta -:--:--
   ------ --------------------------------- 1.6/10.1 MB 7.6 MB/s eta 0:00:02
   ------------- -------------------------- 3.4/10.1 MB 8.4 MB/s eta 0:00:01
   ----------------- ---------------------- 4.5/10.1 MB 7.7 MB/s eta 0:00:01
   ------------------------ --------------- 6.3/10.1 MB 7.6 MB/s eta 0:00:01
   ---------------------------------- ----- 8.7/10.1 MB 8.4 MB/s eta 0:00:01
   ---------------------------------------- 10.1/10.1 MB 8.7 MB/s eta 0:00:00
Downloading tokenizers-0.21.0-cp39-abi3-win_amd64.whl (2.4 MB)
   ---------------------------------------- 0.0/2.4 MB ? eta -:--:--
   ----------------------------------- ---- 2.1/2.4 MB 1

  You can safely remove it manually.

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [83]:
import tensorflow as tf
import keras
from transformers import AutoTokenizer, AutoModel

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")


TensorFlow version: 2.17.0
Keras version: 3.5.0


In [84]:
!pip uninstall tensorflow keras -y


Found existing installation: tensorflow 2.18.0
Uninstalling tensorflow-2.18.0:
  Successfully uninstalled tensorflow-2.18.0
Found existing installation: keras 2.11.0
Uninstalling keras-2.11.0:
  Successfully uninstalled keras-2.11.0


In [85]:
!pip install tensorflow==2.11 keras==2.11


ERROR: Could not find a version that satisfies the requirement tensorflow==2.11 (from versions: 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, 2.17.0rc1, 2.17.0, 2.17.1, 2.18.0rc0, 2.18.0rc1, 2.18.0rc2, 2.18.0)

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: No matching distribution found for tensorflow==2.11


In [86]:
!python -m pip install --upgrade pip

Collecting pip
  Using cached pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Using cached pip-24.3.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.2
    Uninstalling pip-24.2:
      Successfully uninstalled pip-24.2
Successfully installed pip-24.3.1


In [87]:
!pip install tensorflow keras

Collecting tensorflow
  Using cached tensorflow-2.18.0-cp312-cp312-win_amd64.whl.metadata (3.3 kB)
Collecting keras
  Downloading keras-3.7.0-py3-none-any.whl.metadata (5.8 kB)
Using cached tensorflow-2.18.0-cp312-cp312-win_amd64.whl (7.5 kB)
Downloading keras-3.7.0-py3-none-any.whl (1.2 MB)
   ---------------------------------------- 0.0/1.2 MB ? eta -:--:--
   ---------------------------------------- 0.0/1.2 MB ? eta -:--:--
   -------- ------------------------------- 0.3/1.2 MB ? eta -:--:--
   ----------------- ---------------------- 0.5/1.2 MB 1.2 MB/s eta 0:00:01
   ------------------------- -------------- 0.8/1.2 MB 1.3 MB/s eta 0:00:01
   ---------------------------------------- 1.2/1.2 MB 1.3 MB/s eta 0:00:00
Installing collected packages: keras, tensorflow
Successfully installed keras-3.7.0 tensorflow-2.18.0


In [88]:
!pip install transformers==4.28.0

Collecting transformers==4.28.0
  Downloading transformers-4.28.0-py3-none-any.whl.metadata (109 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.0)
  Downloading tokenizers-0.13.3.tar.gz (314 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Downloading transformers-4.28.0-py3-none-any.whl (7.0 MB)
   ---------------------------------------- 0.0/7.0 MB ? eta -:--:--
   ------- -------------------------------- 1.3/7.0 MB 7.5 MB/s eta 0:00:01
   ------------------ --------------------- 3.1/7.0 MB 8.0 MB/s eta 0:00:01
   ------------------------------- -------- 5.5/7.0 MB 8.8 MB/s eta 0:00:01
   ---------------------------------------- 7.0/7.0 MB 9.1 MB/s eta 0:00:00
Building wheels f

  error: subprocess-exited-with-error
  
  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [49 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build\lib.win-amd64-cpython-312\tokenizers
      copying py_src\tokenizers\__init__.py -> build\lib.win-amd64-cpython-312\tokenizers
      creating build\lib.win-amd64-cpython-312\tokenizers\models
      copying py_src\tokenizers\models\__init__.py -> build\lib.win-amd64-cpython-312\tokenizers\models
      creating build\lib.win-amd64-cpython-312\tokenizers\decoders
      copying py_src\tokenizers\decoders\__init__.py -> build\lib.win-amd64-cpython-312\tokenizers\decoders
      creating build\lib.win-amd64-cpython-312\tokenizers\normalizers
      copying py_src\tokenizers\normalizers\__init__.py -> build\lib.win-amd64-cpython-312\tokenizers\normalizers
      creating build\lib.win-amd64-cpython-312\tokenizers\pre_tokenizers
      copying py_src

In [93]:
import torch
from sentence_transformers import SentenceTransformer

# set device to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# load the retriever model from huggingface model hub
retriever = SentenceTransformer('flax-sentence-embeddings/all_datasets_v3_mpnet-base')
retriever.to(device)  # Move model to GPU if available
retriever

RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.

# Generate Embeddings and Upsert

Next, we need to generate embeddings for the context passages. We will do this in batches to help us more quickly generate embeddings and upload them to the Pinecone index. When passing the documents to Pinecone, we need an id (a unique value), context embedding, and metadata for each document representing context passages in the dataset. The metadata is a dictionary containing data relevant to our embeddings, such as the article title, section title, passage text, etc.

In [95]:
# we will use batches of 64
batch_size = 64

# Create embeddings for the passage_text and include metadata in each batch
for i in tqdm(range(0, len(docs), batch_size)):
    # find end of batch
    i_end = min(i + batch_size, len(docs))
    
    # extract batch
    batch = docs[i:i_end]
    
    # generate embeddings for batch
    embeddings = retriever.encode([doc['passage'] for doc in batch]).tolist()
    
    # create metadata and upsert batch
    metadata = [
        {
            'article': doc['article'],
            'section': doc['section'],
            'text': doc['passage']
        } for doc in batch
    ]
    
    # create unique IDs
    ids = [f"doc_{i + j}" for j in range(len(batch))]
    
    # create upsert list
    to_upsert = list(zip(ids, embeddings, metadata))
    
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

# check that we have all vectors in index
print("Final index statistics:")
print(index.describe_index_stats())

  0%|          | 0/1 [00:00<?, ?it/s]

NameError: name 'retriever' is not defined

# Initialize Generator

We will use ELI5 BART for the generator which is a Sequence-To-Sequence model trained using the ‘Explain Like I’m 5’ (ELI5) dataset. Sequence-To-Sequence models can take a text sequence as input and produce a different text sequence as output.

The input to the ELI5 BART model is a single string which is a concatenation of the query and the relevant documents providing the context for the answer. The documents are separated by a special token &lt;P>, so the input string will look as follows:

>question: What is a sonic boom? context: &lt;P> A sonic boom is a sound associated with shock waves created when an object travels through the air faster than the speed of sound. &lt;P> Sonic booms generate enormous amounts of sound energy, sounding similar to an explosion or a thunderclap to the human ear. &lt;P> Sonic booms due to large supersonic aircraft can be particularly loud and startling, tend to awaken people, and may cause minor damage to some structures. This led to prohibition of routine supersonic flight overland.

More detail on how the ELI5 dataset was built is available [here](https://arxiv.org/abs/1907.09190) and how ELI5 BART model was trained is available [here](https://yjernite.github.io/lfqa.html).

Let's initialize the BART model using transformers.

In [96]:
from transformers import BartTokenizer, BartForConditionalGeneration

# load bart tokenizer and model from huggingface
tokenizer = BartTokenizer.from_pretrained('vblagoje/bart_lfqa')
generator = BartForConditionalGeneration.from_pretrained('vblagoje/bart_lfqa').to(device)

tokenizer_config.json:   0%|          | 0.00/27.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.32k [00:00<?, ?B/s]

  chat_template = self.get_chat_template(chat_template, tools)


model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

NameError: name 'device' is not defined

All the components of our abstract QA system are complete and ready to be queried. But first, let's write some helper functions to retrieve context passages from Pinecone index and to format the query in the way the generator expects the input.

In [97]:
def query_pinecone(query, top_k, format_results=True):
    # generate embeddings for the query
    xq = retriever.encode(query).tolist()
    
    # search pinecone index for context passage with the answer
    xc = index.query(
        vector=xq,
        top_k=top_k,
        include_metadata=True
    )
    
    if format_results:
        formatted_results = []
        for match in xc['matches']:
            formatted_results.append({
                'score': round(match['score'], 3),
                'article': match['metadata']['article'],
                'text': match['metadata']['text']
            })
        return formatted_results
    
    return xc

In [98]:
def format_query(query, context):
    # extract passage_text from Pinecone search result and add the <P> tag
    # Check if context is already a list or a Pinecone response
    if isinstance(context, dict) and 'matches' in context:
        # If it's a Pinecone response
        context = [f"<P> {m['metadata']['text']}" for m in context['matches']]
    else:
        # If it's already a list
        context = [f"<P> {m['metadata']['text']}" for m in context]
    
    # concatenate all context passages
    context = " ".join(context)
    
    # concatenate the query and context passages
    query = f"Question: {query} Context: {context}"
    
    return query

Let's test the helper functions. We will query the Pinecone index function we created earlier with the `query_pinecone` to get context passages and pass them to the `format_query` function.

In [100]:
query = "when was the first electric power system built?"
result = query_pinecone(query, top_k=1)
result

NameError: name 'retriever' is not defined

In [101]:
result

NameError: name 'result' is not defined

In [102]:
from pprint import pprint

In [38]:
query = format_query(query, result)
pprint(query)

('Question: Question: what was NASAs most expensive project? Context: <P> was '
 'not until the 1950s that the region started to develop, with forestry and '
 'the construction of the Wairakei geothermal power station. <P> as its '
 'B-side. However, copies of the single, which were issued on the subsidiary '
 "Date Records, were recalled as it was discovered that the Electras' name was "
 'copyrighted by another group who released an album in 1961 that included '
 'future Massachusetts Senator and Secretary of State, John Kerry.  Without '
 "the band's input, Kendrick changed their name to 'Twas Brillig, and "
 'rereleased "Dirty Old Man" in February 1967.  Elfving was drafted for '
 'service in Vietnam soon after, which consequently resulted in the group '
 'losing advertising support from Columbia.  The band recorded two more '
 'singles under the moniker, but <P> in order to strengthen its production '
 'base and gain entry into the Metro Manila market. It first signed a 3-year '
 

The output looks great. Now let's write a function to generate answers.

In [103]:
def generate_answer(query):
    # tokenize the query to get input_ids
    inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return pprint(answer)

In [104]:
generate_answer(query)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


NameError: name 'device' is not defined

As we can see, the generator used the provided context to answer our question. Let's run some more queries.

In [105]:
def format_query(query, context):
    # Format context with <P> tags
    formatted_context = [f"<P> {m['text']}" for m in context]
    
    # Join all context passages
    joined_context = " ".join(formatted_context)
    
    # Create final query string
    formatted_query = f"Question: {query} Context: {joined_context}"
    
    return formatted_query

# Now use it in sequence
query = "How was the first wireless message sent?"

# Get context from Pinecone
context = query_pinecone(query, top_k=5)

# Handle both possible return types
if isinstance(context, dict) and "matches" in context:
    formatted_query = format_query(query, context["matches"])
else:
    formatted_query = format_query(query, context)

# Generate answer
answer = generate_answer(formatted_query)

NameError: name 'retriever' is not defined

In [106]:
print("Context structure:", context[0] if isinstance(context, list) else context["matches"][0] if isinstance(context, dict) else "Unknown structure")

NameError: name 'context' is not defined

To confirm that this answer is correct, we can check the contexts used to generate the answer.

In [109]:
for doc in context:
    print(doc["text"], end='\n---\n')

NameError: name 'context' is not defined

In this case, the answer looks correct. If we ask a question and no relevant contexts are retrieved, the generator will typically return nonsensical or false answers, like with this question about COVID-19:

In [108]:
for doc in context:
    print(doc["text"], end='\n---\n')

NameError: name 'context' is not defined

Let’s finish with a final few questions.

In [107]:
query = "what was NASAs most expensive project?"
context = query_pinecone(query, top_k=3)
query = format_query(query, context)
generate_answer(query)

NameError: name 'retriever' is not defined

As we can see, the model can generate some decent answers.