[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-pinecone-rag.ipynb)

# Module 02 - Query data and generate a simple RAG response


This module contains notebook code to:
* Query data from a Pinecone index
* Perform semantic search and rerank the results
* Generate a RAG response using Amazon Bedrock


*******************************************************************************************************************

### Install required libraries

In [68]:
%pip install --quiet --upgrade \
    pinecone \
    boto3 \
    botocore

Note: you may need to restart the kernel to use updated packages.


To begin, we import the libraries needed to initialize Amazon Bedrock and Pinecone, both of which we'll use throughout the walkthrough.

In [69]:
# Standard library imports
import getpass
import json

# Pinecone library
from pinecone import Pinecone

# AWS Library imports
import boto3
from botocore.config import Config

### Initialize Pinecone

In [None]:
PINECONE_API_KEY = getpass.getpass("Enter Pinecone API Key")
pc = Pinecone(api_key=PINECONE_API_KEY, source_tag="pinecone:agentic_ai_with_pinecone_and_aws:notebooks:2_data_query_pipeline")

In [None]:
index_name = 'agentic-ai-with-pinecone-and-aws'
# Namespaces organize data within a Pinecone index, for example in multi-tenant applications.
# We'll use the default namespace for this workshop.
namespace = '__default__'

### Initialize Bedrock

If you're running this workshop as part of a Pinecone-hosted event, your environment already has access to `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, so there is no need to set these up. The Bedrock initialization code below will automatically reference them.

In [72]:
config = Config(
    connect_timeout=5,
    read_timeout=60,
    retries={"total_max_attempts": 20, "mode": "adaptive"},
)
region = 'us-east-1'

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
    endpoint_url=f'https://bedrock-runtime.{region}.amazonaws.com',
    config=config,
)

embedding_model_id = "amazon.titan-embed-text-v2:0"
generation_model_id = "anthropic.claude-3-haiku-20240307-v1:0"
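
If you're running outside a hosted event, a quick check like the one below can tell you whether the two environment variables mentioned above are set. This is a minimal sketch: the helper name `missing_aws_vars` is hypothetical, and an empty environment does not necessarily mean Bedrock calls will fail, because boto3 can also find credentials through other sources such as shared profiles or IAM roles.

```python
import os

# Environment variables that boto3 reads for static credentials.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of the required AWS variables that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Not set in this environment:", ", ".join(missing))
```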

### Initialize Pinecone Index and describe stats

In [74]:
index = pc.Index(index_name)
print(index.describe_index_stats())

{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'agentic-rag': {'vector_count': 880}},
 'total_vector_count': 880,
 'vector_type': 'dense'}
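
Before querying, it can help to confirm that the namespace you plan to search actually holds vectors. The sketch below assumes the stats are available as a plain dictionary shaped like the output above; `namespace_vector_count` is a hypothetical helper, not part of the Pinecone SDK.

```python
def namespace_vector_count(stats: dict, namespace: str) -> int:
    """Return how many vectors a namespace holds, or 0 if it is absent."""
    return stats.get("namespaces", {}).get(namespace, {}).get("vector_count", 0)


# Values copied from the describe_index_stats() output shown above.
example_stats = {
    "dimension": 1024,
    "index_fullness": 0.0,
    "metric": "cosine",
    "namespaces": {"agentic-rag": {"vector_count": 880}},
    "total_vector_count": 880,
    "vector_type": "dense",
}

print(namespace_vector_count(example_stats, "agentic-rag"))  # 880
print(namespace_vector_count(example_stats, "__default__"))  # 0
```

A count of 0 for the namespace you intend to query is a hint to double-check the `namespace` variable before moving on.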


### Query the data

Now we're ready to begin querying our LLM with a **R**etrieval **A**ugmented **G**eneration (RAG) pipeline. Let's first see how this works, step by step.

- Step 1. Generate vector embedding of the text query using Amazon Bedrock
- Step 2. Perform semantic search, retrieving the records that are most similar in meaning and context to the query
- Step 3. Rerank the results to further refine them
- Step 4. Results are combined with the original user query to create a prompt for the LLM
- Step 5. This curated context is then used to generate output, using the context to drive a more accurate and relevant response


#### Step 1: Generate vector embedding of the query

First, we define a function to create our _query embedding_ so we can query the Pinecone index.

In [75]:
def embed_query(query: str) -> list[float]:
    """
    Generate a text embedding using Amazon Bedrock.
    Args:
        query: string of text to embed.
    Returns:
        list[float]: the embedding vector.
    """

    body = json.dumps({"inputText": query})

    response = bedrock.invoke_model(
        body=body,
        modelId=embedding_model_id,
    )

    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')

    return embedding
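
Because the index stats above report a dimension of 1024, the query embedding needs to have the same length. The following is a small sanity check under that assumption; `validate_embedding` is a hypothetical helper and the default of 1024 is taken from the stats output, not from the Bedrock API.

```python
import math


def validate_embedding(embedding, expected_dim=1024):
    """Raise ValueError if the embedding cannot be used to query the index."""
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("embedding must be a non-empty list")
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    # Reject NaN or infinite values, which would make similarity scores meaningless.
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in embedding):
        raise ValueError("embedding must contain only finite numbers")
    return embedding
```

In the notebook this could be used as `validate_embedding(embed_query("some text"))` before passing the vector to the index.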

#### Step 2: Perform semantic search

Next, we define a function to perform semantic search against the Pinecone index. This function will use the query embedding to retrieve the records that are most similar in meaning and context to the query.

In [76]:
def semantic_search(query: str) -> list[dict]:
    """
    Embed the query and retrieve the most similar records from the Pinecone index.

    Args:
        query (str): The query string.

    Returns:
        list[dict]: One {"id", "text"} dictionary per hit, in ranked order.
    """
    query_embedding = embed_query(query)

    search_results = index.query(
        namespace=namespace,
        vector=query_embedding,
        fields=["chunk_text"],
        top_k=20
    )

    document_ids = []
    
    for result in search_results['matches']:
        document_ids.append(result['id'])

    fetch_results = index.fetch(ids=document_ids, namespace=namespace)

    documents_retrieved = []

    for document_id in document_ids:
        text = fetch_results.vectors[document_id]['metadata']['chunk_text']
        documents_retrieved.append({"id": document_id, "text": text})

    return documents_retrieved
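
To build intuition for what "most similar" means here, recall that the index uses the cosine metric. The sketch below ranks a few toy two-dimensional vectors by cosine similarity with a brute-force loop. It only illustrates the metric: the vectors and IDs are made up, and Pinecone's actual search is not implemented this way.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=2):
    """Score every record against the query and return the k best (id, score) pairs."""
    scored = [(rec_id, cosine_similarity(query_vec, vec)) for rec_id, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: doc-a points almost the same way as the query, doc-c almost orthogonally.
toy_records = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.7, 0.7],
    "doc-c": [0.0, 1.0],
}
print([rec_id for rec_id, _ in top_k([1.0, 0.1], toy_records)])  # ['doc-a', 'doc-b']
```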

Now we can query the index with a text query.

In [77]:
query = "Changes in Compaq's product offerings and their impacts on sales"

documents_retrieved = semantic_search(query)

And let's inspect the first three records retrieved:

In [78]:
documents_retrieved[:3]

[{'id': 'compaq_1999#text46',
  'text': "ts, which is related to\ncomponent costs, is a critical variable in predicting customer decisions to move\nto  the  next  generation of products. Because of the lead times associated with\nits  volume  production,  should  Compaq  be unable to gauge the rate of product\ntransitions  accurately,  there  could be an adverse impact on inventory levels,\ncash,  and  profitability.  In  addition,  as a result of the Tandem and Digital\nacquisitions,  Compaq  is  engaged  in  direct  sales  of  computer systems with\nsoftware  developed  to meet customers' specific needs.  The long-term nature of\nsuch  contracts  exposes Compaq to risks associated with changing customer needs\nand  expectations.\n\n     Product  Transitions.  In  each product cycle, Compaq confronts the risk of\ndelays  in production that could impact sales of newer products while it manages\nthe inventory of older products and facilitates the sale of older inventory held\nby  resell

#### Step 3: Rerank the results

We can further refine these 20 results by reranking them. To rerank, we send the original query and the retrieved results to a reranking model. The model scores each result by its semantic relevance to the query and returns a new, more accurate ranking. Reranking is one of the simplest ways to improve quality in retrieval-augmented generation (RAG) pipelines.

In [79]:
def rerank_results(query: str, docs: list):
    """Rerank the retrieved documents against the query and keep the top 3."""
    reranked_results = pc.inference.rerank(
        model="bge-reranker-v2-m3",
        query=query,
        documents=docs,
        top_n=3,
        rank_fields=["text"],
        return_documents=True,
        parameters={
            "truncate": "END"
        }
    )

    return reranked_results

In [80]:
reranked_documents = rerank_results(query, documents_retrieved)

In [81]:
reranked_documents.data

[{
     index=12,
     score=0.91398406,
     document={
         id='compaq_2001#text43',
         text=" FORM FACTORS INTRODUCE UNCERTAINTY INTO THE MARKET. The increasing\nreliance on the Internet is creating new dynamics in the computer industry. As\nbusinesses and consumers turn to the Internet, speed and connectivity may become\nmore critical than stand-alone power for client devices. Compaq is introducing a\nnew generation of Internet devices built around simple form factors, customized\nfunctions and wireless mobility. Compaq's products will vie for customer\nacceptance and market share against those of computer companies as well as\nconsumer electronics and telecommunications companies. Hardware products, which\nare Compaq's traditional area of strength, may become less important than\nservice offerings in attracting and retaining customers. In addition, as new\nform factors are adopted, sales of traditional personal computers may decline.\n\n      CHANGES IN THE SERVICES BUSI
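
The shape of this output can be mimicked with a toy reranker. In the sketch below, a simple word-overlap score stands in for the reranking model (an assumption made purely for illustration; `overlap_score` and `toy_rerank` are hypothetical names). It shows how each result can carry an `index` pointing back to the document's position in the input list, which is what `index=12` in the output above appears to represent.

```python
def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def toy_rerank(query, docs, top_n=3):
    """Score every document, then keep the top_n while remembering original positions."""
    scored = [
        {"index": i, "score": overlap_score(query, doc["text"]), "document": doc}
        for i, doc in enumerate(docs)
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_n]


toy_docs = [
    {"id": "a", "text": "inventory levels and cash"},
    {"id": "b", "text": "new product offerings affect sales"},
    {"id": "c", "text": "product transitions"},
]
results = toy_rerank("product offerings sales", toy_docs, top_n=2)
print([(r["index"], r["document"]["id"]) for r in results])  # [(1, 'b'), (2, 'c')]
```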

#### Step 4: Augmentation

Now, we'll combine the reranked results with the original user query to create a prompt for the LLM. We wrap each result in XML-style tags, a structure Claude has been trained to work with, and add the formatted results to a prompt. This prompt passes the search results to the generation step as context.

In [82]:
# Formatting search results
def format_results(extracted) -> str:
    """Wrap each reranked document's text in <item> tags inside <search_results>."""
    result = "\n".join(
        [
            f'<item index="{i+1}">\n<page_content>\n{r["document"]["text"]}\n</page_content>\n</item>'
            for i, r in enumerate(extracted.data)
        ]
    )

    return f"\n<search_results>\n{result}\n</search_results>"

def augment_prompt(results_list, question):
    return f"""\n\nHuman: {format_results(results_list)} Using the search results provided within the <search_results></search_results> tags, please answer the following question <question>{question}</question>. Do not reference the search results in your answer.\n\nAssistant:"""

augmented_prompt = augment_prompt(reranked_documents, query)

print(augmented_prompt)



Human: 
<search_results>
<item index="1">
<page_content>
 FORM FACTORS INTRODUCE UNCERTAINTY INTO THE MARKET. The increasing
reliance on the Internet is creating new dynamics in the computer industry. As
businesses and consumers turn to the Internet, speed and connectivity may become
more critical than stand-alone power for client devices. Compaq is introducing a
new generation of Internet devices built around simple form factors, customized
functions and wireless mobility. Compaq's products will vie for customer
acceptance and market share against those of computer companies as well as
consumer electronics and telecommunications companies. Hardware products, which
are Compaq's traditional area of strength, may become less important than
service offerings in attracting and retaining customers. In addition, as new
form factors are adopted, sales of traditional personal computers may decline.

      CHANGES IN THE SERVICES BUSINESS COULD ADVERSELY AFFECT EARNINGS. Compaq's
Global Servi
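
To experiment with the template without calling Pinecone, the same tag structure can be applied to plain strings. This is a sketch only: `format_items` is a hypothetical variant of `format_results` that takes a list of texts instead of a rerank result, and the two passages are placeholders.

```python
def format_items(texts: list[str]) -> str:
    """Wrap plain strings in the same <item>/<page_content> structure used above."""
    items = "\n".join(
        f'<item index="{i}">\n<page_content>\n{text}\n</page_content>\n</item>'
        for i, text in enumerate(texts, start=1)
    )
    return f"\n<search_results>\n{items}\n</search_results>"


toy_block = format_items(["First passage.", "Second passage."])
print(toy_block)
```

Keeping the template testable in isolation makes it easier to adjust the tags later without rerunning retrieval.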

#### Step 5: Generation with Claude

Finally, let's ask the user's original question and get our answer from Claude.

In [83]:
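def generate_answer(prompt: str) -> str:
    """Send the prompt to Claude on Amazon Bedrock and return the generated text."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                # The prompt passed in by the caller
                "content": prompt
            }
        ],
        "max_tokens": 1000,
        "temperature": 0.1
    })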

    response = bedrock.invoke_model(
        body=body,
        modelId=generation_model_id,
    )

    response_body = json.loads(response.get('body').read())

    # Extract the response text from Claude's response format
    response_text = response_body['content'][0]['text']
    
    return response_text

print(generate_answer(augmented_prompt))

Compaq is facing several changes in its product offerings and their potential impacts on sales:

1. The increasing reliance on the Internet is creating new dynamics in the computer industry, where speed and connectivity may become more critical than stand-alone power for client devices. Compaq is introducing a new generation of Internet devices with simple form factors, customized functions, and wireless mobility. These new product offerings may vie for customer acceptance and market share against those of computer companies, consumer electronics, and telecommunications companies.

2. As new form factors are adopted, sales of traditional personal computers may decline. This shift in customer preferences could make Compaq's traditional hardware products less important than its service offerings in attracting and retaining customers.

3. The trend for design and implementation of systems is moving from proprietary environments to industry-standard products. Compaq needs to retrain its se
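
Claude's response body holds a list of content blocks along with a `stop_reason`. A slightly more defensive way to read it is sketched below: it joins every text block and flags answers that stopped because `max_tokens` was reached. The helper name `extract_text`, the appended marker, and the hand-written `sample_body` are illustrative assumptions, not output captured from Bedrock.

```python
def extract_text(response_body: dict) -> str:
    """Join every text block in a Claude Messages API response body."""
    blocks = response_body.get("content", [])
    text = "".join(
        block.get("text", "") for block in blocks if block.get("type") == "text"
    )
    # Flag answers that were cut off by the max_tokens limit.
    if response_body.get("stop_reason") == "max_tokens":
        text += "\n[truncated: max_tokens reached]"
    return text


# Hand-written sample shaped like a Messages API response body.
sample_body = {
    "content": [{"type": "text", "text": "Compaq is facing several changes"}],
    "stop_reason": "max_tokens",
}
print(extract_text(sample_body))
```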

### Putting it all together

Let's put it all together! Below, we search, rerank, augment the prompt with the results, and generate an answer to complete a simple RAG pipeline.

In [85]:
query = "Changes in Compaq's product offerings and their impacts on sales"

# Search
documents_retrieved = semantic_search(query)

# Rerank
reranked_documents = rerank_results(query, documents_retrieved)

# Augment
augmented_prompt = augment_prompt(reranked_documents, query)

# Generate
answer = generate_answer(augmented_prompt)

print(answer)

Compaq is facing several changes in its product offerings and their potential impacts on sales:

1. The increasing reliance on the Internet is creating new dynamics in the computer industry, where speed and connectivity may become more critical than stand-alone power for client devices. Compaq is introducing a new generation of Internet devices with simple form factors, customized functions, and wireless mobility. These new form factors may lead to a decline in sales of traditional personal computers.

2. Compaq's traditional area of strength has been hardware products, but the company recognizes that service offerings may become more important in attracting and retaining customers as the industry shifts.

3. The trend for design and implementation of systems is moving from proprietary environments to industry-standard products. This requires Compaq to retrain its services personnel to compete in the new environment, and there is no assurance that the company will be able to successful
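
The four calls above can also be wrapped in a single function. The sketch below takes each stage as a callable so the flow can be tested with stand-ins before wiring in the real services; `rag_answer` is a hypothetical name introduced here for illustration.

```python
from typing import Callable


def rag_answer(
    query: str,
    search: Callable[[str], list],
    rerank: Callable[[str, list], object],
    augment: Callable[[object, str], str],
    generate: Callable[[str], str],
) -> str:
    """Run search, rerank, augment, and generate in order and return the answer."""
    documents = search(query)
    reranked = rerank(query, documents)
    prompt = augment(reranked, query)
    return generate(prompt)
```

With the functions defined in this notebook, the call would be `rag_answer(query, semantic_search, rerank_results, augment_prompt, generate_answer)`.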