# Task 3: Test the Query Reformulation process supported by Amazon Bedrock Knowledge Bases

In this task, you learn about and test the query reformulation process supported by Amazon Bedrock Knowledge Bases. Query reformulation takes a complex input prompt and breaks it down into multiple sub-queries. Each sub-query goes through its own retrieval step to find relevant chunks. The resulting chunks are then pooled and ranked together before being passed to the foundation model (FM) to generate a response. Query reformulation is another tool that can improve accuracy for the complex queries that your application may face in production.
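The decompose, retrieve, pool, and rank flow described above can be illustrated with a toy sketch. This is not how Amazon Bedrock implements the feature internally: the corpus, the word-overlap `score` function, and the hand-written sub-queries are all hypothetical stand-ins chosen only to make the flow visible.

```python
from collections import defaultdict

# Toy corpus standing in for knowledge base chunks (hypothetical data).
CHUNKS = {
    "c1": "the tower is a waterfront office building",
    "c2": "the whistleblower scandal damaged the company image",
    "c3": "quarterly revenue grew in the retail segment",
}

def score(query, text):
    """Toy relevance: fraction of query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)

def retrieve(query, k=1):
    """Return the top-k (chunk_id, score) pairs for a single query."""
    ranked = sorted(
        ((cid, score(query, txt)) for cid, txt in CHUNKS.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

def retrieve_with_decomposition(sub_queries, k=1):
    """Retrieve per sub-query, pool the results, and rank by best score."""
    pooled = defaultdict(float)
    for sub_query in sub_queries:
        for cid, s in retrieve(sub_query, k):
            pooled[cid] = max(pooled[cid], s)
    return sorted(pooled, key=pooled.get, reverse=True)

# Sub-queries are written by hand here; the managed feature derives them itself.
sub_queries = [
    "what is the tower",
    "how did the whistleblower scandal hurt the company image",
]
print(retrieve_with_decomposition(sub_queries))
```

Because each sub-query is scored on its own, a chunk that matches only one part of the original question can still reach the pooled result set.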

## Task 3.1: Set up the environment

In this task, you set up the notebook environment by importing the necessary packages.

1. Run the following two code cells to import the necessary packages and set up your environment:

In [None]:
import boto3
import botocore
import json
import logging
import os

# confirm we are at boto3 version 1.34.143 or above
print(boto3.__version__)
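Rather than reading the printed version by eye, the minimum-version requirement can be checked programmatically. The following is a minimal sketch; `version_at_least` is a hypothetical helper, and it assumes the version string uses purely numeric dotted components.

```python
def version_at_least(version, minimum):
    """Compare dotted numeric versions such as '1.34.143' component by component."""
    def parse(v):
        # Keep only major, minor, and patch; assumes each part is an integer.
        return tuple(int(part) for part in v.split(".")[:3])
    return parse(version) >= parse(minimum)

MIN_BOTO3 = "1.34.143"  # minimum version stated in the cell above

# Example with literal strings; no AWS SDK is needed to run this check.
print(version_at_least("1.35.0", MIN_BOTO3))
print(version_at_least("1.34.100", MIN_BOTO3))
```

In the notebook, the same helper could be called as `version_at_least(boto3.__version__, MIN_BOTO3)`.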

In [None]:
# Clients
s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session()
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# Configure logging for the notebook
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

# Display the Region and account ID as the cell output
region, account_id

2. Run the following code cell to retrieve the ID of the existing knowledge base in Amazon Bedrock:

In [3]:
import botocore

session = boto3.Session()
bedrock_client = session.client('bedrock-agent')

try:
    response = bedrock_client.list_knowledge_bases(
        maxResults=1  # We only need to retrieve the first Knowledge Base
    )
    knowledge_base_summaries = response.get('knowledgeBaseSummaries', [])

    if knowledge_base_summaries:
        kb_id = knowledge_base_summaries[0]['knowledgeBaseId']
        print(f"Knowledge Base ID: {kb_id}")
    else:
        print("No Knowledge Base summaries found.")
        
except botocore.exceptions.ClientError as e:
    print(f"Error: {e}")
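The cell above takes the first knowledge base returned. If an account holds more than one, selecting by name is safer. The following is a minimal sketch that works on the `knowledgeBaseSummaries` list from the response; `find_knowledge_base_id` and the sample names and IDs are hypothetical.

```python
def find_knowledge_base_id(summaries, name):
    """Return the ID of the first summary whose name matches, or None."""
    for summary in summaries:
        if summary.get("name") == name:
            return summary["knowledgeBaseId"]
    return None

# Hypothetical summaries shaped like the 'knowledgeBaseSummaries' entries.
sample_summaries = [
    {"knowledgeBaseId": "KB00000001", "name": "other-kb"},
    {"knowledgeBaseId": "KB00000002", "name": "lab-knowledge-base"},
]
print(find_knowledge_base_id(sample_summaries, "lab-knowledge-base"))
```

To look beyond the first result, the `maxResults=1` limit in the request would need to be raised or removed.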

3. Run the following code cell to define the FM to be used for this notebook:

In [None]:
# Define the FM to be used for generation.
# Anthropic Claude 3 Sonnet is used throughout the notebook.
foundation_model = 'anthropic.claude-3-sonnet-20240229-v1:0'

## Task 3.2: Demonstrate Query Reformulation

In this task, you run a complex query that could benefit from query reformulation, first without and then with the feature, and see how it affects the generated responses.

#### Complex prompt

To demonstrate the functionality, let's take a look at a query that makes several requests about information contained in the AnyCompany 10K financial document. The requests in this query are not semantically related. When the query is embedded during the retrieval step, some aspects of it may become diluted, so the relevant chunks returned may not address every component of this complex query.

To query your knowledge base and generate a response, you use the *RetrieveAndGenerate* API call. To use the query reformulation feature, include the following in the `knowledgeBaseConfiguration` of your request:

```
'orchestrationConfiguration': {
        'queryTransformationConfiguration': {
            'type': 'QUERY_DECOMPOSITION'
        }
    }
```

<i aria-hidden="true" class="fas fa-sticky-note" style="color:#563377"></i> **Note:** The structure of the output response is the same as for a normal *retrieve_and_generate* call without query reformulation.
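Since the two requests in this notebook differ only in the `orchestrationConfiguration` block, the configuration can be built by one function with a flag. The following is a minimal sketch; `build_rag_config` and its parameter names are hypothetical, while the dictionary keys mirror the request cells in this notebook.

```python
def build_rag_config(kb_id, model_arn, num_results=5, decompose=False):
    """Build a retrieveAndGenerateConfiguration dict, optionally with query decomposition."""
    kb_config = {
        "knowledgeBaseId": kb_id,
        "modelArn": model_arn,
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": num_results}
        },
    }
    if decompose:
        # This block is the only difference between the two requests.
        kb_config["orchestrationConfiguration"] = {
            "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
        }
    return {"type": "KNOWLEDGE_BASE", "knowledgeBaseConfiguration": kb_config}

# Placeholder values for illustration only.
config = build_rag_config("EXAMPLEKBID", "example-model-arn", decompose=True)
print(config["knowledgeBaseConfiguration"]["orchestrationConfiguration"])
```

The returned dictionary could be passed as the `retrieveAndGenerateConfiguration` argument of `retrieve_and_generate`, using the notebook's `kb_id` and model ARN in place of the placeholders.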

### Task 3.2.1: Generate results without Query Reformulation

In this task, you see what the generated result looks like for a complex query without using query reformulation. You use the following query: *What is AnyCompany tower and how does the whistleblower scandal hurt the company and its image?*

4. Run the following three code cells to generate results without query reformulation:

In [4]:
query = "What is AnyCompany tower and how does the whistleblower scandal hurt the company and its image?"

In [None]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)


# generated text output

print(response_ret['output']['text'],end='\n'*2)

In [None]:
# References attached to the first citation in the response
response_without_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_without_qr))

def citations_rag_print(references):
    # 'retrievedReferences' is a list; each entry has content, location, and metadata
    for num, chunk in enumerate(references, 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

citations_rag_print(response_without_qr)

As the citations above show, retrieval with the complex query did not return any chunks relevant to the building. Instead, it focused on chunks whose embeddings were most similar to the whistleblower incident.

This suggests that embedding the full query diluted the semantics of the part of the query about the building.
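Note that the cell above reads only the references attached to the first citation, while a response can contain several citations. The following is a minimal sketch of collecting references across all of them; `all_retrieved_references` is a hypothetical helper, and the sample response is made-up data shaped like the `citations` field used in this notebook.

```python
def all_retrieved_references(response):
    """Flatten the retrievedReferences of every citation into one list."""
    references = []
    for citation in response.get("citations", []):
        references.extend(citation.get("retrievedReferences", []))
    return references

# Made-up response containing only the fields this helper reads.
sample_response = {
    "citations": [
        {"retrievedReferences": [{"content": {"text": "chunk A"}}]},
        {"retrievedReferences": [
            {"content": {"text": "chunk B"}},
            {"content": {"text": "chunk C"}},
        ]},
    ]
}
for ref in all_retrieved_references(sample_response):
    print(ref["content"]["text"])
```

In the notebook, calling `all_retrieved_references(response_ret)` would give a fuller picture of which chunks backed the generated answer.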

### Task 3.2.2: Generate results with Query Reformulation

In this task, you see how query reformulation retrieves better-aligned context, which in turn improves the accuracy of the generated response.

5. Run the following two code cells to generate results with query reformulation:

In [None]:
response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            },
            'orchestrationConfiguration': {
                'queryTransformationConfiguration': {
                    'type': 'QUERY_DECOMPOSITION'
                }
            }
        }
    }
)


# generated text output

print(response_ret['output']['text'],end='\n'*2)

Let's take a look at the retrieved chunks with query reformulation:

In [None]:
response_with_qr = response_ret['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_with_qr))


citations_rag_print(response_with_qr)

As you can see, with query reformulation turned on, the retrieved chunks now provide context for both the whistleblower scandal and the location of the waterfront property.
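To make the difference between the two runs explicit, the two reference lists can be compared directly. The following is a minimal sketch; `chunks_only_in` is a hypothetical helper that compares chunks by their exact text, and the sample lists are made-up data shaped like `retrievedReferences` entries.

```python
def chunks_only_in(candidate, baseline):
    """Return references from candidate whose text does not appear in baseline."""
    baseline_texts = {ref["content"]["text"] for ref in baseline}
    return [ref for ref in candidate if ref["content"]["text"] not in baseline_texts]

# Made-up reference lists for illustration only.
sample_without_qr = [{"content": {"text": "scandal chunk"}}]
sample_with_qr = [
    {"content": {"text": "scandal chunk"}},
    {"content": {"text": "tower chunk"}},
]
for ref in chunks_only_in(sample_with_qr, sample_without_qr):
    print(ref["content"]["text"])
```

In the notebook, `chunks_only_in(response_with_qr, response_without_qr)` would list the chunks that appear only when query reformulation is enabled.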

<i aria-hidden="true" class="far fa-thumbs-up" style="color:#008296"></i> **Task complete:** You have completed this notebook. To move to the next part of the lab, do the following:

- Close this notebook file.
- Return to the lab session and continue with Task 4.