# Retrieval and Generation with Bedrock Foundation Models

### Overview
This notebook demonstrates how to perform retrieval-augmented generation (RAG) using Amazon Bedrock's foundation models. It covers retrieving relevant documents from a knowledge base and generating responses based on the retrieved context.

### Build your own Retrieval Augmented Generation (RAG) system
When you build your own retrieval augmented generation (RAG) system, you combine a retriever and a generator. The retriever can use an embedding model to identify the relevant chunks in a vector database based on similarity scores. The generator can be a Large Language Model (LLM) that answers questions based on the retrieved results (also known as chunks). The following sections also provide tips on how to optimize the prompts for your RAG system.
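
To make the retriever and generator roles concrete, here is a minimal sketch in plain Python. It is not how Bedrock Knowledge Bases is implemented: the three-dimensional vectors are made-up stand-ins for real embeddings, and `retrieve` and `build_prompt` are hypothetical helper names used only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, chunks, top_k=2):
    """Return the top_k (score, text) pairs ranked by similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

def build_prompt(question, retrieved):
    """Assemble the generator prompt from the question and the retrieved chunks."""
    context = "\n".join(f"- {text}" for _, text in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up 3-dimensional embeddings; a real embedding model returns many more dimensions.
chunks = [
    ("Net income was 30,425 million in 2023.", [0.9, 0.1, 0.0]),
    ("The annual report includes a letter to shareholders.", [0.0, 0.2, 0.9]),
    ("Net sales grew in 2023.", [0.7, 0.6, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # stand-in for the embedded question

top_chunks = retrieve(query_vector, chunks, top_k=2)
prompt = build_prompt("What was net income in 2023?", top_chunks)
print(prompt)
```

The prompt that reaches the generator contains only the two highest-scoring chunks, which is the role `number_of_results` plays later in this notebook.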

In [1]:
import advanced_rag_utils
import json
import importlib

# Reload module
importlib.reload(advanced_rag_utils)

# Re-import all functions
from advanced_rag_utils import *

from datetime import datetime, timedelta, UTC

notebook_start_time = datetime.now(UTC)
# Load variables from JSON file
with open("../variables.json", "r") as f:
    variables = json.load(f)

variables

{'accountNumber': '270597685972',
 'regionName': 'us-west-2',
 'collectionArn': 'arn:aws:aoss:us-west-2:270597685972:collection/3ethft3xms9as2092ulg',
 'collectionId': '3ethft3xms9as2092ulg',
 'vectorIndexName': 'ws-index-',
 'bedrockExecutionRoleArn': 'arn:aws:iam::270597685972:role/advanced-rag-workshop-bedrock_execution_role-us-west-2',
 's3Bucket': '270597685972-us-west-2-advanced-rag-workshop',
 'kbFixedChunk': 'SN9KSOQPOV',
 'kbSemanticChunk': 'KMZYCTNSWW',
 'kbHierarchicalChunk': 'V8EJKFPYTK',
 'kbCustomChunk': 'G8P2D7M28S',
 'sagemakerLLMEndpoint': 'endpoint-llama-3-2-3b-instruct-2025-05-02-18-22-06'}

## RAG with a simple question

##### We will ask the question "In text-to-sql, what are the stages in data generation process?" <br/>
##### We should expect a response drawn from the PDF shown below that includes the three stages shown in the picture.
![Image](./image01.png)

### Configuration

In [2]:
# Knowledge Base ID - Choose from different chunking strategies (Fixed, Hierarchical, or Semantic)
kb_id = variables["kbFixedChunk"] 

# Get the Bedrock Model ARN
model_id = get_model_arn(
    base_model_id="us.amazon.nova-lite-v1:0",
    #base_model_id="us.amazon.nova-pro-v1:0",
    account_number=variables['accountNumber'],
    region_name=variables['regionName']
)

# Number of relevant documents to retrieve for RAG
number_of_results = 5

# Create default generation configuration
generation_config = get_default_generation_config(
    max_tokens=4096,
    temperature=0,
    top_p=0.5
)
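
The helper functions used in this notebook come from `advanced_rag_utils`, whose source is not shown here. As a rough guide to where the settings above end up, the sketch below builds a request for the boto3 `bedrock-agent-runtime` client's `retrieve_and_generate` operation. `build_rag_request` is a hypothetical name, the IDs are placeholders, and the workshop helper may assemble its request differently.

```python
def build_rag_request(query, kb_id, model_arn, number_of_results,
                      max_tokens, temperature, top_p, prompt_template=None):
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    generation = {
        "inferenceConfig": {
            "textInferenceConfig": {
                "maxTokens": max_tokens,
                "temperature": temperature,
                "topP": top_p,
            }
        }
    }
    # Only attach a prompt template when the caller supplies one.
    if prompt_template is not None:
        generation["promptTemplate"] = {"textPromptTemplate": prompt_template}
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {"numberOfResults": number_of_results}
                },
                "generationConfiguration": generation,
            },
        },
    }

request = build_rag_request(
    query="Who is the CEO of Amazon?",
    kb_id="EXAMPLEKBID",            # placeholder, not a real knowledge base ID
    model_arn="example-model-arn",  # placeholder for the value returned by get_model_arn
    number_of_results=5, max_tokens=4096, temperature=0, top_p=0.5,
)

# With AWS credentials configured, the request could be sent like this:
# import boto3
# client = boto3.client("bedrock-agent-runtime", region_name="us-west-2")
# response = client.retrieve_and_generate(**request)
```

Keeping the request construction separate from the network call makes it easy to inspect or log the exact configuration before sending it.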

### Retrieve and Generate with a simple query

In [3]:
# Define the query
# Recall that we executed the same query in notebook 1.3.1.
query  = "Who is the CEO, CFO, and CTO of Amazon? While answering the question, only use the data in context. If for any part of the question, you dont find the information in the context, please say I dont know for that part of the question."
# Perform retrieval-augmented generation (RAG)
response = retrieve_and_generate(
    query=query,
    kb_id=kb_id,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    region_name=variables['regionName']
)

# Display the results with citations
display_rag_results(response, show_citations=True)

----------------- Answer ---------------------
The CEO of Amazon is Andrew R. Jassy. The CFO of Amazon is Brian T. Olsavsky. I don't know who the CTO of Amazon is.

----------------- Citations ------------------
{
  "ResponseMetadata": {
    "RequestId": "1ae2f497-502d-46b2-9323-b663283184c4",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Sat, 03 May 2025 02:37:05 GMT",
      "content-type": "application/json",
      "content-length": "6063",
      "connection": "keep-alive",
      "x-amzn-requestid": "1ae2f497-502d-46b2-9323-b663283184c4"
    },
    "RetryAttempts": 0
  },
  "citations": [
    {
      "generatedResponsePart": {
        "textResponsePart": {
          "span": {
            "end": 36,
            "start": 0
          },
          "text": "The CEO of Amazon is Andrew R. Jassy"
        }
      },
      "retrievedReferences": [
        {
          "content": {
            "text": "We promptly make available on this website, free of charge, the reports that

### Comparison between chunking strategies: Fixed vs Semantic

##### Now let's ask a more nuanced question that requires extracting information from a table in the PDF. Let's also ask the model to do some analysis. <br/>
##### We will also compare the response quality of fixed-size chunking with that of semantic chunking.
![image02](image02.png)

#### A nuanced query with a fixed-size chunking strategy

##### We will ask a question about how net income changed from 2022 to 2023 to 2024.
![image03](image03.png)

In [4]:
# Configuration for fixed chunking strategy
kb_id_fixed = variables["kbFixedChunk"]

# Model ID remains the same
model_id = get_model_arn(
    base_model_id="us.amazon.nova-lite-v1:0",
    account_number=variables['accountNumber'],
    region_name=variables['regionName']
)

In [5]:
# Define the query for comparing net income changes
query = "In CONSOLIDATED STATEMENTS OF CASH FLOWS, How much did net income change in years 2022, 2023, 2024?"
# Perform RAG with fixed chunking strategy
response_fixed = retrieve_and_generate(
    query=query,
    kb_id=kb_id_fixed,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    region_name=variables['regionName']
)

# Display the results
display_rag_results(response_fixed)

----------------- Answer ---------------------
Based on the retrieved results, the net income for the years 2022, 2023, and 2024 are as follows:

- 2022: $(2,722) million - 2023: $30,425 million - 2024: $59,248 million The net income increased by $33,147 million from 2022 to 2023 and by $28,823 million from 2023 to 2024.

----------------- Citations ------------------
{
  "ResponseMetadata": {
    "RequestId": "4153347b-4c4e-46b2-924d-4e9aa80303b4",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Sat, 03 May 2025 02:37:13 GMT",
      "content-type": "application/json",
      "content-length": "7015",
      "connection": "keep-alive",
      "x-amzn-requestid": "4153347b-4c4e-46b2-924d-4e9aa80303b4"
    },
    "RetryAttempts": 0
  },
  "citations": [
    {
      "generatedResponsePart": {
        "textResponsePart": {
          "span": {
            "end": 123,
            "start": 0
          },
          "text": "Based on the retrieved results, the net income for the yea

#### The response above might not match the expected answer. The accurate response should be:

> Year 2022 to Year 2023: \\$33,147 increase<br/>
> Year 2023 to Year 2024: \\$28,823 increase

#### Now let's run the same question against the knowledge base that uses semantic chunking.

In [6]:
# Configuration for semantic chunking strategy
kb_id_semantic = variables["kbSemanticChunk"]

In [7]:
# Enhance the query to request explanation of the calculation
query_with_explanation = "In CONSOLIDATED STATEMENTS OF CASH FLOWS, How much did net income change in years 2022, 2023, 2024? Show me how you did the math."
# Perform RAG with semantic chunking strategy
response_semantic = retrieve_and_generate(
    query=query_with_explanation,
    kb_id=kb_id_semantic,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    region_name=variables['regionName']
)

# Display the results
display_rag_results(response_semantic)

----------------- Answer ---------------------
Answer: Here is the change in net income for the years 2022, 2023, and 2024:

- 2022: Net income was $33,364 million in 2021 and $-2,722 million in 2022. So the change was $33,364 - (-2,722) = $36,086 million.
- 2023: Net income was $-2,722 million in 2022 and $30,425 million in 2023. So the change was $30,425 - (-2,722) = $33,147 million.
- 2024: Net income was $30,425 million in 2023 and $59,248 million in 2024. So the change was $59,248 - 30,425 = $28,823 million.

So the changes in net income for the years 2022, 2023, and 2024 were $36,086 million, $33,147 million, and $28,823 million, respectively.

----------------- Citations ------------------
{
  "ResponseMetadata": {
    "RequestId": "e71dff20-d667-4e2f-aa36-69cd02d42295",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Sat, 03 May 2025 02:37:18 GMT",
      "content-type": "application/json",
      "content-length": "6000",
      "connection": "keep-alive",
      "x

Compare the results above with the accurate response:
> Year 2022 to Year 2023: \\$33,147 increase <br/>
> Year 2023 to Year 2024: \\$28,823 increase

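As you can see, semantic chunking delivered the accurate response and showed the calculation behind each figure. In the run captured above, fixed-size chunking also returned these two differences, so compare both outputs from your own run.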
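
The expected differences can be checked directly from the net income figures quoted in the answers above (in millions of USD). The snippet below is only an arithmetic check; `year_over_year_changes` is a helper name introduced here for illustration.

```python
# Net income in millions of USD, as reported in the answers above.
net_income = {2022: -2722, 2023: 30425, 2024: 59248}

def year_over_year_changes(values):
    """Map each (previous_year, year) pair to the change between them."""
    years = sorted(values)
    return {(prev, curr): values[curr] - values[prev] for prev, curr in zip(years, years[1:])}

changes = year_over_year_changes(net_income)
for (prev, curr), delta in changes.items():
    direction = "increase" if delta >= 0 else "decrease"
    print(f"Year {prev} to Year {curr}: ${abs(delta):,} million {direction}")
```

Note that the 2022 value is a loss, so the first difference is 30,425 minus (-2,722), which is why it is larger than the 2023 net income itself.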

## Improve RAG quality with Enhanced Prompts

### Importance of Prompt Engineering
Prompt engineering is the practice of optimizing the textual input to a large language model (LLM) to improve its output and get the responses you want. Prompting helps an LLM perform a wide variety of tasks, including classification, question answering, code generation, creative writing, and more. The quality of the prompts you provide to an LLM can affect the quality of the model's responses.
 

### Useful techniques to improve prompts for Amazon Nova models
Please refer to the [Amazon Nova prompting guide](https://docs.aws.amazon.com/nova/latest/userguide/prompting.html) for prompt engineering best practices with Amazon Nova models. The following are a few highlights:
* Create precise prompts. Provide contextual information, specify the output format and style, and provide clear prompt sections.
* Use system prompts to define how the model will respond.
* Give Amazon Nova time to think. For example, add ```"Think step-by-step."``` at the end of your query.
* Provide examples.

### Tips for using prompts in RAG
* Provide a prompt template: As with other functionality, enhancing the system prompt can help. You can describe the RAG system in the system prompt, outlining the desired persona and behavior for the model.
* Use model instructions: You can include a dedicated ```"Model Instructions:"``` section within the system prompt, where you provide specific guidelines for the model to follow. For instance, you can list instructions such as: ```In this example session, the model has access to search results and a user's question, its job is to answer the user's question using only information from the search results.```
* Avoid hallucination by restricting the instructions: Sharpen the instructions by clearly stating "DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!" as a model instruction so the answers are grounded in the provided context.
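
The tips above can be combined into a single template. The sketch below is one possible layout, not a recommended or official one: `build_rag_prompt_template` is a hypothetical helper, and the placeholder names `$search_results$` and `$query$` are assumed to be the ones Bedrock Knowledge Bases substitutes at run time, so check the exact spelling in the service documentation.

```python
def build_rag_prompt_template(persona, instructions):
    """Compose a RAG prompt template with a persona and a Model Instructions section."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(instructions, start=1))
    return (
        f"{persona}\n\n"
        "Model Instructions:\n"
        f"{numbered}\n\n"
        # Placeholder spellings below are assumptions; verify them before use.
        "Search results:\n$search_results$\n\n"
        "Question: $query$\n"
    )

template = build_rag_prompt_template(
    persona="You are a question answering assistant for financial filings.",
    instructions=[
        "Answer the user's question using only information from the search results.",
        "DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!",
        "If the search results do not contain the answer, say that you don't know.",
    ],
)
print(template)
```

Numbering the instructions keeps each rule short and makes it easier to see which one a response violated when you review outputs.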


#### Without a Prompt Template

In [8]:
# Define the query about Amazon's financial results
query = "Show me the amazon financial results for 2023"

# Perform RAG without prompt template
response_no_template = retrieve_and_generate(
    query=query,
    kb_id=kb_id,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    region_name=variables['regionName']
)

# Display the results
display_rag_results(response_no_template)

----------------- Answer ---------------------
Based on the obtained results, Amazon's net sales for 2023 were $574.36 billion, with an operating income of $33.36 billion and a net income of $30.43 billion.

----------------- Citations ------------------
{
  "ResponseMetadata": {
    "RequestId": "1ba5c2d4-8a80-47c9-ba73-3601a80e628f",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Sat, 03 May 2025 02:37:21 GMT",
      "content-type": "application/json",
      "content-length": "5336",
      "connection": "keep-alive",
      "x-amzn-requestid": "1ba5c2d4-8a80-47c9-ba73-3601a80e628f"
    },
    "RetryAttempts": 0
  },
  "citations": [
    {
      "generatedResponsePart": {
        "textResponsePart": {
          "span": {
            "end": 158,
            "start": 0
          },
          "text": "Based on the obtained results, Amazon's net sales for 2023 were $574.36 billion, with an operating income of $33.36 billion and a net income of $30.43 billion"
        }
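
The citation output printed in this notebook is long, so it can help to reduce it to the parts you care about. The sketch below walks the `citations` structure as it appears in the captured outputs (`generatedResponsePart`, `textResponsePart`, `span`, `retrievedReferences`). `summarize_citations` is a hypothetical helper, and the sample is a trimmed copy of the first citation from the CEO, CFO, and CTO query, whose reference text is cut off in the captured output.

```python
def summarize_citations(response):
    """Return (generated_text, span_length, [reference_texts]) for each citation."""
    rows = []
    for citation in response.get("citations", []):
        part = citation["generatedResponsePart"]["textResponsePart"]
        span = part["span"]
        references = [ref["content"]["text"] for ref in citation.get("retrievedReferences", [])]
        rows.append((part["text"], span["end"] - span["start"], references))
    return rows

# Trimmed sample; the reference text is truncated exactly as in the captured output.
sample_response = {
    "citations": [{
        "generatedResponsePart": {"textResponsePart": {
            "span": {"start": 0, "end": 36},
            "text": "The CEO of Amazon is Andrew R. Jassy",
        }},
        "retrievedReferences": [{"content": {
            "text": "We promptly make available on this website, free of charge, the reports that"
        }}],
    }]
}

for text, span_length, references in summarize_citations(sample_response):
    print(f"{text!r} ({span_length} chars) is backed by {len(references)} reference(s)")
```

Pairing each generated span with its references makes it easier to spot statements that have no supporting chunk.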
    

#### Using a Prompt Template

In [9]:
# Define a prompt template for financial analysis
prompt_template = """
You are a professional financial analyst. 
Based on the retrieved content from Amazon's 10-K filings, provide clear, concise, and insightful answers to user questions. 
When summarizing financial results, respond in bullet points highlighting key metrics, trends, and takeaways. 
Ensure your answers are accurate, data-driven, and easy to understand.
Format the output as Markdown document.

$Query$
Resource: $search_results$
"""

# Perform RAG with the prompt template
response_with_template = retrieve_and_generate(
    query=query,
    kb_id=kb_id,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    prompt_template=prompt_template,
    region_name=variables['regionName']
)

# Display the results as Markdown
# display_rag_results(response_with_template, format_as_markdown=True)
print('----------------- Answer ---------------------')
from IPython.display import display, Markdown
display(Markdown(response_with_template['output']['text'].replace("$", "USD ")))

----------------- Answer ---------------------


# Amazon Financial Results for 2023

## Overview

Amazon's financial results for 2023 reflect the company's performance across various segments, including North America, International, and AWS (Amazon Web Services). Below are the key metrics, trends, and takeaways from Amazon's 2023 financial results.

## Key Metrics

- **Net Sales**: USD 574.79 billion
- **Operating Income**: USD 30.43 billion
- **Net Income**: USD 30.43 billion
- **Earnings Per Share (EPS)**: USD 6.09
- **Free Cash Flow**: USD 44.94 billion

## Trends

- **Revenue Growth**: Amazon's net sales grew by 9% compared to 2022, driven by strong performance in AWS and International segments.
- **Operating Income**: Operating income increased by 11% year-over-year, reflecting improved operational efficiency and cost management.
- **Net Income**: Net income saw a significant increase of 16% compared to 2022, primarily due to higher operating income and lower interest expenses.
- **Free Cash Flow**: Free cash flow grew by 12% in 2023, indicating strong cash generation capabilities.

## Takeaways

- **AWS Performance**: AWS continued to be a major driver of Amazon's revenue growth, with net sales increasing by 12% in 2023.
- **International Segment**: The International segment showed robust growth, with net sales rising by 15% compared to 2022.
- **Cost Management**: Effective cost management strategies contributed to the increase in operating income and net income.
- **Investment in Future Growth**: Amazon continued to invest in new initiatives and technologies, which is expected to drive long-term growth.

## Conclusion

Amazon's 2023 financial results demonstrate the company's resilience and ability to deliver strong performance despite challenging economic conditions. The growth in key metrics such as net sales, operating income, and free cash flow highlights Amazon's strategic focus on innovation, customer satisfaction, and operational efficiency.

#### Change the prompt to produce JSON output

In [10]:
# Modify the prompt template to request JSON output
json_prompt_template = """
You are a professional financial analyst. 
Based on the retrieved content from Amazon's 10-K filings, provide clear, concise, and insightful answers to user questions. 
When summarizing financial results, respond in bullet points highlighting key metrics, trends, and takeaways. 
Ensure your answers are accurate, data-driven, and easy to understand.
Format the output as JSON document.

$Query$
Resource: $search_results$
"""

# Perform RAG with JSON prompt template
response = retrieve_and_generate(
    query=query,
    kb_id=kb_id,
    model_id=model_id,
    number_of_results=number_of_results,
    generation_config=generation_config,
    prompt_template=json_prompt_template,
    region_name=variables['regionName']
)

# Display the results as Markdown to properly format the JSON
print('----------------- Answer ---------------------')
from IPython.display import display, Markdown
display(Markdown(response['output']['text'].replace("$", "\\$")))
#display_rag_results(response_json, format_as_markdown=True)

----------------- Answer ---------------------


```json
{
  "Amazon_Financial_Results_2023": {
    "First_Quarter_2023_Guidance": {
      "Net_Sales": {
        "Expected_Range": "\$121.0 billion to \$126.0 billion",
        "Growth_Compared_to_2022": "4% to 8%"
      },
      "Operating_Income": {
        "Expected_Range": "\$0 to \$4.0 billion",
        "Comparison_to_2022": "\$3.7 billion"
      },
      "Assumptions": [
        "No additional business acquisitions, restructurings, or legal settlements"
      ]
    },
    "Balance_Sheet_as_of_December_31_2023": {
      "Cash_and_Cash_Equivalents": "\$10,383 million",
      "Total_Assets": "\$201,875 million",
      "Total_Liabilities": "\$113,618 million",
      "Shareholders_Equity": "\$88,257 million"
    },
    "Income_Statement_2023": {
      "Net_Income": "\$30,425 million",
      "Other_Comprehensive_Income_(Loss)": "\$1,447 million",
      "Stock_Based_Compensation_and_Issuance_of_Employee_Benefit_Plan_Stock": "\$23,960 million"
    }
  }
}
```
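
Because the model wraps its JSON in a Markdown fence, downstream code has to strip the fence before parsing. The sketch below shows one way to do that with the standard library; `parse_fenced_json` is a hypothetical helper, and the sample string is a shortened stand-in for `response['output']['text']`. Parse the raw text rather than the version escaped for display, since `\$` is not a valid JSON escape sequence.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block

def parse_fenced_json(text):
    """Parse JSON from model output that may be wrapped in a Markdown json fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    # Fall back to the whole text when no fence is present.
    payload = match.group(1) if match else text
    return json.loads(payload)

# Shortened stand-in for the model output, using one value from the answer above.
model_output = (
    FENCE + "json\n"
    '{"Income_Statement_2023": {"Net_Income": "$30,425 million"}}\n'
    + FENCE
)

parsed = parse_fenced_json(model_output)
print(parsed["Income_Statement_2023"]["Net_Income"])
```

`json.loads` raises `json.JSONDecodeError` when the model returns malformed JSON, so production code would typically catch that error and retry or report it.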

### Cost Summary for Running This Notebook
In this notebook, we used two models: 1) an embedding model and 2) a text generation model.

Let's break down the cost.

In [11]:
embedding_model_id = "amazon.titan-embed-text-v2:0"
inference_model_id = "us.amazon.nova-lite-v1:0"

# Mark end of query executions here:
notebook_end_time = datetime.now(UTC)

In [12]:
print(notebook_start_time, notebook_end_time)

embedding_cost = get_bedrock_token_based_cost(embedding_model_id, notebook_start_time, notebook_end_time)
inference_cost = get_bedrock_token_based_cost(inference_model_id, notebook_start_time, notebook_end_time)

2025-05-03 02:36:59.451040+00:00 2025-05-03 02:37:35.640823+00:00


In [13]:
embedding_cost

{'model_id': 'amazon.titan-embed-text-v2:0',
 'start_time': '2025-05-03T02:36:59.451040+00:00',
 'end_time': '2025-05-03T02:37:35.640823+00:00',
 'duration in minutes': 0.6031630499999999,
 'input_tokens': 164,
 'output_tokens': 0,
 'invocation_count': 6,
 'per million input token costs': 0.02,
 'per million output token costs': 0.0,
 'input token costs': 3.28e-06,
 'output token costs': 0.0,
 'total token costs': 3.28e-06,
 'average token costs per invocation': 5.466666666666667e-07,
 'token costs per MILLION such invocations': 0.5466666666666667}

In [14]:
inference_cost

{'model_id': 'us.amazon.nova-lite-v1:0',
 'start_time': '2025-05-03T02:36:59.451040+00:00',
 'end_time': '2025-05-03T02:37:35.640823+00:00',
 'duration in minutes': 0.6031630499999999,
 'input_tokens': 11208,
 'output_tokens': 1336,
 'invocation_count': 6,
 'per million input token costs': 0.12,
 'per million output token costs': 0.36,
 'input token costs': 0.0013449599999999999,
 'output token costs': 0.00048095999999999995,
 'total token costs': 0.0018259199999999998,
 'average token costs per invocation': 0.00030431999999999996,
 'token costs per MILLION such invocations': 304.31999999999994}
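
The figures reported above follow from simple per-token arithmetic. The sketch below recomputes the inference totals from the token counts and per-million prices shown in the `inference_cost` output. `token_cost` is a hypothetical helper and is not the implementation of `get_bedrock_token_based_cost`.

```python
def token_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Return (input_cost, output_cost, total_cost) in USD for token-priced usage."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost, output_cost, input_cost + output_cost

# Values copied from the inference_cost output above.
input_cost, output_cost, total = token_cost(11208, 1336, 0.12, 0.36)
invocations = 6
per_invocation = total / invocations

print(f"total: {total:.8f} USD")
print(f"per invocation: {per_invocation:.8f} USD")
print(f"per million invocations: {per_invocation * 1_000_000:.2f} USD")
```

Input tokens dominate the bill here because every RAG call sends the retrieved chunks to the model along with the question.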