# Re-ranking

Amazon Bedrock provides access to reranker models that you can use when querying to improve the relevance of the retrieved results.  reranker model calculates the relevance of chunks to a query and reorders the results based on the scores that it calculates. By using a reranker model, you can return responses that are better suited to answering the query. 

Reranker models are trained to identify relevance signals based on a query and then use those signals to rank documents. Because of this, the models can provide more relevant, more accurate results.

If you're using `Amazon Bedrock Knowledge Bases` for building your Retrieval Augmented Generation (RAG) application, use a reranker model while calling the `Retrieve` or `RetrieveAndGenerate operation`. The results from reranking override the default ranking that Amazon Bedrock Knowledge Bases determines.

This notebook demonstrates the use of **reranking model** with Amazon Bedrock Knowledge Bases, through the Rerank API which will help to further improve the accuracy and relevance of RAG applications. With a reranker model, you can retrieve fewer, but more relevant, results. By feeding these results to the foundation model that you use to generate a response, you can also decrease cost and latency.

Let's explore how to implement and utilize reranking models with Amazon Bedrock Knowledge Bases for an example use case.

## 1. Setup
Before running the rest of this notebook, you'll need to run the cells below to (ensure necessary libraries are installed and) connect to Bedrock.

Please ignore any pip dependency error (if you see any while installing libraries)

In [None]:
# %pip install --force-reinstall -q -r utils/requirements.txt --quiet

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.12.0 requires botocore<1.34.52,>=1.34.41, but you have botocore 1.35.79 which is incompatible.
sagemaker 2.208.0 requires attrs<24,>=23.1.0, but you have attrs 24.2.0 which is incompatible.
tokenizers 0.14.1 requires huggingface_hub<0.18,>=0.16.4, but you have huggingface-hub 0.26.5 which is incompatible.[0m[31m
[0mNote: you may need to restart the kernel to use updated packages.


In [None]:
# %pip install --upgrade boto3

Note: you may need to restart the kernel to use updated packages.


In [None]:
# # restart kernel
# from IPython.core.display import HTML
# HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [1]:
import boto3
print(boto3.__version__)

1.35.79


In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import os
import time
import boto3
import logging
import pprint
import json

from utils.knowledge_base import BedrockKnowledgeBase

In [4]:
#Clients
s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session(region_name = 'us-west-2')
region =  session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime') 
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)
region, account_id

('us-west-2', '533267284022')

In [5]:
import time

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"
knowledge_base_name = 'reranking-kb'
knowledge_base_description = "Knowledge Base for re-ranking."
bucket_name = f'{knowledge_base_name}-{suffix}'
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

## 2 - Create knowledge bases with fixed chunking strategy
Let's start by creating a [Knowledge Base for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) to store video games data in csv format. Knowledge Bases allow you to integrate with different vector databases including [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/), [Amazon Aurora](https://aws.amazon.com/rds/aurora/), [Pinecone](http://app.pinecone.io/bedrock-integration), [Redis Enterprise]() and [MongoDB Atlas](). For this example, we will integrate the knowledge base with Amazon OpenSearch Serverless. To do so, we will use the helper class `BedrockKnowledgeBase` which will create the knowledge base and all of its pre-requisites:
1. IAM roles and policies
2. S3 bucket
3. Amazon OpenSearch Serverless encryption, network and data access policies
4. Amazon OpenSearch Serverless collection
5. Amazon OpenSearch Serverless vector index
6. Knowledge base
7. Knowledge base data source

We will create a knowledge base using fixed chunking strategy. 

You can chhose different chunking strategies by changing the below parameter values: 
```
"chunkingStrategy": "FIXED_SIZE | NONE | HIERARCHICAL | SEMANTIC"
```

In [6]:
knowledge_base_metadata = BedrockKnowledgeBase(
    kb_name=f'{knowledge_base_name}-{suffix}',
    kb_description=knowledge_base_description,
    data_bucket_name=bucket_name, 
    chunking_strategy = "FIXED_SIZE", 
    suffix = suffix
)

[2024-12-12 10:42:38,838] p22908 {credentials.py:1278} INFO - Found credentials in shared credentials file: ~/.aws/credentials
[2024-12-12 10:42:39,638] p22908 {credentials.py:1278} INFO - Found credentials in shared credentials file: ~/.aws/credentials



 Region:  us-west-2
Step 1 - Creating or retrieving S3 bucket(s) for Knowledge Base documents
['reranking-kb-2104237']
Creating bucket reranking-kb-2104237
Step 2 - Creating Knowledge Base Execution Role (AmazonBedrockExecutionRoleForKnowledgeBase_2104237) and Policies
Step 3 - Creating OSS encryption, network and data access policies
Step 4 - Creating OSS Collection (this step takes a couple of minutes to complete)
{ 'ResponseMetadata': { 'HTTPHeaders': { 'connection': 'keep-alive',
                                         'content-length': '318',
                                         'content-type': 'application/x-amz-json-1.0',
                                         'date': 'Thu, 12 Dec 2024 18:42:42 '
                                                 'GMT',
                                         'x-amzn-requestid': 'e74d26ca-e057-416d-a3d8-ff6424cfee6d'},
                        'HTTPStatusCode': 200,
                        'RequestId': 'e74d26ca-e057-416d-a3d8-ff6424cfee6d

[2024-12-12 10:44:13,606] p22908 {base.py:258} INFO - PUT https://lah8kz6ynshwbl1n05ci.us-west-2.aoss.amazonaws.com:443/bedrock-sample-rag-index-2104237 [status:200 request:0.641s]



Creating index:
{ 'acknowledged': True,
  'index': 'bedrock-sample-rag-index-2104237',
  'shards_acknowledged': True}
Step 6 - Will create Lambda Function if chunking strategy selected as CUSTOM
Not creating lambda function as chunking strategy is FIXED_SIZE
Step 7 - Creating Knowledge Base
Creating KB with chunking strategy - FIXED_SIZE
 {'chunkingConfiguration': {'chunkingStrategy': 'FIXED_SIZE', 'fixedSizeChunkingConfiguration': {'maxTokens': 300, 'overlapPercentage': 20}}}
{ 'createdAt': datetime.datetime(2024, 12, 12, 18, 45, 14, 37067, tzinfo=tzutc()),
  'description': 'Knowledge Base for re-ranking.',
  'knowledgeBaseArn': 'arn:aws:bedrock:us-west-2:533267284022:knowledge-base/TKMKXY16MW',
  'knowledgeBaseConfiguration': { 'type': 'VECTOR',
                                  'vectorKnowledgeBaseConfiguration': { 'embeddingModelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0'}},
  'knowledgeBaseId': 'TKMKXY16MW',
  'name': 'reranking-kb-2104237',
 

### 2.1 Download Amazon 2019, 2020, 2021, 2022, & 2023 annual reports and upload it to Amazon S3

Now that we have created the knowledge base, let's populate it with the `sec-10-k reports` dataset to KB. This data is being downloaded from [here](https://ir.aboutamazon.com/annual-reports-proxies-and-shareholder-letters/default.aspx). This data is about Amazon's annual reports, proxies and shareholder letters.

In [7]:
import os

def create_directory(directory_name):    
    if not os.path.exists(directory_name):
        os.makedirs(directory_name)
        print(f"Directory '{directory_name}' created successfully.")
    else:
        print(f"Directory '{directory_name}' already exists.")

# Call the function to create the directory
create_directory("sec-10-k")

Directory 'sec-10-k' created successfully.


In [8]:
import requests

def download_file(url, filename):
    # Send a GET request to the URL
    response = requests.get(url)
    
    # Check if the request was successful
    if response.status_code == 200:
        # Open the file in write-binary mode
        with open(filename, 'wb') as file:
            # Write the content of the response to the file
            file.write(response.content)
        print(f"File downloaded successfully: {filename}")
    else:
        print(f"Failed to download file. Status code: {response.status_code}")

# URL of the files to download
urls = ["https://s2.q4cdn.com/299287126/files/doc_financials/2024/ar/Amazon-com-Inc-2023-Annual-Report.pdf",
        "https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/Amazon-2022-Annual-Report.pdf",
        "https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/Amazon-2021-Annual-Report.pdf",
        "https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Annual-Report.pdf",
        "https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Annual-Report.pdf"]


for url in urls:
    # Name for the downloaded file
    filename = url.split('/')[-1]

    # Path to save the downloaded file
    filepath = f"./sec-10-k/{filename}"

    # Call the function to download the file
    download_file(url, filepath)

File downloaded successfully: ./sec-10-k/Amazon-com-Inc-2023-Annual-Report.pdf
File downloaded successfully: ./sec-10-k/Amazon-2022-Annual-Report.pdf
File downloaded successfully: ./sec-10-k/Amazon-2021-Annual-Report.pdf
File downloaded successfully: ./sec-10-k/Amazon-2020-Annual-Report.pdf
File downloaded successfully: ./sec-10-k/2019-Annual-Report.pdf


Let's upload the annual reports data available in the `sec-10-k` folder to s3.

In [9]:
def upload_directory(path, bucket_name):
        for root,dirs,files in os.walk(path):
            for file in files:
                if not file.startswith('.DS_Store'):
                    file_to_upload = os.path.join(root,file)
                    print(f"uploading file {file_to_upload} to {bucket_name}")
                    s3_client.upload_file(file_to_upload,bucket_name,file)

# upload metadata file to S3
upload_directory("sec-10-k", bucket_name)

uploading file sec-10-k/Amazon-2022-Annual-Report.pdf to reranking-kb-2104237
uploading file sec-10-k/2019-Annual-Report.pdf to reranking-kb-2104237
uploading file sec-10-k/Amazon-2020-Annual-Report.pdf to reranking-kb-2104237
uploading file sec-10-k/Amazon-com-Inc-2023-Annual-Report.pdf to reranking-kb-2104237
uploading file sec-10-k/Amazon-2021-Annual-Report.pdf to reranking-kb-2104237


Now start the ingestion job. Since, we are using the same documents as used for fixed chunking, we are skipping the step to upload documents to s3 bucket. 

In [10]:
# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base_metadata.start_ingestion_job()

{ 'dataSourceId': 'NDIJLSHLGN',
  'ingestionJobId': 'ANSYU44T6G',
  'knowledgeBaseId': 'TKMKXY16MW',
  'startedAt': datetime.datetime(2024, 12, 12, 18, 49, 39, 148718, tzinfo=tzutc()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 0,
                  'numberOfMetadataDocumentsModified': 0,
                  'numberOfMetadataDocumentsScanned': 0,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 0},
  'status': 'STARTING',
  'updatedAt': datetime.datetime(2024, 12, 12, 18, 49, 39, 148718, tzinfo=tzutc())}
{ 'dataSourceId': 'NDIJLSHLGN',
  'ingestionJobId': 'ANSYU44T6G',
  'knowledgeBaseId': 'TKMKXY16MW',
  'startedAt': datetime.datetime(2024, 12, 12, 18, 49, 39, 148718, tzinfo=tzutc()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 5,
     

Finally we save the Knowledge Base Id to test the solution at a later stage. 

In [11]:
kb_id = knowledge_base_metadata.get_knowledge_base_id()

'TKMKXY16MW'


## 3. Evaluate the relevance of query responses with and without Re-ranking (using Ragas)

Define models for generation, evaluation and re-ranking

In [12]:
from langchain.llms.bedrock import Bedrock
from langchain_aws import ChatBedrock
from langchain_aws import BedrockEmbeddings

bedrock_client = boto3.client('bedrock-runtime')

TEXT_GENERATION_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
EVALUATION_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v2:0"

# Reranker model: there are two reranker models available at launch
AMAZON_RERANKER_MODEL_ID = "amazon.rerank-v1:0"
COHERE_RERANKER_MODEL_ID = "cohere.rerank-v3-5:0"


llm_for_evaluation = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0", client=bedrock_client)
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0", client=bedrock_client)


#### 3.1 Update Knowledge Bases execution role

In [13]:
# Before using autogenerated filters - update the knowledge base execution IAM role with right permissions

iam = boto3.resource('iam')
client = boto3.client('iam')

def get_attached_policies(role_name):
    response = client.list_attached_role_policies(RoleName=role_name)
    attached_policies = response['AttachedPolicies']
    return attached_policies

# get the knowledge base IAM role name
get_kb_response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId = kb_id)
role_arn = get_kb_response['knowledgeBase']['roleArn']
role_name = role_arn.split('/')[-1]

# get attached policies
attached_policies = get_attached_policies(role_name)
attached_policies

def update_kb_execution_role(attached_policies, region_name):
    
    for policy in attached_policies:

        print(policy['PolicyArn'])
        policy_name = policy['PolicyName']
        policy_arn = policy['PolicyArn']

        if 'FoundationModel' in policy_arn:
            print('Updating FoundationModel policy: ',policy_arn)
            policy = iam.Policy(policy_arn)
            version = policy.default_version
            policyJson = version.document
            policyJson['Statement'][0]['Resource'].append(f'arn:aws:bedrock:{region}::foundation-model/{TEXT_GENERATION_MODEL_ID}')
            policyJson['Statement'][0]['Resource'].append(f'arn:aws:bedrock:{region}::foundation-model/{EVALUATION_MODEL_ID}')  
            policyJson['Statement'][0]['Resource'].append(f'arn:aws:bedrock:{region}::foundation-model/{AMAZON_RERANKER_MODEL_ID}') 
            policyJson['Statement'][0]['Resource'].append(f'arn:aws:bedrock:{region}::foundation-model/{COHERE_RERANKER_MODEL_ID}') 
        
            client.detach_role_policy(RoleName=role_name,
                PolicyArn=policy_arn)
            
            response = client.delete_policy(
                PolicyArn=policy_arn
            )
            print(response)
           
            response = client.create_policy(
            PolicyName= policy_name,
            PolicyDocument=json.dumps(policyJson)
            )
            print(response)
        
        client.attach_role_policy(
            RoleName=role_name,
            PolicyArn=policy_arn
        )

In [14]:
update_kb_execution_role(attached_policies, region)
# time.sleep(30)

arn:aws:iam::533267284022:policy/AmazonBedrockS3PolicyForKnowledgeBase_2104237
arn:aws:iam::533267284022:policy/AmazonBedrockCloudWatchPolicyForKnowledgeBase_2104237
arn:aws:iam::533267284022:policy/AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237
Updating FoundationModel policy:  arn:aws:iam::533267284022:policy/AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237
{'ResponseMetadata': {'RequestId': '23c79873-d0f8-46c1-a9ab-6b7372237bbb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Thu, 12 Dec 2024 18:53:40 GMT', 'x-amzn-requestid': '23c79873-d0f8-46c1-a9ab-6b7372237bbb', 'content-type': 'text/xml', 'content-length': '204'}, 'RetryAttempts': 0}}
{'Policy': {'PolicyName': 'AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237', 'PolicyId': 'ANPAXYKJVAA3F672VRYT7', 'Arn': 'arn:aws:iam::533267284022:policy/AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237', 'Path': '/', 'DefaultVersionId': 'v1', 'AttachmentCount': 0, 'PermissionsBoundaryUsageCount': 0, 'I

#### 3.2 Customize retrieve and generate configuraion

In [15]:
def retrieve_and_generate(query, reranker_model=None, kb_id=None, TEXT_GENERATION_MODEL_ID=None, metadata_filters=None):
    
    # Prepare retrieval configuration
    retrieval_config = {
        "vectorSearchConfiguration": {
            "numberOfResults": 30 if reranker_model else 3
        }
    }

    if reranker_model:
        retrieval_config["vectorSearchConfiguration"]["rerankingConfiguration"] = {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {
                    "modelArn": f'arn:aws:bedrock:{region}::foundation-model/{reranker_model}',
                },
                "numberOfRerankedResults": 3
            }
        }

        if metadata_filters:
            retrieval_config["vectorSearchConfiguration"]["rerankingConfiguration"]["bedrockRerankingConfiguration"]["metadataConfiguration"] = {
                                                                "selectionMode" : "SELECTIVE",
                                                                "selectiveModeConfiguration" : {
                                                                    "fieldsToInclude": [{
                                                                        "fieldName": "year",
                                                                    }]
                                                                }
                                                            }
                    

    # Call the retrieve and generate API
    start = time.time()
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': f'arn:aws:bedrock:{region}::foundation-model/{TEXT_GENERATION_MODEL_ID}',
                'retrievalConfiguration': retrieval_config,
            },
        }
    )
    time_spent = time.time() - start

    print(f"[Response] : {response['output']['text']}\n")
    print(f"[Invocation time] : {time_spent}\n")

    return response


#### 3.3 Prepare dataset for evaluation

In [73]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_relevancy,
    context_recall,
    context_precision,
    context_entity_recall,
    answer_correctness,
)

#specify the metrics here
metrics = [
    context_relevancy,
    context_recall,
    context_precision,
    context_entity_recall,
    answer_correctness,
]

# questions = [
#     "How many days has Amazon asked employees to come to work in office in 2022?",
#     "By what percentage did AWS revenue grow year-over-year in 2022?",
#     "Compared to Graviton2 processors, what performance improvement did Graviton3 chips deliver according to the passage?",
#     "Which was the first inference chip launched by AWS according to the passage?",
#     "According to the context, in what year did Amazon's annual revenue increase from $245B to $434B?"
# ]
# ground_truths = [
#     "Amazon has asked corporate employees to come back to office at least three days a week beginning May 2022.",
#     "AWS had a 29% year-over-year ('YoY') revenue in 2022 on $62B revenue base.",
#     "In 2022, AWS delivered their Graviton3 chips, providing 25% better performance than the Graviton2 processors.",
#     "AWS launched their first inference chips (“Inferentia”) in 2019, and they have saved companies like Amazon over a hundred million dollars in capital expense.",
#     "Amazon's annual revenue increased from $245B in 2019 to $434B in 2022."

# ]

In [None]:
# questions = [
#     "How did the pandemic reshape Amazon's fulfillment network, and what were the operational investments made in 2021 to handle extraordinary demand?",
#     "How did Amazon's progress on sustainability initiatives, including renewable energy and carbon reduction, align with its operational goals in 2023?",
# ]

# ground_truths = [
#     "During the pandemic, Amazon's fulfillment network experienced unprecedented growth and transformation to meet extraordinary demand. By 2021, Amazon doubled its fulfillment center footprint within just 24 months—a feat that matched the scale of what it had built over the prior 25 years. This expansion included 253 fulfillment centers, 110 sortation centers, and 467 delivery stations in North America, along with 157 fulfillment centers and 588 delivery stations globally. The company also scaled its last-mile delivery network to a size comparable to UPS, while enhancing its logistics infrastructure to support one-day and same-day deliveries, which had temporarily slowed during the pandemic. These investments and adjustments helped Amazon maintain its commitment to customer satisfaction despite significant challenges.",
#     "In 2023, Amazon made significant progress on its sustainability goals, aligning them closely with its operational objectives. The company advanced its target of powering operations with 100% renewable energy to 2025, moving ahead of its original 2030 timeline. To reduce environmental impact and improve efficiency, Amazon regionalized its fulfillment network, optimizing inventory placement and trimming transportation distances. This initiative not only reduced delivery times and costs but also helped lower emissions. Additionally, Amazon invested in over 100,000 electric delivery vans as part of its commitment to The Climate Pledge, aiming for net-zero carbon emissions by 2040. These efforts reflect Amazon's dedication to balancing operational growth with environmental responsibility."
#     ]


In [None]:
# questions = [
#     "What are the primary factors that Amazon identifies as contributing to fluctuations in foreign exchange rates?",
#     "How many Availability Zones did AWS offer globally at the end of 2023, and in how many geographic Regions were these zones located?",
#     "What initiatives did Amazon undertake to improve the lives of its fulfillment network employees in 2021?",
#     "What were the stated amounts of Amazon's inventory valuation allowance as of December 31st for the years 2021 and 2022?",
#     "How has Amazon's understanding of, and response to, competition in the retail sector evolved over the past five years?",
#     "What are the primary considerations that Amazon takes into account when making investment decisions?",
#     "Evaluate Amazon's overall financial performance over the past four years. What are the key trends and factors driving the company's profitability or losses?",
#     "Compare and contrast Amazon's approach to innovation in its retail business versus its cloud computing business (AWS). Identify similarities and differences in their strategies and challenges.",
#     "Describe how Amazon defines operational excellence and provide examples of how it strives to achieve operational excellence in its various businesses.",
#     "How does Amazon's commitment to 'long-term thinking' manifest in its strategic decisions and investment priorities, as described in the sources?"
# ]

# ground_truths = [
#     "Amazon states that fluctuations in foreign exchange rates can significantly impact its financial performance, especially concerning its international operations. The company generates revenue and incurs expenses in various currencies, primarily Euros, British Pounds, and Japanese Yen. As these exchange rates fluctuate against the US dollar, it can create gains or losses upon consolidation. For instance, in 2021, changes in exchange rates compared to the previous year resulted in a $3 billion increase in International segment net sales. Amazon acknowledges that foreign exchange fluctuations are inherently unpredictable and can be influenced by various global economic factors.",
#     "The provided sources do not contain information about the number of AWS Availability Zones offered globally at the end of 2023, or the geographic Regions these zones were located in.",
#     "The provided sources do not specifically detail employee initiatives undertaken by Amazon in 2021. However, the 2021 annual report mentions that Amazon recognizes its employees as a primary customer set. The report also highlights the significant growth of Amazon's fulfillment network, including the addition of numerous fulfillment centers, sortation centers, and delivery stations globally, indicating a substantial investment in infrastructure and potentially, by extension, employee support.",
#     "The provided sources do not explicitly state the amounts for Amazon's inventory valuation allowance as of December 31st for the years 2021 and 2022.",
#     "Amazon's 2022 and 2023 annual reports provide insights into its approach to competition. In 2022, Amazon acknowledged the dynamic nature of the global retail market and the presence of 'many capable and well-funded competitors.' The company emphasized its customer-centric approach, highlighting its continuous evolution from a books-only retailer to a global platform offering diverse products and services. The 2023 report underscores Amazon's focus on customer experience and innovation as key competitive differentiators. The report highlights initiatives like the introduction of new premium brands and the expansion of services such as Buy with Prime, designed to enhance customer convenience and value. These reports suggest that Amazon's understanding of competition has evolved to recognize the importance of continuous innovation, a broad selection, and customer-centric services in a dynamic market with increasingly sophisticated competitors.",
#     "Amazon's investment philosophy, consistently articulated across multiple annual reports, emphasizes a long-term, customer-centric approach. The company prioritizes investments that align with its four guiding principles: customer obsession, passion for invention, commitment to operational excellence, and long-term thinking. While not claiming this as the definitive 'right' investment philosophy, Amazon believes in being transparent about its approach. The company balances strategic flexibility with a willingness to take calculated risks in pursuit of innovation and customer satisfaction. Amazon's investment decisions are driven by a deep understanding of customer needs and a commitment to building enduring franchises that reinvent what it means to serve customers. Specific examples include fulfillment network expansion, investments in AWS, Amazon Business, healthcare, Kuiper, and generative AI. These demonstrate Amazon's focus on leveraging existing capabilities while venturing into new areas.",
#     "Amazon's financial performance over the past four years has been marked by both strong growth and challenges. In 2021, the company reported strong profitability, with net income of $33.36 billion, driven by increased demand for e-commerce and cloud services during the pandemic. In 2022, Amazon faced a net loss of $2.72 billion due to inflation, higher transportation and utility costs, rising wages, and a return to more normal demand patterns. By 2023, Amazon rebounded with a net income of $30.42 billion, supported by e-commerce growth, strong AWS performance, and cost optimization. Key trends include moderated e-commerce growth post-pandemic, AWS's consistent contributions, inflationary pressures, and Amazon's investments in innovation and efficiency improvements.",
#     "Both Amazon's retail and AWS businesses share a focus on customer obsession, long-term vision, and an iterative approach to innovation. Retail innovation focuses on enhancing the shopping experience through a broad selection, faster delivery, and personalized recommendations. AWS innovation centers on expanding cloud capabilities, improving performance, and reducing costs. AWS operates in a rapidly evolving tech landscape, requiring a faster pace of innovation compared to retail. Challenges include supply chain management and labor practices in retail versus competition and regulatory compliance for AWS. Despite differences, both rely on experimentation and customer feedback to refine their strategies.",
#     "Amazon defines operational excellence as delivering exceptional customer experiences through efficient, reliable, and scalable operations. Examples include its fulfillment network, which has reduced fulfillment time from 18 hours in the early 2000s to 2 hours in 2021, and AWS's reliable, scalable cloud infrastructure. Amazon's customer service, cost management, and data-driven decision-making also exemplify operational excellence. These practices enable the company to deliver a wide range of products and services at scale while maintaining high standards of quality and customer satisfaction.",
#     "Amazon's commitment to 'long-term thinking' manifests in its strategic decisions and investment priorities, including heavy investments in AWS, fulfillment network expansion, emerging technologies like AI, healthcare, and satellite broadband. The company focuses on building enduring franchises, enhancing customer experience, and sustainability initiatives like its Climate Pledge. These strategies emphasize creating lasting value over short-term gains, shaping Amazon's growth trajectory for the future."
# ]


In [117]:
questions = [
    "How many Prime members did Amazon report in 2020?",
    "What was Amazon's operating income in 2023?",
    "Which new feature did Amazon introduce in AWS to optimize customer costs in 2023?",
    "What were the main macroeconomic challenges Amazon faced in 2022?",
    "How many fulfillment centers did Amazon operate globally by the end of 2021?",
    "What was the year-over-year growth rate of Amazon’s North America revenue in 2023?",
    "When did Amazon launch the Graviton2 chip, and what performance improvement did it offer?",
    "How much did Amazon invest in Career Choice for employees as of 2019?",
    "What was Amazon’s total net income in 2020?",
    "How many jobs did Amazon add in 2020?",
    "How did Amazon’s approach to AWS change during the 2008 financial crisis compared to the COVID-19 pandemic?",
    "What major structural changes were made to Amazon's U.S. fulfillment network in 2022?",
    "How has Amazon leveraged Prime membership to influence consumer behavior over the years?",
    "What role did AWS play in assisting organizations during the COVID-19 pandemic?",
    "How has Amazon’s strategy for renewable energy evolved from 2019 to 2023?",
    "Compare the year-over-year revenue growth for AWS in 2021 and 2022.",
    "How has Amazon adapted its advertising strategies to grow its revenue in 2023?",
    "What were the main sustainability milestones Amazon achieved from 2019 to 2021?",
    "How did the regionalization of Amazon’s fulfillment network impact delivery times and costs in 2023?",
    "What lessons from Amazon’s early retail model influenced its later development of AWS?",
    "Evaluate the effectiveness of Amazon's regionalized fulfillment strategy in addressing logistical challenges.",
    "Analyze the trade-offs Amazon faced when balancing long-term AWS investments with short-term profitability pressures.",
    "How did Amazon's response to the COVID-19 pandemic reflect its Day 1 philosophy?",
    "Assess the impact of Amazon’s Career Choice program on workforce development and retention.",
    "Compare Amazon’s sustainability efforts in AWS operations versus its retail logistics.",
    "How has Amazon's approach to product delivery speed evolved from 2019 to 2023?",
    "Evaluate the strategic significance of the Graviton and Trainium chip developments for AWS.",
    "Discuss the impact of Amazon's advertising business on its overall revenue diversification.",
    "How has Amazon addressed employee concerns in the wake of challenges such as unionization and layoffs?",
    "To what extent has Amazon succeeded in making its Prime Video a standalone profitable business?",
    "What are the recurring themes in Amazon’s shareholder letters from 2019 to 2023?",
    "How has Amazon’s strategy evolved in response to inflationary pressures and rising operational costs?",
    "Compare the ways Amazon has managed its global footprint in emerging versus established markets.",
    "How has Amazon utilized artificial intelligence to improve customer experiences across different business segments?",
    "What trends can be observed in Amazon's investments in infrastructure from 2020 to 2023?",
    "How has Amazon balanced profitability with sustainability initiatives in its AWS operations?",
    "Evaluate the evolution of Amazon's approach to corporate social responsibility (CSR) over the years.",
    "How have macroeconomic challenges influenced Amazon's strategies for inventory and logistics management?",
    "In what ways has Amazon supported small and medium-sized businesses through its platform?",
    "How does Amazon's innovation philosophy align with its long-term financial performance?",
    "Analyze the relationship between Amazon's investment in proprietary chips and its competitive positioning in the cloud market.",
    "How has Amazon’s investment in green energy influenced its overall operational efficiency?",
    "Evaluate the impact of Amazon's shift to a regionalized network on its sustainability goals.",
    "Discuss the significance of Amazon’s Project Kuiper as a potential growth area.",
    "How has Amazon’s focus on customer experience shaped its financial strategies across its business units?",
    "Assess the scalability of Amazon’s advertising model compared to its traditional retail and AWS segments.",
    "Analyze Amazon’s ability to respond to supply chain disruptions based on its reported adaptations.",
    "How does Amazon's 'Day 1' philosophy drive its response to global economic trends and challenges?"
]

ground_truths = [
    "Amazon reported having more than 200 million Prime members worldwide in 2020. This milestone reflects significant growth from earlier years, emphasizing the value and global reach of the program.",
    "Amazon reported an operating income of $36.9 billion in 2023, representing a 201% year-over-year increase compared to $12.2 billion in 2022.",
    "In 2023, AWS introduced features such as the S3 Intelligent Tiering storage class, which uses AI to automatically detect less frequently accessed data and move it to lower-cost storage tiers. This initiative aimed to enhance cost-efficiency for customers.",
    "In 2022, Amazon faced significant macroeconomic challenges, including rising inflation, supply chain disruptions, and heightened fuel costs following Russia’s invasion of Ukraine. These challenges increased operational costs and necessitated efficiency improvements.",
    "By the end of 2021, Amazon operated 410 fulfillment centers globally, including 253 in North America and 157 in other regions, supported by a vast network of delivery stations and sortation centers.",
    "Amazon’s North America revenue grew by 12% year-over-year in 2023, increasing from $316 billion in 2022 to $353 billion.",
    "Amazon launched the Graviton2 chip in 2020, offering up to 40% better price-performance compared to previous generation x86 processors. This innovation highlighted AWS’s commitment to advancing cloud computing efficiency.",
    "By 2019, Amazon had invested $700 million in Career Choice programs aimed at training over 100,000 employees in high-demand fields such as healthcare, cloud computing, and machine learning.",
    "Amazon’s total net income for 2020 was $21.3 billion. This figure marked a significant increase, reflecting strong demand for e-commerce and cloud services during the pandemic.",
    "Amazon added 500,000 jobs in 2020, bringing its total workforce to approximately 1.3 million employees worldwide.",
    "During the 2008 financial crisis, Amazon doubled down on AWS investments despite internal and external skepticism, viewing it as a long-term opportunity. In contrast, during the COVID-19 pandemic, AWS emphasized helping customers optimize costs while maintaining rapid innovation, reflecting its matured business model.",
    "In 2022, Amazon transitioned from a single national fulfillment network to a regionalized model comprising eight interconnected regions. This change reduced transportation distances, improved delivery times, and lowered operational costs.",
    "Amazon used Prime’s benefits—such as free delivery, exclusive deals, and streaming services—to drive customer loyalty and increase purchase frequency. Innovations like same-day delivery further entrenched Prime’s value proposition, particularly during periods like the COVID-19 pandemic.",
    "AWS supported healthcare providers, governments, and educational institutions by enabling scalable cloud computing solutions. Notable efforts included creating data lakes for COVID-19 research, supporting remote learning, and enabling contactless operations.",
    "Amazon committed to achieving net-zero carbon emissions by 2040 under The Climate Pledge. It invested in 100% renewable energy targets by 2030 (later accelerated to 2025), expanded wind and solar projects, and integrated electric delivery vehicles into its fleet.",
    "AWS revenue grew by 37% in 2021, driven by accelerated cloud adoption post-pandemic. In 2022, growth moderated to 29% as macroeconomic challenges prompted cost optimizations among customers.",
    "In 2023, Amazon expanded its advertising offerings, including Sponsored TV campaigns and introducing ads in Prime Video. These initiatives contributed to 24% year-over-year growth in advertising revenue, reaching $47 billion.",
    "Key milestones included co-founding The Climate Pledge, committing to 100% renewable energy by 2025, and deploying over 100,000 electric delivery vehicles. These efforts aligned with broader carbon reduction goals.",
    "Regionalization shortened delivery distances, enabling Amazon to achieve its fastest delivery speeds to date while reducing costs per unit globally for the first time since 2018.",
    "Amazon’s experience with scalability and modularity in its retail operations inspired AWS’s emphasis on primitives—flexible, foundational services—enabling rapid iteration and broad functionality.",
    "The regionalized fulfillment strategy proved effective in reducing delivery times and costs, mitigating supply chain disruptions, and improving inventory placement accuracy. These improvements enhanced customer satisfaction and operational efficiency.",
    "Amazon prioritized long-term value by continuing AWS investments, including proprietary chips and AI tools, despite near-term revenue impacts. This approach sustained AWS’s market leadership and customer loyalty.",
    "Amazon’s pandemic response—including rapid hiring, operational adjustments, and innovative AWS solutions—illustrated its Day 1 philosophy by emphasizing adaptability, customer obsession, and long-term investment.",
    "Amazon’s Career Choice program has significantly contributed to workforce development by offering pre-paid tuition for certifications in high-demand fields like healthcare and machine learning. By 2019, it supported over 25,000 employees, demonstrating Amazon’s commitment to upskilling while enhancing employee retention.",
    "Amazon’s AWS sustainability efforts focus on energy efficiency, including a 100% renewable energy target and custom chip innovations like Graviton, which reduce energy use. In retail, Amazon emphasizes reducing emissions through electric delivery fleets and optimized packaging. Both sectors align with The Climate Pledge but differ in their execution strategies.",
    "Amazon’s product delivery speed improved significantly due to investments in its fulfillment and logistics network. By 2023, Amazon delivered over 7 billion items on the same or next day globally, driven by the regionalization of fulfillment centers and expanded same-day delivery facilities. This strategy combined infrastructure upgrades with enhanced inventory placement and predictive algorithms to meet growing customer expectations.",
    "Graviton and Trainium chips represent critical advancements in AWS’s strategy to deliver cost-efficient, high-performance computing. Graviton chips offer 40% better price-performance compared to traditional processors, while Trainium accelerates machine learning tasks at reduced costs. These innovations strengthen AWS’s competitiveness and cater to customer demands for specialized, scalable, and cost-effective cloud solutions.",
    "Advertising has become a significant revenue stream for Amazon, growing 24% year-over-year in 2023 to reach $47 billion. This diversification reduces dependency on retail sales and AWS, leveraging the company’s extensive customer insights and platforms like Prime Video and Twitch to attract advertisers. The introduction of Sponsored TV ads further expands this segment’s potential.",
    "Amazon implemented various programs to address employee concerns, including wage increases, expanded benefits, and training initiatives like Career Choice. Despite criticisms related to layoffs and working conditions, the company has continued to emphasize safety protocols, professional growth opportunities, and transparent communication to foster employee satisfaction.",
    "Amazon’s Prime Video has shown strong potential as a standalone profitable business, supported by exclusive content, integration with advertising, and subscription models. Initiatives like 'Thursday Night Football' and global hits such as 'The Boys' and 'Reacher' enhance its appeal, while growing marketplace revenues and ad placements underscore its profitability trajectory.",
    "Common themes include long-term thinking, customer obsession, innovation, sustainability, and adaptability. Amazon consistently highlights its commitment to meeting customer needs through investment in infrastructure, employee development, and technological innovation, while addressing macroeconomic challenges and aligning with The Climate Pledge.",
    "Amazon responded to inflationary pressures by optimizing logistics through regionalization, adopting cost-saving technologies like Graviton chips, and implementing dynamic pricing strategies. These measures, combined with process streamlining and reduced cost-per-unit in delivery, helped maintain profitability despite economic headwinds.",
    "In established markets, Amazon focuses on enhancing delivery speeds, increasing Prime membership benefits, and deepening advertising integration. In emerging markets like India and Brazil, the company prioritizes expanding infrastructure, local partnerships, and offering affordable services to penetrate and grow market share sustainably.",
    "Amazon employs AI across AWS, retail, and logistics to enhance customer experiences. AWS AI services support innovations in healthcare and predictive analytics, while retail leverages AI for personalized recommendations and inventory management. AI-driven tools like S3 Intelligent Tiering further optimize operational efficiency and cost savings.",
    "Key trends include the doubling of fulfillment center capacity, regionalization of networks, expansion of same-day delivery hubs, and ongoing investments in renewable energy projects. These efforts reflect Amazon’s focus on scalability, sustainability, and customer satisfaction through faster and cost-effective delivery.",
    "AWS integrates sustainability by investing in renewable energy, such as wind and solar projects, achieving 100% renewable energy targets ahead of schedule. Innovations like energy-efficient Graviton chips further reduce carbon footprints, aligning with profitability goals by lowering operational costs while adhering to The Climate Pledge.",
    "Amazon’s CSR has expanded significantly, encompassing initiatives like The Climate Pledge, Career Choice, and community programs such as the Amazon Relief Fund. These efforts address environmental, social, and economic issues, reflecting a broader commitment to sustainability and workforce development alongside profitability.",
    "Rising inflation and supply chain disruptions prompted Amazon to adopt regional fulfillment networks, reduce transportation distances, and invest in predictive inventory tools. These strategies enhanced delivery speeds, lowered costs, and improved resilience against external pressures.",
    "Amazon empowers small and medium-sized businesses through its third-party marketplace, comprising 60% of retail sales. Programs like 'Fulfilled by Amazon,' AWS credits, and targeted advertising tools enable these businesses to scale operations and reach global audiences.",
    "Amazon’s innovation philosophy—centered on customer obsession and experimentation—drives investments in areas like AWS, Prime, and AI. This focus has delivered sustained financial growth by fostering loyalty, reducing costs, and creating new revenue streams, ensuring long-term shareholder value.",
    "Proprietary chips like Graviton and Trainium solidify AWS’s leadership by offering cost-efficiency and performance gains. These advancements attract a broader customer base seeking specialized solutions, reinforcing Amazon’s dominance amid growing competition in the cloud computing industry.",
    "Amazon’s investment in renewable energy, including wind and solar projects, has reduced operational costs by offsetting energy expenses. Achieving 100% renewable energy usage ahead of schedule demonstrates how sustainability initiatives align with profitability and environmental stewardship.",
    "The regionalized network reduced transportation distances, lowering carbon emissions and delivery costs. This structural shift aligns with Amazon’s broader sustainability goals under The Climate Pledge, highlighting the synergy between operational efficiency and environmental impact.",
    "Project Kuiper aims to provide global broadband access, targeting underserved regions and enterprise markets. This initiative represents a multi-billion-dollar opportunity, diversifying Amazon’s revenue streams while enhancing its technological footprint in satellite communications.",
    "Customer-centric innovations, such as faster delivery, Prime benefits, and AI-driven personalization, have driven revenue growth and loyalty. These efforts underpin financial strategies that prioritize reinvestment in technology and infrastructure to sustain long-term competitiveness.",
    "Amazon’s advertising model is highly scalable, leveraging its retail ecosystem and AWS platforms to deliver targeted ads. With growing revenue and diverse channels, such as Sponsored TV, the model complements traditional segments by maximizing user engagement and monetization opportunities.",
    "Amazon’s rapid shift to regional fulfillment networks and investment in predictive analytics highlight its agility in managing supply chain challenges. These adaptations minimized delays, reduced costs, and maintained customer satisfaction during global disruptions.",
    "Amazon’s 'Day 1' philosophy emphasizes continuous innovation, adaptability, and customer focus. This mindset enabled proactive responses to challenges like the pandemic and inflation, reinforcing resilience and long-term growth through strategic investments."
]


In [118]:
# query = "How did the pandemic reshape Amazon's fulfillment network, and what were the operational investments made in 2021 to handle extraordinary demand?"
# query = "How did Amazon's progress on sustainability initiatives, including renewable energy and carbon reduction, align with its operational goals in 2023?"
# query = "How has the revenue growth of Amazon's AWS, North America, and International segments evolved between 2021 and 2023? Highlight the specific year-on-year percentages."

In [67]:
from IPython.display import Markdown, display

def print_markdown(text):
    display(Markdown(text))

def evaluate_single_query_and_print_as_markdown(query, kb_id=None, TEXT_GENERATION_MODEL_ID=None, reranker_model=None):
    # Prepare results dictionary
    results = {}

    # Retrieve and generate response without the reranker model
    response_without_reranker = retrieve_and_generate(
        query=query,
        kb_id=kb_id,
        TEXT_GENERATION_MODEL_ID=TEXT_GENERATION_MODEL_ID,
        reranker_model=None  # No reranker
    )
    final_response_without_reranker = response_without_reranker["output"]["text"]
    contexts_without_reranker = [
        ref["content"]["text"]
        for citation in response_without_reranker["citations"]
        for ref in citation["retrievedReferences"]
        if "content" in ref and "text" in ref["content"]
    ]

    # Store results without reranker
    results["without_reranker"] = {
        "query": query,
        "final_response": final_response_without_reranker,
        "contexts": contexts_without_reranker,
    }

    # Retrieve and generate response with the reranker model, if provided
    if reranker_model is not None:
        response_with_reranker = retrieve_and_generate(
            query=query,
            kb_id=kb_id,
            TEXT_GENERATION_MODEL_ID=TEXT_GENERATION_MODEL_ID,
            reranker_model=reranker_model  # With reranker
        )
        final_response_with_reranker = response_with_reranker["output"]["text"]
        contexts_with_reranker = [
            ref["content"]["text"]
            for citation in response_with_reranker["citations"]
            for ref in citation["retrievedReferences"]
            if "content" in ref and "text" in ref["content"]
        ]

        # Store results with reranker
        results["with_reranker"] = {
            "query": query,
            "final_response": final_response_with_reranker,
            "contexts": contexts_with_reranker,
        }

    # Print results as Markdown using print_markdown
    markdown_text = "### Results Without Reranker\n\n"
    markdown_text += f"**Query:**\n{results['without_reranker']['query']}\n\n"
    markdown_text += f"**Final Response:**\n{results['without_reranker']['final_response']}\n\n"
    markdown_text += "**Contexts:**\n"
    for context in results['without_reranker']['contexts']:
        markdown_text += f"- {context}\n"

    if "with_reranker" in results:
        markdown_text += "\n### Results With Reranker\n\n"
        markdown_text += f"**Query:**\n{results['with_reranker']['query']}\n\n"
        markdown_text += f"**Final Response:**\n{results['with_reranker']['final_response']}\n\n"
        markdown_text += "**Contexts:**\n"
        for context in results['with_reranker']['contexts']:
            markdown_text += f"- {context}\n"

    print_markdown(markdown_text)


In [72]:
# Evaluate with and without reranker model
evaluate_single_query_and_print_as_markdown(
    query,
    kb_id=kb_id,
    TEXT_GENERATION_MODEL_ID=TEXT_GENERATION_MODEL_ID,
    reranker_model=AMAZON_RERANKER_MODEL_ID
)

[Response] : According to the search results, Amazon's revenue growth in its different segments evolved as follows between 2021 and 2023:

- North America sales increased 18% in 2021 compared to the prior year. (Source 1)
- International sales increased 22% in 2021 compared to the prior year. (Source 1)
- AWS sales increased 37% in 2021 compared to the prior year. (Source 1)
- In 2023, North America revenue increased 12% year-over-year, International revenue grew 11% year-over-year, and AWS revenue increased 13% year-over-year. (Source 3)

[Invocation time] : 2.1790802478790283

[Response] : According to the search results, the year-over-year revenue growth for Amazon's segments was as follows:

AWS:
2021 to 2022: 37% growth
2022 to 2023: 13% growth

North America:
2021 to 2022: 13% growth 
2022 to 2023: 12% growth

International:
2021 to 2022: -8% decline
2022 to 2023: 11% growth

[Invocation time] : 4.949108362197876



### Results Without Reranker

**Query:**
How has the revenue growth of Amazon's AWS, North America, and International segments evolved between 2021 and 2023? Highlight the specific year-on-year percentages.

**Final Response:**
According to the search results, Amazon's revenue growth in its different segments evolved as follows between 2021 and 2023:

- North America sales increased 18% in 2021 compared to the prior year. (Source 1)
- International sales increased 22% in 2021 compared to the prior year. (Source 1)
- AWS sales increased 37% in 2021 compared to the prior year. (Source 1)
- In 2023, North America revenue increased 12% year-over-year, International revenue grew 11% year-over-year, and AWS revenue increased 13% year-over-year. (Source 3)

**Contexts:**
- North America sales increased 18% in 2021, compared to the prior year. The sales growth primarily reflects increased unit sales, including sales by third-party sellers, and advertising sales. Increased unit sales were driven largely by our continued efforts to reduce prices for our customers, including from our shipping offers, and increased demand, partially offset by fulfillment network inefficiencies and supply chain constraints. We expect our North America sales growth rate to decelerate in Q1 2022 compared to the increase we experienced in Q1 2021.     International sales increased 22% in 2021, compared to the prior year. The sales growth primarily reflects increased unit sales, including sales by third-party sellers, and advertising sales. Increased unit sales were driven largely by our continued efforts to reduce prices for our customers, including from our shipping offers, and increased demand, partially offset by fulfillment network inefficiencies and supply chain constraints. We expect our International sales growth rate to decelerate in Q1 2022 compared to the increase we experienced in Q1 2021. Changes in foreign currency exchange rates impacted International net sales by $1.7 billion and $3.0 billion in 2020 and 2021.     AWS sales increased 37% in 2021, compared to the prior year. The sales growth primarily reflects increased customer usage, partially offset by pricing changes.
- A N N U A L R E P O R T     2 0 2 3Dear Shareholders:     Last year at this time, I shared my enthusiasm and optimism for Amazon’s future. Today, I have even more. The reasons are many, but start with the progress we’ve made in our financial results and customer experiences, and extend to our continued innovation and the remarkable opportunities in front of us.     In 2023, Amazon’s total revenue grew 12% year-over-year (“YoY”) from $514B to $575B. By segment, North America revenue increased 12% YoY from $316B to $353B, International revenue grew 11% YoY from $118B to $131B, and AWS revenue increased 13% YoY from $80B to $91B.     Further, Amazon’s operating income and Free Cash Flow (“FCF”) dramatically improved. Operating income in 2023 improved 201% YoY from $12.2B (an operating margin of 2.4%) to $36.9B (an operating margin of 6.4%). Trailing Twelve Month FCF adjusted for equipment finance leases improved from -$12.8B in 2022 to $35.5B (up $48.3B).     While we’ve made meaningful progress on our financial measures, what we’re most pleased about is the continued customer experience improvements across our businesses.

### Results With Reranker

**Query:**
How has the revenue growth of Amazon's AWS, North America, and International segments evolved between 2021 and 2023? Highlight the specific year-on-year percentages.

**Final Response:**
According to the search results, the year-over-year revenue growth for Amazon's segments was as follows:

AWS:
2021 to 2022: 37% growth
2022 to 2023: 13% growth

North America:
2021 to 2022: 13% growth 
2022 to 2023: 12% growth

International:
2021 to 2022: -8% decline
2022 to 2023: 11% growth

**Contexts:**
- Service sales primarily represent third-party seller fees, which includes commissions and any related fulfillment and shipping fees, AWS sales, advertising services, Amazon Prime membership fees, and certain digital content subscriptions. Net sales information is as follows (in millions):      Year Ended December 31, 2021 2022     Net Sales: North America $ 279,833 $ 315,880 International 127,787 118,007 AWS 62,202 80,096     Consolidated $ 469,822 $ 513,983 Year-over-year Percentage Growth (Decline):     North America 18 % 13 % International 22 (8) AWS 37 29     Consolidated 22 9 Year-over-year Percentage Growth, excluding the effect of foreign exchange rates:     North America 18 % 13 % International 20 4 AWS 37 29     Consolidated 21 13 Net sales mix:     North America 60 % 61 % International 27 23 AWS 13 16     Consolidated 100 % 100 %     Sales increased 9% in 2022, compared to the prior year. Changes in foreign currency exchange rates reduced net sales by $15.5 billion in 2022. For a discussion of the effect of foreign exchange rates on sales growth, see “Effect of Foreign Exchange Rates” below.     North America sales increased 13% in 2022, compared to the prior year.
- Service sales primarily represent third-party seller fees, which includes commissions and any related fulfillment and shipping fees, AWS sales, advertising services, Amazon Prime membership fees, and certain digital media content subscriptions. Net sales information is as follows (in millions):      Year Ended December 31, 2022 2023     Net Sales: North America $ 315,880 $ 352,828 International 118,007 131,200 AWS 80,096 90,757     Consolidated $ 513,983 $ 574,785 Year-over-year Percentage Growth (Decline):     North America 13 % 12 % International (8) 11 AWS 29 13     Consolidated 9 12 Year-over-year Percentage Growth, excluding the effect of foreign exchange rates:     North America 13 % 12 % International 4 11 AWS 29 13     Consolidated 13 12 Net Sales Mix:     North America 61 % 61 % International 23 23 AWS 16 16     Consolidated 100 % 100 %     Sales increased 12% in 2023, compared to the prior year. Changes in foreign exchange rates reduced net sales by $71 million in 2023. For a discussion of the effect of foreign exchange rates on sales growth, see “Effect of Foreign Exchange Rates” below.     North America sales increased 12% in 2023, compared to the prior year.


In [119]:
def prepare_eval_dataset(questions, ground_truths, kb_id=None, TEXT_GENERATION_MODEL_ID=None, reranker_model=None, metadata_filters = None):
    answers = []
    contexts = []
    
    for query in questions:
        response = retrieve_and_generate(
            query,
            reranker_model=reranker_model,
            kb_id=kb_id,
            TEXT_GENERATION_MODEL_ID=TEXT_GENERATION_MODEL_ID,
            metadata_filters=metadata_filters
        )
        
        answers.append(response["output"]["text"])
        
        context_group = []
        for citation in response["citations"]:
            context_group.extend([
                ref["content"]["text"]
                for ref in citation["retrievedReferences"]
                if "content" in ref and "text" in ref["content"]
            ])
        contexts.append(context_group)
        time.sleep(15)

    # Create dictionary
    data = {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        "ground_truth": ground_truths
    }

    # Convert dict to dataset
    dataset = Dataset.from_dict(data)
    return dataset


#### 3.4 Evaluate dataset - without re-ranker

In [120]:
without_reranker_dataset = prepare_eval_dataset(questions, ground_truths, kb_id, TEXT_GENERATION_MODEL_ID, reranker_model=None)

[Response] : According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.

[Invocation time] : 1.5082077980041504

[Response] : According to the search results, Amazon's operating income in 2023 was $36.9 billion, which represented an operating margin of 6.4%.

[Invocation time] : 1.545361042022705

[Response] : According to the search results, one new feature Amazon introduced in AWS to optimize customer costs in 2023 was S3 Intelligent Tiering. This storage class uses AI to detect objects accessed less frequently and store them in less expensive storage layers, helping customers optimize their AWS spend.

[Invocation time] : 1.4212617874145508

[Response] : According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:
- Inflation
- Increased interest rates
- Significant capital market volatility
- The prolonged COVID-19 pandemic
- Global supply chain constraints
- Global economic and geopolitical de

In [121]:
without_reranker_result = evaluate(
    dataset=without_reranker_dataset,
    metrics=metrics,
    llm=llm_for_evaluation,
    embeddings=bedrock_embeddings,
)

without_reranker_result_df = without_reranker_result.to_pandas()

Evaluating:   0%|          | 0/240 [00:00<?, ?it/s]



#### 3.5 Evaluate dataset - with re-ranker

In [122]:
with_reranker_dataset = prepare_eval_dataset(questions, ground_truths, kb_id, TEXT_GENERATION_MODEL_ID, reranker_model=AMAZON_RERANKER_MODEL_ID)

[Response] : According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.

[Invocation time] : 1.8518791198730469

[Response] : Amazon's operating income in 2023 was $36.9 billion, which represented a 201% increase from the prior year's operating income of $12.2 billion.

[Invocation time] : 2.710169792175293

[Response] : According to the search results, one new feature that Amazon introduced in AWS to optimize customer costs in 2023 is Graviton chips. The search results state that Graviton2-based compute instances can deliver up to 40% better price-performance than comparable x86-based instances, and in 2022 Amazon delivered its Graviton3 chips which provide 25% better performance than the Graviton2 processors.

[Invocation time] : 2.30049991607666

[Response] : According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:
- Inflation
- Increased interest rates
- Significant capital market volatilit

In [123]:
with_reranker_result = evaluate(
dataset=with_reranker_dataset,
metrics=metrics,
llm=llm_for_evaluation,
embeddings=bedrock_embeddings,
)

with_reranker_result_df = with_reranker_result.to_pandas()

Evaluating:   0%|          | 0/240 [00:00<?, ?it/s]



#### 3.4 Evaluate dataset - with re-ranker + metadata configuration

##### 3.4.1 Prepare metadata for ingestion


In [94]:
# import json
# import re

# def generate_matadata(data_dir):
    
#     # Loop through all PDF files in the directory
#     for filename in os.listdir(data_dir):
#         if not filename.startswith('.DS_Store'):
#             # Define the metadata dictionary
#             metadata ={}
            
#             filename= f'{data_dir}/{filename}'
#             print(filename)
            
#             # Create metadata
#             metadata["company"] = "Amazon"
#             metadata["ticker"] = "AMZN"
#             metadata["year"] = re.search(r'\d+', filename.split('/')[-1]).group(0)

#             # Create a JSON object
#             json_data = {"metadataAttributes": metadata}

#             # print(json_data)

#             # Write the JSON object to a file
#             with open(f"{filename.replace('.pdf', '.pdf.metadata.json')}", "w") as f:
#                 json.dump(json_data, f)


In [95]:
# data_dir = './sec-10-k'
# generate_matadata(data_dir)

In [96]:
# upload metadata file to S3
# upload_directory("sec-10-k", bucket_name)

##### 3.4.2 Ingest metadata into Knowledge Bases


Now start the ingestion job. Since, we are using the same documents as used for fixed chunking, we are skipping the step to upload documents to s3 bucket. 

In [97]:
# # ensure that the kb is available
# time.sleep(30)
# # sync knowledge base
# knowledge_base_metadata.start_ingestion_job()

In [124]:
with_reranker_metadata_filters_dataset = prepare_eval_dataset(questions, ground_truths, kb_id, TEXT_GENERATION_MODEL_ID, reranker_model=AMAZON_RERANKER_MODEL_ID, metadata_filters=True)

[Response] : According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.

[Invocation time] : 2.1910560131073

[Response] : According to the search results, Amazon's operating income in 2023 was $36.9 billion.

[Invocation time] : 1.795898199081421

[Response] : According to the search results, one new feature Amazon introduced in AWS to optimize customer costs in 2023 was S3 Intelligent Tiering. This is a storage class that uses AI to detect objects accessed less frequently and store them in less expensive storage layers, helping customers use the cloud more efficiently.

[Invocation time] : 2.4006829261779785

[Response] : According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:
- Inflation
- Increased interest rates
- Significant capital market volatility
- The prolonged COVID-19 pandemic
- Global supply chain constraints
- Global economic and geopolitical developments
These factors contribut

In [125]:
with_reranker_metadata_filters_result = evaluate(
dataset=with_reranker_metadata_filters_dataset,
metrics=metrics,
llm=llm_for_evaluation,
embeddings=bedrock_embeddings,
)

with_reranker_metadata_filters_result_df = with_reranker_result.to_pandas()

Evaluating:   0%|          | 0/240 [00:00<?, ?it/s]



#### 3.5 Prepare Comparison data frame

In [126]:
import pandas as pd

# Create the side-by-side DataFrame
comparison_df = pd.DataFrame({
    'question': without_reranker_result_df['question'],
    'without_reranker_answer': without_reranker_result_df['answer'],
    'with_reranker_answer': with_reranker_result_df['answer'],
    'with_reranker_metadata_answer': with_reranker_metadata_filters_result_df['answer'],
    
    'without_reranker_answer_correctness': without_reranker_result_df['answer_correctness'],
    'with_reranker_answer_correctness': with_reranker_result_df['answer_correctness'],
    'with_reranker_metadata_correctness': with_reranker_metadata_filters_result_df['answer_correctness'],
    })

In [127]:
pd.options.display.max_colwidth = 1000
comparison_df

Unnamed: 0,question,without_reranker_answer,with_reranker_answer,with_reranker_metadata_answer,without_reranker_answer_correctness,with_reranker_answer_correctness,with_reranker_metadata_correctness
0,How many Prime members did Amazon report in 2020?,"According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.","According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.","According to the search results, Amazon reported having more than 200 million Prime members worldwide in 2020.",0.609803,0.609803,0.609803
1,What was Amazon's operating income in 2023?,"According to the search results, Amazon's operating income in 2023 was $36.9 billion, which represented an operating margin of 6.4%.","Amazon's operating income in 2023 was $36.9 billion, which represented a 201% increase from the prior year's operating income of $12.2 billion.","Amazon's operating income in 2023 was $36.9 billion, which represented a 201% increase from the prior year's operating income of $12.2 billion.",0.529192,0.996996,0.996996
2,Which new feature did Amazon introduce in AWS to optimize customer costs in 2023?,"According to the search results, one new feature Amazon introduced in AWS to optimize customer costs in 2023 was S3 Intelligent Tiering. This storage class uses AI to detect objects accessed less frequently and store them in less expensive storage layers, helping customers optimize their AWS spend.","According to the search results, one new feature that Amazon introduced in AWS to optimize customer costs in 2023 is Graviton chips. The search results state that Graviton2-based compute instances can deliver up to 40% better price-performance than comparable x86-based instances, and in 2022 Amazon delivered its Graviton3 chips which provide 25% better performance than the Graviton2 processors.","According to the search results, one new feature that Amazon introduced in AWS to optimize customer costs in 2023 is Graviton chips. The search results state that Graviton2-based compute instances can deliver up to 40% better price-performance than comparable x86-based instances, and in 2022 Amazon delivered its Graviton3 chips which provide 25% better performance than the Graviton2 processors.",0.974043,0.121428,0.121428
3,What were the main macroeconomic challenges Amazon faced in 2022?,"According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:\n- Inflation\n- Increased interest rates\n- Significant capital market volatility\n- The prolonged COVID-19 pandemic\n- Global supply chain constraints\n- Global economic and geopolitical developments\nThese factors contributed to increases in Amazon's operating costs, particularly across its North America and International segments, due to a return to more normal, seasonal demand volumes, increased transportation and utility costs, and increased wage rates. The search results also mention that rising fuel, utility, and food costs, rising interest rates, and recessionary fears may have impacted customer demand and Amazon's ability to forecast consumer spending patterns. Additionally, the current macroeconomic environment and enterprise customer cost optimization efforts were expected to impact AWS revenue growth rates.","According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:\n- Inflation\n- Increased interest rates\n- Significant capital market volatility\n- The prolonged COVID-19 pandemic\n- Global supply chain constraints\n- Global economic and geopolitical developments\nThese factors contributed to increases in Amazon's operating costs, particularly across its North America and International segments, due to a return to more normal, seasonal demand volumes, increased transportation and utility costs, and increased wage rates. The search results also indicate that rising fuel, utility, and food costs, rising interest rates, and recessionary fears may have impacted customer demand and Amazon's ability to forecast consumer spending patterns. Additionally, the current macroeconomic environment and enterprise customer cost optimization efforts were expected to impact Amazon's AWS revenue growth rates.","According to the search results, the main macroeconomic challenges Amazon faced in 2022 included:\n- Inflation\n- Increased interest rates\n- Significant capital market volatility\n- The prolonged COVID-19 pandemic\n- Global supply chain constraints\n- Global economic and geopolitical developments\nThese factors contributed to increases in Amazon's operating costs, particularly across its North America and International segments, due to a return to more normal, seasonal demand volumes, increased transportation and utility costs, and increased wage rates. The search results also indicate that rising fuel, utility, and food costs, rising interest rates, and recessionary fears may have impacted customer demand and Amazon's ability to forecast consumer spending patterns. Additionally, the current macroeconomic environment and enterprise customer cost optimization efforts were expected to impact Amazon's AWS revenue growth rates.",0.66851,0.668319,0.668319
4,How many fulfillment centers did Amazon operate globally by the end of 2021?,"According to the search results, by the end of 2021 Amazon had 253 fulfillment centers in North America and 157 fulfillment centers in other parts of the world, for a total of 410 fulfillment centers globally.","According to the search results, by the end of 2021, Amazon had 253 fulfillment centers in North America and an additional 157 fulfillment centers across the globe, for a total of 410 fulfillment centers globally.","According to the search results, by the end of 2021, Amazon had 253 fulfillment centers in North America and an additional 157 fulfillment centers across the globe, for a total of 410 fulfillment centers globally.",0.888536,0.888676,0.888676
5,What was the year-over-year growth rate of Amazon’s North America revenue in 2023?,"According to the search results, Amazon's North America revenue grew 12% year-over-year from $316 billion in 2022 to $353 billion in 2023.","According to the search results, Amazon's North America revenue increased 12% in 2023 compared to the prior year.","According to the search results, Amazon's North America revenue increased 12% in 2023 compared to the prior year.",0.997076,0.601075,0.601075
6,"When did Amazon launch the Graviton2 chip, and what performance improvement did it offer?","According to the search results, Amazon announced the Graviton2 chip in 2020. The Graviton2 chip provided up to 40% better price-performance than the comparable latest generation x86 processors.","According to the search results, Amazon launched the Graviton2 chip in 2020. The Graviton2 chip provided up to 40% better price-performance than the comparable latest generation x86 processors.","According to the search results, Amazon launched the Graviton2 chip in 2020. The Graviton2 chip provided up to 40% better price-performance than the comparable latest generation x86 processors.",0.829754,0.833871,0.833871
7,How much did Amazon invest in Career Choice for employees as of 2019?,"According to the search results, Amazon has invested $700 million to provide more than 100,000 Amazonians access to training programs, including the Career Choice program. The Career Choice program pays up to 95% of tuition and fees for fulfillment center associates to pursue certificates or diplomas in high-demand occupations. Since its launch in 2012, more than 25,000 Amazonians have received training through the Career Choice program.","According to the search results, Amazon has invested over $700 million to provide more than 100,000 of its employees access to training programs through its Career Choice program. The search results state that since 2012, Amazon has offered the Career Choice program, which is a pre-paid tuition program for fulfillment center associates looking to move into high-demand occupations. Amazon pays up to 95% of tuition and fees toward a certificate or diploma in qualified fields of study.","According to the search results, Amazon has invested over $700 million to provide more than 100,000 of its employees access to training programs through its Career Choice program. The search results state that since 2012, Amazon has offered the Career Choice program, which is a pre-paid tuition program for fulfillment center associates looking to move into high-demand occupations. Amazon pays up to 95% of tuition and fees toward a certificate or diploma in qualified fields of study.",0.888869,0.459265,0.459265
8,What was Amazon’s total net income in 2020?,"According to the search results, Amazon's net income in 2020 was $21,331 million.","According to the search results, Amazon's net income in 2020 was $21,331 million.","According to the search results, Amazon's net income in 2020 was $21,331 million.",0.609658,0.609658,0.609658
9,How many jobs did Amazon add in 2020?,"According to the search results, in March 2020, Amazon opened 100,000 new positions across its fulfillment and delivery network. Later in 2020, Amazon announced it was creating another 75,000 jobs to respond to customer demand.","According to the search results, Amazon directly employed 1.3 million people around the world in 2020, which is an increase of 500,000 employees from the previous year.","According to the search results, Amazon directly employed 1.3 million people around the world in 2020, which is an increase of 500,000 employees from the previous year.",0.195704,0.991776,0.991776


In [129]:
# output the results to a csv file
comparison_df.to_csv('comparison_df.csv', index=False)

In [128]:
# Calculate average correctness
without_reranker_avg_correctness = without_reranker_result_df['answer_correctness'].mean()
with_reranker_avg_correctness = with_reranker_result_df['answer_correctness'].mean()
with_reranker_metadata_avg_correctness = with_reranker_metadata_filters_result_df['answer_correctness'].mean()

print(f"\nAverage Correctness without Reranker: {without_reranker_avg_correctness:.4f}")
print(f"Average Correctness with Reranker: {with_reranker_avg_correctness:.4f}")
print(f"Average Correctness with Reranker and metadata filter: {with_reranker_metadata_avg_correctness:.4f}")


Average Correctness without Reranker: 0.5933
Average Correctness with Reranker: 0.6017
Average Correctness with Reranker and metadata filter: 0.6017


### 2.7 Clean up
Please make sure to uncomment and run below cells to delete the resources created in this notebook.

In [130]:
# delete local directory
import shutil

dir_path = "sec-10-k" # Replace with the actual path

try:
    shutil.rmtree(dir_path)
    print(f"Directory '{dir_path}' and its contents have been deleted successfully.")
except FileNotFoundError:
    print(f"Directory '{dir_path}' not found.")
except Exception as e:
        print(f"An error occurred: {e}")

Directory 'sec-10-k' and its contents have been deleted successfully.


In [131]:
## Empty and delete S3 Bucket

objects = s3_client.list_objects(Bucket=bucket_name)  
if 'Contents' in objects:
    for obj in objects['Contents']:
        s3_client.delete_object(Bucket=bucket_name, Key=obj['Key']) 
s3_client.delete_bucket(Bucket=bucket_name)

{'ResponseMetadata': {'RequestId': 'QXH31Q2SNMXCSABX',
  'HostId': 'IkkPFoeT0OL2BWFen3Sv/Dayn6A73W4jvR+sqWXxlDrZe10l6mbPFCPZN88ZYwna8XHcYhMr94I=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'IkkPFoeT0OL2BWFen3Sv/Dayn6A73W4jvR+sqWXxlDrZe10l6mbPFCPZN88ZYwna8XHcYhMr94I=',
   'x-amz-request-id': 'QXH31Q2SNMXCSABX',
   'date': 'Fri, 13 Dec 2024 04:41:37 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

In [132]:
# print("===============================Knowledge base==============================")
knowledge_base_metadata.delete_kb(delete_s3_bucket=True, delete_iam_roles_and_policies=True)

An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
No intermediate bucket found
Found role AmazonBedrockExecutionRoleForKnowledgeBase_2104237
 [{'PolicyName': 'AmazonBedrockS3PolicyForKnowledgeBase_2104237', 'PolicyArn': 'arn:aws:iam::533267284022:policy/AmazonBedrockS3PolicyForKnowledgeBase_2104237'}, {'PolicyName': 'AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237', 'PolicyArn': 'arn:aws:iam::533267284022:policy/AmazonBedrockFoundationModelPolicyForKnowledgeBase_2104237'}, {'PolicyName': 'AmazonBedrockCloudWatchPolicyForKnowledgeBase_2104237', 'PolicyArn': 'arn:aws:iam::533267284022:policy/AmazonBedrockCloudWatchPolicyForKnowledgeBase_2104237'}, {'PolicyName': 'AmazonBedrockOSSPolicyForKnowledgeBase_2104237', 'PolicyArn': 'arn:aws:iam::533267284022:policy/AmazonBedrockOSSPolicyForKnowledgeBase_2104237'}]
Detached policy AmazonBedrockS3PolicyForKnowledgeBase_2104237 from role AmazonBedrockExecutionRoleForKnowledgeBase