# Retrieval Augmented Question Answering with Amazon Bedrock and LangChain: Semantic Search with Metadata Filtering and Self-Querying on ChromaDB, Using Anthropic Claude
### Context
LLM question answering (Q&A) typically involves retrieving documents relevant to the question and then having an LLM synthesize the retrieved chunks into an answer. In practice, the retrieval step is necessary because the LLM context window is small relative to most text corpora of interest. Anthropic recently released a Claude model with a 100k token context window. With the advent of models with larger context windows, it is reasonable to wonder whether the document retrieval stage is still necessary for many Q&A or chat use cases.

One retriever architecture that remains relevant here is semantic search with metadata filtering and self-querying, using Chroma as the vector database.

### Pattern

Semantic search with metadata filtering refers to the process of finding items that are both semantically relevant and meet specific filter conditions, such as the availability of a product in stock. In addition to its vector representation, each raw item (e.g., a document or image) carries metadata such as author and date.

To illustrate this, let's consider an example of image retrieval, specifically finding shoe images released after a certain date. Embedding the date directly into the vector representation is challenging, but we can overcome this obstacle by employing a combination of vector search and metadata filtering.

In this approach, we utilize the vector representation to perform similarity searches, looking for images that are visually similar to the query (shoe images). However, to fulfill the additional requirement of being released after a specific date, we leverage the metadata filtering step. By applying the date filter to the list of image metadata, we can identify and retrieve only the shoe images that match the desired release date criteria. This way, we effectively solve the problem of finding shoe images released after a particular date using a combination of vector search and metadata filtering.

Let's assume we have a music library with each song represented by a feature vector (encoded by a deep learning model) and accompanied by metadata such as "artist," "genre," and "release date." We'll explore three strategies for combining vector-based search and metadata filtering in this music library:

* Pre-selection filtering: In this approach, we first apply metadata filtering to identify a subset of songs that meet certain criteria, such as a specific artist or genre. We then mark these songs as eligible for vector-based similarity search. During the vector search process, we only consider the eligible songs and return the most similar songs from that subset.

* Post-selection filtering: Here, we begin by conducting a vector-based similarity search on the entire music library, finding a subset of songs that are semantically relevant to the query song. Once we have this subset, we then apply metadata filtering to further refine the results and obtain the final list of songs that match both the similarity and metadata criteria.

* Simultaneous search with filtering: In this strategy, we combine the vector search and metadata filtering steps during the search process itself. As we perform the similarity search on the feature vectors, we immediately filter out any songs that do not meet the specified metadata criteria. The search process continues until we collect the required number of songs that satisfy both similarity and metadata conditions.
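The three strategies above can be sketched on a toy music library. This is a minimal illustration only: the songs, vectors, and function names are hypothetical, and a sorted list stands in for a real vector index. Note that the post-selection result depends on how many candidates (`fetch_k`, an assumed parameter name) are fetched before filtering.

```python
import math

# Hypothetical toy library: (title, feature vector, metadata).
SONGS = [
    ("Song A", [1.0, 0.0], {"genre": "jazz"}),
    ("Song B", [0.9, 0.1], {"genre": "rock"}),
    ("Song C", [0.8, 0.2], {"genre": "jazz"}),
    ("Song D", [0.0, 1.0], {"genre": "jazz"}),
]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(songs, query):
    """Sort songs from most to least similar to the query vector."""
    return sorted(songs, key=lambda s: cosine(s[1], query), reverse=True)

def pre_filter(query, genre, k):
    # Filter on metadata first, then search only the eligible songs.
    eligible = [s for s in SONGS if s[2]["genre"] == genre]
    return [s[0] for s in rank(eligible, query)[:k]]

def post_filter(query, genre, k, fetch_k):
    # Search the whole library first, then filter the top fetch_k candidates.
    candidates = rank(SONGS, query)[:fetch_k]
    return [s[0] for s in candidates if s[2]["genre"] == genre][:k]

def simultaneous(query, genre, k):
    # Walk candidates in similarity order, filtering as we go, until k are found.
    results = []
    for song in rank(SONGS, query):
        if song[2]["genre"] == genre:
            results.append(song[0])
        if len(results) == k:
            break
    return results

query = [1.0, 0.0]
print(pre_filter(query, "jazz", k=2))              # ['Song A', 'Song C']
print(post_filter(query, "jazz", k=2, fetch_k=3))  # ['Song A', 'Song C']
print(simultaneous(query, "jazz", k=2))            # ['Song A', 'Song C']
```

All three agree here because `fetch_k=3` is large enough to include two jazz songs; with a smaller `fetch_k`, post-selection filtering would return fewer results.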


A self-querying retriever is, true to its name, a retriever that can query itself. Given any natural language query, it uses a query-constructing LLM chain to generate a structured query, which it then applies to its underlying VectorStore. Consequently, the retriever can not only perform semantic similarity comparisons between the user's input query and the contents of stored documents, but also extract filters from the user's query based on the metadata of those documents and execute them to further refine the search results.
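To make the idea of a structured query concrete, here is a minimal sketch. It is not the LangChain implementation: `StructuredQuery` and `construct_query` are hypothetical names, and a simple regular expression stands in for the LLM chain that would normally do the query construction.

```python
import re
from dataclasses import dataclass, field

@dataclass
class StructuredQuery:
    """Hypothetical container: the semantic part of a question plus metadata filters."""
    query: str
    filter: dict = field(default_factory=dict)

def construct_query(user_query):
    """Rule-based stand-in for the LLM chain: move a 4-digit year into a filter."""
    match = re.search(r"\b(?:in|from|during)\s+((?:19|20)\d{2})\b", user_query)
    if match is None:
        # No year mentioned: the whole question is the semantic query.
        return StructuredQuery(query=user_query)
    # Remove the year phrase and tidy up the leftover whitespace.
    semantic = user_query[:match.start()] + user_query[match.end():]
    semantic = re.sub(r"\s+([?.!,])", r"\1", semantic)
    semantic = re.sub(r"\s+", " ", semantic).strip()
    return StructuredQuery(query=semantic, filter={"year": int(match.group(1))})

sq = construct_query("How did AWS evolve in 2022?")
print(sq.query)   # How did AWS evolve?
print(sq.filter)  # {'year': 2022}
```

The semantic part would be embedded and used for similarity search, while the filter would be passed to the vector store as a metadata condition.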



In this notebook we explain how to approach the retriever pattern of semantic search with metadata filtering, leveraging self-querying with ChromaDB.

Chroma is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.

Chroma gives you the tools to:

* store embeddings and their metadata
* embed documents and queries
* search embeddings

Chroma prioritizes:

* simplicity and developer productivity
* analysis on top of search
* it also happens to be very quick

## Use case
#### Dataset
In this example, you will use several years of Amazon's Letter to Shareholders as a text corpus to perform Q&A on.

#### Persona
Let's assume the persona of a customer asking various questions based on different criteria.

## Implementation
To follow the RAG approach, this notebook uses the LangChain framework, which integrates with a range of services and tools and makes it efficient to build patterns such as RAG. We will use the following tools:

- **LLM (Large Language Model)**: Anthropic Claude V2 available through Amazon Bedrock

  This model will be used to understand the document chunks and provide an answer in a human-friendly manner.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents.
- **Document Loader**: PDF Loader available through LangChain

  This is the loader that can load the documents from a source, for the sake of this notebook we are loading the sample files from a local path. This could easily be replaced with a loader to load documents from enterprise internal systems.

- **Vector Store**: ChromaDB available through LangChain

  In this notebook we are using ChromaDB vector-store to store both the embeddings and the documents. 
- **Index**: VectorIndex

  The index helps to compare the input embedding with the document embeddings to find relevant documents.
- **Wrapper**: wraps index, vector store, embeddings model and the LLM to abstract away the logic from the user.

### Setup
To run this notebook you need to install two additional dependencies, Lark and ChromaDB.

Then begin by instantiating the LLM and the embeddings model. Here we use Anthropic Claude to demonstrate the use case.

Note: It is possible to choose other models available with Bedrock. You can replace the `model_id` as follows to change the model.

`llm = Bedrock(model_id="...")`

## Install dependencies

This notebook invokes Bedrock models through [LangChain](https://github.com/hwchase17/langchain), so start by installing pinned versions of LangChain and its supporting packages:

In [1]:
%pip install langchain==0.0.305 --force-reinstall --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
botocore 1.31.75 requires urllib3<1.27,>=1.25.4; python_version < "3.10", but you have urllib3 2.0.7 which is incompatible.
llama-index 0.8.37 requires urllib3<2, but you have urllib3 2.0.7 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install pydantic==1.10.13 --force-reinstall --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index 0.8.37 requires urllib3<2, but you have urllib3 2.0.7 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [3]:
%pip install sqlalchemy==2.0.21 --force-reinstall --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index 0.8.37 requires urllib3<2, but you have urllib3 2.0.7 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


### Install ChromaDB and Lark packages
The self-query retriever requires the [chromadb](https://docs.trychroma.com/) and `lark` packages. [Lark](https://lark-parser.readthedocs.io/en/stable/) is a modern general-purpose parsing library for Python. With Lark, you can parse any context-free grammar efficiently and with very little code.

In [4]:
%pip install chromadb==0.4.13 --force-reinstall --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
botocore 1.31.75 requires urllib3<1.27,>=1.25.4; python_version < "3.10", but you have urllib3 2.0.7 which is incompatible.
llama-index 0.8.37 requires urllib3<2, but you have urllib3 2.0.7 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [5]:
%pip install lark==1.1.7

Looking in indexes: https://pypi.org/simple, https://token:****@api.missioncloud.com:443/internal-pypi/simple/
Collecting lark==1.1.7
  Downloading lark-1.1.7-py3-none-any.whl.metadata (1.9 kB)
Downloading lark-1.1.7-py3-none-any.whl (108 kB)
Installing collected packages: lark
Successfully installed lark-1.1.7
Note: you may need to restart the kernel to use updated packages.


In [6]:
%pip install pypdf==3.14.0

Looking in indexes: https://pypi.org/simple, https://token:****@api.missioncloud.com:443/internal-pypi/simple/
Collecting pypdf==3.14.0
  Downloading pypdf-3.14.0-py3-none-any.whl.metadata (6.9 kB)
Downloading pypdf-3.14.0-py3-none-any.whl (269 kB)
Installing collected packages: pypdf
  Attempting uninstall: pypdf
    Found existing installation: pypdf 3.15.2
    Uninstalling pypdf-3.15.2:
      Successfully uninstalled pypdf-3.15.2
Successfully installed pypdf-3.14.0
Note: you may need to restart the kernel to use updated packages.


## Prepare the data

In this example, you will use several years of Amazon's Letter to Shareholders as a text corpus to perform Q&A on.

First you will download these files from the internet.

In [7]:
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

metadata = [
    dict(year=2022, source=filenames[0]),
    dict(year=2021, source=filenames[1]),
    dict(year=2020, source=filenames[2]),
    dict(year=2019, source=filenames[3])]

data_root = "./data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)

As part of Amazon's culture, the CEO always includes a copy of the 1997 Letter to Shareholders with every new release. This will cause repetition, take longer to generate embeddings, and may skew your results. In the next section you will take the downloaded data, trim the 1997 letter (last 3 pages) and overwrite them as processed files.

In [8]:
import glob
from pypdf import PdfReader, PdfWriter

local_pdfs = glob.glob(data_root + '*.pdf')

for local_pdf in local_pdfs:
    pdf_reader = PdfReader(local_pdf)
    pdf_writer = PdfWriter()
    for pagenum in range(len(pdf_reader.pages)-3):
        page = pdf_reader.pages[pagenum]
        pdf_writer.add_page(page)

    with open(local_pdf, 'wb') as new_file:
        new_file.seek(0)
        pdf_writer.write(new_file)
        new_file.truncate()


Now that you have clean PDFs to work with, you will enrich your documents with metadata, then use a process called "chunking" to break up a larger document into small pieces. These small pieces will allow you to generate embeddings without surpassing the input limit of the embedding model.

In this example you will break the documents into 500-character chunks with a 100-character overlap. The overlap lets each chunk retain some of the context of its neighbors.
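The interaction between chunk size and overlap can be sketched with a simple fixed-window splitter. This is a simplification under stated assumptions: `chunk_text` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it cuts at exact character offsets rather than preferring natural separators such as paragraph breaks.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk repeats the last three characters of the previous one, which is how overlap carries a little context across chunk boundaries.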

In [9]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]
        
    print(f'{len(document)} {document}\n')
    documents += document

# In our testing, character splitting works better with this PDF data set.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # maximum number of characters per chunk
    chunk_overlap=100,  # characters shared between consecutive chunks
)

docs = text_splitter.split_documents(documents)



7 [Document(page_content='Dear shareholders:\nAs I sit down to write my second annual shareholder letter as CEO, I find myself optimistic and energized\nby what lies ahead for Amazon. Despite 2022 being one of the harder macroeconomic years in recent memory,and with some of our own operating challenges to boot, we still found a way to grow demand (on top ofthe unprecedented growth we experienced in the first half of the pandemic). We innovated in our largestbusinesses to meaningfully improve customer experience short and long term. And, we made importantadjustments in our investment decisions and the way in which we’ll invent moving forward, while stillpreserving the long-term investments that we believe can change the future of Amazon for customers,\nshareholders, and employees.\nWhile there were an unusual number of simultaneous challenges this past year, the reality is that if you\noperate in large, dynamic, global market segments with many capable and well-funded competitors (theco

### Import all the necessary libraries and access bedrock

Interaction with the Bedrock API is done via the boto3 SDK. To create the Bedrock client, we provide a utility method that supports different options for passing credentials to boto3.
If you are running these notebooks from your own computer, make sure you have [installed the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) before proceeding.


#### Use default credential chain

If you are running this notebook from a SageMaker Studio notebook and your SageMaker Studio role has permissions to access Bedrock, you can run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default credentials have access to Bedrock.

#### Use a different role

If you or your company has set up a specific role to access Bedrock, you can specify that role by uncommenting the line `#os.environ['BEDROCK_ASSUME_ROLE'] = '<YOUR_VALUES>'` in the cell below before executing it. Ensure that your current user or role has permission to assume that role.

#### Use a specific profile

If you are running this notebook from your own computer, have set up the AWS CLI with multiple profiles, and the profile with access to Bedrock is not the default one, you can uncomment the line `#os.environ['AWS_PROFILE'] = '<YOUR_VALUES>'` and specify the profile to use.

#### Note about `langchain`

The Bedrock classes provided by `langchain` create a default Bedrock boto3 client. We recommend explicitly creating the Bedrock client using the instructions below and passing it to the class constructors with `client=bedrock_client`.

In [10]:
# Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access

#import os
#os.environ['BEDROCK_ASSUME_ROLE'] = '<YOUR_VALUES>'
#os.environ['AWS_PROFILE'] = 'bedrock-user'

In [11]:
import os
import boto3
import json
import sys

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

bedrock_client = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
    runtime=True # Default. Needed for invoke_model() from the data plane
)

Create new client
  Using region: None
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)


#### Creating a Chroma Vectorstore

First we'll want to create a Chroma VectorStore and seed it with some data. 

In this example, you will be using the Amazon Titan Embeddings Model from Amazon Bedrock to generate the embeddings for our Chroma vector database.

The `TokenCounterHandler` callback is a utility class that you can attach to your LLM objects and chains to report token counts. It outputs the token counts at the end of your chain's result, or it can be attached to an LLM object and invoked manually.

In [12]:
%pip install tiktoken==0.4.0 --force-reinstall

Looking in indexes: https://pypi.org/simple, https://token:****@api.missioncloud.com:443/internal-pypi/simple/
Collecting tiktoken==0.4.0
  Downloading tiktoken-0.4.0-cp39-cp39-macosx_11_0_arm64.whl (761 kB)
Collecting regex>=2022.1.18 (from tiktoken==0.4.0)
  Using cached regex-2023.10.3-cp39-cp39-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting requests>=2.26.0 (from tiktoken==0.4.0)
  Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting charset-normalizer<4,>=2 (from requests>=2.26.0->tiktoken==0.4.0)
  Using cached charset_normalizer-3.3.2-cp39-cp39-macosx_11_0_arm64.whl.metadata (33 kB)
Collecting idna<4,>=2.5 (from requests>=2.26.0->tiktoken==0.4.0)
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.26.0->tiktoken==0.4.0)
  Using cached urllib3-2.0.7-py3-none-any.whl.me

In [13]:
from utils.TokenCounterHandler import TokenCounterHandler

token_counter = TokenCounterHandler()

In [14]:
from IPython.display import Markdown, display
from langchain.llms.bedrock import Bedrock 
from langchain.embeddings.bedrock import BedrockEmbeddings

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",
                               client=bedrock_client)

Next you will import the `Document` and `Chroma` modules from LangChain. Using these modules will allow you to quickly generate embeddings through Amazon Bedrock and store them locally in your ChromaDB vector store.

In [15]:
from langchain.schema import Document
from langchain.vectorstores import Chroma

In this step you will process documents and prepare them to be converted to vectors for the vector store.

Here you will use the `from_documents` function of the LangChain Chroma provider to build a vector database from your document embeddings.

In [16]:
db = Chroma.from_documents(docs, embeddings)

## Similarity Searching

Here you will set your search query, and look for documents that match.

In [17]:
query = "How has AWS evolved?"

### Basic Similarity Search

The results returned by the `similarity_search_with_score` API are sorted by score from lowest to highest. The score is an [L2](https://en.wikipedia.org/wiki/Lp_space) distance measure between the query and each result. Lower scores are better, representing a shorter distance between vectors.
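As a minimal sketch of distance-based scoring, the following ranks a few made-up vectors by their Euclidean (L2) distance to a query vector. The vectors and names are hypothetical; a vector store may report the squared distance instead, which produces the same ordering.

```python
import math

# Hypothetical 2-dimensional embeddings, purely for illustration.
query_vec = [1.0, 2.0]
doc_vecs = {
    "doc_far": [4.0, 6.0],
    "doc_near": [1.0, 3.0],
    "doc_mid": [3.0, 2.0],
}

# Score each document by its L2 distance to the query, then sort ascending:
# the smallest distance is the best match.
scored = sorted(
    ((name, math.dist(query_vec, vec)) for name, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)

for name, score in scored:
    print(f"{name}: {score:.2f}")
# doc_near: 1.00
# doc_mid: 2.00
# doc_far: 5.00
```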

In [18]:
results_with_scores = db.similarity_search_with_score(query)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\nScore: {score}\n\n")

Content: most importantlyfor customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launchedin 2022), and invest in long-term inventions that change what’s possible.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}
Score: 129.1371307373047


Content: at a significantly lower cost. We’re not close to being done innovating here,and this long-term investment should prove fruitful for both customers and AWS. AWS is still in the earlystages of its evolution, and has a chance for unusual growth in the next decade.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}
Score: 141.65225219726562


Content: not only given customersmuch more functionality in AWS than they can find anywhere else (which is a significant differentiator), butalso allowed us to arrive at the much more game-changing offering that AWS is today.
Metadata: {'source': 'AMZN-2021-Shareholder-Letter.pdf', 'year': 2021}
Score: 141.7540588378906

### Similarity Search with Metadata Filtering
Additionally, you can provide metadata to your query to filter the scope of your results. The `filter` parameter for search queries is a dictionary of metadata key/value pairs that will be matched to results to include/exclude them from your query.

In [19]:
filter = dict(year=2022)

In the next cell, notice that every result now comes from the 2022 letter: results from other years that appeared in the basic search are excluded by the filter criteria.

In [20]:
results_with_scores = db.similarity_search_with_score(query, filter=filter)
for doc, score in results_with_scores:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}, Score: {score}\n\n")

Content: most importantlyfor customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launchedin 2022), and invest in long-term inventions that change what’s possible.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}, Score: 129.1371307373047


Content: at a significantly lower cost. We’re not close to being done innovating here,and this long-term investment should prove fruitful for both customers and AWS. AWS is still in the earlystages of its evolution, and has a chance for unusual growth in the next decade.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}, Score: 141.65225219726562


Content: In 2008, AWS was still a fairly small, fledgling business.We knew we were on to something, but it still required substantial capital investment. There were voicesinside and outside of the company questioning why Amazon (known mostly as an online retailer then) wouldbe investing so much in cloud computing. But

### Top-K Matching

Top-K matching is a filtering technique that takes a two-stage approach:

1. Perform a similarity search, returning the top K matches.
2. Apply your metadata filter on the smaller resultset.

Note: A caveat for Top-K matching is that if the value for K is too small, there is a chance that after filtering there will be no results to return.

Using Top-K matching requires 2 values:
- `k`, the max number of results to return at the end of our query
- `fetch_k`, the max number of results to return from the similarity search before applying filters
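The caveat about small values can be illustrated with a short sketch of the two-stage approach. The ranking below is made up and `top_k_matching` is a hypothetical helper, not a vector store API; it only shows how `k` and `fetch_k` interact.

```python
# Hypothetical similarity ranking, best match first: (chunk id, year).
RANKED = [("c1", 2021), ("c2", 2020), ("c3", 2022), ("c4", 2021), ("c5", 2022)]

def top_k_matching(ranked, year, k, fetch_k):
    """Two-stage Top-K matching over an already ranked list."""
    candidates = ranked[:fetch_k]                              # stage 1: similarity search
    kept = [cid for cid, y in candidates if y == year]         # stage 2: metadata filter
    return kept[:k]

print(top_k_matching(RANKED, 2022, k=2, fetch_k=2))  # []
print(top_k_matching(RANKED, 2022, k=2, fetch_k=5))  # ['c3', 'c5']
```

With `fetch_k=2`, neither of the two most similar chunks is from 2022, so nothing survives the filter; raising `fetch_k` gives the filter enough candidates to fill `k` results.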


In [21]:
results = db.similarity_search(query, filter=filter, k=2, fetch_k=4)
for doc in results:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\n\n")

Content: most importantlyfor customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launchedin 2022), and invest in long-term inventions that change what’s possible.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}


Content: at a significantly lower cost. We’re not close to being done innovating here,and this long-term investment should prove fruitful for both customers and AWS. AWS is still in the earlystages of its evolution, and has a chance for unusual growth in the next decade.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}




### Maximal Marginal Relevance

Another measurement of results is Maximal Marginal Relevance (MMR). The focus of MMR is to minimize the redundancy of your search results while still maintaining relevance by re-ranking the results to provide both similarity and diversity.
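A minimal sketch of the greedy MMR re-ranking idea follows. It is an illustration rather than the library's implementation: the vectors are made up, and `lambda_mult` is an assumed name for the weight that trades relevance against diversity (1.0 means relevance only).

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query, docs, k, lambda_mult=0.5):
    """Greedily pick k document names, balancing relevance against redundancy."""
    selected = []
    remaining = dict(docs)
    while remaining and len(selected) < k:
        def score(name):
            relevance = cosine(query, remaining[name])
            # Redundancy: similarity to the closest document already selected.
            redundancy = max(
                (cosine(remaining[name], docs[s]) for s in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical embeddings: "a" and "a_dup" are near-duplicates, "b" is different.
docs = {"a": [1.0, 0.0], "a_dup": [1.0, 0.05], "b": [0.6, 0.8]}
query = [1.0, 0.2]

print(mmr(query, docs, k=2))                   # ['a_dup', 'b']
print(mmr(query, docs, k=2, lambda_mult=1.0))  # ['a_dup', 'a']
```

With the default weight, the near-duplicate `a` is passed over in favor of the more diverse `b`; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.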

In the next section you will use the `max_marginal_relevance_search` API to run the same query as in the Metadata Filtering section, but with reranked results.

In [22]:
results = db.max_marginal_relevance_search(query, filter=filter)
for doc in results:
    print(f"Content: {doc.page_content}\nMetadata: {doc.metadata}\n\n")

Content: most importantlyfor customers, AWS continues to deliver new capabilities rapidly (over 3,300 new features and services launchedin 2022), and invest in long-term inventions that change what’s possible.
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}


Content: In 2008, AWS was still a fairly small, fledgling business.We knew we were on to something, but it still required substantial capital investment. There were voicesinside and outside of the company questioning why Amazon (known mostly as an online retailer then) wouldbe investing so much in cloud computing. But, we knew we were inventing something special that couldcreate a lot of value for customers and Amazon in the future. We had a head start on potential competitors;and if anything, we wanted
Metadata: {'source': 'AMZN-2022-Shareholder-Letter.pdf', 'year': 2022}


Content: manage their technology infrastructure. Amazon would be a different company ifwe’d slowed investment in AWS during that 2008-2

## Q&A with Anthropic Claude and Retrieved Vectors

Now that you are able to query from the vector store, you're ready to feed context into your LLM.

Using the LangChain wrapper for Bedrock, creating an object for the LLM can be done in a single line of code where you specify the `model_id` of the desired LLM (Claude V2 in this case) and any model-level arguments.

In [23]:
from langchain.llms.bedrock import Bedrock

model_kwargs_claude = { 
    "max_tokens_to_sample": 512,
    "stop_sequences": [],
    "temperature":0,  
    "top_p":0.5
}

# Anthropic Claude Model
llm = Bedrock(
    model_id="anthropic.claude-v2", 
    client=bedrock_client, 
    model_kwargs=model_kwargs_claude,
    callbacks=[token_counter]
)

Since you have a model object set up, you can use it to get a baseline of what the LLM will produce without any provided context.

Something you will notice is that with the prompt "How has AWS evolved?", the answer isn't bad, but it's not exactly what you'd look for from the lens of an executive. You'd want to hear about how they approached things that led to evolution, whereas the baseline results are just facts that indicate change. Later in the notebook, you will provide context to get a more tailored answer.

In [24]:
print(llm.predict("How has AWS evolved?"))

token_counter.report()

 Here are some key ways AWS has evolved over time:

- Expanded service offerings - AWS started in 2006 with basic infrastructure services like EC2 for compute, S3 for storage, and SQS for messaging. It now offers over 200 services covering a wide range of categories like compute, storage, database, networking, developer tools, machine learning, security, etc.

- Global infrastructure - AWS initially launched with one region in the US-East. It now has availability zones within 25 geographic regions across the world, with plans to launch more. This allows customers to deploy applications closer to their end users.

- Higher-level services - The initial AWS services required more hands-on management. Over time, AWS has released more abstracted, higher-level services that are easier for developers to use like Lambda for serverless computing, DynamoDB for NoSQL databases, SageMaker for machine learning, etc. 

- Hybrid capabilities - AWS has developed services and partnerships to integrate 

With your LLM ready to go, you'll create a prompt template to utilize context to answer a given question. Prompt formats will be different by model, so if you change your model you will also likely need to adjust your prompt.

In [25]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """

Human: Here is a set of context, contained in <context> tags:

<context>
{context}
</context>

Use the context to provide an answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{question}

Assistant:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

With the LLM endpoint object created, you are ready to create your first chain!

This chain is a simple example using LangChain's RetrievalQA chain, which will:
- take a query as input
- generate query embeddings
- query the vector database for relevant document chunks based on the query embedding
- inject the context and original query into the prompt template
- invoke the LLM with the completed prompt
- return the LLM result

The [`stuff` chain type](https://python.langchain.com/docs/modules/chains/document/stuff) simply takes the context documents and inserts them into the prompt.
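What "stuffing" amounts to can be sketched in plain Python. This is an assumption-laden illustration, not the chain's actual code: `TEMPLATE` is a shortened stand-in for the prompt template, `stuff_prompt` is a hypothetical helper, and joining chunks with a blank line is assumed as the separator.

```python
# Shortened stand-in for the notebook's prompt template.
TEMPLATE = (
    "Here is a set of context, contained in <context> tags:\n\n"
    "<context>\n{context}\n</context>\n\n"
    "{question}"
)

def stuff_prompt(chunks, question, separator="\n\n"):
    """Concatenate all retrieved chunks and substitute them into the template."""
    return TEMPLATE.format(context=separator.join(chunks), question=question)

prompt = stuff_prompt(
    ["AWS launched many new features.", "AWS is still early in its evolution."],
    "How has AWS evolved?",
)
print(prompt)
```

Because every retrieved chunk is placed into a single prompt, the total size of the chunks has to fit within the model's context window.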

By setting `return_source_documents` to `True`, the LLM responses will also contain the document chunks from the vector database, to illustrate where the context came from.

In [26]:
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5, "filter": filter},
        callbacks=[token_counter]
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
    callbacks=[token_counter]
)

Now that your chain is set up, you can supply queries to it and generate responses based on your source documents.

You'll note that the LLM response references the context documents provided, using them to formulate a response calling out things that were mentioned specifically by Amazon's CEO.

In [27]:
query = "How has AWS evolved?"
result = qa({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print('Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')


Token Counts:
Total: 936
Embedding: N/A
Prompt: 392
Generation:544

Query: How has AWS evolved?

Result:  Based on the provided context, it seems AWS has evolved by:

- Continuing to deliver new capabilities and features rapidly (over 3,300 new features and services launched in 2022).

- Making long-term investments in inventions that change what's possible. 

- Providing capabilities at a significantly lower cost compared to alternatives.

- Still being in the early stages of its evolution, with a chance for unusual growth in the next decade. 

- Requiring substantial capital investment early on when it was still a fledgling business in 2008.

- Seeing increasing numbers of enterprises opting to move to AWS to enjoy benefits like agility, innovation, cost-efficiency and security.

- Delivering over 3,300 new capabilities rapidly in 2022.

So in summary, AWS has evolved by rapidly expanding its features and services, making long-term technology investments, reducing costs, and continu

In [28]:
query = "Why is Amazon successful?"
result = qa({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print('Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')


Token Counts:
Total: 1548
Embedding: N/A
Prompt: 764
Generation:784

Query: Why is Amazon successful?

Result:  Based on the provided context, there are a few key reasons why Amazon has been successful:

- Amazon started out as just a books retailer but has expanded to sell a huge variety of physical and digital products, including through a third-party marketplace that accounts for 60% of unit sales. This selection and evolution has allowed Amazon to grow its business.

- Amazon offers services like AWS that provide technology infrastructure to other businesses. AWS and other services are high growth areas for Amazon. 

- Amazon's advertising business allows brands to effectively advertise on Amazon, similar to how physical retailers sell ad space. This is a large market opportunity for Amazon.

- Amazon Business provides procurement services to organizations and has thrived by focusing on selection, value, and convenience. 

- During tough economic times like 2008-2009, Amazon conti

In [29]:
query = "What business challenges has Amazon experienced?"
result = qa({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print('Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')


Token Counts:
Total: 2158
Embedding: N/A
Prompt: 1153
Generation:1005

Query: What business challenges has Amazon experienced?

Result:  Based on the context provided, some of the business challenges Amazon has experienced include:

- Operating in large, dynamic, global market segments with many capable and well-funded competitors. The context mentions that conditions rarely stay stagnant for long in the environments Amazon operates in.

- Facing an unusual number of simultaneous challenges in the past year. The context mentions there were an unusual number of simultaneous challenges in the past year.

- Constant change, much of which Amazon has initiated themselves. The context mentions there has been constant change at Amazon over the past 25 years, much of which Amazon has initiated. 

- Making investments during difficult economic periods, like continuing investment in AWS during the 2008-2009 recession. The context highlights how Amazon continued investing in AWS during a difficu

In [30]:
query = "How was Amazon impacted by COVID-19?"
result = qa({"query": query})

print(f'Query: {result["query"]}\n')
print(f'Result: {result["result"]}\n')
print('Context Documents: ')
for srcdoc in result["source_documents"]:
    print(f'{srcdoc}\n')


Token Counts:
Total: 2752
Embedding: N/A
Prompt: 1635
Generation:1117

Query: How was Amazon impacted by COVID-19?

Result:  Based on the context provided, it seems that Amazon's consumer business grew significantly during the early part of the COVID-19 pandemic, when many physical stores were shut down. The context states:

"During the early part of the pandemic, with many physical stores shut down, our consumer business grew by what lies ahead for Amazon."

This suggests that Amazon's consumer business, which includes online retail, grew rapidly as consumers shifted more of their shopping online due to physical store closures during the pandemic. The context indicates this was a critical growth period for Amazon's consumer business.

Context Documents: 
page_content='A critical challenge we’ve continued to tackle is the rising cost to serve in our Stores fulfillment network (i.e.\nthe cost to get a product from Amazon to a customer)—and we’ve made several changes that we believe wil