# Demonstrate LLM RAG Application for Private Documents

## Use Cases
1. Knowledge discovery using Q&A on private knowledge documents
2. Customer service assistant for higher-quality and faster responses

## LLM Stack

* Document Source - Amazon Bedrock FAQ page (a public HTML web page)
* LLM Framework - LangChain
* LLM - OpenAI
* Vector Store - Chroma

## Set up the environment

In [1]:
import dotenv
# Load environment variables (such as API keys) from a local .env file
dotenv.load_dotenv()

True
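
The `True` above indicates that a `.env` file was loaded; it does not confirm that any particular key is present. The OpenAI classes used later conventionally read the key from the `OPENAI_API_KEY` environment variable. A minimal check, assuming that variable name, might look like this (`require_env` is a hypothetical helper, not part of `dotenv`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value


# Example usage (assumes the key is stored under OPENAI_API_KEY):
# api_key = require_env("OPENAI_API_KEY")
```

Failing early here gives a clearer message than an authentication error in the middle of the embedding step.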

## Import necessary modules

In [2]:
from langchain.document_loaders import WebBaseLoader  # loads web pages
from langchain.text_splitter import RecursiveCharacterTextSplitter  # splits documents into chunks
from langchain.embeddings import OpenAIEmbeddings  # embedding model that converts text to vectors
from langchain.vectorstores import Chroma  # vector database that stores the embeddings
from langchain.schema.runnable import RunnablePassthrough  # passes the input through unchanged
from langchain.prompts import PromptTemplate  # prompt template with instructions for the LLM
from langchain.chat_models import ChatOpenAI  # chat LLM
from langchain.schema import StrOutputParser  # converts the LLM message to a plain string

## Step 1: Load private documents (web pages, Markdown, Confluence, .txt, .pdf, etc.)

In [3]:
%%time

loader = WebBaseLoader(
    web_paths=("https://aws.amazon.com/bedrock/faqs/",),
)
bedrock_docs = loader.load()

CPU times: user 252 ms, sys: 23.5 ms, total: 275 ms
Wall time: 560 ms


In [4]:
type(bedrock_docs[0])

langchain.schema.document.Document

In [5]:
len(bedrock_docs)

1

## Inspect the document metadata

In [6]:
bedrock_docs[0].metadata

{'source': 'https://aws.amazon.com/bedrock/faqs/',
 'title': 'Build Generative AI Applications with Foundation Models - Amazon Bedrock FAQs - AWS',
 'description': 'Find answers to frequently asked questions about Amazon Bedrock.',
 'language': 'en-US'}

## Inspect sample content of document

In [7]:
bedrock_docs[0].page_content

"\n\n\n\n\n\n\n\n\n\n\nBuild Generative AI Applications with Foundation Models - Amazon Bedrock FAQs - AWS\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Skip to main content\n\n\n\n\n\nClick here to return to Amazon Web Services homepage\n\n\n\nContact Us\n Support\xa0 \nEnglish\xa0\nMy Account\xa0\n\n\n\n\n Sign In\n\n\n  Create an AWS Account \n\n\n\n\n\n\n\n\n\nre:Invent\nProducts\nSolutions\nPricing\nDocumentation\nLearn\nPartner Network\nAWS Marketplace\nCustomer Enablement\nEvents\nExplore More \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Close \n\n\n\nعربي\nBahasa Indonesia\nDeutsch\nEnglish\nEspañol\nFrançais\nItaliano\nPortuguês\n\n\n\n\nTiếng Việt\nTürkçe\nΡусский\nไทย\n日本語\n한국어\n中文 (简体)\n中文 (繁體)\n\n\n\n\n\n Close \n\nMy Profile\nSign out of AWS Builder ID\nAWS Management Console\nAccount Settings\nBilling & Cost Management\nSecurity Credentials\nAWS Personal Health Dashboard\n\n\n\n Close \n\nSupport Center\nExpert Help\nKnowledge Cent

## The Bedrock FAQ page is about 29k characters long

In [8]:
len(bedrock_docs[0].page_content)

29535

## Sample data from the Bedrock FAQ

In [9]:
print(bedrock_docs[0].page_content[3500:4500])

's Llama 2, and the Amazon Titan language and embeddings models.





Why should I use Amazon Bedrock?






There are five reasons to use Amazon Bedrock for building generative AI applications.

Choice of leading foundation models: Amazon Bedrock offers an easy-to-use developer experience to work with a broad range of high-performing FMs from Amazon and leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, and Stability AI. You can quickly experiment with a variety of FMs in the playground, and use a single API for inference regardless of the models you choose, giving you the flexibility to use FMs from different providers and keep up to date with the latest model versions with minimal code changes.
Easy model customization with your data: Privately customize FMs with your own data through a visual interface without writing any code. Simply select the training and validation data sets stored in Amazon Simple Storage Service (Amazon S3) and, if required, adjust the hyperparamet

## Step 2: Split the document into chunks
### chunk_size=1000 characters with chunk_overlap=200 characters shared between consecutive chunks
### The overlap helps preserve context across chunk boundaries

In [10]:
%%time
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(bedrock_docs)

CPU times: user 3.18 ms, sys: 47 µs, total: 3.23 ms
Wall time: 3.23 ms
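
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter prefers to break on separators such as paragraph and line breaks, so its chunks can be shorter than `chunk_size`. The function name `split_with_overlap` and the toy text are illustrative assumptions.

```python
from typing import Dict, List


def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[Dict]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy input: 2500 characters of repeating digits
sample = "".join(str(i % 10) for i in range(2500))
for chunk in split_with_overlap(sample):
    print(chunk["start_index"], len(chunk["text"]))
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.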


In [11]:
all_splits

[Document(page_content='Build Generative AI Applications with Foundation Models - Amazon Bedrock FAQs - AWS\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Skip to main content\n\n\n\n\n\nClick here to return to Amazon Web Services homepage\n\n\n\nContact Us\n Support\xa0 \nEnglish\xa0\nMy Account\xa0\n\n\n\n\n Sign In\n\n\n  Create an AWS Account \n\n\n\n\n\n\n\n\n\nre:Invent\nProducts\nSolutions\nPricing\nDocumentation\nLearn\nPartner Network\nAWS Marketplace\nCustomer Enablement\nEvents\nExplore More \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n Close \n\n\n\nعربي\nBahasa Indonesia\nDeutsch\nEnglish\nEspañol\nFrançais\nItaliano\nPortuguês\n\n\n\n\nTiếng Việt\nTürkçe\nΡусский\nไทย\n日本語\n한국어\n中文 (简体)\n中文 (繁體)\n\n\n\n\n\n Close \n\nMy Profile\nSign out of AWS Builder ID\nAWS Management Console\nAccount Settings\nBilling & Cost Management\nSecurity Credentials\nAWS Personal Health Dashboard\n\n\n\n Close \n\nSupport Center\nExpert Help\nKnowledge Cen

## The 29k-character Bedrock FAQ document has been split into 42 chunks

In [12]:
len(all_splits)

42

## The chunk at index 20 has 990 characters

In [13]:
len(all_splits[20].page_content)

990

In [14]:
dict(all_splits[20])

{'page_content': 'Will AWS and third-party model providers use customer inputs to or outputs from Amazon Bedrock to train Amazon Titan or any third-party models?\n\n\n\n\n\n\nNo, AWS and the third-party model providers will not use any inputs to or outputs from Bedrock to train Amazon Titan or any third-party models.\n\n\n\n\n\n\n\n\n\n\nSDK\nOpen all\n\n\n\nWhat SDKs are supported for Amazon Bedrock?\n\n\n\n\n\n\nAmazon Bedrock supports SDKs for runtime services. iOS and Android SDKs, as well as Java, JS, Python, CLI, .Net, Ruby, PHP, Go, and CPP support both text and speech input.\n\n\n\n\n\nWhat SDKs support streaming functionality?\n\n\n\n\n\n\nStreaming is supported on all the SDKs.\n\n\n\n\n\n\n\n\n\n\nBilling and Support\nOpen all\n\n\n\nHow much does Amazon Bedrock cost?\n\n\n\n\n\n\nPlease see the Amazon Bedrock Pricing Page for current pricing information.\n\n\n\n\n\nWhat support is provided for Amazon Bedrock?\n\n\n\n\n\n\nDepending on your AWS support contract, Amazon Bedro

## Step 3: Convert each chunk into a vector embedding with the embedding model and store it in the vector database
### An embedding is a vector representation of a piece of text.
### Why? The user's query text can be converted to an embedding too and used to run a semantic search in vector space

In [15]:
%%time
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

CPU times: user 941 ms, sys: 133 ms, total: 1.07 s
Wall time: 2.45 s
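
The idea of semantic search in vector space can be sketched without any external service. The example below uses hand-written three-dimensional toy vectors (real embeddings have far more dimensions and come from the embedding model) and ranks them by cosine similarity to a query vector. The names `cosine_similarity`, `rank_by_similarity`, and `toy_store` are illustrative assumptions, not Chroma APIs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], store: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three chunks
toy_store = {
    "pricing": [0.1, 0.9, 0.0],
    "security": [0.9, 0.1, 0.1],
    "sdk": [0.0, 0.2, 0.9],
}
print(rank_by_similarity([1.0, 0.0, 0.0], toy_store))
```

A vector store does the same kind of ranking, but typically with an index so that it does not have to compare the query against every stored vector.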


In [16]:
type(vectorstore)

langchain.vectorstores.chroma.Chroma

In [17]:
dir(vectorstore)

['_Chroma__query_collection',
 '_LANGCHAIN_DEFAULT_COLLECTION_NAME',
 '__abstractmethods__',
 '__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_asimilarity_search_with_relevance_scores',
 '_client',
 '_client_settings',
 '_collection',
 '_cosine_relevance_score_fn',
 '_embedding_function',
 '_euclidean_relevance_score_fn',
 '_get_retriever_tags',
 '_max_inner_product_relevance_score_fn',
 '_persist_directory',
 '_select_relevance_score_fn',
 '_similarity_search_with_relevance_scores',
 'aadd_documents',
 'aadd_texts',
 'add_documents',
 'add_texts',
 'adelete',
 'afrom_documents',
 'afrom_texts',
 'amax_ma

In [18]:
vectorstore.similarity_search_with_score.__doc__

'Run similarity search with Chroma with distance.\n\n        Args:\n            query (str): Query text to search for.\n            k (int): Number of results to return. Defaults to 4.\n            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.\n\n        Returns:\n            List[Tuple[Document, float]]: List of documents most similar to\n            the query text and cosine distance in float for each.\n            Lower score represents more similarity.\n        '

In [19]:
vectorstore.similarity_search_with_score("what is bedrock?")

[(Document(page_content='Amazon Bedrock offers several capabilities to support security and privacy requirements. Bedrock is in scope for common compliance standards such as Service and Organization Control (SOC), International Organization for Standardization (ISO), Health Insurance Portability and Accountability Act (HIPAA) eligible, and customers can use Bedrock in compliance with the General Data Protection Regulation (GDPR). Amazon Bedrock is included in the scope of the SOC 1, 2, 3 reports, allowing customers to gain insights into our security controls. We demonstrate compliance through extensive third-party audits of our AWS controls. Amazon Bedrock is one of the AWS services under ISO Compliance for the ISO 9001, ISO 27001, ISO 27017, ISO 27018, ISO 27701, ISO 22301, and ISO 20000 standards. Amazon Bedrock is CSA Security Trust Assurance and Risk (STAR) Level 2 certified, which validates the use of best practices and the security posture of AWS cloud offerings. With Amazon Bedr

In [20]:
vectorstore.similarity_search_with_relevance_scores.__doc__

'Return docs and relevance scores in the range [0, 1].\n\n        0 is dissimilar, 1 is most similar.\n\n        Args:\n            query: input text\n            k: Number of Documents to return. Defaults to 4.\n            **kwargs: kwargs to be passed to similarity search. Should include:\n                score_threshold: Optional, a floating point value between 0 to 1 to\n                    filter the resulting set of retrieved docs\n\n        Returns:\n            List of Tuples of (doc, similarity_score)\n        '

In [21]:
vectorstore.similarity_search_with_relevance_scores("what is bedrock?")

[(Document(page_content='Amazon Bedrock offers several capabilities to support security and privacy requirements. Bedrock is in scope for common compliance standards such as Service and Organization Control (SOC), International Organization for Standardization (ISO), Health Insurance Portability and Accountability Act (HIPAA) eligible, and customers can use Bedrock in compliance with the General Data Protection Regulation (GDPR). Amazon Bedrock is included in the scope of the SOC 1, 2, 3 reports, allowing customers to gain insights into our security controls. We demonstrate compliance through extensive third-party audits of our AWS controls. Amazon Bedrock is one of the AWS services under ISO Compliance for the ISO 9001, ISO 27001, ISO 27017, ISO 27018, ISO 27701, ISO 22301, and ISO 20000 standards. Amazon Bedrock is CSA Security Trust Assurance and Risk (STAR) Level 2 certified, which validates the use of best practices and the security posture of AWS cloud offerings. With Amazon Bedr
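
The two docstrings above describe opposite conventions: `similarity_search_with_score` returns a distance, where lower means more similar, while `similarity_search_with_relevance_scores` returns a value in [0, 1], where higher means more similar. The sketch below shows how results in the first form could be mapped to the second and filtered by a threshold. The `1 - distance` mapping is an illustrative assumption for a distance in [0, 1]; the conversion Chroma actually applies depends on the distance metric configured for the collection. The toy distances are made up.

```python
from typing import List, Tuple


def distance_to_relevance(distance: float) -> float:
    """Illustrative mapping: 1 - distance, clamped into [0, 1]."""
    return max(0.0, min(1.0, 1.0 - distance))


def filter_by_threshold(
    results: List[Tuple[str, float]], score_threshold: float
) -> List[Tuple[str, float]]:
    """Convert (doc, distance) pairs to (doc, relevance) and keep the relevant ones."""
    scored = [(doc, distance_to_relevance(dist)) for doc, dist in results]
    kept = [(doc, score) for doc, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most relevant first
    return kept


# Hypothetical (chunk, distance) pairs; lower distance means closer
toy_results = [("chunk A", 0.25), ("chunk B", 0.75), ("chunk C", 0.5)]
print(filter_by_threshold(toy_results, score_threshold=0.5))
```

When comparing scores from the two methods, check which convention applies before sorting or thresholding.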

## Step 4: The retriever uses the user input to run a semantic search on the vector store and retrieve the relevant chunks

In [22]:
%%time
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
retrieved_docs = retriever.get_relevant_documents(
    "What is bedrock?"
)

CPU times: user 8.54 ms, sys: 2.56 ms, total: 11.1 ms
Wall time: 131 ms


In [23]:
len(retrieved_docs)

5

In [24]:
print(retrieved_docs[2].page_content)

Amazon Bedrock offers several capabilities to support security and privacy requirements. Bedrock is in scope for common compliance standards such as Service and Organization Control (SOC), International Organization for Standardization (ISO), Health Insurance Portability and Accountability Act (HIPAA) eligible, and customers can use Bedrock in compliance with the General Data Protection Regulation (GDPR). Amazon Bedrock is included in the scope of the SOC 1, 2, 3 reports, allowing customers to gain insights into our security controls. We demonstrate compliance through extensive third-party audits of our AWS controls. Amazon Bedrock is one of the AWS services under ISO Compliance for the ISO 9001, ISO 27001, ISO 27017, ISO 27018, ISO 27701, ISO 22301, and ISO 20000 standards. Amazon Bedrock is CSA Security Trust Assurance and Risk (STAR) Level 2 certified, which validates the use of best practices and the security posture of AWS cloud offerings. With Amazon Bedrock, your content is not


## Step 5: Generate answers with the LLM

In [25]:
%%time
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
rag_prompt_custom = PromptTemplate.from_template(template)

# The retriever fills {context}; the raw question passes through to {question}
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt_custom
    | llm
    | StrOutputParser()
)

CPU times: user 2.67 ms, sys: 1.97 ms, total: 4.64 ms
Wall time: 7.7 ms
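
In the chain above the retriever output goes straight into `{context}`, so the prompt receives the string form of a list of `Document` objects, metadata included. A common variation joins only the chunk text before filling the template. The sketch below shows that step with plain strings; `format_docs` and `build_prompt` are hypothetical helpers, the template is copied from the cell above, and the two chunks are toy inputs.

```python
from typing import List

TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""


def format_docs(chunks: List[str]) -> str:
    """Join chunk texts with blank lines, trimming stray whitespace."""
    return "\n\n".join(chunk.strip() for chunk in chunks)


def build_prompt(chunks: List[str], question: str) -> str:
    """Fill the template with the joined context and the user question."""
    return TEMPLATE.format(context=format_docs(chunks), question=question)


prompt = build_prompt(
    [
        "Amazon Bedrock is a fully managed service.",
        "  Streaming is supported on all the SDKs.  ",
    ],
    "What is bedrock?",
)
print(prompt)
```

In a LangChain pipeline, a function like `format_docs` would typically take `Document` objects and sit between the retriever and the prompt; that wiring is left out here to keep the example self-contained.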


## Generate answers from user input

In [26]:
%%time
rag_chain.invoke("What is bedrock?")

CPU times: user 44.5 ms, sys: 5.49 ms, total: 50 ms
Wall time: 2.12 s


'Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative AI applications without writing any code. It simplifies development with security, privacy, and responsible AI. Thanks for asking!'

In [27]:
%%time
rag_chain.invoke("what foundation models are available in Bedrock?")

CPU times: user 48.8 ms, sys: 6.97 ms, total: 55.8 ms
Wall time: 7.26 s


"The foundation models available in Bedrock include Anthropic's Claude, AI21 Labs' Jurassic-2, Stability AI's Stable Diffusion, Cohere's Command and Embed, Meta's Llama 2, and the Amazon Titan language and embeddings models. Thanks for asking!"

In [28]:
%%time
rag_chain.invoke("what is a foundation model?")

CPU times: user 72.1 ms, sys: 7.38 ms, total: 79.4 ms
Wall time: 1.98 s


'A foundation model is a high-performing AI model that serves as the basis for building generative AI applications. It can be customized with your own data and offers a range of options from different providers. Thanks for asking!'

In [29]:
%%time
rag_chain.invoke("how can I customize a foundation model with my own data?")

CPU times: user 51 ms, sys: 7.18 ms, total: 58.2 ms
Wall time: 2.07 s


'You can customize a foundation model with your own data on Amazon Bedrock by using a visual interface without writing any code. Simply select your training and validation datasets stored in Amazon S3 and adjust the hyperparameters if needed. Thanks for asking!'

In [30]:
%%time
rag_chain.invoke("what is a hyperparameter?")

CPU times: user 45.7 ms, sys: 5.89 ms, total: 51.6 ms
Wall time: 1.84 s


'A hyperparameter is a parameter that is set before the learning process begins and determines the behavior and performance of a machine learning model. It is not learned from the data but is chosen by the user. Thanks for asking!'

In [31]:
%%time
rag_chain.invoke("give me example of hyperparameter")

CPU times: user 50.1 ms, sys: 6.79 ms, total: 56.9 ms
Wall time: 1.72 s


'An example of a hyperparameter is the learning rate in a neural network. The learning rate determines how quickly the model adjusts its weights during training. Thanks for asking!'

In [32]:
%%time
rag_chain.invoke("What is a RAG?")

CPU times: user 49.1 ms, sys: 6.14 ms, total: 55.2 ms
Wall time: 1.15 s


'RAG stands for Retrieval Augmented Generation. Thanks for asking!'

In [33]:
%%time
rag_chain.invoke("Can you expand on it?")

CPU times: user 48.8 ms, sys: 6.68 ms, total: 55.5 ms
Wall time: 2.17 s


'Yes, Amazon Bedrock allows users to set up human review workflows to evaluate models. Users can customize the instructions and buttons on the evaluation UI for their team and provide detailed instructions with examples. This method is useful for evaluating subjective criteria that require human judgment or subject matter expertise. Thanks for asking!'

In [34]:
%%time
rag_chain.invoke("How do you build a RAG application?")

CPU times: user 73.8 ms, sys: 6.99 ms, total: 80.7 ms
Wall time: 3.75 s


"To build a RAG (Retrieval Augmented Generation) application, you can use Amazon Bedrock's fully managed agents that can invoke APIs dynamically to execute tasks. These agents extend the reasoning capabilities of FMs (Foundation Models) to break down tasks, create an orchestration plan, and execute it. Additionally, you can use Knowledge Bases for Amazon Bedrock to securely connect FMs to your data sources for retrieval augmentation, making the FM more knowledgeable about your specific domain and organization. Thanks for asking!"

In [35]:
%%time
rag_chain.invoke("what are knowledge bases for amazon bedrock?")

CPU times: user 48.4 ms, sys: 6.4 ms, total: 54.7 ms
Wall time: 2.67 s


'Knowledge Bases for Amazon Bedrock is a feature that takes care of the entire ingestion workflow of converting documents into embeddings (vectors) and storing them in a specialized vector database. It supports popular vector databases such as Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora, and MongoDB. Thanks for asking!'

In [36]:
%%time
rag_chain.invoke("What are embeddings?")

CPU times: user 47.8 ms, sys: 6.65 ms, total: 54.5 ms
Wall time: 1.09 s


'Embeddings are vector representations of documents that capture their semantic meaning. Thanks for asking!'