# **Retrieval Augmented Question Answering using Amazon Bedrock, Amazon Kendra, Amazon S3 and LangChain**

## **1. What are we going to build?**

We have two options for enabling an LLM to understand and answer questions about our private, domain-specific data:

- Fine-tune the LLM on text data covering the domain.
- Use Retrieval Augmented Generation (RAG), a technique that adds an information retrieval component to the generation process. It lets us retrieve relevant information and feed it into the generation model as a secondary source of information.
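
To make the RAG idea concrete, here is a minimal, self-contained sketch of the two stages (retrieve, then generate). Everything in it is an assumption made for illustration: the `DOCS` corpus, the word-overlap scoring and the `echo_llm` stand-in are hypothetical and are not how Kendra or Bedrock work internally.

```python
from typing import Callable, List

# Hypothetical in-memory corpus standing in for the private documents.
DOCS = [
    "IHttpClientFactory provides a central location for configuring HttpClient objects.",
    "Polly is a transient-fault-handling library for .NET.",
    "Amazon S3 stores objects in buckets.",
]


def retrieve(question: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by the number of words shared with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def answer(question: str, docs: List[str], generate: Callable[[str], str], top_k: int = 2) -> str:
    """Retrieve context, place it in the prompt, and hand the prompt to the generator."""
    context = "\n".join(retrieve(question, docs, top_k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt so we can inspect it."""
    return prompt


if __name__ == "__main__":
    print(answer("What does IHttpClientFactory provide?", DOCS, echo_llm, top_k=1))
```

In the rest of the notebook, Kendra plays the role of `retrieve` and a Bedrock model plays the role of `generate`.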

In this notebook, I will show you how to implement the Retrieval Augmented Generation (RAG) pattern for question answering using AWS Kendra, AWS S3 and AWS Bedrock.

The following diagram shows what we're going to build:

- The private documents, about which we want to ask questions, are stored in an S3 bucket.
- We have a Kendra index with a connector to the S3 bucket. The index checks the S3 bucket every N minutes for new content. If new content is found in the bucket, it is automatically parsed and stored in the Kendra index.
- The Jupyter notebook, given a specific question, retrieves the most relevant documents from Kendra, assembles a prompt with the information extracted from Kendra, and sends it to one of the available LLMs in AWS Bedrock.

![diagram](https://raw.githubusercontent.com/karlospn/building-qa-app-with-aws-bedrock-kendra-s3-and-streamlit/main/docs/aws-architecture-jupyter-notebook-langchain.png)



## **2. Deploying an AWS Kendra index with an S3 connector and storing documents in it**

In the ``/infra`` folder, you'll find a series of Terraform files that will create everything you'll need.

The Terraform files will create the following resources:
- An S3 bucket with our private docs in it.
- A Kendra index with an S3 connector.
- An IAM role with the required permissions to make everything work.
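
Creating a Kendra index can take a while, so it may be useful to check that the index is ready before querying it. The sketch below polls the Kendra `describe_index` API until the index reports `ACTIVE`. The helper name `wait_for_index`, its default polling values and the injectable `sleep` parameter are my own assumptions for illustration; they are not part of the Terraform files.

```python
import time


def wait_for_index(kendra_client, index_id, poll_seconds=30, max_attempts=60, sleep=time.sleep):
    """Poll describe_index until the index is ACTIVE; raise if it fails or times out."""
    for _ in range(max_attempts):
        status = kendra_client.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Kendra index {index_id} is in FAILED state")
        # Still CREATING or UPDATING: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"Kendra index {index_id} not ACTIVE after {max_attempts} checks")


# Usage (requires boto3 and AWS credentials; the index ID is a placeholder):
#   import boto3
#   wait_for_index(boto3.client("kendra", region_name="eu-west-1"), "<your-index-id>")
```

The client is passed in as a parameter so the helper can be exercised with a fake client, without touching AWS.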


## **3. Building the RAG pattern using LangChain**

[LangChain](https://python.langchain.com) is a Python library that streamlines the process of creating a RAG pattern with AWS Bedrock and Kendra. It offers a set of abstractions that simplifies every essential step of the RAG workflow.

If you don't want to use a third-party library like ``LangChain``, take a look at my other Jupyter notebook: ``rag-with-only-boto3``.    
There, I show how to build the exact same RAG pattern using only ``boto3``.


### **3.1 Import dependencies**

In [15]:
from langchain.retrievers import AmazonKendraRetriever
from langchain.llms.bedrock import Bedrock
from langchain.chains import RetrievalQA
from IPython.display import Markdown, display
import boto3

### **3.2. Create a LangChain Retriever** 

- A LangChain Retriever is an interface that returns documents given an unstructured query.
- To retrieve the most pertinent documents from AWS Kendra for a given query, we will use the ``AmazonKendraRetriever`` provided by LangChain.

In [16]:
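kendra_client = boto3.client("kendra", region_name="eu-west-1")
kendra_index = "03c49eca-c1f4-4e5d-b8d8-f913f02c5b4a"  # ID of the Kendra index (replace with your own)

# Return the 3 most relevant results, restricted to English-language documents.
retriever = AmazonKendraRetriever(
    index_id=kendra_index,
    top_k=3,
    client=kendra_client,
    attribute_filter={
        "EqualsTo": {
            "Key": "_language_code",
            "Value": {"StringValue": "en"},
        }
    },
)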
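
The ``attribute_filter`` above is a plain dictionary in the shape of Kendra's `AttributeFilter` structure, so it can be built with small helpers and combined with other conditions. The following is a sketch under assumptions: the helper names `equals_to` and `all_of` are hypothetical, and the `_file_type` condition is only an illustrative second filter that the notebook itself does not use.

```python
def equals_to(key: str, value: str) -> dict:
    """Build an EqualsTo condition on a string document attribute."""
    return {"EqualsTo": {"Key": key, "Value": {"StringValue": value}}}


def all_of(*filters: dict) -> dict:
    """Combine several conditions; a document must satisfy all of them."""
    return {"AndAllFilters": list(filters)}


# Same filter as the one passed to AmazonKendraRetriever above.
english_only = equals_to("_language_code", "en")

# Illustrative combination: English documents that are also PDFs.
english_pdfs = all_of(english_only, equals_to("_file_type", "PDF"))

if __name__ == "__main__":
    print(english_only)
    print(english_pdfs)
```

Either dictionary could then be passed as ``attribute_filter`` when constructing the retriever.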

Here's a quick example of how you can employ the ``AmazonKendraRetriever`` to manually fetch the most relevant documents.

In [17]:
retriever.get_relevant_documents('What are the benefits of using IHttpClientFactory?')

[Document(page_content='Document Title: NET-Microservices-Architecture-for-Containerized-NET-Applications\nDocument Excerpt: \nThe alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and 
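
As the output above shows, each returned document packs a title and an excerpt into ``page_content`` using the markers ``Document Title:`` and ``Document Excerpt:``. A small helper can split them apart for easier reading. This is a sketch that assumes that layout holds for every result; the `Doc` class is a hypothetical stand-in for LangChain's `Document` so the example runs without LangChain installed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def split_title_and_excerpt(page_content: str) -> Tuple[str, str]:
    """Split 'Document Title: ...\\nDocument Excerpt: ...' into (title, excerpt)."""
    title_part, _, excerpt = page_content.partition("\nDocument Excerpt:")
    prefix = "Document Title:"
    if title_part.startswith(prefix):
        title_part = title_part[len(prefix):]
    return title_part.strip(), excerpt.strip()


def summarize(docs: List[Doc], width: int = 80) -> List[str]:
    """One line per document: the title followed by the start of the excerpt."""
    lines = []
    for doc in docs:
        title, excerpt = split_title_and_excerpt(doc.page_content)
        lines.append(f"{title}: {excerpt[:width]}")
    return lines


if __name__ == "__main__":
    sample = Doc(
        "Document Title: NET-Microservices-Architecture-for-Containerized-NET-Applications\n"
        "Document Excerpt: \nThe alternative is to use SocketsHttpHandler"
    )
    print("\n".join(summarize([sample])))
```

With the real retriever, ``summarize(retriever.get_relevant_documents(question))`` should work the same way, since only the ``page_content`` attribute is read.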

### **3.3. Set up a LangChain RetrievalQA chain that uses AWS Bedrock Titan Text G1 Express LLM to generate responses to your questions**

In [18]:
max_tokens = 1000   # upper bound on the number of generated tokens
temperature = 0.7   # sampling temperature
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

In [19]:
titan_llm = Bedrock(model_id="amazon.titan-text-express-v1", 
        region_name='us-west-2', 
        client=bedrock_client, 
        model_kwargs={"maxTokenCount": max_tokens, "temperature": temperature})

qa = RetrievalQA.from_chain_type(llm=titan_llm, chain_type="stuff", retriever=retriever)
answer = qa('What are the benefits of using IHttpClientFactory?')

display(Markdown(answer['result']))

 Provides a central location for naming and configuring logical HttpClient objects.
Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency.
HttpClient already has the concept of delegating handlers that could be linked together for outgoing HTTP requests.
Offers the ability to create and configure HttpClient instances in an app through Dependency Injection (DI).
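
The ``chain_type="stuff"`` option means that all retrieved documents are "stuffed" into a single prompt together with the question. The sketch below approximates that step in plain Python. The template text is modeled on LangChain's default question-answering prompt as I recall it, so treat the exact wording and the `build_stuff_prompt` helper as assumptions rather than the library's implementation.

```python
from typing import List

# Approximation of the default "stuff" QA prompt (wording is an assumption).
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(question: str, page_contents: List[str], separator: str = "\n\n") -> str:
    """Concatenate every retrieved passage and insert it into one prompt."""
    context = separator.join(page_contents)
    return STUFF_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    passages = [
        "Document Excerpt: IHttpClientFactory provides a central location ...",
        "Document Excerpt: Polly is a transient-fault-handling library ...",
    ]
    print(build_stuff_prompt("What are the benefits of using IHttpClientFactory?", passages))
```

Because everything goes into one prompt, the retriever's ``top_k`` value and the length of each excerpt determine how much of the model's context window is consumed.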

### **3.4. Set up a LangChain RetrievalQA chain that uses AWS Bedrock Anthropic Claude V2 LLM to generate responses to your questions**

In [20]:
claude_llm = Bedrock(model_id="anthropic.claude-v2", region_name='us-west-2', 
                        client=bedrock_client, 
                        model_kwargs={"max_tokens_to_sample": max_tokens, "temperature": temperature})

qa = RetrievalQA.from_chain_type(llm=claude_llm, chain_type="stuff", retriever=retriever)
answer = qa('What are the benefits of using IHttpClientFactory?')
display(Markdown(answer['result']))

 Based on the document excerpts, some of the key benefits of using IHttpClientFactory are:

- Provides a central location for naming and configuring logical HttpClient objects. For example, you can configure a client that's pre-configured to access a specific microservice.

- Allows implementing Polly-based middleware to take advantage of Polly's policies for resiliency. 

- Codifies the concept of outgoing middleware via delegating handlers in HttpClient.

- Makes HttpClient instances manageable by configuring and creating them through Dependency Injection.

So in summary, it provides a centralized way to configure and manage HttpClient instances with added benefits like middleware and resiliency policies.

### **3.5. Set up a LangChain RetrievalQA chain that uses AWS Bedrock AI21 Labs Jurassic-2 Ultra LLM to generate responses to your questions**

In [21]:
ai21_j2_ultra_llm = Bedrock(model_id="ai21.j2-ultra-v1", region_name='us-west-2', 
                        client=bedrock_client, 
                        model_kwargs={"maxTokens": max_tokens, "temperature": temperature})

qa = RetrievalQA.from_chain_type(llm=ai21_j2_ultra_llm, chain_type="stuff", retriever=retriever)
answer = qa('What are the benefits of using IHttpClientFactory?')
display(Markdown(answer['result']))


IHttpClientFactory provides a central location for naming and configuring logical HttpClient objects, as well as codifying the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly's policies for resiliency.
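
Note that the three models above each expect a differently named parameter for the token limit (``maxTokenCount``, ``max_tokens_to_sample`` and ``maxTokens``). A small lookup keeps that detail in one place. This is a sketch covering only the three providers used in this notebook; the `model_kwargs_for` helper and the `TOKEN_LIMIT_KEY` table are hypothetical names introduced for illustration.

```python
# Token-limit parameter name per provider, taken from the cells above.
TOKEN_LIMIT_KEY = {
    "amazon": "maxTokenCount",
    "anthropic": "max_tokens_to_sample",
    "ai21": "maxTokens",
}

MODEL_IDS = [
    "amazon.titan-text-express-v1",
    "anthropic.claude-v2",
    "ai21.j2-ultra-v1",
]


def model_kwargs_for(model_id: str, max_tokens: int = 1000, temperature: float = 0.7) -> dict:
    """Return model_kwargs using the parameter names the model's provider expects."""
    provider = model_id.split(".", 1)[0]
    if provider not in TOKEN_LIMIT_KEY:
        raise ValueError(f"No known parameters for provider {provider!r}")
    return {TOKEN_LIMIT_KEY[provider]: max_tokens, "temperature": temperature}


if __name__ == "__main__":
    for model_id in MODEL_IDS:
        print(model_id, model_kwargs_for(model_id))
```

The result can be passed straight to the ``Bedrock`` constructor, for example ``Bedrock(model_id=model_id, client=bedrock_client, model_kwargs=model_kwargs_for(model_id))``, which makes it easy to loop over ``MODEL_IDS`` and compare answers to the same question.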