# **Retrieval Augmented Question & Answering using Amazon Bedrock, Amazon Kendra and boto3**

## **1. What we're going to build**

We have two options for enabling an LLM to understand and answer questions about our private, domain-specific data:

- Fine-tune the LLM on text data covering that domain.
- Use Retrieval Augmented Generation (RAG), a technique that adds an information retrieval component to the generation process. It lets us retrieve relevant information and feed it into the generation model as an additional source of information.

In this notebook, I will show you how to apply the Retrieval Augmented Generation pattern to question answering using AWS Kendra, AWS S3 and AWS Bedrock.

The following diagram shows what we're going to build:

- The private documents, about which we want to ask questions, are stored in an S3 bucket.
- We have a Kendra index with a connector to the S3 bucket, which allows Kendra to index the contents of the private documents.
- Given a specific question, the Jupyter notebook retrieves the most relevant passages from Kendra, assembles a prompt with them, and sends it to one of the LLMs available in AWS Bedrock.
- The selected Bedrock LLM generates the final response shown to the user.

![diagram](https://raw.githubusercontent.com/karlospn/building-qa-app-with-aws-bedrock-kendra-s3-and-streamlit/main/docs/aws-architecture-jupyter-notebook.png)
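
The flow in the diagram can be sketched as three small steps: retrieve, build a prompt, generate. The snippet below is only an illustration of that shape. The function `answer_question` and the two stand-in callables are hypothetical names that do not appear in the notebook, and the stand-ins replace the real Kendra and Bedrock calls so the flow can be followed without AWS access.

```python
from typing import Callable, List


def answer_question(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve passages for `query`, build a prompt, and ask the LLM."""
    passages = retrieve(query)          # step 1: look up relevant passages (Kendra)
    context = "\n".join(passages)       # step 2: assemble the context
    prompt = (
        "Answer the following question based on the context below.\n"
        f"QUESTION: {query}\n"
        f"CONTEXT:\n{context}\n"
    )
    return generate(prompt)             # step 3: send the prompt to the LLM (Bedrock)


# Hypothetical stand-ins so the sketch runs without AWS credentials.
def fake_retrieve(query: str) -> List[str]:
    return ["IHttpClientFactory provides a central location for configuring HttpClient objects."]


def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_question("What are the benefits of using IHttpClientFactory?", fake_retrieve, fake_generate))
```

Section 3 of this notebook implements the same three steps with real `boto3` calls.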



## **2. Deploying an AWS Kendra index with an S3 connector and storing documents in it**

In the ``/infra`` folder, you'll find a series of Terraform files that will create everything you'll need.

The Terraform files will create the following resources:
- An S3 bucket with our private docs in it.
- A Kendra index with an S3 connector.
- An IAM role with the required permissions to make everything work.
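
Before running the queries in the next section, it can help to confirm that the index is ready. The helper below is a minimal sketch, assuming you pass in a `boto3` Kendra client and your own index ID. `wait_for_index` is a hypothetical helper name, not something the Terraform files or the notebook provide, and the timeout values are arbitrary defaults.

```python
import time


def wait_for_index(kendra_client, index_id: str, timeout_s: float = 1800, poll_s: float = 30) -> str:
    """Poll describe_index until the index is no longer in the CREATING state."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = kendra_client.describe_index(Id=index_id)["Status"]
        if status != "CREATING":
            return status  # e.g. "ACTIVE" or "FAILED"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index {index_id} is still CREATING after {timeout_s}s")
        time.sleep(poll_s)


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# status = wait_for_index(boto3.client("kendra", region_name="eu-west-1"), "<your-index-id>")
```

Note that this only checks the index itself. Whether the S3 connector has finished syncing the documents is a separate question that this sketch does not cover.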


## **3. Building the RAG pattern using boto3**

We're going to use the ``boto3`` library to interact with AWS Kendra and AWS Bedrock.

[LangChain](https://python.langchain.com) streamlines the process of creating a RAG pattern on AWS quite a bit. If you want to see how to build exactly the same RAG pattern using ``LangChain`` instead of ``boto3``, take a look at my other Jupyter notebook: ``rag-with-langchain``.


### **3.1 Import dependencies**

In [18]:
from IPython.display import Markdown, display
import boto3
import json

### **3.2. Given a specific query, retrieve the most relevant documents from Kendra** 

In [19]:
query = 'What are the benefits of using IHttpClientFactory?'

In [36]:
# boto3 was already imported in section 3.1
kendra_client = boto3.client("kendra", region_name="eu-west-1")
kendra_index = "feca65ea-5bc7-405d-b116-7c4d42d148a5"

# Ask Kendra for the three most relevant passages for the query
result = kendra_client.retrieve(
    QueryText=query,
    IndexId=kendra_index,
    PageSize=3,
    PageNumber=1,
)

chunks = [retrieve_result["Content"] for retrieve_result in result["ResultItems"]]
joined_chunks = "\n".join(chunks)
print(joined_chunks)

The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency. • HttpClient already has the concept 
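
Kendra can return passages that repeat, and joining only the `Content` field drops the information about which document each passage came from. The sketch below is one possible way to handle both. `passages_with_sources` is a hypothetical helper, and it assumes each item in `ResultItems` carries `Content` and `DocumentTitle` keys. It only removes exact duplicates, so partially overlapping passages are kept.

```python
from typing import Dict, List


def passages_with_sources(result_items: List[Dict]) -> str:
    """Join Retrieve passages, skipping exact duplicates and tagging each with its title."""
    seen = set()
    blocks = []
    for item in result_items:
        content = item.get("Content", "").strip()
        if not content or content in seen:
            continue  # skip empty passages and exact repeats
        seen.add(content)
        title = item.get("DocumentTitle") or "untitled"
        blocks.append(f"[{title}]\n{content}")
    return "\n\n".join(blocks)


# Hand-written sample items (not real Kendra output).
sample_items = [
    {"Content": "Passage one.", "DocumentTitle": "doc-a"},
    {"Content": "Passage one.", "DocumentTitle": "doc-a"},
    {"Content": "Passage two.", "DocumentTitle": "doc-b"},
]
print(passages_with_sources(sample_items))
```

In the notebook this could be called as `passages_with_sources(result["ResultItems"])` in place of the plain join.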

### **3.3. Construct the prompt using the documents acquired from AWS Kendra**

In [37]:
prompt = f"""
Answer the following question based on the context below.
If you don't know the answer, just say that you don't know. Don't try to make up an answer. Do not answer beyond this context.
---
QUESTION: {query}                                            
---
CONTEXT:
{joined_chunks}
"""

display(Markdown(prompt))


Answer the following question based on the context below.
If you don't know the answer, just say that you don't know. Don't try to make up an answer. Do not answer beyond this context.
---
QUESTION: What are the benefits of using IHttpClientFactory?                                            
---
CONTEXT:
The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency. • HttpClient already has the concept of delegating handlers that could be linked together for outgoing HTTP requests. You can register HTTP clients into the factory and you can use a Polly handler to use Polly policies for Retry, CircuitBreakers, and so on.
To address the issues mentioned above and to make HttpClient instances manageable, .NET Core 2.1 introduced two approaches, one of them being IHttpClientFactory. It’s an interface that’s used to configure and create HttpClient instances in an app through Dependency Injection (DI). It also provides extensions for Polly-based middleware to take advantage of delegating handlers in HttpClient. The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency.
If you need to use HttpClient without DI or with other DI implementations, consider using a static or singleton HttpClient with PooledConnectionLifetime set up. For more information, see HttpClient guidelines for .NET. Multiple ways to use IHttpClientFactory There are several ways that you can use IHttpClientFactory in your application: • Basic usage • Use Named Clients • Use Typed Clients • Use Generated Clients For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern). However, all options are documented and are currently listed in this article covering the IHttpClientFactory usage. Multiple ways to use IHttpClientFactory There are several ways that you can use IHttpClientFactory in your application: • Basic usage • Use Named Clients • Use Typed Clients • Use Generated Clients For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern). However, all options are documented and are currently listed in this article covering the IHttpClientFactory usage. Note If your app requires cookies, it might be better to avoid using IHttpClientFactory in your app. For alternative ways of managing clients, see Guidelines for using HTTP clients How to use Typed Clients with IHttpClientFactory So, what’s a “Typed Client”? It’s just an HttpClient that’s pre-configured for some specific use. This configuration can include specific values such as the base server, HTTP headers or time outs. 
The following diagram shows how Typed Clients are used with IHttpClientFactory: https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines https://docs.microsoft.com/aspnet/core/fundamentals/http-requests#consumption-patterns https://docs.microsoft.com/dotnet/api/system.net.http.ihttpclientfactory https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines 302 CHAPTER 7 | Implement resilient applications Figure 8-4.
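
The rendered prompt above is fairly long, and every model has a limit on how much input it accepts. A minimal sketch of a prompt builder that caps the context size is shown below. `build_prompt` and the `max_context_chars` default are assumptions made for illustration. A character budget is only a rough proxy for the token limits that the models actually enforce.

```python
def build_prompt(query: str, context: str, max_context_chars: int = 8000) -> str:
    """Build the question-answering prompt, trimming the context to a character budget."""
    if len(context) > max_context_chars:
        # Cut at the budget, then drop the trailing partial word.
        context = context[:max_context_chars].rsplit(" ", 1)[0]
    return (
        "Answer the following question based on the context below.\n"
        "If you don't know the answer, just say that you don't know. "
        "Don't try to make up an answer. Do not answer beyond this context.\n"
        "---\n"
        f"QUESTION: {query}\n"
        "---\n"
        f"CONTEXT:\n{context}\n"
    )


print(build_prompt("What is X?", "X is a placeholder used in this example."))
```

The wording of the instructions is the same as in the cell above; only the trimming step is new.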


### **3.4. Send the generated prompt to AWS Bedrock Titan Large LLM and receive the corresponding response**

In [38]:
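# In current boto3 releases, invoke_model is exposed by the "bedrock-runtime" client
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")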
body = json.dumps({
    "inputText": prompt, 
    "textGenerationConfig":{
        "maxTokenCount":1500,
        "temperature":0.7,
    }
}) 

response = bedrock_client.invoke_model(body=body, modelId='amazon.titan-tg1-large')
result = json.loads(response.get('body').read())

display(Markdown(result.get('results')[0].get('outputText')))

• Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice.
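
The answer above is quite short, so it is worth checking why the model stopped. As far as I know, Titan text responses include a `completionReason` field next to `outputText`, which should make it possible to tell a finished answer from one cut off by the token limit. The helper below is a sketch under that assumption, and `titan_text_and_reason` is a hypothetical name.

```python
from typing import Dict, Tuple


def titan_text_and_reason(response_body: Dict) -> Tuple[str, str]:
    """Return (outputText, completionReason) from a Titan text response body."""
    results = response_body.get("results") or []
    if not results:
        raise ValueError("Titan response contained no results")
    first = results[0]
    # completionReason is assumed to be present; fall back to "UNKNOWN" otherwise.
    return first.get("outputText", "").strip(), first.get("completionReason", "UNKNOWN")


# Hand-written response body (not real model output).
sample = {"results": [{"outputText": " Hello ", "completionReason": "FINISH"}]}
text, reason = titan_text_and_reason(sample)
print(text, reason)
```

In the cell above, this could be applied to the parsed `result` dictionary before displaying the text.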

### **3.5. Send the generated prompt to AWS Bedrock Anthropic Claude v2 LLM and receive the corresponding response**

In [39]:
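# In current boto3 releases, invoke_model is exposed by the "bedrock-runtime" client
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")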

body = json.dumps({
    "prompt": prompt, 
    "max_tokens_to_sample": 1500, 
    "temperature": 0.7
})

response = bedrock_client.invoke_model(
    body=body, 
    modelId='anthropic.claude-v2'
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completion"))

Typed Clients with IHttpClientFactory This diagram shows the following steps: 1. Services that need to make outgoing HTTP calls are configured to depend on an injected IHttpClientFactory. 2. At startup, the services that need clients register Typed Clients using IHttpClientFactory. 3. The configuration information is added, including base address, policies, and any other information. 4. When the service needs an HttpClient, it calls CreateClient on the factory interface. 5. The factory returns an HttpClient instance matching the registration. Notice how each Typed Client has configuration information and a name associated with it. The configuration information could include policies for that client, a base address, default headers, and so on. Using Typed Clients promotes reuse by consolidating the HttpClient configuration in one place. This consolidation can: • Reduce complexity in the app by encapsulating HttpClient configuration within IHttpClientFactory. • Standardize how HttpClient
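
The completion above reads like a continuation of the context rather than an answer to the question. One possible reason is the prompt format: Claude's text-completion interface expects the prompt to be framed as a `Human:` turn followed by an `Assistant:` turn. The sketch below wraps the prompt that way. `to_claude_prompt` and `claude_request_body` are hypothetical helpers, and I have not verified how much this changes the answer for this particular question.

```python
import json


def to_claude_prompt(prompt: str) -> str:
    """Wrap a plain prompt in the Human/Assistant turns used by Claude text completions."""
    return f"\n\nHuman: {prompt.strip()}\n\nAssistant:"


def claude_request_body(prompt: str, max_tokens: int = 1500, temperature: float = 0.7) -> str:
    """Build the JSON body for a Claude text-completion request."""
    return json.dumps({
        "prompt": to_claude_prompt(prompt),
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
    })


print(claude_request_body("What are the benefits of using IHttpClientFactory?"))
```

The resulting string can be passed as the `body` argument of `invoke_model` in place of the body built in the cell above.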

### **3.6. Send the generated prompt to AWS Bedrock AI21 Labs J2 Grande Instruct LLM and receive the corresponding response**

In [40]:
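# In current boto3 releases, invoke_model is exposed by the "bedrock-runtime" client
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")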

body = json.dumps({
    "prompt": prompt, 
    "maxTokens": 1500, 
    "temperature": 0.7
})

response = bedrock_client.invoke_model(
    body=body, 
    modelId='ai21.j2-grande-instruct'
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completions")[0].get("data").get("text"))

The benefits described are:
* Central location for naming and configuring logical HttpClient objects.
* Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly's policies for resiliency.
* HttpClient already has the concept of delegating handlers that could be linked together for outgoing HTTP requests.
The benefits of using IHttpClientFactory include providing a central location for naming and configuring logical HttpClient objects, codifying the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly's policies for resiliency, and having the concept of delegating handlers that could be linked together for outgoing HTTP requests.
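
The three cells above differ only in the request body and in where the generated text sits in the response. A small dispatcher can capture that difference, as in the sketch below. `build_body` and `extract_text` are hypothetical helpers that mirror the payload shapes used in sections 3.4 to 3.6, keyed on the provider prefix of the model ID. They are not part of `boto3`.

```python
import json
from typing import Dict


def build_body(model_id: str, prompt: str, max_tokens: int = 1500, temperature: float = 0.7) -> str:
    """Build the JSON request body for the model family that `model_id` belongs to."""
    provider = model_id.split(".", 1)[0]
    if provider == "amazon":
        payload = {
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": temperature},
        }
    elif provider == "anthropic":
        payload = {"prompt": prompt, "max_tokens_to_sample": max_tokens, "temperature": temperature}
    elif provider == "ai21":
        payload = {"prompt": prompt, "maxTokens": max_tokens, "temperature": temperature}
    else:
        raise ValueError(f"Unsupported model family: {provider}")
    return json.dumps(payload)


def extract_text(model_id: str, response_body: Dict) -> str:
    """Pull the generated text out of a parsed response body."""
    provider = model_id.split(".", 1)[0]
    if provider == "amazon":
        return response_body["results"][0]["outputText"]
    if provider == "anthropic":
        return response_body["completion"]
    if provider == "ai21":
        return response_body["completions"][0]["data"]["text"]
    raise ValueError(f"Unsupported model family: {provider}")


print(build_body("ai21.j2-grande-instruct", "Hello"))
```

With these two helpers, the same `invoke_model` call can be reused in a loop over model IDs, which makes it easier to compare the answers side by side.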
