# **Retrieval Augmented Question & Answering using Amazon Bedrock, Amazon Kendra, Amazon S3 and the AWS SDK for Python library (boto3)**

## **1. What are we going to build?**

There are two options for enabling an LLM to understand and answer questions about a private, domain-specific topic:

- Fine-tune the LLM on text data covering that topic.
- Use Retrieval Augmented Generation (RAG), a technique that adds an information retrieval component to the generation process. This allows us to retrieve relevant information and feed it into the generation model as a secondary source of information.

In this notebook, I will show you how to apply the Retrieval Augmented Generation pattern to question answering using AWS Kendra, AWS S3 and AWS Bedrock.

The following diagram shows what we're going to build:

- The private documents, about which we want to ask questions, are stored in an S3 bucket.
- We have a Kendra index with a connector to the S3 bucket. The index checks the S3 bucket every N minutes for new content. If new content is found in the bucket, it is automatically parsed and stored in Kendra.
- The Jupyter notebook, given a specific question, retrieves the most relevant paragraphs from Kendra, assembles a prompt with the extracted information from Kendra, and sends it to one of the multiple available LLMs in AWS Bedrock.

![diagram](https://raw.githubusercontent.com/karlospn/building-qa-app-with-aws-bedrock-kendra-s3-and-streamlit/main/docs/aws-architecture-jupyter-notebook.png)
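The three steps in the list above can be sketched as one small function. This is only an illustration of the flow, not the notebook's actual code: `answer_question`, `fake_retrieve` and `fake_generate` are hypothetical names, and the two stand-in callables replace the Kendra and Bedrock calls so the sketch runs without AWS credentials.

```python
from typing import Callable, List


def answer_question(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve passages for `query`, build a prompt, and ask the LLM."""
    passages = retrieve(query)       # step 1: Kendra lookup in the real notebook
    context = "\n".join(passages)    # step 2: assemble the context block
    prompt = f"QUESTION: {query}\n---\nCONTEXT:\n{context}"
    return generate(prompt)          # step 3: Bedrock call in the real notebook


# Hypothetical stand-ins so the sketch runs offline.
def fake_retrieve(query: str) -> List[str]:
    return ["IHttpClientFactory centralizes HttpClient configuration."]


def fake_generate(prompt: str) -> str:
    return f"(model reply to a prompt of {len(prompt)} characters)"


print(answer_question("What is IHttpClientFactory?", fake_retrieve, fake_generate))
```

Passing the retriever and the generator in as arguments keeps the flow independent of any particular index or model, which is convenient when the same prompt is sent to several LLMs.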



## **2. Deploying an AWS Kendra index with an S3 connector and storing documents in it**

In the ``/infra`` folder, you'll find a series of Terraform files that will create everything you'll need.

The Terraform files will create the following resources:
- An S3 bucket with our private docs in it.
- A Kendra index with an S3 connector.
- An IAM role with the required permissions to make everything work.
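
Once Terraform has created these resources, it can be useful to confirm that the S3 connector has finished syncing before querying the index. The following is a minimal sketch, assuming the Kendra `list_data_source_sync_jobs` call and its `History` entries with `StartTime` and `Status` fields. `latest_sync_status` is a hypothetical helper, and the index and data source IDs are placeholders you would take from the Terraform output or the AWS console.

```python
from typing import Any, Optional


def latest_sync_status(kendra_client: Any, index_id: str, data_source_id: str) -> Optional[str]:
    """Return the status of the most recently started sync job, or None if there is none."""
    response = kendra_client.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    history = response.get("History", [])
    if not history:
        return None
    # Pick the newest job explicitly instead of relying on the order of the list
    newest = max(history, key=lambda job: job["StartTime"])
    return newest["Status"]


# Usage against a real index (placeholder IDs, requires AWS credentials):
# import boto3
# kendra_client = boto3.client("kendra", region_name="eu-west-1")
# print(latest_sync_status(kendra_client, "<index-id>", "<data-source-id>"))
```

The client is passed in as an argument, so the helper can be exercised with a stub object when no AWS account is available.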


## **3. Building the RAG pattern using boto3**

We're going to use the ``boto3`` library to interact with AWS Kendra and AWS Bedrock.

[LangChain](https://python.langchain.com) streamlines the process of creating a RAG pattern on AWS quite a bit.     
If you want to see how to build exactly the same RAG pattern using ``LangChain`` instead of ``boto3``, take a look at my other Jupyter notebook: ``rag-with-langchain``.


### **3.1. Import dependencies**

In [19]:
from IPython.display import Markdown, display
import boto3
import json

### **3.2. Given a specific query, retrieve the most relevant documents from Kendra** 

In [20]:
query = 'What are the benefits of using IHttpClientFactory?'

In [21]:
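# Kendra client in the region where the index was deployed
kendra_client = boto3.client("kendra", region_name="eu-west-1")
kendra_index = "03c49eca-c1f4-4e5d-b8d8-f913f02c5b4a"

# Ask Kendra for the three most relevant passages for the query
result = kendra_client.retrieve(
    QueryText=query,
    IndexId=kendra_index,
    PageSize=3,
    PageNumber=1,
)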

chunks = [retrieve_result["Content"] for retrieve_result in result["ResultItems"]]
joined_chunks = "\n".join(chunks)
print(joined_chunks)

The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency. • HttpClient already has the concept 
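
Passages returned by Kendra can repeat or overlap, and the joined text goes straight into the prompt. The sketch below shows one possible way to tidy the context before building the prompt: it skips exact repeats and stops once a character budget is reached. `build_context` and its `max_chars` default are assumptions made for illustration, and only exact duplicates are dropped, so partially overlapping passages are kept.

```python
from typing import Dict, List


def build_context(result_items: List[Dict[str, str]], max_chars: int = 8000) -> str:
    """Join passage texts, skipping exact repeats and stopping at max_chars."""
    seen = set()
    kept: List[str] = []
    total = 0
    for item in result_items:
        text = item.get("Content", "").strip()
        if not text or text in seen:
            continue
        # The extra character accounts for the newline that separates passages
        extra = len(text) + (1 if kept else 0)
        if total + extra > max_chars:
            break
        seen.add(text)
        kept.append(text)
        total += extra
    return "\n".join(kept)


# Usage with the response from the cell above:
# joined_chunks = build_context(result["ResultItems"])
```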

### **3.3. Construct the prompt using the documents acquired from AWS Kendra**

In [22]:
prompt = f"""
Human: Answer the following question based on the context below.
If you don't know the answer, just say that you don't know. Don't try to make up an answer. Do not answer beyond this context.
---
QUESTION: {query}                                            
---
CONTEXT:
{joined_chunks}

Assistant:"""

display(Markdown(prompt))


Human: Answer the following question based on the context below.
If you don't know the answer, just say that you don't know. Don't try to make up an answer. Do not answer beyond this context.
---
QUESTION: What are the benefits of using IHttpClientFactory?                                            
---
CONTEXT:
The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency. • HttpClient already has the concept of delegating handlers that could be linked together for outgoing HTTP requests. You can register HTTP clients into the factory and you can use a Polly handler to use Polly policies for Retry, CircuitBreakers, and so on.
To address the issues mentioned above and to make HttpClient instances manageable, .NET Core 2.1 introduced two approaches, one of them being IHttpClientFactory. It’s an interface that’s used to configure and create HttpClient instances in an app through Dependency Injection (DI). It also provides extensions for Polly-based middleware to take advantage of delegating handlers in HttpClient. The alternative is to use SocketsHttpHandler with configured PooledConnectionLifetime. This approach is applied to long-lived, static or singleton HttpClient instances. To learn more about different strategies, see HttpClient guidelines for .NET. Polly is a transient-fault-handling library that helps developers add resiliency to their applications, by using some pre-defined policies in a fluent and thread-safe manner. Benefits of using IHttpClientFactory The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits: • Provides a central location for naming and configuring logical HttpClient objects. For example, you may configure a client (Service Agent) that’s pre-configured to access a specific microservice. • Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency.
If you need to use HttpClient without DI or with other DI implementations, consider using a static or singleton HttpClient with PooledConnectionLifetime set up. For more information, see HttpClient guidelines for .NET. Multiple ways to use IHttpClientFactory There are several ways that you can use IHttpClientFactory in your application: • Basic usage • Use Named Clients • Use Typed Clients • Use Generated Clients For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern). However, all options are documented and are currently listed in this article covering the IHttpClientFactory usage. Multiple ways to use IHttpClientFactory There are several ways that you can use IHttpClientFactory in your application: • Basic usage • Use Named Clients • Use Typed Clients • Use Generated Clients For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern). However, all options are documented and are currently listed in this article covering the IHttpClientFactory usage. Note If your app requires cookies, it might be better to avoid using IHttpClientFactory in your app. For alternative ways of managing clients, see Guidelines for using HTTP clients How to use Typed Clients with IHttpClientFactory So, what’s a “Typed Client”? It’s just an HttpClient that’s pre-configured for some specific use. This configuration can include specific values such as the base server, HTTP headers or time outs. 
The following diagram shows how Typed Clients are used with IHttpClientFactory: https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines https://docs.microsoft.com/aspnet/core/fundamentals/http-requests#consumption-patterns https://docs.microsoft.com/dotnet/api/system.net.http.ihttpclientfactory https://docs.microsoft.com/dotnet/fundamentals/networking/http/httpclient-guidelines 302 CHAPTER 7 | Implement resilient applications Figure 8-4.

Assistant:
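
The `Human:` and `Assistant:` markers in this prompt come from the text-completion format used by Anthropic Claude models, which expects each turn marker to be preceded by two newlines. Titan and Jurassic-2 do not need these markers, although the notebook sends the same prompt to all three models. As a hedged sketch, the prompt can be produced by a small function that makes the Claude turn markers explicit. `build_claude_prompt` is a hypothetical helper, not part of the original notebook.

```python
def build_claude_prompt(query: str, context: str) -> str:
    """Build a Claude text-completion prompt from a question and retrieved context."""
    instructions = (
        "Answer the following question based on the context below.\n"
        "If you don't know the answer, just say that you don't know. "
        "Don't try to make up an answer. Do not answer beyond this context."
    )
    return (
        f"\n\nHuman: {instructions}\n"
        f"---\nQUESTION: {query}\n"
        f"---\nCONTEXT:\n{context}"
        "\n\nAssistant:"
    )


# Usage with the variables defined earlier in the notebook:
# prompt = build_claude_prompt(query, joined_chunks)
```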

### **3.4. Send the generated prompt to AWS Bedrock Titan Text G1 Express LLM and receive the corresponding response**

In [23]:
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Titan expects the prompt under "inputText" and the sampling options under "textGenerationConfig"
body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 1500,
        "temperature": 0.7,
    },
})

response = bedrock_client.invoke_model(body=body, modelId="amazon.titan-text-express-v1")
result = json.loads(response.get("body").read())

display(Markdown(result.get('results')[0].get('outputText')))

 The current implementation of IHttpClientFactory, that also implements IHttpMessageHandlerFactory, offers the following benefits:
* Provides a central location for naming and configuring logical HttpClient objects.
* Codify the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly’s policies for resiliency.
* HttpClient already has the concept of delegating handlers that could be linked together for outgoing HTTP requests.
* You can register HTTP clients into the factory and you can use a Polly handler to use Polly policies for Retry, CircuitBreakers, and so on.

### **3.5. Send the generated prompt to AWS Bedrock Anthropic Claude v2 LLM and receive the corresponding response**

In [24]:
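bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Claude v2 takes the prompt under "prompt" and caps the output length with "max_tokens_to_sample"
body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 1500,
    "temperature": 0.7,
})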

response = bedrock_client.invoke_model(
    body=body, 
    modelId='anthropic.claude-v2'
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completion"))

 Based on the provided context, some key benefits of using IHttpClientFactory include:

- It provides a central location for configuring and naming logical HttpClient instances. This makes it easier to manage and reuse HttpClients across an application.

- It allows codifying the concept of outgoing middleware via delegating handlers in HttpClient. This makes it easy to integrate Polly policies for resiliency.

- It enables registering HttpClients into the factory and using Polly handlers to apply resiliency policies like retry, circuit breakers, etc. 

- It offers multiple ways to use it - basic usage, named clients, typed clients, generated clients. The typed client approach provides a clean way to define HttpClients for specific use cases.

- Compared to using static/singleton HttpClient, it provides better lifetime management, configuration and middleware integration.

So in summary, it makes working with HttpClient more structured, maintainable and resilient.


### **3.6. Send the generated prompt to AWS Bedrock AI21 Labs Jurassic-2 Ultra LLM and receive the corresponding response**

In [25]:
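bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Jurassic-2 takes the prompt under "prompt" and caps the output length with "maxTokens"
body = json.dumps({
    "prompt": prompt,
    "maxTokens": 1500,
    "temperature": 0.7,
})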

response = bedrock_client.invoke_model(
    body=body, 
    modelId='ai21.j2-ultra-v1'
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completions")[0].get("data").get("text"))

 Based on the context provided, the benefits of using IHttpClientFactory include:

1. Providing a central location for naming and configuring logical HttpClient objects. For example, you can preconfigure a client that is specifically configured to access a specific microservice.
2. Codifying the concept of outgoing middleware via delegating handlers in HttpClient and implementing Polly-based middleware to take advantage of Polly's policies for resiliency.
3. Managing HttpClient instances through Dependency Injection (DI).

Please note that if your application requires cookie management, it may be better to avoid using IHttpClientFactory and instead use an alternative method for managing your HttpClient clients.
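
The three cells above differ only in the shape of the request body and in where the generated text sits in the response. One way to capture that, sketched here under the assumption that the request and response shapes are exactly the ones used in sections 3.4 to 3.6, is a lookup table keyed by model ID. `ADAPTERS` and `ask` are hypothetical names introduced for this example.

```python
import json
from typing import Any, Callable, Dict, Tuple

# model ID -> (request-body builder, response-text extractor)
ADAPTERS: Dict[str, Tuple[Callable[[str], dict], Callable[[dict], str]]] = {
    "amazon.titan-text-express-v1": (
        lambda p: {
            "inputText": p,
            "textGenerationConfig": {"maxTokenCount": 1500, "temperature": 0.7},
        },
        lambda r: r["results"][0]["outputText"],
    ),
    "anthropic.claude-v2": (
        lambda p: {"prompt": p, "max_tokens_to_sample": 1500, "temperature": 0.7},
        lambda r: r["completion"],
    ),
    "ai21.j2-ultra-v1": (
        lambda p: {"prompt": p, "maxTokens": 1500, "temperature": 0.7},
        lambda r: r["completions"][0]["data"]["text"],
    ),
}


def ask(bedrock_client: Any, model_id: str, prompt: str) -> str:
    """Send `prompt` to the given Bedrock model and return the generated text."""
    build_body, extract_text = ADAPTERS[model_id]
    response = bedrock_client.invoke_model(
        body=json.dumps(build_body(prompt)),
        modelId=model_id,
    )
    payload = json.loads(response["body"].read())
    # strip() removes the leading space that the models tend to emit
    return extract_text(payload).strip()


# Usage with the client and prompt defined earlier in the notebook:
# for model_id in ADAPTERS:
#     print(model_id, "->", ask(bedrock_client, model_id, prompt))
```

Supporting another model then means adding one entry to the table rather than another copy of the cell.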
