# RAG using OpenSearch Neural Plugin

[Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) is a process that improves natural language generation by retrieving relevant information and incorporating it into the generation process. It combines an embedding model used for retrieval, such as Amazon Titan Text Embeddings on Amazon Bedrock, with a generative model such as Anthropic Claude.

Much like the previous lab, we are going to write a simple RAG application that allows users to ask questions about various wines so they can make a purchasing decision. In this lab, however, we will use the OpenSearch neural search plugin, which makes it simple to work with vector embeddings by generating embeddings for the data at indexing time.

## 1. Lab Pre-requisites

#### a. Download and install Python dependencies

For this notebook we require the use of a few libraries. We'll use the Python clients for OpenSearch and SageMaker, and Python frameworks for text embeddings.

In [None]:
!pip install opensearch-py-ml accelerate tqdm --quiet
!pip install sagemaker --upgrade --quiet
!pip install requests_aws4auth --quiet
!pip install deprecated --quiet

# OpenSearch Python SDK
!pip install opensearch_py --quiet
# Progress bar used while bulk-loading documents
!pip install alive-progress --quiet

# As in the previous modules, let's import PyTorch and confirm that a recent version is running.
# The version should already be 1.13.1 or higher. If not, we restart the kernel.

import torch
from packaging import version
from IPython.display import display_html

pytorch_version = torch.__version__
print(f"PyTorch version: {pytorch_version}")

def restart_kernel():
    # Restart the kernel through the classic Jupyter Notebook JavaScript API
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

if version.parse(pytorch_version) < version.parse("1.13.1"):
    restart_kernel()

#### b. Import libraries & initialize resource information
The cell below imports all the relevant libraries and modules used in this notebook.

In [None]:
import boto3
import os
import time
import json
import pandas as pd
from tqdm import tqdm
import sagemaker
from opensearchpy import OpenSearch, RequestsHttpConnection
from sagemaker import get_execution_role
import random 
import string
import s3fs
from urllib.parse import urlparse
from IPython.display import display, HTML
from alive_progress import alive_bar
from opensearch_py_ml.ml_commons import MLCommonClient
from requests_aws4auth import AWS4Auth
import requests 

#### Get CloudFormation stack output variables

We have preconfigured a few resources by creating an AWS CloudFormation stack in the account. This lab uses information about these resources, so we load the relevant stack outputs here.

You can ignore any "PythonDeprecationWarning" warnings.

In [None]:
# Create a Boto3 session
session = boto3.Session()

# Get the account id
account_id = boto3.client('sts').get_caller_identity().get('Account')

# Get the current region
region = session.region_name

cfn = boto3.client('cloudformation')

# Method to obtain output variables from Cloudformation stack. 
def get_cfn_outputs(stackname):
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

## Setup variables to use for the rest of the demo
cloudformation_stack_name = "advanced-opensearch-rag"

outputs = get_cfn_outputs(cloudformation_stack_name)
aos_host = outputs['OpenSearchDomainEndpoint']
s3_bucket = outputs['s3BucketTraining']
bedrock_inf_iam_role = outputs['BedrockBatchInferenceRole']
bedrock_inf_iam_role_arn = outputs['BedrockBatchInferenceRoleArn']
sagemaker_notebook_url = outputs['SageMakerNotebookURL']

# We will just print all the variables so you can easily copy if needed.
outputs

## 2. Prepare data
This lab combines semantic search with a generative model to present the retrieved data to the user. We use a dataset of wine reviews and sample it to recommend wines that resemble the description the user provides.

#### Mandatory steps to download the data manually
Within these labs you will need to download datasets from various sources. This lab uses Kaggle (you will need to create a free account):
https://www.kaggle.com/datasets/christopheiv/winemagdata130k?select=winemag-data-130k-v2.json

Click the **Download** button on the dataset page above. Once the file is downloaded to your laptop, continue with the following steps.

1. Execute the following cell to get the URL of the SageMaker notebook instance. Click the URL to open it.

In [None]:
link = f'<a href="{sagemaker_notebook_url}" target="_blank">Sagemaker notebook URL</a>'
display(HTML(link))

2. Browse to the `retrieval-augment-generation` directory.
3. Click **Upload** to upload the zip file downloaded from Kaggle.
4. Click **New** -> **Terminal** to open a terminal window.
5. Navigate to the `SageMaker/advanced-rag-amazon-opensearch/retrieval-augment-generation` directory using the following command:
```
cd SageMaker/advanced-rag-amazon-opensearch/retrieval-augment-generation
```

6. Unzip the uploaded zip file using the following command:

```
unzip archive.zip
```

Make sure the unzipped file `winemag-data-130k-v2.json` is in the same directory as this notebook.

After downloading and extracting the json file, execute the following cells to inspect the dataset, transform it into a pandas DataFrame, and sample a subset of the data.

#### Sampling a subset of the records to load into OpenSearch quickly
Since the dataset contains about 130,000 records, converting all of them into vectors and loading them into a vector store could take some time. Therefore, we take a subset of 300 records. We also add a `record_id` column that numbers the sampled records from 1 to 300.

In [None]:
#Following code will not work without completing the above steps 
df = pd.read_json('winemag-data-130k-v2.json')
df_sample = df.sample(300,random_state=37).reset_index()
df_sample['record_id'] = range(1, len(df_sample) + 1)
df_sample[:5]

## 3. Create a connection with OpenSearch cluster.
Next, we'll use the OpenSearch Python client to set up a connection with the OpenSearch cluster.

#### Important pre-requisite
You should have followed the steps in the lab instructions to map the SageMaker notebook IAM role to the OpenSearch `ml_full_access` role. If not, please visit the lab instructions and complete the **Setting up permission for Notebook IAM Role** section.

#### Retrieving credentials from Secrets manager
We are going to use the SageMaker notebook IAM role to configure the model connector and model in OpenSearch. This IAM role has permission to pass the BedrockInference IAM role to OpenSearch, and OpenSearch then uses the BedrockInference role to call Bedrock models. The OpenSearch client itself authenticates with the username and password stored in AWS Secrets Manager.

##### NOTE: 
_If at any point in this exercise you get the failure message **The security token included in the request is expired.**, you can resolve it by running this cell._ The cell refreshes the security credentials we use throughout the rest of the lab.

In [None]:
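# Read the OpenSearch master user credentials stored in AWS Secrets Manager
secrets_client = boto3.client('secretsmanager')
aos_credentials = json.loads(secrets_client.get_secret_value(SecretId=outputs['OpenSearchSecret'])['SecretString'])

# Alternative: sign requests with the notebook IAM role instead of a username and password
#from opensearchpy import AWSV4SignerAuth
#credentials = boto3.Session().get_credentials()
#auth = AWSV4SignerAuth(credentials, region)
auth = (aos_credentials['username'], aos_credentials['password'])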

aos_client = OpenSearch(
    hosts = [{'host': aos_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)
ml_client = MLCommonClient(aos_client)

#initializing some variables that we will use later.

connector_id = ""
model_id = ""

## 4. Create and deploy a model connector to Amazon Bedrock Titan Text Embeddings V2

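The following cell defines the connector configuration and creates a connector to the Amazon Bedrock Titan Text Embeddings V2 model, signing the request with the notebook IAM role. The connector tells OpenSearch which Bedrock endpoint to call, which IAM role to use for the call, and how to build the request and parse the response.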

In [None]:
import boto3
import requests
from requests_aws4auth import AWS4Auth
import json

# Sign ML Commons API requests with the notebook IAM role (SigV4).
# Defined outside the if-block because the register and deploy cells also use awsauth and headers.
host = f'https://{aos_host}/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = {"Content-Type": "application/json"}

if not connector_id:
    # Create the connector
    path = '_plugins/_ml/connectors/_create'
    url = host + path

    payload = {
        "name": "Amazon Bedrock Connector: embedding",
        "description": "The connector to bedrock Titan embedding model",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {
            "roleArn": f"arn:aws:iam::{account_id}:role/{bedrock_inf_iam_role}"
        },
        "parameters": {
            "region": region,
            "service_name": "bedrock",
            "model": "amazon.titan-embed-text-v2:0"
        },
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
                "headers": {
                    "content-type": "application/json",
                    "x-amz-content-sha256": "required"
                },
                "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
                "pre_process_function": "connector.pre_process.bedrock.embedding",
                "post_process_function": "connector.post_process.bedrock.embedding"
            }
        ]
    }

    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    print(r.text)
    r.raise_for_status()  # surface permission or validation errors before reading the body
    connector_id = r.json()["connector_id"]
else:
    print(f"Connector already exists - {connector_id}")

connector_id
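The connector's `url` and `request_body` are templates. As we understand it, at prediction time ML Commons fills each `${parameters.<name>}` placeholder from the connector `parameters` and from the prediction input, with `connector.pre_process.bedrock.embedding` supplying `inputText`. The sketch below imitates only the substitution step locally, so you can see roughly what request Bedrock receives. `render_template` is a hypothetical helper (not part of OpenSearch or `opensearch-py-ml`), and `us-east-1` is only an example region.

```python
import json
import re

# Hypothetical helper: a rough local imitation of how ${parameters.<name>}
# placeholders in a connector template get filled in. Not the ML Commons code.
PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z0-9_]+)\}")


def render_template(template, parameters):
    """Replace every ${parameters.<name>} with str(parameters[name])."""
    def substitute(match):
        # Raises KeyError if the template references a parameter we don't have
        return str(parameters[match.group(1)])
    return PLACEHOLDER.sub(substitute, template)


# The same templates as in the connector definition above
url_template = "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke"
body_template = "{ \"inputText\": \"${parameters.inputText}\" }"

params = {
    "region": "us-east-1",  # example value; the lab uses the notebook's region
    "model": "amazon.titan-embed-text-v2:0",
    "inputText": "food for my wine",
}

print(render_template(url_template, params))
print(json.loads(render_template(body_template, params)))
```

This naive substitution does no JSON escaping, so in this sketch an input text containing a double quote would produce an invalid request body.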

Once the model connector is defined, we need to register and deploy the model. The following two cells register a model that uses the connector and then deploy it.

In [None]:
# Register the model
if not model_id:
    path = '_plugins/_ml/models/_register'
    url = 'https://'+aos_host + '/' + path
    payload = { "name": "Bedrock Titan embeddings model",
    "function_name": "remote",
    "description": "Bedrock Titan text embeddings model",
    "connector_id": connector_id}
    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    model_id = json.loads(r.text)["model_id"]
else:
    print("skipping model registration - model already exists")
print("Model registered under model_id: "+model_id)


In [None]:
# Deploy the model
path = '_plugins/_ml/models/'+model_id+'/_deploy'
url = 'https://'+aos_host + '/' + path
r = requests.post(url, auth=awsauth, headers=headers)
deploy_status = json.loads(r.text)["status"]
print("Deployment status of the model, "+model_id+" : "+deploy_status)

#### Create a test embedding to confirm the model is deployed and working

In [None]:
input_sentences = ["food for my wine"]
embedding_output = ml_client.generate_embedding(f"{model_id}", input_sentences)
embed = embedding_output['inference_results'][0]['output'][0]['data']
print(embed[:5])

## 5. Create ingest pipeline
Let's create an ingest pipeline that calls the Amazon Bedrock Titan model to convert the `description` field of each wine review into a vector embedding. An ingest pipeline is an OpenSearch concept that lets you define actions to perform at data ingestion time. You can do simple processing, such as adding a static field or modifying an existing field, or call a remote model and store the inference output together with the indexed document.

The following ingest pipeline calls our remote model to convert the wine review `description` field to a vector and stores it in the field called `description_embedding`.
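To make the processor's effect concrete, here is a minimal local sketch of what a `text_embedding` processor does to each document: for every `source -> target` pair in `field_map`, it stores the embedding of the source text in the target field. `apply_text_embedding` and `toy_embed` are hypothetical stand-ins; in the lab the processor runs inside OpenSearch and calls the deployed Titan model. This sketch simply skips documents where the source field is missing or empty.

```python
from typing import Callable, Dict, List


def apply_text_embedding(doc: Dict, field_map: Dict[str, str],
                         embed_fn: Callable[[str], List[float]]) -> Dict:
    """Return a copy of doc with an embedding added for each mapped text field."""
    enriched = dict(doc)  # leave the caller's document unchanged
    for source_field, target_field in field_map.items():
        text = doc.get(source_field)
        if isinstance(text, str) and text:
            enriched[target_field] = embed_fn(text)
    return enriched


def toy_embed(text: str) -> List[float]:
    """Toy 'embedding' for illustration only: character count and word count."""
    return [float(len(text)), float(text.count(" ") + 1)]


doc = {"description": "Ripe cherry and smooth tannins", "points": 90}
print(apply_text_embedding(doc, {"description": "description_embedding"}, toy_embed))
```

Against the real cluster, the ingest `_simulate` API, for example `aos_client.ingest.simulate(id="wine-ingest-pipeline", body={"docs": [{"_source": {"description": "Ripe cherry and smooth tannins"}}]})`, should show the actual processor output without indexing anything; treat that call as an untested suggestion.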
 

In [None]:
# Create (or overwrite) the ingest pipeline. Its text_embedding processor sends each
# document's `description` to the deployed model and stores the result in `description_embedding`.
payload = {
  "description": "A wine index ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": f"{model_id}",
        "field_map": {
          "description": "description_embedding"
        }
      }
    }
  ]
}

aos_client.ingest.put_pipeline(id="wine-ingest-pipeline", body=payload)



## 6. Create an index in Amazon OpenSearch Service
Now we will define our index for wine reviews. We are going to define several fields from the original data and a k-NN field that will store the description embeddings. The `default_pipeline` setting routes every indexed document through `wine-ingest-pipeline`, so the embeddings are generated automatically.

To create the index, we first define it in JSON, then use the `aos_client` connection we initiated earlier to create the index in OpenSearch.
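The mapping in the next cell stores 1024-dimensional vectors with `space_type: innerproduct`. As we understand the Bedrock model parameter documentation, Titan Text Embeddings V2 returns 1024 dimensions and normalizes its output by default, and for unit-length vectors the inner product equals cosine similarity, so ranking by inner product behaves like cosine ranking. The pure-Python sketch below checks that identity on two small example vectors; the helper names are illustrative only.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# For unit-length vectors, the inner product equals the cosine similarity (11/15 here)
print(dot(l2_normalize(a), l2_normalize(b)), cosine(a, b))
```

If you changed the connector to request a different `dimensions` value or to disable `normalize`, the `dimension` and `space_type` in the mapping would need to change to match.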

In [None]:
knn_index = {
    "settings": {
        "index.knn": True,
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "default_pipeline": "wine-ingest-pipeline", 
        "analysis": {
          "analyzer": {
            "default": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
    },
    "mappings": {
        "properties": {
            "description_embedding": {
                    "type": "knn_vector",
                    "dimension": 1024,
                    "method": {
                        "name": "hnsw",
                        "space_type": "innerproduct",
                        "engine": "faiss",
                        "parameters": {
                          "ef_construction": 256,
                          "m": 48
                        }
                      }
                    },
            "description": {
                "type": "text",
                "store": True
            },
            "designation": {
                "type": "text",
                "store": True
            },
            "variety": {
                "type": "text",
                "store": True
            },
            "country": {
                "type": "text",
                "store": True
            },
            "winery": {
                "type": "text",
                "store": True
            },
            "points": {
                "type": "integer",
                "store": True
            },
        }
    }
}


Using the above index definition, we now need to create the index in Amazon OpenSearch. Running this cell will recreate the index if you have already executed this notebook.

In [None]:
index_name = "wine_knowledge_base"

# Delete the index if it is left over from a previous run, then create it from knn_index
if aos_client.indices.exists(index=index_name):
    print("Recreating index '" + index_name + "' on cluster.")
    aos_client.indices.delete(index=index_name)
else:
    print("Index '" + index_name + "' not found. Creating index on cluster.")
aos_client.indices.create(index=index_name, body=knn_index)


Let's verify the created index information

In [None]:
aos_client.indices.get(index=index_name)

## 7. Load the raw data into the Index
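Next, let's load the wine review data into the index we've just created. We don't compute embeddings in this cell: because `wine-ingest-pipeline` is the index's `default_pipeline`, OpenSearch calls the Bedrock model at ingestion time and stores each embedding in the `description_embedding` field, which is later used for k-NN search.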
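The `_bulk` API takes newline-delimited JSON: an action line such as `{"index": {"_index": ...}}` followed by the document source, with every line, including the last, ending in a newline. The hypothetical generator `bulk_bodies` below shows that format and the batching as a standalone function, including the final partial batch that a fixed batch size can leave behind. The documents carry no vectors; the index's default pipeline adds `description_embedding`.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_bodies(docs: Iterable[Dict], index_name: str, batch_size: int = 100) -> Iterator[str]:
    """Yield _bulk request bodies containing at most batch_size documents each."""
    action = json.dumps({"index": {"_index": index_name}})
    lines: List[str] = []
    count = 0
    for doc in docs:
        lines.append(action)          # what to do with the next document
        lines.append(json.dumps(doc)) # the document source itself
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # bulk bodies must end with a newline
            lines, count = [], 0
    if lines:  # flush the final, partially filled batch
        yield "\n".join(lines) + "\n"


docs = [{"description": f"wine {i}", "points": 80 + i % 20} for i in range(250)]
bodies = list(bulk_bodies(docs, "wine_knowledge_base"))
print(len(bodies), [body.count("\n") // 2 for body in bodies])
```

Each yielded string could be passed as the `body` argument of `aos_client.bulk(...)`.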

In [None]:
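batch_size = 100
cnt = 0
batch = 0
action = json.dumps({ "index": { "_index": index_name } })
body_ = ''

def send_batch(body):
    # Send one _bulk request and report per-document failures, if any
    response = aos_client.bulk(index=index_name, body=body)
    if response.get("errors"):
        print("Some documents in this batch failed to index")
    return response

with alive_bar(len(df_sample), force_tty = True) as bar:
    for index, record in df_sample.iterrows():

        payload = {
           "description": record["description"],
           "points": record["points"],
           "variety": record["variety"],
           "country": record["country"],
           "designation": record["designation"],
           "winery": record["winery"]
        }
        body_ = body_ + action + "\n" + json.dumps(payload) + "\n"
        cnt = cnt + 1

        if cnt == batch_size:
            send_batch(body_)
            cnt = 0
            batch = batch + 1
            body_ = ''

        bar()

# Send any remaining records that did not fill a complete batch
if body_:
    send_batch(body_)
    batch = batch + 1

print("Total Bulk batches completed: " + str(batch))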

To validate the load, we'll query the number of documents in the index. We should have 300 documents, or however many records were sampled earlier.

In [None]:
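# Make the newly indexed documents visible to search before counting them
aos_client.indices.refresh(index=index_name)
res = aos_client.search(index=index_name, body={"query": {"match_all": {}}})
print("Records found: %d." % res['hits']['total']['value'])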

## 8. Search vector with "Semantic Search" 

Now we can define a helper function that executes the search query to find the wines whose reviews most closely match the requested description. `retrieve_opensearch_with_semantic_search` sends a `neural` query, which embeds the search phrase with the deployed model, searches the index for the closest matching vectors, and returns the top `n` results.
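Conceptually, the `neural` query embeds `query_text` with the model identified by `model_id` and then asks the k-NN index for the `k` stored vectors most similar to it. The index itself uses approximate HNSW search in Faiss; the sketch below swaps in exact brute-force ranking by inner product, with toy 3-dimensional vectors standing in for the 1024-dimensional Titan embeddings, purely to illustrate the idea. `top_k_by_inner_product` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def top_k_by_inner_product(query_vec: List[float],
                           vectors: Dict[str, List[float]],
                           k: int) -> List[Tuple[str, float]]:
    """Exact nearest neighbours: score every vector by inner product, keep the best k."""
    scored = [
        (doc_id, sum(q * v for q, v in zip(query_vec, vec)))
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy unit-length "embeddings" standing in for stored description_embedding vectors
catalog = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.6, 0.8, 0.0],
}
query = [0.8, 0.6, 0.0]  # pretend this is the embedding of the search phrase

print(top_k_by_inner_product(query, catalog, k=2))
```

Brute force is exact but scores every document; HNSW trades a little recall for much faster search on large indexes.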


In [None]:
def retrieve_opensearch_with_semantic_search(phrase, n=3):
    osquery = {
        # Don't return the 1024-dimension embeddings with each hit
        "_source": {
            "excludes": ["description_embedding"]
        },
        "size": n,
        "query": {
            "neural": {
                "description_embedding": {
                    "query_text": phrase,
                    "model_id": model_id,
                    # Number of nearest neighbours to retrieve; keep it at least n
                    "k": max(n, 5)
                }
            }
        }
    }

    res = aos_client.search(index=index_name, body=osquery)
    top_result = res['hits']['hits']

    results = []

    for entry in top_result:
        source = entry['_source']
        result = {
            "description": source['description'],
            "winery": source['winery'],
            "points": source['points'],
            "designation": source['designation'],
            "country": source['country'],
            "variety": source['variety'],
        }
        results.append(result)

    return results


Use semantic search to get records similar to the sample question.

In [None]:
question_on_wine = "great tasting wine for thanks giving"
example_request = retrieve_opensearch_with_semantic_search(question_on_wine)
print(json.dumps(example_request, indent=4))

## 9. Prepare a method to call Amazon Bedrock - Anthropic Claude Sonnet model

Now we will define a function that calls an LLM to answer the user's question. Because an LLM is trained on static data, it does not know about our wine reviews. It may still be able to answer, but not necessarily in the way a business prefers. For example, in our case we would not want to recommend a wine that we do not stock, so the recommendation has to be one of the wines in our collection, i.e. the 300 reviews that we loaded.

After defining this function, we will call it to see how the LLM answers questions without the wine review data.

In [None]:
def query_llm_endpoint_with_json_payload(encoded_json):

    # Create a Bedrock Runtime client
    bedrock_client = boto3.client('bedrock-runtime')
    # Set the model ID for Claude 3 Sonnet
    model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
    accept = 'application/json'
    content_type = 'application/json'

    try:
        # Invoke the model with the native request payload (a JSON string)
        response = bedrock_client.invoke_model(
            modelId=model_id,
            body=encoded_json.encode("utf-8"),
            accept=accept,
            contentType=content_type
        )

        # Decode the response body
        response_body = json.loads(response.get('body').read())
        return response_body
    except Exception as e:
        print(f"Error: {e}")
        return None

def query_llm(system, user_question, **inference_params):

    # Prepare the model's payload.
    # Claude 3 Sonnet can generate at most 4096 output tokens.
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "system": system,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"{user_question}"
                    }
                ]
            }
        ]
    }
    # Optional extra parameters such as temperature or top_p are added to the body as-is
    body.update(inference_params)

    query_response = query_llm_endpoint_with_json_payload(json.dumps(body))
    if query_response is None:
        raise RuntimeError("Bedrock invocation failed; see the error printed above.")

    return query_response['content'][0]['text']


Let's check the wine recommendation the LLM generates without retrieving our reviews. It may not be one of the wines that we stock.

In [None]:
def query_llm_without_rag(question):

    # For Claude, the system prompt defines the role you want the model to play
    system_prompt = "You are a sommelier that uses their vast knowledge of wine to make great recommendations people will enjoy."
    
    # User prompt contains the instructions for the model.
    user_prompt = f"As a sommelier, you must include the wine variety, the country of origin, and a colorful description relating to the customer question.\n Customer Question:{question}. Please provide name of the wine at the end of the answer, in a new line, in format Wine name: <wine name>"
    return query_llm(system_prompt, user_prompt)

question_on_wine = "Which Portuguese wine goes well with a steak?"

print(f"The recommended wine from LLM without RAG: \n\n{query_llm_without_rag(question_on_wine)}\n")

#### Testing for hallucination
Let's copy the wine name from the last line of the answer and paste it into the `wine_name` variable below to see whether we have this wine in stock. Review the list of wines that are returned: they may be from Portugal, but they are not necessarily the wine the model recommended.
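Because the prompt asks the model to end with a line in the format `Wine name: <wine name>`, you could also extract the name programmatically instead of copying it. Semantic search always returns the nearest reviews even when the exact wine is absent, so a lexical comparison against the catalog is a useful complementary check. The sketch below is a hypothetical pair of helpers that swaps in `difflib` fuzzy matching on `winery` plus `designation`; the regex assumes the model followed the requested format exactly, and the matching is case-sensitive.

```python
import difflib
import re
from typing import Dict, List, Optional


def extract_wine_name(answer: str) -> Optional[str]:
    """Return the text after the last 'Wine name:' line, or None if there is none."""
    matches = re.findall(r"^\s*Wine name:\s*(.+?)\s*$", answer, flags=re.MULTILINE)
    return matches[-1] if matches else None


def closest_catalog_entries(name: str, catalog: List[Dict], cutoff: float = 0.6) -> List[str]:
    """Fuzzy-match a wine name against 'winery designation' labels built from the catalog."""
    labels = []
    for wine in catalog:
        designation = wine.get("designation")
        # Missing designations may be None or NaN, so keep only real strings
        if not isinstance(designation, str):
            designation = ""
        labels.append(f"{wine['winery']} {designation}".strip())
    return difflib.get_close_matches(name, labels, n=3, cutoff=cutoff)


# Hypothetical LLM answer and a tiny hypothetical catalog
answer = "A bold red with dark fruit.\nWine name: Quinta do Vesuvio Touriga Nacional"
catalog = [
    {"winery": "Darioush", "designation": None},
    {"winery": "Quinta do Crasto", "designation": "Reserva"},
]
name = extract_wine_name(answer)
print(name, closest_catalog_entries(name, catalog))
```

In the notebook, `closest_catalog_entries(name, df_sample.to_dict("records"))` would compare the name against the sampled reviews; an empty list suggests the recommended wine is not in stock.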

In [None]:
wine_name = "Quinta do Vesuvio Touriga Nacional"
example_request = retrieve_opensearch_with_semantic_search(wine_name)
print(json.dumps(example_request, indent=4))

## 10. Retrieval Augmented Generation
---
To reduce LLM hallucination, we can give the LLM more context so that it can ground its answer in that information and generate a factual result. RAG is one way to mitigate LLM hallucination.


#### Create a prompt for the LLM using the search results from OpenSearch (RAG)

We will use the Anthropic Claude 3 Sonnet model with a one-shot prompting technique. In the prompt's instructions, we provide a sample wine review and an example of how the model should use it to answer the user's question. At the end of the prompt, we include the wine reviews retrieved from OpenSearch for the model to use.

Before querying the model, `query_llm_with_rag` puts together the user prompt: it uses the input string to search the OpenSearch cluster for matching wines, then composes the user prompt for the LLM from the results.

The system prompt defines the role that the LLM plays.

The user prompt contains the instructions and the context information that the LLM uses to answer the user's question.

The prompt is in the following format:

**SYSTEM PROMPT:**

```
You are a sommelier that uses their vast knowledge of wine to make great recommendations people will enjoy. 
```


**USER PROMPT:**
```
As a sommelier, you must include the wine variety, the country of origin, and a colorful description relating to the user question. You must pick a wine from the "Wine data" section only, the one that best matches the customer question.

Wine data: {'description': 'This perfumey white dances in intense and creamy layers of stone fruit and vanilla, remaining vibrant and balanced from start to finish. The generous fruit is grown in the relatively cooler Oak Knoll section of the Napa Valley. This should develop further over time and in the glass.', 'winery': 'Darioush', 'points': 92, 'designation': None, 'country': 'US'}

Recommendation: I have a wonderful wine for you. It's a dry, medium bodied white wine from Darioush winery in the Oak Knoll section of Napa Valley, US. It has flavors of vanilla and oak. It scored 92 points in Wine Enthusiast.

Wine data: {retrieved_documents}

Customer Question: {user_question}
```
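`query_llm_with_rag` interpolates the retrieved list of Python dictionaries directly into the prompt. An alternative some people prefer is to render each retrieved review as a numbered block with one field per line and missing fields dropped, which can make the context easier for the model to scan. `format_wine_context` below is a hypothetical helper that the lab's functions do not use; treat it as an option to experiment with rather than a proven improvement.

```python
import math
from typing import Dict, List

# Order in which fields appear in each rendered block (assumed, adjust as needed)
FIELDS = ["winery", "designation", "variety", "country", "points", "description"]


def _is_missing(value) -> bool:
    """Treat None, empty strings and NaN (pandas' missing value) as absent."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def format_wine_context(docs: List[Dict]) -> str:
    """Render retrieved wine reviews as numbered blocks with one field per line."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        lines = [f"Wine {i}:"]
        for field in FIELDS:
            value = doc.get(field)
            if not _is_missing(value):
                lines.append(f"  {field}: {value}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


docs = [{"winery": "Darioush", "designation": None, "country": "US", "points": 92,
         "description": "Perfumey white with stone fruit and vanilla."}]
print(format_wine_context(docs))
```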



### package the prompt and query the LLM
We will create a final function to query the LLM with the prompt. `query_llm_with_rag` calls the LLM using RAG.

`query_llm_with_rag` combines everything we've done in this module. It does all of the following:
- searches the OpenSearch index with semantic search for relevant wines, using the `description_embedding` vector field
- generates an LLM prompt from the search results
- queries the LLM with the RAG prompt for a response

In [None]:
def query_llm_with_rag(user_question):
    retrieved_documents = retrieve_opensearch_with_semantic_search(user_question)
    one_shot_description_example = "{'description': 'This perfumey white dances in intense and creamy layers of stone fruit and vanilla, remaining vibrant and balanced from start to finish. The generous fruit is grown in the relatively cooler Oak Knoll section of the Napa Valley. This should develop further over time and in the glass.', 'winery': 'Darioush', 'points': 92, 'designation': None, 'country': 'US'}"
    one_shot_response_example = "I have a wonderful wine for you. It's a dry, medium bodied white wine from Darioush winery in the Oak Knoll section of Napa Valley, US. It has flavors of vanilla and oak. It scored 92 points in Wine Enthusiast."
    system_prompt= "You are a sommelier that uses vast knowledge of wine to make great recommendations people will enjoy"
    user_prompt = (
        f"As a sommelier, you must include the wine variety, the country of origin, and a colorful description relating to the user question. You must pick a wine from the \"Wine data\" section only, the one that best matches the customer question. Do not suggest anything outside of the wine data provided. You don't necessarily have to pick the top rated wine if it's not the best fit for the customer question.\n"
        f"Wine data: {one_shot_description_example} \n Recommendation: {one_shot_response_example} \n"
        f"Wine data: {retrieved_documents} \n"
        f"Customer Question: {user_question} \n"        
    )
    response = query_llm(system_prompt, user_prompt)
    return response

#### And finally, let's call the function and get a wine recommendation.

In [None]:
question_on_wine="Best Australian wine that goes great with beef ?"
recommendation = query_llm_with_rag(question_on_wine)
print(recommendation)

print(f"\n\ndocuments retrieved for above recommendations were \n\n{json.dumps(retrieve_opensearch_with_semantic_search(question_on_wine), indent=4)}")

#### Let's change it to an Italian wine; it should produce a matching result.
We will call the same function again to see whether there is an Italian wine in our catalog.

In [None]:
question_on_wine="Best Italian wine that goes great with steak?"
recommendation = query_llm_with_rag(question_on_wine)
print(recommendation)

print(f"\n\ndocuments retrieved for above recommendations were \n\n{json.dumps(retrieve_opensearch_with_semantic_search(question_on_wine), indent=4)}")

You might notice that we asked for an Australian wine that goes well with beef, and we do not have any such wine in our collection, so the model politely says it cannot recommend one. You may change the question and see how the LLM recommends a wine from our selection that best suits your question.

### Additional info: changing kwargs for querying the LLM
If you want to change or add parameters for querying the LLM, you can add keyword arguments to the `query_llm` function and merge them into the request body. For example, to change the `temperature` value, call:
`query_llm(system, user_question, temperature=<new float value>)`
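A minimal sketch of how such keyword arguments could be merged into the request body, assuming the Anthropic Messages API parameter names that Bedrock accepts for Claude, such as `temperature` and `top_p`. `build_claude_request` is a hypothetical helper; its output is a JSON string that could be passed as `body` to `invoke_model`.

```python
import json


def build_claude_request(system: str, user_question: str,
                         max_tokens: int = 4096, **inference_params) -> str:
    """Build a Messages API request body; extra keyword arguments are merged in as-is."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_question}]}
        ],
    }
    body.update(inference_params)  # e.g. temperature=0.2, top_p=0.9
    return json.dumps(body)


request = build_claude_request(
    "You are a sommelier.", "Which wine goes well with salmon?", temperature=0.2, top_p=0.9
)
print(json.loads(request)["temperature"])
```

Because the extra arguments are copied without validation, a misspelled parameter name would only be rejected by Bedrock at call time.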