# Build Multimodal RAG with Amazon OpenSearch Service

In this notebook, you will build and run multimodal search using a sample retail dataset. You will generate multimodal embeddings for text and images, then experiment with text-only search, image-only search, and combined text-and-image search in OpenSearch Service.

You will be using a retail dataset that contains 2,465 retail product samples that belong to different categories such as accessories, home decor, apparel, housewares, books, and instruments. Each product contains metadata including the ID, current stock, name, category, style, description, price, image URL, and gender affinity of the product. You will be using only the product image and product description fields in the solution.

 
Step 1: Create embeddings for text and images

Step 2: Store the embeddings in OpenSearch Service index

Step 3: Use LLM to generate text using the context from OpenSearch




---

Step 1: 

1. Build an AI connector between Amazon OpenSearch Service (AOS) and the embedding model (Titan Multimodal Embeddings)
2. Register/Deploy the Embedding model in AOS
3. Create a KNN index in AOS
4. Create an ingest pipeline to generate the embedding inside AOS

Step 2:

1. Index the data

Step 3:

1. Run a multimodal neural search query in AOS
2. Feed the search results from AOS to the LLM (Claude v2)
   1. Build an AI connector between AOS and the LLM to generate the text
   2. Register and deploy the LLM in AOS
   3. Run a conversational search query in AOS (using the LLM)

## 1. Lab Pre-requisites


For this notebook we require a few libraries. We'll use the Python clients for Amazon OpenSearch Service and Amazon Bedrock, and OpenSearch ML Client library for generating multimodal embeddings.

#### 1.1. Import libraries & initialize resource information
The cells below install and import the libraries and modules used in this notebook.

In [None]:
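!pip install opensearch-py
!pip install opensearch_py_ml
!pip install deprecated
!pip install requests_aws4auth
# ruamel.yaml and Pillow are imported below; install them if your kernel lacks them
!pip install ruamel.yaml pillow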

In [None]:
import boto3
import json
from opensearchpy import OpenSearch, RequestsHttpConnection
from opensearch_py_ml.ml_commons import MLCommonClient
import os
import urllib.request
import tarfile
import requests 
from requests_aws4auth import AWS4Auth
from ruamel.yaml import YAML
from PIL import Image

#### 1.2. Get CloudFormation stack output variables

We have preconfigured a few resources by creating a CloudFormation stack in the account. The names and ARNs of these resources are used throughout this lab, so we load them into variables here.
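
As a rough illustration of what the next cell does with the stack outputs, the sketch below flattens a `describe_stacks`-style response into a dictionary and fails early when an expected key is missing. The `sample_response` dictionary and the `flatten_outputs` and `require_outputs` helpers are hypothetical and exist only for this example; the real values come from CloudFormation.

```python
def flatten_outputs(describe_stacks_response):
    """Turn the Outputs list of the first stack into {OutputKey: OutputValue}."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def require_outputs(outputs, required_keys):
    """Raise a readable error if any expected output is missing."""
    missing = [key for key in required_keys if key not in outputs]
    if missing:
        raise KeyError(f"Stack outputs missing: {', '.join(missing)}")
    return {key: outputs[key] for key in required_keys}


# Hypothetical response, shaped like the result of describe_stacks()
sample_response = {
    "Stacks": [{
        "Outputs": [
            {"OutputKey": "OpenSearchDomainEndpoint",
             "OutputValue": "search-example.us-east-1.es.amazonaws.com"},
            {"OutputKey": "BedrockBatchInferenceRole",
             "OutputValue": "example-role"},
        ]
    }]
}

sample_outputs = flatten_outputs(sample_response)
print(require_outputs(sample_outputs, ["OpenSearchDomainEndpoint"]))
```

Checking for the required keys up front gives a clearer message than a `KeyError` halfway through the lab if the stack was deployed from a different template.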

In [None]:
# Create a Boto3 session
session = boto3.Session()

# Get the account id
account_id = boto3.client('sts').get_caller_identity().get('Account')

# Get the current region
region = session.region_name

cfn = boto3.client('cloudformation')

# Method to obtain output variables from Cloudformation stack. 
def get_cfn_outputs(stackname):
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

## Setup variables to use for the rest of the demo
cloudformation_stack_name = "multimodal-rag-opensearch"

outputs = get_cfn_outputs(cloudformation_stack_name)
aos_host = outputs['OpenSearchDomainEndpoint']
s3_bucket = outputs['s3BucketTraining']
bedrock_inf_iam_role = outputs['BedrockBatchInferenceRole']
bedrock_inf_iam_role_arn = outputs['BedrockBatchInferenceRoleArn']
sagemaker_notebook_url = outputs['SageMakerNotebookURL']

# We will just print all the variables so you can easily copy if needed.
outputs

#### 1.3. Request Bedrock multimodal embedding model access in AWS Console

Make sure you have access to "Titan Multimodal Embeddings G1" in Amazon Bedrock in the AWS Region you are using.


## 2. Prepare the dataset

### 2.1. Download the product metadata and the image archive (.tar.gz), then extract the archive

In [None]:
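os.makedirs('tmp/images', exist_ok=True)
# Download the product metadata and the image archive
metadata_file = urllib.request.urlretrieve('https://aws-blogs-artifacts-public.s3.amazonaws.com/BDB-3144/products-data.yml', 'tmp/images/products.yaml')
# The response headers are not needed; discarding them avoids clashing with the
# `headers` dictionary used for OpenSearch requests later in the notebook
img_filename, _ = urllib.request.urlretrieve('https://aws-blogs-artifacts-public.s3.amazonaws.com/BDB-3144/images.tar.gz', 'tmp/images/images.tar.gz')
print(img_filename)
# Extract the archive; the context manager closes the file afterwards
with tarfile.open('tmp/images/images.tar.gz') as archive:
    archive.extractall('tmp/images/')
# Remove images.tar.gz
os.remove('tmp/images/images.tar.gz')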

## 3. Create a connection to the OpenSearch domain
Next, we'll use the Python client to set up a connection to the OpenSearch domain.

#### Important pre-requisite
You should have followed the steps in the Lab instruction section to map the SageMaker notebook role to the OpenSearch `ml_full_access` role. If not, please visit the lab instructions and complete the **Setting up permission for Notebook IAM Role** section.

#### Retrieving credentials from Secrets Manager
We are going to use the Amazon SageMaker notebook IAM role to configure the workflows in OpenSearch. This IAM role has permission to pass the BedrockInference IAM role to OpenSearch. OpenSearch will then be able to use the BedrockInference IAM role to make calls to Bedrock models.

##### NOTE: 
_If at any point in this lab you get the failure message **The security token included in the request is expired.**, you can resolve it by running this cell again._ The cell refreshes the security credentials that are required for the rest of the lab.

In [None]:
kms = boto3.client('secretsmanager')
aos_credentials = json.loads(kms.get_secret_value(SecretId=outputs['OpenSearchSecret'])['SecretString'])

#credentials = boto3.Session().get_credentials()
#auth = AWSV4SignerAuth(credentials, region)
auth = (aos_credentials['username'], aos_credentials['password'])

aos_client = OpenSearch(
    hosts = [{'host': aos_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)
ml_client = MLCommonClient(aos_client)

#initializing some variables that we will use later.

embedding_connector_id = ""
embedding_model_id = ""

## 4. Create the OpenSearch Bedrock ML connector to Amazon Bedrock Titan Multimodal Embedding


In [None]:
## Create an OpenSearch remote model connector with Amazon Bedrock Titan MM Embedding model.

# Build the SigV4 signer and request headers outside the `if` so that later cells
# can use them even when the connector already exists.
host = f'https://{aos_host}/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = {"Content-Type": "application/json"}

if not embedding_connector_id:
    # Create the connector
    path = '_plugins/_ml/connectors/_create'
    url = host + path

    payload = {
        "name": "Amazon Bedrock Connector: embedding",
        "description": "The connector to bedrock Titan multimodal embedding model",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {
          "roleArn": f"arn:aws:iam::{account_id}:role/{bedrock_inf_iam_role}"
       },
       "parameters": {
        "region": region,
        "service_name": "bedrock",
        "model": "amazon.titan-embed-image-v1"
       },
       "actions": [
        {
          "action_type": "predict",
          "method": "POST",
          "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
          "headers": {
            "content-type": "application/json",
            "x-amz-content-sha256": "required"
          },
         "request_body": "{ \"inputText\": \"${parameters.inputText:-null}\", \"inputImage\": \"${parameters.inputImage:-null}\" }",
      "pre_process_function": "\n    StringBuilder parametersBuilder = new StringBuilder(\"{\");\n    if (params.text_docs.length > 0 && params.text_docs[0] != null) {\n      parametersBuilder.append(\"\\\"inputText\\\":\");\n      parametersBuilder.append(\"\\\"\");\n      parametersBuilder.append(params.text_docs[0]);\n      parametersBuilder.append(\"\\\"\");\n      \n      if (params.text_docs.length > 1 && params.text_docs[1] != null) {\n        parametersBuilder.append(\",\");\n      }\n    }\n    \n    \n    if (params.text_docs.length > 1 && params.text_docs[1] != null) {\n      parametersBuilder.append(\"\\\"inputImage\\\":\");\n      parametersBuilder.append(\"\\\"\");\n      parametersBuilder.append(params.text_docs[1]);\n      parametersBuilder.append(\"\\\"\");\n    }\n    parametersBuilder.append(\"}\");\n    \n    return  \"{\" +\"\\\"parameters\\\":\" + parametersBuilder + \"}\";",
      "post_process_function": "\n      def name = \"sentence_embedding\";\n      def dataType = \"FLOAT32\";\n      if (params.embedding == null || params.embedding.length == 0) {\n          return null;\n      }\n      def shape = [params.embedding.length];\n      def json = \"{\" +\n                 \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n                 \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n                 \"\\\"shape\\\":\" + shape + \",\" +\n                 \"\\\"data\\\":\" + params.embedding +\n                 \"}\";\n      return json;\n    "
    }
  ] 
    }
    headers = {"Content-Type": "application/json"}

    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    print(r.text)
    embedding_connector_id = json.loads(r.text)["connector_id"]
else:
    print(f"Connector already exists - {embedding_connector_id}")
    
embedding_connector_id

Once the model connector is defined, we need to register and deploy the model. The following two cells register the model and then deploy it.
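
The registration and deployment cells read `model_id` and `status` straight from the response body, which raises a bare `KeyError` if OpenSearch returns an error document instead. A minimal sketch of a more defensive parser follows. `extract_field` is a hypothetical helper, and the sample bodies are made up for illustration rather than captured from a real cluster.

```python
import json


def extract_field(response_text, field):
    """Return `field` from a JSON response body, or raise with the full body."""
    body = json.loads(response_text)
    if field not in body:
        raise RuntimeError(f"Expected '{field}' in response, got: {response_text}")
    return body[field]


# Made-up response bodies for illustration only
ok_body = '{"task_id": "abc", "status": "CREATED", "model_id": "model-123"}'
error_body = '{"error": {"type": "illegal_argument_exception"}, "status": 400}'

print(extract_field(ok_body, "model_id"))
```

In the notebook this could wrap calls such as `extract_field(r.text, "model_id")`, so that a failed request surfaces the server's message instead of only the missing key.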

In [None]:
# Register the multimodal embedding model
if not embedding_model_id:
    path = '_plugins/_ml/models/_register'
    url = 'https://'+aos_host + '/' + path
    payload = { "name": "Bedrock Titan mm embeddings model",
    "function_name": "remote",
    "description": "Bedrock Titan mm embeddings model",
    "connector_id": embedding_connector_id}
    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    embedding_model_id = json.loads(r.text)["model_id"]
else:
    print("skipping model registration - model already exists")
print("Model registered under model_id: "+embedding_model_id)

In [None]:
# Deploy the embedding model
path = '_plugins/_ml/models/'+embedding_model_id+'/_deploy'
url = 'https://'+aos_host + '/' + path
r = requests.post(url, auth=awsauth, headers=headers)
deploy_status = json.loads(r.text)["status"]
print("Deployment status of the model, "+embedding_model_id+" : "+deploy_status)

## 4. Test the OpenSearch - Bedrock integration with a test input

In [None]:

import base64


path = '_plugins/_ml/models/'+embedding_model_id+'/_predict'
url = 'https://'+aos_host + '/' + path
img = "tmp/images/footwear/2d2d8ec8-4806-42a7-b8ba-ceb15c1c7e84.jpg"
with open(img, "rb") as image_file:
    input_image_binary = base64.b64encode(image_file.read()).decode("utf8")
    
payload = {
"parameters": {
"inputText": "Sleek, stylish black sneakers made for urban exploration. With fashionable looks and comfortable design, these sneakers keep your feet looking great while you walk the city streets in style",
"inputImage":input_image_binary
}
}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)

try:
    embed = json.loads(r.text)['inference_results'][0]['output'][0]['data'][0:10]
    shape = json.loads(r.text)['inference_results'][0]['output'][0]['shape'][0]
    print("First 10 dimensions:")
    print(str(embed))
    print("\n")
    print("Total: " + str(shape) + " dimensions")
except KeyError as e:
    print(f"KeyError: {e}")
    print("The response does not contain the expected data structure.")
except Exception as e:
    print(f"Error: {e}")
    print("An unexpected error occurred.")

## 5. Create the OpenSearch ingest pipeline


Let's create an ingest pipeline that calls the Amazon Bedrock Titan Multimodal Embeddings model and converts the text and image into a multimodal vector embedding. An ingest pipeline is an OpenSearch feature that lets you define actions to perform at data ingestion time. You can do simple processing, such as adding a static field or modifying an existing field, or call a remote model and store the inference output together with the indexed document. In our case, the inference output is a vector embedding.

The following ingest pipeline calls our remote model, converts the `product_description` field and the `image_binary` field into a vector, and stores it in a field called `vector_embedding`.
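
Once the pipeline exists, one way to check it before indexing the whole dataset is a dry run on a single document. The sketch below only builds the request for such a dry run, assuming the standard ingest pipeline `_simulate` endpoint is available on your domain. `build_simulate_request` is a hypothetical helper and the base64 string is a placeholder, not a real image.

```python
def build_simulate_request(pipeline_name, description, image_b64,
                           index="bedrock-multimodal-rag"):
    """Return (path, payload) for a dry run of one document through the pipeline."""
    path = f"_ingest/pipeline/{pipeline_name}/_simulate"
    payload = {
        "docs": [
            {
                "_index": index,
                "_source": {
                    # Same field names the pipeline's field_map points at
                    "product_description": description,
                    "image_binary": image_b64,
                },
            }
        ]
    }
    return path, payload


sim_path, sim_payload = build_simulate_request(
    "bedrock-multimodal-ingest-pipeline",
    "Sleek black sneakers",
    "aGVsbG8=",  # placeholder base64 string, not a real image
)
print(sim_path)
```

Sending `sim_payload` with `requests.post('https://' + aos_host + '/' + sim_path, auth=awsauth, json=sim_payload, headers=headers)` and a real base64-encoded image should return the document with a `vector_embedding` field added, without writing anything to the index.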

In [None]:
path = "_ingest/pipeline/bedrock-multimodal-ingest-pipeline"
url = 'https://'+aos_host + '/' + path
payload = {
    "description": "A text/image embedding pipeline",
    "processors": [
        {
            "text_image_embedding": {
                "model_id": embedding_model_id,
                "embedding": "vector_embedding",
                "field_map": {
                    "text": "product_description",
                    "image": "image_binary"
                }
            }
        }
    ]
}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

## 6. Create the k-NN index

In [None]:
path = "bedrock-multimodal-rag"
url = 'https://'+aos_host + '/' + path

# Delete the index if it already exists (no request body is needed)
requests.delete(url, auth=awsauth, headers=headers)

payload = {
  "settings": {
    "index.knn": True,
    "default_pipeline": "bedrock-multimodal-ingest-pipeline"
  },
  "mappings": {
      
    "_source": {
     
    },
    "properties": {
      "vector_embedding": {
        "type": "knn_vector",
        "dimension": shape,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {}
        }
      },
      "product_description": {
        "type": "text"
      },
        "image_url": {
        "type": "text"
      },
      "image_binary": {
        "type": "binary"
      }
    }
  }
}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

## 7. Ingest the dataset into the k-NN index using Bulk requests
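
The Bulk API expects newline-delimited JSON: an action line followed by the document, one pair per record, with a trailing newline at the end of the body. Before the full ingestion cell, here is a minimal sketch of how such bodies can be assembled in fixed-size batches. `iter_bulk_bodies` and the toy documents are hypothetical stand-ins for the product records.

```python
import json


def iter_bulk_bodies(docs, index_name, batch_size):
    """Yield newline-delimited JSON bodies holding at most batch_size docs each."""
    action = json.dumps({"index": {"_index": index_name}})
    lines = []
    for position, doc in enumerate(docs, start=1):
        lines.append(action)           # action line
        lines.append(json.dumps(doc))  # document line
        if position % batch_size == 0:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:  # final partial batch
        yield "\n".join(lines) + "\n"


# Toy documents standing in for the product records
toy_docs = [{"product_description": f"item {i}"} for i in range(5)]
bodies = list(iter_bulk_bodies(toy_docs, "bedrock-multimodal-rag", batch_size=2))
print(len(bodies))
```

Each yielded string could be passed as `body` to `aos_client.bulk(...)`. Because the buffer is reset after every yield, no document appears in more than one request.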

In [None]:
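def resize_image(photo, width, height):
    """Save a copy of `photo` scaled to fit within width x height.

    Returns (file_type, path); the copy is written to f"{path}-resized.{file_type}".
    """
    Image.MAX_IMAGE_PIXELS = 100000000

    # verify() leaves the image object unusable, so the file is reopened below
    with Image.open(photo) as image:
        image.verify()

    with Image.open(photo) as image:
        # Fail with a clear message instead of an UnboundLocalError at the return
        if image.format not in ["JPEG", "PNG"]:
            raise ValueError(f"Unsupported image format for {photo}: {image.format}")
        file_type = image.format.lower()
        path = image.filename.rsplit(".", 1)[0]

        image.thumbnail((width, height))
        image.save(f"{path}-resized.{file_type}")
    return file_type, path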

# Load the products from the dataset
yaml = YAML()
with open('tmp/images/products.yaml') as products_file:
    items_ = yaml.load(products_file)

batch = 0
count = 0
body_ = ''
batch_size = 100
action = json.dumps({ 'index': { '_index': 'bedrock-multimodal-rag' } })

for item in items_:
    count+=1
    fileshort = "tmp/images/"+item["category"]+"/"+item["image"]
    payload = {}
    payload['image_url'] = fileshort
    payload['product_description'] = item['description']
    
    #resize the image and generate image binary
    file_type, path = resize_image(fileshort, 2048, 2048)
    # use the path returned by resize_image so the name matches the saved file
    resized_file = path+"-resized."+file_type

    with open(resized_file, "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode("utf8")
    
    os.remove(resized_file)
    payload['image_binary'] = input_image
    
    body_ = body_ + action + "\n" + json.dumps(payload) + "\n"
    
    if(count == batch_size):
        response = aos_client.bulk(
            index = "bedrock-multimodal-rag",
            body = body_
        )
        batch += 1
        count = 0
        # always reset the buffer so that no batch is sent twice
        body_ = ""
        print("batch "+str(batch) + " ingestion done!")

#ingest the remaining rows, if any
if body_:
    response = aos_client.bulk(
        index = "bedrock-multimodal-rag",
        body = body_
    )
    batch += 1

print("All "+str(batch)+" batches ingested into index")


In [None]:
res = aos_client.search(index="bedrock-multimodal-rag", body={"query": {"match_all": {}}})
print("Records found: %d." % res['hits']['total']['value'])

## 8. Lexical search

In [None]:
#Keyword Search
query = "trendy footwear for women"
url = 'https://' + aos_host + "/bedrock-multimodal-rag/_search"
keyword_payload = {"_source": {
        "exclude": [
            "vector_embedding"
        ]
        },
        "query": {    "match": {
                        "product_description": {
                            "query": query
                        }
                        }
                    }
        
        ,"size":5,
  }

r = requests.get(url, auth=awsauth, json=keyword_payload, headers=headers)
response_ = json.loads(r.text)
docs = response_['hits']['hits']

for i,doc in enumerate(docs):
    print(str(i+1)+ ". "+doc["_source"]["product_description"])
    image = Image.open(doc["_source"]["image_url"])
    image.show()

## 9. Multimodal search with both image and text caption as inputs
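
The introduction mentions experimenting with text-only, image-only, and combined queries. The sketch below builds the `neural` clause for each case, assuming the query accepts `query_text` and `query_image` either separately or together. `build_neural_query` is a hypothetical helper, and the model ID and base64 string are placeholders.

```python
def build_neural_query(model_id, query_text=None, query_image=None, k=5,
                       vector_field="vector_embedding"):
    """Build a neural query clause from text, a base64 image, or both."""
    if query_text is None and query_image is None:
        raise ValueError("Provide query_text, query_image, or both")
    clause = {"model_id": model_id, "k": k}
    if query_text is not None:
        clause["query_text"] = query_text
    if query_image is not None:
        clause["query_image"] = query_image
    return {"neural": {vector_field: clause}}


# Placeholder model ID and image string, for illustration only
text_only = build_neural_query("my-model-id", query_text="trendy footwear for women")
image_only = build_neural_query("my-model-id", query_image="aGVsbG8=")
both = build_neural_query("my-model-id",
                          query_text="trendy footwear for women",
                          query_image="aGVsbG8=")
print(sorted(both["neural"]["vector_embedding"]))
```

The returned dictionary is meant to sit under the `"query"` key of a search payload, in place of the hand-written `neural` block in the cell that follows.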

In [None]:
#Multimodal Search
#Text and image as inputs
s3 = boto3.client('s3')
url = 'https://'+aos_host + "/bedrock-multimodal-rag/_search"
query = "trendy footwear for women"
print("Input text query: "+query)
# urllib.request.urlretrieve( 
#   'https://cdn.pixabay.com/photo/2014/09/03/20/15/shoes-434918_1280.jpg',"tmp/women-footwear.jpg") 
img = Image.open("tmp/women-footwear-1.jpg") 
print("Input query Image:")
img.show()
with open("tmp/women-footwear-1.jpg", "rb") as image_file:
    query_image_binary = base64.b64encode(image_file.read()).decode("utf8")
keyword_payload = {"_source": {
        "exclude": [
            "vector_embedding"
        ]
        },
        "query": {    
       
        "neural": {
            "vector_embedding": {
                
            "query_image":query_image_binary,
            "query_text":query,
                
            "model_id": embedding_model_id,
            "k": 5
            }
            
            }
                    }
        
        ,"size":5,
  }

r = requests.get(url, auth=awsauth, json=keyword_payload, headers=headers)
response_ = json.loads(r.text)
docs = response_['hits']['hits']

for i,doc in enumerate(docs):
    print(doc["_source"]["product_description"])
    image = Image.open(doc["_source"]["image_url"])
    image.show()

# Multimodal Conversational search

In this section, you will use OpenSearch Service as the knowledge base to run multimodal retrieval and augment the LLM prompt with the relevant context. You will use Claude v2 as the foundation model to generate responses and provide fashion advice to end users based on the retail items stored in OpenSearch Service.

## 10. Create the OpenSearch Bedrock Claude LLM connector


The cell below creates an OpenSearch AI connector for Claude v2.

In [None]:
#initializing variables that we will use later.

llm_connector_id = ""
llm_model_id = ""

if not llm_connector_id:
    host = f'https://{aos_host}/'
    service = 'es'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)


    # Create the connector
    path = '_plugins/_ml/connectors/_create'
    url = host + path

    payload = {
        "name": "Amazon Bedrock Connector: Claude 2",
        "description": "The connector to bedrock Claude V2",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {
          "roleArn": f"arn:aws:iam::{account_id}:role/{bedrock_inf_iam_role}"
       },
       "parameters": {
           "region": region,
        "service_name": "bedrock",
        "auth": "Sig_V4",
        "model": "anthropic.claude-v2"
       },
       "actions": [
        {
          "action_type": "predict",
          "method": "POST",
          "headers": {
            "content-type": "application/json"
          },
            "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
            "request_body": "{\"prompt\":\"\\n\\nHuman: ${parameters.inputs}\\n\\nAssistant:\",\"max_tokens_to_sample\":300,\"temperature\":0.5,\"top_k\":250,\"top_p\":1,\"stop_sequences\":[\"\\\\n\\\\nHuman:\"]}"
        }
            ]
    }

    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    print(r.text)
    llm_connector_id = json.loads(r.text)["connector_id"]
else:
    print(f"Connector already exists - {llm_connector_id}")
    
llm_connector_id

## 11. Register and deploy the OpenSearch Bedrock Claude LLM

In [None]:
# Register and deploy the llm model
if not llm_model_id:
    path = '_plugins/_ml/models/_register?deploy=true'
    url = 'https://'+aos_host + '/' + path
    payload = { 
        "name": "Amazon Bedrock Connector: Claude v2",
        "function_name": "remote",
        "description": "The connector to bedrock Claude v2",
        "connector_id": llm_connector_id
    }
    
    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    llm_model_id = json.loads(r.text)["model_id"]
    
else:
    print("skipping model registration - llm model already exists")
print("Model registered and deployed under llm_model_id: "+llm_model_id)

## 12. Test LLM model inference

In [None]:
path = '_plugins/_ml/models/'+llm_model_id+'/_predict'
url = 'https://'+aos_host + '/' + path

payload = {
    
    "parameters": {
    "inputs": "what do you recommend as outdoors footwear for long walk?"
  }
}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    
if r.status_code == 200:
    llm_gen = json.loads(r.text)['inference_results'][0]['output'][0]['dataAsMap']['completion']
    print(str("Claude generated response without context: \n\n"+llm_gen))


## 13. Create conversational search using neural search in OpenSearch Service

Conversational search lets you ask questions in natural language, receive a text response based on a provided context and ask additional clarifying questions. 

For follow-up questions to work, the LLM needs the context of the entire conversation. This takes two components:

Part 1: Define a conversation history in which all messages (the human question and the LLM answer) are stored in the same conversation memory.

Part 2: Define a Retrieval-Augmented Generation (RAG) pipeline to augment the LLM prompt with relevant context from the OpenSearch Service vector database.
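
To make the two parts concrete, the sketch below assembles a prompt from retrieved context, the most recent conversation turns, and a new question. This is only an illustration of the idea. It is not the prompt template that OpenSearch uses internally, and `build_rag_prompt` and its sample inputs are hypothetical.

```python
def build_rag_prompt(system_prompt, contexts, history, question, message_size=5):
    """Assemble an illustrative prompt from context, recent history and a question."""
    # Keep only the most recent exchanges (guarding against message_size == 0)
    recent = history[-message_size:] if message_size > 0 else []
    parts = [system_prompt, "", "Context:"]
    parts += [f"{i}. {text}" for i, text in enumerate(contexts, start=1)]
    if recent:
        parts += ["", "Conversation so far:"]
        for turn in recent:
            parts.append(f"Human: {turn['input']}")
            parts.append(f"Assistant: {turn['response']}")
    parts += ["", f"Human: {question}", "Assistant:"]
    return "\n".join(parts)


prompt = build_rag_prompt(
    "You are a helpful shopping advisor.",
    ["Leather tote bag", "Canvas backpack"],  # stand-ins for retrieved descriptions
    [{"input": "is this suitable for long walk?", "response": "The backpack is."}],
    "How about similar bag for kids?",
)
print(prompt)
```

Part 1 corresponds to the `history` argument and Part 2 to the `contexts` argument. In the lab, both are supplied by OpenSearch rather than by hand.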

In [None]:
## cluster prerequisites
## Enable conversation memory and RAG pipeline features in your cluster


path = '_cluster/settings'
url = 'https://'+aos_host + '/' + path

payload = {
    
    "persistent": {
    "plugins.ml_commons.memory_feature_enabled": "true",
    "plugins.ml_commons.rag_pipeline_feature_enabled": "true"
  }
}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)
print(r.text)

## 14. Create the OpenSearch search RAG pipeline

The RAG search pipeline uses the `retrieval_augmented_generation` processor, which integrates information retrieval and language generation for conversational search applications.

The `retrieval_augmented_generation` processor is a search results processor that intercepts query results, retrieves previous messages from the conversation, and sends a prompt to a large language model (LLM). It saves the response in conversational memory and returns both the original OpenSearch query results and the LLM response.

Let's create the search pipeline with the retrieval_augmented_generation processor to use at search time.
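
Because the processor returns the LLM answer alongside the normal hits, a response can be unpacked in one place. The sketch below does that for a made-up response that has the same shape the later cells read from (`ext.retrieval_augmented_generation.answer` and `hits.hits`). `extract_rag_result` is a hypothetical helper.

```python
def extract_rag_result(response):
    """Return (answer, descriptions) from a search response that used the RAG pipeline."""
    rag = response.get("ext", {}).get("retrieval_augmented_generation")
    if rag is None:
        raise ValueError("No RAG output found; was search_pipeline set on the request?")
    hits = response.get("hits", {}).get("hits", [])
    descriptions = [hit["_source"]["product_description"] for hit in hits]
    return rag["answer"], descriptions


# Made-up response for illustration only
sample_response = {
    "hits": {"hits": [
        {"_source": {"product_description": "Canvas backpack",
                     "image_url": "tmp/images/accessories/1.jpg"}},
    ]},
    "ext": {"retrieval_augmented_generation": {
        "answer": "The canvas backpack suits long walks."}},
}

answer, descriptions = extract_rag_result(sample_response)
print(answer)
print(descriptions)
```

Raising when the `ext` block is absent helps distinguish a search that skipped the pipeline from one where the LLM call itself failed.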

In [None]:
path = "_search/pipeline/multimodal_rag_pipeline"
url = 'https://'+aos_host + '/' + path
payload = {
    "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "bedrock_rag-pipeline_demo",
        "description": "Search pipeline using Bedrock Claude v2 Connector for RAG",
        "model_id": llm_model_id,
        "context_field_list": ["product_description"],
        "system_prompt": "You are a helpful shopping advisor that uses their vast knowledge of fashion tips to make great recommendations people will enjoy.",
        "user_instructions": "As a shopping advisor, be friendly and approachable. Greet the customer warmly. Evaluate each item provided in the context and provide a concise recommendation about each item that best matches the customer question, using the order and number of search results related to each item. If there are items in the provided context that do not match the user question, explain that this may be due to insufficient items in the inventory. Finally, thank the client and let them know you're available if they have any other questions."
      }
    }
  ]


}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

## 15. RAG using multimodal search to provide prompt context

### Create a conversational memory

First, create a conversational memory. It stores the conversation history, from which the 5 most recent messages are used to augment the LLM prompt.

Before starting a conversation, run the cell below to create the conversational memory. Do not run it again if you would like to ask follow-up questions within the same conversation.

In [None]:
## Create a conversation memory

## Initialize memory id variable to use in the next conversational search

memory_id= ''

path = '_plugins/_ml/memory/'
url = 'https://'+aos_host + '/' + path

payload = {
    
    "name": "Conversation about bags"
}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)

memory_id = json.loads(r.text)["memory_id"]

print("The newly created memory id: " +memory_id) 

### Run a conversational search using the memory id created above:

In [None]:
query = "is this suitable for long walk?"
url = 'https://' + aos_host + "/bedrock-multimodal-rag/_search?search_pipeline=multimodal_rag_pipeline"


print("Input text query: "+query)
# urllib.request.urlretrieve( 
#   'https://cdn.pixabay.com/photo/2014/09/03/20/15/shoes-434918_1280.jpg',"tmp/women-footwear.jpg") 
img = Image.open("tmp/images/accessories/1.jpg") 
print("Input query Image:")
img.show()
with open("tmp/images/accessories/1.jpg", "rb") as image_file:
    query_image_binary = base64.b64encode(image_file.read()).decode("utf8")


multimodal_payload = {
    "_source": {
        "exclude": [
            "vector_embedding", "image_binary"
        ]},
    "query": {
        "neural": {
      "vector_embedding": {
        "query_image":query_image_binary,
        "query_text": query,
        "model_id": embedding_model_id,
        "k": 5
      }
    }
          },
    "size":5,
    "ext": {
    "generative_qa_parameters": {
      "llm_model": "bedrock/claude",
      "llm_question": query,
     "memory_id": memory_id ,
      "context_size": 5,
      "message_size": 5,
      "timeout": 60
    }
  }
  }

r = requests.get(url, auth=awsauth, json=multimodal_payload, headers=headers)
response_ = json.loads(r.text)
docs = response_['hits']['hits']
rag = response_['ext']['retrieval_augmented_generation']

print(rag['answer']+"\n\n")
print("Context: \n\n")

for i,doc in enumerate(docs):
    print(str(i+1)+ ". "+doc["_source"]["product_description"])
    image = Image.open(doc["_source"]["image_url"])
    image.show()

### Ask a follow up question
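
The follow-up request has the same structure as the first one: only the question changes, while `memory_id` stays the same so that the earlier exchange is available as history. A small helper can make that explicit. `build_conversational_payload` is a hypothetical function that mirrors the payload used in these cells, and the IDs below are placeholders.

```python
def build_conversational_payload(neural_clause, question, memory_id,
                                 size=5, context_size=5, message_size=5, timeout=60):
    """Build a search payload that carries both the neural query and the RAG parameters."""
    return {
        "_source": {"exclude": ["vector_embedding", "image_binary"]},
        "query": {"neural": {"vector_embedding": neural_clause}},
        "size": size,
        "ext": {
            "generative_qa_parameters": {
                "llm_model": "bedrock/claude",
                "llm_question": question,
                "memory_id": memory_id,        # same ID for every turn of a conversation
                "context_size": context_size,  # number of search results sent as context
                "message_size": message_size,  # number of past messages sent as history
                "timeout": timeout,
            }
        },
    }


def neural_clause_for(question):
    # Placeholder model ID; in the lab this would be embedding_model_id
    return {"query_text": question, "model_id": "my-model-id", "k": 5}


first_q = "is this suitable for long walk?"
follow_q = "thanks. How about similar bag for kids?"
first = build_conversational_payload(neural_clause_for(first_q), first_q, "memory-123")
follow_up = build_conversational_payload(neural_clause_for(follow_q), follow_q, "memory-123")
print(follow_up["ext"]["generative_qa_parameters"]["llm_question"])
```

Creating a new memory between the two calls would start a fresh conversation, and the model would no longer see the first question and answer.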

In [None]:
query = "thanks. How about similar bag for kids?"
url = 'https://' + aos_host + "/bedrock-multimodal-rag/_search?search_pipeline=multimodal_rag_pipeline"


print("Input text query: "+query)
# urllib.request.urlretrieve( 
#   'https://cdn.pixabay.com/photo/2014/09/03/20/15/shoes-434918_1280.jpg',"tmp/women-footwear.jpg") 
img = Image.open("tmp/images/accessories/1.jpg") 
print("Input query Image:")
img.show()
with open("tmp/images/accessories/1.jpg", "rb") as image_file:
    query_image_binary = base64.b64encode(image_file.read()).decode("utf8")


multimodal_payload = {
    "_source": {
        "exclude": [
            "vector_embedding", "image_binary"
        ]},
    "query": {
        "neural": {
      "vector_embedding": {
        "query_image":query_image_binary,
        "query_text": query,
        "model_id": embedding_model_id,
        "k": 5
      }
    }
          },
    "size":5,
    "ext": {
    "generative_qa_parameters": {
      "llm_model": "bedrock/claude",
      "llm_question": query,
     "memory_id": memory_id ,
      "context_size": 5,
      "message_size": 5,
      "timeout": 60
    }
  }
  }

r = requests.get(url, auth=awsauth, json=multimodal_payload, headers=headers)
response_ = json.loads(r.text)
docs = response_['hits']['hits']
rag = response_['ext']['retrieval_augmented_generation']

print(rag['answer']+"\n\n")
print("Context: \n\n")

for i,doc in enumerate(docs):
    print(str(i+1)+ ". "+doc["_source"]["product_description"])
    image = Image.open(doc["_source"]["image_url"])
    image.show()

## Check the conversation history

In [None]:
## To verify that the messages were added to the memory, provide the memory_ID to the Get Messages API:


path = '_plugins/_ml/memory/'+memory_id +'/messages'
url = 'https://'+aos_host + '/' + path

# A GET request needs no body here
r = requests.get(url, auth=awsauth, headers=headers)

response = json.loads(r.text)

#print("The response contains the following messages" + r.text)

print(json.dumps(response, indent=2))


# Optional - Agent RAG flow with OpenSearch Service

In [None]:
## cluster prerequisites
## Enable agent framework in your cluster and disable triggering the native memory circuit breaker


path = '_cluster/settings'
url = 'https://'+aos_host + '/' + path

payload = {
    
    "persistent": {
    "plugins.ml_commons.native_memory_threshold": 100,
    "plugins.ml_commons.agent_framework_enabled": "true"
  }
}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)


if (r.status_code==200):
    print("Agent framework is enabled")

# Create connector

In [None]:
#initializing variables that we will use later.

llm_agent_connector_id = ""
llm_agent_model_id = ""

if not llm_agent_connector_id:
    host = f'https://{aos_host}/'
    service = 'es'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)


    # Create the connector
    path = '_plugins/_ml/connectors/_create'
    url = host + path

    payload = {
        "name": "Amazon Bedrock Connector: Claude v1",
        "description": "The connector to bedrock Claude V1",
        "version": 1,
        "protocol": "aws_sigv4",
        "credential": {
          "roleArn": f"arn:aws:iam::{account_id}:role/{bedrock_inf_iam_role}"
       },
       "parameters": {
           "region": region,
        "service_name": "bedrock",
        "auth": "Sig_V4",
        "model": "anthropic.claude-instant-v1",
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens_to_sample": 8000,
    "temperature": 0.0001,
    "response_filter": "$.completion"
       },
       "actions": [
        {
          "action_type": "predict",
          "method": "POST",
          "headers": {
            "content-type": "application/json",
        "x-amz-content-sha256": "required"
          },
            "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
            "request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature},  \"anthropic_version\":\"${parameters.anthropic_version}\" }"
        }
            ]
    }

    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    llm_agent_connector_id = json.loads(r.text)["connector_id"]
else:
    print(f"Connector already exists - {llm_agent_connector_id}")
print(f"llm_agent_connector_id : {llm_agent_connector_id}")

In [None]:
# Register and deploy the llm model
if not llm_agent_model_id:
    path = '_plugins/_ml/models/_register?deploy=true'
    url = 'https://'+aos_host + '/' + path
    payload = {
        "name": "Amazon Bedrock Connector: Claude v1",
        "function_name": "remote",
        "description": "The connector to bedrock Claude v1",
        # Use the connector created in the previous cell instead of a hard-coded ID
        "connector_id": llm_agent_connector_id
    }
    
    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    llm_agent_model_id = json.loads(r.text)["model_id"]
    
else:
    print("skipping model registration - llm model already exists")
print("llm_agent_model_id: "+llm_agent_model_id)

In [None]:
path = '_plugins/_ml/models/'+llm_agent_model_id+'/_predict'
url = 'https://'+aos_host + '/' + path

payload = {
    "parameters": {
        "prompt": "\n\nHuman: how are you? \n\nAssistant:"
    }
}

r = requests.post(url, auth=awsauth, json=payload, headers=headers)


if r.status_code == 200:
    llm_gen = json.loads(r.text)['inference_results'][0]['output'][0]['dataAsMap']['response']
    print(llm_gen)
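
The lookup above assumes the response has exactly the shape `inference_results[0].output[0].dataAsMap.response`, and the cell prints nothing when the status code is not 200. A minimal sketch of a more defensive lookup follows. `extract_llm_text` is a hypothetical helper, and the sample dictionary is made-up data that only mirrors the path used in the cell above.

```python
def extract_llm_text(body):
    """Return the generated text from a _predict response body, or None."""
    try:
        return body["inference_results"][0]["output"][0]["dataAsMap"]["response"]
    except (KeyError, IndexError, TypeError):
        # The body has a different shape, for example an error payload.
        return None


# Invented sample that follows the same path as the cell above.
sample_body = {
    "inference_results": [
        {"output": [{"dataAsMap": {"response": " I'm doing well, thanks!"}}]}
    ]
}

text = extract_llm_text(sample_body)
print(text if text is not None else "No generated text found in the response")
```

In the notebook, this could be called as `extract_llm_text(r.json())` so that an unexpected response produces a readable message rather than a `KeyError`.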

## 15. Create, register and execute an agent

In [None]:
# Initialize the variable that stores the agent ID

agent_id = ""


# You'll use the embedding model and the Claude model to create a conversational flow agent

payload = {
    "name": "Fashion stylist agent",
    "type": "conversational_flow",
    "description": "this is a demo agent for fashion stylist",
    "app_type": "rag",
    "memory": {
        "type": "conversation_index"
    },
    "tools": [
        {
            "type": "VectorDBTool",
            "name": "shopping_knowledge_base",
            "parameters": {
                "model_id": embedding_model_id,
                "index": "bedrock-multimodal-rag",
                "embedding_field": "vector_embedding",
                "source_field": [
                    "product_description"
                ],
                "input": "${parameters.question}"
            }
        },
        {
            "type": "MLModelTool",
            "name": "bedrock_claude_model_v1",
            "description": "A tool to recommend fashion styles",
            "parameters": {
                "model_id": llm_agent_model_id,
                "prompt": "\n\nHuman:You are a professional fashion stylist. You will always recommend product based on the given context. If you don't have enough context, you will ask Human to provide more information. If you don't see any related product to recommend, just say we don't have such product. If you don't know the answer, just say you don't know. \n\nContext:\n${parameters.shopping_knowledge_base.output:-}\n\n${parameters.chat_history:-}\n\nHuman:${parameters.question}\n\nAssistant:"
            }
        }
    ]
}

path = '_plugins/_ml/agents/_register'
url = 'https://'+aos_host + '/' + path
r = requests.post(url, auth=awsauth, json=payload, headers=headers)

# Print the status of the API call before reading the agent ID,
# so that a failed registration is visible in the output.
print(f"Status: {r.status_code}. Response: {r.text}")

agent_id = json.loads(r.text)["agent_id"]


In [None]:
# Print the response from the agent registration call for reference.
print(r.text)

# Test the LLM text generation by sending a question to the agent.
agent_payload = {
    "parameters": {
        "question": "What are the most comfortable shoes",
        "verbose": "true"
    }
}

path = '_plugins/_ml/agents/'+agent_id+'/_execute'
url = 'https://'+aos_host + '/' + path
r = requests.post(url, auth=awsauth, json=agent_payload, headers=headers)

if r.status_code == 200:
    llm_agent_gen = json.loads(r.text)['inference_results'][0]['output'][2]['result']
    print(str(llm_agent_gen))
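
The cell above picks the answer by position (`output[2]`), which depends on how many entries the agent returns. As a hedged alternative, the sketch below selects an entry by name instead. It assumes that each `output` entry is a dictionary with `name` and `result` keys and that the LLM tool's entry is named after the tool, `bedrock_claude_model_v1`. Neither assumption is confirmed by this notebook, so check the printed response first. `find_agent_result` is a hypothetical helper and the sample response is invented for illustration.

```python
def find_agent_result(body, name):
    """Return the result of the first output entry called `name`, else None."""
    for inference in body.get("inference_results", []):
        for entry in inference.get("output", []):
            if entry.get("name") == name:
                return entry.get("result")
    return None


# Invented sample response; the real entries and their order may differ.
sample_response = {
    "inference_results": [
        {
            "output": [
                {"name": "memory_id", "result": "example-memory-id"},
                {"name": "shopping_knowledge_base", "result": "retrieved context"},
                {"name": "bedrock_claude_model_v1", "result": "generated answer"},
            ]
        }
    ]
}

answer = find_agent_result(sample_response, "bedrock_claude_model_v1")
print(answer)
```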

# Deploy your application

To deploy this code as an application we will use [Streamlit](https://streamlit.io/). 

**Step 1:** Export the connector and model IDs you created earlier in this notebook. We will store them to a file so that they can be referenced outside of the notebook and persisted between kernel restarts.

In [None]:
connector_ids = {}
connector_ids['llm_connector_id'] = llm_connector_id
connector_ids['llm_model_id'] = llm_model_id
connector_ids['embedding_connector_id'] = embedding_connector_id
connector_ids['embedding_model_id'] = embedding_model_id
connector_ids['agent_id'] = agent_id

with open('connector_ids.json', 'w') as file:
    json.dump(connector_ids, file)
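
Outside the notebook, a script can read the file back and check that every ID is present before using it. A minimal sketch, assuming the file sits in the script's working directory; `load_connector_ids` and `EXPECTED_KEYS` are hypothetical names and are not taken from `app.py`.

```python
import json

# The keys written by the export cell above.
EXPECTED_KEYS = (
    "llm_connector_id",
    "llm_model_id",
    "embedding_connector_id",
    "embedding_model_id",
    "agent_id",
)


def load_connector_ids(path="connector_ids.json"):
    """Read the exported IDs and fail early if any is missing or empty."""
    with open(path) as file:
        ids = json.load(file)
    missing = [key for key in EXPECTED_KEYS if not ids.get(key)]
    if missing:
        raise ValueError(f"Missing IDs in {path}: {', '.join(missing)}")
    return ids
```

A caller would then use, for example, `load_connector_ids()["agent_id"]` when building the agent execute URL.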

**Step 2:** Since you will launch the app.py script from the notebook instance's shell, you need to install the required libraries in the shell environment.

In [None]:
!pip install streamlit
!pip install opensearch-py
!pip install opensearch_py_ml
!pip install deprecated
!pip install requests_aws4auth
!pip install requests

**Step 3:** Run the Streamlit application from the shell. 
<br> The `--server.baseUrlPath` argument sets a custom base URL path for the Streamlit app, so that the app is served under the notebook's proxy path.

In [None]:
!streamlit run app.py --server.baseUrlPath="/proxy/absolute/8501"

Access the app at this URL: https://semantic-search-nb-qnru.notebook.us-east-1.sagemaker.aws/proxy/absolute/8501/

Update the above URL with the host name of your Jupyter notebook.
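
The URL is the notebook host name followed by the same path passed to `--server.baseUrlPath`. A small sketch of building it, where `streamlit_proxy_url` is a hypothetical helper and the `/lab` suffix in the example input is only an assumption about what a copied notebook address might look like:

```python
def streamlit_proxy_url(notebook_host, port=8501):
    """Build the proxied app URL from the notebook host name or address."""
    host = notebook_host.strip()
    if host.startswith("https://"):
        host = host[len("https://"):]
    # Drop any path that was copied along with the host name.
    host = host.split("/", 1)[0]
    return f"https://{host}/proxy/absolute/{port}/"


print(streamlit_proxy_url(
    "https://semantic-search-nb-qnru.notebook.us-east-1.sagemaker.aws/lab"
))
# prints https://semantic-search-nb-qnru.notebook.us-east-1.sagemaker.aws/proxy/absolute/8501/
```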