# Nova Multimodal Embedding Search using Amazon Bedrock and Amazon S3 Vectors
Work with the Nova Multimodal Embedding model and Amazon S3 Vectors.

<!-- ![Nova MM Embedding](./images/nova-mm-embed-search.png) -->

In [None]:
!pip install --upgrade pip setuptools wheel
!pip install boto3 --upgrade

Restart the kernel after the installation so that the upgraded packages are loaded.

In [None]:
model_id = 'amazon.nova-2-multimodal-embeddings-v1:0'

s3_bucket = '<Your S3 Bucket>'
s3_prefix = '<Your S3 Prefix>' 
account_id = '<Your account id>'

dim = 3072
s3vector_bucket = "<Your S3 Vector Bucket>"
s3vector_index = "<Your vector index>"
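
Every later cell depends on these values, so it can help to confirm that no `<...>` placeholder was left unedited before running the rest of the notebook. The helper below is a minimal sketch; `find_unset_placeholders` and the example values are hypothetical and not part of the original notebook.

```python
def find_unset_placeholders(settings):
    """Return the names of settings whose value still looks like '<...>'."""
    return [
        name
        for name, value in settings.items()
        if isinstance(value, str) and value.startswith("<") and value.endswith(">")
    ]

# Illustrative values only; in the notebook you would pass the real variables.
example = {
    "s3_bucket": "<Your S3 Bucket>",
    "s3_prefix": "nova-demo",
    "account_id": "123456789012",
}
print(find_unset_placeholders(example))  # ['s3_bucket']
```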

## Download a Sample Video and Upload to S3 as Input
We'll use the Nova Multimodal Embedding model to generate embeddings from this video and perform content-based search.

We will use an open-source sample video, [Meridian](https://en.wikipedia.org/wiki/Meridian_(film)), as input to generate embeddings.

![Meridian](./images/sample-video-meridian.png)

In [None]:
# Download a sample video to local disk
sample_name = 'NetflixMeridian.mp4'
source_url = f'https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/335119c4-e170-43ad-b55c-76fa6bc33719/NetflixMeridian.mp4'
!curl {source_url} --output {sample_name}

# Upload to S3
import boto3
s3 = boto3.client("s3")
s3_input_key = f'{s3_prefix}/video/{sample_name}'
s3.upload_file(sample_name, s3_bucket, s3_input_key)
print(f"Uploaded to s3://{s3_bucket}/{s3_input_key}")

## Generate Multimodal Embeddings Using Nova
We use Bedrock's [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) API to run the embedding task asynchronously. In this example, the model reads the video directly from S3, which suits large video files.

In [None]:
from datetime import datetime

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

request_body = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingDimension": dim,
        "embeddingPurpose": "GENERIC_INDEX",
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {
                    "uri": f's3://{s3_bucket}/{s3_input_key}',
                    "bucketOwner": account_id,
                }
            },
            "segmentationConfig": {"durationSeconds": 5},
        },
    },
}

# Invoke the Nova Embeddings model.
response = bedrock.start_async_invoke(
    modelId=model_id,
    modelInput=request_body,
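    outputDataConfig={
        # Write results under the same prefix as the input so the cleanup step removes them too
        "s3OutputDataConfig": {"s3Uri": f"s3://{s3_bucket}/{s3_prefix}/embeddings"}
    },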
)
START_TS = datetime.now()

invocation_arn = response.get("invocationArn")
response_metadata = response["ResponseMetadata"]


The asynchronous task takes time to complete. The function below calls the [GetAsyncInvoke API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GetAsyncInvoke.html) every 5 seconds to check the status of the task started in the previous cell, until the task reaches a terminal status. Once the task completes, the result is read from the `.jsonl` output in the S3 output location. Alternatively, you can set up an S3 trigger or an EventBridge rule to invoke a process (such as a Lambda function) that reads the result, following an asynchronous architecture pattern.

In [None]:
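import json
import time
from IPython.display import clear_output

def wait_for_output_file(s3_bucket, s3_prefix, invocation_arn):
    # Poll every 5 seconds until the task reaches a terminal status
    while True:
        response = bedrock.get_async_invoke(invocationArn=invocation_arn)
        status = response['status']
        clear_output(wait=True)
        print(f"Embedding task status: {status}")
        if status in ["Completed", "Failed", "Expired"]:
            break
        time.sleep(5)

    # Stop here if the task did not succeed; there is no output to read
    if status != "Completed":
        raise RuntimeError(f"Embedding task ended with status {status}: {response.get('failureMessage')}")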

    outputS3Uri = response["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    output_s3_bucket = outputS3Uri.split('/')[2]
    output_s3_prefix = outputS3Uri.replace(f's3://{output_s3_bucket}/','')
    
    # List objects in the prefix
    response = s3.list_objects_v2(Bucket=output_s3_bucket, Prefix=output_s3_prefix)

    # Read every output .jsonl file and collect the embedding records
    data = []
    for obj in response.get('Contents', []):
        output_key = obj['Key']
        if not output_key.endswith('.jsonl'):
            continue
        # Strip the "embedding-" prefix and the extension to get the embedding type
        embed_name = output_key.split('/')[-1].replace(".jsonl","").replace("embedding-","")
        # Read from the bucket parsed from the output URI, not from the input bucket argument
        body = s3.get_object(Bucket=output_s3_bucket, Key=output_key)['Body']
        for item in body.read().decode('utf-8').split('\n'):
            if item:
                embed = json.loads(item)
                if "segmentMetadata" in embed:
                    embed["segmentMetadata"]["type"] = embed_name
                data.append(embed)

    return data
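
The polling loop above runs until the task reaches a terminal status, however long that takes. The sketch below shows the same pattern with an upper bound on the total wait. `poll_until_terminal`, its parameters, and the default timeout are hypothetical names and values chosen for illustration, and the status source is a stand-in so the example runs without AWS access.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Expired"}

def poll_until_terminal(get_status, interval_seconds=5, timeout_seconds=1800, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Task is still '{status}' after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds

# Stand-in status source. In the notebook this could be:
#   lambda: bedrock.get_async_invoke(invocationArn=invocation_arn)["status"]
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
result = poll_until_terminal(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)  # Completed
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real delays.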

The result will be available in S3 once the task is complete. The code snippet below waits until the `.jsonl` output is ready and reads it from the output path specified in your request.

In [None]:
import json
from IPython.display import display, JSON
output = wait_for_output_file(s3_bucket, "", invocation_arn)
print('Embedding elapsed:', datetime.now()-START_TS)
display(JSON(output))
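
The parsing step inside `wait_for_output_file` can be tried on its own, without S3. The sketch below assumes each non-empty line of the output is one JSON record with an `embedding` list and an optional `segmentMetadata` object, as the later cells in this notebook expect. The function name `parse_embedding_jsonl`, the sample records, and the `"audio-video"` label are illustrative.

```python
import json

def parse_embedding_jsonl(content, embed_name):
    """Parse JSON Lines text into records, tagging segment metadata with embed_name."""
    records = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, including a trailing newline
        record = json.loads(line)
        if "segmentMetadata" in record:
            record["segmentMetadata"]["type"] = embed_name
        records.append(record)
    return records

# Made-up sample with two segments and one blank line in between
sample = (
    '{"embedding": [0.1, 0.2], "segmentMetadata": {"segmentIndex": 0, '
    '"segmentStartSeconds": 0.0, "segmentEndSeconds": 5.0}}\n'
    '\n'
    '{"embedding": [0.3, 0.4], "segmentMetadata": {"segmentIndex": 1, '
    '"segmentStartSeconds": 5.0, "segmentEndSeconds": 10.0}}\n'
)
records = parse_embedding_jsonl(sample, "audio-video")
print(len(records), records[1]["segmentMetadata"]["type"])  # 2 audio-video
```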

## Create an Amazon S3 Vector Bucket and Index
In this example, we use Amazon S3 Vectors to store the embeddings generated in the previous steps and to serve a lightweight search.

In [None]:
# Create an S3 vector bucket if it does not exist
s3vectors = boto3.client("s3vectors", region_name="us-east-1")
try:
    s3vectors.create_vector_bucket(vectorBucketName=s3vector_bucket)
    print(f"Vector bucket '{s3vector_bucket}' created successfully.")
except Exception as ex:    
    print(f'Failed to create S3 vector bucket: {s3vector_bucket}', ex)

In [None]:
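# Optional: delete an existing index to start from a clean state.
# On the first run the index does not exist yet, so a failure here can be ignored.
try:
    s3vectors.delete_index(
        vectorBucketName=s3vector_bucket,
        indexName=s3vector_index)
    print(f"Vector index '{s3vector_index}' deleted.")
except Exception as ex:
    print(f'Failed to delete S3 vector index: {s3vector_index}', ex)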

In [None]:
# Create an index in the vector bucket if it does not exist
try:
    s3vectors.create_index(
        vectorBucketName=s3vector_bucket,
        indexName=s3vector_index,
        dataType='float32',  # Common data type for vector embeddings
        dimension=dim,
        distanceMetric='cosine' # or 'euclidean'
    )
    print(f"Vector index '{s3vector_index}' created successfully in bucket '{s3vector_bucket}'.")
except Exception as ex:    
    print(f'Failed to create S3 vector index {s3vector_index} in bucket: {s3vector_bucket}', ex)
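
The index above is created with `distanceMetric='cosine'`. As a rough illustration of what that metric measures, the sketch below computes cosine distance under the common definition of one minus cosine similarity. This is an assumption for illustration; the exact values that S3 Vectors reports are not specified in this notebook.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

Under this definition, vectors pointing in the same direction have a distance of 0 regardless of their length, which is why smaller values indicate closer matches.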

## Store the Embeddings into the S3 vector index
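You can use the Python boto3 library to write vectors to the S3 vector index and to query it. The cell below writes the embeddings in batches, using each segment's metadata to build a unique key.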

In [None]:
import boto3
import json

idx = 0
batch_size = 200

def put_batch(batch):
    # Write one batch of embeddings into the vector index with metadata.
    s3vectors.put_vectors(
        vectorBucketName=s3vector_bucket,
        indexName=s3vector_index,
        vectors=batch
    )

embeddings = []
for o in output:
    meta = o.get("segmentMetadata")
    if not meta:
        continue  # skip records without segment metadata
    embeddings.append({
        "key": f'{meta["type"]}-{meta["segmentIndex"]}',
        "data": {"float32": o["embedding"]},
        "metadata": meta
    })
    idx += 1
    if len(embeddings) >= batch_size:
        put_batch(embeddings)
        embeddings = []  # start a new batch so vectors are not written twice

# Write the final, partially filled batch
if embeddings:
    put_batch(embeddings)
print(f'{idx} embedding(s) added to the S3 vector index')
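
The batching used when writing vectors can also be expressed as a small generator that slices a list into fixed-size chunks. This is a sketch of the general pattern; `chunked` is a hypothetical helper, and the list of integers stands in for the list of vector records.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 450 stand-in items split into batches of 200
batches = list(chunked(list(range(450)), 200))
print([len(batch) for batch in batches])  # [200, 200, 50]
```

With such a helper, each batch could be passed to `put_vectors` in turn, and the final partial batch is handled without a separate check.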

## Search the Vector Store and Display the Video Clips
Next, we create two utility functions: one queries the S3 vector index with a provided embedding and returns the top N results, and the other displays the retrieved video clips in HTML format.

In [None]:
def search_s3_vectors(search_embed, topK=5, s3vector_bucket=s3vector_bucket, s3vector_index=s3vector_index):
    # Query vector index.
    response = s3vectors.query_vectors(
        vectorBucketName=s3vector_bucket,
        indexName=s3vector_index,
        queryVector={"float32": search_embed}, 
        topK=topK, 
        returnDistance=True,
        returnMetadata=True
    )
    return response
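
`query_vectors` always returns the `topK` nearest vectors, even when none of them is a good match. One option is to drop results beyond a distance threshold before displaying them. The sketch below assumes the response shape used later in this notebook (a `vectors` list whose items carry `key`, `distance`, and `metadata`). `filter_by_distance`, the threshold, and the sample response are illustrative.

```python
def filter_by_distance(search_response, max_distance):
    """Keep results with distance <= max_distance, ordered from closest to farthest."""
    hits = [
        vector
        for vector in search_response.get("vectors", [])
        if vector.get("distance") is not None and vector["distance"] <= max_distance
    ]
    return sorted(hits, key=lambda vector: vector["distance"])

# Made-up response for illustration
sample_response = {
    "vectors": [
        {"key": "clip-2", "distance": 0.41, "metadata": {"segmentStartSeconds": 10.0}},
        {"key": "clip-7", "distance": 0.78, "metadata": {"segmentStartSeconds": 35.0}},
        {"key": "clip-0", "distance": 0.35, "metadata": {"segmentStartSeconds": 0.0}},
    ]
}
close_hits = filter_by_distance(sample_response, max_distance=0.5)
print([hit["key"] for hit in close_hits])  # ['clip-0', 'clip-2']
```

A suitable threshold depends on the data and the distance metric, so it is best chosen by inspecting real query results.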

In [None]:
from IPython.display import HTML
import boto3
import uuid

def display_video_clips(search_response):
    # Format data for display
    start_times = []
    
    for clip in search_response["vectors"]:
        meta = clip["metadata"]
        start = round(float(meta["segmentStartSeconds"]), 2)
        end = round(float(meta["segmentEndSeconds"]), 2)
        # The returned value is a distance, so lower means a closer match
        start_times.append((start, f'{start} - {end}s (distance: {round(clip["distance"], 3)})'))

    # Generate a presigned URL for the video in S3
    s3 = boto3.client('s3')
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': s3_bucket, 'Key': s3_input_key},
        ExpiresIn=3600
    )

    # Generate buttons HTML
    buttons_html = ''.join([
        f'<button onclick="jumpTo({time})">{label}</button> '
        for time, label in start_times
    ])

    video_id = f"videoPlayer{str(uuid.uuid4())[0:4]}"
    html = f"""
    <video id="{video_id}" width="640" controls muted>
      <source src="{url}" type="video/mp4">
      Your browser does not support the video tag.
    </video>
    
    <div style="margin-top:10px;display:block;">
      {buttons_html}
    </div>
    
    <script>
      var video = document.getElementById('{video_id}');
    
      function jumpTo(time) {{
        video.currentTime = time;
        video.play();
      }}
    </script>
    """
    
    display(HTML(html))

## Search text input
You can search video embeddings generated by the Nova Multimodal Embedding model across various modalities such as text, image, video, and audio. In the following example, a text query is used to search the embeddings produced by the asynchronous process stored in S3, returning the top N results.

The following example demonstrates searching with an input text. Here, we use the synchronous [InvokeModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) to generate an embedding for the given text. As a best practice, when providing a search text input, include the following prefix to improve accuracy:
`Instruction: Find an image, video, or document that matches the following description:\nQuery:`
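
A small helper can apply the recommended prefix to a query. This sketch assumes that `\n` in the prefix above stands for a newline and that a single space separates `Query:` from the description; `build_query_text` is a hypothetical name.

```python
# Prefix from the text above; the newline and trailing space are assumptions.
QUERY_PREFIX = (
    "Instruction: Find an image, video, or document that matches "
    "the following description:\nQuery: "
)

def build_query_text(description):
    """Prepend the retrieval instruction prefix to a search description."""
    return QUERY_PREFIX + description.strip()

print(build_query_text("two men having a conversation"))
```

The returned string could then be used as the `value` of the `text` field in the request body.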

In [None]:
text_input = "two men having a conversation"

request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "VIDEO_RETRIEVAL",
        "embeddingDimension": dim,
        "text": {
            "truncationMode": "NONE", # "START" | "END" | "NONE"
            "value": text_input
        },
    },
}

# Invoke the Nova Embeddings model.
response = bedrock.invoke_model(
    body=json.dumps(request_body),
    modelId=model_id,
    accept="application/json",
    contentType="application/json",
)

# Decode the response body.
response_body = json.loads(response.get("body").read())
response_metadata = response["ResponseMetadata"]

display(JSON(response_body))

Perform a search against the S3 vector index and display the retrieved videos and clips in HTML. Use the buttons below each video to jump directly to the timestamp where the clip begins.

In [None]:
search_response = search_s3_vectors(response_body["embeddings"][0]["embedding"])
display_video_clips(search_response)

## Search image input

Similar to text input search, you can also use image, video, or audio as input to search the video clip embeddings. The following example demonstrates searching with an input image. Here, we use the synchronous [InvokeModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) to generate an embedding for the given image, which is passed as a Base64 string in the API request payload.

The input image shows a car that appears in the video:

<img src="./images/meridian-car.png" alt="image input" width="50%">

In [None]:
import base64

# Search by image
base64_str = None
with open("./images/meridian-car.png", "rb") as file:
    base64_str = base64.b64encode(file.read()).decode("utf-8")

request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "VIDEO_RETRIEVAL",
        "embeddingDimension": dim,
        "image": {
            "format": "png",
            "detailLevel": "LOW",
            "source": {"bytes": base64_str},
        },
    },
}

# Invoke the Nova Embeddings model.
response = bedrock.invoke_model(
    body=json.dumps(request_body),
    modelId=model_id,
    accept="application/json",
    contentType="application/json",
)

# Decode the response body.
response_body = json.loads(response.get("body").read())
response_metadata = response["ResponseMetadata"]

display(JSON(response_body))
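
The image request can be wrapped in a function that reads a file, Base64-encodes it, and fills in the same fields as the cell above. This is a sketch under stated assumptions: `build_image_request` is a hypothetical helper, the extension-to-format mapping covers only PNG and JPEG for illustration, and the demo file contains placeholder bytes rather than a real image.

```python
import base64
import tempfile
from pathlib import Path

# Assumed mapping for illustration; check the model documentation for supported formats.
EXTENSION_TO_FORMAT = {".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg"}

def build_image_request(image_path, dim=3072, purpose="VIDEO_RETRIEVAL"):
    """Build a SINGLE_EMBEDDING request body for an image file on disk."""
    path = Path(image_path)
    image_format = EXTENSION_TO_FORMAT.get(path.suffix.lower())
    if image_format is None:
        raise ValueError(f"Unsupported image extension: {path.suffix}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dim,
            "image": {
                "format": image_format,
                "detailLevel": "LOW",
                "source": {"bytes": encoded},
            },
        },
    }

# Demo with placeholder bytes standing in for a real image file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.png"
    demo_path.write_bytes(b"not a real image")
    request = build_image_request(demo_path)
print(request["singleEmbeddingParams"]["image"]["format"])  # png
```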

In [None]:
image_search_response = search_s3_vectors(response_body["embeddings"][0]["embedding"])
display_video_clips(image_search_response)

## Cleanup
Delete the video and the embedding files from S3, then delete the S3 vector index and the vector bucket.

In [None]:
# List all objects under the prefix
response = s3.list_objects_v2(Bucket=s3_bucket, Prefix=s3_prefix)

if 'Contents' in response:
    # Create a list of object identifiers to delete
    objects_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]

    # Delete the objects
    s3.delete_objects(
        Bucket=s3_bucket,
        Delete={'Objects': objects_to_delete}
    )
    print(f"Deleted {len(objects_to_delete)} objects from '{s3_prefix}' in bucket '{s3_bucket}'.")
else:
    print(f"No objects found under prefix '{s3_prefix}'.")
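
A single `list_objects_v2` call returns at most 1,000 keys, so the cell above can miss objects when the prefix holds more than that. The sketch below uses the boto3 paginator to delete page by page. `delete_prefix` is a hypothetical helper that takes the S3 client as a parameter, so it can be exercised with a stand-in client.

```python
def delete_prefix(s3_client, bucket, prefix):
    """Delete every object under prefix, one listing page at a time."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, which is also the
            # per-request limit of delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted

# In the notebook this could be called as:
#   delete_prefix(s3, s3_bucket, s3_prefix)
```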


In [None]:
# Delete vector index
response = s3vectors.delete_index(
    vectorBucketName=s3vector_bucket,
    indexName=s3vector_index
)
print(response)

response = s3vectors.delete_vector_bucket(
    vectorBucketName=s3vector_bucket
)