# Caching Responses and Cost-efficient Embedding Storage with Amazon Bedrock

## Objective Three: Choose an Efficient Storage Strategy for Embeddings

In this hands-on walkthrough, you’ll use the `boto3 Python SDK` to interact with Amazon Bedrock and generate text embeddings using the `amazon.titan-embed-text-v2:0` model. You’ll create embeddings from a set of meal recipes and explore different storage and search strategies, including in-memory storage, `FAISS` for vector indexing, `Amazon S3` for persistent object storage, and `PostgreSQL` with the `pgvector` extension for SQL-based similarity search. Each approach demonstrates how to store and retrieve embeddings efficiently, along with its performance characteristics and associated storage costs.

### 1. Prepare the Environment

This step installs the `Python` packages required for the rest of the exercise and restarts the kernel so that the packages are properly loaded. While it runs, you might see some `pip` dependency warnings. You can safely ignore them, as they won’t affect the steps in this exercise.

In [None]:
print("✅ Please wait while the installation completes. This may take a few "
      "minutes. If you encounter any dependency errors, you can ignore them.")
%pip install --upgrade -q botocore
%pip install --upgrade -q boto3
%pip install -q numpy==1.26.4
%pip install -q psycopg2-binary
%conda install -y -q -c conda-forge faiss-cpu
print("✅ Installation completed!")

from IPython.display import HTML, display

try:
    # This only asks the notebook front end to restart the kernel
    display(HTML("<script>Jupyter.notebook.kernel.restart()</script>"))
    print("✅ Kernel restart requested")
except Exception as e:
    print("❌ Failed to restart the kernel")
    print(f"Error: {e}")

In this step, you import several libraries required for embedding generation, storage, and similarity search. You use `boto3` to interact with AWS services programmatically and `json` to handle JSON formatting for requests and responses. The `faiss` library is used to build and search a vector index in memory, enabling fast similarity lookups. `numpy` is imported to handle numerical operations and vector transformations required for cosine similarity calculations. Finally, `psycopg2` is used to connect to a `PostgreSQL` database and interact with it when storing and querying embeddings using the `pgvector` extension. You also create a Bedrock client using `boto3`, which allows you to invoke models from the `Amazon Bedrock` service.

In [None]:
try:
    import boto3
    import json
    import faiss
    import numpy as np
    import psycopg2
    print("----------------------------")
    print("✅ Libraries loaded successfully.")
except ImportError as e:
    print("----------------------------")
    print("❌ Failed to load libraries.")
    print(f"Error: {e}")
try:
    client = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1"
    )
    print("✅ bedrock-runtime client initialized successfully.")
except Exception as e:
    print("❌ Failed to initialize bedrock-runtime client.")
    print(f"Error: {e}")

### 2. Define Sample Recipes for Embedding Input

In this step, you define a set of recipes stored in a `Python` dictionary. Each entry uses the recipe name as the key and the full description as the value. These recipes are the input for generating embeddings in the storage and search scenarios that follow.

In [None]:
recipes = {
        "Spaghetti Carbonara": """Boil spaghetti. In a pan, cook pancetta until crispy.
        Beat eggs with parmesan cheese. Combine spaghetti, pancetta, and egg mixture.
        Stir quickly to create a creamy sauce. Serve hot.""",

        "Chicken Curry": """Cook chopped onions, garlic, and ginger in oil. 
        Add curry powder, cumin, and turmeric. Stir in chicken pieces and brown
        them. Add tomatoes and simmer until chicken is cooked through.
        Serve with rice.""",

        "Vegan Salad": """Mix chopped kale, spinach, cherry tomatoes, and cucumbers.
        Add avocado slices and chickpeas. Dress with lemon juice, olive oil, and salt.""",

        "Grilled Cheese Sandwich": """Butter two slices of bread. Place cheddar cheese
        between them. Grill in a pan until bread is golden and cheese is melted."""
    }
print("✅ Dictionary created successfully.")

### 3. Define Function to Generate Embeddings

This step defines a function that calls `Amazon Bedrock` to invoke the `amazon.titan-embed-text-v2:0` model. It wraps the input text in a `JSON` payload, sends the request using the `Bedrock` client, and parses the response to extract the embedding. The function returns the embedding for use in the storage and similarity search scenarios that follow.

In [None]:
def get_embedding(text):
    payload = {"inputText": text}
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps(payload),
        contentType="application/json"
    )
    result = json.loads(response['body'].read())
    return result["embedding"]

print("✅ Function to invoke model and create embeddings has been defined successfully.")
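
Titan Text Embeddings V2 also accepts optional `dimensions` and `normalize` fields in the request body. These matter for the storage comparisons in this exercise, because a smaller output dimension means fewer bytes per stored vector. The sketch below only builds the request body and does not call Bedrock. The helper name `build_titan_v2_payload` is hypothetical, and the accepted dimension values are an assumption taken from the model documentation, so check them against the current Bedrock documentation before relying on them.

```python
import json

# Output sizes documented for amazon.titan-embed-text-v2:0
# (assumption: verify against the current Bedrock documentation).
SUPPORTED_DIMENSIONS = (256, 512, 1024)

def build_titan_v2_payload(text, dimensions=1024, normalize=True):
    """Build the JSON request body for a Titan Text Embeddings V2 call."""
    if dimensions not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {SUPPORTED_DIMENSIONS}")
    return json.dumps({
        "inputText": text,
        "dimensions": dimensions,   # smaller values reduce storage per vector
        "normalize": normalize,     # ask the model for unit-length vectors
    })

body = build_titan_v2_payload("Boil spaghetti.", dimensions=256)
print(body)
```

The resulting string could be passed as the `body` argument of `client.invoke_model`. If you lower the dimension, any fixed-size column or index used later (for example `vector(1024)` in the `pgvector` step) would need to match the new size.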

### 4. Define the Cosine Similarity Function

This step defines a function to calculate cosine similarity between two vectors. It converts the input lists to `NumPy` arrays and computes the cosine of the angle between them using the dot product and vector norms. The result is a value between -1 and 1, where higher values indicate greater similarity. This function will be used in scenarios where the storage solution does not provide built-in similarity search capabilities.

In [None]:
def cosine_similarity(vec1, vec2):
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

print("✅ Cosine similarity function has been created successfully.")
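
To make the range of scores concrete, here is a dependency-free sketch of the same calculation applied to a few two-dimensional vectors. The function name `cosine_similarity_safe` and the choice to return `0.0` for a zero-length vector are assumptions made for this illustration; the function defined above would instead divide by zero in that case.

```python
import math

def cosine_similarity_safe(vec1, vec2):
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm1 * norm2)

print(cosine_similarity_safe([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity_safe([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity_safe([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: -1.0
print(cosine_similarity_safe([0.0, 0.0], [1.0, 0.0]))   # zero vector: 0.0
```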

### 5. Store Embeddings in Memory

This method stores the recipe embeddings in memory and defines a function to perform a semantic similarity search. It uses the previously defined cosine similarity function to compare the query embedding with the in-memory stored embeddings and returns the most similar recipe based on the highest score.

In [None]:
# Store embeddings in memory
embedding_store = {}

for name, text in recipes.items():
    embedding = get_embedding(text)
    embedding_store[name] = {
        "text": text,
        "embedding": embedding
    }

# Perform semantic similarity search
def search_similar_recipes(query, top_k=1):
    query_embedding = get_embedding(query)
    results = []

    for name, data in embedding_store.items():
        score = cosine_similarity(query_embedding, data["embedding"])
        results.append((name, score, data["text"]))

    results.sort(key=lambda x: x[1], reverse=True)
    return results[:top_k]

# Example query
query = "How do I cook spaghetti with eggs, cheese, and pancetta?"
results = search_similar_recipes(query)

# Display results
for name, score, text in results:
    print(f"🔹 Recipe: {name}")
    print(f"   Similarity Score: {score:.3f}")
    print(f"   Description: {text.strip()}\n")

Storing embeddings in memory is fast and incurs no storage cost, making it ideal for quick lookups and small-scale applications. It avoids the complexity of setting up external storage and offers low-latency access. However, it is volatile: data is lost when the application stops. It also doesn't scale well, because memory is limited and unsuitable for storing large volumes of embeddings over time.
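
To get a feel for where memory becomes the limit, the following back-of-the-envelope sketch estimates the raw size of the vector data alone. It assumes 1024-dimensional vectors stored as 4-byte floats; the helper name `estimate_embedding_bytes` is hypothetical. Python lists of float objects, as used in `embedding_store`, take noticeably more than this because of per-object overhead.

```python
def estimate_embedding_bytes(num_vectors, dimensions=1024, bytes_per_value=4):
    """Raw size of the vector data alone, ignoring container overhead."""
    return num_vectors * dimensions * bytes_per_value

# Compare the four sample recipes with two larger, hypothetical collections.
for count in (4, 100_000, 10_000_000):
    size_mib = estimate_embedding_bytes(count) / (1024 ** 2)
    print(f"{count:>10,} vectors -> {size_mib:,.1f} MiB as float32")
```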

### 6. Store Embeddings in FAISS

This block generates normalized embeddings for each recipe and stores them in a `FAISS` index configured for cosine similarity. It defines a function that embeds the user’s query, normalizes it, and retrieves the most semantically similar recipe using inner product search.

In [None]:
# Get embedding for one sample recipe to determine embedding dimension
sample_embedding = get_embedding(next(iter(recipes.values())))
embedding_dim = len(sample_embedding)
print(f"Detected embedding dimension: {embedding_dim}")

# Create FAISS index for inner product (cosine similarity with normalized vectors)
index = faiss.IndexFlatIP(embedding_dim)

# Store recipe names and texts for mapping
recipe_lookup = []
embedding_matrix = []

for name, text in recipes.items():
    embedding = get_embedding(text)
    embedding_matrix.append(embedding)
    recipe_lookup.append((name, text))

# Convert to NumPy array and normalize for cosine similarity
embedding_matrix = np.array(embedding_matrix).astype('float32')
faiss.normalize_L2(embedding_matrix)  # normalize each vector to unit length

# Add embeddings to FAISS index
index.add(embedding_matrix)

# Semantic Search with FAISS using cosine similarity
def search_faiss(query, top_k=1):
    query_embedding = np.array(get_embedding(query)).astype('float32').reshape(1, -1)
    faiss.normalize_L2(query_embedding)  # normalize query vector
    scores, indices = index.search(query_embedding, top_k)

    results = []
    for idx, score in zip(indices[0], scores[0]):
        name, recipe_text = recipe_lookup[idx]
        results.append((name, score, recipe_text))
    return results

# Example query
query = "How do I cook chicken with onions?"
results = search_faiss(query)

# Display results
for name, score, text in results:
    print(f"🔹 Recipe: {name}")
    print(f"   Similarity Score: {score:.3f} (higher is more similar)")
    print(f"   Description: {text.strip()}\n")

Using `FAISS` in RAM offers fast, cost-effective similarity search with no external storage costs. Unlike the plain in-memory dictionary, it is built for large datasets: the flat index used here performs an exact search over every vector, and `FAISS` also provides approximate index types that scale further. `FAISS` also supports disk-based indexes for persistence. However, in-memory use is still volatile and limited by available memory.
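
The FAISS block relies on the fact that the inner product of two unit-length vectors equals their cosine similarity, which is why both the stored vectors and the query are normalized before using `IndexFlatIP`. The sketch below checks that equivalence in plain Python, without FAISS. The helper names are hypothetical, and `l2_normalize` is meant to mirror what `faiss.normalize_L2` does to each non-zero row.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

a = [3.0, 4.0]
b = [4.0, 3.0]

direct = cosine(a, b)
via_normalized = inner_product(l2_normalize(a), l2_normalize(b))

# Both values should agree up to floating-point rounding (about 0.96 here).
print(direct, via_normalized)
```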

### 7. Store Embeddings in Amazon S3

This block generates embeddings for each recipe and stores them as `JSON` objects in an Amazon `S3` bucket. Since `S3` doesn't provide built-in similarity search capabilities, it uses the previously defined cosine similarity function to compare a query embedding against the stored embeddings. The top matching recipe is returned based on semantic relevance.

In [None]:
# Get dynamic AWS context
account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.session.Session().region_name
bucket_name = f"recipes-embeddings-bucket-{account_id}-{region}"
embedding_prefix = "recipes/"

# Initialize S3 client
s3 = boto3.client("s3", region_name=region)

# Generate and store embeddings in S3
for name, text in recipes.items():
    embedding = get_embedding(text)
    data = {
        "recipe_name": name,
        "text": text,
        "embedding": embedding
    }
    key = f"{embedding_prefix}{name.replace(' ', '_')}.json"
    s3.put_object(
        Bucket=bucket_name,
        Key=key,
        Body=json.dumps(data),
        ContentType="application/json"
    )

# Load embeddings from S3
def load_all_embeddings():
    response = s3.list_objects_v2(Bucket=bucket_name, Prefix=embedding_prefix)
    results = []
    for obj in response.get("Contents", []):
        file_obj = s3.get_object(Bucket=bucket_name, Key=obj["Key"])
        body = file_obj["Body"].read()
        data = json.loads(body)
        results.append(data)
    return results

# Search using cosine similarity
def search_similar_recipe(query, top_k=1):  
    query_embedding = get_embedding(query)
    stored_items = load_all_embeddings()

    similarities = []
    for item in stored_items:
        score = cosine_similarity(query_embedding, item["embedding"])
        similarities.append((item["recipe_name"], score, item["text"]))

    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_k]

# Example query
query = "How do I prepare a vegan salad?"
results = search_similar_recipe(query)

# Display results
for name, score, text in results:
    print(f"🔹 Recipe: {name}")
    print(f"   Similarity Score: {score:.3f}")
    print(f"   Description: {text.strip()}\n")

Using `Amazon S3` to store embeddings offers durability, scalability, and low long-term storage cost. It persists data across sessions and is ideal for batch processing or infrequently accessed embeddings. Compared to in-memory strategies, `S3` incurs cost per request and per volume of data stored, and it's slower for real-time search due to network latency and the need to load data before querying. However, it scales far beyond memory limits and supports easy data sharing across services or notebooks.
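
Because `S3` charges by stored volume, the serialization format affects cost. One possible way to reduce object size, sketched below, is to pack each embedding as raw 32-bit floats instead of `JSON` text. This is not what the block above does; the helper names are hypothetical, the sample vector is a synthetic stand-in rather than a real Titan embedding, and packing to 32 bits loses some precision compared with the `JSON` representation.

```python
import json
import struct

def embedding_to_bytes(embedding):
    """Pack a list of floats as little-endian 32-bit floats."""
    return struct.pack(f"<{len(embedding)}f", *embedding)

def bytes_to_embedding(blob):
    """Inverse of embedding_to_bytes (values come back at float32 precision)."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

# Synthetic stand-in vector; a real one would come from get_embedding(...).
embedding = [((i * 37) % 1000) / 1000 - 0.5 for i in range(1024)]

as_json = json.dumps(embedding).encode("utf-8")
as_binary = embedding_to_bytes(embedding)

print(f"JSON:   {len(as_json)} bytes")
print(f"Binary: {len(as_binary)} bytes")
```

The packed bytes could be passed as the `Body` of `s3.put_object`, with the recipe name and text stored separately, for example in a small companion `JSON` object. Real embeddings usually carry more digits per value than this synthetic vector, so the size difference tends to be larger in practice.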

### 8. Store Embeddings in PostgreSQL with the `pgvector` Extension

This block connects to a `PostgreSQL` database, generates embeddings for a set of recipe descriptions, and inserts them into a table that supports vector search using the `pgvector` extension. It then performs a similarity search by embedding a user query and retrieving the top relevant recipe, ordered by distance from the query embedding.

In [None]:
# Configuration
DB_INSTANCE_ID = "recipes-pg-db"
DB_NAME = "recipesdb"
DB_PORT = 5432
DB_USER = "masterusername"

session = boto3.session.Session()
region = session.region_name
rds = boto3.client("rds", region_name=region)
sts = boto3.client("sts", region_name=region)

# Get RDS endpoint
db_info = rds.describe_db_instances(DBInstanceIdentifier=DB_INSTANCE_ID)
db_endpoint = db_info['DBInstances'][0]['Endpoint']['Address']

# Get DB password from AWS account ID
account_id = sts.get_caller_identity()['Account']
db_password = account_id

# Connect to PostgreSQL
conn = psycopg2.connect(
    host=db_endpoint,
    port=DB_PORT,
    database=DB_NAME,
    user=DB_USER,
    password=db_password,
    sslmode="require"
)
cursor = conn.cursor()

# Ensure pgvector is installed
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.commit()

# Create the recipes table with a vector(1024) column if it does not exist
cursor.execute("""
    CREATE TABLE IF NOT EXISTS recipes (
        id SERIAL PRIMARY KEY,
        name TEXT UNIQUE,
        description TEXT,
        embedding vector(1024)
    );
""")
conn.commit()

# Insert embeddings only if not already present
for name, desc in recipes.items():
    cursor.execute("SELECT 1 FROM recipes WHERE name = %s", (name,))
    if cursor.fetchone():
        continue
    embedding = get_embedding(desc)  # must return a 1024-dimensional vector to match vector(1024)
    cursor.execute(
        "INSERT INTO recipes (name, description, embedding) VALUES (%s, %s, %s)",
        (name, desc, embedding)
    )
conn.commit()

# Perform semantic similarity search
query = "How do I cook a cheese sandwich?"
query_embedding = get_embedding(query)

cursor.execute("""
    SELECT name, description, embedding <=> %s::vector AS cosine_distance
    FROM recipes
    ORDER BY cosine_distance ASC
    LIMIT 1;
""", (query_embedding,))

results = cursor.fetchall()

# Display results
for name, description, distance in results:
    print(f"🔹 Recipe: {name}")
    print(f"   Distance: {distance:.3f}")
    print(f"   Description: {description.strip()}\n")

# Close the cursor and connection
cursor.close()
conn.close()

Databases with vector support provide persistent, indexed storage and efficient similarity search, making them well-suited for real-time embedding queries and structured data. They handle larger datasets better than memory but require more setup and can incur higher costs as data and query volume grow.
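
Two details of the `pgvector` query are easy to miss: the `<=>` operator returns cosine distance (lower is more similar), and the table is scanned sequentially unless a vector index exists. The sketch below shows small helpers around both points. The function names and the index name `recipes_embedding_hnsw` are hypothetical, and the index statement assumes a `pgvector` version with HNSW support (0.5.0 or later); it is only defined here as a string, not executed.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

def cosine_distance_to_similarity(distance):
    """Convert pgvector cosine distance (1 - cosine similarity) back to similarity."""
    return 1.0 - distance

# Optional approximate index for larger tables (assumes pgvector >= 0.5.0).
CREATE_HNSW_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS recipes_embedding_hnsw
ON recipes USING hnsw (embedding vector_cosine_ops);
"""

print(to_pgvector_literal([0.1, 0.2, 0.3]))   # [0.1,0.2,0.3]
print(cosine_distance_to_similarity(0.25))    # 0.75
```

With an open connection, the statement could be run with `cursor.execute(CREATE_HNSW_INDEX_SQL)` followed by `conn.commit()`. For the four recipes in this exercise an index makes no practical difference; it becomes relevant as the table grows.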

Congratulations! You have successfully completed objective three, **Choose an Efficient Storage Strategy for Embeddings**. You can now close the `objective_three.ipynb` file.