# Caching Responses and Cost-efficient Embedding Storage with Amazon Bedrock

## Objective Three: Choose an Efficient Storage Strategy for Embeddings

In this hands-on walkthrough, you’ll use the boto3 Python SDK to interact with Amazon Bedrock and generate text embeddings using the `amazon.titan-embed-text-v2:0` model. You’ll create embeddings from a set of meal recipes and explore different storage and search strategies, including in-memory storage, FAISS for vector indexing, Amazon S3 for persistent object storage, and PostgreSQL with the pgvector extension for SQL-based similarity search. Each approach demonstrates how to store and retrieve embeddings efficiently, along with its performance characteristics and associated storage costs.

### 1. Prepare the Environment

This step installs the Python packages required for the rest of the exercise and then restarts the kernel so the newly installed packages are loaded. While the installation runs, you might see pip dependency warnings. You can safely ignore them because they don't affect the steps in this exercise.

In [None]:
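%pip install --upgrade -q botocore
%pip install --upgrade -q boto3
%pip install --upgrade -q numpy
%pip install --upgrade -q faiss-cpu
%pip install --upgrade -q psycopg2-binary

from IPython.display import HTML, display

# Ask the notebook front end to restart the kernel so the upgraded packages load.
# This JavaScript call works in the classic Jupyter Notebook UI. JupyterLab does not
# run it, so in JupyterLab restart manually with Kernel > Restart Kernel.
try:
    display(HTML("<script>Jupyter.notebook.kernel.restart()</script>"))
    print("✅ Kernel restart requested")
except Exception as e:
    print("❌ Failed to request a kernel restart")
    print(f"Error: {e}")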
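After the kernel restarts, you can optionally confirm which packages are installed. This sketch uses the standard library's `importlib.metadata.version`, which looks up a package by its distribution name (the name passed to pip). The helper `installed_versions` is a hypothetical name. The distribution names in the list are assumptions, so edit them to match the `%pip` commands you actually ran.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Return {distribution name: version string, or None if it is not installed}."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed distribution names; adjust them to match what you installed with %pip.
packages = ["boto3", "botocore", "numpy", "faiss-cpu", "psycopg2-binary"]
for name, ver in installed_versions(packages).items():
    print(f"{'✅' if ver else '❌'} {name}: {ver or 'not installed'}")
```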

In this step, you import the libraries required for embedding generation, storage, and similarity search:

- `boto3` lets you interact with AWS services programmatically.
- `json` formats request payloads and parses responses.
- The `faiss` library builds and searches a vector index in memory, enabling fast similarity lookups.
- `numpy` handles the numerical operations and vector transformations that cosine similarity calculations require.
- `psycopg2` connects to a PostgreSQL database so you can store and query embeddings with the pgvector extension.

You also create a `bedrock-runtime` client in the `us-east-1` Region, which you use to invoke models on Amazon Bedrock.

In [None]:
try:
    import boto3
    import json
    import faiss
    import numpy as np
    import psycopg2
    print("----------------------------")
    print("✅ Libraries loaded successfully.")
except ImportError as e:
    print("----------------------------")
    print("❌ Failed to load libraries.")
    print(f"Error: {e}")
try:
    client = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1"
    )
    print("✅ bedrock-runtime client initialized successfully.")
except Exception as e:
    print("❌ Failed to initialize bedrock-runtime client.")
    print(f"Error: {e}")

### 2. Sample Recipes

In this step, you define a set of recipes in a Python dictionary. Each key is a recipe name, and each value is that recipe's full description. These recipes serve as the input for generating embeddings in the storage and search scenarios that follow.

In [None]:
recipes = {
    "Spaghetti Carbonara": """Boil spaghetti. In a pan, cook pancetta until crispy.
    Beat eggs with parmesan cheese. Combine spaghetti, pancetta, and egg mixture.
    Stir quickly to create a creamy sauce. Serve hot.""",

    "Chicken Curry": """Cook chopped onions, garlic, and ginger in oil. Add curry powder,
    cumin, and turmeric. Stir in chicken pieces and brown them. Add tomatoes and
    simmer until chicken is cooked through. Serve with rice.""",

    "Vegan Salad": """Mix chopped kale, spinach, cherry tomatoes, and cucumbers.
    Add avocado slices and chickpeas. Dress with lemon juice, olive oil, and salt.""",

    "Grilled Cheese Sandwich": """Butter two slices of bread. Place cheddar cheese
    between them. Grill in a pan until bread is golden and cheese is melted."""
}

### 3. Invoke Embedding Model

This step defines a function that invokes the `amazon.titan-embed-text-v2:0` model through Amazon Bedrock. The function does three things:

1. It packages the input text as a JSON payload.
2. It sends the request with the client's `invoke_model` method.
3. It parses the response body to extract the embedding vector.

It then returns that vector for use in the storage and similarity search scenarios that follow.
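To make the request and response shapes concrete, here is an offline sketch. The payload needs only `inputText`. According to the Titan Text Embeddings V2 documentation, the model also accepts optional `dimensions` (256, 512, or 1024, default 1024) and `normalize` (default true) fields. Verify the allowed values against the current model documentation before relying on them. Smaller vectors cost less to store. The helper names are hypothetical, and `io.BytesIO` stands in for the streaming `response["body"]` that `invoke_model` returns.

```python
import io
import json


def build_titan_v2_request(text, dimensions=None, normalize=None):
    """Build a JSON request body for amazon.titan-embed-text-v2:0.

    Optional fields are omitted when None so the model defaults apply.
    The allowed dimension sizes are taken from the model docs (assumption).
    """
    body = {"inputText": text}
    if dimensions is not None:
        if dimensions not in (256, 512, 1024):
            raise ValueError("dimensions must be 256, 512, or 1024")
        body["dimensions"] = dimensions
    if normalize is not None:
        body["normalize"] = normalize
    return json.dumps(body)


def parse_embedding_response(stream):
    """Read a streaming response body and return the embedding list."""
    return json.loads(stream.read())["embedding"]


# Offline demo: a hypothetical response payload wrapped in a BytesIO stream.
fake_body = io.BytesIO(
    json.dumps({"embedding": [0.1, 0.2, 0.3], "inputTextTokenCount": 4}).encode("utf-8")
)
print(build_titan_v2_request("Boil spaghetti.", dimensions=256, normalize=True))
print(parse_embedding_response(fake_body))
```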

In [None]:
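def get_embedding(text):
    """Return the Titan Text Embeddings V2 vector for `text` as a list of floats."""
    payload = {"inputText": text}
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps(payload),
        contentType="application/json",
        accept="application/json"
    )
    # The response body is a stream: read it and parse the JSON to get the vector
    result = json.loads(response["body"].read())
    return result["embedding"]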
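Every call to `get_embedding` invokes the model, so embedding the same text twice is billed twice. In keeping with this lab's focus on caching, a minimal sketch wraps any embedding function with an in-process cache. The cache is keyed by the SHA-256 hash of the text, so identical inputs reach the model only once. `make_cached_embedder` and `fake_embed` are hypothetical names. The demo uses a stand-in embedder so it runs without AWS credentials. The cache lives in a dictionary, so it disappears when the kernel restarts.

```python
import hashlib


def make_cached_embedder(embed_fn):
    """Wrap an embedding function so identical texts are embedded only once."""
    cache = {}

    def cached(text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = embed_fn(text)  # call the model only on a cache miss
        return cache[key]

    cached.cache = cache  # exposed so you can inspect what has been cached
    return cached


# Offline demo with a stand-in embedder that records every call it receives.
calls = []


def fake_embed(text):
    calls.append(text)
    return [float(len(text))]


embed = make_cached_embedder(fake_embed)
embed("spaghetti with eggs")
embed("spaghetti with eggs")
print(len(calls))  # 1: the second lookup is served from the cache
```

In the notebook you could write `cached_get_embedding = make_cached_embedder(get_embedding)` and call it wherever you would call `get_embedding`. Hashing gives fixed-length keys regardless of text length. Using the raw text as the key would also work.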

### 4. Defining the Cosine Similarity Function

This step defines a function to calculate cosine similarity between two vectors. It converts the input lists to NumPy arrays, then computes the cosine of the angle between them using the dot product and vector norms. The result is a value between -1 and 1, where higher values indicate greater similarity.

In [None]:
def cosine_similarity(vec1, vec2):
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
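The function above relies on NumPy. The same formula, cos(θ) = (a · b) / (‖a‖ ‖b‖), can be written with only the standard library, which also makes the zero-vector case explicit. This sketch adds a hypothetical `l2_normalize` helper to show why normalization matters. For unit-length vectors the denominator is 1, so the dot product alone equals the cosine similarity. The Titan Text Embeddings V2 documentation describes a `normalize` request flag that defaults to true, so its vectors are typically already unit length. Check this for your own requests.

```python
import math


def cosine_similarity_py(a, b):
    """Cosine similarity without NumPy: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit length (raises ValueError for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity_py(a, b))  # 0.96, that is, 24 / (5 * 5)

# After normalization, the plain dot product gives the same cosine value.
ua, ub = l2_normalize(a), l2_normalize(b)
print(sum(x * y for x, y in zip(ua, ub)))
```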

### 5. Store Embeddings in Memory

This step stores the recipe embeddings in memory and defines a function that performs a semantic similarity search. The function generates an embedding for the input query, compares it with each stored embedding using cosine similarity, and returns the `top_k` recipes with the highest scores.

In [None]:
# Generate an embedding for each recipe and keep it in a dictionary in memory
embedding_store = {}

for name, text in recipes.items():
    embedding = get_embedding(text)
    embedding_store[name] = {
        "text": text,
        "embedding": embedding
    }

# Perform a semantic similarity search against the in-memory store
def search_similar_recipes(query, top_k=2):
    query_embedding = get_embedding(query)
    results = []

    # Score every stored recipe against the query
    for name, data in embedding_store.items():
        score = cosine_similarity(query_embedding, data["embedding"])
        results.append((name, score, data["text"]))

    # Sort only after all recipes are scored, then keep the top_k matches
    results.sort(key=lambda x: x[1], reverse=True)
    return results[:top_k]

# Run a search for a Spaghetti Carbonara-like recipe
query = "How do I cook spaghetti with eggs, cheese, and pancetta?"
results = search_similar_recipes(query)

# Print results
for name, score, text in results:
    print(f"🔹 Recipe: {name}")
    print(f"   Similarity Score: {score:.3f}")
    print(f"   Description: {text.strip()}\n")
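The search function above sorts the scored results even though it keeps only `top_k` of them. A minimal alternative sketch uses `heapq.nlargest`, which selects the k highest scores without sorting the whole list. It takes the query vector as an argument instead of calling Bedrock, so you can try it offline with toy vectors. `top_k_similar` and `toy_store` are hypothetical names, and the store layout mirrors `embedding_store`. The sketch assumes non-zero vectors.

```python
import heapq
import math


def top_k_similar(query_vec, store, k=2):
    """Return the k (name, score) pairs with the highest cosine similarity.

    `store` maps a name to {"embedding": [...]}; vectors are assumed non-zero.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    scored = ((name, cosine(query_vec, item["embedding"])) for name, item in store.items())
    # nlargest keeps only k candidates instead of sorting every result
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D vectors stand in for real embeddings (hypothetical data).
toy_store = {
    "pasta": {"embedding": [1.0, 0.1]},
    "curry": {"embedding": [0.1, 1.0]},
    "salad": {"embedding": [0.5, 0.5]},
}
print(top_k_similar([1.0, 0.0], toy_store, k=2))
```

With four recipes the difference is negligible. Avoiding a full sort matters more as the store grows, although at that point a vector index such as FAISS is usually the better tool.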

Storing embeddings in memory is fast and adds no separate storage cost, although you still pay for the Bedrock calls that generate the embeddings. This makes it a good fit for quick lookups and small-scale applications, and it avoids the complexity of setting up external storage while offering low-latency access. However, it is volatile: the data is lost when the application stops. It also doesn't scale well, because memory is limited, so it isn't suitable for storing large volumes of embeddings over time.
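To put "memory is limited" in numbers, this sketch estimates the footprint of one embedding. `get_embedding` returns the vector as a Python list parsed from JSON. In CPython, each element of that list is a separate float object plus an 8-byte list pointer. By contrast, `array('f')` (like a float32 NumPy array, the type FAISS works with) packs each value into 4 bytes. The sketch assumes 1,024 dimensions, the Titan Text Embeddings V2 default, and uses stand-in values rather than a real embedding. `sys.getsizeof` results are CPython-specific approximations. Storing values as float32 also reduces precision from 64-bit floats, which is usually acceptable for similarity search.

```python
import sys
from array import array

DIMS = 1024  # assumption: Titan Text Embeddings V2 default output size

values = [i / DIMS for i in range(DIMS)]  # stand-in for one embedding as Python floats

# A Python list holds 8-byte pointers to separately allocated float objects.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('f') stores the same values as contiguous 4-byte float32 numbers.
packed = array("f", values)
packed_bytes = packed.itemsize * len(packed)

print(f"list of floats: ~{list_bytes:,} bytes per embedding")
print(f"float32 array:  {packed_bytes:,} bytes per embedding")

for n in (10_000, 1_000_000):
    print(f"{n:>9,} embeddings as float32 ≈ {n * packed_bytes / 1024**2:,.1f} MiB")
```

At 4 bytes per value, one million 1,024-dimensional embeddings need about 4.1 GB (roughly 3.8 GiB) as float32 before any index overhead. Held as Python lists, they need several times that.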