# Caching Embeddings

RedisVL provides an `EmbeddingsCache` that makes it easy to store and retrieve embedding vectors with their associated text and metadata. This cache is particularly useful for applications that frequently compute the same embeddings, enabling you to:

- Reduce computational costs by reusing previously computed embeddings
- Decrease latency in applications that rely on embeddings
- Store additional metadata alongside embeddings for richer applications

This notebook will show you how to use the `EmbeddingsCache` effectively in your applications.

## Setup

First, let's import the necessary libraries. We'll use a text embedding model from HuggingFace to generate our embeddings.

In [1]:
import os
import time
import numpy as np

# Disable tokenizers parallelism to avoid deadlocks
os.environ["TOKENIZERS_PARALLELISM"] = "False"

# Import the EmbeddingsCache
from redisvl.extensions.cache.embeddings import EmbeddingsCache
from redisvl.utils.vectorize import HFTextVectorizer

Let's create a vectorizer to generate embeddings for our texts:

In [2]:
# Initialize the vectorizer
vectorizer = HFTextVectorizer(
    model="sentence-transformers/all-mpnet-base-v2",
    cache_folder=os.getenv("SENTENCE_TRANSFORMERS_HOME")
)

## Initializing the EmbeddingsCache

Now let's initialize our `EmbeddingsCache`. The cache requires a Redis connection to store the embeddings and their associated data.

In [3]:
# Initialize the embeddings cache
cache = EmbeddingsCache(
    name="embedcache",                  # name prefix for Redis keys
    redis_url="redis://localhost:6379",  # Redis connection URL
    ttl=None                            # Optional TTL in seconds (None means no expiration)
)
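
Because the cache needs a reachable Redis instance, it can help to keep the connection URL out of the code. The following is a minimal sketch, assuming the URL is supplied through an environment variable named `REDIS_URL` (an assumed name for this example) with the local default used above as a fallback. `resolve_redis_url` is a hypothetical helper, not part of RedisVL.

```python
import os


def resolve_redis_url(default: str = "redis://localhost:6379") -> str:
    """Return the Redis URL from the environment, or a local default.

    Assumption: the variable name REDIS_URL is chosen for this sketch.
    """
    return os.environ.get("REDIS_URL", default)


redis_url = resolve_redis_url()
print(redis_url)
```

The resulting string could then be passed as the `redis_url` argument when constructing the cache.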

## Basic Usage

### Storing Embeddings

Let's store some text with its embedding in the cache. The `set` method takes the following parameters:
- `text`: The input text that was embedded
- `model_name`: The name of the embedding model used
- `embedding`: The embedding vector
- `metadata`: Optional metadata associated with the embedding
- `ttl`: Optional time-to-live override for this specific entry

In [4]:
# Text to embed
text = "What is machine learning?"
model_name = "sentence-transformers/all-mpnet-base-v2"

# Generate the embedding
embedding = vectorizer.embed(text)

# Optional metadata
metadata = {"category": "ai", "source": "user_query"}

# Store in cache
key = cache.set(
    text=text,
    model_name=model_name,
    embedding=embedding,
    metadata=metadata
)

print(f"Stored with key: {key[:15]}...")

Stored with key: embedcache:a1b2...
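
The returned key starts with the cache name, and lookups need only the text and model name, which suggests the key is derived deterministically from those two values. The sketch below illustrates that idea only: the SHA-256 hash and the `text:model_name` layout are assumptions for this example, and `make_cache_key` is a hypothetical helper rather than RedisVL's internal function.

```python
import hashlib


def make_cache_key(prefix: str, text: str, model_name: str) -> str:
    """Build a deterministic key from the cache name, text, and model name.

    Assumption: hashing "text:model_name" with SHA-256 is for illustration;
    the library's actual key scheme may differ.
    """
    digest = hashlib.sha256(f"{text}:{model_name}".encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


model = "sentence-transformers/all-mpnet-base-v2"
key_a = make_cache_key("embedcache", "What is machine learning?", model)
key_b = make_cache_key("embedcache", "What is machine learning?", model)
key_c = make_cache_key("embedcache", "What is machine learning?", "some-other-model")

print(key_a == key_b)  # True: same text and model give the same key
print(key_a == key_c)  # False: a different model gives a different key
```

Including the model name in the hash keeps embeddings from different models apart even when the input text is identical.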


### Retrieving Embeddings

To retrieve an embedding from the cache, use the `get` method with the original text and model name:

In [5]:
# Retrieve from cache

if result := cache.get(text=text, model_name=model_name):
    print(f"Found in cache: {result['text']}")
    print(f"Model: {result['model_name']}")
    print(f"Metadata: {result['metadata']}")
    print(f"Embedding shape: {np.array(result['embedding']).shape}")
else:
    print("Not found in cache.")

Found in cache: What is machine learning?
Model: sentence-transformers/all-mpnet-base-v2
Metadata: {'category': 'ai', 'source': 'user_query'}
Embedding shape: (768,)


### Checking Existence

You can check if an embedding exists in the cache without retrieving it using the `exists` method:

In [6]:
# Check if existing text is in cache
exists = cache.exists(text=text, model_name=model_name)
print(f"First query exists in cache: {exists}")

# Check if a new text is in cache
new_text = "What is deep learning?"
exists = cache.exists(text=new_text, model_name=model_name)
print(f"New query exists in cache: {exists}")

First query exists in cache: True
New query exists in cache: False


### Removing Entries

To remove an entry from the cache, use the `drop` method:

In [7]:
# Remove from cache
cache.drop(text=text, model_name=model_name)

# Verify it's gone
exists = cache.exists(text=text, model_name=model_name)
print(f"After dropping: {exists}")

After dropping: False


## Advanced Usage

### Key-Based Operations

The `EmbeddingsCache` also provides methods that work directly with Redis keys, which can be useful for advanced use cases:

In [8]:
# Store an entry again
key = cache.set(
    text=text,
    model_name=model_name,
    embedding=embedding,
    metadata=metadata
)
print(f"Stored with key: {key[:15]}...")

# Check existence by key
exists_by_key = cache.exists_by_key(key)
print(f"Exists by key: {exists_by_key}")

# Retrieve by key
result_by_key = cache.get_by_key(key)
print(f"Retrieved by key: {result_by_key['text']}")

# Drop by key
cache.drop_by_key(key)

Stored with key: embedcache:a1b2...
Exists by key: True
Retrieved by key: What is machine learning?


### Working with TTL (Time-To-Live)

You can set a global TTL when initializing the cache, or specify TTL for individual entries:

In [9]:
# Create a cache with a default 5-second TTL
ttl_cache = EmbeddingsCache(
    name="ttl_cache",
    redis_url="redis://localhost:6379",
    ttl=5  # 5 second TTL
)

# Store an entry
key = ttl_cache.set(
    text=text,
    model_name=model_name,
    embedding=embedding
)

# Check if it exists
exists = ttl_cache.exists_by_key(key)
print(f"Immediately after setting: {exists}")

# Wait for it to expire
time.sleep(6)

# Check again
exists = ttl_cache.exists_by_key(key)
print(f"After waiting: {exists}")

Immediately after setting: True
After waiting: False


You can also override the default TTL for individual entries:

In [10]:
# Store an entry with a custom 1-second TTL
key1 = ttl_cache.set(
    text="Short-lived entry",
    model_name=model_name,
    embedding=embedding,
    ttl=1  # Override with 1 second TTL
)

# Store another entry with the default TTL (5 seconds)
key2 = ttl_cache.set(
    text="Default TTL entry",
    model_name=model_name,
    embedding=embedding
    # No TTL specified = uses the default 5 seconds
)

# Wait for 2 seconds
time.sleep(2)

# Check both entries
exists1 = ttl_cache.exists_by_key(key1)
exists2 = ttl_cache.exists_by_key(key2)

print(f"Entry with custom TTL after 2 seconds: {exists1}")
print(f"Entry with default TTL after 2 seconds: {exists2}")

# Cleanup
ttl_cache.drop_by_key(key2)

Entry with custom TTL after 2 seconds: False
Entry with default TTL after 2 seconds: True
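
The precedence between the default TTL and a per-entry override can be illustrated without Redis. The following is a minimal in-memory sketch, assuming a per-entry `ttl` wins over the default and that `None` for both means no expiration. `TTLStore` is a hypothetical stand-in, and it expires entries lazily on lookup, whereas Redis handles expiration on the server.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """In-memory stand-in that mimics default and per-entry TTL semantics."""

    def __init__(
        self,
        ttl: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.default_ttl = ttl
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # A per-entry ttl overrides the default; None for both means "never expire".
        effective = ttl if ttl is not None else self.default_ttl
        expires_at = None if effective is None else self._clock() + effective
        self._data[key] = (value, expires_at)

    def exists(self, key: str) -> bool:
        entry = self._data.get(key)
        if entry is None:
            return False
        _, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return False
        return True


# A fake clock keeps the example deterministic and avoids sleeping.
now = [0.0]
store = TTLStore(ttl=5, clock=lambda: now[0])
store.set("short", "a", ttl=1)   # custom 1-second TTL
store.set("default", "b")        # falls back to the 5-second default
now[0] = 2.0
print(store.exists("short"), store.exists("default"))  # False True
```

Injecting the clock is a testing convenience; with the default `time.monotonic` the store behaves the same way in real time.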


## Async Support

The `EmbeddingsCache` provides async versions of all methods for use in async applications. The async methods are prefixed with `a` (e.g., `aset`, `aget`, `aexists`, `adrop`).

In [11]:
async def async_cache_demo():
    # Store an entry asynchronously
    key = await cache.aset(
        text="Async embedding",
        model_name=model_name,
        embedding=embedding,
        metadata={"async": True}
    )
    
    # Check if it exists
    exists = await cache.aexists_by_key(key)
    print(f"Async set successful? {exists}")
    
    # Retrieve it
    result = await cache.aget_by_key(key)
    success = result is not None and result["text"] == "Async embedding"
    print(f"Async get successful? {success}")
    
    # Remove it
    await cache.adrop_by_key(key)

# Run the async demo
await async_cache_demo()

Async set successful? True
Async get successful? True
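
One benefit of the async methods is that several lookups can be in flight at once. The sketch below shows that pattern with `asyncio.gather`, assuming an in-memory stand-in instead of a live Redis connection. `FakeAsyncCache` and `lookup_many` are hypothetical names, and the stand-in's `aget` returns only the embedding, which is a simplification of the result shown above.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


class FakeAsyncCache:
    """Hypothetical in-memory stand-in exposing aset/aget-like coroutines."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], List[float]] = {}

    async def aset(self, text: str, model_name: str, embedding: List[float]) -> None:
        await asyncio.sleep(0)  # yield control, as a network call would
        self._data[(text, model_name)] = embedding

    async def aget(self, text: str, model_name: str) -> Optional[List[float]]:
        await asyncio.sleep(0)
        return self._data.get((text, model_name))


async def lookup_many(
    cache: FakeAsyncCache, texts: List[str], model_name: str
) -> List[Optional[List[float]]]:
    # Issue all lookups concurrently; gather returns results in input order.
    return await asyncio.gather(*(cache.aget(t, model_name) for t in texts))


async def main() -> List[Optional[List[float]]]:
    demo_cache = FakeAsyncCache()
    await demo_cache.aset("hello", "demo-model", [0.1, 0.2])
    return await lookup_many(demo_cache, ["hello", "missing"], "demo-model")


results = asyncio.run(main())
print(results)  # [[0.1, 0.2], None]
```

Inside a notebook, where an event loop is already running, `await main()` would replace `asyncio.run(main())`.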


## Real-World Example

Let's build a simple embeddings caching system for a text classification task. We'll check the cache before computing new embeddings to save computation time.

In [12]:
# Create a fresh cache for this example
example_cache = EmbeddingsCache(
    name="example_cache",
    redis_url="redis://localhost:6379",
    ttl=3600  # 1 hour TTL
)

# Function to get embedding with caching
def get_cached_embedding(text, model_name):
    # Check if it's in the cache first
    if cached_result := example_cache.get(text=text, model_name=model_name):
        print(f"Found in cache: {text}")
        return cached_result["embedding"]
    
    # Not in cache, compute the embedding
    print(f"Computing embedding for: {text}")
    embedding = vectorizer.embed(text)
    
    # Store in cache
    example_cache.set(
        text=text,
        model_name=model_name,
        embedding=embedding,
    )
    
    return embedding

# Simulate processing a stream of queries
queries = [
    "What is artificial intelligence?",
    "How does machine learning work?",
    "What is artificial intelligence?",  # Repeated query
    "What are neural networks?",
    "How does machine learning work?"   # Repeated query
]

# Process the queries and track statistics
total_queries = 0
cache_hits = 0

for query in queries:
    total_queries += 1
    
    # Check cache before computing
    before = example_cache.exists(text=query, model_name=model_name)
    if before:
        cache_hits += 1
    
    # Get embedding (will compute or use cache)
    embedding = get_cached_embedding(query, model_name)

# Report statistics
cache_misses = total_queries - cache_hits
hit_rate = (cache_hits / total_queries) * 100

print("\nStatistics:")
print(f"Total queries: {total_queries}")
print(f"Cache hits: {cache_hits}")
print(f"Cache misses: {cache_misses}")
print(f"Cache hit rate: {hit_rate:.1f}%")

# Cleanup
for query in set(queries):  # Use set to get unique queries
    example_cache.drop(text=query, model_name=model_name)

Computing embedding for: What is artificial intelligence?
Computing embedding for: How does machine learning work?
Found in cache: What is artificial intelligence?
Computing embedding for: What are neural networks?
Found in cache: How does machine learning work?

Statistics:
Total queries: 5
Cache hits: 2
Cache misses: 3
Cache hit rate: 40.0%
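
The example above makes two cache calls per query: `exists` for the statistics, then `get` inside the helper. A minimal sketch of an alternative counts hits and misses inside the lookup itself, assuming a plain dict in place of Redis and a toy embedding function in place of the vectorizer. `CountingEmbeddingCache` and `toy_embed` are hypothetical names, not part of RedisVL.

```python
from typing import Callable, Dict, List, Tuple


class CountingEmbeddingCache:
    """Get-or-compute wrapper around a dict, with hit/miss counters."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[Tuple[str, str], List[float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str, model_name: str) -> List[float]:
        key = (text, model_name)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: compute once, store, and return.
        self.misses += 1
        self._store[key] = self._embed_fn(text)
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def toy_embed(text: str) -> List[float]:
    # Placeholder embedding, NOT a real model: just the text length.
    return [float(len(text))]


tracker = CountingEmbeddingCache(toy_embed)
for q in [
    "What is artificial intelligence?",
    "How does machine learning work?",
    "What is artificial intelligence?",  # repeated
    "What are neural networks?",
    "How does machine learning work?",   # repeated
]:
    tracker.get(q, "demo-model")

print(tracker.hits, tracker.misses, f"{tracker.hit_rate:.1%}")  # 2 3 40.0%
```

With three distinct texts among five queries, only the two repeats can be hits, so the hit rate is 2/5.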


## Performance Benchmark

Let's run a benchmark to compare the performance of embedding with and without caching. We'll measure the time it takes to process the same query multiple times.

In [13]:
from tqdm.notebook import tqdm

# Text to use for benchmarking
benchmark_text = "This is a benchmark text to measure the performance of embedding caching."
benchmark_model = "sentence-transformers/all-mpnet-base-v2"

# Create a fresh cache for benchmarking
benchmark_cache = EmbeddingsCache(
    name="benchmark_cache",
    redis_url="redis://localhost:6379",
    ttl=3600  # 1 hour TTL
)

# Function to get embeddings without caching
def get_embedding_without_cache(text, model_name):
    return vectorizer.embed(text)

# Function to get embeddings with caching
def get_embedding_with_cache(text, model_name):
    if cached_result := benchmark_cache.get(text=text, model_name=model_name):
        return cached_result["embedding"]
    
    embedding = vectorizer.embed(text)
    benchmark_cache.set(
        text=text,
        model_name=model_name,
        embedding=embedding
    )
    return embedding

# Number of iterations for the benchmark
n_iterations = 10

# Benchmark without caching
print("Benchmarking without caching:")
start_time = time.time()
for _ in tqdm(range(n_iterations)):
    _ = get_embedding_without_cache(benchmark_text, benchmark_model)
no_cache_time = time.time() - start_time
print(f"Time taken without caching: {no_cache_time:.4f} seconds")
print(f"Average time per embedding: {no_cache_time/n_iterations:.4f} seconds")

# Benchmark with caching
print("\nBenchmarking with caching:")
start_time = time.time()
for _ in tqdm(range(n_iterations)):
    _ = get_embedding_with_cache(benchmark_text, benchmark_model)
cache_time = time.time() - start_time
print(f"Time taken with caching: {cache_time:.4f} seconds")
print(f"Average time per embedding: {cache_time/n_iterations:.4f} seconds")

# Compare performance
speedup = no_cache_time / cache_time
latency_reduction = (no_cache_time/n_iterations) - (cache_time/n_iterations)
print(f"\nPerformance comparison:")
print(f"Speedup with caching: {speedup:.2f}x faster")
print(f"Time saved: {no_cache_time - cache_time:.4f} seconds ({(1 - cache_time/no_cache_time) * 100:.1f}%)")
print(f"Latency reduction: {latency_reduction:.4f} seconds per query")


Benchmarking without caching:



Time taken without caching: 0.8720 seconds
Average time per embedding: 0.0872 seconds

Benchmarking with caching:



Time taken with caching: 0.0524 seconds
Average time per embedding: 0.0052 seconds

Performance comparison:
Speedup with caching: 16.64x faster
Time saved: 0.8196 seconds (94.0%)
Latency reduction: 0.0820 seconds per query
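
The comparison figures follow directly from the two measured totals. As a worked check, the sketch below recomputes them from the reported times (0.8720 s without caching, 0.0524 s with caching, 10 iterations). `summarize_benchmark` is a hypothetical helper introduced only for this illustration.

```python
from typing import Tuple


def summarize_benchmark(
    no_cache_time: float, cache_time: float, n_iterations: int
) -> Tuple[float, float, float]:
    """Return (speedup, percent_saved, per_query_latency_reduction)."""
    speedup = no_cache_time / cache_time
    percent_saved = (1 - cache_time / no_cache_time) * 100
    latency_reduction = (no_cache_time - cache_time) / n_iterations
    return speedup, percent_saved, latency_reduction


speedup, percent_saved, latency_reduction = summarize_benchmark(0.8720, 0.0524, 10)
print(f"{speedup:.2f}x, {percent_saved:.1f}%, {latency_reduction:.4f}s")
# 16.64x, 94.0%, 0.0820s
```

Note that the cached run includes one miss (the first iteration computes and stores the embedding), so the per-hit latency is somewhat lower than the cached average.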


## Common Use Cases for Embedding Caching

Embedding caching is particularly useful in the following scenarios:

1. **Search applications**: Cache embeddings for frequently searched queries to reduce latency
2. **Content recommendation systems**: Cache embeddings for content items to speed up similarity calculations
3. **API services**: Reduce costs and improve response times when generating embeddings through paid APIs
4. **Batch processing**: Speed up processing of datasets that contain duplicate texts
5. **Chatbots and virtual assistants**: Cache embeddings for common user queries to provide faster responses
6. **Development workflows**: Avoid recomputing the same embeddings across repeated test and development runs
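
For the batch-processing case, duplicates within a single batch can be collapsed even before a shared cache is consulted. The following is a minimal sketch, assuming a toy embedding function in place of a real model. `embed_batch_deduplicated` and `toy_embed` are hypothetical names used only for this illustration.

```python
from typing import Callable, Dict, List


def embed_batch_deduplicated(
    texts: List[str],
    embed_fn: Callable[[str], List[float]],
) -> List[List[float]]:
    """Embed each distinct text once and return results in the original order."""
    computed: Dict[str, List[float]] = {}
    for text in texts:
        if text not in computed:
            computed[text] = embed_fn(text)
    return [computed[text] for text in texts]


calls: List[str] = []


def toy_embed(text: str) -> List[float]:
    # Placeholder for a real model call; records each invocation.
    calls.append(text)
    return [float(len(text))]


vectors = embed_batch_deduplicated(["a", "bb", "a", "a"], toy_embed)
print(len(vectors), len(calls))  # 4 2
```

Four inputs yield four vectors, but the embedding function runs only twice, once per distinct text.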

## Cleanup

Let's clean up our caches to avoid leaving data in Redis:

In [14]:
# Clean up all caches
cache.clear()
ttl_cache.clear()
example_cache.clear()
benchmark_cache.clear()

## Summary

The `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text and metadata. Key features include:

- Simple API for storing and retrieving embeddings
- Support for metadata storage alongside embeddings
- Configurable time-to-live (TTL) for cache entries
- Key-based operations for advanced use cases
- Async support for use in asynchronous applications
- Significant performance improvements for repeated inputs (about 16x faster in our benchmark, which embeds the same text 10 times)

By using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings.