# Caching Embeddings with RedisVL

RedisVL provides an `EmbeddingsCache` that makes it easy to store and retrieve embedding vectors with their associated text and metadata. This cache is particularly useful for applications that frequently compute the same embeddings, enabling you to:

- Reduce computational costs by reusing previously computed embeddings
- Decrease latency in applications that rely on embeddings
- Store additional metadata alongside embeddings for richer applications

This notebook will show you how to use the `EmbeddingsCache` effectively in your applications.

## Setup

First, let's load the Maven dependencies and import the classes we need. We'll use LangChain4J's all-MiniLM-L6-v2 embedding model, which runs locally in-process, to generate our embeddings.

In [1]:
// Load Maven dependencies
%maven redis.clients:jedis:6.2.0
%maven org.slf4j:slf4j-nop:2.0.16
%maven com.fasterxml.jackson.core:jackson-databind:2.18.0
%maven com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.18.0
%maven com.github.f4b6a3:ulid-creator:5.2.3
%maven dev.langchain4j:langchain4j:0.36.2
%maven dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:0.36.2

// Import RedisVL classes
import com.redis.vl.extensions.cache.*;
import com.redis.vl.utils.vectorize.*;

// Import Redis client
import redis.clients.jedis.UnifiedJedis;
import redis.clients.jedis.HostAndPort;

// Import LangChain4J
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;

// Import Java standard libraries
import java.util.*;
import java.util.stream.Collectors;
import java.time.Duration;

Let's create a vectorizer to generate embeddings for our texts:

In [2]:
// Initialize the vectorizer
BaseVectorizer vectorizer = new LangChain4JVectorizer(
    "all-minilm-l6-v2",
    new AllMiniLmL6V2EmbeddingModel(),
    384,
    "float32"
);

System.out.println("Vectorizer initialized: " + vectorizer.getModelName());
System.out.println("Dimensions: " + vectorizer.getDimensions());

Vectorizer initialized: all-minilm-l6-v2
Dimensions: 384


## Initializing the EmbeddingsCache

Now let's initialize our `EmbeddingsCache`. The cache requires a Redis connection to store the embeddings and their associated data.

In [3]:
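// Create Redis connection
// ("redis-stack" is the hostname used in this environment; change it, e.g. to "localhost", to match your setup)
UnifiedJedis jedis = new UnifiedJedis(new HostAndPort("redis-stack", 6379));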

// Initialize the embeddings cache
EmbeddingsCache cache = new EmbeddingsCache(
    "embedcache",     // name prefix for Redis keys
    jedis,            // Redis connection
    null              // Optional TTL in seconds (null means no expiration)
);

System.out.println("EmbeddingsCache initialized with prefix: embedcache");

EmbeddingsCache initialized with prefix: embedcache


## Basic Usage

### Storing Embeddings

Let's store some text with its embedding in the cache. The `set` method takes the following parameters:
- `text`: The input text that was embedded
- `modelName`: The name of the embedding model used
- `embedding`: The embedding vector

In [4]:
// Text to embed
String text = "What is machine learning?";
String modelName = vectorizer.getModelName();

// Generate the embedding
float[] embedding = vectorizer.embed(text);

// Store in cache
cache.set(text, modelName, embedding);

System.out.println("Stored embedding for: " + text);
System.out.println("Model name: " + modelName);
System.out.println("Embedding dimensions: " + embedding.length);

Stored embedding for: What is machine learning?
Model name: all-minilm-l6-v2
Embedding dimensions: 384


### Retrieving Embeddings

To retrieve an embedding from the cache, use the `get` method with the original text and model name:

In [5]:
// Retrieve from cache
Optional<float[]> result = cache.get(text, modelName);
if (result.isPresent()) {
    System.out.println("Found in cache: " + text);
    System.out.println("Model: " + modelName);
    
    float[] cachedEmbedding = result.get();
    System.out.println("Embedding shape: (" + cachedEmbedding.length + ",)");
} else {
    System.out.println("Not found in cache.");
}

Found in cache: What is machine learning?
Model: all-minilm-l6-v2
Embedding shape: (384,)


### Checking Existence

You can check if an embedding exists in the cache without retrieving it using the `exists` method:

In [6]:
// Check if existing text is in cache
boolean exists = cache.exists(text, modelName);
System.out.println("First query exists in cache: " + exists);

// Check if a new text is in cache
String newText = "What is deep learning?";
exists = cache.exists(newText, modelName);
System.out.println("New query exists in cache: " + exists);

First query exists in cache: true
New query exists in cache: false


### Removing Entries

To remove an entry from the cache, use the `drop` method:

In [7]:
// Remove from cache
cache.drop(text, modelName);

// Verify it's gone
exists = cache.exists(text, modelName);
System.out.println("After dropping: " + exists);

After dropping: false
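
The semantics of `set`, `get`, `exists`, and `drop` can be illustrated without Redis. The sketch below is a hypothetical in-memory stand-in (`InMemoryEmbeddingsCache` is an invented name, not part of RedisVL) that stores entries in a `HashMap` and never expires them. It shows why the model name is part of every call: the same text embedded by two different models yields two independent entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical in-memory stand-in for EmbeddingsCache, used only to illustrate
 * set/get/exists/drop semantics. Not the RedisVL implementation: entries live
 * in a HashMap instead of Redis and never expire.
 */
class InMemoryEmbeddingsCache {
    // An entry is identified by the (text, modelName) pair, not by the text alone.
    private record EntryKey(String text, String modelName) {}

    private final Map<EntryKey, float[]> entries = new HashMap<>();

    void set(String text, String modelName, float[] embedding) {
        // Store a copy so later changes to the caller's array don't leak into the cache.
        entries.put(new EntryKey(text, modelName), embedding.clone());
    }

    Optional<float[]> get(String text, String modelName) {
        float[] stored = entries.get(new EntryKey(text, modelName));
        return stored == null ? Optional.empty() : Optional.of(stored.clone());
    }

    boolean exists(String text, String modelName) {
        return entries.containsKey(new EntryKey(text, modelName));
    }

    void drop(String text, String modelName) {
        entries.remove(new EntryKey(text, modelName));
    }

    public static void main(String[] args) {
        InMemoryEmbeddingsCache cache = new InMemoryEmbeddingsCache();
        cache.set("What is machine learning?", "all-minilm-l6-v2", new float[] {0.1f, 0.2f});

        // Same text, different model name: a separate entry that was never stored.
        System.out.println(cache.exists("What is machine learning?", "all-minilm-l6-v2")); // true
        System.out.println(cache.exists("What is machine learning?", "other-model"));      // false

        cache.drop("What is machine learning?", "all-minilm-l6-v2");
        System.out.println(cache.exists("What is machine learning?", "all-minilm-l6-v2")); // false
    }
}
```

Keying on the pair matters because embeddings from different models are not interchangeable, even when their dimensions happen to match.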


## Advanced Usage

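### How Entries Are Keyed

Under the hood, `EmbeddingsCache` builds each Redis key by hashing the text and model name with SHA-256. Because the key is deterministic, the same text/model pair always maps to the same entry, so you can store, check, and drop it repeatedly using only the original text and model name: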

In [8]:
// Store an entry again
cache.set(text, modelName, embedding);
System.out.println("Stored embedding for: " + text);

// Check existence
boolean existsNow = cache.exists(text, modelName);
System.out.println("Entry exists: " + existsNow);

// Drop the entry
cache.drop(text, modelName);
System.out.println("Entry dropped");

Stored embedding for: What is machine learning?
Entry exists: true
Entry dropped
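
To illustrate why such a key is deterministic, here is a sketch of SHA-256 key derivation using only the JDK. The `text:modelName` hash input and the `prefix:hash` key layout are assumptions for illustration; RedisVL's actual key format may differ, and `CacheKeySketch`/`makeKey` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/**
 * Hypothetical sketch of deriving a deterministic Redis key from (text, modelName).
 * The "text:modelName" input and "prefix:hash" layout are assumed, not RedisVL's
 * confirmed format.
 */
class CacheKeySketch {
    static String makeKey(String prefix, String text, String modelName) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest((text + ":" + modelName).getBytes(StandardCharsets.UTF_8));
            // A 32-byte digest becomes 64 lowercase hex characters.
            return prefix + ":" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String k1 = makeKey("embedcache", "What is machine learning?", "all-minilm-l6-v2");
        String k2 = makeKey("embedcache", "What is machine learning?", "all-minilm-l6-v2");
        String k3 = makeKey("embedcache", "What is deep learning?", "all-minilm-l6-v2");
        System.out.println(k1.equals(k2)); // true: same inputs, same key
        System.out.println(k1.equals(k3)); // false: different text, different key
        System.out.println(k1.length());   // 75: "embedcache:" (11 chars) + 64 hex chars
    }
}
```

A side benefit of hashing is that the key length is fixed, so embedding a long document does not produce a long Redis key.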


### Batch Operations

When working with multiple embeddings, batch operations can significantly improve performance by reducing the number of network round trips to Redis. `EmbeddingsCache` provides methods prefixed with `m` (for "multi") that operate on a whole batch in one call.

In [9]:
// Create multiple embeddings
List<String> texts = List.of(
    "What is machine learning?",
    "How do neural networks work?",
    "What is deep learning?"
);

List<float[]> embeddings = texts.stream()
    .map(t -> vectorizer.embed(t))
    .collect(Collectors.toList());

// Prepare batch items for mset
Map<String, float[]> embeddingMap = new HashMap<>();
for (int i = 0; i < texts.size(); i++) {
    embeddingMap.put(texts.get(i), embeddings.get(i));
}

// Store multiple embeddings in one operation
cache.mset(embeddingMap, modelName);
System.out.println("Stored " + embeddingMap.size() + " embeddings with batch operation");

// Check if multiple embeddings exist in one operation
Map<String, Boolean> existResults = cache.mexists(texts, modelName);
boolean allExist = existResults.values().stream().allMatch(Boolean::booleanValue);
System.out.println("All embeddings exist: " + allExist);

// Retrieve multiple embeddings in one operation
Map<String, float[]> results = cache.mget(texts, modelName);
System.out.println("Retrieved " + results.size() + " embeddings in one operation");

// Delete multiple embeddings in one operation
cache.mdrop(texts, modelName);
System.out.println("Deleted all embeddings with batch operation");

Stored 3 embeddings with batch operation
All embeddings exist: true
Retrieved 3 embeddings in one operation
Deleted all embeddings with batch operation


The payoff grows with the number of texts: each batch method does the same work as its single-entry counterpart, but the network overhead is paid largely per batch rather than per text.
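
The batch methods make a common pattern cheap: look up a whole batch at once, compute embeddings only for the texts that are missing, and write those back in one call. The sketch below shows that pattern with a plain `HashMap` standing in for Redis and a `Function<String, float[]>` standing in for the vectorizer. `BatchCacheAside` and `embedAll` are hypothetical names; with the real cache, the lookup and write-back steps would correspond roughly to `mget` and `mset`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a batch cache-aside lookup. A Map stands in for Redis and
 * a Function stands in for the vectorizer; steps 1 and 3 would roughly correspond
 * to mget and mset on EmbeddingsCache.
 */
class BatchCacheAside {
    static Map<String, float[]> embedAll(List<String> texts,
                                         Map<String, float[]> cache,
                                         Function<String, float[]> embedder) {
        // 1. Look up the whole batch at once.
        Map<String, float[]> found = new HashMap<>();
        for (String text : texts) {
            float[] hit = cache.get(text);
            if (hit != null) {
                found.put(text, hit);
            }
        }

        // 2. Embed each distinct missing text exactly once.
        Map<String, float[]> fresh = new LinkedHashMap<>();
        for (String text : texts) {
            if (!found.containsKey(text) && !fresh.containsKey(text)) {
                fresh.put(text, embedder.apply(text));
            }
        }

        // 3. Write all new embeddings back in one bulk operation.
        cache.putAll(fresh);

        // 4. Assemble results in input order (duplicate texts collapse to one key).
        Map<String, float[]> result = new LinkedHashMap<>();
        for (String text : texts) {
            result.put(text, found.containsKey(text) ? found.get(text) : fresh.get(text));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, float[]> cache = new HashMap<>();
        cache.put("What is machine learning?", new float[] {0.1f});

        int[] calls = {0}; // counts how often the fake embedder actually runs
        Function<String, float[]> fakeEmbedder = t -> {
            calls[0]++;
            return new float[] {t.length()};
        };

        Map<String, float[]> out = embedAll(
            List.of("What is machine learning?", "How do neural networks work?", "How do neural networks work?"),
            cache, fakeEmbedder);
        System.out.println(out.size() + " results, " + calls[0] + " embedder call(s)"); // 2 results, 1 embedder call(s)
    }
}
```

Deduplicating misses before embedding matters when a batch contains repeated texts, since each repeat would otherwise trigger its own model call.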

### Working with TTL (Time-To-Live)

You can set a default TTL (in seconds) when initializing the cache, or override it for individual entries:

In [10]:
// Create a cache with a default 5-second TTL
EmbeddingsCache ttlCache = new EmbeddingsCache(
    "ttl_cache",
    jedis,
    5  // 5 second TTL
);

// Store an entry
String ttlText = "This is a TTL test";
float[] ttlEmbedding = vectorizer.embed(ttlText);
ttlCache.set(ttlText, modelName, ttlEmbedding);

// Check if it exists
boolean exists = ttlCache.exists(ttlText, modelName);
System.out.println("Immediately after setting: " + exists);

// Wait for it to expire
Thread.sleep(6000); // Sleep for 6 seconds

// Check again
exists = ttlCache.exists(ttlText, modelName);
System.out.println("After waiting: " + exists);

Immediately after setting: true
After waiting: false


You can also override the default TTL for individual entries:

In [11]:
// Store an entry with a custom 1-second TTL
String shortLivedText = "Short-lived entry";
float[] shortLivedEmbedding = vectorizer.embed(shortLivedText);
ttlCache.setWithTTL(
    shortLivedText,
    modelName,
    shortLivedEmbedding,
    1  // Override with 1 second TTL
);

// Store another entry with the default TTL (5 seconds)
String defaultTTLText = "Default TTL entry";
float[] defaultTTLEmbedding = vectorizer.embed(defaultTTLText);
ttlCache.set(
    defaultTTLText,
    modelName,
    defaultTTLEmbedding
    // No TTL specified = uses the default 5 seconds
);

// Wait for 2 seconds
Thread.sleep(2000);

// Check both entries
boolean exists1 = ttlCache.exists(shortLivedText, modelName);
boolean exists2 = ttlCache.exists(defaultTTLText, modelName);

System.out.println("Entry with custom TTL after 2 seconds: " + exists1);
System.out.println("Entry with default TTL after 2 seconds: " + exists2);

// Cleanup
if (exists2) {
    ttlCache.drop(defaultTTLText, modelName);
}

Entry with custom TTL after 2 seconds: false
Entry with default TTL after 2 seconds: true
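
Redis stores an expiry per key, which is why the override affects only the entry it was set on. The sketch below mimics that observable behavior with a fake clock, so the timing is deterministic instead of depending on `Thread.sleep`. `TtlSketch` is a hypothetical class that tracks only keys and expiry times, not embeddings, and its `set`/`setWithTtl` methods merely mirror the roles of the notebook's `set`/`setWithTTL`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of default vs. per-entry TTL semantics with an injectable clock.
 * Tracks only keys and their expiry times. Redis keeps a per-key expiry on the server;
 * this class only mimics the observable behavior.
 */
class TtlSketch {
    private final Map<String, Long> expiresAtMillis = new HashMap<>();
    private final long defaultTtlSeconds;
    private final LongSupplier clockMillis; // returns the current time in milliseconds

    TtlSketch(long defaultTtlSeconds, LongSupplier clockMillis) {
        this.defaultTtlSeconds = defaultTtlSeconds;
        this.clockMillis = clockMillis;
    }

    /** Store with the cache-wide default TTL. */
    void set(String key) {
        setWithTtl(key, defaultTtlSeconds);
    }

    /** Store with a TTL that overrides the default for this entry only. */
    void setWithTtl(String key, long ttlSeconds) {
        expiresAtMillis.put(key, clockMillis.getAsLong() + ttlSeconds * 1000);
    }

    boolean exists(String key) {
        Long expiresAt = expiresAtMillis.get(key);
        if (expiresAt == null) {
            return false;
        }
        if (clockMillis.getAsLong() >= expiresAt) {
            expiresAtMillis.remove(key); // evict lazily, on the first access after expiry
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0};                                // fake clock in milliseconds
        TtlSketch cache = new TtlSketch(5, () -> now[0]);
        cache.setWithTtl("Short-lived entry", 1);        // overrides the 5-second default
        cache.set("Default TTL entry");                  // uses the 5-second default
        now[0] = 2_000;                                  // advance the clock by 2 seconds
        System.out.println(cache.exists("Short-lived entry")); // false
        System.out.println(cache.exists("Default TTL entry")); // true
    }
}
```

Injecting the clock is a common way to test expiry logic without real sleeps; against a live Redis server, the `Thread.sleep` approach used in the notebook is still the simplest option.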


## Real-World Example

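Let's simulate a stream of incoming queries, some of them repeated, and measure how often the cache saves us from recomputing an embedding. After `setCache` attaches a cache, the vectorizer checks it before computing a new embedding; the explicit `exists` call in the loop below is only there to count cache hits.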

In [12]:
// Create a fresh cache for this example
EmbeddingsCache exampleCache = new EmbeddingsCache(
    "example_cache",
    jedis,
    3600  // 1 hour TTL
);

// Create a vectorizer with cache integration
BaseVectorizer cachedVectorizer = new LangChain4JVectorizer(
    "all-minilm-l6-v2",
    new AllMiniLmL6V2EmbeddingModel(),
    384,
    "float32"
);
cachedVectorizer.setCache(exampleCache);

// Simulate processing a stream of queries
List<String> queries = List.of(
    "What is artificial intelligence?",
    "How does machine learning work?",
    "What is artificial intelligence?",  // Repeated query
    "What are neural networks?",
    "How does machine learning work?"   // Repeated query
);

// Process the queries and track statistics
int totalQueries = 0;
int cacheHits = 0;

for (String query : queries) {
    totalQueries++;
    
    // Check cache before computing
    boolean before = exampleCache.exists(query, cachedVectorizer.getModelName());
    if (before) {
        cacheHits++;
    }
    
    // Get embedding (will compute or use cache)
    float[] queryEmbedding = cachedVectorizer.embed(query);
}

// Report statistics
int cacheMisses = totalQueries - cacheHits;
double hitRate = (cacheHits / (double) totalQueries) * 100;

System.out.println("\nStatistics:");
System.out.println("Total queries: " + totalQueries);
System.out.println("Cache hits: " + cacheHits);
System.out.println("Cache misses: " + cacheMisses);
System.out.println("Cache hit rate: " + String.format("%.1f", hitRate) + "%");

// Cleanup
Set<String> uniqueQueries = new HashSet<>(queries);
for (String query : uniqueQueries) {
    exampleCache.drop(query, cachedVectorizer.getModelName());
}

System.out.println("Cache cleaned up");


Statistics:
Total queries: 5
Cache hits: 2
Cache misses: 3
Cache hit rate: 40.0%
Cache cleaned up
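
The loop above issues an extra `exists` call per query just to count hits. A sketch of an alternative, assuming you only need the statistics: count hits and misses inside the lookup itself. `CountingEmbedder` is a hypothetical wrapper that uses an in-memory map instead of Redis and a fake embedding function instead of the real model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical memoizing wrapper that counts cache hits and misses as a side effect
 * of the lookup, instead of issuing a separate existence check per query.
 */
class CountingEmbedder {
    private final Map<String, float[]> cache = new HashMap<>();
    private final Function<String, float[]> embedder;
    private int hits = 0;
    private int misses = 0;

    CountingEmbedder(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    float[] embed(String text) {
        float[] cached = cache.get(text);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        float[] computed = embedder.apply(text);
        cache.put(text, computed);
        return computed;
    }

    int hits() { return hits; }

    int misses() { return misses; }

    double hitRatePercent() {
        int total = hits + misses;
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }

    public static void main(String[] args) {
        CountingEmbedder embedder = new CountingEmbedder(t -> new float[] {t.length()});
        List<String> queries = List.of(
            "What is artificial intelligence?",
            "How does machine learning work?",
            "What is artificial intelligence?",
            "What are neural networks?",
            "How does machine learning work?");
        for (String q : queries) {
            embedder.embed(q);
        }
        System.out.println(String.format(Locale.ROOT, "hits=%d misses=%d rate=%.1f%%",
            embedder.hits(), embedder.misses(), embedder.hitRatePercent()));
        // hits=2 misses=3 rate=40.0%
    }
}
```

With the same five queries, this reproduces the 2 hits, 3 misses, and 40.0% hit rate reported above.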


## Performance Benchmark

Let's run a benchmark that compares embedding the same text repeatedly with and without the cache.

In [13]:
// Text to use for benchmarking
String benchmarkText = "This is a benchmark text to measure the performance of embedding caching.";

// Create a fresh cache for benchmarking
EmbeddingsCache benchmarkCache = new EmbeddingsCache(
    "benchmark_cache",
    jedis,
    3600  // 1 hour TTL
);

// Create vectorizer with cache
BaseVectorizer benchmarkVectorizer = new LangChain4JVectorizer(
    "all-minilm-l6-v2",
    new AllMiniLmL6V2EmbeddingModel(),
    384,
    "float32"
);
benchmarkVectorizer.setCache(benchmarkCache);

// Number of iterations for the benchmark
int nIterations = 10;

// Benchmark without caching
System.out.println("Benchmarking without caching:");
long startTime = System.nanoTime(); // monotonic clock, better suited to elapsed-time measurement
for (int i = 0; i < nIterations; i++) {
    float[] uncachedEmbedding = benchmarkVectorizer.embed(benchmarkText, null, false, true); // skip cache
}
long noCacheTime = (System.nanoTime() - startTime) / 1_000_000; // elapsed milliseconds
System.out.println("Time taken without caching: " + (noCacheTime / 1000.0) + " seconds");
System.out.println("Average time per embedding: " + (noCacheTime / 1000.0 / nIterations) + " seconds");

// Benchmark with caching
System.out.println("\nBenchmarking with caching:");
startTime = System.nanoTime();
for (int i = 0; i < nIterations; i++) {
    float[] cachedEmbedding = benchmarkVectorizer.embed(benchmarkText);
}
long cacheTime = (System.nanoTime() - startTime) / 1_000_000; // elapsed milliseconds
System.out.println("Time taken with caching: " + (cacheTime / 1000.0) + " seconds");
System.out.println("Average time per embedding: " + (cacheTime / 1000.0 / nIterations) + " seconds");

// Compare performance
double speedup = (double) noCacheTime / cacheTime;
double latencyReduction = (noCacheTime / 1000.0 / nIterations) - (cacheTime / 1000.0 / nIterations);
System.out.println("\nPerformance comparison:");
System.out.println("Speedup with caching: " + String.format("%.2f", speedup) + "x faster");
System.out.println("Time saved: " + ((noCacheTime - cacheTime) / 1000.0) + " seconds (" + 
                   String.format("%.1f", (1 - (double) cacheTime / noCacheTime) * 100) + "%)");
System.out.println("Latency reduction: " + String.format("%.4f", latencyReduction) + " seconds per query");

Benchmarking without caching:
Time taken without caching: 0.079 seconds
Average time per embedding: 0.0079 seconds

Benchmarking with caching:
Time taken with caching: 0.028 seconds
Average time per embedding: 0.0028 seconds

Performance comparison:
Speedup with caching: 2.82x faster
Time saved: 0.051 seconds (64.6%)
Latency reduction: 0.0051 seconds per query
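
These numbers come from a single pass of 10 iterations, and each timed loop may include one-off costs such as JIT warm-up or a first cache miss, so they are best read as a rough indication. A sketch of a slightly more careful measurement, assuming you only need a quick estimate rather than a full JMH benchmark: run warm-up calls first, then time many calls with `System.nanoTime()`. `MicroBench` and `meanMillis` are hypothetical names.

```java
import java.util.function.Supplier;

/**
 * Hypothetical timing helper, not a replacement for a real harness such as JMH.
 * Warm-up calls run first so one-off costs are excluded from the timed loop,
 * which uses the monotonic System.nanoTime() clock.
 */
class MicroBench {
    // Writing each result to a volatile field helps keep the JIT from discarding the work.
    static volatile Object sink;

    static <T> double meanMillis(Supplier<T> task, int warmupCalls, int timedCalls) {
        if (timedCalls <= 0) {
            throw new IllegalArgumentException("timedCalls must be positive");
        }
        for (int i = 0; i < warmupCalls; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) {
            sink = task.get();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedCalls;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the notebook this could be () -> benchmarkVectorizer.embed(benchmarkText)
        double ms = meanMillis(() -> Math.sqrt(System.nanoTime()), 1_000, 10_000);
        System.out.printf("mean: %.6f ms per call%n", ms);
    }
}
```

Measuring the cached and uncached paths with the same helper, each after its own warm-up, gives a fairer comparison than timing the first ten calls of each.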


## Common Use Cases for Embedding Caching

Embedding caching is particularly useful in the following scenarios:

1. **Search applications**: Cache embeddings for frequently searched queries to reduce latency
2. **Content recommendation systems**: Cache embeddings for content items to speed up similarity calculations
3. **API services**: Reduce costs and improve response times when generating embeddings through paid APIs
4. **Batch processing**: Speed up processing of datasets that contain duplicate texts
5. **Chatbots and virtual assistants**: Cache embeddings for common user queries to provide faster responses
6. **Development workflows**: Speed up development and testing by caching embeddings during experimentation

## Cleanup

Let's clean up our caches to avoid leaving data in Redis:

In [14]:
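// EmbeddingsCache has no clear() method here, so entries must be dropped individually.
// The earlier examples already dropped their entries or let them expire via TTL;
// only the benchmark entry (1-hour TTL) is still stored.
benchmarkCache.drop(benchmarkText, benchmarkVectorizer.getModelName());

// Close Redis connection
jedis.close();

System.out.println("Connection closed");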

Connection closed


## Summary

The `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text. Key features include:

- Simple API for storing and retrieving individual embeddings (`set`/`get`)
- Batch operations for working with multiple embeddings efficiently (`mset`/`mget`/`mexists`/`mdrop`)
- Configurable time-to-live (TTL) for cache entries
- Integration with RedisVL vectorizers for automatic caching
- Lower latency for repeated texts (about 2.8x faster in the benchmark above; the gain depends on how expensive the embedding model is to run)

By using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings.