# Semantic Caching

Semantic caching is an intelligent caching strategy that stores and retrieves responses based on the meaning of queries rather than exact text matches. Unlike traditional caching that requires identical strings, semantic caching can return cached responses for questions that are semantically similar, even when phrased differently.

## Semantic Caching vs. Traditional Caching vs. LLM Re-generation

**Traditional caching** stores responses using exact query strings as keys:
- **Fast retrieval** for identical queries
- **Cache misses** for any variation in phrasing, even minor differences
- **Low cache hit rates** in conversational applications where users rarely phrase questions identically
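
The miss-on-rephrasing behavior is easy to see in a minimal sketch. The `ExactMatchCache` class below is a hypothetical illustration, not part of RedisVL; it keys responses by the raw query string, so any change in wording produces a different key.

```kotlin
// Hypothetical exact-match cache, for illustration only (not a RedisVL class).
class ExactMatchCache {
    private val entries = HashMap<String, String>()

    // Store a response under the exact query string.
    fun store(query: String, response: String) {
        entries[query] = response
    }

    // Return the cached response only when the query matches character for character.
    fun lookup(query: String): String? = entries[query]
}

// Build a cache holding one FAQ entry taken from this notebook.
fun exactMatchDemo(): ExactMatchCache {
    val cache = ExactMatchCache()
    cache.store(
        "What are the available models of the Colorado?",
        "The available models of the Colorado are WT, LT, Z71, and ZR2."
    )
    return cache
}
```

Here `exactMatchDemo().lookup("What are the available models of the Colorado?")` returns the stored answer, while `lookup("What models of chevy colorado are available?")` returns `null`, even though both questions ask the same thing.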

**LLM re-generation** involves calling the language model for every query:
- **Flexible** handling of any question variation
- **High API costs** and latency for repeated similar questions

**Semantic caching** uses vector similarity to match queries with cached responses:
- **High cache hit rates** by matching semantically similar questions
- **Cost reduction** by avoiding redundant LLM calls for similar queries
- **Fast retrieval** through vector similarity search

In this notebook, we'll implement semantic caching using RedisVL with pre-generated FAQs about a Chevrolet Colorado vehicle brochure, and show how semantic similarity lets rephrased questions hit the cache where exact string matching would miss.

## Running Redis

There are several ways to run a Redis instance. For simplicity, this notebook runs it in a Docker container.

For production, where high availability and reliability are concerns, we recommend using [Redis Cloud](https://cloud.redis.io/).

A free database can be spun up in Redis Cloud.

### Running Redis in a Docker Container using Testcontainers

**Docker containers** are lightweight, portable environments that package an application and all its dependencies so it runs consistently across different systems. **Testcontainers** is a library that lets you run lightweight, disposable Docker containers for integration testing, so you can test against real services like databases or message queues without complex setup.

Make sure you have Docker installed: [install Docker](https://www.docker.com/get-started/).

#### Installing dependencies

In [1]:
@file:DependsOn("org.testcontainers:testcontainers:2.0.2")

#### Configuring a generic Redis Container

In [2]:
import org.testcontainers.containers.GenericContainer
import org.testcontainers.utility.DockerImageName

class RedisContainer : GenericContainer<RedisContainer>(DockerImageName.parse("redis:latest")) {
    init {
        withExposedPorts(6379)
    }
}

#### Creating a Docker network

This is necessary because later in this notebook we will spin up a Redis Insight container that needs to be on the same network.

In [3]:
import org.testcontainers.containers.Network

val network = Network.newNetwork()

#### Start a Redis Container

In [4]:
val networkAlias = "redis"
val redis = RedisContainer().withNetwork(network).withNetworkAliases(networkAlias)
redis.start()

val host = redis.host
val port = redis.getMappedPort(6379)
println("Redis 8 started at $host:$port")

Redis 8 started at localhost:54215


## Implementing our Semantic Cache

### Installing dependencies

As mentioned at the beginning, we will use RedisVL's semantic cache abstraction (`SemanticCache`) to implement our semantic cache. Therefore, we need to add RedisVL as a dependency.

In [5]:
@file:DependsOn("com.redis:redisvl:0.0.1")
%use serialization

### Loading Pre-Generated FAQs

For this semantic caching demonstration, we'll use pre-generated frequently asked questions (FAQs) about a Chevrolet Colorado vehicle brochure. These FAQs were created by processing the vehicle documentation and extracting question-answer pairs using an LLM.


In [6]:
import java.io.File

val jsonText = File("./resources/3_colorado_faqs.json").readText(Charsets.UTF_8)
val jsonArray = Json.parseToJsonElement(jsonText).jsonArray

println("Loaded ${jsonArray.size} FAQs from file")

Loaded 346 FAQs from file


### Setting up the Text Vectorizer

The vectorizer converts text into numerical vectors (embeddings) that capture semantic meaning. RedisVL provides several vectorizer options, such as OpenAI and VertexAI. This example uses `SentenceTransformersVectorizer` with the Hugging Face model `Xenova/all-MiniLM-L6-v2`.

In [7]:
import com.redis.vl.utils.vectorize.SentenceTransformersVectorizer

val vectorizer = SentenceTransformersVectorizer("Xenova/all-MiniLM-L6-v2")

val embedding = vectorizer.embed("What is the capital city of Italy?")

println(embedding.joinToString())

-0.009056281, 0.09096523, -0.051762886, 0.08848378, -0.12719342, -0.0703391, 0.029510844, 0.013291523, -0.057980966, -0.014017097, 0.03739981, -0.13108169, 0.0018671635, 0.03550265, -0.055068597, -0.04273072, 0.0480743, 0.035149302, 0.051385034, 0.008154835, 0.02939507, -0.02790439, 0.04798433, 0.012633902, 0.050369605, 0.03730664, -0.016114296, 0.016826835, -0.05483934, -0.04307148, -0.014681098, 0.0032649112, 0.10389013, -0.085853584, 0.016533818, 0.017277544, -0.012875621, -0.008417194, 0.106101766, -3.3647308E-4, 0.03838455, -0.007070606, 0.064803414, 0.04349774, 0.027908528, -0.004982669, 0.05417708, 0.08491659, 0.01753072, -0.04387867, -0.0089426385, -0.029429087, -0.04308129, -0.0137046715, -0.049384452, 0.079110876, 0.0159977, -0.023842642, 0.010396142, -0.017871607, -0.02013254, -0.029775942, -0.057334274, 0.079562895, 0.017678022, 0.046195857, -0.025770709, -0.052720636, -0.07104178, -0.016904766, 0.005821192, -0.04959368, 0.012194841, -0.06851538, 0.024740597, -0.06589627, -
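
To make "semantically similar" concrete, the sketch below computes the cosine distance between two vectors, a common metric for comparing embeddings. It assumes embeddings are `FloatArray`s of equal length; `cosineDistance` is a hypothetical helper, and the metric RedisVL actually applies depends on how the underlying index is configured.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: cosine distance = 1 - cosine similarity.
// 0.0 means same direction, 1.0 means orthogonal, 2.0 means opposite direction.
fun cosineDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        val x = a[i].toDouble()
        val y = b[i].toDouble()
        dot += x * y      // accumulate the dot product
        normA += x * x    // accumulate squared length of a
        normB += y * y    // accumulate squared length of b
    }
    require(normA > 0.0 && normB > 0.0) { "Vectors must be non-zero" }
    return 1.0 - dot / (sqrt(normA) * sqrt(normB))
}
```

Assuming `vectorizer.embed` returns a `FloatArray`, a call such as `cosineDistance(vectorizer.embed(q1), vectorizer.embed(q2))` would be expected to yield a small value when `q1` and `q2` are paraphrases of each other.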

### Creating the SemanticCache


In [8]:
import com.redis.vl.extensions.cache.SemanticCache
import redis.clients.jedis.HostAndPort
import redis.clients.jedis.UnifiedJedis

val jedis = UnifiedJedis(HostAndPort(host, port))

// Initialize the semantic cache with Redis connection
val cache = SemanticCache.Builder()
    .name("llmcache")
    .distanceThreshold(0.2F)
    .ttl(360)
    .redisClient(jedis)
    .vectorizer(vectorizer)
    .build()
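
The two tuning knobs in the builder are `distanceThreshold` and `ttl`. The sketch below models how such settings could decide whether a lookup counts as a hit. It is an in-memory illustration under two assumptions that this notebook does not confirm: that `ttl` is expressed in seconds, and that a match is accepted when its distance does not exceed the threshold. `Candidate` and `pickHit` are hypothetical names, not RedisVL APIs.

```kotlin
// Hypothetical model of a cached entry returned by a nearest-neighbor search.
data class Candidate(val response: String, val distance: Float, val storedAtMillis: Long)

// Return the closest candidate that is both unexpired and within the threshold, or null.
fun pickHit(
    candidates: List<Candidate>,
    distanceThreshold: Float,
    ttlSeconds: Long,
    nowMillis: Long
): Candidate? =
    candidates
        .filter { nowMillis - it.storedAtMillis < ttlSeconds * 1000 } // drop expired entries
        .filter { it.distance <= distanceThreshold }                  // drop entries that are too far away
        .minByOrNull { it.distance }                                  // keep the nearest survivor
```

Under these assumptions, with the `0.2F` threshold used above, a candidate at distance 0.18 is accepted and one at 0.25 is rejected. Lowering the threshold trades hit rate for precision; raising it does the opposite.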

### Storing FAQs in the Semantic Cache

In [9]:
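// Each FAQ is a JSON object with "prompt" and "response" fields
jsonArray.forEach { el ->
    val obj = el.jsonObject
    val prompt = obj["prompt"]?.jsonPrimitive?.content.orEmpty()
    val response = obj["response"]?.jsonPrimitive?.content.orEmpty()
    // Store the prompt together with its response in the cache
    cache.store(prompt, response)
}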

### Testing the Semantic Cache

In [10]:
val cacheHit = cache.check("What models of chevy colorado are available?").get()
println("Prompt: ${cacheHit.prompt}")
println("Response: ${cacheHit.response}")
println("Distance: ${cacheHit.distance}")

Prompt: What are the available models of the Colorado?
Response: The available models of the Colorado are WT, LT, Z71, and ZR2.
Distance: 0.18383932


In [11]:
val cacheHit = cache.check("What entertainment system comes with the car?").get()
println("Prompt: ${cacheHit.prompt}")
println("Response: ${cacheHit.response}")
println("Distance: ${cacheHit.distance}")

Prompt: What entertainment system is included in the vehicle?
Response: The vehicle includes the Chevrolet Infotainment 3 system with an 8-inch diagonal color touch-screen.
Distance: 0.09986466


In [12]:
cache.check("Does the car drive on the water?")

Optional.empty
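
In an application, a miss like this one is the signal to fall back to the LLM and then populate the cache. The sketch below shows that cache-aside flow. It deliberately avoids RedisVL types: `lookup`, `store`, and `generate` are hypothetical function parameters standing in for `cache.check`, `cache.store`, and an LLM call.

```kotlin
import java.util.Optional

// Hypothetical cache-aside helper; the three lambdas are stand-ins, not RedisVL APIs.
fun answer(
    prompt: String,
    lookup: (String) -> Optional<String>, // e.g. wraps cache.check(prompt) and maps to the response
    store: (String, String) -> Unit,      // e.g. wraps cache.store(prompt, response)
    generate: (String) -> String          // e.g. calls an LLM
): String {
    val cached = lookup(prompt)
    if (cached.isPresent) return cached.get() // cache hit: skip the LLM entirely

    val fresh = generate(prompt)              // cache miss: pay for one generation
    store(prompt, fresh)                      // remember it for future lookups
    return fresh
}
```

Calling `.get()` on an empty `Optional` throws `NoSuchElementException`, so checking `isPresent` first (or using `orElse`) keeps a miss from crashing the caller.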

## Redis Insight

Redis Insight is a visual tool that helps you explore, monitor, and optimize your Redis data and performance through an easy-to-use interface.

It can be downloaded and run locally on your machine or run in a Docker container. To keep this recipe self-contained and straightforward, we're going to run it in a Docker container using Testcontainers.

### Configuring a generic Redis Insight Container

In [13]:
import org.testcontainers.containers.GenericContainer
import org.testcontainers.containers.wait.strategy.Wait
import org.testcontainers.utility.DockerImageName

class RedisInsightContainer : GenericContainer<RedisInsightContainer>(
    DockerImageName.parse("redis/redisinsight:latest") // consider pinning a specific version for reproducibility
) {
    init {
        withExposedPorts(5540)
        withEnv("RI_REDIS_HOST", "redis") // Must match the network alias given to the Redis container
        // Both containers share a Docker network, so use Redis's internal port, not the mapped host port
        withEnv("RI_REDIS_PORT", "6379")
        withEnv("RI_REDIS_ALIAS", "Local Redis")
        withEnv("RI_REDIS_USERNAME", "default")
        withEnv("RI_REDIS_PASSWORD", "")
        withEnv("RI_REDIS_TLS", "FALSE")

        waitingFor(Wait.forHttp("/").forPort(5540))
    }

    fun getUiUrl(): String = "http://${host}:${getMappedPort(5540)}"
}

### Starting the Redis Insight container

In [14]:
val redisInsight = RedisInsightContainer().withNetwork(network)
redisInsight.start()

println("RedisInsight UI: ${redisInsight.getUiUrl()}")

RedisInsight UI: http://localhost:54223


## Spinning down Docker containers

Finally, once we're done, let's clean up all the resources we created for our recipe:

In [15]:
redis.stop()
redisInsight.stop()
network.close()
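
If one of the calls above throws, the remaining resources are never released. A defensive variant, sketched below, attempts to close every resource and collects any failures. It assumes the containers and the network can be passed as `AutoCloseable`, which should be verified against the Testcontainers version in use; `closeAll` is a hypothetical helper.

```kotlin
// Hypothetical helper: close resources in the given order, continuing past failures.
fun closeAll(vararg resources: AutoCloseable): List<Throwable> {
    val failures = mutableListOf<Throwable>()
    for (resource in resources) {
        try {
            resource.close()
        } catch (e: Exception) {
            failures += e // record the failure and keep closing the rest
        }
    }
    return failures
}
```

Under that assumption, `closeAll(redis, redisInsight, network)` would attempt all three shutdowns and return the list of exceptions raised, if any.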