# Semantic Caching

Semantic caching stores and retrieves responses based on the meaning of a query rather than its exact text. Where a traditional cache requires identical strings, a semantic cache can return a cached response for a question that is semantically similar to an earlier one, even when it is phrased differently.

## Semantic Caching vs. Traditional Caching vs. LLM Re-generation

**Traditional caching** stores responses using exact query strings as keys:
- **Fast retrieval** for identical queries
- **Cache misses** for any variation in phrasing, even minor differences
- **Low cache hit rates** in conversational applications where users rarely phrase questions identically
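
The miss-on-rephrasing behavior is easy to reproduce with a plain map. The snippet below is only an illustrative sketch: `exactCache` and the sample strings are stand-ins chosen for this example, not part of any caching library.

```kotlin
// Exact-match cache: the raw query string is the key.
val exactCache = mutableMapOf<String, String>()

exactCache["How does semantic caching work?"] =
    "Semantic caching stores and retrieves data based on meaning, not exact matches."

// Identical string: hit.
val sameWording = exactCache["How does semantic caching work?"]
// Only the capitalization of the first letter differs: miss.
val otherCasing = exactCache["how does semantic caching work?"]
// Same intent, different wording: miss.
val otherWording = exactCache["What is semantic caching?"]

println("same wording hit:  ${sameWording != null}")
println("other casing hit:  ${otherCasing != null}")
println("other wording hit: ${otherWording != null}")
```

Normalizing keys (lowercasing, trimming whitespace) would recover the second lookup, but not the third, which is the case semantic caching targets.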

**LLM re-generation** involves calling the language model for every query:
- **Flexible** handling of any question variation
- **High API costs** and latency for repeated similar questions

**Semantic caching** uses vector similarity to match queries with cached responses:
- **Higher cache hit rates** than exact matching, because semantically similar questions map to the same cached response
- **Cost reduction** by avoiding redundant LLM calls for similar queries
- **Fast retrieval** through vector similarity search
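
To make "vector similarity" concrete, here is a minimal sketch of a similarity lookup. It assumes each query has already been turned into an embedding vector; the three-dimensional vectors, the `ToyEntry` class, the `lookup` function, and the 0.9 threshold are all made up for illustration. A real system would get embeddings from a model and would usually search an index rather than scan a list.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two vectors of equal length.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Hypothetical cache entry: a query embedding plus the cached response.
class ToyEntry(val embedding: DoubleArray, val response: String)

// Returns the response of the most similar entry, or null if no entry
// reaches the similarity threshold.
fun lookup(entries: List<ToyEntry>, query: DoubleArray, threshold: Double): String? =
    entries
        .map { it to cosine(it.embedding, query) }
        .filter { (_, score) -> score >= threshold }
        .maxByOrNull { (_, score) -> score }
        ?.first?.response

// Hand-written vectors standing in for real embeddings.
val entries = listOf(
    ToyEntry(doubleArrayOf(0.9, 0.1, 0.2), "Semantic caching matches on meaning.")
)
val rephrased = doubleArrayOf(0.8, 0.2, 0.3) // points in nearly the same direction
val unrelated = doubleArrayOf(0.1, 0.9, 0.1) // points elsewhere

println("rephrased query hit: ${lookup(entries, rephrased, 0.9) != null}")
println("unrelated query hit: ${lookup(entries, unrelated, 0.9) != null}")
```

The threshold is the main tuning knob: set it too low and unrelated questions receive cached answers, set it too high and the cache behaves like exact matching.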

In this notebook, we'll call the LangCache REST API from Kotlin using Ktor: we store one prompt and its response, then search with a differently phrased question to show that a semantic match is returned where exact string matching would miss.

## Installing Dependencies

In [19]:
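// Notebook line magics: load the Ktor client, kotlinx.serialization, and coroutines.
%use ktor-client
%use serialization
%use coroutines

import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.plugins.contentnegotiation.ContentNegotiation
import io.ktor.serialization.kotlinx.json.json

// One shared HTTP client that encodes and decodes JSON bodies.
val client = HttpClient(CIO) {
    install(ContentNegotiation) {
        // Skip response fields that the data classes below do not declare.
        json(Json { ignoreUnknownKeys = true })
    }
}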

## Configuring LangCache

### Access Configuration

In [1]:
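// Fail fast if the key is missing instead of sending "Bearer null".
val apiKey = System.getenv("LANG_CACHE_API_KEY")
    ?: error("Set the LANG_CACHE_API_KEY environment variable")
// ID and region of the cache used in this notebook; replace with your own.
val cacheId = "28e9625f77be4186b295ef6d3577c6d0"
val baseUrl = "https://aws-us-east-1.langcache.redis.io/v1/caches/$cacheId"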

### Modeling the requests and responses from the API

In [15]:
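// Request body used both for storing (prompt + response) and searching (prompt only).
@Serializable
data class CacheEntryRequest(
    val prompt: String,
    val response: String? = null
)

// Reply to a store request: the ID of the new entry.
@Serializable
data class CacheEntryResponse(
    val entryId: String
)

// A cached entry as returned by a search, including its similarity to the query.
@Serializable
data class CacheEntry(
    val id: String,
    val prompt: String,
    val response: String,
    val attributes: Map<String, String> = emptyMap(),
    val similarity: Double? = null,
    @SerialName("search_strategy")
    val searchStrategy: String? = null
)

// Reply to a search request: the list of matching entries.
@Serializable
data class SearchResponse(
    val data: List<CacheEntry>
)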

### Storing in LangCache

In [16]:
import io.ktor.client.call.*
import io.ktor.client.request.*
import io.ktor.http.*

runBlocking {
    val saveResponse: CacheEntryResponse = client.post("$baseUrl/entries") {
        header("Authorization", "Bearer $apiKey")
        contentType(ContentType.Application.Json)
        setBody(CacheEntryRequest(
            prompt = "How does semantic caching work?",
            response = "Semantic caching stores and retrieves data based on meaning, not exact matches."
        ))
    }.body()

    println("Save response: $saveResponse")
}

Save response: CacheEntryResponse(entryId=fda1b671e21b06a0a957c04b1692ab90)


### Retrieving from LangCache

In [17]:
import io.ktor.client.call.*
import io.ktor.client.request.*
import io.ktor.http.*

runBlocking {
    val searchResponse: SearchResponse = client.post("$baseUrl/entries/search") {
        header("Authorization", "Bearer $apiKey")
        contentType(ContentType.Application.Json)
        setBody(CacheEntryRequest(prompt = "What is semantic caching?"))
    }.body()

    println("Search response: $searchResponse")
}

Search response: SearchResponse(data=[CacheEntry(id=fda1b671e21b06a0a957c04b1692ab90, prompt=How does semantic caching work?, response=Semantic caching stores and retrieves data based on meaning, not exact matches., attributes={}, similarity=0.9292393, searchStrategy=null)])
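
The search above returned the stored entry with a similarity of about 0.93, even though the two prompts are worded differently. In an application, store and search are usually combined into a "check the cache, otherwise call the LLM and store the result" flow. The sketch below shows that flow with the three operations passed in as functions so it runs without network access. `Hit`, `answer`, the fake implementations, and the 0.9 threshold are hypothetical names and values chosen for this example; in the notebook, `search` and `store` would wrap the two HTTP calls shown earlier. Whether the service already applies a threshold of its own is not shown here.

```kotlin
// A cache hit as this flow needs it: the cached response and its similarity score.
data class Hit(val response: String, val similarity: Double)

// Cache-aside lookup: return a sufficiently similar cached response,
// otherwise generate a fresh one and store it for later queries.
fun answer(
    prompt: String,
    threshold: Double,
    search: (String) -> Hit?,
    generate: (String) -> String,
    store: (String, String) -> Unit,
): String {
    val hit = search(prompt)
    if (hit != null && hit.similarity >= threshold) return hit.response
    val fresh = generate(prompt)
    store(prompt, fresh)
    return fresh
}

// In-memory stand-ins for the cache and the LLM (assumptions for this demo).
var llmCalls = 0
val stored = mutableListOf<Pair<String, String>>()
val cachedAnswer = "Semantic caching stores and retrieves data based on meaning, not exact matches."

val fakeSearch: (String) -> Hit? = { p ->
    if ("semantic caching" in p.lowercase()) Hit(cachedAnswer, 0.93) else null
}
val fakeGenerate: (String) -> String = { p ->
    llmCalls++
    "generated answer for: $p"
}
val fakeStore: (String, String) -> Unit = { p, r -> stored += p to r }

val first = answer("What is semantic caching?", 0.9, fakeSearch, fakeGenerate, fakeStore)
val second = answer("What is vector search?", 0.9, fakeSearch, fakeGenerate, fakeStore)

println("first served from cache: ${first == cachedAnswer}")
println("LLM calls so far: $llmCalls")
println("entries stored after misses: ${stored.size}")
```

Passing the operations in as functions keeps the decision logic separate from the HTTP client, which makes the threshold behavior easy to test in isolation.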
