# Semantic Caching with LangCache

LangCache is Redis' new managed service for semantic caching. Semantic caching is an intelligent caching strategy that stores and retrieves responses based on the meaning of queries rather than exact text matches. Unlike traditional caching that requires identical strings, semantic caching can return cached responses for questions that are semantically similar, even when phrased differently.

## Semantic Caching vs. Traditional Caching vs. LLM Re-generation

**Traditional caching** stores responses using exact query strings as keys:
- **Fast retrieval** for identical queries
- **Cache misses** for any variation in phrasing, even minor differences
- **Low cache hit rates** in conversational applications where users rarely phrase questions identically

**LLM re-generation** involves calling the language model for every query:
- **Flexible** handling of any question variation
- **High API costs** and latency for repeated similar questions

**Semantic caching** uses vector similarity to match queries with cached responses:
- **High cache hit rates** by matching semantically similar questions
- **Cost reduction** by avoiding redundant LLM calls for similar queries
- **Fast retrieval** through vector similarity search
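
The difference between exact-match and similarity-based lookup can be sketched with a toy example. The three-dimensional vectors below are hand-picked, hypothetical stand-ins for real embeddings, and the `0.9` threshold is an arbitrary assumption; LangCache computes embeddings and similarity on the service side, so this only illustrates the idea.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two equal-length vectors.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Hypothetical "embeddings", chosen by hand for illustration only.
val toyEmbeddings = mapOf(
    "How does semantic caching work?" to doubleArrayOf(0.9, 0.1, 0.2),
    "What is semantic caching?" to doubleArrayOf(0.8, 0.2, 0.3),
    "What is the capital of France?" to doubleArrayOf(0.1, 0.9, 0.1),
)

// Traditional cache: keyed on the exact prompt string.
val exactCache = mapOf("How does semantic caching work?" to "cached answer")

fun exactLookup(prompt: String): String? = exactCache[prompt]

// Semantic lookup: return the cached answer whose prompt is most similar
// to the query, provided the similarity reaches the threshold.
fun semanticLookup(prompt: String, threshold: Double = 0.9): String? {
    val query = toyEmbeddings[prompt] ?: return null
    return exactCache.entries
        .map { (cachedPrompt, answer) ->
            answer to cosineSimilarity(query, toyEmbeddings.getValue(cachedPrompt))
        }
        .filter { (_, score) -> score >= threshold }
        .maxByOrNull { (_, score) -> score }
        ?.first
}
```

With these toy vectors, `exactLookup("What is semantic caching?")` misses because the string differs from the cached key, while `semanticLookup` with the same prompt returns the cached answer; the unrelated question about France misses in both.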

In this notebook, we'll implement semantic caching using LangCache to demonstrate how semantic similarity can improve cache hit rates compared to exact string matching.

## Getting Started with LangCache

### Creating a Redis Database on Redis Cloud

- Go to https://cloud.redis.io and create a new account if you don't have one yet.
- Once logged in, click on the plus sign next to "Databases" on the sidebar:

<img src="./readme-assets/2_1_sidebar.png" alt="" width="200">

- Select the free option:

<img src="./readme-assets/2_2_tier_selection.png" alt="" width="200">

- Select `AWS` as the vendor and `us-east-1` as the region:

<img src="./readme-assets/2_3_settings.png" alt="" width="200">

- Create the database and wait for it to be available. It should take less than a minute:

<img src="./readme-assets/2_4_pending.png" alt="" width="200">

- Once the database is available, click on LangCache in the sidebar:

<img src="./readme-assets/2_5_langcache_sidebar.png" alt="" width="200">

- Then, click on Quick Create to easily get started:

<img src="./readme-assets/2_6_quick_create.png" alt="" width="200">

- Finally, copy the credential variables to use in the next steps of this recipe:

<img src="./readme-assets/2_7_connectivity.png" alt="" width="600">

## Installing Dependencies

In [6]:
%use ktor-client
%use serialization
%use coroutines

import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.plugins.contentnegotiation.ContentNegotiation
import io.ktor.serialization.kotlinx.json.json

val client = HttpClient(CIO) {
    install(ContentNegotiation) {
        json(Json { ignoreUnknownKeys = true })
    }
}

## Configuring LangCache

### Access Configuration

In [7]:
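// Fail fast if the key is missing, rather than sending "Bearer null" later.
val apiKey = System.getenv("LANG_CACHE_API_KEY")
    ?: error("Set the LANG_CACHE_API_KEY environment variable")
// Replace with the cache ID and base URL copied from your own LangCache service.
val cacheId = "28e9625f77be4186b295ef6d3577c6d0"
val baseUrl = "https://aws-us-east-1.langcache.redis.io/v1/caches/$cacheId"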

### Modeling the requests and responses from the API

In [8]:
@Serializable
data class CacheEntryRequest(
    val prompt: String,
    val response: String? = null
)

@Serializable
data class CacheEntryResponse(
    val entryId: String
)

@Serializable
data class CacheEntry(
    val id: String,
    val prompt: String,
    val response: String,
    val attributes: Map<String, String> = emptyMap(),
    val similarity: Double? = null,
    @SerialName("search_strategy")
    val searchStrategy: String? = null
)

@Serializable
data class SearchResponse(
    val data: List<CacheEntry>
)
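
Because `similarity` is nullable in `CacheEntry`, a caller may want a small client-side guard before trusting a hit. The following is a minimal sketch under stated assumptions: `Hit` is a hypothetical stand-in for `CacheEntry` reduced to two fields, `bestResponse` is an illustrative helper, and the threshold passed in is chosen by the caller. Whether the service already filters results by similarity is not shown in this notebook.

```kotlin
// Hypothetical stand-in for CacheEntry, reduced to the two fields used here.
data class Hit(val response: String, val similarity: Double?)

// Returns the response of the most similar hit at or above minSimilarity,
// or null if no hit qualifies. Hits without a similarity score are ignored.
fun bestResponse(hits: List<Hit>, minSimilarity: Double): String? =
    hits.mapNotNull { hit -> hit.similarity?.let { score -> hit to score } }
        .filter { (_, score) -> score >= minSimilarity }
        .maxByOrNull { (_, score) -> score }
        ?.first
        ?.response
```

A `null` result can then be treated as a cache miss by the calling code.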

### Storing in LangCache

In [4]:
import io.ktor.client.call.*
import io.ktor.client.request.*
import io.ktor.http.*

runBlocking {
    val saveResponse: CacheEntryResponse = client.post("$baseUrl/entries") {
        header("Authorization", "Bearer $apiKey")
        contentType(ContentType.Application.Json)
        setBody(CacheEntryRequest(
            prompt = "How does semantic caching work?",
            response = "Semantic caching stores and retrieves data based on meaning, not exact matches."
        ))
    }.body()

    println("Save response: $saveResponse")
}

Save response: CacheEntryResponse(entryId=fda1b671e21b06a0a957c04b1692ab90)


### Retrieving from LangCache

In [5]:
import io.ktor.client.call.*
import io.ktor.client.request.*
import io.ktor.http.*

runBlocking {
    val searchResponse: SearchResponse = client.post("$baseUrl/entries/search") {
        header("Authorization", "Bearer $apiKey")
        contentType(ContentType.Application.Json)
        setBody(CacheEntryRequest(prompt = "What is semantic caching?"))
    }.body()

    println("Search response: $searchResponse")
}

Search response: SearchResponse(data=[CacheEntry(id=fda1b671e21b06a0a957c04b1692ab90, prompt=How does semantic caching work?, response=Semantic caching stores and retrieves data based on meaning, not exact matches., attributes={}, similarity=0.9292393, searchStrategy=null)])
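
The store and search calls above are typically combined into a cache-aside flow: look in the cache first, call the model only on a miss, then store the fresh answer. The sketch below shows that flow in isolation. `SemanticCache`, `answer`, `InMemoryCache`, and the `callModel` parameter are all hypothetical names introduced for illustration; in a real notebook the interface would wrap the two HTTP requests shown earlier.

```kotlin
// Hypothetical abstraction over the store and search requests.
interface SemanticCache {
    fun search(prompt: String): String? // cached response, or null on a miss
    fun store(prompt: String, response: String)
}

// Cache-aside: try the cache first, fall back to the model, then store the result.
fun answer(prompt: String, cache: SemanticCache, callModel: (String) -> String): String {
    cache.search(prompt)?.let { return it }
    val fresh = callModel(prompt)
    cache.store(prompt, fresh)
    return fresh
}

// In-memory fake used only to exercise the flow. It matches on exact strings,
// so it does not demonstrate semantic matching.
class InMemoryCache : SemanticCache {
    private val entries = mutableMapOf<String, String>()
    override fun search(prompt: String): String? = entries[prompt]
    override fun store(prompt: String, response: String) {
        entries[prompt] = response
    }
}
```

Calling `answer` twice with the same prompt against `InMemoryCache` invokes `callModel` only once; with a semantic cache behind the interface, the second call could also be a rephrased question.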
