# Semantic Caching with LangCache

Semantic caching is an intelligent caching strategy that stores and retrieves responses based on the meaning of queries rather than exact text matches. Unlike traditional caching that requires identical strings, semantic caching can return cached responses for questions that are semantically similar, even when phrased differently.

## Semantic Caching vs. Traditional Caching vs. LLM Re-generation

**Traditional caching** stores responses using exact query strings as keys:
- **Fast retrieval** for identical queries
- **Cache misses** for any variation in phrasing, even minor differences
- **Low cache hit rates** in conversational applications where users rarely phrase questions identically

**LLM re-generation** involves calling the language model for every query:
- **Flexible** handling of any question variation
- **High API costs** and latency for repeated similar questions

**Semantic caching** uses vector similarity to match queries with cached responses:
- **High cache hit rates** by matching semantically similar questions
- **Cost reduction** by avoiding redundant LLM calls for similar queries
- **Fast retrieval** through vector similarity search
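
To make the contrast concrete, here is a minimal sketch that compares an exact-match lookup with a similarity-based lookup. The three-dimensional vectors are invented by hand for illustration and are not real embeddings, and the 0.9 threshold is an arbitrary assumption. A real semantic cache would get its vectors from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (assumption: invented values, not model output).
toy_embeddings = {
    "How does semantic caching work?": [0.9, 0.1, 0.0],
    "What is semantic caching?": [0.8, 0.2, 0.0],
    "What's the fastest commercial plane available?": [0.0, 0.1, 0.9],
}

cached_prompt = "How does semantic caching work?"
cached_response = "Semantic caching stores and retrieves data based on meaning, not exact matches."
exact_cache = {cached_prompt: cached_response}

def exact_lookup(query):
    # Traditional caching: the query string itself is the key.
    return exact_cache.get(query)

def semantic_lookup(query, threshold=0.9):
    # Semantic caching: compare vectors and accept a close enough match.
    score = cosine_similarity(toy_embeddings[query], toy_embeddings[cached_prompt])
    return cached_response if score >= threshold else None

rephrased = "What is semantic caching?"
unrelated = "What's the fastest commercial plane available?"
print("exact, rephrased:", exact_lookup(rephrased))
print("semantic, rephrased:", semantic_lookup(rephrased))
print("semantic, unrelated:", semantic_lookup(unrelated))
```

With these toy values, the rephrased question misses the exact-match cache but hits the similarity-based one, while the unrelated question misses both.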

In this notebook, we'll use semantic caching through LangCache, the managed semantic caching service available on Redis Cloud. We'll add an entry to the cache and then search it with differently phrased prompts, showing how semantically similar questions can hit the cache where exact string matching would miss.

## Installing Dependencies

This implementation requires a single Python library, langcache, which is the SDK for the LangCache managed service. The service itself is responsible for generating the embeddings and for storing and retrieving cache entries.

In [1]:
import os
%pip install langcache

Note: you may need to restart the kernel to use updated packages.


## Configuring LangCache SDK

To connect to LangCache, we need to configure its client with the server URL, the cache ID, and the API key. All three can be found in the LangCache configuration at https://cloud.redis.io. The API key is read from the LANG_CACHE_API_KEY environment variable so that it stays out of the notebook.

In [7]:
from langcache import LangCache
import os

api_key = os.getenv("LANG_CACHE_API_KEY")

lang_cache = LangCache(
    server_url="https://gcp-us-east4.langcache.redis.io",
    cache_id="30b5d6f3fafb40d69b18b255e0a7b3c3",
    api_key=api_key,
)
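
If you would rather not hard-code the server URL and cache ID either, one option is to read all three settings from the environment and fail early when any is missing. This is a sketch under assumptions: `load_langcache_settings`, `LANG_CACHE_SERVER_URL`, and `LANG_CACHE_CACHE_ID` are hypothetical names introduced here. Only `LANG_CACHE_API_KEY` appears in the cell above.

```python
import os

# LANG_CACHE_API_KEY comes from the notebook; the other two names are
# hypothetical and chosen only for this illustration.
REQUIRED_VARS = ("LANG_CACHE_SERVER_URL", "LANG_CACHE_CACHE_ID", "LANG_CACHE_API_KEY")

def load_langcache_settings(env=os.environ):
    """Collect the client settings from a mapping and fail early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing LangCache settings: " + ", ".join(missing))
    return {
        "server_url": env["LANG_CACHE_SERVER_URL"],
        "cache_id": env["LANG_CACHE_CACHE_ID"],
        "api_key": env["LANG_CACHE_API_KEY"],
    }

# Usage, assuming the variables are set:
# lang_cache = LangCache(**load_langcache_settings())
```

Failing early avoids creating a client with an API key of None, which is what os.getenv returns when the variable is unset.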

## Adding an entry to the cache

Adding an entry to the cache is as easy as calling a single function that receives the prompt and the response as parameters.

In [8]:
save_response = lang_cache.set(
    prompt="How does semantic caching work?",
    response="Semantic caching stores and retrieves data based on meaning, not exact matches."
)
print("Save entry response:", save_response)

Save entry response: entry_id='fda1b671e21b06a0a957c04b1692ab90'


## Retrieving entries from the cache

Retrieving entries is equally simple: we invoke the search function, passing the prompt as a parameter.

In [9]:
search_response = lang_cache.search(
    prompt="What is semantic caching?"
)
print("Search entry response:", search_response)

Search entry response: data=[CacheEntry(id='13a51eaa233436e3b6910fe1542f0c22', prompt='What is semantic caching?', response='Semantic caching stores and retrieves data based on meaning, not exact matches.', attributes={}, similarity=1.0, search_strategy=<SearchStrategy.SEMANTIC: 'semantic'>), CacheEntry(id='fda1b671e21b06a0a957c04b1692ab90', prompt='How does semantic caching work?', response='Semantic caching stores and retrieves data based on meaning, not exact matches.', attributes={}, similarity=0.9291304, search_strategy=<SearchStrategy.SEMANTIC: 'semantic'>)]


When using the langcache embedding model trained by Redis, we can also avoid false positives. The cache contains an entry for "What is semantic caching?", and now we search for the opposite by adding a single word, "not". Most embedding models would struggle with this and still find the two sentences very similar, but the model trained by Redis excels at such cases.

In [10]:
search_response = lang_cache.search(
    prompt="What is not semantic caching?"
)
print("Search entry response:", search_response)

Search entry response: data=[]


Finally, we can also search for something that hasn't been cached at all, expecting nothing to be returned.

In [11]:
search_response = lang_cache.search(
    prompt="What's the fastest commercial plane available?"
)
print("Search entry response:", search_response)

Search entry response: data=[]
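
The set and search calls above combine naturally into a cache-aside helper: search first, and only call the LLM (then store its answer) on a miss. The sketch below is one possible arrangement, not the LangCache SDK's own API. `answer_with_cache`, `min_similarity`, `ExactMatchStubCache`, and `fake_llm` are hypothetical names. The only things it relies on are what the outputs above show: a search result with a `data` list whose entries expose `response` and `similarity`. The stub cache matches exact strings only, so the example can run offline.

```python
from types import SimpleNamespace

def answer_with_cache(cache, prompt, generate, min_similarity=0.0):
    """Return (response, from_cache). Calls generate(prompt) only on a cache miss."""
    result = cache.search(prompt=prompt)
    # Optional client-side filter on the similarity score of each entry.
    hits = [entry for entry in result.data if entry.similarity >= min_similarity]
    if hits:
        best = max(hits, key=lambda entry: entry.similarity)
        return best.response, True
    response = generate(prompt)
    cache.set(prompt=prompt, response=response)
    return response, False

class ExactMatchStubCache:
    """Hypothetical in-memory stand-in for offline runs (exact match only)."""

    def __init__(self):
        self._entries = {}

    def set(self, prompt, response):
        self._entries[prompt] = response
        return SimpleNamespace(entry_id=str(len(self._entries)))

    def search(self, prompt):
        if prompt in self._entries:
            entry = SimpleNamespace(
                prompt=prompt, response=self._entries[prompt], similarity=1.0
            )
            return SimpleNamespace(data=[entry])
        return SimpleNamespace(data=[])

llm_calls = []

def fake_llm(prompt):
    # Placeholder for a real LLM call; records how often it is invoked.
    llm_calls.append(prompt)
    return f"Generated answer for: {prompt}"

stub = ExactMatchStubCache()
first = answer_with_cache(stub, "How does semantic caching work?", fake_llm)
second = answer_with_cache(stub, "How does semantic caching work?", fake_llm)
print("first call served from cache:", first[1])
print("second call served from cache:", second[1])
print("LLM calls made:", len(llm_calls))
```

Passing the `lang_cache` client configured earlier in place of the stub should work the same way, provided its search results keep the shape shown in the outputs above.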
