<a href="https://colab.research.google.com/github/sugarforever/LangChain-Tutorials/blob/main/LangChain_Caching.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding LangChain Caching

In this notebook, we will see:
1. How the LangChain framework uses caching to make LLM interactions more efficient.
2. How caching works with two different underlying stores, in-memory and SQLite.

We hope this helps you decide if and when to use a cache.

In [1]:
!pip install langchain openai --quiet --upgrade

In [2]:
import os
os.environ['OPENAI_API_KEY'] = 'your openai api key'

## Get your ChatOpenAI instance ready

In [3]:
import langchain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()

## 1. In Memory Cache

In [None]:
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

### Ask a question and measure how long it takes for LLM to respond.

In [None]:
%%time

llm.predict("What is OpenAI?")

CPU times: user 25 ms, sys: 6.4 ms, total: 31.4 ms
Wall time: 4.54 s


"OpenAI is an artificial intelligence research laboratory and company that aims to ensure that artificial general intelligence (AGI) benefits all of humanity. It was founded in December 2015 as a non-profit organization but later transformed into a for-profit company called OpenAI LP in 2019. OpenAI conducts research in various fields of AI, develops cutting-edge technologies, and publishes most of its AI research findings. The organization's mission is to ensure that AGI is developed safely, is aligned with human values, and is used for the benefit of all individuals."

#### How the cache stores data

**source code**: [cache.py](https://github.com/hwchase17/langchain/blob/v0.0.219/langchain/cache.py#L102)
```python
class InMemoryCache(BaseCache):
    """Cache that stores things in memory."""

    def __init__(self) -> None:
        """Initialize with empty cache."""
        self._cache: Dict[Tuple[str, str], RETURN_VAL_TYPE] = {}
```

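This is the implementation of `InMemoryCache`: a plain dictionary whose keys are two-element string tuples and whose values are the cached generations. The cells below inspect both elements of a key: the serialized prompt and the serialized LLM configuration.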
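As a rough sketch of what a tuple-keyed dictionary implies for lookups, the class below mimics such a cache without depending on LangChain. `TinyInMemoryCache` and the `model-config-A` / `model-config-B` strings are hypothetical names chosen for illustration; the `lookup` and `update` method names are assumed to mirror the cache interface rather than copied from the library.

```python
from typing import Any, Dict, Optional, Tuple


class TinyInMemoryCache:
    """Hypothetical stand-in for a cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Any] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Any]:
        # Both parts of the key must match exactly; None signals a miss.
        return self._cache.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: Any) -> None:
        self._cache[(prompt, llm_string)] = return_val


cache = TinyInMemoryCache()
cache.update("What is OpenAI?", "model-config-A", ["cached answer"])

hit = cache.lookup("What is OpenAI?", "model-config-A")         # same prompt, same config
miss_prompt = cache.lookup("What is  OpenAI?", "model-config-A")  # extra space in the prompt
miss_llm = cache.lookup("What is OpenAI?", "model-config-B")      # different LLM config
print(hit, miss_prompt, miss_llm)
```

Because the key is the pair of strings, changing either the prompt text or the LLM configuration results in a miss.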

In [None]:
# First element of the tuple
list(langchain.llm_cache._cache.keys())[0][0]

'[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is OpenAI?"}}]'

In [None]:
# Second element of the tuple
list(langchain.llm_cache._cache.keys())[0][1]

'{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}---[(\'stop\', None)]'

### Ask the same question again and see the quicker response.

In [None]:
%%time

llm.predict("What is OpenAI?")

## 2. SQLite as Cache

In [None]:
!rm -f .cache.db

In [None]:
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".cache.db")

### Ask the same question twice and measure the performance difference

In [None]:
%%time

llm.predict("What is OpenAI?")

CPU times: user 4.25 ms, sys: 980 µs, total: 5.23 ms
Wall time: 4.97 ms


'OpenAI is an artificial intelligence research laboratory and company. It was founded in December 2015 with the goal of developing and promoting friendly AI for the benefit of all humanity. OpenAI conducts cutting-edge research in various areas of AI and aims to ensure that artificial general intelligence (AGI) benefits everyone and is used responsibly. They work on advancing AI technology, publishing most of their AI research, and providing public goods to help society navigate the path to AGI. OpenAI also develops and deploys AI models and systems, such as the language model GPT-3, to showcase the capabilities and potential applications of AI.'

In [None]:
%%time

llm.predict("What is OpenAI?")

CPU times: user 39.3 ms, sys: 9.16 ms, total: 48.5 ms
Wall time: 4.84 s


'OpenAI is an artificial intelligence research lab and company founded in December 2015. Its mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. OpenAI conducts research to develop safe and beneficial AI technologies and also aims to promote the widespread adoption of such technologies for societal benefit. The organization has made significant contributions to the field of AI, particularly in areas such as natural language processing, reinforcement learning, and robotics. OpenAI also develops and maintains various open-source AI tools and frameworks to facilitate the development and deployment of AI applications.'

### Add an extra space to the sentence and ask again

In [None]:
%%time

llm.predict("What is  OpenAI?")

In [None]:
import sqlalchemy
from sqlalchemy import create_engine, text
engine = create_engine("sqlite:///.cache.db")

### **Why does the extra space cause a cache miss?**

#### How SQLite stores cache data

**source code**: [cache.py](https://github.com/hwchase17/langchain/blob/v0.0.219/langchain/cache.py#L128)
```python
class FullLLMCache(Base):  # type: ignore
    """SQLite table for full LLM Cache (all generations)."""

    __tablename__ = "full_llm_cache"
    prompt = Column(String, primary_key=True)
    llm = Column(String, primary_key=True)
    idx = Column(Integer, primary_key=True)
    response = Column(String)


class SQLAlchemyCache(BaseCache):
    """Cache that uses SQAlchemy as a backend."""

    def __init__(self, engine: Engine, cache_schema: Type[FullLLMCache] = FullLLMCache):
        """Initialize by creating all tables."""
        self.engine = engine
        self.cache_schema = cache_schema
        self.cache_schema.metadata.create_all(self.engine)
```

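This is the schema of the cache table `full_llm_cache`. The columns `prompt`, `llm`, and `idx` together form the primary key, so a lookup hits only when the serialized prompt matches a stored row character for character. A prompt with an extra space is a different string, which is why it is stored as a separate row, as the query below shows.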
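The exact-match behaviour can be reproduced with the standard-library `sqlite3` module alone. This is a minimal sketch that assumes a table shaped like `full_llm_cache`; the `lookup` helper, the `llm-config` string, and the stored response are hypothetical and are not LangChain's own code.

```python
import sqlite3

# In-memory database with a table shaped like the cache schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE full_llm_cache (
        prompt   TEXT NOT NULL,
        llm      TEXT NOT NULL,
        idx      INTEGER NOT NULL,
        response TEXT,
        PRIMARY KEY (prompt, llm, idx)
    )
    """
)
conn.execute(
    "INSERT INTO full_llm_cache VALUES (?, ?, ?, ?)",
    ("What is OpenAI?", "llm-config", 0, "cached answer"),
)


def lookup(prompt: str, llm: str) -> list:
    """Return all cached responses for an exact (prompt, llm) match."""
    rows = conn.execute(
        "SELECT response FROM full_llm_cache "
        "WHERE prompt = ? AND llm = ? ORDER BY idx",
        (prompt, llm),
    ).fetchall()
    return [row[0] for row in rows]


exact = lookup("What is OpenAI?", "llm-config")    # identical string: hit
spaced = lookup("What is  OpenAI?", "llm-config")  # extra space: miss
print(exact, spaced)
```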

In [None]:
with engine.connect() as connection:

    rs = connection.exec_driver_sql('select * from full_llm_cache')
    print(rs.keys())
    for row in rs:
        print(row)

RMKeyView(['prompt', 'llm', 'idx', 'response'])
('[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is OpenAI?"}}]', '{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}---[(\'stop\', None)]', 0, '{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "ChatGeneration"], "kwargs": {"message": {"lc": 1, "type": "constructor", "id": ["lang ... (588 characters truncated) ... AI models and systems, such as the language model GPT-3, to showcase the capabilities and potential applications of AI.", "additional_kwargs": {}}}}}')
('[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is  OpenAI?"}}]', '{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "

## Semantic Cache

A semantic cache stores prompts and responses, and evaluates hits based on semantic similarity.
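To make the idea concrete before bringing in Redis, here is a toy sketch of a similarity-based lookup. Everything in it is an assumption for illustration: `toy_embed` is a word-count stand-in for a real embedding model, `ToySemanticCache` is a hypothetical class, and the cosine-distance comparison against `score_threshold` is one plausible way to decide a hit, not necessarily what any particular backend does.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    # Hypothetical embedding: lowercase word counts. Real caches use a learned model.
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


class ToySemanticCache:
    def __init__(self, score_threshold: float) -> None:
        self.score_threshold = score_threshold
        self._entries: List[Tuple[Counter, str]] = []

    def update(self, prompt: str, response: str) -> None:
        self._entries.append((toy_embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        # Find the closest stored prompt, then accept it only if it is close enough.
        best = min(
            self._entries,
            key=lambda entry: cosine_distance(query, entry[0]),
            default=None,
        )
        if best is not None and cosine_distance(query, best[0]) <= self.score_threshold:
            return best[1]
        return None


cache = ToySemanticCache(score_threshold=0.2)
cache.update("Tell me a joke", "Why did the chicken cross the road?")

near = cache.lookup("Tell me a joke please")  # small distance: served from cache
far = cache.lookup("Give me a peach")         # large distance: miss
print(near, far)
```

Unlike an exact-match cache, a slightly different prompt can still return the stored response, which is both the benefit and the risk of this approach.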

In [None]:
!pip install langchain openai --quiet --upgrade

In [None]:
import os
os.environ['OPENAI_API_KEY'] = 'your openai api key'

### Follow the [official Redis doc](https://redis.com/blog/running-redis-on-google-colab/) to install and start a Redis server on Google Colab.

In [None]:
!curl -fsSL https://packages.redis.io/redis-stack/redis-stack-server-6.2.6-v7.focal.x86_64.tar.gz -o redis-stack-server.tar.gz
!tar -xvf redis-stack-server.tar.gz
!pip install redis

!./redis-stack-server-6.2.6-v7/bin/redis-stack-server --daemonize yes

./
./redis-stack-server-6.2.6-v7/
./redis-stack-server-6.2.6-v7/bin/
./redis-stack-server-6.2.6-v7/bin/redis-benchmark
./redis-stack-server-6.2.6-v7/bin/redis-cli
./redis-stack-server-6.2.6-v7/bin/redis-sentinel
./redis-stack-server-6.2.6-v7/bin/redis-stack-server
./redis-stack-server-6.2.6-v7/bin/redis-check-rdb
./redis-stack-server-6.2.6-v7/bin/redis-check-aof
./redis-stack-server-6.2.6-v7/bin/redis-server
./redis-stack-server-6.2.6-v7/share/
./redis-stack-server-6.2.6-v7/share/RSAL_LICENSE
./redis-stack-server-6.2.6-v7/share/APACHE_LICENSE
./redis-stack-server-6.2.6-v7/lib/
./redis-stack-server-6.2.6-v7/lib/redisgraph.so
./redis-stack-server-6.2.6-v7/lib/redistimeseries.so
./redis-stack-server-6.2.6-v7/lib/rejson.so
./redis-stack-server-6.2.6-v7/lib/redisbloom.so
./redis-stack-server-6.2.6-v7/lib/redisearch.so
./redis-stack-server-6.2.6-v7/etc/
./redis-stack-server-6.2.6-v7/etc/README
./redis-stack-server-6.2.6-v7/etc/redis-stack.conf
./redis-stack-server-6.2.6-v7/etc/redis-stack-se

In [12]:
import langchain
from langchain.llms import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

### Initialize the Redis semantic cache with the default score threshold of 0.2

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(redis_url="redis://localhost:6379", embedding=OpenAIEmbeddings(), score_threshold=0.2)

In [None]:
%%time

llm("Please translate 'this is Monday' into Chinese")

CPU times: user 74.4 ms, sys: 7.11 ms, total: 81.5 ms
Wall time: 2.19 s


'\n\n这是周一'

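Notice that the query below differs from the previous one by a single word. The cache still registers a hit because the two prompts are semantically similar, so the cached translation of 'this is Monday' is returned even though the question is about Tuesday.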

In [None]:
%%time

llm("Please translate 'this is Tuesday' into Chinese")

CPU times: user 6.35 ms, sys: 0 ns, total: 6.35 ms
Wall time: 211 ms


'\n\n这是周一'

In [None]:
%%time

llm("Tell me a joke")

CPU times: user 34.2 ms, sys: 2.85 ms, total: 37 ms
Wall time: 3.88 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

In [None]:
%%time

llm("Tell me 2 jokes")

CPU times: user 7.27 ms, sys: 0 ns, total: 7.27 ms
Wall time: 247 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

### Initialize the Redis semantic cache with a lower score threshold of 0.05

In [None]:
langchain.llm_cache = RedisSemanticCache(redis_url="redis://localhost:6379", embedding=OpenAIEmbeddings(), score_threshold=0.05)

In [None]:
%%time

llm("Give me a peach")

CPU times: user 22.8 ms, sys: 61 µs, total: 22.9 ms
Wall time: 1.49 s


'\n\nA peach is a smooth, round fruit with a soft, velvety skin. The flesh is sweet and juicy, with a hint of acidity. Peaches are a good source of vitamins A and C, as well as fiber.'

In [None]:
%%time

llm("Give me 2 peaches")

CPU times: user 16.9 ms, sys: 0 ns, total: 16.9 ms
Wall time: 939 ms


"\n\nI can't give you anything because this is not a real request."

### Deep dive into Redis semantic cache

#### Find the keys in the cache

In [None]:
langchain.llm_cache._cache_dict

{'cache:bf6f6d9ebdf492e28cb8bf4878a4b951': <langchain.vectorstores.redis.Redis at 0x7fed7bd13310>}

#### Manually execute similarity search to fetch the similar documents with scores

You should expect that the more similar the document is, the smaller the score will be.

In [None]:
langchain.llm_cache._cache_dict['cache:bf6f6d9ebdf492e28cb8bf4878a4b951'].similarity_search_with_score(query='Give me 2 peaches')

[(Document(page_content='Give me 2 peaches', metadata={'llm_string': "[('_type', 'openai'), ('best_of', 2), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-davinci-002'), ('n', 2), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 'prompt': 'Give me 2 peaches', 'return_val': ["\n\nI can't give you anything because this is not a real request.", "\n\nI can't do that."]}),
  2.38418579102e-07),
 (Document(page_content='Give me a peach', metadata={'llm_string': "[('_type', 'openai'), ('best_of', 2), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-davinci-002'), ('n', 2), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 'prompt': 'Give me a peach', 'return_val': ['\n\nA peach is a smooth, round fruit with a soft, velvety skin. The flesh is sweet and juicy, with a hint of acidity. Peaches are a good so

### Conclusion

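The score threshold is the key factor when using the Redis semantic cache for similarity matching. With the threshold at 0.2, "Please translate 'this is Tuesday' into Chinese" and "Tell me 2 jokes" were both answered from the cache with responses that belonged to different prompts. With the threshold at 0.05, "Give me 2 peaches" missed the cache and triggered a new LLM call.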
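The trade-off can be sketched as a filter over scored candidates. The `pick_hit` helper and the distances in `scored` are hypothetical values chosen for illustration; they are not measurements from the Redis cache in this notebook.

```python
from typing import List, Optional, Tuple


def pick_hit(scored: List[Tuple[str, float]], score_threshold: float) -> Optional[str]:
    """Return the closest cached prompt whose distance is within the threshold."""
    within = [(prompt, dist) for prompt, dist in scored if dist <= score_threshold]
    if not within:
        return None
    return min(within, key=lambda pair: pair[1])[0]


# Hypothetical distances between a new query and two cached prompts.
scored = [("Give me a peach", 0.10), ("Tell me a joke", 0.45)]

loose = pick_hit(scored, score_threshold=0.2)    # 0.10 is within 0.2: hit
strict = pick_hit(scored, score_threshold=0.05)  # 0.10 exceeds 0.05: miss
print(loose, strict)
```

A looser threshold saves more LLM calls but risks returning an answer to a different question; a stricter one behaves closer to exact matching.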

## Semantic Cache with GPTCache

### What is GPTCache?

GPTCache is an open-source project dedicated to building a semantic cache for storing LLM responses.

Two use cases:
1. Exact match
2. Similar match

GPTCache addresses the following questions:
1. How to generate embeddings for the queries? (via the embedding function)
2. How to cache the data? (via the cache store of the data manager, such as SQLite, MySQL, and PostgreSQL. More NoSQL databases will be added in the future.)
3. How to store and search vector embeddings? (via the vector store of the data manager, such as FAISS, or vector databases such as Milvus. More vector databases and cloud services will be added in the future.)
4. How to determine the eviction policy? (LRU or FIFO)
5. How to determine a cache hit or miss? (via the evaluation function)

The following `Cache` class definition shows how these questions are addressed:

```python
class Cache:
   def init(self,
            cache_enable_func=cache_all,
            pre_embedding_func=last_content,
            embedding_func=string_embedding,
            data_manager: DataManager = get_data_manager(),
            similarity_evaluation=ExactMatchEvaluation(),
            post_process_messages_func=first,
            config=Config(),
            next_cache=None,
            **kwargs
            ):
       self.has_init = True
       self.cache_enable_func = cache_enable_func
       self.pre_embedding_func = pre_embedding_func
       self.embedding_func = embedding_func
       self.data_manager: DataManager = data_manager
       self.similarity_evaluation = similarity_evaluation
       self.post_process_messages_func = post_process_messages_func
       self.data_manager.init(**kwargs)
       self.config = config
       self.next_cache = next_cache
```
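Of the questions listed above, the eviction policy is the easiest to illustrate in isolation. The following is a minimal sketch of LRU eviction built on `collections.OrderedDict`; `LRUStore` is a hypothetical class for illustration and is not GPTCache's implementation.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """Hypothetical bounded store that evicts the least recently used entry."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry


store = LRUStore(max_size=2)
store.put("q1", "a1")
store.put("q2", "a2")
store.get("q1")        # q1 becomes the most recently used entry
store.put("q3", "a3")  # the store is full, so q2 is evicted
remaining = list(store._data)
print(remaining)
```

A FIFO policy would differ only in `get`: it would not move the key, so entries would leave in insertion order regardless of how often they are read.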

In [5]:
!pip install gptcache --quiet

In [6]:
import langchain
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

### Exact Match

In [7]:
from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )


langchain.llm_cache = GPTCache(init_gptcache)

In [11]:
question = "What is cache eviction policy?"

In [12]:
%%time

llm(question)

CPU times: user 10.6 ms, sys: 3.74 ms, total: 14.3 ms
Wall time: 1.69 s


'\n\nA cache eviction policy is a strategy for managing the contents of a cache. When a cache becomes full, the policy determines which items will be removed to make room for new items.'

In [13]:
%%time

llm(question)

CPU times: user 627 µs, sys: 0 ns, total: 627 µs
Wall time: 634 µs


'\n\nA cache eviction policy is a strategy for managing the contents of a cache. When a cache becomes full, the policy determines which items will be removed to make room for new items.'

In [14]:
%%time

llm("What is cache eviction   policy?")

CPU times: user 10.7 ms, sys: 87 µs, total: 10.8 ms
Wall time: 966 ms


'\n\nThere are several cache eviction policies, but the two most common are least recently used (LRU) and first in, first out (FIFO).'

### Similar Match

In [15]:
from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")


langchain.llm_cache = GPTCache(init_gptcache)

In [16]:
%%time

llm(question)

CPU times: user 1.94 s, sys: 313 ms, total: 2.26 s
Wall time: 2.49 s


'\n\nA cache eviction policy is a set of rules that determine when and how often cached data is removed from the cache.'

In [17]:
%%time

llm(question)

CPU times: user 1.21 s, sys: 342 µs, total: 1.21 s
Wall time: 636 ms


'\n\nA cache eviction policy is a set of rules that determine when and how often cached data is removed from the cache.'

In [18]:
%%time

llm("What is cache eviction   policy?")

CPU times: user 1.14 s, sys: 0 ns, total: 1.14 s
Wall time: 1.01 s


'\n\nA cache eviction policy is a set of rules that determine when and how often cached data is removed from the cache.'

In [19]:
%%time

llm("Give me a peach")

CPU times: user 2.52 s, sys: 284 ms, total: 2.8 s
Wall time: 5.54 s


'\n\nA peach is a fruit that is typically round or oval in shape and has a soft, fuzzy outer skin. The flesh of a peach is usually yellow or white and is sweet and juicy.'

In [20]:
%%time

llm("Give me 2 peaches")

CPU times: user 1.18 s, sys: 22.6 ms, total: 1.2 s
Wall time: 645 ms


'\n\nA peach is a fruit that is typically round or oval in shape and has a soft, fuzzy outer skin. The flesh of a peach is usually yellow or white and is sweet and juicy.'