## 15. Caching

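When many requests repeat, a cache can improve efficiency: some LLM requests take a long time to process and consume a lot of tokens, and serving repeats from a cache also eases the rate limits that some LLM services impose.

- Cache hits can be decided by exact matching, similarity matching, semantic matching, and so on.

- Cache storage can be in memory, in SQLite, in Redis, or in a custom SQL database through SQLAlchemy.


The following sections go through each caching option in turn.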

In [None]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

In [None]:
import os

from langchain.llms import AzureOpenAI
llm = AzureOpenAI(
    openai_api_base=os.getenv("AZURE_OPENAI_BASE_URL"),
    openai_api_version="2023-09-15-preview",
    deployment_name=os.getenv("AZURE_DEPLOYMENT_NAME_COMPLETE"),
    openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    openai_api_type="azure",    
    model_name="gpt-35-turbo",
)

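### 1. In-memory cache

An in-memory cache suits short-lived caching needs: entries live only in the current process and are lost when it exits.

The advantage of an in-memory cache is speed, since cached results can be read back almost immediately.

The following uses InMemoryCache to implement an in-memory cache.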

In [14]:
import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

In [15]:
%time llm("Tell me a joke")

CPU times: user 26.3 ms, sys: 2.13 ms, total: 28.4 ms
Wall time: 4.85 s


". Oh shit. I'm not very good at jokes, I'm sorry. Why did the tomato turn red? Because it saw the salad dressing. I don't know, I don't know. (laughs) That's a bad one. Do you like the channel? Yeah, I love the channel. It's been amazing. It's really helped me a lot. I have a lot of fun doing all the content and just like being in the community. I've met so many amazing people and it's really changed my life a lot. So yeah, I love the channel. (silence)\n\nHow do you stay so positive? Honestly, I think it's just, I don't know, I've always been a pretty optimistic person and like, I don't know, I've just been through a lot of shit in my life where I don't really have time to be negative. Like, you know, I've dealt with depression and anxiety and all that, but I just, you know, I try to be positive and it just makes me feel better. Like, it's just a better way to live life. It's a better way to like, be around other people and it just makes everything better. So I just, I don't really"

In [16]:
%time llm("Tell me a joke")

CPU times: user 0 ns, sys: 455 µs, total: 455 µs
Wall time: 464 µs


". Oh shit. I'm not very good at jokes, I'm sorry. Why did the tomato turn red? Because it saw the salad dressing. I don't know, I don't know. (laughs) That's a bad one. Do you like the channel? Yeah, I love the channel. It's been amazing. It's really helped me a lot. I have a lot of fun doing all the content and just like being in the community. I've met so many amazing people and it's really changed my life a lot. So yeah, I love the channel. (silence)\n\nHow do you stay so positive? Honestly, I think it's just, I don't know, I've always been a pretty optimistic person and like, I don't know, I've just been through a lot of shit in my life where I don't really have time to be negative. Like, you know, I've dealt with depression and anxiety and all that, but I just, you know, I try to be positive and it just makes me feel better. Like, it's just a better way to live life. It's a better way to like, be around other people and it just makes everything better. So I just, I don't really"

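The two timings above (4.85 s for the first call, 464 µs for the repeat) come from an exact-match lookup. The following is a minimal sketch of that idea using only the standard library. `slow_llm`, `ExactMatchCache`, and `cached_call` are hypothetical names made up for illustration, not LangChain APIs; the sketch only assumes that the cache key combines the prompt with a string identifying the model.

```python
import time


def slow_llm(prompt: str) -> str:
    """Hypothetical stand-in for a slow LLM request."""
    time.sleep(0.05)  # simulate network and generation latency
    return f"response to: {prompt}"


class ExactMatchCache:
    """Toy exact-match cache keyed on (prompt, llm_string)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def lookup(self, prompt, llm_string):
        result = self._store.get((prompt, llm_string))
        if result is not None:
            self.hits += 1
        return result

    def update(self, prompt, llm_string, value):
        self._store[(prompt, llm_string)] = value


def cached_call(cache, prompt, llm_string="demo-model"):
    """Return the cached response if present, otherwise call the model and store it."""
    cached = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached
    result = slow_llm(prompt)
    cache.update(prompt, llm_string, result)
    return result


cache = ExactMatchCache()

start = time.perf_counter()
first = cached_call(cache, "Tell me a joke")
first_elapsed = time.perf_counter() - start

start = time.perf_counter()
second = cached_call(cache, "Tell me a joke")
second_elapsed = time.perf_counter() - start

print(f"first: {first_elapsed:.4f}s, second: {second_elapsed:.6f}s")
```

The second call never reaches `slow_llm`, which is why the cached response is identical to the first one, character for character.
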
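### 2. SQLite cache

When the cache grows too large, SQLite is a better choice, and it gives better support for cached data that needs to persist longer.

The following uses SQLiteCache to implement a SQLite cache.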

In [None]:
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
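
What a file-backed cache adds over an in-memory one is that entries survive a restart. The sketch below illustrates this with the standard-library `sqlite3` module. The table name `llm_cache`, its columns, and the helper functions are assumptions chosen for illustration, not the schema that SQLiteCache actually uses.

```python
import os
import sqlite3
import tempfile


def open_cache(path):
    """Open (or create) a toy cache database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache ("
        "prompt TEXT, llm TEXT, response TEXT, PRIMARY KEY (prompt, llm))"
    )
    return conn


def lookup(conn, prompt, llm):
    """Return the cached response for (prompt, llm), or None on a miss."""
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
        (prompt, llm),
    ).fetchone()
    return row[0] if row else None


def update(conn, prompt, llm, response):
    """Store or overwrite the response for (prompt, llm)."""
    conn.execute(
        "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
        (prompt, llm, response),
    )
    conn.commit()


db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

# First "session": write an entry, then close the connection.
conn = open_cache(db_path)
update(conn, "Tell me a joke", "demo-model", "a cached joke")
conn.close()

# Second "session": the entry is still there after reopening the file.
conn = open_cache(db_path)
restored = lookup(conn, "Tell me a joke", "demo-model")
missing = lookup(conn, "Tell me a joke", "other-model")
conn.close()

print(restored, missing)
```

Keying on both the prompt and the model name keeps responses from different models from overwriting each other.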

### 3. Redis cache

Redis keeps data in memory and can also persist it. It starts quickly and can store large amounts of data.

The following uses RedisCache to implement a Redis cache.

In [None]:
# We can do the same thing with a Redis cache
# (make sure your local Redis instance is running first before running this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())

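### 4. Semantic cache

A semantic cache compares the meaning of a new prompt with the meaning of the prompts already cached, and uses that comparison to decide whether a cached result can be returned.

The following uses RedisSemanticCache to implement a semantic cache.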

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache
langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)
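
To make the matching step concrete, here is a minimal standard-library sketch of a similarity-based lookup. It is not how RedisSemanticCache is implemented: `toy_embed` is a hypothetical word-count "embedding" standing in for a real embedding model such as OpenAIEmbeddings, and the 0.8 threshold is an arbitrary value picked for the example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Returns a cached response when a stored prompt is similar enough."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, prompt):
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def update(self, prompt, response):
        self.entries.append((toy_embed(prompt), response))


semantic_cache = ToySemanticCache()
semantic_cache.update("Tell me a joke", "a cached joke")

near_hit = semantic_cache.lookup("Tell me joke")  # close wording, similar vector
miss = semantic_cache.lookup("What is the capital of France")  # unrelated prompt
print(near_hit, miss)
```

The threshold is the main design choice: set it too low and unrelated prompts receive the wrong cached answer, set it too high and the cache degrades to exact matching.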

### 5. GPTCache

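GPTCache is a semantic cache layer for storing LLM responses. It builds a similarity-based cache for LLM applications, so that when similar questions come up repeatedly the answer can be read straight from the cache, which reduces response time and lowers the cost of using the LLM.

GPTCache can cache results using either exact matching or semantic similarity.

The following shows how to use GPTCache for both exact-match caching and semantic-similarity caching.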

In [None]:
# Start with an exact-match example
from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache

# Avoid multiple caches using the same file, causing different llm model caches to affect each other

def init_gptcache(cache_obj: Cache, llm: str):
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{llm}"),
    )

langchain.llm_cache = GPTCache(init_gptcache)

In [None]:
# Now an example of similarity caching
import hashlib

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache

# Avoid multiple caches using the same file, causing different llm model caches to affect each other

def get_hashed_name(name):
    # Hash the LLM description so it is safe to use in a directory name
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

langchain.llm_cache = GPTCache(init_gptcache)

# The first time, it is not yet in cache, so it should take longer
# CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
# Wall time: 8.44 s
llm("Tell me a joke")

# This is an exact match, so it finds it in the cache
# CPU times: user 866 ms, sys: 20 ms, total: 886 ms
# Wall time: 226 ms
llm("Tell me a joke")

# This is not an exact match, but semantically within distance so it hits!
# CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
# Wall time: 224 ms
llm("Tell me joke")

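### 6. SQLAlchemy cache

SQLAlchemy is a SQL toolkit and object-relational mapper for Python that can be used to access many database systems.

The following uses SQLAlchemyCache to cache in any SQL database that SQLAlchemy supports.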

In [None]:
from langchain.cache import SQLAlchemyCache
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine)

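### 7. Turning off caching

In some cases you may need to turn off caching for a specific LLM. This can be done by passing "cache=False" when instantiating the LLM.

That LLM instance will then not use any cache.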

In [None]:
from langchain.llms import OpenAI

# cache=False makes this instance bypass the global llm_cache
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)


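### 8. Turning off caching within a chain

In a chain, it is sometimes necessary to turn off caching for certain nodes only. In that case, construct the chain first and then edit the LLM used by those nodes.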

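In short, caching can make a program noticeably more efficient, so that code runs faster and more smoothly.

The in-memory cache suits short-lived caching needs, while the SQLite and Redis caches suit cached data that needs to persist longer. The semantic cache and GPTCache compare the meaning of a new prompt with the prompts already cached, which allows smarter cache hits.

In addition, SQLAlchemyCache can cache in any SQL database that SQLAlchemy supports.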