<a href="https://colab.research.google.com/github/antonum/Redis-Workshops/blob/main/05-LangChain_Redis/05.1x_LangChain_RedisCachedEmbeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Redis CacheBackedEmbeddings

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

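This notebook shows how to use Redis as the store behind LangChain's `CacheBackedEmbeddings`. Each text chunk is then embedded by the OpenAI API only once, and later runs read the stored vector from Redis.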

### Install Dependencies


In [1]:
!pip install -q langchain_openai langchain_community langchain redis unstructured faiss-cpu


[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m15.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m817.0/817.0 kB[0m [31m24.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m250.3/250.3 kB[0m [31m18.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m22.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.6/17.6 MB[0m [31m55.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m246.4/246.4 kB[0m [31m11.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m226.7/226.7 kB[0m [31m25.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m67.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━

### Initialize OpenAI

You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [2]:
import openai
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")

OPENAI_API_KEY··········


### Install Redis Stack

Redis will be used as the embedding cache store for LangChain. The cell below installs Redis Stack (https://redis.io/docs/getting-started/install-stack/) inside the notebook. Alternatively, you can provision your own free Redis Cloud instance at https://redis.com/try-free/

In [3]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install -y redis-stack-server > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


### Connect to Redis

By default, this notebook connects to the local Redis Stack instance. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [4]:
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
# If SSL is enabled on the endpoint add --tls
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
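If the password contains characters such as `@`, `:` or `/`, the plain f-string above can produce a URL that gets split in the wrong place. Below is a minimal sketch of a safer builder that percent-encodes the password and switches to `rediss://` when TLS is enabled. `build_redis_url` and its `tls` flag are hypothetical helpers and are not part of redis-py or LangChain.

```python
from urllib.parse import quote, unquote, urlsplit


def build_redis_url(host: str, port: str, password: str = "", tls: bool = False) -> str:
    """Hypothetical helper: build a Redis URL with a percent-encoded password."""
    scheme = "rediss" if tls else "redis"  # rediss:// means Redis over TLS
    if password:
        # safe="" also encodes "/" so the password cannot break the URL structure
        return f"{scheme}://:{quote(password, safe='')}@{host}:{port}"
    return f"{scheme}://{host}:{port}"


url = build_redis_url("localhost", "6379", "p@ss:w/rd")
print(url)  # redis://:p%40ss%3Aw%2Frd@localhost:6379

# Parsing the URL back recovers host, port and the original password
parts = urlsplit(url)
print(parts.hostname, parts.port, unquote(parts.password))  # localhost 6379 p@ss:w/rd
```

redis-py percent-decodes the password when it parses a URL, so a string built this way should be usable as `redis_url` unchanged.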

In [5]:
#test Redis connection
!redis-cli $REDIS_CONN PING

PONG


In [6]:
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain_community.storage import RedisStore

# The real embedding model; it is only called for texts that are not cached yet
underlying_embeddings = OpenAIEmbeddings()

# Redis-backed byte store that holds the serialized vectors
store = RedisStore(redis_url=REDIS_URL)

# Alternative: cache on local disk instead of Redis
#from langchain.storage import LocalFileStore
#store = LocalFileStore("./cache/")

# Prefix every key with the model name so vectors from different models never mix
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model + ":"
)
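The keys written by `cached_embedder` take the form `<namespace><uuid>`. The same chunk text always maps to the same key, which is what turns a repeated run into cache hits. Below is a minimal sketch of such a deterministic key. It assumes a SHA-1 digest of the text is fed into a version-5 UUID, and the UUIDs in the Redis key listing further down are indeed version 5. `cache_key` and `KEY_NAMESPACE_UUID` are hypothetical, so the keys below will not match the ones LangChain actually writes.

```python
import hashlib
import uuid

# Hypothetical constant; any fixed UUID works, but it must never change,
# otherwise previously cached keys can no longer be found
KEY_NAMESPACE_UUID = uuid.UUID(int=2024)


def cache_key(text: str, namespace: str) -> str:
    """Sketch: map a chunk of text to a deterministic cache key."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return namespace + str(uuid.uuid5(KEY_NAMESPACE_UUID, digest))


key = cache_key("hello world", "text-embedding-ada-002:")
print(key == cache_key("hello world", "text-embedding-ada-002:"))  # True
print(key.startswith("text-embedding-ada-002:"))  # True
```

Because the model name is part of the namespace, switching to another embedding model yields different keys. Vectors computed by the old model are therefore never returned for the new one.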

In [7]:
list(store.yield_keys())

[]

In [8]:
#store.mset([("k1", b"v1"), ("k2", b"v2")])
#print(store.mget(["k1", "k2"]))
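`RedisStore` implements LangChain's key/value byte-store interface. `mset` writes `(key, bytes)` pairs, and `mget` returns the values in request order, with `None` for missing keys. `mdelete` removes keys, and `yield_keys` iterates over the stored keys, optionally filtered by a prefix. You can try the commented-out `mset`/`mget` calls without touching Redis by using an in-memory stand-in like the minimal sketch below. `DictByteStore` is hypothetical and is not a LangChain class.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical in-memory stand-in with RedisStore-like method names."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys come back as None, in the order they were requested
        return [self._data.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, *, prefix: Optional[str] = None) -> Iterator[str]:
        # Simple prefix filter; a Redis-backed store would typically iterate with SCAN
        for key in list(self._data):
            if prefix is None or key.startswith(prefix):
                yield key


demo_store = DictByteStore()
demo_store.mset([("k1", b"v1"), ("k2", b"v2")])
print(demo_store.mget(["k1", "k2", "missing"]))  # [b'v1', b'v2', None]
print(list(demo_store.yield_keys(prefix="k")))  # ['k1', 'k2']
```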

In [9]:
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredURLLoader

In [10]:
# Add your own URLs here
urls = [
    "https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt"
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20, add_start_index=True)
texts = text_splitter.split_documents(documents)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
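With `chunk_size=100` and `chunk_overlap=20`, the splitter aims to keep each chunk within 100 characters and lets neighbouring chunks share up to 20 characters. `add_start_index=True` records each chunk's character offset in `metadata["start_index"]`. `RecursiveCharacterTextSplitter` prefers to break on paragraph, line and word boundaries. The simplified fixed-window splitter below (a hypothetical `window_chunks` helper) therefore only illustrates how size, overlap and start offsets relate. It will not reproduce LangChain's exact chunks.

```python
from typing import List, Tuple


def window_chunks(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[Tuple[int, str]]:
    """Hypothetical fixed-window splitter returning (start_index, chunk) pairs."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 25  # 250 characters
for start, chunk in window_chunks(sample):
    print(start, len(chunk))
# 0 100
# 80 100
# 160 90
```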


In [22]:
%%time
db = FAISS.from_documents(texts, cached_embedder)

CPU times: user 357 ms, sys: 20.2 ms, total: 377 ms
Wall time: 390 ms


In [17]:
%%time
db = FAISS.from_documents(texts, cached_embedder)

CPU times: user 361 ms, sys: 16.7 ms, total: 378 ms
Wall time: 386 ms
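The execution counters (`In [22]` shown before `In [17]`) suggest these two cells were run out of order. Both timings therefore likely reflect an already-populated cache. On an empty cache, the first run has to call the OpenAI API for every chunk. The speed-up comes from the cache-aside pattern: look up the key for every text, send only the misses to the underlying model, and write the new vectors back. Below is a self-contained sketch with a fake embedder that counts how many texts it embeds. `FakeEmbedder` and `embed_with_cache` are hypothetical and simplified, not LangChain's code.

```python
import hashlib
import json
from typing import Dict, List


class FakeEmbedder:
    """Hypothetical stand-in for a paid embedding API that counts embedded texts."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += len(texts)
        # Toy 2-d "embedding": text length and number of letter "e"s
        return [[float(len(t)), float(t.count("e"))] for t in texts]


def embed_with_cache(
    texts: List[str], embedder: FakeEmbedder, store: Dict[str, bytes], namespace: str
) -> List[List[float]]:
    """Cache-aside: read cached vectors, embed only the misses, write them back."""
    keys = [namespace + hashlib.sha1(t.encode("utf-8")).hexdigest() for t in texts]
    cached = [store.get(k) for k in keys]
    misses = [i for i, value in enumerate(cached) if value is None]
    if misses:
        vectors = embedder.embed_documents([texts[i] for i in misses])
        for i, vec in zip(misses, vectors):
            store[keys[i]] = json.dumps(vec).encode("utf-8")
            cached[i] = store[keys[i]]
    return [json.loads(value) for value in cached]


embedder, cache = FakeEmbedder(), {}
embed_with_cache(["first chunk", "second chunk"], embedder, cache, "fake-model:")
print(embedder.calls)  # 2: cold cache, every text is embedded
embed_with_cache(["first chunk", "second chunk", "third chunk"], embedder, cache, "fake-model:")
print(embedder.calls)  # 3: only the new text was a cache miss
```

The sketch ignores batching and duplicate texts within a single call, both of which a real cache has to handle.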


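Finally, let's look under the Redis covers. Each cached embedding is stored as a separate Redis string. The key is the namespace followed by a UUID derived from a hash of the chunk text (for example `text-embedding-ada-002:f6302684-d123-5bcc-bd21-138c55bfe1d0`), and the value is the serialized embedding vector.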
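To estimate what a single cached value costs, the sketch below serializes a 1536-dimensional vector (the output size of `text-embedding-ada-002`) to JSON bytes. It then compares that size with a raw float32 encoding. The sketch assumes the cache stores vectors as JSON-encoded float lists, which may differ between LangChain versions. The vector values are random placeholders, not real embeddings.

```python
import json
import random

DIM = 1536  # output dimension of text-embedding-ada-002
random.seed(0)
vector = [random.uniform(-0.1, 0.1) for _ in range(DIM)]  # placeholder values

payload = json.dumps(vector).encode("utf-8")  # bytes stored under one key
restored = json.loads(payload)

print(restored == vector)  # True: JSON round-trips Python floats exactly
print(len(payload), "bytes as JSON vs", DIM * 4, "bytes as raw float32")
```

Multiplying the per-key size by the number of chunks gives a rough memory estimate for the whole cache.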

In [20]:
#db.similarity_search("Donald Trump")

In [23]:
# KEYS walks the whole keyspace; fine for a demo, prefer SCAN on production databases
!redis-cli $REDIS_CONN keys "*"
!redis-cli $REDIS_CONN get "text-embedding-ada-002:55c1930e-bc0b-550d-a284-030f8cbfd05a"

  1) "text-embedding-ada-002:f6302684-d123-5bcc-bd21-138c55bfe1d0"
  2) "text-embedding-ada-002:57f96d07-6181-500c-b508-2b2e544c4ecf"
  3) "text-embedding-ada-002:4749907c-fdb4-5491-83bf-e42dae39bd81"
  4) "text-embedding-ada-002:6b0af38f-609f-5710-beb1-54f2223548fb"
  5) "text-embedding-ada-002:4a95777c-6bcf-52e0-9d32-19b09b56ca27"
  6) "text-embedding-ada-002:ee3972d6-2aca-5949-a6f8-5a52834804a2"
  7) "text-embedding-ada-002:606b8094-4553-53b0-aee9-d91819a7eec6"
  8) "text-embedding-ada-002:7375d2b7-2b25-5e27-883c-11bc88c8c890"
  9) "text-embedding-ada-002:20c8623e-38b7-5179-abaa-f5d3f70ccfdd"
 10) "text-embedding-ada-002:d58f699f-6437-52bc-bdf1-816e8f2be77b"
 11) "text-embedding-ada-002:2adbac23-90c6-5efe-8e93-29f8bd542877"
 12) "text-embedding-ada-002:6eb015f0-9f09-5b8f-8418-ad67a9b8ea9d"
 13) "text-embedding-ada-002:56b01e4b-f04f-5fe1-8a0a-568d65cb5e12"
 14) "text-embedding-ada-002:71792bb7-993d-542f-947e-656040f2c607"
 15) "text-embedding-ada-002:164a2401-4a95-5cdc-ab27-8ff22143a