<a href="https://colab.research.google.com/github/antonum/Redis-Workshops/blob/main/05-LangChain_Redis/05.1x_LangChain_RedisCachedEmbeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Redis CacheBackedEmbeddings

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook shows how to use Redis as a cache for embeddings, so that text that has already been embedded is not sent to the embedding model again.

### Install Dependencies


In [20]:
!pip install -q langchain_openai langchain_community langchain redis unstructured faiss-cpu



## Initialize OpenAI

You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [2]:
import openai
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")

OPENAI_API_KEY··········


### Install Redis Stack

Redis will be used as the cache store for embeddings. Instead of the in-notebook Redis Stack (https://redis.io/docs/getting-started/install-stack/), you can provision your own free Redis instance in the cloud at https://redis.com/try-free/

In [3]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


### Connect to Redis

By default, this notebook connects to the local Redis Stack instance. If you have your own Redis Cloud instance, replace the REDIS_HOST, REDIS_PORT and REDIS_PASSWORD values with your own.

In [4]:
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
# If SSL is enabled on the endpoint add --tls
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
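
The URL above always uses the `redis://` scheme and inserts the password verbatim. As a rough sketch of how the same string could be assembled for TLS endpoints or for passwords containing reserved characters, the helper below switches the scheme and percent-encodes the password. `build_redis_url` is a hypothetical name introduced here for illustration; it is not part of redis-py or LangChain.

```python
from urllib.parse import quote


def build_redis_url(host: str, port, password: str = "", tls: bool = False) -> str:
    """Assemble a Redis connection URL (hypothetical helper, for illustration only)."""
    scheme = "rediss" if tls else "redis"
    # Percent-encode the password so characters such as '@' or ':' do not break the URL.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", "6379"))
# redis://localhost:6379
print(build_redis_url("example.host", 18374, password="p@ss:word", tls=True))
# rediss://:p%40ss%3Aword@example.host:18374
```

With no password the helper omits the `:@` part entirely, which keeps the local URL a little cleaner than the f-string above.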

In [5]:
#test Redis connection
!redis-cli $REDIS_CONN PING

PONG


In [9]:
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
#from langchain_community.vectorstores import FAISS
from langchain.vectorstores.redis import Redis
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import RedisStore

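# Embedding model whose results will be cached
underlying_embeddings = OpenAIEmbeddings()

# Redis-backed byte store that holds the cached vectors
store = RedisStore(redis_url=REDIS_URL)

# Alternative: cache on local disk (requires `from langchain.storage import LocalFileStore`)
#store = LocalFileStore("./cache/")

# Prefix cache keys with the model name so different models do not share entries
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)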
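
The idea behind a cache-backed embedder can be sketched in plain Python: look up each text under a namespaced hash, call the underlying model only for the misses, and write the new vectors back to the store. Everything below is a stand-in chosen for illustration and is not LangChain's implementation: `TinyCachedEmbedder`, the dict used as a store, the SHA-1 hex key, and `fake_embed` are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List


class TinyCachedEmbedder:
    """Illustrative cache in front of an embedder; not LangChain's actual code."""

    def __init__(self, embed_fn: Callable[[List[str]], List[List[float]]],
                 store: Dict[str, bytes], namespace: str = "") -> None:
        self.embed_fn = embed_fn
        self.store = store
        self.namespace = namespace

    def _key(self, text: str) -> str:
        # Namespace + content hash, so different models never share entries.
        return self.namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        keys = [self._key(t) for t in texts]
        # Embed only texts whose keys are missing (deduplicated, order preserved).
        missing = list(dict.fromkeys(
            t for t, k in zip(texts, keys) if k not in self.store
        ))
        if missing:
            for text, vector in zip(missing, self.embed_fn(missing)):
                self.store[self._key(text)] = json.dumps(vector).encode("utf-8")
        return [json.loads(self.store[k]) for k in keys]


calls = []


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in for a paid embedding API; records each batch it receives."""
    calls.append(list(texts))
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


embedder = TinyCachedEmbedder(fake_embed, store={}, namespace="demo-model")
first = embedder.embed_documents(["hello", "world"])
second = embedder.embed_documents(["hello", "redis"])
print(calls)  # [['hello', 'world'], ['redis']]
```

The second call reaches the fake model only for `"redis"`, because `"hello"` is already in the store. That is the saving the Redis-backed cache is meant to provide when the same text is embedded repeatedly.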

In [31]:
list(store.yield_keys())

['k1', 'text-embedding-ada-00255c1930e-bc0b-550d-a284-030f8cbfd05a', 'k2']

In [30]:
store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))

[b'v1', b'v2']
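
To experiment with the same three calls without a running Redis server, a dict-backed stand-in is enough. The class below is a hypothetical sketch that mirrors the `mset`, `mget` and `yield_keys` calls used above; it assumes that a lookup for a missing key yields `None` in the corresponding position.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical in-memory stand-in for a key/bytes store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        # Write several key/value pairs in one call.
        for key, value in pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # One result per requested key, in order; None marks a miss.
        return [self._data.get(key) for key in keys]

    def yield_keys(self, prefix: str = "") -> Iterator[str]:
        # Lazily iterate over stored keys that start with the given prefix.
        for key in self._data:
            if key.startswith(prefix):
                yield key


demo = DictByteStore()
demo.mset([("k1", b"v1"), ("k2", b"v2")])
print(demo.mget(["k1", "missing", "k2"]))  # [b'v1', None, b'v2']
print(list(demo.yield_keys(prefix="k")))   # ['k1', 'k2']
```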


In [14]:
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader

In [26]:
# Add your own URLs here
urls = [
    "https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt"
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20, add_start_index = True)
texts = text_splitter.split_documents(documents)
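
The `chunk_size=100` and `chunk_overlap=20` settings mean that each chunk holds at most 100 characters and may share up to 20 characters with its neighbour. `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so the fixed-window function below is a deliberately simplified sketch of the size and overlap arithmetic only, not of the splitter's algorithm. `sliding_chunks` is a hypothetical name.

```python
from typing import List, Tuple


def sliding_chunks(text: str, chunk_size: int = 100,
                   chunk_overlap: int = 20) -> List[Tuple[int, str]]:
    """Return (start_index, chunk) pairs using a fixed-size sliding window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(250))
for start, chunk in sliding_chunks(sample):
    print(start, len(chunk))
# 0 100
# 80 100
# 160 90
```

The start offsets play the same role as the metadata requested with `add_start_index = True`: they record where each chunk begins in the source text.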

In [28]:
%%time
# Note: this embeds the unsplit `documents`; pass `texts` instead to embed the chunks
db = FAISS.from_documents(documents, cached_embedder)

CPU times: user 3.1 ms, sys: 0 ns, total: 3.1 ms
Wall time: 4.35 ms


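Finally, let's look under the covers in Redis. Each cached embedding is stored as a string value under a key made of the namespace (here the model name `text-embedding-ada-002`) followed by a hash of the embedded text. The keys `k1` and `k2` are the test entries written earlier with `store.mset`.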
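
The embedding key listed by `store.yield_keys()` can be taken apart to see this naming scheme. The snippet assumes that the last 36 characters of the key are a canonical UUID string and that everything before them is the namespace; that layout is inferred from the key shown in this notebook rather than from a documented guarantee.

```python
import uuid

key = "text-embedding-ada-00255c1930e-bc0b-550d-a284-030f8cbfd05a"

# Assumption: the trailing 36 characters form a UUID and the rest is the namespace.
namespace, digest = key[:-36], uuid.UUID(key[-36:])

print(namespace)       # text-embedding-ada-002
print(digest.version)  # 5
```

A version 5 UUID is derived from a SHA-1 hash of a name, which fits a key computed from the text itself: the same input maps to the same key on every run.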

In [29]:
!redis-cli $REDIS_CONN keys "*"
!redis-cli $REDIS_CONN get "text-embedding-ada-00255c1930e-bc0b-550d-a284-030f8cbfd05a"

1) "k1"
2) "text-embedding-ada-00255c1930e-bc0b-550d-a284-030f8cbfd05a"
3) "k2"
"[0.0014242342110415802, -0.016069661453560598, -0.020261747293095333, -0.009371729350503628, 0.0024739351947343463, 0.013254782809167174, 0.015505342385031413, -0.00506207754282503, -0.02023487491215001, -0.002974432358518394, 0.014524501411849756, 0.014766352840357643, -0.023620789323325123, -0.00019891831737587455, 0.017117682447783007, 0.0065266204416999105, 0.02910274758559015, -0.01688926814107033, 0.02472255507943817, -0.02157849209677095, -0.0007574641901583594, -0.016109969093656026, 0.022183119736718115, 0.018716586320061936, -0.006899474028824322, 0.005354314452774756, 0.03469219350899136, -0.019657119653148168, 0.02813534373420371, -0.01209255466158843, -0.010688474154179246, -0.03125253619857071, -0.02867278949046502, -0.012005218957854861, -0.027020140856295446, -0.018447864373253835, -0.016083096712710705, -0.011138586069352092, 0.017399841516386322, -0.029693939966387212, 0.01000322937139611