<a href="https://colab.research.google.com/github/kkrueger/Redis-Workshops/blob/main/07-Semantic_Caching_Redis/07-Semantic_Caching_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Semantic Caching with Redis

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

RedisVL provides the `LLMCache` interface, which turns Redis and its vector search capability into a semantic cache for query results. Serving repeated or semantically similar prompts from the cache reduces the number of requests and tokens sent to the Large Language Model (LLM) service, which lowers cost and shortens response times.
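
To make the idea concrete, here is a minimal in-memory sketch of a semantic cache lookup. It is an illustration only, not the RedisVL implementation: the `ToySemanticCache` class, the hand-made 3-dimensional vectors, and the choice of cosine similarity are all assumptions made for this example. A real cache would embed prompts with a sentence-embedding model and search them in Redis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical in-memory stand-in for a semantic cache."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (prompt_embedding, response) pairs

    def store(self, embedding, response):
        self.entries.append((embedding, response))

    def check(self, embedding):
        # Return every response whose stored prompt is similar enough.
        return [
            response
            for vector, response in self.entries
            if cosine_similarity(vector, embedding) >= self.threshold
        ]


# Hand-made vectors standing in for real prompt embeddings (assumption).
france = [0.9, 0.1, 0.0]
france_paraphrase = [0.85, 0.15, 0.05]
spain = [0.1, 0.9, 0.0]

toy_cache = ToySemanticCache(threshold=0.9)
toy_cache.store(france, "Paris")

hit = toy_cache.check(france_paraphrase)  # close to the stored prompt
miss = toy_cache.check(spain)             # far from the stored prompt
print(hit, miss)
```

A paraphrase lands close to the stored vector and is served from the cache, while an unrelated prompt falls below the threshold and would be forwarded to the LLM.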

This notebook shows how to use `LLMCache` in your applications.

First, we will install the Python dependencies and import OpenAI to use its API for responding to prompts.


In [None]:
%pip -q install openai redisvl transformers sentence-transformers


Initialize OpenAI. You need to supply your OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [None]:
import openai
import os
import getpass
import redis

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY","")
if OPENAI_API_KEY == "":
    key=getpass.getpass(prompt='OpenAI Key: ', stream=None)
    os.environ['OPENAI_API_KEY']=key

openai.api_key = os.getenv("OPENAI_API_KEY")

### Install Redis Stack

Redis Search will be used as the vector similarity search engine for the semantic cache. Instead of using the in-notebook Redis Stack (https://redis.io/docs/getting-started/install-stack/), you can provision your own free instance of Redis in the cloud. Get your own free Redis Cloud instance at https://redis.com/try-free/

In [None]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

### Connect to Redis

By default, this notebook connects to the local instance of Redis Stack. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [None]:
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST=""
#REDIS_PORT=
#REDIS_PASSWORD=""

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
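
If your Redis Cloud password contains characters such as `@`, `/` or `:`, interpolating it directly into the URL can produce a string that does not parse as intended. The sketch below percent-encodes the password first. The `build_redis_url` helper and the sample password are hypothetical names introduced for this example, and it assumes the client decodes percent-escapes when it parses the URL.

```python
from urllib.parse import quote


def build_redis_url(host, port, password=""):
    """Build a redis:// URL, percent-encoding the password (hypothetical helper)."""
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}"


# Sample values for illustration only.
example_url = build_redis_url("localhost", "6379", "p@ss/word")
no_password_url = build_redis_url("localhost", "6379")
print(example_url)
print(no_password_url)
```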

In [None]:
# Helper method for submitting a prompt to OpenAI
def ask_openai(question):
    response = openai.Completion.create(
      engine="text-davinci-003",
      prompt=question,
      max_tokens=200
    )
    return response.choices[0].text.strip()

In [None]:
# Test it
print(ask_openai("What is the capital of France?"))

## Initializing and using `LLMCache`

`LLMCache` (used below through the `SemanticCache` class) automatically creates an index within Redis for the semantic cache when it is initialized. The same `SearchIndex` class used in the previous tutorials performs the index creation and manipulation here.

In [None]:
from redisvl.llmcache.semantic import SemanticCache
cache = SemanticCache(
    redis_url=REDIS_URL,
    threshold=0.9, # semantic similarity threshold
    )

In [None]:
# Look at the index specification created for the semantic cache lookup
!rvl index info -i cache

#### Check the cache

In [None]:
# Check the cache
cache.check("What is the capital of France?")

In [None]:
# Store the question and answer
cache.store("What is the capital of France?", "Paris")

In [None]:
# Check the cache again
cache.check("What is the capital of France?")

In [None]:
# Check for a semantically similar result
cache.check("What really is the capital of France?")

In [None]:
# Decrease the semantic similarity threshold
cache.set_threshold(0.7)
cache.check("What really is the capital of France?")

In [None]:
# Adversarial example (not semantically similar enough)
cache.check("What is the capital of Spain?")

In [None]:
cache.clear()

## Performance

Next, we will measure the speedup obtained by using `LLMCache`. We will use the `time` module to measure the time taken to generate responses with and without `LLMCache`.

In [None]:
def answer_question(question: str):
    results = cache.check(question)
    if results:
        return results[0]
    else:
        answer = ask_openai(question)
        cache.store(question, answer)
        return answer
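
The helper above follows the cache-aside pattern: check the cache first, and call the LLM and store the result only on a miss. The sketch below shows the same control flow with stand-ins so it can run without Redis or an API key. `DictCache`, `make_cached_answerer` and `fake_llm` are hypothetical names for this example, and the stand-in cache matches prompts exactly rather than semantically.

```python
class DictCache:
    """Exact-match stand-in with the same check/store shape used above."""

    def __init__(self):
        self._data = {}

    def check(self, prompt):
        # Mirror the list-of-results return shape: empty list on a miss.
        return [self._data[prompt]] if prompt in self._data else []

    def store(self, prompt, response):
        self._data[prompt] = response


def make_cached_answerer(cache, llm):
    """Wrap an LLM callable so repeated prompts are served from the cache."""

    def answer_with_cache(question):
        results = cache.check(question)
        if results:
            return results[0]
        response = llm(question)
        cache.store(question, response)
        return response

    return answer_with_cache


llm_calls = []


def fake_llm(question):
    # Records each call so we can see how often the "LLM" is really hit.
    llm_calls.append(question)
    return "Paris"


cached_answerer = make_cached_answerer(DictCache(), fake_llm)
first = cached_answerer("What is the capital of France?")
second = cached_answerer("What is the capital of France?")
print(first, second, len(llm_calls))
```

The second call returns the stored answer without invoking the LLM again, which is where the savings measured in the next cells come from.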

In [None]:
import time
start = time.time()
answer = answer_question("What is the capital of France?")
end = time.time()
print(f"Time taken without cache: {end - start}")

In [None]:
cached_start = time.time()
cached_answer = answer_question("What is the capital of France?")
cached_end = time.time()
print(f"Time taken with cache: {cached_end - cached_start}")
print(f"Percentage of time saved: {round(((end - start) - (cached_end - cached_start)) / (end - start) * 100, 2)}%")
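
The timing logic in the two cells above can also be factored into small helpers, which avoids mixing up start and end variables. This is an optional sketch: `timed` and `percent_saved` are hypothetical helper names, and `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring short intervals.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


def percent_saved(uncached_seconds, cached_seconds):
    """Percentage of time saved by the cached call relative to the uncached one."""
    return round((uncached_seconds - cached_seconds) / uncached_seconds * 100, 2)


# Illustrative numbers only, not measurements from this notebook.
example_saving = percent_saved(2.0, 0.5)

# timed() works with any callable; sum() is used here as a stand-in.
total, elapsed = timed(sum, [1, 2, 3])
print(example_saving, total)
```

In the notebook, the same helper could wrap the question-answering function, for example `timed(answer_question, "What is the capital of France?")`.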

In [None]:
# check the stats of the index
!rvl stats -i cache

In [None]:
# remove the index and all cached items
cache.index.delete()