<a href="https://colab.research.google.com/github/mrodgers/demo-testing/blob/main/GPTCache_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prompt Cache Techniques, Part 3: GPTCache (July 2023)

Welcome to the PromptMule CacheCast series of Generative AI cache demos! This series explores caching techniques that make inference with large language models (LLMs) faster and more efficient. In Part 3 we look at GPTCache, an open-source caching library built specifically for LLM applications. When a prompt has already been answered, GPTCache returns the stored response instead of calling the model again, which can cut response times, reduce API usage, and make Generative AI apps feel more responsive. Let's see how much caching helps in practice.

# GPTCache Benchmarking Demo with SQLite

This repository contains a Python notebook for benchmarking GPTCache, using an SQLite database as the cache store. It is designed to run in Google Colab.

## Introduction

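The GPTCache Benchmarking Demo with SQLite measures how long LLM requests take with and without GPTCache. It times calls to OpenAI's chat API made through GPTCache's `openai` adapter, as well as calls to a LangChain OpenAI LLM (`llm_langchain`) that answers a fixed prompt. The demo shows how caching responses with GPTCache, including an SQLite-backed semantic cache, can substantially reduce response times for Generative AI apps: a cache hit is answered locally instead of waiting on the model. For semantic search use cases, the semantic cache can also answer prompts that are worded differently but ask the same thing.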
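
To make "semantic" concrete: an exact-match cache only hits when the prompt string is identical, while a semantic cache embeds each prompt and reuses a stored answer when a previous prompt's embedding is close enough. The sketch below is a toy illustration of that idea, not GPTCache's implementation. The `embed` function is a hypothetical bag-of-words stand-in for a real embedding model (such as the ONNX model used later in the notebook), and the 0.7 similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed cutoff: similarity at or above this counts as a hit
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, prompt):
        # Find the most similar stored prompt; return its answer only if it is similar enough.
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller would now query the LLM

    def put(self, prompt, answer):
        self.entries.append((embed(prompt), answer))


toy_cache = ToySemanticCache()
toy_cache.put("what is github", "GitHub is a code hosting platform.")
print(toy_cache.get("what is github?"))      # same words, so a hit
print(toy_cache.get("what is GitHub for"))   # similarity about 0.87, so a hit
print(toy_cache.get("how do I bake bread"))  # no shared words, so None
```

A bag-of-words vector only captures word overlap; real embedding models also match paraphrases that share few words, which is why the notebook uses an ONNX model for its semantic cache.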

To use GPTCache, we initialize the cache and make the OpenAI API key available to it. The demo walks through this initialization and then sends prompt questions through GPTCache to retrieve the responses.

**Note:** Before running the demo, make sure you have an OpenAI API key. Enter it in the `OPENAI_API_KEY` field of the first code cell, replacing the placeholder value.
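
If you would rather not paste the key into a notebook cell, you can keep it in an environment variable instead. This is a minimal sketch assuming the key lives in `OPENAI_API_KEY`, the same variable the notebook sets with `os.environ`; the helper name `load_openai_key` is hypothetical.

```python
import os


def load_openai_key(env_var="OPENAI_API_KEY"):
    # Hypothetical helper: read the key from the environment instead of hardcoding it.
    key = os.environ.get(env_var, "").strip()
    if not key or key.startswith("YOUR_"):
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key.")
    return key


try:
    key = load_openai_key()
    print("Loaded a key ending in ..." + key[-4:])  # never print the full key
except RuntimeError as err:
    print(err)
```

In Colab, one way to populate the variable without echoing the key is `os.environ["OPENAI_API_KEY"] = getpass.getpass()` (after `import getpass`).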

## Code Explanation

The provided Python script performs the following tasks:

1. Importing Required Libraries:
   The script starts by importing the necessary libraries, including `time` for time measurement.

2. `response_text` Function:
   The `response_text` function is defined to extract the response text from the OpenAI API response.

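3. Cache Initialization:
   The script imports GPTCache's `cache` object and its `openai` adapter (`gptcache.adapter.openai`), which is used in place of the `openai` module so that each request checks the cache before calling the API. The first example uses GPTCache's default exact-match cache; the second initializes GPTCache with SQLite to store cached answers, FAISS to search embeddings, and an ONNX embedding model.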

4. Prompt Question:
   The script defines a prompt question in the `question` variable. This question will be used to evaluate the GPTCache performance.

5. Benchmarking:
   The script sends the same prompt twice to simulate two requests and measures how long each one takes. It calls `openai.ChatCompletion.create` through the GPTCache adapter to query OpenAI's `gpt-3.5-turbo` model; the first request goes to the API, and the second should be served from the cache.

6. Output:
   For each prompt request, the script displays the prompt question itself, the time taken for execution, and the generated answer using GPTCache.

## Usage on Google Colab

1. Open Google Colab in your web browser: [https://colab.research.google.com/](https://colab.research.google.com/).

2. In Google Colab, click on "File" > "New Notebook" to create a new notebook.

3. In the code cell, paste the following to clone the repository and install the dependencies:

```python
!git clone https://github.com/mrodgers/demo-testing.git
%cd demo-testing
!pip install "openai<1.0"  # the demo uses the pre-1.0 openai.ChatCompletion API
!pip install langchain gptcache
```

4. Open `GPTCache_Demo.ipynb` (the "Open In Colab" badge at the top of this page opens it directly) and enter your OpenAI API key in the `OPENAI_API_KEY` field of the first code cell.

5. Run the cells in order, for example with "Runtime" > "Run all". The benchmarking demo will run, and the output shows the execution time of each request.
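
The last cell of the notebook times one uncached call and then averages several cached calls with `timeit`. You can try the same measurement pattern without an API key using this sketch, in which a hypothetical `slow_llm` function (a 0.3 s sleep) stands in for the LLM and `functools.lru_cache` stands in for GPTCache's exact-match cache.

```python
import functools
import time
import timeit


@functools.lru_cache(maxsize=None)
def slow_llm(prompt):
    # Hypothetical LLM stand-in: the sleep models network and inference latency.
    time.sleep(0.3)
    return f"Answer to: {prompt}"


text_to_predict = "Which is the best technical skill to learn in 2023?"


def instruction():
    # The function being timed, mirroring the notebook's instruction().
    return slow_llm(text_to_predict)


first_run = timeit.timeit(instruction, number=1)  # cache miss: pays the full latency
num_iterations = 3
cache_time = timeit.timeit(instruction, number=num_iterations) / num_iterations  # cache hits
delta = first_run - cache_time

print(f"Time taken for 1st execution: {first_run:.6f} seconds")
print(f"Time taken for cache-hit execution (average): {cache_time:.6f} seconds")
print(f"Time saved per cache hit: {delta:.6f} seconds")
```

Averaging the cached runs smooths out timer noise, while the uncached run is measured once because only the first call can be a miss.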

## Contributing

Contributions to this GPTCache benchmarking demo are welcome! If you find any issues, have suggestions, or want to extend the functionality, feel free to create a pull request.

## License

The GPTCache Benchmarking Demo with SQLite is open-source software licensed under the [MIT License](LICENSE).

---

In [None]:
# @title Input OpenAI API Key { run: "auto", vertical-output: true, display-mode: "both" }
#@markdown Enter your OpenAI API key here. If you don't have one, sign up on the OpenAI website and create a key at https://platform.openai.com/account/api-keys; the key authenticates your requests to the API.

OPENAI_API_KEY = "YOUR_OPENAI_KEY_HERE" #@param {type:"string"}
#@markdown ---


In [None]:
!pip install "openai<1.0"  # the demo uses the pre-1.0 openai.ChatCompletion API
!pip install langchain
!pip install gptcache

In [None]:
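import os
from langchain.llms import OpenAI

# Expose the key through the environment so LangChain and later cells can read it.
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Baseline: a direct, uncached completion call.
# Note: text-davinci-003 was current in July 2023 but has since been retired by OpenAI;
# substitute a current completions model (e.g. gpt-3.5-turbo-instruct) if this call fails.
llm_langchain = OpenAI(model_name="text-davinci-003")
text_to_predict = "Which is the best technical skill to learn in 2023?"
print(llm_langchain(text_to_predict))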

In [None]:
import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

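# Minimal GPTCache setup: with no arguments, cache.init() uses exact-match caching
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai  # used in place of the openai module; checks the cache first

cache.init()
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment
# -------------------------------------------------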

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time taken: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

In [None]:
import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

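print("Cache loading.....")

# Semantic cache: the ONNX model embeds each prompt, SQLite stores the cached
# questions and answers, and FAISS searches the embedding vectors for similar prompts.
onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    # Decides whether the nearest stored question is close enough to count as a hit.
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()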

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time taken: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

In [None]:
import hashlib

import langchain
from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache


def get_hashed_name(name):
    # Turn LangChain's LLM description string into a filesystem-safe directory name.
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    # Give each LLM configuration its own cache directory.
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,  # use the prompt text as the cache key
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),  # exact-match map cache
    )


# Route LangChain LLM calls through GPTCache.
langchain.llm_cache = GPTCache(init_gptcache)

In [None]:
import timeit

# Timed function: sends the test prompt through llm_langchain, which now goes
# through GPTCache because langchain.llm_cache was set in the previous cell.
def instruction():
    result = llm_langchain(text_to_predict)
    print(f"Response: {result.strip()}")

# First run: a cache miss (on the first run of this cell in a fresh session), so the
# request goes to OpenAI and the answer is stored. This should be the slowest run.
first_run = timeit.timeit(instruction, number=1)

# Later runs should be cache hits; average their timings.
num_iterations = 3
cache_time = timeit.timeit(instruction, number=num_iterations) / num_iterations
delta = first_run - cache_time

print("--- WTC Benchmark ---")
print(f"Time taken for 1st execution: {first_run:.6f} seconds")
print(f"Time taken for cache-hit execution (average): {cache_time:.6f} seconds")
print(f"Time saved per cache hit vs. the OpenAI call: {delta:.6f} seconds")

# Examples found and used in this demo

Thanks to the authors of these code examples:

[GPTCache OpenAI chat example](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/chat.html)

[OpenAI Chat API guide](https://platform.openai.com/docs/guides/chat/introduction)

Visit www.promptmule.com