<a href="https://colab.research.google.com/github/mrodgers/demo-testing/blob/main/GPTCache_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prompt Cache Techniques Part 3 - GPTCache (July 2023)

Welcome to the PromptMule CacheCast series of Generative AI cache demos. The series explores caching techniques that make inference on large language models faster and more efficient. Part 3 covers GPTCache, a cache designed specifically for language model responses. The cells below show how GPTCache can reduce response times and avoid repeated API calls in Generative AI apps: first with exact-match caching, then with similarity-based caching, and finally as a cache backend for LangChain.

In [3]:
# @title Input OpenAI API Key { run: "auto", vertical-output: true, display-mode: "both" }
#@markdown Enter your OpenAI API key here. To obtain one, sign up on the OpenAI website and create a key at https://platform.openai.com/account/api-keys. The key authenticates your requests to the API.

OPENAI_API_KEY = "YOUR_OPENAI_API_KEY" #@param {type:"string"}
#@markdown ---
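
A key pasted into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the snippet below reads the key from an environment variable and falls back to a hidden prompt. The helper name `load_api_key` and its parameters are hypothetical and not part of the original demo.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "OPENAI_API_KEY", prompt_if_missing: bool = True) -> str:
    """Return the API key from the environment, prompting only if it is absent.

    `load_api_key` is a hypothetical helper used for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    if not key and prompt_if_missing:
        # getpass hides the typed value so it does not appear in cell output.
        key = getpass(f"Enter {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is not set")
    return key
```

With this in place, `OPENAI_API_KEY = load_api_key()` could stand in for the hardcoded assignment, assuming the variable is set in the runtime or the user is present to type it.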


In [1]:
!pip install openai
!pip install langchain
!pip install gptcache

Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Installing collected packages: openai
Successfully installed openai-0.27.8
Collecting langchain
  Downloading langchain-0.0.242-py3-none-any.whl (1.4 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.13-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.11 (from langchain)
  Downloading langsmith-0.0.14-py3-none-any.whl (29 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain)
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses

In [4]:
import os
from langchain.llms import OpenAI
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
llm_langchain = OpenAI(model_name="text-davinci-003")
text_to_predict = "Which is the best technical skill to learn in 2023?"
print(llm_langchain(text_to_predict))



The best technical skill to learn in 2023 will depend largely on the type of job you are looking to pursue. However, some of the most sought-after skills include coding, machine learning, data science, virtual reality, artificial intelligence, and blockchain.


In [5]:
import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()
cache.set_openai_key()
# -------------------------------------------------

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

Cache loading.....
Question: what's github
Time consuming: 2.74s
Answer: GitHub is a web-based platform commonly used for version control and collaboration in software development projects. It acts as a hosting service for Git repositories, which allows multiple developers to work on the same project simultaneously, track changes, and review or merge code. GitHub provides features like issue tracking, code review tools, project management tools, and a collaborative community of developers. It is widely used in the open-source software development community and also by many organizations for their private projects.

start to install package: tiktoken
successfully installed package: tiktoken
Question: what's github
Time consuming: 7.98s
Answer: GitHub is a web-based platform commonly used for version control and collaboration in software development projects. It acts as a hosting service for Git repositories, which allows multiple developers to work on the same project simultaneously, tr
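
The GPTCache documentation describes `cache.init()` with no arguments as an exact-match cache: a stored answer is reused only when the same prompt text is sent again. The sketch below illustrates that idea with a plain dictionary. `ExactMatchCache` and `fake_llm` are hypothetical names for illustration, and this is not how GPTCache is implemented internally.

```python
from typing import Callable, Dict


class ExactMatchCache:
    """Toy exact-match prompt cache (illustrative; not GPTCache's implementation)."""

    def __init__(self, llm_call: Callable[[str], str]) -> None:
        self._llm_call = llm_call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, prompt: str) -> str:
        # Reuse the stored answer only when the prompt string is identical.
        if prompt in self._store:
            self.hits += 1
            return self._store[prompt]
        self.misses += 1
        answer = self._llm_call(prompt)
        self._store[prompt] = answer
        return answer


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"answer to: {prompt}"


exact_cache = ExactMatchCache(fake_llm)
exact_cache.ask("what's github")                   # miss: first time this string is seen
exact_cache.ask("what's github")                   # hit: identical string
exact_cache.ask("can you explain what GitHub is")  # miss: same meaning, different string
print(exact_cache.hits, exact_cache.misses)
```

The third call shows the limitation of exact matching: a rephrased question is treated as new, which is the gap that similarity-based caching is meant to close.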

In [7]:
import time


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
    )
cache.set_openai_key()

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')

Cache loading.....
start to install package: transformers
successfully installed package: transformers
start to install package: onnxruntime
successfully installed package: onnxruntime


start to install package: faiss-cpu
successfully installed package: faiss-cpu
Question: what's github
Time consuming: 4.34s
Answer: GitHub is a web-based platform for version control and collaboration that allows developers to manage and store their code repositories. It provides a platform for developers to collaborate on projects, track changes to code, and host their code repositories. GitHub offers various features such as issue tracking, pull requests, code review, and project management tools. It is widely used in the development community for open-source projects, as well as for private development projects within organizations.

Question: can you explain what GitHub is
Time consuming: 0.78s
Answer: GitHub is a web-based platform for version control and collaboration that allows developers to manage and store their code repositories. It provides a platform for developers to collaborate on projects, track changes to code, and host their code repositories. GitHub offers various fe
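
The similarity-based configuration above embeds each question, searches a vector store for the nearest stored question, and reuses its answer when the two are close enough. The sketch below shows that flow under loud simplifications: the embedding is a toy bag-of-words count rather than the ONNX model, the search is a linear scan rather than FAISS, and the threshold of 0.4 is an arbitrary assumption. `SimilarityCache`, `embed`, and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (the demo uses an ONNX model instead).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SimilarityCache:
    """Toy similarity cache (illustrative; not GPTCache's implementation)."""

    def __init__(self, llm_call: Callable[[str], str], threshold: float = 0.4) -> None:
        self._llm_call = llm_call
        self.threshold = threshold  # assumed cutoff, chosen only for this example
        self._entries: List[Tuple[Counter, str]] = []
        self.hits = 0
        self.misses = 0

    def ask(self, prompt: str) -> str:
        query = embed(prompt)
        best_score, best_answer = 0.0, None  # type: float, Optional[str]
        for vector, answer in self._entries:  # linear scan stands in for a vector index
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_answer is not None and best_score >= self.threshold:
            self.hits += 1
            return best_answer
        self.misses += 1
        answer = self._llm_call(prompt)
        self._entries.append((query, answer))
        return answer


calls: List[str] = []


def recording_llm(prompt: str) -> str:
    # Stand-in for a real model call; records which prompts reached the "model".
    calls.append(prompt)
    return f"answer to: {prompt}"


sim_cache = SimilarityCache(recording_llm)
sim_cache.ask("what's github")                   # miss: cache is empty
sim_cache.ask("can you explain what GitHub is")  # hit: shares "what" and "github"
sim_cache.ask("how do I bake bread")             # miss: no words in common
print(sim_cache.hits, sim_cache.misses, calls)
```

The threshold is the main design choice: set it too low and unrelated questions receive the wrong cached answer, set it too high and the cache behaves like exact matching.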

In [6]:
import hashlib

import langchain  # required so that langchain.llm_cache can be assigned below
from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache


def get_hashed_name(name):
    # Hash the LLM description so it is safe to use in a directory name.
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    # Give each LLM configuration its own "map" cache directory.
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )


# Route LangChain LLM calls through GPTCache.
langchain.llm_cache = GPTCache(init_gptcache)
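
The initializer in this cell names each cache directory after a SHA-256 hash of the `llm` string, so different model configurations do not share cached answers. The sketch below isolates that naming step; `cache_dir_for` is a hypothetical helper, and the example strings are made up, since the exact string LangChain passes in is not shown in this notebook.

```python
import hashlib


def cache_dir_for(llm_description: str) -> str:
    """Return a per-model cache directory name (hypothetical helper)."""
    digest = hashlib.sha256(llm_description.encode()).hexdigest()
    return f"map_cache_{digest}"


# Made-up descriptions, used only to show that each one maps to its own directory.
dir_a = cache_dir_for("model-a, temperature=0.7")
dir_b = cache_dir_for("model-b, temperature=0.7")
print(dir_a != dir_b, dir_a == cache_dir_for("model-a, temperature=0.7"))
```

Hashing keeps the directory name a fixed length and free of characters that are awkward in file paths, whatever the description contains.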

In [None]:
import timeit

# The call being timed: send the test prompt through the LangChain LLM.
def instruction():
  result = llm_langchain(text_to_predict)
  print(f"Response: {result.strip()}")

# First run: a cache miss, so the request goes to OpenAI and should take the longest.
first_run = timeit.timeit(instruction, number=1)

# Repeat the same prompt so it is served from the cache, then average the timings.
num_iterations = 3
cache_time = sum(timeit.timeit(instruction, number=1) for _ in range(num_iterations)) / num_iterations
delta = first_run - cache_time

print("--- WTC Benchmark ---")
print(f"Time taken for 1st execution: {first_run:.6f} seconds")
print(f"Time taken for Cache Hit execution (average): {cache_time:.6f} seconds")
print(f"The delta between Cache hit and OpenAI call is: {delta:.6f} seconds")
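
The benchmark cell needs a live API key and has no recorded output here. As a rough offline illustration of the same miss-versus-hit comparison, the sketch below times a simulated slow call wrapped in a dictionary cache. `slow_llm`, `make_cached`, and the 0.05 s delay are assumptions made up for the example, not measurements from OpenAI or GPTCache.

```python
import time
from typing import Callable, Dict


def make_cached(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap fn with a simple exact-match cache (hypothetical helper)."""
    store: Dict[str, str] = {}

    def wrapper(prompt: str) -> str:
        if prompt not in store:
            store[prompt] = fn(prompt)
        return store[prompt]

    return wrapper


def slow_llm(prompt: str) -> str:
    time.sleep(0.05)  # simulated network and inference latency (assumed value)
    return f"answer to: {prompt}"


def timed(fn: Callable[[str], str], prompt: str) -> float:
    start = time.perf_counter()
    fn(prompt)
    return time.perf_counter() - start


cached_llm = make_cached(slow_llm)
first_run = timed(cached_llm, "demo prompt")                      # miss: pays the full delay
hit_times = [timed(cached_llm, "demo prompt") for _ in range(3)]  # hits: dictionary lookups
avg_hit = sum(hit_times) / len(hit_times)
print(f"first run: {first_run:.4f}s, average hit: {avg_hit:.6f}s, delta: {first_run - avg_hit:.4f}s")
```

Timing the miss once and averaging several hits mirrors the structure of the benchmark cell, so the two sets of numbers can be read the same way.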

Thanks to code examples from:

- [GPTCache chat example](https://gptcache.readthedocs.io/en/latest/bootcamp/openai/chat.html)
- [OpenAI Example](https://platform.openai.com/docs/guides/chat/introduction)
