# Caching OpenAI API calls

This tutorial shows how to use `APICache` to cache calls to OpenAI's API. Subsequent calls with the same arguments will return the cached response.

The source code for the supporting package (`aiutils`) is [available here](https://github.com/ploomber/doc/blob/d5aba61f479be0afb42efa8925c81a497f9e54b1/aiutils/src/aiutils/cache.py#L16).


Let's configure logging so we can see when the cache is used. Only the `aiutils.cache` logger is set to `INFO`; everything else stays silent.

In [1]:
import logging

logging.basicConfig(level=logging.CRITICAL)
aiutils_cache_logger = logging.getLogger('aiutils.cache')
aiutils_cache_logger.setLevel(logging.INFO)

To start caching, wrap the API function with `APICache`:

In [2]:
from aiutils.cache import APICache
from openai import OpenAI

client = OpenAI()

embeddings_create = APICache(client.embeddings.create)

Calls to `embeddings_create` check the cache first. If there is no matching entry, they call the API and store the response. If there is a match, they return the cached response without calling the API.

In [4]:
response = embeddings_create(input="some text", model="text-embedding-3-small")
embedding = response.data[0].embedding
embedding[:10]

INFO:aiutils.cache:Cache miss, calling API.


[-0.012787822633981705,
 0.016391022130846977,
 0.00927097536623478,
 -0.02223150059580803,
 -0.03259364143013954,
 -0.03322165086865425,
 0.01840064860880375,
 0.010150187648832798,
 0.018793154507875443,
 -0.017584238201379776]

This should now use the cache:

In [6]:
_ = embeddings_create(input="some text", model="text-embedding-3-small")

INFO:aiutils.cache:Cache hit, using cached response.


Changing any of the arguments results in a cache miss:

In [7]:
_ = embeddings_create(input="some text", model="text-embedding-3-large")

INFO:aiutils.cache:Cache miss, calling API.


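Going back to the original arguments hits the cache again:

In [9]:
# Assumption: user_query holds the same text that was embedded earlier.
user_query = "some text"

response = embeddings_create(
    input=user_query,
    model="text-embedding-3-small")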

embedding_query = response.data[0].embedding

INFO:aiutils.cache:Cache hit, using cached response.
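
To make the hit and miss behavior above concrete, here is a minimal sketch of how a wrapper of this kind could work. It is not the `aiutils` implementation: `SimpleCache`, `fake_api`, and the `calls` table are hypothetical names used only for illustration, and the sketch assumes both the keyword arguments and the response can be serialized as JSON.

```python
import json
import sqlite3


class SimpleCache:
    """Illustrative caching wrapper (hypothetical, not the aiutils code)."""

    def __init__(self, func, conn):
        self.func = func
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS calls (key TEXT PRIMARY KEY, response TEXT)"
        )

    def __call__(self, **kwargs):
        # Sorting keys makes the key independent of keyword argument order.
        key = json.dumps(kwargs, sort_keys=True)
        row = self.conn.execute(
            "SELECT response FROM calls WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: skip the wrapped function
        response = self.func(**kwargs)  # cache miss: call the wrapped function
        self.conn.execute(
            "INSERT INTO calls (key, response) VALUES (?, ?)",
            (key, json.dumps(response)),
        )
        self.conn.commit()
        return response


calls = []


def fake_api(**kwargs):
    """Hypothetical stand-in for an API function; records every real call."""
    calls.append(kwargs)
    return {"echo": kwargs["input"], "model": kwargs["model"]}


cached = SimpleCache(fake_api, sqlite3.connect(":memory:"))
cached(input="some text", model="small")  # miss
cached(input="some text", model="small")  # hit
cached(input="some text", model="large")  # miss: the model changed
print(len(calls))  # 2
```

Because the key is built from every keyword argument, changing any one of them produces a different key and therefore a miss, which matches the behavior shown in the logs.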


You can wrap any of the API functions. Let's now use the chat completions API:

In [8]:
completions_create = APICache(client.chat.completions.create)

response = completions_create(
  model="gpt-3.5-turbo-0125",
  messages=[
    {"role": "system", "content": "You're a helpful assistant"},
    {"role": "user", "content": "Say hi!"},
  ])


INFO:aiutils.cache:Cache miss, calling API.


In [11]:
response.choices[0].message.content

'Hello! How can I assist you today?'

In [12]:
response = completions_create(
  model="gpt-3.5-turbo-0125",
  messages=[
    {"role": "system", "content": "You're a helpful assistant"},
    {"role": "user", "content": "Say hi!"},
  ])

response.choices[0].message.content

INFO:aiutils.cache:Cache hit, using cached response.


'Hello! How can I assist you today?'

## Exploring the data

The cached calls are stored in a SQLite database, so you can explore them with SQL:

In [1]:
%load_ext sql

In [2]:
import sqlite3

from aiutils import CACHE_PATH

In [3]:
conn = sqlite3.connect(CACHE_PATH)

In [4]:
%sql conn

In [5]:
%%sql
SELECT
  COUNT(*)
FROM
  api_calls

COUNT(*)
26


`qualified_name` stores the function that was used:

In [6]:
%%sql
SELECT DISTINCT
  qualified_name
FROM
  api_calls

qualified_name
openai.resources.embeddings.Embeddings.create
openai.resources.chat.completions.Completions.create
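
Names in this dotted form can be built from a callable's `__module__` and `__qualname__` attributes. The sketch below shows that idea with a standard library method; `qualified_name` is a hypothetical helper, and whether `aiutils` computes the column this way is an assumption.

```python
import logging


def qualified_name(func):
    """Build a dotted name from a callable's module and qualified name."""
    return f"{func.__module__}.{func.__qualname__}"


# Bound methods expose the attributes of the underlying function,
# so this also works for something like `client.embeddings.create`.
logger = logging.Logger("demo")
name = qualified_name(logger.info)
print(name)  # logging.Logger.info
```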


`kwargs` stores the arguments as a JSON string. You can use SQLite's JSON operators to query it (`->` returns a JSON value, so strings keep their quotes):

In [7]:
%%sql
SELECT
  kwargs -> '$.model' AS model
FROM
  api_calls
LIMIT
  2

model
"""text-embedding-3-small"""
"""text-embedding-3-small"""
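
The same kind of query can be run from plain Python with the `sqlite3` module. The sketch below uses an in-memory stand-in table rather than the real cache: the three columns come from this tutorial, but the real `api_calls` table may have more, and the inserted row is made up for illustration. It uses `json_extract`, which returns a plain SQL text value for JSON strings, so the quotes are dropped.

```python
import json
import sqlite3

# In-memory stand-in for the cache database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_calls (qualified_name TEXT, kwargs TEXT, response TEXT)"
)
conn.execute(
    "INSERT INTO api_calls VALUES (?, ?, ?)",
    (
        "demo.create",  # hypothetical function name
        json.dumps({"model": "text-embedding-3-small", "input": "some text"}),
        json.dumps({"data": []}),
    ),
)

# json_extract returns SQL text for JSON strings, without surrounding quotes.
rows = conn.execute(
    "SELECT json_extract(kwargs, '$.model') FROM api_calls"
).fetchall()
models = [row[0] for row in rows]
print(models)  # ['text-embedding-3-small']
```

On SQLite 3.38 or newer, the `->>` operator gives the same unquoted result as `json_extract` for a single path.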


`response` stores the response from the API:

In [8]:
%%sql
SELECT
  response -> '$.choices[0].message.content' AS content
FROM
  api_calls
WHERE
  content IS NOT NULL

content
"""The AdamW optimizer was used to train this model."""
"""Hello! How can I assist you today?"""
