## Prompt Caching

*(Coding along with deeplearning.ai's online course [Building toward Computer Use with Anthropic - Learn how an AI Assistant is built to use and accomplish tasks on computers](https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/1/introduction) taught by Colt Steele)*

Prompt caching, as offered by the Anthropic API, lets you mark a long prefix of a prompt (for example, an entire book) so that the API keeps its processed form for a short time (five minutes by default, refreshed each time the cache is read). Later requests that begin with exactly the same prefix read it from the cache instead of processing it again, which lowers input-token cost and latency. The model still generates a fresh response for every request: only the prompt prefix is reused, and the prefix must match exactly rather than merely be similar.

In [1]:
# https://learn.deeplearning.ai/courses/building-toward-computer-use-with-anthropic/lesson/6/prompt-caching
from anthropic import Anthropic
import pandas as pd

anthropic_api_key = pd.read_csv("~/tmp/anthropic/anthropic-key-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your api key to github")

client = Anthropic(api_key=anthropic_api_key)
MODEL_NAME="claude-3-5-sonnet-20241022"

Don't be a fool and send your api key to github


#### __Loading The Book__

In [2]:
with open('../assets/files/frankenstein.txt', 'r') as file:
    book_content = file.read()

In [3]:
len(book_content)

438804

In [4]:
book_content[1000:2000]

'by Mary Wollstonecraft (Godwin) Shelley\n\n\n CONTENTS\n\n Letter 1\n Letter 2\n Letter 3\n Letter 4\n Chapter 1\n Chapter 2\n Chapter 3\n Chapter 4\n Chapter 5\n Chapter 6\n Chapter 7\n Chapter 8\n Chapter 9\n Chapter 10\n Chapter 11\n Chapter 12\n Chapter 13\n Chapter 14\n Chapter 15\n Chapter 16\n Chapter 17\n Chapter 18\n Chapter 19\n Chapter 20\n Chapter 21\n Chapter 22\n Chapter 23\n Chapter 24\n\n\n\n\nLetter 1\n\n_To Mrs. Saville, England._\n\n\nSt. Petersburgh, Dec. 11th, 17—.\n\n\nYou will rejoice to hear that no disaster has accompanied the\ncommencement of an enterprise which you have regarded with such evil\nforebodings. I arrived here yesterday, and my first task is to assure\nmy dear sister of my welfare and increasing confidence in the success\nof my undertaking.\n\nI am already far north of London, and as I walk in the streets of\nPetersburgh, I feel a cold northern breeze play upon my cheeks, which\nbraces my nerves and fills me with delight. Do you understand this\n

### Uncached Request

In [5]:
import time
def make_non_cached_api_call():
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    # marking the bounds of the book with xml
                    "text": "<book>" + book_content + "</book>"
                },
                {
                    # our actual question in a second message block
                    "type": "text",
                    "text": "What happens in chapter 3?"
                }
            ]
        }
    ]

    start_time = time.time() # time the request is sent
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=500,
        messages=messages,
    )
    end_time = time.time() # time response is received 

    return response, end_time - start_time # returning response plus delta of time

In [6]:
non_cached_response, non_cached_time = make_non_cached_api_call()
print(f"Non-cached time: {non_cached_time:.2f} seconds") # time it took to get the response

print("\nOutput (non-cached):") 
print(non_cached_response.content) # the non cached response

Non-cached time: 94.89 seconds

Output (non-cached):
[TextBlock(citations=None, text="In Chapter 3 of Frankenstein, several key events occur:\n\n1. Victor Frankenstein, at age 17, prepares to leave for university at Ingolstadt. However, his departure is delayed when his adopted sister Elizabeth falls ill with scarlet fever.\n\n2. While caring for Elizabeth, Victor's mother catches the illness. On her deathbed, she expresses her wish for Victor and Elizabeth to marry, and then she dies.\n\n3. After his mother's death, Victor eventually departs for Ingolstadt, though he is reluctant to leave his grieving family.\n\n4. At the university, Victor meets two professors:\n- Professor M. Krempe, who dismisses Victor's interest in alchemists like Cornelius Agrippa as outdated and worthless\n- Professor M. Waldman, who takes a more sympathetic approach and encourages Victor to study modern chemistry and other branches of natural philosophy\n\n5. Waldman's influence proves crucial, as his lecture 

In [7]:
# what interests us most here is the token usage
non_cached_response.usage

Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=108438, output_tokens=328)

### Cached Version

In [8]:
def make_cached_api_call():
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<book>" + book_content + "</book>",
                    # adding cache control
                    # caches all the input tokens up to this point
                    "cache_control": {"type": "ephemeral"}
                },
                {
                    "type": "text",
                    "text": "What happens in chapter 5?"
                }
            ]
        }
    ]

    start_time = time.time()
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=500,
        messages=messages,
    )
    end_time = time.time()

    return response, end_time - start_time

In [9]:
time.sleep(180) # pause for three minutes
# to stay under the 40,000 tokens-per-minute rate limit
response1, duration1 = make_cached_api_call()

In [10]:
response1

Message(id='msg_013Vt9xkCmjdK9cp3RgaBg1w', content=[TextBlock(citations=None, text="In Chapter 5, Victor Frankenstein finally succeeds in bringing his creation to life. Here are the key events:\n\n1. On a dreary night in November, Frankenstein completes his work and brings the creature to life. However, upon seeing his creation's hideous appearance - yellow skin, black lips, watery eyes - he is immediately filled with horror and disgust.\n\n2. Unable to endure looking at his creation, Frankenstein flees from his laboratory and tries to find peace in sleep. However, he has terrible nightmares where he sees Elizabeth, his fiancée, turn into the corpse of his dead mother.\n\n3. While sleeping, the monster comes to his bedside and tries to communicate, but Frankenstein is terrified and runs away again.\n\n4. The next morning, Frankenstein wanders the streets of Ingolstadt in distress. He eventually runs into his friend Henry Clerval, who has just arrived in town.\n\n5. When they return to 

In [11]:
response1.usage # first run: the book was written to the cache, nothing was read from it

Usage(cache_creation_input_tokens=108427, cache_read_input_tokens=0, input_tokens=11, output_tokens=360)
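
The two cache fields in `usage` show what happened on a request: `cache_creation_input_tokens` counts tokens written to the cache and `cache_read_input_tokens` counts tokens read from it. A minimal sketch of a helper that turns those fields into a label follows. The name `describe_cache_usage` is hypothetical, and the `SimpleNamespace` objects are stand-ins that only copy the field names and numbers printed in this notebook.

```python
from types import SimpleNamespace

def describe_cache_usage(usage):
    """Classify a usage object as a cache 'read', a cache 'write', or 'none'."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return "read"   # at least part of the prompt came from the cache
    if created > 0:
        return "write"  # the prefix was processed and stored in the cache
    return "none"       # no caching involved

# Stand-ins mirroring the Usage outputs shown in this notebook
uncached = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=0,
                           input_tokens=108438, output_tokens=328)
first_cached = SimpleNamespace(cache_creation_input_tokens=108427, cache_read_input_tokens=0,
                               input_tokens=11, output_tokens=360)

print(describe_cache_usage(uncached))      # none
print(describe_cache_usage(first_cached))  # write
```

With a real response, the same check would be `describe_cache_usage(response1.usage)`.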

In [12]:
print(f"Cached call 1 time: {duration1:.2f} seconds") # time it took to get the response

Cached call 1 time: 94.92 seconds


In [13]:
time.sleep(180) # pause for three minutes
# to stay under the 40,000 tokens-per-minute rate limit
response2, duration2 = make_cached_api_call()

In [14]:
response2

Message(id='msg_01QBG9onCE6tL6kshj6EPY3x', content=[TextBlock(citations=None, text="In Chapter 5 of Frankenstein, Victor Frankenstein finally succeeds in bringing his creation to life. However, he is immediately horrified by what he has created. Here are the key events:\n\n1. On a dreary November night, Victor finally animates his creature. When it comes to life, he is terrified by its appearance - its yellow skin, watery eyes, black lips, and overall hideous countenance.\n\n2. Overwhelmed with horror and disgust, Victor flees from his laboratory and abandons his creation.\n\n3. He retreats to his bedchamber, trying to find sleep and forget what he has done, but is plagued by nightmares. In his dreams, he sees Elizabeth, who turns into the corpse of his dead mother.\n\n4. When he awakens, he finds the monster at his bedside, reaching out to him and trying to speak. Victor flees again in terror.\n\n5. He takes refuge in the courtyard of his building until morning, when he starts walking

In [15]:
response2.usage # a cache hit would report cache_read_input_tokens > 0; this run reports another cache write

Usage(cache_creation_input_tokens=108427, cache_read_input_tokens=0, input_tokens=11, output_tokens=372)

In [16]:
print(f"Cached call 2 time: {duration2:.2f} seconds") # time it took to get the response

Cached call 2 time: 20.37 seconds


In [21]:
# let's give it another shot and see if we still have a cache
def make_cached_api_call_2():
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<book>" + book_content + "</book>",
                    # adding cache control
                    # caches all the input tokens up to this point
                    "cache_control": {"type": "ephemeral"}
                },
                {
                    "type": "text",
                    "text": "What happens in chapter 1?"
                }
            ]
        }
    ]

    start_time = time.time()
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=500,
        messages=messages,
    )
    end_time = time.time()

    return response, end_time - start_time
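
The cached and uncached calls in this notebook differ only in the question text and in whether the book block carries `cache_control`. A minimal sketch of one builder that covers all three variants follows. The names `build_book_messages` and `use_cache` are hypothetical and not part of the course code, and the short string stands in for `book_content`.

```python
def build_book_messages(book_text, question, use_cache=True):
    """Build the messages list: the book block first, then the question."""
    book_block = {"type": "text", "text": "<book>" + book_text + "</book>"}
    if use_cache:
        # cache everything in the prompt up to and including this block
        book_block["cache_control"] = {"type": "ephemeral"}
    question_block = {"type": "text", "text": question}
    return [{"role": "user", "content": [book_block, question_block]}]

# Short stand-in text instead of the full book
demo_messages = build_book_messages("It was on a dreary night...", "What happens in chapter 1?")
print(demo_messages[0]["content"][0].keys())
```

The result can be passed as `messages=` to `client.messages.create` exactly as in the cells above. Because the book block stays byte-for-byte identical across questions, every cached variant shares the same cacheable prefix.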

In [None]:
time.sleep(180) # pause for three minutes
# to stay under the 40,000 tokens-per-minute rate limit
response3, duration3 = make_cached_api_call_2()

In [None]:
response3

In [None]:
response3.usage

In [None]:
print(f"Cached call 3 time: {duration3:.2f} seconds") # time it took to get the response

#### __Prompt Caching Pricing__

* Cache write tokens are 25% more expensive than base input tokens
* Cache read tokens are 90% cheaper than base input tokens
* Regular input and output tokens are priced at standard rates
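
To see what these multipliers mean for the book example, here is a minimal sketch that expresses input cost in units of one base input token. The function name is hypothetical and output tokens are left out. The uncached and cache-write token counts come from the `usage` outputs above. The cache-read row is an assumed hit with the same prefix size, not a measured result.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # cache reads cost 90% less than base input

def input_cost_in_base_tokens(input_tokens, cache_creation_input_tokens=0,
                              cache_read_input_tokens=0):
    """Input cost expressed as an equivalent number of base input tokens."""
    return (input_tokens
            + cache_creation_input_tokens * CACHE_WRITE_MULTIPLIER
            + cache_read_input_tokens * CACHE_READ_MULTIPLIER)

uncached_cost = input_cost_in_base_tokens(108438)
write_cost = input_cost_in_base_tokens(11, cache_creation_input_tokens=108427)
# Assumed cache hit: same prefix size as the write above
read_cost = input_cost_in_base_tokens(11, cache_read_input_tokens=108427)

print(f"uncached:    {uncached_cost:.2f}")
print(f"cache write: {write_cost:.2f}")
print(f"cache read:  {read_cost:.2f}")
```

Under these assumptions the first cached request costs about 25% more than the uncached one, and each later hit costs roughly a tenth of it, so caching pays off from the second request onward.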

### Multi-Turn Caching

In [None]:
# putting cache control on the second to last and last user message
messages=[
    # ...long conversation so far
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello, can you tell me more about the solar system",
                "cache_control": {"type": "ephemeral"}
            }
        ]
    },
    {
        "role": "assistant",
        "content": "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you'd like to know more about?"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Tell me more about Mars.",
                "cache_control": {"type": "ephemeral"}
            }
        ]
    }
]

In [None]:
# one turn later: cache control moves to the new second-to-last and last user
# messages, and the first user message no longer carries a marker
messages=[
    # ...long conversation so far
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hello, can you tell me more about the solar system",
            }
        ]
    },
    {
        "role": "assistant",
        "content": "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you'd like to know more about?"
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Tell me more about Mars.",
                "cache_control": {"type": "ephemeral"}
            }
        ]
    },
    {
        "role": "assistant",
        "content": "I'd love to tell you about Mars.  Mars is...."
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "That's really neat.  Tell me about Pluto!",
                "cache_control": {"type": "ephemeral"}
            }
        ]
    },
]
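
Moving the markers by hand on every turn is error-prone, so the pattern above can be sketched as a small helper. This is an illustration under assumptions, not code from the course: `mark_last_two_user_turns` is a hypothetical name, and it expects messages shaped like the lists above (a `role` plus `content` that is either a string or a list of blocks).

```python
import copy

def mark_last_two_user_turns(messages):
    """Return a copy with cache_control only on the last two user messages."""
    marked = copy.deepcopy(messages)  # leave the caller's list untouched
    user_indices = [i for i, m in enumerate(marked) if m["role"] == "user"]
    keep = set(user_indices[-2:])
    for i in user_indices:
        content = marked[i]["content"]
        if isinstance(content, str):
            # normalize plain-string content into a single text block
            content = [{"type": "text", "text": content}]
            marked[i]["content"] = content
        for block in content:
            block.pop("cache_control", None)  # clear stale markers
        if i in keep and content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
    return marked

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "Tell me about the solar system",
                                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "assistant", "content": "Certainly! ..."},
    {"role": "user", "content": [{"type": "text", "text": "Tell me more about Mars.",
                                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "assistant", "content": "I'd love to tell you about Mars."},
    {"role": "user", "content": "Tell me about Pluto!"},
]

updated = mark_last_two_user_turns(conversation)
for message in updated:
    if message["role"] == "user":
        print("cache_control" in message["content"][-1])
```

Calling the helper before each request keeps the number of markers constant as the conversation grows.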