# Prompt Caching With Anthropic

**Prompt caching** is a valuable feature that optimizes API usage by enabling the reuse of specific prompt prefixes.

**Benefits**:

This significantly cuts processing time and cost for tasks or prompts with recurring elements, which is particularly useful with large LLMs, especially multimodal ones.



**How does it work?**

- The system checks whether the prompt prefix has been recently cached.

- Cache hit: if there is a match, the system uses the cached version of the prompt prefix, which reduces both processing cost and processing time.

- Cache miss: if not, it processes the prompt completely and caches the prefix for future use.
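
The hit/miss logic above can be pictured with a toy model. This is only a conceptual sketch, not how Anthropic implements caching: the class name `ToyPrefixCache`, the hashing of the prefix, and the refresh of the lifetime on each use are assumptions made for illustration.

```python
import hashlib
import time


class ToyPrefixCache:
    """Conceptual model only: maps a hash of the prompt prefix to an expiry time."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # prefix hash -> time at which the entry expires

    def lookup(self, prefix, now=None):
        """Return "hit" if the prefix was cached recently, else cache it and return "miss"."""
        now = time.time() if now is None else now
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        expires_at = self._expiry.get(key)
        hit = expires_at is not None and now < expires_at
        # Assumption: the entry's lifetime restarts every time it is written or read.
        self._expiry[key] = now + self.ttl_seconds
        return "hit" if hit else "miss"


cache = ToyPrefixCache(ttl_seconds=300)
print(cache.lookup("<static PDF content>", now=0))    # miss: nothing cached yet
print(cache.lookup("<static PDF content>", now=60))   # hit: same prefix, within 5 minutes
print(cache.lookup("<edited PDF content>", now=90))   # miss: the prefix changed
```

Any change to the prefix produces a different key, which is why the static content has to stay byte-for-byte identical between requests.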


**Constraints**:

- The prompt prefix must stay identical across requests for caching to work: place static data at the start of your prompt, and put user requests and variable data toward the end.

- The prompt must contain at least:
  - 1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
  - 2048 tokens for Claude 3 Haiku


- The cache remains active for 5 minutes.
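
As a quick sanity check before relying on caching, the minimum lengths listed above can be encoded in a small helper. This is a minimal sketch: the dictionary `MIN_CACHEABLE_TOKENS` and the function `is_cacheable` are hypothetical names, and matching a model ID by its family prefix is an assumption about how model IDs are formed.

```python
# Minimum cacheable prompt lengths, taken from the constraints listed above.
MIN_CACHEABLE_TOKENS = {
    "claude-3-5-sonnet": 1024,
    "claude-3-opus": 1024,
    "claude-3-haiku": 2048,
}


def is_cacheable(model, prompt_tokens):
    """Return True if the prompt is long enough to be cached for this model family."""
    for family, minimum in MIN_CACHEABLE_TOKENS.items():
        # Assumption: model IDs such as "claude-3-5-sonnet-20241022" start with the family name.
        if model.startswith(family):
            return prompt_tokens >= minimum
    raise ValueError(f"No caching threshold known for model: {model}")


print(is_cacheable("claude-3-5-sonnet-20241022", 31039))  # True
print(is_cacheable("claude-3-haiku-20240307", 1500))      # False
```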


**How to use it?**

1.  Specify `prompt-caching-2024-07-31` either:
    - in the API header: `anthropic-beta: prompt-caching-2024-07-31`, or
    - directly in the message call: `betas=["prompt-caching-2024-07-31"]`

2.  Add this parameter to the content block you want to cache:
`"cache_control": {"type": "ephemeral"}`
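
To make the two steps concrete, here is a sketch of what the raw request could look like when using the header variant. Nothing is sent: the function only builds the headers and JSON body. The helper name `build_cached_request`, the placeholder API key, the `max_tokens` value, and the `anthropic-version` value are assumptions for illustration; the body would be POSTed to the Messages API endpoint.

```python
import json


def build_cached_request(api_key, model, static_text, question):
    """Build headers and JSON body for a Messages API call with a cached prefix."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",  # assumed API version string
        "anthropic-beta": "prompt-caching-2024-07-31",  # step 1: enable the beta
        "content-type": "application/json",
    }
    body = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                # Step 2: static content first, marked as cacheable.
                {"type": "text", "text": static_text,
                 "cache_control": {"type": "ephemeral"}},
                # Variable content last, left unmarked.
                {"type": "text", "text": question},
            ],
        }],
    }
    return headers, body


headers, body = build_cached_request(
    "YOUR_API_KEY", "claude-3-5-sonnet-20241022", "<long static context>", "Summarize"
)
print(json.dumps(body, indent=2))
```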

**Use case:**

To showcase prompt caching, I'll use the new **"PDF support"** feature, which lets me upload a PDF to Claude 3.5 Sonnet. This is an ideal use case, as the PDF content is static. I'll first include the PDF content in the message to enable prefix caching, then add my query at the end.



👉 We'll use the **BlackRock Q4 2024 Outlook report**. This report outlines BlackRock's key investment strategies across three main areas: equities, fixed income, and alternative assets. The document provides detailed analysis and recommendations for positioning portfolios amid market volatility, falling interest rates, and the upcoming US presidential election, while emphasizing the importance of maintaining a selective approach across asset classes.

Original report: https://www.blackrock.com/institutions/en-zz/literature/whitepaper/investment-directions-q4-24-np.pdf

I excluded the last 6 pages, because they have no additional value.

[Hanane Dupouy](https://www.linkedin.com/in/hanane-d-algo-trader)

# Install libs

In [None]:
!pip install anthropic -q

In [16]:
import anthropic
import base64
from IPython.display import HTML

In [3]:
from google.colab import userdata
CLAUDE_API_KEY = userdata.get('CLAUDE_API_KEY')

# PDF loading

In [18]:
def upload_pdf(path_to_pdf):
  with open(path_to_pdf, "rb") as pdf_file:
    binary_data = pdf_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode('utf-8')
  return base64_string
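
Before sending a large payload, it can be worth checking that the encoded string really decodes back to a PDF. The sketch below is an optional sanity check, not part of the original workflow: `encode_file_base64` mirrors the function above, `looks_like_pdf` is a hypothetical helper, and the temporary file contains placeholder bytes rather than a real document.

```python
import base64
import os
import tempfile


def encode_file_base64(path):
    """Read a file and return its content as a base64-encoded UTF-8 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def looks_like_pdf(base64_string):
    """Decode the string and check for the %PDF- signature at the start of the data."""
    return base64.b64decode(base64_string).startswith(b"%PDF-")


# Demo with a tiny stand-in file (placeholder bytes, not a real PDF document).
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n% placeholder bytes\n")
    tmp_path = tmp.name

encoded = encode_file_base64(tmp_path)
os.remove(tmp_path)
print(looks_like_pdf(encoded))  # True
```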

In [20]:
# BlackRock pdf from: https://www.blackrock.com/institutions/en-zz/literature/whitepaper/investment-directions-q4-24-np.pdf
# Investment Directions
# Q4 2024: Exposures for today’s market
# Key Themes and Outlook for Q4 2024 by BlackRock
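# I excluded the last 6 pages: they contain risk warnings for the countries where BlackRock operates and add nothing to the Q4 2024 outlook

# NOTE: local_path is not defined elsewhere in this notebook; the value below is an assumption, set it to the folder containing the PDF
local_path = "./"
path_to_pdf = local_path + 'BlackRock_investment-directions-q4-24-np-1-13.pdf'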
pdf_data = upload_pdf(path_to_pdf)

# Define completion method with Prompt Caching feature:

When creating the message completion, I specify the beta feature for PDF upload, **"pdfs-2024-09-25"**, alongside the prompt caching feature, **"prompt-caching-2024-07-31"**.


In [21]:
client = anthropic.Anthropic(api_key = CLAUDE_API_KEY)
# For now, only claude-3-5-sonnet-20241022 supports PDFs
MODEL_NAME = "claude-3-5-sonnet-20241022"

def get_completion(messages, model=MODEL_NAME):
    completion = client.beta.messages.create(
        betas=["pdfs-2024-09-25", "prompt-caching-2024-07-31"],
        model=model,
        max_tokens=8192,
        messages=messages,
        temperature=0,
    )
    return completion

# Questions to ask

In [6]:
queries = ["What are the main investment themes discussed in the Q4 2024 outlook?",
"What are the key takeaways from BlackRock’s investment strategies for Q4 2024?",
"Which sectors are expected to benefit most from the AI build-out, according to BlackRock?",
"How might the upcoming U.S. presidential election impact investment strategies?",
"What are BlackRock’s views on the impact of trade policies and economic fragmentation on inflation?",
"How does BlackRock suggest positioning a portfolio to mitigate geopolitical risks?"]

Before we start chatting and using prompt caching, let's count the number of tokens in the PDF:

# Counting Tokens

You are not charged for token-counting requests.

In [None]:
client = anthropic.Anthropic(api_key = CLAUDE_API_KEY)

response = client.beta.messages.count_tokens(
    betas=["token-counting-2024-11-01", "pdfs-2024-09-25"],
    model="claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": pdf_data}},
            {"type": "text", "text": "Summarize"}
        ]
    }]
)

print(response.json())

{"input_tokens":31039}


It also counts the text message, but it's a tiny number of tokens.

In [None]:
response

BetaMessageTokensCount(input_tokens=31039)

Without the PDF:

In [None]:
client = anthropic.Anthropic(api_key = CLAUDE_API_KEY)

response = client.beta.messages.count_tokens(
    betas=["token-counting-2024-11-01", "pdfs-2024-09-25"],
    model="claude-3-5-sonnet-20241022",
    messages=[{
        "role": "user",
        "content": [
          {"type": "text", "text": "Summarize"}
        ]
    }]
)

print(response.json())

{"input_tokens":10}


# Prompt Caching:

Let's recall here the completion method with both PDF support and prompt caching features:

In [14]:
client = anthropic.Anthropic(api_key = CLAUDE_API_KEY)
MODEL_NAME = "claude-3-5-sonnet-20241022"

def get_completion(messages, model=MODEL_NAME):
    completion = client.beta.messages.create(
        betas=["pdfs-2024-09-25", "prompt-caching-2024-07-31"],
        model=model,
        max_tokens=8192,
        messages=messages,
        temperature=0,
    )
    return completion

In the following, I'll compare the cost with and without the prompt caching feature:

## Without Prompt Caching

I'm using the standard completion message, without the caching parameter:

In [None]:
def build_message (query, pdf_data):
  messages = [
      {
          "role": 'user',
          "content": [
              {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data}},
              {"type": "text", "text": query}
          ]
      }
    ]
  return messages

I'll now ask Claude 3.5 Sonnet to answer the 6 questions based on the BlackRock report:

In [None]:
for idx, query in enumerate(queries):
  display(HTML(f"<p style='color:red;'>Query n° {idx+1}</p>"))
  messages = build_message(query, pdf_data)
  print("--------QUERY---------")
  display(HTML(f"<p style='color:fuchsia;'>--------QUERY---------</p>"))
  display(HTML(f"<p style='color:fuchsia;'>{query}</p>"))
  print()
  completion = get_completion(messages)
  display(HTML(f"<p style='color:green;'>--------ANSWER---------</p>"))
  print(completion.content[0].text)
  print()
  display(HTML(f"<p style='color:blue;'>--------TOKENS COUNT---------</p>"))
  display(HTML(f"<p style='color:blue;'>{completion.usage}</p>"))
  # print(completion.usage)
  print()
  print("--------STOP REASON---------")
  print(completion.stop_reason)
  print("\n\n")

--------QUERY---------





Based on the Q4 2024 investment outlook document, here are the main investment themes discussed:

1. Quality and Breadth in Equities:
- Looking to take advantage of broadening earnings prospects in developed markets
- Maintaining a high-quality tilt and selectivity amid volatility
- Favoring a granular, active approach, especially in Europe
- Continuing to see tailwinds for AI beneficiaries

2. Fixed Income Opportunities:
- Recommending locking in income while rates remain elevated before further rate cuts
- Favoring the belly of the US Treasury curve
- More comfortable extending duration in European bonds
- Preference for EUR credit, especially quality credit, though opportunities exist in high yield

3. Alternative Assets and Diversification:
- Calling for a broader set of diversifiers due to macro uncertainty
- Looking to liquid alternatives to capture potential alpha in volatile markets
- Seeing strategic opportunities to address portfolio underweights in private markets
- Bonds re


--------STOP REASON---------
end_turn





--------QUERY---------





Based on the document, here are the key takeaways from BlackRock's Q4 2024 investment strategies:

1. Equity Strategy:
- Focus on quality and breadth in equities
- Taking advantage of broadening earnings prospects in developed markets
- Maintaining high-quality tilt and selectivity amid volatility
- Strong opportunity set seen in Europe
- Continued positive outlook for AI beneficiaries

2. Fixed Income Strategy:
- Recommends locking in income while rates remain elevated
- Favors the belly of the US Treasury curve
- More comfortable extending duration in European bonds
- Preference for EUR credit over USD credit
- Focus on quality in credit markets, though opportunities exist in high yield

3. Alternative Assets/Diversifiers:
- Increased focus on alternative assets for diversification
- Looking to liquid alternatives to capture potential alpha in volatile markets
- Strategic opportunity to increase allocation to private markets
- Gold seen as potential hedge against geopolitical risks
-


--------STOP REASON---------
end_turn





--------QUERY---------





According to the document, several sectors are expected to benefit from the AI build-out:

1. Utilities - The document notes that utilities are well-positioned to benefit from AI's expansion due to:
- The massive power requirements of AI infrastructure (noting that a data center using a 100k GPU cluster would require a small power plant to run)
- The sector's strong tilt toward public infrastructure needed to support AI development
- Power capacity requirements putting a cap on AI growth

2. Infrastructure - Particularly public infrastructure which is mentioned as being well-positioned for the AI-driven demand

3. Sustainable Energy - The document indicates sustainable energy exposures look well-positioned due to:
- The significant power demands of AI
- Advancing technology
- Declining costs 
- Supportive policy environment

The document emphasizes that markets may be underestimating the scale of capital expenditure and power needs required for the AI build-out phase, particularly as c


--------STOP REASON---------
end_turn





--------QUERY---------





Based on the document, here are the key potential investment impacts of the upcoming US presidential election:

1. Trade Policy Implications:
- Both candidates are likely to pursue additional export controls, especially in advanced technology
- Harris is expected to maintain current tariff policies with potential targeted China tariffs
- Trump proposes more aggressive tariffs (60% on China, 10-20% broad tariffs)
- Increased protectionism under either administration could contribute to structural inflation

2. Market Volatility:
- Election uncertainty could lead to increased market volatility heading into November
- Alpha-seeking strategies could help capitalize on stock-level dispersion during volatile periods
- UK equities are suggested as a potential hedge against trade rhetoric, given their low beta to shifts in global trade

3. Fiscal Policy:
- Neither candidate has prioritized addressing the budget deficit
- Analysis suggests either candidate could pursue fiscally expansionary pol


--------STOP REASON---------
end_turn





--------QUERY---------





RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}

### **Takeaways**

**RateLimit:**

I got a RateLimitError on query number 5 because I exceeded the authorized number of tokens per minute: since the document was not cached, its 31K tokens were sent again in full with every request.

`{'type': 'rate_limit_error', 'message': 'Number of request tokens has exceeded your per-minute rate limit; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later....`
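
Independently of caching, a common way to cope with this kind of error is to wait and retry. The sketch below shows a generic exponential backoff wrapper. It is not part of the original notebook: `call_with_backoff`, the retry count, and the 15-second base delay are assumptions, and a stand-in exception is used so the example runs without calling the API.

```python
import time


def call_with_backoff(fn, retry_on, max_retries=4, base_delay=15.0, sleep=time.sleep):
    """Call fn(); on a retry_on exception, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in error class and a function that fails twice, then succeeds.
class FakeRateLimitError(Exception):
    pass


attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimitError()
    return "ok"


delays = []  # record the waits instead of actually sleeping
print(call_with_backoff(flaky, FakeRateLimitError, sleep=delays.append))  # ok
print(delays)  # [15.0, 30.0]
```

In this notebook, the call could be wrapped as `call_with_backoff(lambda: get_completion(messages), anthropic.RateLimitError)`. Retrying only spreads the load over time; it does not reduce the number of tokens sent.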

**Cost and Number of tokens:**

*   I made 4 calls totalling 125,542 tokens, i.e. about 31,385 tokens per call (consistent with the token count performed earlier).
*   This cost me \$0.40, i.e. **\$0.10 per request**.
*   For each request I'm sending about 31,050 input tokens and getting about 370 output tokens back.
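
The per-request figure can be reproduced with simple arithmetic. This is a rough sketch: the prices below (\$3 per million input tokens and \$15 per million output tokens for Claude 3.5 Sonnet) are assumptions based on the list prices at the time of writing, and `request_cost` is a hypothetical helper.

```python
# Assumed list prices for Claude 3.5 Sonnet, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens, output_tokens):
    """Cost in USD of one uncached request."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


per_call = request_cost(31050, 370)
print(f"per call: ${per_call:.4f}")     # per call: $0.0987
print(f"4 calls:  ${4 * per_call:.2f}")  # 4 calls:  $0.39
```

The result is in line with the roughly \$0.10 per request observed above; almost all of it comes from the input tokens, i.e. from resending the PDF.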

## With Prompt Caching




You need to add this prompt caching parameter to your message:
`"cache_control": {"type": "ephemeral"}`

In [22]:
def build_message_prompt_caching (query, pdf_data):
  messages = [
      {
          "role": 'user',
          "content": [
              {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},"cache_control": {"type": "ephemeral"}},
              {"type": "text", "text": query}
          ]
      }
    ]
  return messages

First, I'll send only the first query to trigger the caching and show the completion.usage object (you could also send all the queries at once):

In [23]:
for idx, query in enumerate(queries[:1]):
  display(HTML(f"<p style='color:red;'>Query n° {idx+1}</p>"))
  messages = build_message_prompt_caching(query, pdf_data)
  print("--------QUERY---------")
  display(HTML(f"<p style='color:fuchsia;'>--------QUERY---------</p>"))
  display(HTML(f"<p style='color:fuchsia;'>{query}</p>"))
  print()
  completion = get_completion(messages)
  display(HTML(f"<p style='color:green;'>--------ANSWER---------</p>"))
  print(completion.content[0].text)
  print()
  display(HTML(f"<p style='color:blue;'>--------TOKENS COUNT---------</p>"))
  display(HTML(f"<p style='color:blue;'>{completion.usage}</p>"))
  # print(completion.usage)
  print()
  print("--------STOP REASON---------")
  print(completion.stop_reason)
  print("\n\n")

--------QUERY---------





Based on the Q4 2024 investment outlook document, here are the main investment themes discussed:

1. Quality and Breadth in Equities:
- Looking to take advantage of broadening earnings prospects in developed markets
- Maintaining a high-quality tilt and selectivity amid volatility
- Favoring a granular, active approach, especially in Europe
- Continuing to see tailwinds for AI beneficiaries

2. Fixed Income Opportunities:
- Recommending locking in income while rates remain elevated before further rate cuts
- Favoring the belly of the US Treasury curve
- More comfortable extending duration in European bonds
- Preference for EUR credit, especially quality credit, though opportunities exist in high yield

3. Alternative Assets and Diversification:
- Calling for a broader set of diversifiers due to macro uncertainty
- Looking to liquid alternatives to capture potential alpha in volatile markets
- Seeing strategic opportunities to address portfolio underweights in private markets
- Bonds re


--------STOP REASON---------
end_turn






```
--------TOKENS COUNT---------

BetaUsage(cache_creation_input_tokens=31019, cache_read_input_tokens=0, input_tokens=34, output_tokens=374)

```

As you can see, the prompt caching feature created a cache of 31019 tokens for the document, reported in *cache_creation_input_tokens*.


- The following requests will use this cache (for 5 minutes).

- Pay attention to the paragraph "--------TOKENS COUNT---------"
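
When reading many usage objects, a small helper can label each response by what happened to the cache. This is a minimal sketch: `cache_status` is a hypothetical function, and a `SimpleNamespace` stands in for the usage object so the example runs without an API call.

```python
from types import SimpleNamespace


def cache_status(usage):
    """Label a usage object as "hit" (cache read), "write" (cache created) or "none"."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return "hit"    # at least part of the prefix came from the cache
    if created > 0:
        return "write"  # the prefix was processed and stored for later requests
    return "none"       # caching was not used for this request


# Stand-in for the usage object shown above for the first query.
first = SimpleNamespace(cache_creation_input_tokens=31019, cache_read_input_tokens=0,
                        input_tokens=34, output_tokens=374)
print(cache_status(first))  # write
```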



In [24]:
for idx, query in enumerate(queries[1:]):
  display(HTML(f"<p style='color:red;'>Query n° {idx+1}</p>"))
  messages = build_message_prompt_caching(query, pdf_data)
  print("--------QUERY---------")
  display(HTML(f"<p style='color:fuchsia;'>--------QUERY---------</p>"))
  display(HTML(f"<p style='color:fuchsia;'>{query}</p>"))
  print()
  completion = get_completion(messages)
  display(HTML(f"<p style='color:green;'>--------ANSWER---------</p>"))
  print(completion.content[0].text)
  print()
  display(HTML(f"<p style='color:blue;'>--------TOKENS COUNT---------</p>"))
  display(HTML(f"<p style='color:blue;'>{completion.usage}</p>"))
  # print(completion.usage)
  print()
  print("--------STOP REASON---------")
  print(completion.stop_reason)
  print("\n\n")

--------QUERY---------





Based on the document, here are the key takeaways from BlackRock's Q4 2024 investment strategies:

1. Equity Strategy:
- Focus on quality and breadth in equities
- Taking advantage of broadening earnings prospects in developed markets
- Maintaining high-quality tilt and selectivity amid volatility
- Strong opportunity set seen in Europe
- Continued positive outlook for AI beneficiaries

2. Fixed Income Strategy:
- Recommends locking in income while rates remain elevated
- Favors the belly of the US Treasury curve
- More comfortable extending duration in European bonds
- Preference for EUR credit over USD credit
- Focus on quality in credit markets, though opportunities exist in high yield

3. Alternative Assets/Diversifiers:
- Increased focus on alternative assets for diversification
- Looking to liquid alternatives to capture potential alpha in volatile markets
- Strategic opportunity to increase allocation to private markets
- Gold seen as potential hedge against geopolitical risks
-


--------STOP REASON---------
end_turn





--------QUERY---------





According to the document, BlackRock identifies several key sectors that are expected to benefit from the AI build-out:

1. Utilities - The document notes that utilities are well-positioned to benefit from AI's expansion due to:
- The massive power requirements of AI infrastructure (noting that a data center using a 100k GPU cluster would require a small power plant to run)
- The sector's strong tilt toward public infrastructure needed to support AI development
- Power capacity requirements putting a cap on AI growth

2. Infrastructure - Particularly:
- Public infrastructure that will be needed to support AI development
- Infrastructure required to power the vast AI expansion
- Listed infrastructure exposures that provide liquid access to infrastructure sub-sectors

3. Sustainable Energy - The document mentions sustainable energy exposures are positioned to benefit due to:
- The significant power demands of AI
- Advancing technology
- Declining costs
- Supportive policy environment

Th


--------STOP REASON---------
end_turn





--------QUERY---------





Based on the document, here are the key potential investment impacts of the upcoming US presidential election:

1. Trade Policy Implications:
- Both candidates are likely to pursue additional export controls, especially in advanced technology
- Harris is expected to maintain current tariff policies with potential targeted China tariffs
- Trump proposes more aggressive tariffs (60% on China, 10-20% broad tariffs)
- Increased protectionism under either administration could contribute to structural inflation

2. Market Volatility:
- Election uncertainty could lead to increased market volatility heading into November
- Alpha-seeking strategies could help capitalize on stock-level dispersion during volatile periods
- UK equities are suggested as a potential hedge against trade rhetoric, given their low beta to shifts in global trade

3. Fiscal Policy:
- Neither candidate has prioritized addressing the budget deficit
- Analysis suggests either candidate could pursue fiscally expansionary pol


--------STOP REASON---------
end_turn





--------QUERY---------





Based on the document, BlackRock views trade policies and economic fragmentation as key structural factors that could keep inflation higher over the medium term. Specifically:

1. Trade Policy Impact:
- Both US presidential candidates are likely to pursue additional export controls, especially in advanced technology
- Under Harris, they expect maintenance of status quo with potential for targeted China tariffs
- Under Trump, they note proposed 60% tariffs on China and 10-20% broad tariffs would be "a major escalation"

2. Economic Fragmentation:
- The document states that "Increased protectionism under either administration reinforces geopolitical and economic fragmentation"
- This fragmentation is cited as "one of the structural factors we see keeping inflation higher over the medium term"

3. Related Factors:
- They note reduced legal immigration under either administration could impact labor markets
- The combination of trade restrictions and economic fragmentation appears to contri


--------STOP REASON---------
end_turn





--------QUERY---------





Based on the document, BlackRock suggests several key approaches to mitigate geopolitical risks in portfolios:

1. Gold Exposure:
- They view gold as a potential hedge against geopolitical risk
- Note strong demand and sentiment shift with $7.8B added to gold exposures globally since May
- Suggest using physical gold exposure to bolster portfolio resilience

2. Quality Focus:
- Maintain focus on high-quality exposures at the core of portfolios
- Recommend high-conviction alpha strategies in both ETF and mutual fund formats
- Quality factor remains the most popular factor this year with $24.9B in flows

3. Diversification Through Alternatives:
- Recommend liquid alternatives as an effective hedge against volatility spikes and economic shocks
- Suggest macro alternatives strategies that position long-short across countries
- Look to capture idiosyncratic risk through alternative investments

4. Regional Positioning:
- Favor UK equities as a potential insulator against trade tensions, not


--------STOP REASON---------
end_turn





## **Takeaways**

Here is the completion.usage of the second query in the list (labelled "Query n° 1" in the output, because the second loop restarts its numbering at 1):
```
--------TOKENS COUNT---------

BetaUsage(cache_creation_input_tokens=0, cache_read_input_tokens=31019, input_tokens=38, output_tokens=378)

```



We can see clearly that the cache tokens were used with **cache_read_input_tokens=31019**

**Cost and Number of tokens:**

With prompt caching, the 6 calls cost me \$0.19 in total, i.e. about \$0.0316 per call (vs \$0.10 without prompt caching).

I sent 188,326 tokens across the 6 queries, an average of 31,387 tokens per call, including the PDF and the query.

However, since the PDF tokens were cached, they weren’t processed again when resent, which helped lower the cost.

**For my humble use case, using prompt caching reduced my cost by almost 70%!**
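
The saving can be roughly reconstructed from the usage objects. This is a sketch under stated assumptions: the prices are the Claude 3.5 Sonnet list prices at the time of writing (cache writes billed at \$3.75 and cache reads at \$0.30 per million tokens, against \$3 for regular input and \$15 for output), `usage_cost` is a hypothetical helper, and the five follow-up calls are assumed to have the same usage as the one shown above.

```python
# Assumed list prices for Claude 3.5 Sonnet, in USD per million tokens.
PRICES = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30, "output": 15.00}


def usage_cost(input_tokens, output_tokens, cache_write_tokens=0, cache_read_tokens=0):
    """Cost in USD of one request, given the four token counters of a usage object."""
    total = (input_tokens * PRICES["input"]
             + cache_write_tokens * PRICES["cache_write"]
             + cache_read_tokens * PRICES["cache_read"]
             + output_tokens * PRICES["output"])
    return total / 1_000_000


first_call = usage_cost(34, 374, cache_write_tokens=31019)  # creates the cache
later_call = usage_cost(38, 378, cache_read_tokens=31019)   # reads from the cache
uncached_call = usage_cost(31050, 370)                      # no caching at all

# Assumption: the 5 follow-up calls all look like the one whose usage is shown above.
cached_total = first_call + 5 * later_call
print(f"first call: ${first_call:.4f}")
print(f"later call: ${later_call:.4f}")
print(f"6 cached calls: ${cached_total:.2f} vs 6 uncached calls: ${6 * uncached_call:.2f}")
print(f"saving: {1 - cached_total / (6 * uncached_call):.0%}")
```

Under these assumptions the first call is slightly more expensive than an uncached one, because writing to the cache carries a premium, and the saving comes entirely from the follow-up calls. The estimate lands close to the billed figures above, with small differences due to rounding and varying output lengths.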