Description
Problem Statement
Just like the existing cache point arguments for system prompts and tools (cache_prompt and cache_tools), it should be fairly easy to add a similar option, cache_messages. Looking at the Bedrock model invocation logs, I see this as a big opportunity for significant cost (and latency!) improvements. It would also increase the attractiveness of Bedrock as a model provider.
Proposed Solution
When cache_messages="default" is passed to the constructor for BedrockModel, the format request method, instead of just passing messages through as they are received, iterates over them and ensures a {"cachePoint": {"type": self.config["cache_messages"]}} snippet is added after each assistant message, similar to the unpacking done for tools.
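The idea above could be sketched roughly as follows. This is illustrative only: the real BedrockModel.format_request looks different, and cache_messages is the hypothetical config key proposed in this issue, not an existing SDK option.

```python
from typing import Any


def insert_message_cache_points(
    messages: list[dict[str, Any]], cache_type: str
) -> list[dict[str, Any]]:
    """Append a cachePoint content block after each assistant message.

    The Converse API expects cachePoint as a content block inside the
    message's "content" list, so we append it there rather than adding
    a separate top-level entry.
    """
    formatted: list[dict[str, Any]] = []
    for message in messages:
        message = dict(message)  # shallow copy so the caller's list is untouched
        if message.get("role") == "assistant":
            message["content"] = list(message.get("content", [])) + [
                {"cachePoint": {"type": cache_type}}
            ]
        formatted.append(message)
    return formatted


msgs = [
    {"role": "user", "content": [{"text": "hi"}]},
    {"role": "assistant", "content": [{"text": "hello"}]},
]
out = insert_message_cache_points(msgs, "default")
# out[1]["content"] now ends with {"cachePoint": {"type": "default"}}
```

One caveat: Bedrock limits the number of cache checkpoints per request, so a real implementation would probably add the cachePoint only after the most recent assistant message (or the last few) rather than after every one.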
Use Case
When using the Bedrock model provider, this saves both time and money! Even with current usage of both cache_tools and cache_prompt, around 80% of input token usage still misses the cache in the last few requests of a long-running agent. Yet the previous messages are mostly the same: each request adds only 2 new messages, with the rest remaining identical and perfect for caching!
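A toy calculation shows why this pays off. Under the simplifying assumption that each turn appends 2 messages (one user, one assistant) and resends all earlier ones verbatim, the repeated fraction grows quickly with conversation length:

```python
def cacheable_fraction(turn: int, messages_per_turn: int = 2) -> float:
    """Fraction of the conversation already sent verbatim in the prior request.

    Toy model: every request appends `messages_per_turn` messages and
    resends all earlier ones unchanged.
    """
    total = turn * messages_per_turn
    repeated = (turn - 1) * messages_per_turn
    return repeated / total


# By turn 10, 18 of the 20 messages (90%) were already sent verbatim
# in the previous request and are, in principle, cacheable.
print(cacheable_fraction(10))
```

This is a message-count approximation, not a token count, but it matches the pattern observed in the invocation logs: the longer the agent runs, the larger the share of each request that could be served from the cache.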
Alternative Solutions
No response
Additional Context
sdk-python/src/strands/models/bedrock.py (line 190 at commit df7c327):
"messages": messages,
https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html