Description
Problem Statement
Developers currently face cost and latency challenges when building applications that require sequential LLM operations on the same context.
When an agent processes a large amount of data and then needs to perform an additional operation such as structured output formatting, the SDK forces a complete reprocessing of the entire message history. For example, if an agent analyzes 100,000 tokens of content and then needs to generate a structured output, those same 100,000 tokens must be processed again, doubling token consumption and the associated cost.
This makes it difficult or impossible to build cost-effective applications that require both long-running analysis and structured outputs, as there is no built-in way to cache or efficiently reuse the already-processed context.
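A minimal sketch of the pattern that triggers the double cost, assuming the Strands `Agent` API with `structured_output` (the `Summary` model and prompts are illustrative):

```python
from pydantic import BaseModel
from strands import Agent


class Summary(BaseModel):
    """Structured result extracted from the analysis."""
    title: str
    key_findings: list[str]


agent = Agent()

# First pass: the agent reads and analyzes a large document,
# so e.g. ~100k tokens of context accumulate in the message history.
agent("Analyze the attached report and discuss the key findings.")

# Second pass: structured output re-sends the entire accumulated
# message history, so the same ~100k tokens are billed again.
result = agent.structured_output(Summary, "Summarize your findings.")
```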
Message caching through CachePoint is already supported, but it only covers cache points placed in the messages before a run starts. If an agent generates a large number of tokens during its execution, there is no easy way to set cache points within the run. It is technically possible with hooks (see the sketch below), but this is a common scenario that the SDK should make easier.
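A rough sketch of the hooks workaround, assuming the `MessageAddedEvent` hook and Bedrock Converse-style `cachePoint` content blocks; the `CachePointInjector` class and the character-count threshold are illustrative, not a vetted implementation:

```python
from typing import Any

from strands import Agent
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent


class CachePointInjector(HookProvider):
    """Appends a cache point after large assistant messages so the
    accumulated context can be reused by subsequent model calls."""

    def __init__(self, min_chars: int = 50_000):
        # Rough threshold; a real implementation would count tokens.
        self.min_chars = min_chars

    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        registry.add_callback(MessageAddedEvent, self.add_cache_point)

    def add_cache_point(self, event: MessageAddedEvent) -> None:
        message = event.message
        if message["role"] != "assistant":
            return
        text_len = sum(
            len(block.get("text", "")) for block in message["content"]
        )
        if text_len >= self.min_chars:
            # Bedrock Converse-style cache point content block.
            message["content"].append({"cachePoint": {"type": "default"}})


agent = Agent(hooks=[CachePointInjector()])
```

This works, but it requires every application to reimplement the same boilerplate; the point of this request is that the SDK could offer it out of the box.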
Proposed Solution
No response
Use Case
This is useful when structured output is needed after an agent has generated many tokens during its run.
Alternative Solutions
No response
Additional Context
No response