Description
Many LLM providers support adding cache markers that identify repeated content in the input.
It is primarily a way to save on token usage costs and speed up responses.
For agents with large contexts this is critical; without it, the costs are too high.
Caching is done by the provider; the client only marks which messages can be cached.
Anthropic has a good article on it:
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
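As a quick illustration of what such a marker looks like, here's a minimal sketch of an Anthropic request that marks the system prompt as cacheable (the model name and prompt are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "..."  # large, stable instructions worth caching

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to this block becomes a cacheable prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the project status."}],
)

# Cache effectiveness shows up in the usage fields, e.g.
# cache_creation_input_tokens and cache_read_input_tokens.
print(response.usage)
```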
Gemini appears to do it automatically.
OpenAI also does it automatically for longer prompts, but it can be explicitly configured too; see:
https://platform.openai.com/docs/guides/prompt-caching
Implementation-wise, this would be an input processor that adds these markers, but I'm not sure whether it should be exposed in the Generic Messages API.
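A rough sketch of what such a processor could look like, assuming a plain list-of-dicts message representation; the class and the generic `cache` flag are hypothetical names, not part of the existing Generic Messages API:

```python
from typing import Any


class CacheMarkerProcessor:
    """Hypothetical input processor that tags messages as cacheable."""

    def process(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        marked = [dict(m) for m in messages]
        # Mark the end of the stable prefix (everything before the newest
        # user turn) so the provider can reuse it across requests.
        if len(marked) > 1:
            marked[-2]["cache"] = True
        # A provider-specific client would translate this generic flag into
        # the vendor format (e.g. Anthropic's cache_control block), while
        # providers that cache automatically (OpenAI, Gemini) can ignore it.
        return marked
```

Keeping the flag provider-agnostic like this would let each client decide how (or whether) to honor it, which is one way to avoid leaking vendor-specific details into the Generic Messages API.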