
[Platform] Support for configuring prompt caching #337

@aszenz

Description


Many LLM providers support adding cache markers that are used to identify repeated messages in the input.

It is primarily a way to save on token usage costs and accelerate responses.

For agents with large contexts this is critical, since without it the costs are too high.

Caching is done by the provider; the client only marks which messages can be cached.

Anthropic has a good article on it:

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
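
For reference, a minimal sketch (Python, using the anthropic SDK; the model name and context string are placeholders) of what the marker looks like on Anthropic's side: a `cache_control` entry on the content block up to which the prefix should be cached.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for the large, repeated part of the prompt (caching requires a
# reasonably long prefix, roughly 1024+ tokens for most models).
large_shared_context = "...long tool definitions, documents, or instructions..."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model name
    max_tokens=256,
    system=[
        {
            "type": "text",
            "text": large_shared_context,
            # The marker: everything up to and including this block may be cached.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the context above."}],
)

# usage reports cache activity, e.g. cache_creation_input_tokens / cache_read_input_tokens.
print(response.usage)
```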

Gemini seems to do it automatically as per:

https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview#:~:text=By%20default%2C%20Google%20automatically%20caches,accelerate%20responses%20for%20subsequent%20prompts.

OpenAI also does it automatically for longer prompts, but it can be configured explicitly too; see:

https://platform.openai.com/docs/guides/prompt-caching
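
A small sketch (Python, openai SDK; model and prompts are placeholders) of what the automatic behaviour looks like from the client side: no marker is sent, and the response usage reports how much of the prompt was served from cache.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder for a long prefix that is repeated verbatim across requests;
# OpenAI only caches prompts above roughly 1024 tokens.
shared_instructions = "...long system instructions repeated on every request..."

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": shared_instructions},
        {"role": "user", "content": "Answer using the instructions above."},
    ],
)

# On a cache hit the usage details report the cached portion of the prompt.
details = resp.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```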

Implementation-wise it's an input processor that adds these markers, but I'm not sure whether this should be exposed in the Generic Messages API. A rough sketch of the idea follows.
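
To make the idea concrete, here is a rough sketch in Python (names like `CacheMarkingProcessor` and the metadata flag are hypothetical, not the platform's actual API): the processor tags messages with a provider-agnostic flag, and each provider adapter either translates it (Anthropic's `cache_control`) or ignores it (OpenAI/Gemini, which cache automatically).

```python
# Hypothetical sketch of an input processor that adds cache markers; all class
# and attribute names here are illustrative, not the platform's actual API.
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str
    metadata: dict = field(default_factory=dict)


class CacheMarkingProcessor:
    """Marks the system prompt as cacheable before the request is serialized.

    The flag is provider-agnostic; each provider adapter decides how to map it:
    Anthropic -> cache_control {"type": "ephemeral"}, OpenAI/Gemini -> no-op,
    since they cache long prefixes automatically.
    """

    def process(self, messages: list[Message]) -> list[Message]:
        for message in messages:
            if message.role == "system":
                message.metadata["cache"] = True  # adapter translates or ignores this
        return messages
```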


Labels

RFC = Request For Comments (proposals about features that you want to be discussed)
