Description
Many LLM providers support adding cache markers that identify repeated content in the input.
It is primarily a way to save on token usage costs and speed up responses.
For agents with large contexts this is critical; without it, the costs are too high.
Caching is done by the provider; the client only marks which messages can be cached.
Anthropic has a good article on it:
https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
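As a quick illustration of what such a marker looks like, here's a minimal sketch of an Anthropic request that marks the system prompt as cacheable (the model name and prompt are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "..."  # large, stable instructions worth caching

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to this block becomes a cacheable prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the project status."}],
)

# Cache effectiveness shows up in the usage fields, e.g.
# cache_creation_input_tokens and cache_read_input_tokens.
print(response.usage)
```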
Gemini appears to do it automatically.
OpenAI also does it automatically for longer prompts, but it can be explicitly configured too; see:
https://platform.openai.com/docs/guides/prompt-caching
Implementation-wise, this would be an input processor that adds these markers, but I'm not sure whether it should be exposed in the Generic Messages API.
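A rough sketch of what such a processor could look like, assuming a plain list-of-dicts message representation; the class and the generic `cache` flag are hypothetical names, not part of the existing Generic Messages API:

```python
from typing import Any


class CacheMarkerProcessor:
    """Hypothetical input processor that tags messages as cacheable."""

    def process(self, messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
        marked = [dict(m) for m in messages]
        # Mark the end of the stable prefix (everything before the newest
        # user turn) so the provider can reuse it across requests.
        if len(marked) > 1:
            marked[-2]["cache"] = True
        # A provider-specific client would translate this generic flag into
        # the vendor format (e.g. Anthropic's cache_control block), while
        # providers that cache automatically (OpenAI, Gemini) can ignore it.
        return marked
```

Keeping the flag provider-agnostic like this would let each client decide how (or whether) to honor it, which is one way to avoid leaking vendor-specific details into the Generic Messages API.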