I'm trying to understand how/when LLM calls get cached, especially when using the OpenAI API.
I've looked in the docs, but can't find details.
Ideally, in development, I'd like to be able to cache/memoize calls to the API. For example, suppose an LMQL program requests multiple completions, and one changes the later part of the program but leaves the early part unchanged. In that case, it seems like the early requests to the API could be served from a cache. This is especially true when passing a seed, which is now supported by the API.
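To illustrate what I mean, here is a minimal sketch of the kind of memoization I have in mind, outside of LMQL. The cache file name and the `cached_completion` helper are hypothetical; the `seed` parameter of the OpenAI chat completions API is real, though determinism is only best-effort:

```python
import hashlib
import json
import os
import pickle

from openai import OpenAI

CACHE_FILE = "api_cache.pkl"  # hypothetical local cache path


def cached_completion(client: OpenAI, **request):
    """Memoize chat completions on the exact request payload.

    Only reasonable when the request is (best-effort) deterministic,
    e.g. temperature=0 or a fixed `seed` is passed.
    """
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
    # Key on the canonicalized request so identical calls hit the cache.
    key = hashlib.sha256(
        json.dumps(request, sort_keys=True).encode()
    ).hexdigest()
    if key not in cache:
        cache[key] = client.chat.completions.create(**request)
        with open(CACHE_FILE, "wb") as f:
            pickle.dump(cache, f)
    return cache[key]


client = OpenAI()
resp = cached_completion(
    client,
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    seed=42,  # fixed seed makes repeated requests comparable
)
```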
This is in fact how caching is implemented for OpenAI models: requests with an unchanged prefix can be served from the cache. However, with the `sample` decoder I think caching does not apply, since sampling is typically not seeded; with the new `seed` parameter, it could be adapted accordingly.
To enable caching across multiple invocations of the same query, make sure to pass a `cache="tokens.pkl"` parameter.
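For reference, a minimal sketch of how the `cache` parameter might be used, assuming it is accepted as a keyword argument of the `@lmql.query` decorator (the query itself and the file name are just examples):

```python
import lmql

# Assumption: cache="tokens.pkl" persists the token-level cache to disk,
# so unchanged request prefixes are reused across runs.
@lmql.query(cache="tokens.pkl")
def summarize(text):
    '''lmql
    "Summarize: {text}\n"
    "[SUMMARY]"
    '''

# Re-running this with an unchanged early part of the program should
# serve the early API requests from the cache instead of re-requesting.
print(summarize("LMQL caches LLM calls at the token level."))
```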