[Platform] Introduce CachedPlatform
#416
Conversation
Let's zoom a bit out here, for two reasons:
Ollama does context caching and/or K/V caching: it stores the X latest messages for the model window (or pending tokens to speed up TTFT). It's not a cache that returns the generated response if the request already exists.
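To make the distinction concrete, here is a minimal sketch (in Python, purely illustrative; the actual project is PHP and the names here are hypothetical) of the kind of response cache being discussed: it keys on the full request payload and returns the stored generation for identical requests, unlike K/V caching, which only reuses attention state to speed up time-to-first-token.

```python
import hashlib
import json

class ResponseCache:
    """Hypothetical sketch: return a stored generation for an identical request."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Deterministic key over the whole request payload.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, model: str, messages: list):
        return self._store.get(self.key(model, messages))

    def put(self, model: str, messages: list, response: str) -> None:
        self._store[self.key(model, messages)] = response
```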
Well, because that's the one that I use the most and the easiest to implement first, but we can integrate it for every platform if that's the question; we just need to use the API contract, and both Anthropic and OpenAI already do it natively 🤔 If the question is: could we implement it at the platform layer for every platform without relying on API calls, well, that's not a big deal to be honest and we could easily integrate it 🙂
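The "platform layer" option mentioned above can be sketched as a decorator that wraps any platform and short-circuits on cache hits, without relying on provider-side caching APIs. This is a hedged illustration only: the real project is PHP, and the names `Platform`, `CachedPlatform`, and `invoke` are assumptions, not the actual API of this PR.

```python
class Platform:
    """Hypothetical platform interface; real contract lives in the PHP project."""

    def invoke(self, request: str) -> str:
        raise NotImplementedError

class CachedPlatform(Platform):
    """Decorator: delegate to the inner platform, caching responses by request."""

    def __init__(self, inner: Platform):
        self.inner = inner
        self.cache: dict[str, str] = {}

    def invoke(self, request: str) -> str:
        if request not in self.cache:
            # Cache miss: call the wrapped platform and store the result.
            self.cache[request] = self.inner.invoke(request)
        return self.cache[request]
```

A decorator like this works uniformly for every platform that honors the contract, which is why no per-provider API support is needed.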
chr-hertel
left a comment
There are leftovers in the bundle config of Ollama and the OllamaResultConverter - see previous comments - those changes are not relevant anymore, right?
IMHO, the token usage must be added in a separate PR; it requires updating existing platforms to send the information in the result. Out of scope for this PR IMO.
@junaidbinfarooq I created an issue for the
OskarStark
left a comment
After my comments
Thank you @Guikingone.
Hi 👋🏻
This PR aims to introduce a caching layer for the Ollama platform (like OpenAI, Anthropic, and others already do).