Default Keep Alive environment variable #3094
Does this take effect on the /v1/chat/completions endpoint?
--------- Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
@uxfion This is an environment variable for the server, so it will work independently of the OpenAI endpoints (i.e. it will unload the model at whatever time you give it, regardless of how the client is accessing it).
I tested it and found that it works very well. Thank you so much for your amazing work 👍👍! We really love ollama 😻
This is nice to have, thanks. I'm trying it out now. However, it looks like the client can still override the server. One feature of @Chris-AS1's solution that I liked was that, as a server admin, I could set a value that overrode client-provided keepalives, preventing one person on a team from unloading the model for everyone else. Do you think the server value should override the client value?
Thanks so much. This will be very useful. I agree with @oxaronick that the client should not be able to override the server setting for keep_alive.
The pushed solution should probably use a variable named something like OLLAMA_KEEP_ALIVE_DEFAULT … since it is only consulted if the JSON req.keep_alive is null. That currently does not allow the server to override a client-provided value. It would be great to have a variable for overriding as well.
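To make the precedence being discussed concrete, here is a minimal sketch in Python. It is not Ollama's code: the function name `effective_keep_alive` and the `server_override` parameter are hypothetical, and the override branch only illustrates the behaviour requested in this thread, which the pushed change does not provide.

```python
from typing import Optional


def effective_keep_alive(
    request_value: Optional[str],
    server_default: Optional[str],
    server_override: Optional[str] = None,
) -> Optional[str]:
    """Pick which keep-alive value applies to a single request.

    request_value   -- keep_alive from the JSON request, or None if absent
    server_default  -- value of the server's OLLAMA_KEEP_ALIVE, or None if unset
    server_override -- hypothetical admin-enforced value (not part of this PR)
    """
    # Hypothetical: an admin-enforced value would win over everything.
    if server_override is not None:
        return server_override
    # Behaviour described in this thread: the client value wins when present.
    if request_value is not None:
        return request_value
    # The environment variable is only consulted as a fallback.
    return server_default
```

With only the default set, a client sending `keep_alive: "0"` still gets `"0"`; only the hypothetical override would prevent that.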
This change adds a new environment variable called `OLLAMA_KEEP_ALIVE` which sets how long a model will be loaded into memory. It uses the same semantics as the `keep_alive` parameter in the `generate`, `chat`, and `embeddings` API calls, namely:

You can use either a value in seconds (e.g. `OLLAMA_KEEP_ALIVE=60` for 60 seconds) or a duration string (e.g. `OLLAMA_KEEP_ALIVE=10m`).

This change works with both the API and the REPL.
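The two accepted forms can be illustrated with a small Python sketch. This is not the server's actual parser: `parse_keep_alive` is a hypothetical helper, the supported unit set (`ms`, `s`, `m`, `h`) is an assumption, and negative values are deliberately not handled here.

```python
import re
from datetime import timedelta

# Assumed unit suffixes for duration strings, expressed in seconds.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_keep_alive(value: str) -> timedelta:
    """Parse '60' (seconds) or a duration string such as '10m' or '1h30m'."""
    text = value.strip()

    # Form 1: a bare integer is interpreted as a number of seconds.
    if re.fullmatch(r"\d+", text):
        return timedelta(seconds=int(text))

    # Form 2: one or more <number><unit> tokens, consumed left to right.
    total_seconds = 0.0
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid keep-alive value: {value!r}")
        total_seconds += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()

    if pos == 0:  # empty input: nothing was parsed
        raise ValueError(f"invalid keep-alive value: {value!r}")
    return timedelta(seconds=total_seconds)
```

For example, `parse_keep_alive("60")` and `parse_keep_alive("1m")` both yield a one-minute duration, so a server process could read `os.environ.get("OLLAMA_KEEP_ALIVE")` and pass the string through a parser of this kind.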
Fixes #2508