
Default Keep Alive environment variable #3094

Merged: 4 commits from pdevine/defaultkeepalive into main on Mar 13, 2024

Conversation

@pdevine (Contributor) commented Mar 13, 2024

This change adds a new environment variable called OLLAMA_KEEP_ALIVE which sets how long a model stays loaded in memory. It uses the same semantics as the keep_alive parameter in the generate, chat, and embeddings API calls, namely:

  • if set to a positive value, the model stays loaded for that duration after each request
  • if set to zero, the model unloads immediately after generation
  • if set to a negative value, the model remains in memory indefinitely

You can either use a value in seconds (e.g. OLLAMA_KEEP_ALIVE=60 for 60 seconds) or a duration string (e.g. OLLAMA_KEEP_ALIVE=10m).

This change works with both the API and the REPL.

Fixes #2508
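
As a rough illustration of those semantics and value formats, here is a minimal Go sketch; this is not the code in this PR, and the helper name parseKeepAlive is made up:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// parseKeepAlive interprets a keep-alive setting the way the PR
// describes: a bare integer is a number of seconds, anything else
// is parsed as a Go duration string such as "10m".
func parseKeepAlive(v string) (time.Duration, error) {
	if secs, err := strconv.Atoi(v); err == nil {
		return time.Duration(secs) * time.Second, nil
	}
	return time.ParseDuration(v)
}

func main() {
	d, err := parseKeepAlive(os.Getenv("OLLAMA_KEEP_ALIVE"))
	if err != nil {
		fmt.Println("invalid OLLAMA_KEEP_ALIVE:", err)
		return
	}
	switch {
	case d < 0:
		fmt.Println("keep the model in memory indefinitely")
	case d == 0:
		fmt.Println("unload immediately after generation")
	default:
		fmt.Printf("unload after %s of inactivity\n", d)
	}
}
```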

@pdevine merged commit 47cfe58 into main on Mar 13, 2024
12 checks passed
@pdevine deleted the pdevine/defaultkeepalive branch on March 13, 2024 at 20:29
@uxfion commented Mar 14, 2024

Is this effective on the /v1/chat/completions endpoint?

samyfodil pushed a commit to ollama-cloud/ollama-as-wasm-plugin that referenced this pull request Mar 14, 2024
@pdevine (Contributor, Author) commented Mar 14, 2024

@uxfion This is an environment variable for the server, so it works independently of the OpenAI endpoints (i.e. it will unload the model after whatever time you give it, regardless of how the client accesses it).

@uxfion commented Mar 15, 2024

> @uxfion This is an environment variable for the server, so it works independently of the OpenAI endpoints (i.e. it will unload the model after whatever time you give it, regardless of how the client accesses it).

I mean, when I set OLLAMA_KEEP_ALIVE=-1 and then send requests through the /v1/chat/completions endpoint, it should always keep the model in memory, right?

I tested it and found that it works very well. Thank you so much for your amazing work👍👍! We really love ollama😻
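
For anyone wanting to reproduce that check, here is a hedged Go sketch; the model name "llama2" and the use of ollama's default port 11434 are assumptions:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Chat request against the OpenAI-compatible endpoint.
	body := []byte(`{"model": "llama2", "messages": [{"role": "user", "content": "hi"}]}`)
	resp, err := http.Post("http://localhost:11434/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
	// With OLLAMA_KEEP_ALIVE=-1 set on the server, the model should
	// still be resident after this request returns.
}
```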

@oxaronick commented Mar 19, 2024

This is nice to have, thanks. I'm trying it out now.

However, it looks like the client can still override the server. One feature of @Chris-AS1's solution that I liked was that, as a server admin, I could set a value that overrode client-provided keep_alive values, preventing one person on a team from unloading the model for everyone else.

Do you think the server value should override the client value?

@saffatbokul commented

Thanks so much. This will be very useful.

I agree with @oxaronick that the client should not be able to override the server setting for keep_alive.

byebyebruce pushed a commit to byebyebruce/ollama that referenced this pull request Mar 26, 2024
Adphi pushed a commit to Adphi/ollama that referenced this pull request Mar 30, 2024
@BananaAcid commented Apr 5, 2024

The merged solution should probably use a variable named something like OLLAMA_KEEP_ALIVE_DEFAULT, since it is only consulted when the JSON req.keep_alive is null. That means the server currently cannot override a client-supplied value. It would be great to have a variable for overriding as well.
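
A minimal Go sketch of the precedence being described; this is not ollama's actual code, and OLLAMA_KEEP_ALIVE_OVERRIDE is hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// effectiveKeepAlive sketches the precedence: a hypothetical admin
// override would beat the client, while the merged behavior only
// consults OLLAMA_KEEP_ALIVE when the request's keep_alive is null.
// Bare-seconds values (e.g. "60") are omitted here for brevity.
func effectiveKeepAlive(reqKeepAlive *time.Duration) time.Duration {
	// Hypothetical admin override: always wins, even over the client.
	if v := os.Getenv("OLLAMA_KEEP_ALIVE_OVERRIDE"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	// Merged behavior: the client's keep_alive, when present, wins.
	if reqKeepAlive != nil {
		return *reqKeepAlive
	}
	// Otherwise fall back to the environment default...
	if v := os.Getenv("OLLAMA_KEEP_ALIVE"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	// ...and finally to ollama's built-in five-minute default.
	return 5 * time.Minute
}

func main() {
	fmt.Println(effectiveKeepAlive(nil))
}
```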

zhewang1-intc pushed a commit to zhewang1-intc/ollama that referenced this pull request May 13, 2024
Merging this pull request closed: OLLAMA_KEEP_ALIVE ENV feature
8 participants