Support for max_loaded_maps and num_parallel variables/parameters #11225
Comments
Hi @jars101 ,
Thank you @sgwhat. Recent versions of Ollama do support both OLLAMA_MAX_LOADED_MAPS and OLLAMA_NUM_PARALLEL on Linux and Windows. However, running Ollama through cuda (ipex-llm) does not seem to keep the settings: for every request on the same model, it reloads the model into memory. This behaviour does not occur with standalone Ollama on Windows.

llm-cpp log snippet:

(base) C:\Users\Admin>conda activate llm-cpp
(llm-cpp) C:\Users\Admin>ollama serve
My bad: OLLAMA_NUM_PARALLEL does work, but OLLAMA_MAX_LOADED_MAPS does not. I went ahead and deployed a fresh installation of llm-cpp + Ollama, and I can now make use of both variables. ollama ps is available as well. The only problem I see when setting OLLAMA_KEEP_ALIVE to 600 seconds, for instance, is the following error:
Further, after removing OLLAMA_KEEP_ALIVE and letting it default to 5 minutes, I'm observing the same issue.
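As a debugging aid for the keep-alive behaviour: besides the OLLAMA_KEEP_ALIVE environment variable, the Ollama REST API accepts a per-request keep_alive field on /api/generate, which can help isolate whether the environment variable or the backend is being ignored. A minimal sketch; the model name and duration are illustrative, and the actual curl call is left commented out so the snippet stands alone:

```shell
# Build the request body; "llama3" and "10m" are illustrative values.
payload='{"model":"llama3","prompt":"hello","keep_alive":"10m"}'

# Against a running server you would send it like this:
# curl http://localhost:11434/api/generate -d "$payload"
echo "$payload"
```

If the model stays resident after such a request but unloads when only the env var is set, that points at the server not reading the variable rather than at the backend.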
Could you please share the output of
It does not seem that Ollama running on ipex-llm supports the recent max_loaded_maps and num_parallel variables/parameters. Are they supported in the current Ollama version under llama-cpp? How does one enable them? Thanks.
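For reference, on a stock Ollama install these are set as environment variables before starting the server; the documented names are OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS (note: "MODELS", not "MAPS"). Whether the ipex-llm build honors them is exactly what this issue is about, so treat this as a sketch of the standard setup, not a confirmed fix:

```shell
# Standard Ollama server settings (Linux/macOS shell; in Windows cmd use
# "set OLLAMA_NUM_PARALLEL=2" etc. in the same session before "ollama serve").
export OLLAMA_NUM_PARALLEL=2        # concurrent requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=2   # models kept resident at once
export OLLAMA_KEEP_ALIVE=600        # how long a model stays loaded (seconds)

# The server must be started in the same environment so it inherits these:
# ollama serve
echo "$OLLAMA_NUM_PARALLEL $OLLAMA_MAX_LOADED_MODELS $OLLAMA_KEEP_ALIVE"
```

The variables are read once at server startup, so they must be exported in the shell (or conda environment) that launches `ollama serve`, not in the shell running the client.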