Preload/Unload Ollama models before prompting #116
Comments
So the problem is to know when the preloading process has finished, right? An empty request starts the preload... Could it be that this request is completed once the model is finished loading?
Would say yes. If the model answers, it is loaded.
@bauersimon @zimmski updated the description on "check if model is loaded"
I meant that I believe the API only completes the request once the model is loaded (I think it's happening here). So there is no need for the artificial "respond with y" query. Easy to verify with a server, curl, and two differently sized models: if the response to an empty request is consistently slower for the bigger model, the response must happen only when the loading is done.
Checked again and now I am 100% certain we don't need to send a dummy prompt, as the API also responds to an empty request only after the model is loaded. The Ollama API has a response property for this. I also tried this with some curl requests: an empty request that triggers loading takes noticeably longer than asking an already loaded model.
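The timing check described above can be reproduced with a short script. This is a minimal sketch, assuming the default Ollama endpoint `http://localhost:11434` and the public `/api/generate` route, where a request carrying only a model name (no `prompt`) makes Ollama load that model and reply once loading has finished:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama endpoint (assumption)


def preload_payload(model: str) -> dict:
    # A generate request without a "prompt" field asks Ollama to load the
    # model; the HTTP response only arrives once loading has finished.
    return {"model": model}


def preload(model: str) -> None:
    """Blocks until the given model is loaded, per the observation above."""
    data = json.dumps(preload_payload(model)).encode()
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request).close()
```

Timing `preload` for two differently sized models is the experiment proposed above: the bigger model's empty request should consistently take longer to return.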
…e we only measure the raw inference time Closes #116
@Munsio / @ruiAzevedo19 since #121 is merged, is this issue done? Or did you encounter any other things we need to take a look at?
For better measurements we need to preload the Ollama model before prompting it. We also need to clean up afterwards.
Tasks:
- Use `ollama ps` to check all loaded models
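A matching sketch for the cleanup step, assuming Ollama's documented `keep_alive` request parameter (setting it to `0` unloads the model immediately) and the `ollama ps` CLI command for listing loaded models; the endpoint and model name are placeholders:

```python
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama endpoint (assumption)


def unload_payload(model: str) -> dict:
    # "keep_alive": 0 asks Ollama to unload the model right after
    # handling this prompt-less request.
    return {"model": model, "keep_alive": 0}


def unload(model: str) -> None:
    """Ask the Ollama server to unload the model from memory."""
    data = json.dumps(unload_payload(model)).encode()
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request).close()


def loaded_models() -> str:
    # "ollama ps" prints the currently loaded models, which lets the
    # benchmark verify that cleanup actually happened.
    result = subprocess.run(["ollama", "ps"], capture_output=True, text=True)
    return result.stdout
```

Checking `loaded_models()` after `unload(...)` is one way to confirm the cleanup task from the list above.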