Preload/Unload Ollama models before prompting #116

Closed · 4 tasks done
Munsio opened this issue May 15, 2024 · 6 comments
Labels: enhancement (New feature or request)
Milestone: v0.5.0

Munsio (Contributor) commented May 15, 2024

For better measurements we need to preload the Ollama model before prompting it, and we also need to clean up afterwards (see the API sketch below).

Tasks:
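
For context, the Ollama HTTP API supports both operations directly: a /api/generate request with only a model name preloads the model, and setting "keep_alive": 0 unloads it immediately. A minimal sketch in Go; the default server address and the qwen model tag are placeholders for whatever is running locally:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const ollamaGenerateURL = "http://localhost:11434/api/generate" // default Ollama address

// request sends a minimal /api/generate payload and waits for the response.
func request(payload string) error {
	resp, err := http.Post(ollamaGenerateURL, "application/json", bytes.NewReader([]byte(payload)))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// preload loads a model into memory by sending a request without a prompt.
func preload(model string) error {
	return request(fmt.Sprintf(`{"model": %q}`, model))
}

// unload evicts a model immediately by setting "keep_alive" to 0.
func unload(model string) error {
	return request(fmt.Sprintf(`{"model": %q, "keep_alive": 0}`, model))
}

func main() {
	if err := preload("qwen:0.5b"); err != nil {
		panic(err)
	}
	// ... run the benchmark prompts here, measuring only inference ...
	if err := unload("qwen:0.5b"); err != nil {
		panic(err)
	}
}
```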

Munsio added the enhancement (New feature or request) label May 15, 2024
Munsio added this to the v0.5.0 milestone May 15, 2024
Munsio self-assigned this May 15, 2024
bauersimon (Member) commented May 15, 2024

So the problem is to know when the preloading process has finished, right? An empty request starts the preload... Could it be that this request is completed once the model is finished loading?

zimmski (Member) commented May 15, 2024

> So the problem is to know when the preloading process has finished, right? An empty request starts the preload... Could it be that this request is completed once the model is finished loading?

I would say yes. If the model answers, it is loaded.

Munsio (Contributor, Author) commented May 15, 2024

@bauersimon @zimmski I updated the description regarding "check if model is loaded".

bauersimon (Member) commented May 15, 2024

I meant that I believe the API only completes the request once the model is loaded (I think it's happening here). So there is no need for the artificial "respond with y" query. This is easy to verify with a server, curl, and two differently sized models: if the response to an empty request is consistently slower for the bigger model, the response must only happen once loading is done.
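
A sketch of that verification in Go, assuming a local Ollama server with both qwen tags already pulled (timings are wall-clock around the empty request):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// timeEmptyRequest measures how long an empty (prompt-less) /api/generate
// request takes; on a cold start this should be dominated by model loading.
func timeEmptyRequest(model string) (time.Duration, error) {
	payload := []byte(fmt.Sprintf(`{"model": %q}`, model))
	start := time.Now()
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	return time.Since(start), nil
}

func main() {
	// If the bigger model is consistently slower here, the response can
	// only be sent once loading has finished.
	for _, model := range []string{"qwen:0.5b", "qwen:4b"} {
		d, err := timeEmptyRequest(model)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %v\n", model, d)
	}
}
```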

bauersimon (Member) commented:

Checked again, and now I am 100% certain we don't need to send a dummy prompt, as the API also responds to an empty request only after the model is loaded. The Ollama API has a response property load_duration, and while that property is not returned on an empty request, we can see here how it is computed. And indeed, the checkpointLoaded := time.Now() happens directly after the special case where the empty prompt request is handled. Hence, when the empty prompt request is answered, the model is already loaded.

I also tried this with some curl requests: with qwen:0.5b the empty request always took ~2000 ms (plus/minus a few ms) to complete, and with qwen:4b it always took ~2300 ms (plus/minus a few ms).

In comparison, asking qwen:0.5b to "respond with y" took 4000 ms, so double the time... and probably even worse for larger models.
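
As a cross-check, the load_duration property mentioned above can be read from a normal (non-empty) request; a sketch assuming the standard non-streaming /api/generate response, where durations are reported in nanoseconds:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// generateResponse picks out the timing field from Ollama's /api/generate
// response; load_duration is reported in nanoseconds.
type generateResponse struct {
	LoadDuration int64 `json:"load_duration"`
}

func main() {
	// "stream": false makes the server answer with a single JSON object.
	payload := []byte(`{"model": "qwen:0.5b", "prompt": "respond with y", "stream": false}`)
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var r generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		panic(err)
	}
	fmt.Println("load_duration:", time.Duration(r.LoadDuration))
}
```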

bauersimon added a commit that referenced this issue May 16, 2024
…e we only measure the raw inference time

Closes #116
bauersimon self-assigned this May 16, 2024
bauersimon added further commits that referenced this issue May 16 and May 17, 2024
Munsio pushed three commits that referenced this issue May 23, 2024
bauersimon (Member) commented:

@Munsio / @ruiAzevedo19 since #121 is merged, is this issue done? Or did you encounter any other things we need to take a look at?

Munsio closed this as completed May 27, 2024
Munsio pushed a commit that referenced this issue Jun 3, 2024