Handle models with divergent layer sizes #5117

Merged 1 commit on Jun 18, 2024

dhiltgen (Collaborator) commented Jun 18, 2024

The recent refactoring of the memory prediction assumed that all layers are the same size. For some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off: in the logs below, memory.required.full goes from 2.4 GiB without the fix to 9.2 GiB with it.

Without the fix:

time=2024-06-18T11:03:42.708-07:00 level=INFO source=memory.go:303 msg="offload to metal" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[96.0 GiB]" memory.required.full="2.4 GiB" memory.required.partial="2.4 GiB" memory.required.kv="432.0 MiB" memory.required.allocations="[2.4 GiB]" memory.weights.total="1.6 GiB" memory.weights.repeating="1.4 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="72.0 MiB" memory.graph.partial="72.0 MiB"

With the fix:

time=2024-06-18T11:02:47.707-07:00 level=INFO source=memory.go:309 msg="offload to metal" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[96.0 GiB]" memory.required.full="9.2 GiB" memory.required.partial="9.2 GiB" memory.required.kv="432.0 MiB" memory.required.allocations="[9.2 GiB]" memory.weights.total="8.4 GiB" memory.weights.repeating="8.3 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="72.0 MiB" memory.graph.partial="72.0 MiB"

Partial fix for #5113 but we'll need additional graph updates...
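
To illustrate why a same-size assumption can undershoot so badly, here is a minimal sketch in Go. It is not the code from this PR: the function names, the made-up layer sizes, and the choice to size every layer like the first one are all assumptions for illustration only.

```go
package main

import "fmt"

// estimateUniform is a hypothetical stand-in for a same-size assumption:
// it treats every layer as if it were as large as the first one.
func estimateUniform(layerSizes []uint64) uint64 {
	if len(layerSizes) == 0 {
		return 0
	}
	return layerSizes[0] * uint64(len(layerSizes))
}

// estimatePerLayer is a hypothetical alternative that sums the actual
// size of each layer, so divergent layer sizes are accounted for.
func estimatePerLayer(layerSizes []uint64) uint64 {
	var total uint64
	for _, size := range layerSizes {
		total += size
	}
	return total
}

func main() {
	// Made-up sizes in MiB (not taken from any real model):
	// a small first layer followed by much larger ones.
	layers := []uint64{50, 300, 300, 300}

	fmt.Printf("uniform assumption: %d MiB\n", estimateUniform(layers))
	fmt.Printf("per-layer sum:      %d MiB\n", estimatePerLayer(layers))
}
```

With these made-up numbers the uniform estimate is 200 MiB while the per-layer sum is 950 MiB, the same kind of gap as the one between the two log lines above.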

dhiltgen merged commit 26d0bf9 into ollama:main on Jun 18, 2024
12 checks passed
dhiltgen deleted the fix_prediction branch on June 18, 2024 18:36