Adjust mmap logic for CUDA on Windows for faster model load #5105

Merged: 1 commit into ollama:main on Jun 18, 2024

Conversation

dhiltgen (Collaborator) commented:

On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can tell the difference between a user-provided value of true/false and an unspecified value.

```go
	F16KV     bool     `json:"f16_kv,omitempty"`
	LogitsAll bool     `json:"logits_all,omitempty"`
	VocabOnly bool     `json:"vocab_only,omitempty"`
	UseMMap   TriState `json:"use_mmap,omitempty"`
```
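The type behind UseMMap is not shown in this excerpt. As a minimal sketch only, assuming a zero value that means "unspecified" and hypothetical constant names (not taken from the PR), such a tri-state could look like this in Go:

```go
package api

import "encoding/json"

// TriState distinguishes an unspecified option from an explicit true/false.
// The names and layout here are illustrative assumptions, not the PR's code.
type TriState int

const (
	TriStateUndefined TriState = iota // zero value: the user did not set the option
	TriStateFalse                     // the user explicitly sent false
	TriStateTrue                      // the user explicitly sent true
)

// UnmarshalJSON maps a JSON boolean onto the tri-state; if the key is
// absent from the request, the field simply stays TriStateUndefined.
func (t *TriState) UnmarshalJSON(data []byte) error {
	var v bool
	if err := json.Unmarshal(data, &v); err != nil {
		return err
	}
	if v {
		*t = TriStateTrue
	} else {
		*t = TriStateFalse
	}
	return nil
}
```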
jmorganca (Member) commented on Jun 17, 2024:

Do we need TriState here? I think false is a good default for use_mmap, and then we can enable it optimistically on platforms where it will speed up loading (macOS, CPU, etc.).

dhiltgen (Collaborator, Author) replied:

The catch is that we want automatic logic to toggle mmap, and that logic varies across platforms, but users also have a way to set this explicitly. So I need to know whether it's explicitly false, explicitly true, or unspecified by the user; that way the automatic logic can be disabled when the user has told us exactly what they want.
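A rough sketch of how that decision could be wired up, continuing the hypothetical TriState type from the sketch above (the function name and the exact platform condition are assumptions, not code taken from this PR):

```go
// decideMMap illustrates the decision described above: honor an explicit
// user choice, otherwise fall back to a platform default.
func decideMMap(userSetting TriState, cudaOnWindows bool) bool {
	switch userSetting {
	case TriStateTrue:
		return true // the user explicitly asked for mmap
	case TriStateFalse:
		return false // the user explicitly disabled mmap
	}
	// Unspecified: default mmap off where recent llama.cpp changes made it
	// slower (CUDA on Windows), and leave it on everywhere else.
	return !cudaOnWindows
}
```

With that shape, a request that includes "use_mmap": false pins the behavior regardless of platform, while a request that omits the key lets the platform default win.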

jmorganca (Member) left a review comment:

Looks good, although you have some linter errors.

Commit message (c9c8c98): On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can detect the difference between a user provided value of true/false, or unspecified.
dhiltgen merged commit c9c8c98 into ollama:main on Jun 18, 2024; 12 checks passed.
dhiltgen deleted the cuda_mmap branch on June 18, 2024 at 00:07.