Adjust mmap logic for CUDA on Windows for faster model load #5105

Merged: 1 commit into ollama:main on Jun 18, 2024

Conversation

dhiltgen (Collaborator) commented:

On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can tell the difference between a user-provided value of true/false and an unspecified value.

```go
	F16KV     bool     `json:"f16_kv,omitempty"`
	LogitsAll bool     `json:"logits_all,omitempty"`
	VocabOnly bool     `json:"vocab_only,omitempty"`
	UseMMap   TriState `json:"use_mmap,omitempty"`
```
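The type behind UseMMap is not shown in this excerpt. As a minimal sketch only, assuming a zero value that means "unspecified" and hypothetical constant names (not taken from the PR), such a tri-state could look like this in Go:

```go
package api

import "encoding/json"

// TriState distinguishes an unspecified option from an explicit true/false.
// The names and layout here are illustrative assumptions, not the PR's code.
type TriState int

const (
	TriStateUndefined TriState = iota // zero value: the user did not set the option
	TriStateFalse                     // the user explicitly sent false
	TriStateTrue                      // the user explicitly sent true
)

// UnmarshalJSON maps a JSON boolean onto the tri-state; if the key is
// absent from the request, the field simply stays TriStateUndefined.
func (t *TriState) UnmarshalJSON(data []byte) error {
	var v bool
	if err := json.Unmarshal(data, &v); err != nil {
		return err
	}
	if v {
		*t = TriStateTrue
	} else {
		*t = TriStateFalse
	}
	return nil
}
```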
jmorganca (Member) commented on Jun 17, 2024:

Do we need TriState here? I think false is a good default for use_mmap, and then we can enable it optimistically on platforms where it will speed up loading (macOS, CPU, etc.).

dhiltgen (Collaborator, Author) replied:

The catch is that we want automatic logic to toggle mmap, and that logic varies across platforms, but users also have a way to set this explicitly. So I need to know whether it's explicitly false, explicitly true, or unspecified by the user; that way the automatic logic can be disabled when the user has told us exactly what they want.
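A rough sketch of how that decision could be wired up, continuing the hypothetical TriState type from the sketch above (the function name and the exact platform condition are assumptions, not code taken from this PR):

```go
// decideMMap illustrates the decision described above: honor an explicit
// user choice, otherwise fall back to a platform default.
func decideMMap(userSetting TriState, cudaOnWindows bool) bool {
	switch userSetting {
	case TriStateTrue:
		return true // the user explicitly asked for mmap
	case TriStateFalse:
		return false // the user explicitly disabled mmap
	}
	// Unspecified: default mmap off where recent llama.cpp changes made it
	// slower (CUDA on Windows), and leave it on everywhere else.
	return !cudaOnWindows
}
```

With that shape, a request that includes "use_mmap": false pins the behavior regardless of platform, while a request that omits the key lets the platform default win.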

jmorganca (Member) left a review comment:

Looks good, although you have some linter errors.

Commit message (c9c8c98): On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can detect the difference between a user provided value of true/false, or unspecified.
dhiltgen merged commit c9c8c98 into ollama:main on Jun 18, 2024; 12 checks passed.
dhiltgen deleted the cuda_mmap branch on June 18, 2024 at 00:07.