
CUDA: tighter VRAM scratch size for 65b/70b #2551

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-tighter-65b-70b-scratch on Aug 8, 2023


Conversation

@JohannesGaessler
Contributor

This PR is a follow-up to #2056 . At the time I did not have enough VRAM to properly measure the minimum required VRAM scratch sizes for 65b (and 70b was not yet published). This PR tightens the VRAM scratch sizes based on testing. The specific methodology is that I hard-coded VRAM scratch sizes with a granularity of 1 MiB and determined the minimum VRAM scratch size at which perplexity calculations are not affected. I then added a ~25% margin on top of that minimum. These are the test results that the new numbers are based on:

| Model | Context size | Min. VRAM scratch size [MiB] | Delta [MiB] |
|---|---|---|---|
| 65b q2_k | 512 | 272 | - |
| 65b q2_k | 1024 | 342 | 70 |
| 65b q2_k | 1536 | 486 | 144 |
| 65b q2_k | 2048 | 636 | 150 |
| 65b q2_k | 2560 | 700 | 64 |
| 65b q2_k | 3072 | 764 | 64 |
| 65b q2_k | 3584 | 828 | 64 |
| 65b q2_k | 4096 | 892 | 64 |

| Model | Context size | Min. VRAM scratch size [MiB] | Delta [MiB] |
|---|---|---|---|
| 70b q4_k_m | 512 | 324 | - |
| 70b q4_k_m | 1024 | 336 | 12 |
| 70b q4_k_m | 1536 | 402 | 66 |
| 70b q4_k_m | 2048 | 576 | 174 |
| 70b q4_k_m | 2560 | 724 | 148 |
| 70b q4_k_m | 3072 | 788 | 64 |
| 70b q4_k_m | 3584 | 852 | 64 |
| 70b q4_k_m | 4096 | 916 | 64 |

When I tested with smaller models the min. required VRAM scratch size at some point always started increasing linearly with context size (measured up to 8192 context). Because the results are so close I just used the same scratch size for both 65b and 70b.
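The numbers above can be summarized as a simple estimate: a measured base minimum plus a linear tail of ~64 MiB per 512 tokens of context, with the ~25% margin added on top. The following is a minimal sketch of that arithmetic, not the actual llama.cpp code; the function name, the choice of the 70b q4_k_m figures as the base, and the assumption that the linear tail continues past 4096 context are all mine.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t MiB = 1024 * 1024;

// Hypothetical estimate of the VRAM scratch size, based on the measured
// minima in the tables above (70b q4_k_m: 916 MiB at 4096 context, with an
// observed slope of ~64 MiB per 512 tokens in the linear tail).
int64_t scratch_size_estimate(int n_ctx) {
    const int64_t base_mib    = 916; // measured minimum at n_ctx = 4096
    const int64_t per_512_mib = 64;  // observed linear slope
    // Assumption: the linear tail continues beyond the measured range.
    int64_t min_mib = base_mib + (int64_t)(n_ctx - 4096) / 512 * per_512_mib;
    // Add the ~25% safety margin, rounding up to 1 MiB granularity.
    return (min_mib * 5 + 3) / 4 * MiB;
}
```

For example, at 4096 context this yields 916 MiB × 1.25 = 1145 MiB, which is the kind of "minimum plus margin" figure the PR hard-codes.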

@JohannesGaessler JohannesGaessler merged commit acfc547 into ggml-org:master Aug 8, 2023