Implement '--keep-split' to quantize model into several shards #6688

Merged (6 commits) on Apr 25, 2024

Conversation

@zj040045 (Contributor):

Fix #6548

--keep-split lets quantize write its output as shards instead of a single full model; the number of output shards matches the number of input model files.
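
On the command line this surfaces as the new quantize --keep-split flag. For illustration, a minimal sketch of the same option via the C API, using the keep_split field this PR adds to llama_model_quantize_params (the file names are hypothetical):

    #include "llama.h"

    int main() {
        // Start from the library defaults, then opt in to sharded output.
        llama_model_quantize_params params = llama_model_quantize_default_params();
        params.ftype      = LLAMA_FTYPE_MOSTLY_Q4_K_M;
        params.keep_split = true; // one output shard per input shard

        // With keep_split set, the output name is expanded per shard
        // (e.g. out-00001-of-00003.gguf), mirroring the input split.
        return llama_model_quantize("in-00001-of-00003.gguf", "out.gguf", &params);
    }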

@phymbert (Collaborator):

Thanks. Do you mind adding a tests.sh as we did in #6655?

phymbert added the "split" label (GGUF split model sharding) on Apr 17, 2024

@zj040045 (Contributor, Author):

@phymbert Done

phymbert requested a review from ggerganov on Apr 18, 2024

Review comment on llama.cpp (outdated diff):

    LLAMA_LOG_INFO("%s: meta size = %zu bytes\n", __func__, meta_size);
    auto weight = ml.get_weight(i);
    struct ggml_tensor * tensor = weight->tensor;
    // Open a new output shard when this tensor's file index moves past the
    // last context created so far; assumes tensors arrive in split order.
    if (weight->idx != (ctx_outs.size() - 1) && params->keep_split) {

@phymbert (Collaborator):

I feel this is not safe for future evolution, as it assumes the tensors are written in the same order they were read. Could we simply check whether weight->idx is not yet present in ctx_outs and retrieve the ctx_out by tensor?

@zj040045 (Contributor, Author):

You are right. Then the model-split writing should follow this logic to support cases like "0 0 0 2 2 1 1". Besides, do you think a non-contiguous order such as "0 0 0 2 1 2 1" should also be handled?
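
A minimal sketch of the order-independent lookup being discussed, keyed on the split index instead of read order (the map-based helper is an assumption for illustration, not the merged code; weight->idx and ctx_outs follow the snippet above):

    #include <cstdint>
    #include <map>
    #include "ggml.h" // gguf_context, gguf_init_empty, gguf_add_tensor

    // Resolve (or lazily create) the output context for a tensor's shard,
    // regardless of the order in which tensors are read.
    static gguf_context * ctx_out_for(std::map<uint16_t, gguf_context *> & ctx_outs,
                                      uint16_t idx) {
        auto it = ctx_outs.find(idx);
        if (it == ctx_outs.end()) {
            // First tensor seen for this shard: create its output context.
            it = ctx_outs.emplace(idx, gguf_init_empty()).first;
        }
        return it->second;
    }

    // Inside the quantize loop, sequences like "0 0 0 2 2 1 1" or the
    // non-contiguous "0 0 0 2 1 2 1" then resolve correctly:
    //   gguf_add_tensor(ctx_out_for(ctx_outs, weight->idx), tensor);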

Resolved review threads (outdated): llama.h, examples/quantize/quantize.cpp.
ggerganov merged commit 1966eb2 into ggerganov:master on Apr 25, 2024, with 49 of 59 checks passed.

Labels: split (GGUF split model sharding)

Successfully merging this pull request may close these issues:

Re-quantization of a split gguf file produces "invalid split file" (#6548)