Implement '--keep-split' to quantize model into several shards #6688

Merged (6 commits) on Apr 25, 2024

Conversation

@zj040045 (Contributor):

Fix #6548

--keep-split lets quantize write its output as shards instead of a single full model; the number of output shards matches the number of input model files.
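
On the command line this surfaces as the new quantize --keep-split flag. For illustration, a minimal sketch of the same option via the C API, using the keep_split field this PR adds to llama_model_quantize_params (the file names are hypothetical):

    #include "llama.h"

    int main() {
        // Start from the library defaults, then opt in to sharded output.
        llama_model_quantize_params params = llama_model_quantize_default_params();
        params.ftype      = LLAMA_FTYPE_MOSTLY_Q4_K_M;
        params.keep_split = true; // one output shard per input shard

        // With keep_split set, the output name is expanded per shard
        // (e.g. out-00001-of-00003.gguf), mirroring the input split.
        return llama_model_quantize("in-00001-of-00003.gguf", "out.gguf", &params);
    }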

@phymbert (Collaborator):

Thanks. Do you mind adding a tests.sh as we did in #6655?

phymbert added the "split" label (GGUF split model sharding) on Apr 17, 2024

@zj040045 (Contributor, Author):

@phymbert Done

phymbert requested a review from ggerganov on Apr 18, 2024

Review comment on llama.cpp (outdated diff):

    LLAMA_LOG_INFO("%s: meta size = %zu bytes\n", __func__, meta_size);
    auto weight = ml.get_weight(i);
    struct ggml_tensor * tensor = weight->tensor;
    // Open a new output shard when this tensor's file index moves past the
    // last context created so far; assumes tensors arrive in split order.
    if (weight->idx != (ctx_outs.size() - 1) && params->keep_split) {

@phymbert (Collaborator):

I feel this is not safe for future evolution, as it assumes the tensors are written in the same order they were read. Could we simply check whether weight->idx is not yet present in ctx_outs and retrieve the ctx_out by tensor?

@zj040045 (Contributor, Author):

You are right. Then the model-split writing should follow this logic to support cases like "0 0 0 2 2 1 1". Besides, do you think a non-contiguous order such as "0 0 0 2 1 2 1" should also be handled?
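
A minimal sketch of the order-independent lookup being discussed, keyed on the split index instead of read order (the map-based helper is an assumption for illustration, not the merged code; weight->idx and ctx_outs follow the snippet above):

    #include <cstdint>
    #include <map>
    #include "ggml.h" // gguf_context, gguf_init_empty, gguf_add_tensor

    // Resolve (or lazily create) the output context for a tensor's shard,
    // regardless of the order in which tensors are read.
    static gguf_context * ctx_out_for(std::map<uint16_t, gguf_context *> & ctx_outs,
                                      uint16_t idx) {
        auto it = ctx_outs.find(idx);
        if (it == ctx_outs.end()) {
            // First tensor seen for this shard: create its output context.
            it = ctx_outs.emplace(idx, gguf_init_empty()).first;
        }
        return it->second;
    }

    // Inside the quantize loop, sequences like "0 0 0 2 2 1 1" or the
    // non-contiguous "0 0 0 2 1 2 1" then resolve correctly:
    //   gguf_add_tensor(ctx_out_for(ctx_outs, weight->idx), tensor);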

Resolved review threads (outdated): llama.h, examples/quantize/quantize.cpp.
ggerganov merged commit 1966eb2 into ggerganov:master on Apr 25, 2024, with 49 of 59 checks passed.

Labels: split (GGUF split model sharding)

Successfully merging this pull request may close these issues:

Re-quantization of a split gguf file produces "invalid split file" (#6548)