Conversation

@ubergarm
Contributor

Hardcodes a check on the number of layers to detect whether a model is a lite version of DeepSeek.

Tested with the bf16 and q8_0 versions of GigaChat3-10B-A1.8B; in discussion we realized it is a lite variant similar to DeepSeek-V2-Lite. DeepSeek-V2-Lite has 27 layers, while GigaChat3 has 26, and that layer count is used to detect the lite variant, as discussed here: https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B/discussions/1#691fb161ac024c8eb626ab36
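
For concreteness, here is a minimal sketch of that check in llama.cpp-style C++, assuming the layer count has already been read from the GGUF metadata; the function name and signature are illustrative, not the actual source:

```cpp
#include <cstdint>

// Sketch only: DeepSeek-V2-Lite ships with 27 layers and GigaChat3-10B-A1.8B
// with 26, so either block count selects the "lite" attention path; other
// DeepSeek-V2-style models take the full-size path.
static bool deepseek2_is_lite(uint32_t n_layer) {
    return n_layer == 27 || n_layer == 26;
}
```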

I'd appreciate it if anyone else could test as well. I'll update after running perplexity to make sure the value looks sane. I haven't uploaded a GGUF yet because the chat template has a parse error, and I wanted to get that fixed before baking it in. That is discussed here: https://huggingface.co/ai-sage/GigaChat3-702B-A36B-preview-bf16/discussions/1

@ubergarm
Contributor Author

I added a test Q8_0 gguf here: https://huggingface.co/ubergarm/GigaChat3-10B-A1.8B-GGUF/tree/main

$ export model=/mnt/data/models/ubergarm/GigaChat3-10B-A1.8B-GGUF/GigaChat3-10B-A1.8B-Q8_0.gguf
$ ./build/bin/llama-server \
    --model "$model"\
    --alias ubergarm/GigaChat3-10B-A1.8B-GGUF \
    --ctx-size 32768 \
    --parallel 1 \
    --threads 8 \
    --host 127.0.0.1 \
    --port 8080 \
    --jinja

llama_model_loader: - type  f32:  129 tensors
llama_model_loader: - type q8_0:  285 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 10.57 GiB (8.51 BPW)

@ubergarm
Contributor Author

Oops, I need to get rid of an accidentally committed file; will force-push to fix.

Hardcodes checking the number of layers to detect the lite version of deepseek.
@ubergarm
Contributor Author

Perplexity seems reasonable:

==> logs/perplexity-GigaChat3-10B-A1.8B-BF16.log <==
Final estimate: PPL = 6.7302 +/- 0.04230

==> logs/perplexity-GigaChat3-10B-A1.8B-Q8_0.log <==
Final estimate: PPL = 6.7265 +/- 0.04226

It's always a bit funky when a quant shows lower perplexity than the original bf16... though it does happen sometimes, and in this case the two values are nearly identical, well within the noise.
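
As a rough sanity check (my own arithmetic on the two estimates above, treating them as independent even though they share the same test set):

$$\Delta\text{PPL} = 6.7302 - 6.7265 = 0.0037 \ll \sqrt{0.04230^2 + 0.04226^2} \approx 0.060$$

so the gap is an order of magnitude smaller than the reported uncertainty.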

@CISC
Collaborator

Can you add a comment describing which models we are detecting in case this needs to be fine-tuned in the future?

deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
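A plausible shape for the resulting check with the requested comment (an illustrative fragment; `hparams.n_layer` follows llama.cpp naming, but the merged code may differ):

```cpp
// deepseek lite variants include DeepSeek-V2-Lite (27 layers)
// and GigaChat3-10B-A1.8B (26 layers)
const bool is_lite = (hparams.n_layer == 27 || hparams.n_layer == 26);
```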
@CISC
Collaborator

CISC commented Nov 21, 2025

Thank you!

@CISC merged commit 23bc779 into ggml-org:master Nov 21, 2025
1 check passed
@ubergarm deleted the ug/gigachat3-lite branch November 21, 2025 14:43