
Bug: Llama-Quantize Not Working with Capital Letters (T^T) #9569

@HatsuneMikuUwU33

Description

What happened?

Good morning, everyone! (・﹏・) Anon on 4chan has a tiny problem that I hope someone can help with! When Anon tried to use the llama-quantize tool, the --output-tensor-type and --token-embedding-type overrides didn't work when written with capital letters. But when Anon used all lowercase letters, it worked perfectly fine! (T^T)

Here's what happened:

When Anon tried using capital letters:

llama-quantize --output-tensor-type F16 --token-embedding-type F16 Mistral-7B-Instruct-v0.3-F16.gguf Q2_K

It just... silently ignored those arguments! No errors, no warning at all. (´•̥̥̣ `•) The output was just a normal Q2_K, as if the overrides weren't there.

But when Anon used lowercase letters:

llama-quantize --output-tensor-type f16 --token-embedding-type f16 Mistral-7B-Instruct-v0.3-F16.gguf Q2_K

It worked perfectly! (≧ω≦)

I reproduced this on my machine and got the same behavior. Could someone please look into this? That anon wants to be able to use capital letters! (T_T)
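
My guess (and it is only a guess!) is that the type-name lookup for these flags does a case-sensitive string comparison against ggml_type_name(), which reports lowercase names such as f16 and q2_K, so F16 never matches and the override is silently dropped instead of being rejected. Below is a minimal sketch of a case-insensitive lookup; parse_ggml_type_ci and type_name_eq_ci are illustrative names, not the actual code in examples/quantize/quantize.cpp.

// Hypothetical sketch: case-insensitive lookup of a ggml type by name.
#include <cctype>

#include "ggml.h"

// Compare two type names ignoring case, so "F16" matches "f16" and "Q2_K" matches "q2_K".
static bool type_name_eq_ci(const char * a, const char * b) {
    for (; *a && *b; ++a, ++b) {
        if (std::tolower((unsigned char) *a) != std::tolower((unsigned char) *b)) {
            return false;
        }
    }
    return *a == '\0' && *b == '\0';
}

// Return the ggml type whose name matches arg regardless of case,
// or GGML_TYPE_COUNT if nothing matches (the caller should treat that
// as an error instead of silently falling back to the default).
static ggml_type parse_ggml_type_ci(const char * arg) {
    for (int i = 0; i < GGML_TYPE_COUNT; ++i) {
        const char * name = ggml_type_name((ggml_type) i);
        if (name && type_name_eq_ci(name, arg)) {
            return (ggml_type) i;
        }
    }
    return GGML_TYPE_COUNT;
}

With something like this, both F16 and f16 would resolve to GGML_TYPE_F16, and a name that matches nothing would come back as GGML_TYPE_COUNT so the caller could print an error instead of staying silent.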

Name and Version

llama-cli.exe --version
version: 3789 (d39e267)
built with MSVC 19.40.33811.0 for x64

What operating system are you seeing the problem on?

No response

Relevant log output

BIG LETTERS (overrides ignored; token_embd.weight and output.weight are quantized with the default mix):

[   1/ 291]                    token_embd.weight - [ 4096, 32768,     1,     1], type =    f16, converting to q2_K .. size =   256.00 MiB ->    42.00 MiB
...
[ 204/ 291]                        output.weight - [ 4096, 32768,     1,     1], type =    f16, converting to q6_K .. size =   256.00 MiB ->   105.00 MiB
...
llama_model_quantize_internal: model size  = 13825.02 MB
llama_model_quantize_internal: quant size  =  2596.02 MB

small letters (overrides applied; both tensors stay at f16, which is why the final quant size is larger):

[   1/ 291]                    token_embd.weight - [ 4096, 32768,     1,     1], type =    f16, size =  256.000 MB
...
[ 204/ 291]                        output.weight - [ 4096, 32768,     1,     1], type =    f16, size =  256.000 MB
...
llama_model_quantize_internal: model size  = 13825.02 MB
llama_model_quantize_internal: quant size  =  2961.02 MB
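
If anyone wants to double-check which behavior they got, one option is to dump the per-tensor types of the resulting file. Here is a minimal sketch using the gguf functions exported by ggml.h (assuming gguf_get_tensor_type is available in this build; check_types.cpp is a made-up name): if the overrides took effect, token_embd.weight and output.weight should be listed as f16.

// check_types.cpp (hypothetical): list every tensor in a GGUF file with its type.
#include <cstdio>

#include "ggml.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    // Only read the metadata; don't allocate or load the tensor data.
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }

    const int n_tensors = gguf_get_n_tensors(ctx);
    for (int i = 0; i < n_tensors; ++i) {
        printf("%-40s %s\n",
               gguf_get_tensor_name(ctx, i),
               ggml_type_name(gguf_get_tensor_type(ctx, i)));
    }

    gguf_free(ctx);
    return 0;
}

Compile it against ggml and pass the quantized .gguf file as the only argument.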


Labels: bug-unconfirmed, medium severity
