Closed
Labels
bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)
Description
What happened?
Ohayo gozaimasu, everyone! (・﹏・) Anon on 4chan has a tiny problem that I hope someone can help with! When Anon passed the tensor type names to llama-quantize in capital letters, the arguments were silently ignored. But when Anon used all lowercase letters, it worked perfectly fine! (T^T)
Here's what happened:
When Anon tried using capital letters:
llama-quantize --output-tensor-type F16 --token-embedding-type F16 Mistral-7B-Instruct-v0.3-F16.gguf Q2_K
It just... did nothing with those arguments! No errors, just silence. (´•̥̥̣ `•) The result was just a normal Q2_K quantization.
But when Anon used lowercase letters:
llama-quantize --output-tensor-type f16 --token-embedding-type f16 Mistral-7B-Instruct-v0.3-F16.gguf Q2_K
It worked perfectly! (≧ω≦)
I reproduced this on my machine and it was bugged too. Could someone please look into this? That anon wants to be able to use capital letters! (T_T)
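Until the tool accepts either case, one workaround is to lowercase the type values before they reach llama-quantize. Here is a minimal sketch, assuming the only affected values are the ones following --output-tensor-type and --token-embedding-type (the two flags tested above). The helper name normalize_type_args is hypothetical and not part of llama.cpp:

```python
# Flags whose value was observed to be ignored when written in capitals.
TYPE_FLAGS = {"--output-tensor-type", "--token-embedding-type"}


def normalize_type_args(argv):
    """Return a copy of argv with the value after each type flag lowercased.

    Hypothetical helper, not part of llama.cpp. Other arguments (file names,
    the final quantization type such as Q2_K) are left untouched.
    """
    out = []
    lowercase_next = False
    for arg in argv:
        if lowercase_next:
            out.append(arg.lower())
            lowercase_next = False
        else:
            out.append(arg)
            lowercase_next = arg in TYPE_FLAGS
    return out


cmd = normalize_type_args([
    "llama-quantize",
    "--output-tensor-type", "F16",
    "--token-embedding-type", "F16",
    "Mistral-7B-Instruct-v0.3-F16.gguf", "Q2_K",
])
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to launch the tool. Only the values right after the two flags are changed, so the model file name and Q2_K keep their original case.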
Name and Version
llama-cli.exe --version
version: 3789 (d39e267)
built with MSVC 19.40.33811.0 for x64
What operating system are you seeing the problem on?
No response
Relevant log output
BIG LETTERS:
[ 1/ 291] token_embd.weight - [ 4096, 32768, 1, 1], type = f16, converting to q2_K .. size = 256.00 MiB -> 42.00 MiB
...
[ 204/ 291] output.weight - [ 4096, 32768, 1, 1], type = f16, converting to q6_K .. size = 256.00 MiB -> 105.00 MiB
...
llama_model_quantize_internal: model size = 13825.02 MB
llama_model_quantize_internal: quant size = 2596.02 MB
small letters:
[ 1/ 291] token_embd.weight - [ 4096, 32768, 1, 1], type = f16, size = 256.000 MB
...
[ 204/ 291] output.weight - [ 4096, 32768, 1, 1], type = f16, size = 256.000 MB
...
llama_model_quantize_internal: model size = 13825.02 MB
llama_model_quantize_internal: quant size = 2961.02 MB
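The two logs look consistent with the overrides being dropped entirely in the capital-letter run. A quick arithmetic check, using only the sizes printed above and ignoring the MB/MiB label mismatch in the log: if both tensors stay at f16 instead of being quantized, the output should grow by the difference between the two quant size lines.

```python
# Sizes copied from the log lines above (units as printed by llama-quantize).
f16_size = 256.00           # token_embd.weight and output.weight, each
token_embd_q2_k = 42.00     # token_embd.weight after converting to q2_K
output_q6_k = 105.00        # output.weight after converting to q6_K

quant_size_upper = 2596.02  # run with F16 (overrides ignored)
quant_size_lower = 2961.02  # run with f16 (overrides applied)

# Extra space needed when both tensors stay at f16.
expected_growth = (f16_size - token_embd_q2_k) + (f16_size - output_q6_k)
observed_growth = quant_size_lower - quant_size_upper

print(f"expected: {expected_growth:.2f}, observed: {observed_growth:.2f}")
```

Both values come out to 365.00, so the whole size difference between the two runs is accounted for by these two tensors.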
quasar-of-mikus