Name and Version
All versions after b5125 are affected by this bug, on all operating systems.
Operating systems
Other? (Please let us know in description)
Which llama.cpp modules do you know to be affected?
llama-quantize
Command line
./llama-quantize --tensor-type attn=q4_k gorilla-falcon-7b-hf-v0-F16.gguf gorilla-falcon-7b-hf-v0-Q4_K_M-kaboom.gguf q4_k_m 10
Problem description & steps to reproduce
When --tensor-type is used to override a tensor that would otherwise have been quantised with a fallback type, the assertion GGML_ASSERT(tensor->ne[0] % blck_size == 0 && "tensor row size not divisible by block size of new type") is triggered.
The problem occurs because the override logic ignores the fact that the tensor's type was reassigned to a fallback, which happens when the tensor's row size is not an exact multiple of the GGML block size of the requested type.
Steps to reproduce:
- Attempt to quantise a model in which any tensor's row size is not an exact multiple of the GGML block size, while also using --tensor-type to override that tensor (see the Command line example above)
- Quantisation fails with an assert error.
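The failure mode described above can be sketched in a few lines of Python. This is an illustrative model only, not the llama.cpp code and not necessarily how the fix is implemented: the function names, the reduced type tables and the ordering of the checks are assumptions made for the example. The row size 4544 is assumed here as a value that is divisible by 32 but not by 256.

```python
# Elements per quantisation block for a few GGML types (reduced table for illustration).
BLOCK_SIZE = {"f16": 1, "q5_0": 32, "q8_0": 32, "q4_k": 256, "q6_k": 256}

# Simplified, hypothetical fallback table: type to use when the row size
# is not a multiple of the requested type's block size.
FALLBACK = {"q4_k": "q5_0", "q6_k": "q8_0"}


def with_fallback(row_size, qtype):
    """Return qtype if the row fits its block size, otherwise a compatible fallback."""
    if row_size % BLOCK_SIZE[qtype] == 0:
        return qtype
    fallback = FALLBACK.get(qtype, "f16")
    return fallback if row_size % BLOCK_SIZE[fallback] == 0 else "f16"


def pick_type_buggy(row_size, default_type, override=None):
    """Assumed buggy ordering: the override replaces the fallback and is never re-checked."""
    qtype = with_fallback(row_size, default_type)
    if override is not None:
        qtype = override
    return qtype


def pick_type_checked(row_size, default_type, override=None):
    """One possible remedy: run the divisibility check on the overridden type as well."""
    return with_fallback(row_size, override or default_type)


def check_row(row_size, qtype):
    """Stand-in for the GGML_ASSERT quoted in the problem description."""
    assert row_size % BLOCK_SIZE[qtype] == 0, \
        "tensor row size not divisible by block size of new type"


row_size = 4544  # assumed row size: multiple of 32, not of 256
bad = pick_type_buggy(row_size, "q4_k", override="q4_k")
good = pick_type_checked(row_size, "q4_k", override="q4_k")
print(bad, good)
```

With these assumptions, the unchecked path keeps the overridden type and `check_row(row_size, bad)` raises, whereas the checked path falls back to a type whose block size divides the row size.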
PR #14995 fixes this.
Credit to @ddh0 for flagging this bug.
First Bad Commit
71e90e8
Relevant log output