What happened?
The `lm_head` layer of a Gemma2 LoRA adapter is not converted by `convert_lora_to_gguf.py`, and is therefore not applied at inference (ruining the performance of the adapter).
How to reproduce:
- LoRA fine-tune Gemma2 with pytorch/peft, including `lm_head` in the `target_modules` param: `config = LoraConfig(target_modules=["lm_head"], ...)` (a minimal sketch of this setup follows after this list).
- Save the adapter.
- Convert the adapter with `python convert_lora_to_gguf.py <adapter folder> --base <base model folder> --outtype f32`. Debugging the conversion shows that the `lm_head` layer is skipped by this line in `convert_hf_to_gguf.py` (and no error is raised):
```python
if name == "lm_head.weight":
    logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
    return []
```
- Run `llama-cli` to check that indeed no LoRA layer is applied in the respective line in llama.cpp (the second sketch after this list inspects the converted adapter directly):
```sh
./llama-cli -m base/model/path/Base-F32.gguf \
    --lora lora/model/path/Lora-F32-LoRA.gguf \
    -p "Hello Gemma2" -n 50
```
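For reference, a minimal sketch of steps 1–2 with peft. The model id `google/gemma-2-2b`, the LoRA hyperparameters, and the output folder name are assumptions; only the presence of `"lm_head"` in `target_modules` matters for triggering the issue.
```python
# Minimal repro sketch for steps 1-2 (assumed model id and hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")  # assumed base model

# Including "lm_head" in target_modules is what produces the adapter
# tensors that convert_lora_to_gguf.py later drops silently.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# ... fine-tune the adapter here ...

model.save_pretrained("gemma2-lora-adapter")  # step 2: save the adapter
```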
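And a sketch for inspecting the converted adapter directly, using the `gguf` Python package that ships with llama.cpp. The adapter path is taken from the command above; the assumption that `lm_head` maps to the `output` tensor name in GGUF is my reading of the converter and may be off.
```python
# List tensor names in the converted adapter GGUF to confirm that no
# lm_head LoRA tensors survived conversion.
from gguf import GGUFReader  # gguf-py, bundled with llama.cpp

reader = GGUFReader("lora/model/path/Lora-F32-LoRA.gguf")
names = [t.name for t in reader.tensors]
print("\n".join(names))

# Assumption: lm_head is mapped to "output" in GGUF naming, so its LoRA
# tensors would show up as e.g. "output.weight.lora_a" / "output.weight.lora_b".
print("lm_head LoRA tensors present:", any(n.startswith("output.") for n in names))
```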
Expected behaviour
I think this is a bug because a user might have trained an adapter that is applied to the `lm_head` layer, so skipping it on conversion will destroy the adapter's performance. I think the code should either:
- raise an error saying `Cannot convert Gemma2 adapter with lm_head layer` (a rough sketch of this option follows below), or
- handle the `lm_head` layer (although this might be tricky when merging adapters, as the `lm_head` layer shares its weights with the `embed` layer in Gemma2, probably leading to having to create a new tensor for the `lm_head` to merge the adapter into).
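To make the first option concrete, here is a rough, hedged sketch of how the silent skip quoted above could instead fail loudly when converting an adapter. The `self.is_lora` flag is hypothetical; the real guard would depend on how `convert_lora_to_gguf.py` hooks into `convert_hf_to_gguf.py`.
```python
# Sketch only: turn the silent skip into an explicit error for LoRA conversion.
if name == "lm_head.weight":
    if self.is_lora:  # hypothetical flag: converting an adapter, not a base model
        raise ValueError(
            "Cannot convert Gemma2 adapter with lm_head layer: "
            "Gemma2 ties lm_head to the embedding weights."
        )
    logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
    return []
```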
Comments
- I think the script `convert_lora_to_gguf.py` was introduced in PR Refactor lora adapter support #8332, so maybe @ngxson knows whether skipping the `lm_head` is the desired outcome or whether it is actually a bug. Otherwise I'm happy to try to figure out why this happens.
- This is not the case for, say, Phi3, which converts the `lm_head` LoRA layer correctly.
- I can provide more code/models to reproduce the bug easily if that helps.
Name and Version
version: 3524 (bc0f887)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0
What operating system are you seeing the problem on?
macOS, but it should be a platform-independent problem.
Relevant log output
No response