HF's llama implementation is different from meta-llama's #30872
Comments
The llama2 code is subject to a code license. No one is allowed to reuse the original code, which is why we have a different version.
See this: https://github.com/meta-llama/llama/blob/main/LICENSE
@ArthurZucker I can expand on this a bit, because there is an actual difference in the implementations. The difference is whether you treat sequential entries of the embeddings as (real, imaginary) pairs, or you treat the first half of the head dimension as real and the second half as imaginary. I think this has to mean that weights trained with one version must be incompatible with the other version? This has come up over in torchtitan (as linked in the OP of this issue), because I've been trying to load the HF llama weights and use them with the torchtitan codebase. However, these weights only work if I use the first-half/second-half implementation, such as the one used in HF transformers.
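For concreteness, here is a minimal sketch of the two conventions being described (toy shapes only; the function names are illustrative, not the actual library code):

```python
import torch

def apply_rope_interleaved(x, freqs_cis):
    # meta-llama convention: adjacent pairs (x[..., 0::2], x[..., 1::2]) are
    # treated as (real, imaginary) parts of complex numbers and rotated by
    # complex multiplication with freqs_cis.
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_complex * freqs_cis).flatten(-2).type_as(x)

def rotate_half(x):
    # HF transformers convention: the first half of the head dimension is the
    # "real" part and the second half is the "imaginary" part.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_half_split(x, cos, sin):
    # cos/sin carry each frequency twice, once for each half of the head dim.
    return x * cos + rotate_half(x) * sin
```

Both rotate each two-dimensional frequency pair by the same angle; they only disagree about where the two components of a pair live inside the head dimension.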
Being able to load doesn’t mean the implementations are the same. The difference in the implementations will cause the output to be different, given the same input. I'm a bit confused here... Please correct me if I'm wrong. cc @rlrs
Okay, so this has been cleared up, I think. The HF weights are indeed different: see transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py, lines 152 to 154 at 481a957.
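For reference, the permutation applied by the conversion script is roughly the following (paraphrased sketch, not a verbatim copy of the linked lines):

```python
def permute(w, n_heads, dim1, dim2):
    # Reorder the rows of a q/k projection weight so that, within each head,
    # the interleaved [r0, i0, r1, i1, ...] layout becomes the half-split
    # [r0, r1, ..., i0, i1, ...] layout expected by the HF rotary code.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)
```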
Glad to find out that it is a simple weight permutation. This should probably be clarified somewhere.
@rlrs I probably should have specified that yes, this works because we permute the weights. But if you don't permute them and still use the cos/sin formulation, you still have a different implementation: you have to permute q and k on the fly, which costs a lot. Pretty sure it is clarified that you need a conversion script to go from the original checkpoints to the `transformers` format.

@tianyu-l I don't understand the confusion: if the implementation is different but the produced output, meaning the generations, is correct, where is the problem?
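The activation-level counterpart of that weight permutation would look roughly like this (illustrative sketch; reordering q and k like this on every forward pass is the extra cost being referred to):

```python
import torch

def interleave_to_half_split(x, n_heads, head_dim):
    # Reorder activations per head from [r0, i0, r1, i1, ...] to
    # [r0, r1, ..., i0, i1, ...]; doing this to q and k at every step is
    # what loading un-permuted weights into the HF rotary code would require.
    bsz, seq_len, _ = x.shape
    x = x.view(bsz, seq_len, n_heads, head_dim // 2, 2)
    return x.transpose(-1, -2).reshape(bsz, seq_len, n_heads * head_dim)
```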
this is just completely wrong, and unsourced. When we add a model to `transformers`, we make sure its outputs match the original implementation.
Thanks for the clarification! When I said different implementations, I meant different implementations that lead to different results on the same input. (I thought this was clear from the context, but it seems not; my bad.) I agree that as long as the outputs match, the implementation details don't matter that much.
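A quick self-contained numerical check of that point (toy sizes, single position and head; everything here is illustrative) shows the two formulations agree once the reordering is applied:

```python
import torch

torch.manual_seed(0)
head_dim, pos = 8, 3
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
theta = pos * inv_freq                          # rotation angles, one per pair

x = torch.randn(head_dim)                       # one head vector at one position

# meta-llama style: interleaved pairs rotated as complex numbers
freqs_cis = torch.polar(torch.ones_like(theta), theta)
out_meta = torch.view_as_real(torch.view_as_complex(x.reshape(-1, 2)) * freqs_cis).flatten()

# HF style: half-split layout, cos/sin repeated over both halves
perm = torch.cat([torch.arange(0, head_dim, 2), torch.arange(1, head_dim, 2)])
cos, sin = torch.cat([theta.cos()] * 2), torch.cat([theta.sin()] * 2)
y = x[perm]                                     # the weight permutation, applied here to activations
y1, y2 = y.chunk(2)
out_hf = y * cos + torch.cat([-y2, y1]) * sin

# identical results, up to the same reordering
assert torch.allclose(out_meta[perm], out_hf, atol=1e-6)
```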
cool, then we are on the same page! 🤗
System Info
E.g. the rotary embedding implementations are quite different. I'm wondering whether the two produce the same results, and if not, why they differ.
Here are the code pointers:
HF: transformers/src/transformers/models/llama/modeling_llama.py, line 184 at 15c74a2
meta-llama: https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a9596f1dea06d9c564f9c0e35f/llama/model.py#L65
Who can help?
@ArthurZucker @gante
Reproduction
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
Expected behavior
Consistent implementation of llama models