Saved weights differ from the original model #30543
Comments
Another way to look at it is this:
I get two different weight tensors for these two particular loading methods.
cc @younesbelkada @pacman100 re PEFT
I have the same issue here: the model that is merged and unloaded before saving has different `lm_head` (`wte`) weights than the model loaded from the saved merged model. I am using the …
@bezir Thanks for sharing a solution! I'm going to re-open, as it shouldn't be necessary to pass in this argument, but I'm glad there's a workaround that works.
System Info
transformers 4.40.1
peft 0.10.0
Who can help?
@sanchit-gandhi @Rocketknight1 @younesbelkada
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I have fine-tuned a GPT-2 model using SFTTrainer. I also extended the vocabulary. I merge the base model and the trained adapters with the code below.
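A minimal sketch of that merge step; the base checkpoint name, adapter path, and directory layout are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "gpt2"              # placeholder base checkpoint
adapter_path = "./gpt2-sft-adapter"   # placeholder adapter directory

# The tokenizer saved alongside the adapter already contains the extended vocabulary.
tokenizer = AutoTokenizer.from_pretrained(adapter_path)

base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
# Resize the embeddings to match the extended vocabulary before loading the adapter.
base_model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = model.merge_and_unload()
```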
I test this model and the results are okay. Then I save this merged_model using the code below.
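Something along these lines, with `save_path` as a placeholder:

```python
save_path = "./gpt2-merged"  # placeholder output directory
merged_model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
```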
Lastly, I load the saved model with the code below.
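Presumably something like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

loaded_model = AutoModelForCausalLM.from_pretrained(save_path)
loaded_tokenizer = AutoTokenizer.from_pretrained(save_path)
```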
The model that I load from the `save_path` does not work well: it repeats the same token or emits random tokens from the base vocabulary.
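A sketch of the kind of generation check involved; the prompt and generation settings here are hypothetical:

```python
prompt = "Once upon a time"  # hypothetical prompt
inputs = loaded_tokenizer(prompt, return_tensors="pt")
output_ids = loaded_model.generate(**inputs, max_new_tokens=30)
print(loaded_tokenizer.decode(output_ids[0], skip_special_tokens=True))
```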
Model before save:
Loaded model:
Now let's look at the weights.
Model before save:
Loaded model:
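A sketch of how the two weight tensors can be compared, using the `merged_model` and `loaded_model` objects from above:

```python
import torch

before = merged_model.get_input_embeddings().weight  # the wte / lm_head weights before saving
after = loaded_model.get_input_embeddings().weight   # the same weights after the save/load round trip

# Per the report, this prints False: the tensors differ.
print(torch.equal(before.cpu(), after.cpu()))
```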
I only call two functions, save_pretrained and then from_pretrained, so why are the weights different? When I manually restored the weights after loading the model, it started to work fine again. But when I then saved that model, the same problem occurred: the saved model differs from the loaded one.
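A sketch of the manual repair described above, assuming the pre-save model is still in memory:

```python
import torch

# Copy the pre-save embedding weights back into the reloaded model.
with torch.no_grad():
    loaded_model.get_input_embeddings().weight.copy_(
        merged_model.get_input_embeddings().weight
    )
```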
Expected behavior
The model weights should be identical before saving and after loading.