model.to(xla_device) increases the number of named_parameters #7042
Comments
@qihqi since you are off-call this week, do you have time to follow up on this issue?
This is caused by this line: https://github.com/huggingface/transformers/blob/v4.40.2/src/transformers/models/bart/modeling_bart.py#L1530C1-L1533C79

That line merges two parameters together, for both the encoder and the decoder, resulting in two fewer parameters: there are parameters that share the same tensor but have different names. You can see this by printing the length of the `state_dict`, which keeps every name, and comparing it with `named_parameters()`, which deduplicates parameters that share a tensor. When moving to a device, each tensor is copied independently, so the sharing is broken and the tied parameters become distinct tensors. You can get the tying back by re-running the logic of `_tie_weights` on the moved model.
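The interaction described above can be sketched without torch at all. The snippet below is a minimal stand-in, assuming only that `named_parameters()` deduplicates by tensor identity and that moving to a device copies each tensor independently; `FakeTensor` and the parameter names are illustrative, not the real BART names.

```python
# Stand-in for a tensor; object identity (id) models tensor sharing.
class FakeTensor:
    def __init__(self, data):
        self.data = data

    def to_device(self):
        # Models .to(device): returns an independent copy of the tensor.
        return FakeTensor(list(self.data))


shared = FakeTensor([1.0, 2.0])
params = {
    "decoder.embed_tokens.weight": shared,
    "lm_head.weight": shared,            # tied: same underlying object
    "encoder.embed_tokens.weight": FakeTensor([3.0]),
}


def named_parameters(params, remove_duplicate=True):
    # Mimics the deduplication in torch's named_parameters().
    seen = set()
    for name, p in params.items():
        if remove_duplicate and id(p) in seen:
            continue
        seen.add(id(p))
        yield name, p


print(len(list(named_parameters(params))))  # tied pair counted once -> 2

# "Moving to device" copies each entry independently, breaking the tie:
moved = {name: p.to_device() for name, p in params.items()}
print(len(list(named_parameters(moved))))   # -> 3
```

After the per-entry copy, no two names point at the same object, so the deduplication no longer collapses the tied pair and the parameter count grows.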
Thank you for your response. I'm wondering why moving to the device breaks the tying.
So it undoes what `_tie_weights` did.
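The fix mentioned earlier, re-running the tying logic on the moved model, can be sketched in the same plain-Python style. This is a hedged stand-in, not the actual `_tie_weights` implementation: `FakeTensor` and the names are illustrative, and identity (`id`) models tensor sharing.

```python
# Stand-in for a tensor; object identity (id) models tensor sharing.
class FakeTensor:
    def __init__(self, data):
        self.data = data

    def copy(self):
        return FakeTensor(list(self.data))


def unique_count(d):
    # Number of distinct underlying "tensors" among the named parameters.
    return len({id(v) for v in d.values()})


shared = FakeTensor([0.0])
params = {"decoder.weight": shared, "lm_head.weight": shared}

# Per-entry copy (the "move to device") breaks the tie:
moved = {k: v.copy() for k, v in params.items()}
assert unique_count(moved) == 2

# Re-tying: point both names at one tensor again, as _tie_weights does.
moved["lm_head.weight"] = moved["decoder.weight"]
assert unique_count(moved) == 1
```

After re-tying, deduplicating by identity again collapses the pair to one parameter, restoring the original count.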
🐛 Bug
Copying a model to the XLA device changes the number of the model's parameters.
To Reproduce
Steps to reproduce the behavior:
xla/benchmarks/benchmark_model.py
Expected behavior
len([param for param, value in new_model.named_parameters()])
is expected to return 259.
Environment
2.3.0-rc12
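The mismatch between the two counts can be reproduced with plain PyTorch weight tying, no XLA device needed. This is a minimal sketch; `TinyLM` and its layer names are hypothetical, not the model from the report.

```python
import torch.nn as nn


class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)          # weight shape (10, 4)
        self.head = nn.Linear(4, 10, bias=False)  # weight shape (10, 4)
        self.head.weight = self.embed.weight      # tie the two weights


m = TinyLM()
# named_parameters() deduplicates shared tensors; state_dict() keeps both names.
print(len(list(m.named_parameters())))  # -> 1
print(len(m.state_dict()))              # -> 2
```

If a device move were to copy `embed.weight` and `head.weight` independently, `named_parameters()` would stop collapsing them and its length would jump to match `state_dict()`'s.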