Fix weight decay exclusions in run_*_no_trainer.py examples #42769
Conversation
I forgot about this PR and the issue got closed for inactivity, but it still seems relevant. @Rocketknight1 👋
Force-pushed from 01aaede to df9f954
@bot /style
Style fix bot fixed some files and pushed the changes.
Rocketknight1 left a comment:
Yes, this looks good, and sorry for the delay! The change is definitely an improvement, and I'm fine with doing something similar for the other no_trainer files if you want. We should probably just merge this PR first, though - is there anything else you want to add before I do?
Thanks for coming back to this. I'm fine with it as it is; I will open a separate PR for the other ones.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
fixes #42754
I'm re-using the more robust logic from `trainer.py` for the `run_*_no_trainer.py` files: https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/trainer.py#L1199C1-L1201C32
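For reference, here is a minimal sketch of what re-using that grouping in a `run_*_no_trainer.py` script could look like. `get_parameter_names` and `ALL_LAYERNORM_LAYERS` are existing transformers utilities used by `trainer.py`; the `get_grouped_parameters` helper and the surrounding details are illustrative, not the exact code in this PR:

```python
# Sketch only: mirrors the Trainer-style, type-based weight decay exclusions
# in a standalone example script. The helper name and usage are illustrative.
from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
from transformers.trainer_pt_utils import get_parameter_names


def get_grouped_parameters(model, weight_decay):
    # Exclude every parameter that lives inside a LayerNorm module (matched by
    # module type, not by parameter name), then also exclude biases.
    decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)
    decay_parameters = [name for name in decay_parameters if "bias" not in name]
    return [
        {
            "params": [p for n, p in model.named_parameters() if n in decay_parameters],
            "weight_decay": weight_decay,
        },
        {
            "params": [p for n, p in model.named_parameters() if n not in decay_parameters],
            "weight_decay": 0.0,
        },
    ]


# optimizer = torch.optim.AdamW(get_grouped_parameters(model, args.weight_decay), lr=args.learning_rate)
```

The point is that membership in the decay group is decided by the module class, so parameter naming conventions no longer matter.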
There are about 10 other scripts that would need this change; some also use the capitalized `LayerNorm.weight` (vs `layer_norm.weight`) in `no_decay = ["bias", "LayerNorm.weight"]`, so those layer-norm weights are not actually excluded from weight decay. But before propagating this to the other `run_*_no_trainer.py` scripts, I'd prefer to make sure my changes are acceptable; the sketch below illustrates the mismatch.
@wwt17 also mentioned adding embeddings to the exclusion list; I can add `nn.Embedding` too, but this is up to 🤗 to decide.
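To make the naming pitfall concrete, here is a hedged sketch (the toy module below is illustrative, not taken from the example scripts) showing how the substring check misses a lowercase `layer_norm.weight`, while the type-based check does not:

```python
import torch.nn as nn

from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
from transformers.trainer_pt_utils import get_parameter_names


class Block(nn.Module):
    # Toy module whose layer norm is registered under a lowercase attribute
    # name, as in several model implementations.
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(8, 8)
        self.layer_norm = nn.LayerNorm(8)


model = Block()

# Old name-based filter: "LayerNorm.weight" never matches "layer_norm.weight".
no_decay = ["bias", "LayerNorm.weight"]
decayed_by_name = [n for n, _ in model.named_parameters() if not any(nd in n for nd in no_decay)]
print(decayed_by_name)  # ['dense.weight', 'layer_norm.weight'] -> layer norm wrongly decayed

# Type-based filter: anything inside an nn.LayerNorm is excluded regardless of its name.
decay_parameters = get_parameter_names(model, ALL_LAYERNORM_LAYERS)
decay_parameters = [n for n in decay_parameters if "bias" not in n]
print(decay_parameters)  # ['dense.weight']
```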
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
I think @zucchini-nlp and @SunMarc (for trainer) may be interested.