Fix weight decay exclusions in run_*_no_trainer.py examples #42769

Merged
Rocketknight1 merged 4 commits into huggingface:main from casinca:fix-no-decay-retrieval-in-trainer-examples on Feb 12, 2026

casinca (Contributor) commented Dec 10, 2025

What does this PR do?

fixes #42754

I'm re-using the more robust logic from trainer.py for the run_*_no_trainer.py files

https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/trainer.py#L1199C1-L1201C32

About 10 other scripts would need the same change. Some of them also use the capitalized LayerNorm.weight (vs. "layer_norm.weight") in no_decay = ["bias", "LayerNorm.weight"].

Before propagating this to the other run_*_no_trainer.py scripts, I'd like to make sure my changes are acceptable.

@wwt17 also suggested adding embeddings to the exclusion list. I can add nn.Embedding too, but that is up to 🤗 to decide.
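
For illustration, here is a minimal sketch of why a case-sensitive substring list such as no_decay = ["bias", "LayerNorm.weight"] can miss normalization weights. The parameter names below are hypothetical, modelled on common naming styles, and the regex patterns are an assumption made for this example, not the ones used in trainer.py.

```python
import re

# Hypothetical parameter names, modelled on common naming styles (not taken from the PR).
param_names = [
    "encoder.layer.0.attention.output.LayerNorm.weight",  # BERT-style
    "encoder.layer.0.attention.output.dense.weight",
    "encoder.layer.0.attention.output.dense.bias",
    "model.layers.0.input_layernorm.weight",              # Llama-style
    "model.layers.0.self_attn.q_proj.weight",
    "model.norm.weight",
    "decoder.block.0.layer.0.layer_norm.weight",          # T5-style
]

# The case-sensitive substring list quoted in the PR description.
no_decay = ["bias", "LayerNorm.weight"]
decayed_substring = [n for n in param_names if not any(nd in n for nd in no_decay)]

# Illustrative case-insensitive patterns (an assumption, not the trainer.py ones).
no_decay_patterns = [r"bias", r"layer_?norm", r"(?:^|\.)norm(?:$|\.)"]


def is_excluded(name):
    """Return True if the name matches any illustrative no-decay pattern."""
    return any(re.search(p, name, flags=re.IGNORECASE) for p in no_decay_patterns)


decayed_pattern = [n for n in param_names if not is_excluded(n)]

print("substring match still decays:", decayed_substring)
print("pattern match still decays:  ", decayed_pattern)
```

With these sample names, the substring list leaves the lowercase layernorm, layer_norm and norm weights in the decayed group, while the pattern-based check keeps only the two projection/dense weights there. In a real script the two resulting name lists would typically be turned into optimizer parameter groups with different weight_decay values.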

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

I think @zucchini-nlp and @SunMarc (for trainer) may be interested.

casinca (Contributor, Author) commented Feb 11, 2026

I forgot about this PR, and the issue got closed for inactivity, but it still seems relevant.

@Rocketknight1 👋
I think you were busy with 5.0 at the time and this PR was overlooked.
Could you take a look and see if it sounds good to you? Or could you redirect me to the appropriate reviewer? I'm not sure my initial tagging was adequate. Thanks.

Rocketknight1 force-pushed the fix-no-decay-retrieval-in-trainer-examples branch from 01aaede to df9f954 on February 11, 2026 14:22

Rocketknight1 (Member) commented

@bot /style

github-actions bot (Contributor) commented Feb 11, 2026

Style fix bot fixed some files and pushed the changes.

Rocketknight1 (Member) left a comment

Yes, this looks good, and sorry for the delay! The change is definitely an improvement, and I'm fine with doing something similar for the other no_trainer files if you want. We should probably just merge this PR first, though - is there anything else you want to add before I do?

casinca (Contributor, Author) commented Feb 11, 2026

> Yes, this looks good, and sorry for the delay! The change is definitely an improvement, and I'm fine with doing something similar for the other no_trainer files if you want. We should probably just merge this PR first, though - is there anything else you want to add before I do?

Thanks for coming back to this. I'm fine with it as it is; I will open a separate PR for the other ones.

Rocketknight1 enabled auto-merge (squash) on February 12, 2026 14:17
Rocketknight1 merged commit 2caa05d into huggingface:main on Feb 12, 2026
16 checks passed

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

casinca deleted the fix-no-decay-retrieval-in-trainer-examples branch on February 12, 2026 16:25

Development

Successfully merging this pull request may close these issues.

Excluding weight decay not working properly on most LMs
