Fix weight decay exclusions in `run_*_no‑trainer.py` examples by casinca · Pull Request #42769 · huggingface/transformers

casinca · 2025-12-10T12:31:18Z

What does this PR do?

I'm reusing the more robust logic from trainer.py in the run_*_no_trainer.py files:

https://github.com/huggingface/transformers/blob/471d7ce9abbb3bc1b3bab673367378f9dbc3caac/src/transformers/trainer.py#L1199C1-L1201C32

There are about 10 other example scripts that would need this change. Some of them also use the capitalized LayerNorm.weight (rather than "layer_norm.weight") in no_decay = ["bias", "LayerNorm.weight"].
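To illustrate why the substring-based `no_decay` list is fragile, here is a minimal pure-Python sketch. The parameter names and the regex patterns are assumptions made up for this illustration; they are not copied from trainer.py, whose logic may also inspect module types (for example `nn.LayerNorm`) rather than names alone.

```python
import re

# Hypothetical parameter names, in the style of model.named_parameters().
PARAM_NAMES = [
    "embed_tokens.weight",
    "layers.0.self_attn.q_proj.weight",
    "layers.0.self_attn.q_proj.bias",
    "layers.0.input_layernorm.weight",
    "layers.0.post_attention_layer_norm.weight",
    "norm.weight",
    "encoder.LayerNorm.weight",
]

# Old style used in the example scripts: case-sensitive substring match.
OLD_NO_DECAY = ["bias", "LayerNorm.weight"]

# Illustrative name patterns (an assumption, not the library's definitive list).
NO_DECAY_PATTERNS = [
    r"bias",
    r"layernorm",
    r"rmsnorm",
    r"(?:^|\.)norm(?:$|\.)",
    r"_norm(?:$|\.)",
]


def split_by_substring(names, no_decay):
    """Return (decay, skip) using plain case-sensitive substring checks."""
    decay = [n for n in names if not any(nd in n for nd in no_decay)]
    skip = [n for n in names if any(nd in n for nd in no_decay)]
    return decay, skip


def split_by_pattern(names, patterns):
    """Return (decay, skip) using case-insensitive regex search on each name."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    decay = [n for n in names if not any(c.search(n) for c in compiled)]
    skip = [n for n in names if any(c.search(n) for c in compiled)]
    return decay, skip


old_decay, old_skip = split_by_substring(PARAM_NAMES, OLD_NO_DECAY)
new_decay, new_skip = split_by_pattern(PARAM_NAMES, NO_DECAY_PATTERNS)

# Norm weights that the substring check would still (wrongly) decay.
missed = sorted(set(new_skip) - set(old_skip))
print("missed by substring match:", missed)
```

With these made-up names, the substring check only excludes the bias and `encoder.LayerNorm.weight`, while the three differently named norm weights stay in the decayed group. The embedding weight is decayed under both approaches.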

Before propagating this to the other run_*_no_trainer.py files, I'd like to make sure my changes are acceptable.

@wwt17 also suggested adding embeddings to the list. I can add nn.Embedding too, but that is up to 🤗 to decide.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

I think @zucchini-nlp and @SunMarc (for trainer) may be interested.

casinca · 2026-02-11T13:01:07Z

I forgot about this PR, and the issue was closed for inactivity, but it still seems relevant.

@Rocketknight1 👋
I think you were all busy with 5.0 at the time and this PR was overlooked.
Could you take a look and see if it sounds good to you? Or could you redirect me to the appropriate reviewer? I'm not sure my initial tagging was adequate. Thanks.

Rocketknight1 · 2026-02-11T14:22:20Z

@bot /style

github-actions · 2026-02-11T14:22:58Z

Style fix bot fixed some files and pushed the changes.

Rocketknight1

Yes, this looks good, and sorry for the delay! The change is definitely an improvement, and I'm fine with doing something similar for the other no_trainer files if you want. We should probably just merge this PR first, though - is there anything else you want to add before I do?

casinca · 2026-02-11T14:51:45Z

> Yes, this looks good, and sorry for the delay! The change is definitely an improvement, and I'm fine with doing something similar for the other no_trainer files if you want. We should probably just merge this PR first, though - is there anything else you want to add before I do?

Thanks for coming back to this. I'm fine with it as it is; I will open a separate PR for the other ones.

HuggingFaceDocBuilderDev · 2026-02-12T14:25:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

casinca added 2 commits February 11, 2026 14:22

fix(run_*_no_trainer.py): re-using trainer.py way to handle decay parameters

676b553

added back useful comment

df9f954

Rocketknight1 force-pushed the fix-no-decay-retrieval-in-trainer-examples branch from 01aaede to df9f954 on February 11, 2026 14:22

Apply repo consistency fixes

4871f32

Rocketknight1 approved these changes Feb 11, 2026

Merge branch 'main' into fix-no-decay-retrieval-in-trainer-examples

70a94da

casinca mentioned this pull request Feb 12, 2026

fix: Better weight decay exclusion in run_*_no‑trainer.py examples #43947

Merged

5 tasks

Rocketknight1 approved these changes Feb 12, 2026

Rocketknight1 enabled auto-merge (squash) February 12, 2026 14:17

Rocketknight1 merged commit 2caa05d into huggingface:main Feb 12, 2026
16 checks passed

casinca deleted the fix-no-decay-retrieval-in-trainer-examples branch February 12, 2026 16:25

casinca mentioned this pull request Feb 13, 2026

Excluding weight decay not working properly on most LMs #42754

Closed

Rocketknight1 merged 4 commits into huggingface:main from casinca:fix-no-decay-retrieval-in-trainer-examples
