@sayantan1410
Contributor

@sayantan1410 sayantan1410 commented Aug 6, 2024

What does this PR do?

Fixes the learning rate (LR) scheduler in distributed training when num_train_epochs is passed.

Part of #8384

I have made changes to a single training script only.
Let me know if there are any mistakes; I will be glad to fix them.
@sayakpaul
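
For readers following along, here is a minimal sketch of the step arithmetic that this kind of fix involves. It is not taken from the PR diff: the function name `scheduler_steps` and its parameters are hypothetical, and it assumes the scheduler is stepped once per process for every optimizer update, so the warmup and total step counts are scaled by the number of processes, while the number of updates per epoch is derived from the dataloader length after it has been sharded across processes.

```python
import math
from typing import Optional, Tuple


def scheduler_steps(
    dataloader_len: int,
    num_processes: int,
    gradient_accumulation_steps: int,
    num_train_epochs: int,
    lr_warmup_steps: int,
    max_train_steps: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (warmup_steps, total_steps) to hand to the LR scheduler.

    Hypothetical helper for illustration only. Assumes the scheduler is
    stepped num_processes times per optimizer update.
    """
    # Warmup is scaled by the number of processes under the assumption above.
    warmup_steps = lr_warmup_steps * num_processes

    if max_train_steps is None:
        # Estimate the per-process dataloader length after sharding.
        len_after_sharding = math.ceil(dataloader_len / num_processes)
        # Optimizer updates per epoch, accounting for gradient accumulation.
        updates_per_epoch = math.ceil(len_after_sharding / gradient_accumulation_steps)
        total_steps = num_train_epochs * updates_per_epoch * num_processes
    else:
        # An explicit step budget only needs the per-process scaling.
        total_steps = max_train_steps * num_processes

    return warmup_steps, total_steps


# 1000 batches, 4 processes, accumulation of 2, 3 epochs, 100 warmup steps.
print(scheduler_steps(1000, 4, 2, 3, 100))  # (400, 1500)
```

With these inputs, each process sees 250 batches per epoch and performs 125 optimizer updates, so the scheduler is sized for 3 * 125 * 4 = 1500 steps. Computing the epoch length from the unsharded dataloader and then multiplying by the process count would instead give 6000, which stretches the LR schedule well past the end of training.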

@sayakpaul
Member

Thank you.

We need to fix the code quality before we can merge. Fixing instructions are available here:
https://github.com/huggingface/diffusers/actions/runs/10272588162/job/28437541407?pr=9103

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayantan1410
Contributor Author

Hi, I fixed the code quality. Can you please re-run the tests?
@sayakpaul

@sayakpaul sayakpaul merged commit 8e3affc into huggingface:main Aug 8, 2024
@sayakpaul
Member

Thank you for your contribution.

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* fix for lr scheduler in distributed training

* Fixed the recalculation of the total training step section

* Fixed lint error

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>