improve train loss metric logged in examples #399

pacman100 · 2022-05-26T06:42:36Z

What does this PR do?

The train loss being logged wasn't normalized and, as such, wasn't intuitive to understand. This also made it difficult to compare train loss between different tools, such as comparing the train loss from Trainer with that from Accelerate. This PR normalizes the train_loss per epoch to make it more intuitive and comparable.
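As a rough illustration of the difference between a raw summed loss and a per-epoch normalized one, here is a minimal plain-Python sketch. The function name `epoch_train_loss` and the loss values are hypothetical and chosen only for illustration; this is not the code changed in this PR, and it assumes normalization means dividing the summed batch losses by the number of batches.

```python
def epoch_train_loss(batch_losses):
    """Return the mean of the per-batch losses for one epoch.

    Hypothetical helper: assumes "normalized" means the summed loss
    divided by the number of batches seen in the epoch.
    """
    if not batch_losses:
        raise ValueError("epoch contained no batches")
    return sum(batch_losses) / len(batch_losses)


# Made-up per-batch losses for a four-step epoch.
batch_losses = [0.9, 0.7, 0.5, 0.3]

# The raw sum grows with the number of batches, so it is hard to compare
# across runs or tools that use a different number of steps per epoch.
raw_total = sum(batch_losses)

# The normalized value stays on the scale of a single batch loss.
normalized = epoch_train_loss(batch_losses)

print(f"raw total: {raw_total:.4f}, normalized: {normalized:.4f}")
```

The normalized value does not depend on how many steps the epoch has, which is what makes it comparable between tools.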

HuggingFaceDocBuilderDev · 2022-05-26T06:45:33Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for fixing!

benjpau · 2022-08-02T08:02:55Z

@pacman100 @sgugger In a multi-GPU environment, does the total_loss here represent the loss over all training data in the entire epoch, or just the loss over the training data seen by the main process?

I tried printing total_loss on each process, and it seems that accelerator.log() only records the total_loss from the main process. So do I need to use accelerator.gather() to get the training loss for the whole epoch? I hope I can get your confirmation, many thanks!

pacman100 · 2022-08-02T10:38:54Z

Hello @benjpau, yes, you are correct: you would need to gather to get the total epoch loss. I will raise a PR to fix the examples accordingly.
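To make the point above concrete, here is a minimal plain-Python sketch of why a loss taken from the main process alone can differ from the loss over the whole epoch. It only simulates processes with lists; the function names and loss values are hypothetical, and the list concatenation is a stand-in for what a call such as accelerator.gather() would provide in a real multi-GPU script, not an example of that API.

```python
def main_process_epoch_loss(per_process_batch_losses):
    """Mean loss using only the batches seen by process 0 (the main process)."""
    main_losses = per_process_batch_losses[0]
    return sum(main_losses) / len(main_losses)


def gathered_epoch_loss(per_process_batch_losses):
    """Mean loss over the batches seen by every process.

    The flattening below is a simulated gather: it collects the
    per-batch losses from all processes into one list.
    """
    all_losses = [
        loss
        for process_losses in per_process_batch_losses
        for loss in process_losses
    ]
    return sum(all_losses) / len(all_losses)


# Made-up losses for two simulated processes, two batches each.
per_process = [
    [0.8, 0.6],  # process 0 (main)
    [0.4, 0.2],  # process 1
]

main_only = main_process_epoch_loss(per_process)
whole_epoch = gathered_epoch_loss(per_process)

print(f"main process only: {main_only:.4f}, gathered: {whole_epoch:.4f}")
```

In this toy case the main process alone reports a higher loss than the gathered value, because it never sees the batches handled by the other process.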

improve metrics logged in examples

de92551

pacman100 requested a review from sgugger May 26, 2022 06:42

pacman100 changed the title ~~improve metrics logged in examples~~ improve train loss metric logged in examples May 26, 2022

sgugger approved these changes May 26, 2022


pacman100 merged commit d1f7f99 into huggingface:main May 26, 2022

pacman100 deleted the smangrul/improve-examples branch May 26, 2022 11:59
