Expose Consecutive Hysteresis to Users #3553

Merged
jeffra merged 10 commits into microsoft:master on May 25, 2023


Quentin-Anthony
Contributor

There's already a nice consecutive_hysteresis feature in the DynamicLossScaler that restores the hysteresis to its configured value whenever a non-overflowing iteration is encountered. This is useful for training runs that periodically face instabilities or bad samples.

This PR exposes this existing feature to users through a new config option, consecutive_hysteresis, which is added to the fp16 dict as follows:

```json
"fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "consecutive_hysteresis": true,
    "min_loss_scale": 1
}
```

So if "consecutive_hysteresis": true, "hysteresis": 2, and a single loss overflow occurs every 2k iterations (outside the loss scale window), the hysteresis is restored to 2 on the first non-overflowing iteration after each overflow, and the loss scale is never reduced. Alternatively, if "consecutive_hysteresis": false, the loss scale would be reduced after every hysteresis=2 loss overflows.
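The difference between the two modes can be illustrated with a minimal Python sketch. It is a hypothetical, simplified model written for illustration, not DeepSpeed's DynamicLossScaler: the names HysteresisScalerSketch and trace, the starting scale of 2**16, and the halving factor are assumptions. Loss-scale growth after loss_scale_window clean steps (and any hysteresis refill tied to it) is omitted, so the trace assumes all overflows fall within a single window.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class HysteresisScalerSketch:
    """Hypothetical, simplified dynamic loss scaler (not DeepSpeed's code).

    Models only the overflow/hysteresis bookkeeping. Growth of the scale
    after `loss_scale_window` clean steps, and any hysteresis refill tied
    to it, is deliberately left out.
    """
    hysteresis: int = 2                  # mirrors the "hysteresis" key
    consecutive_hysteresis: bool = False # mirrors "consecutive_hysteresis"
    cur_scale: float = 2.0 ** 16         # assumed starting loss scale
    min_scale: float = 1.0               # mirrors "min_loss_scale"
    scale_factor: float = 2.0            # assumed reduction factor
    cur_hysteresis: int = field(init=False)

    def __post_init__(self) -> None:
        self.cur_hysteresis = self.hysteresis

    def update(self, overflow: bool) -> None:
        if overflow:
            if self.cur_hysteresis <= 1:
                # Hysteresis exhausted: this overflow reduces the loss scale.
                self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            else:
                # Absorb this overflow by spending one unit of hysteresis.
                self.cur_hysteresis -= 1
        elif self.consecutive_hysteresis:
            # A clean (non-overflowing) step restores the full hysteresis.
            self.cur_hysteresis = self.hysteresis


def trace(pattern: Iterable[bool], consecutive: bool) -> List[float]:
    """Return the loss scale after each step for a given overflow pattern."""
    scaler = HysteresisScalerSketch(consecutive_hysteresis=consecutive)
    scales = []
    for overflow in pattern:
        scaler.update(overflow)
        scales.append(scaler.cur_scale)
    return scales


if __name__ == "__main__":
    # Isolated overflows separated by clean steps, all inside one window.
    pattern = [True, False, True, False, True, False]
    print("consecutive_hysteresis=True: ", trace(pattern, consecutive=True))
    print("consecutive_hysteresis=False:", trace(pattern, consecutive=False))
```

Under these assumptions, enabling the flag keeps the scale at 65536.0 for the whole pattern, while the non-consecutive mode absorbs the first overflow and halves the scale on the second and third. Because the sketch omits the window-based refill, the non-consecutive counter is not replenished after its first reduction.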

jeffra merged commit 0411a9f into microsoft:master on May 25, 2023
19 checks passed
molly-smith pushed a commit that referenced this pull request Jun 23, 2023
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

2 participants