This repository has been archived by the owner on Nov 22, 2022. It is now read-only.
Conversation
This pull request was exported from Phabricator. Differential Revision: D18838272
AkshatSh pushed a commit to AkshatSh/pytext that referenced this pull request on Dec 6, 2019:
Summary: Pull Request resolved: facebookresearch#1184 (same summary as the PR description below). Differential Revision: D18838272. fbshipit-source-id: 8bdb4616987d9030e8b09c073a8ba6d753d0fd8c
Force-pushed from cd8dbe0 to da29086.
This pull request was exported from Phabricator. Differential Revision: D18838272
Differential Revision: D18725798
fbshipit-source-id: 131cd0dc983f6a8f5d7ef0a90451238681aef821
Summary: Pull Request resolved: facebookresearch#1184 (same summary as the PR description below). Differential Revision: D18838272. fbshipit-source-id: e5aea7434d0563e14f357b8647781ec1ff0b0868
Force-pushed from da29086 to e46b64a.
This pull request was exported from Phabricator. Differential Revision: D18838272
This pull request has been merged in a56c761.
Summary:
Current implementations of warmup in pytext either do warmup with optional inverse square root decay (TODO) or use polynomial decay (TODO). However, in my experiments I noticed that for large-batch training a warmup period is helpful with other schedulers as well, especially when trying to mimic the results of small-batch training on large batches.
This diff adds support for `SchedulerWithWarmup`. Underneath, it holds two schedulers: `WarmupScheduler` and any other scheduler. After `warmup_steps`, the scheduler switches from warmup to the specified scheduler. This allows combinations such as warmup with exponential decay.
Since the scheduler is built on top of the existing warmup scheduler, any new features added to that scheduler will directly apply here.
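For intuition, here is a minimal sketch of the warmup-then-switch behavior. This is an illustration under my own assumptions, not PyText's actual `SchedulerWithWarmup` code: the class name, the linear warmup rule, and per-step switching are all assumed for the example.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

class WarmupThenScheduler:
    """Illustrative sketch only (not PyText's implementation): linear
    warmup for `warmup_steps` steps, then delegate to a wrapped scheduler."""

    def __init__(self, optimizer, scheduler, warmup_steps):
        self.optimizer = optimizer
        self.scheduler = scheduler  # any scheduler, e.g. ExponentialLR
        self.warmup_steps = warmup_steps
        self.base_lrs = [g["lr"] for g in optimizer.param_groups]
        self.step_num = 0

    def step(self):
        self.step_num += 1
        if self.step_num <= self.warmup_steps:
            # Linear warmup: ramp each group's LR from ~0 up to its base LR.
            scale = self.step_num / self.warmup_steps
            for group, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
                group["lr"] = base_lr * scale
        else:
            # Past warmup: hand control to the wrapped scheduler.
            self.scheduler.step()

# Usage mirroring the sample config below: 500 warmup steps,
# then exponential decay with gamma=0.95.
model = torch.nn.Linear(4, 2)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = WarmupThenScheduler(
    optimizer, ExponentialLR(optimizer, gamma=0.95), warmup_steps=500
)
```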
Sample Config
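```
"SchedulerWithWarmup": {
  "warmup_scheduler": {
    "warmup_steps": 500
  },
  "scheduler": {
    "ExponentialLR": {
      "gamma": 0.95
    }
  }
}
```
With this config, the first 500 steps run the warmup scheduler, after which the learning rate decays by a factor of 0.95 per `ExponentialLR` step (assuming the warmup is a linear ramp, as sketched above).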
Differential Revision: D18838272