This repository has been archived by the owner on Nov 22, 2022. It is now read-only.
Conversation
This pull request was exported from Phabricator. Differential Revision: D18838272
AkshatSh pushed a commit to AkshatSh/pytext that referenced this pull request on Dec 6, 2019:
Summary: Pull Request resolved: facebookresearch#1184 (same summary as the PR description below). Differential Revision: D18838272. fbshipit-source-id: 8bdb4616987d9030e8b09c073a8ba6d753d0fd8c
Force-pushed from cd8dbe0 to da29086.
This pull request was exported from Phabricator. Differential Revision: D18838272
Differential Revision: D18725798
fbshipit-source-id: 131cd0dc983f6a8f5d7ef0a90451238681aef821
Summary: Pull Request resolved: facebookresearch#1184 (same summary as the PR description below). Differential Revision: D18838272. fbshipit-source-id: e5aea7434d0563e14f357b8647781ec1ff0b0868
Force-pushed from da29086 to e46b64a.
This pull request was exported from Phabricator. Differential Revision: D18838272
This pull request has been merged in a56c761.
Summary:
Current implementations of warmup in pytext either do warmup with optional inverse square root decay (TODO) or use polynomial decay (TODO). However, in my experiments I noticed that for large-batch training a warmup period is helpful with other schedulers as well, especially when trying to mimic the results of small-batch training on large batches.
This diff adds support for `SchedulerWithWarmup`. Underneath, it holds two schedulers: `WarmupScheduler` and any other scheduler. After `warmup_steps`, the scheduler switches from warmup to the specified scheduler. This allows combinations such as warmup with exponential decay.
Since the scheduler is built on top of the existing warmup scheduler, any new features added to that scheduler will directly apply here.
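For intuition, here is a minimal sketch of the warmup-then-switch behavior. This is an illustration under my own assumptions, not PyText's actual `SchedulerWithWarmup` code: the class name, the linear warmup rule, and per-step switching are all assumed for the example.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

class WarmupThenScheduler:
    """Illustrative sketch only (not PyText's implementation): linear
    warmup for `warmup_steps` steps, then delegate to a wrapped scheduler."""

    def __init__(self, optimizer, scheduler, warmup_steps):
        self.optimizer = optimizer
        self.scheduler = scheduler  # any scheduler, e.g. ExponentialLR
        self.warmup_steps = warmup_steps
        self.base_lrs = [g["lr"] for g in optimizer.param_groups]
        self.step_num = 0

    def step(self):
        self.step_num += 1
        if self.step_num <= self.warmup_steps:
            # Linear warmup: ramp each group's LR from ~0 up to its base LR.
            scale = self.step_num / self.warmup_steps
            for group, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
                group["lr"] = base_lr * scale
        else:
            # Past warmup: hand control to the wrapped scheduler.
            self.scheduler.step()

# Usage mirroring the sample config below: 500 warmup steps,
# then exponential decay with gamma=0.95.
model = torch.nn.Linear(4, 2)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = WarmupThenScheduler(
    optimizer, ExponentialLR(optimizer, gamma=0.95), warmup_steps=500
)
```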
Sample Config
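```
"SchedulerWithWarmup": {
  "warmup_scheduler": {
    "warmup_steps": 500
  },
  "scheduler": {
    "ExponentialLR": {
      "gamma": 0.95
    }
  }
}
```
With this config, the first 500 steps run the warmup scheduler, after which the learning rate decays by a factor of 0.95 per `ExponentialLR` step (assuming the warmup is a linear ramp, as sketched above).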
Differential Revision: D18838272