Summary:
Pull Request resolved: #1184

Current implementations of warmup in pytext either do warmup with optional inverse square root decay (TODO) or use polynomial decay (TODO). However, in my experiments I noticed that for large-batch training a warmup period is helpful with other schedulers as well, especially when trying to mimic the results of small-batch training on large batches.

This diff adds support for `SchedulerWithWarmup`. Underneath, it holds two schedulers: WarmupScheduler and any other scheduler. After `warmup_steps`, it switches from warmup to the specified scheduler. This enables combinations such as warmup with exponential decay. Since it is built on top of the existing warmup scheduler, any new features added to that scheduler will directly apply here.

Sample Config

```
"SchedulerWithWarmup": {
  "warmup_scheduler": {
    "warmup_steps": 500
  },
  "scheduler": {
    "ExponentialLR": {
      "gamma": 0.95
    }
  }
}
```

Reviewed By: ArmenAg

Differential Revision: D18838272

fbshipit-source-id: 1b1b107552f2f8f38ed8cc319b9b64096d0bc07c
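For illustration, here is a minimal sketch of the switching behavior described above, using PyTorch's standard scheduler API. The class name `WarmupThenScheduler`, the linear warmup shape, and the training loop are assumptions for illustration, not the actual pytext implementation:

```
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, LambdaLR


class WarmupThenScheduler:
    """Hypothetical sketch: warm up for `warmup_steps` optimizer steps,
    then delegate every subsequent step to the wrapped scheduler."""

    def __init__(self, optimizer, scheduler, warmup_steps):
        self.warmup_steps = warmup_steps
        self.step_count = 0
        # Linear warmup from ~0 up to the optimizer's base learning rate
        # (the warmup shape is an assumption here).
        self.warmup = LambdaLR(
            optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
        )
        self.scheduler = scheduler  # takes over once warmup finishes

    def step(self):
        self.step_count += 1
        if self.step_count <= self.warmup_steps:
            self.warmup.step()
        else:
            self.scheduler.step()


model = torch.nn.Linear(4, 2)
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = WarmupThenScheduler(
    optimizer,
    scheduler=ExponentialLR(optimizer, gamma=0.95),  # gamma from the sample config
    warmup_steps=500,  # warmup_steps from the sample config
)
for _ in range(1000):
    optimizer.step()  # forward/backward elided; only the LR schedule is shown
    scheduler.step()
```

Note that in recent PyTorch versions `ExponentialLR` multiplies the current learning rate by `gamma` on each step, so the handoff from the warmed-up rate is continuous.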