specAugment policy and schedules #48

Closed
akshatdewan opened this issue May 6, 2020 · 3 comments
Comments

@akshatdewan

Hi,

I wanted to run an experiment with the LD augmentation policy (as described in the Google Brain paper) along with the D learning rate schedule.

I was wondering what the right way would be to do something like this with base2.conv2l.specaug.curric3.config.

I was thinking of doing:

  1. Adding two additional masks in the transform function, simply by calling random_mask two more times.
  2. Slowing down the warm-up by increasing num from 10 to 20 or 40.
  3. Slowing the exponential LR decay by increasing newbob_learning_rate_decay from 0.9 to 0.95.

Would it be a reasonable thing to do?

Thanks

@albertz
Member

albertz commented May 8, 2020

We already have variations of that, where we also play around with scheduling of SpecAugment.
E.g. see Switchboard base2.conv2l.specaug4a.
@papar22 and @ZhouW321 also have some more variations, which we will upload soon to the repo.

Note that random_mask in that config is already applied multiple times, where the number of repetitions is stochastically sampled; that is what the options min_num and max_num control. If you want the mask to always be applied exactly 3 times, just set min_num=3, max_num=3.
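To illustrate what min_num/max_num do, here is a minimal NumPy sketch of that stochastic repetition. random_mask_sketch is a hypothetical stand-in, not the actual config function (the real random_mask operates on batched TF tensors along the time/feature axes):

```python
import numpy as np

def random_mask_sketch(x, min_num, max_num, max_len, rng):
    """Apply between min_num and max_num zero-masks of random length
    along the last axis of x. Hypothetical stand-in for the config's
    random_mask, for illustration only."""
    x = x.copy()
    # The number of masks is stochastically sampled in [min_num, max_num].
    num = rng.integers(min_num, max_num + 1)
    for _ in range(num):
        length = rng.integers(1, max_len + 1)
        start = rng.integers(0, x.shape[-1] - length + 1)
        x[..., start:start + length] = 0.0
    return x

rng = np.random.default_rng(0)
feats = np.ones((80, 100))  # e.g. 80 freq bins x 100 frames
# With min_num=3, max_num=3 the mask is applied exactly 3 times.
masked = random_mask_sketch(feats, min_num=3, max_num=3, max_len=10, rng=rng)
```

With min_num=1, max_num=3 instead, each call would mask anywhere between one and three spans, which is the stochastic behavior described above.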

Yes, sure, you can play around with the learning rate warm-up as well. My experience, however, is that increasing it usually does not help.

Reducing the LR decay helps when you want to increase your overall training time, i.e. train for more epochs. And training longer usually helps. If you look at the original SpecAugment paper, you will see that they effectively train much longer than we do.

@akshatdewan
Author

Thanks for your answer, Albert!

I am sorry for a possibly naive question, but in the config example you mention above, the newbob_learning_rate_decay is 0.7.

My understanding is: `LR[epoch t+1] = decay * LR[epoch t]`. So if I am starting from a baseline model trained for 12.5 epochs with newbob_learning_rate_decay = 0.9, and I want to train another model for, say, 25 epochs, I should increase newbob_learning_rate_decay to, say, 0.95 instead of reducing it to 0.7, right?
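Under a pure exponential reading of that formula, roughly doubling the epoch count while raising the decay factor from 0.9 to 0.95 lands at about the same final LR. A quick sanity check (illustrative starting LR, and ignoring that newbob only decays when the dev-set error stops improving):

```python
# Effective learning rate after n epochs of pure exponential decay:
#   lr_n = lr_0 * decay**n
# (newbob's error-based gating is ignored here for simplicity.)
lr0 = 1e-3  # illustrative starting LR, not taken from the config

lr_12_fast = lr0 * 0.9 ** 12   # after 12 epochs at decay 0.9
lr_25_slow = lr0 * 0.95 ** 25  # after 25 epochs at decay 0.95
# Both come out near 2.8e-4, i.e. the slower decay over roughly twice
# as many epochs reaches about the same learning rate.
```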

@albertz
Member

albertz commented May 13, 2020

Yes sure.
