
Best practices for backstitch #1942

Closed
danpovey opened this issue Oct 17, 2017 · 5 comments

Comments

@danpovey (Contributor)

@freewym, I want to get to a situation where most of our frequently used example scripts have suitable backstitch settings, as it does seem to give a reliable improvement.

I think rather than relying on you to do that, it may be a good idea to just make everyone aware of what the recommended settings are, with guidance for tuning them where applicable, and some idea of how much they are expected to improve the results. Can you please comment on this issue with answers to those questions?

@freewym (Contributor)

freewym commented Oct 17, 2017

To turn on backstitch training, only a few lines need to be added or changed in the shell script:

Pass the following options to steps/nnet3/chain/train.py:
--trainer.optimization.backstitch-training-scale $alpha \
--trainer.optimization.backstitch-training-interval $back_interval \

where a typical setting is:
alpha=0.3
back_interval=1

or, to get a speed-up at the cost of a potentially small degradation (which we observed in our SWBD experiments):
alpha=1.0
back_interval=4

Meanwhile, the value of num-epochs needs to be doubled when doing backstitch training (e.g., if num-epochs=4 with normal SGD training, then num-epochs=8 with backstitch training). If the valid objf has not converged after doubling num-epochs, increase it further until convergence.
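For concreteness, a minimal sketch of how the relevant part of a chain recipe might look with backstitch turned on is below. The directory variables ($train_data_dir, $tree_dir, $lat_dir, $dir) are placeholders for whatever the recipe already defines, and all of the recipe's other options are assumed to stay unchanged; only the backstitch options and the doubled num-epochs are the point here.

alpha=0.3        # backstitch scale; use alpha=1.0 with back_interval=4 for the faster variant
back_interval=1  # apply backstitch on every minibatch
num_epochs=8     # doubled relative to the recipe's normal-SGD value of 4

steps/nnet3/chain/train.py \
  --trainer.num-epochs $num_epochs \
  --trainer.optimization.backstitch-training-scale $alpha \
  --trainer.optimization.backstitch-training-interval $back_interval \
  --feat-dir $train_data_dir \
  --tree-dir $tree_dir \
  --lat-dir $lat_dir \
  --dir $dir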

For TDNN-LSTM recipes of the chain model, backstitch obtains ~10% relative WER improvement on SWBD, AMI-SDM and tedlium. For TDNN-LSTM cross-entropy models, the improvement is smaller (2-4%). For non-recurrent architectures (e.g., TDNN), the improvement may be even smaller.

Note that the recommended settings above apply to our ASR tasks with chain/cross-entropy models. Optimal settings may differ for other tasks such as image classification (e.g., in the CIFAR ResNet recipes, alpha=0.5, back-interval=1, and num-epochs is around 30% larger than in normal SGD training).

@danpovey (Contributor, Author)

danpovey commented Oct 17, 2017 via email

@freewym (Contributor)

freewym commented Oct 17, 2017

Most of the time with the same num-epochs backstitch is worse.
I can try increasing the init-learning-rate.

@stale
stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "stale (Stale bot on the loose)" label on Jun 19, 2020
@stale

stale bot commented Jul 19, 2020

This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.

stale bot closed this as completed on Jul 19, 2020
kkm000 removed the "stale (Stale bot on the loose)" label on Jul 19, 2020