
local attention with unidirectional lstm not converging #41

Closed
manish-kumar-garg opened this issue Jan 13, 2020 · 5 comments

@manish-kumar-garg

Hi @albertz,
I tried different learning rates, but the model does not seem to converge after changing the bidirectional LSTM to a unidirectional LSTM in the local attention setup.
Can you suggest something? What else should I try?

@albertz (Member) commented Jan 14, 2020

It seems a very related question was also asked in #42.
I have not tried unidirectional LSTMs in the encoder yet, so I don't know. You probably should play around with all the available hyperparameters, e.g.:

  • Learning rate (initial rate during warmup, and highest learning rate after warmup)
  • Learning rate warmup length (number of epochs)
  • Pretraining. E.g. the starting number of layers (try 2). The initial time reduction (try increasing it, e.g. 6, 8, 16, or even 32). Try making it longer (more repetitions). Etc.
  • Less or no SpecAugment in the beginning.
  • Higher batch size in the beginning, or gradient accumulation in the beginning.
  • Curriculum learning, i.e. the epoch_wise_filter option.
  • ...

Let this smallest network, with the highest time reduction, a high batch size, less/no SpecAugment, etc., train like that for as long as needed before increasing anything. This small network should first reach some halfway decent score. Only once you see that should the pretraining increase the depth and other things, and only carefully and slowly (so that the network does not completely break again). A rough config sketch of these options follows below.
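
For concreteness, here is a minimal sketch of where these knobs typically live in a RETURNN config of the asr_2018_attention style. The concrete values are purely illustrative, the construction algo is only a placeholder, and the epoch_wise_filter keys are an assumption that depends on the dataset used:

    # Sketch only: illustrative values, not a recommendation.
    import numpy

    def custom_construction_algo(idx, net_dict):
        # Placeholder only: the real construction algo starts with fewer encoder layers
        # and a higher time reduction, then grows the network with each pretrain step `idx`.
        # Returning None tells RETURNN that pretraining is finished.
        return net_dict if idx == 0 else None

    learning_rate = 0.0008  # highest learning rate, reached after warmup
    learning_rates = list(numpy.linspace(0.0003, learning_rate, num=15))  # warmup over 15 epochs

    # Pretraining: repeat each construction step several times so the small network
    # gets enough updates before the next growth step.
    pretrain = {"repetitions": 5, "construction_algo": custom_construction_algo}

    batch_size = 10000             # a higher batch size in the beginning can help stability
    accum_grad_multiple_step = 2   # or accumulate gradients instead of enlarging the batch

    # Curriculum learning: restrict the first epochs to shorter/easier sequences.
    train = {
        # ... other dataset options ...
        "epoch_wise_filter": {(1, 5): {"max_mean_len": 200}},  # assumed key names
    }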

@manish-kumar-garg (Author) commented Jan 16, 2020

Thanks @albertz for suggesting these.
I trained the following models up to the end of the pretraining epochs (45) and observed the following losses:

  1. Base model - asr_2018_attention - with the following hyperparameters:
     pretrain = {"repetitions": 5, "construction_algo": custom_construction_algo}
     learning_rate = 0.0008
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=15))  # warmup
     [image: loss curve]

  2. Uni LSTM, size 1024, with all hyperparameters the same as 1:
     [image: loss curve]

  3. Uni LSTM, size 1024, with all hyperparameters the same as 1 except:
     pretrain = {"repetitions": 7, "construction_algo": custom_construction_algo}
     [image: loss curve]

  4. Uni LSTM, size 1024, with all hyperparameters the same as 1 except:
     learning_rate = 0.0005
     [image: loss curve]

  5. Uni LSTM, size 1024, with all hyperparameters the same as 1 except 10 warmup steps:
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=10))  # warmup
     [image: loss curve]

  6. Uni LSTM, size 1024, with all hyperparameters the same as 1 except 20 warmup steps:
     learning_rates = list(numpy.linspace(0.0003, learning_rate, num=20))  # warmup
     In this case the loss becomes NaN after 10 epochs.

  7. Uni LSTM, size 1536, with all hyperparameters the same as 1:
     [image: loss curve]

All of these models use global attention. (A small snippet comparing the three warmup schedules follows after this list.)
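
For reference, a small standalone snippet (plain Python/NumPy, independent of the configs above) that compares the three warmup schedules tried here:

    import numpy

    learning_rate = 0.0008
    for num in (10, 15, 20):
        warmup = list(numpy.linspace(0.0003, learning_rate, num=num))
        # Each entry is the learning rate for one warmup epoch, ramping linearly
        # from 0.0003 up to the peak learning_rate.
        print(f"{num} warmup epochs: first={warmup[0]:.4f}, last={warmup[-1]:.4f}")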

@manish-kumar-garg (Author)

It seems like decreasing the learning rate helps.
Also, increasing the LSTM cell size to 1536 helps a bit, though not much.

What other combinations do you suggest trying next?

@albertz (Member) commented Jan 17, 2020

All the things I already wrote (here), but basically everything else as well.

@manish-kumar-garg (Author)

Lowering the learning rate (lr=0.0005, lr_init=0.0002) worked for me.
Thanks!
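
For completeness, a sketch of how those final values would look with the linspace-style warmup used above (the warmup length is not stated here; 15 epochs is only an assumption carried over from the base config):

    import numpy

    learning_rate = 0.0005  # lowered peak learning rate that converged here
    # Warmup starting from 0.0002; num=15 is an assumption (same as the base config).
    learning_rates = list(numpy.linspace(0.0002, learning_rate, num=15))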
