
Time spent on training #3

Closed
kunshou123 opened this issue Aug 21, 2019 · 5 comments

@kunshou123

@sailordiary hi, thank you for your code!
I want to know whether the model converges quickly. I am training a Dense3D model, but the loss does not decrease quickly. Do you know of any good lipreading models that take little time to train? Thank you!

@sailordiary
Owner

Hi,

LipNet is one of the smallest lip reading models that I know of. It converges fairly quickly; however, the GRID dataset is small, so it might take a bit longer to train to optimal performance (e.g. to match the WERs reported in the paper).

On the other hand, you can also experiment with simple 2D video encoders like VGG-M; I think those converge quickly too.
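
For anyone new to the model, here is a minimal PyTorch sketch of a LipNet-style architecture (a 3D convolutional frontend feeding a bidirectional GRU with a CTC head). The layer shapes below are illustrative approximations of the paper's configuration, not this repository's exact code:

```python
import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    """LipNet-style model: 3D conv frontend -> bidirectional GRU -> CTC head.

    Expects clips shaped (batch, 3, T, 50, 100), i.e. T frames of
    100x50 RGB mouth crops; channel counts roughly follow the paper.
    """

    def __init__(self, vocab_size: int = 28):  # e.g. characters + CTC blank
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, (3, 5, 5), stride=1, padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 96, (3, 3, 3), stride=1, padding=(1, 1, 1)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        # 96 channels x 3 x 6 spatial cells for 100x50 input crops;
        # other crop sizes change this GRU input dimension.
        self.gru = nn.GRU(96 * 3 * 6, 256, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.frontend(x)                      # (B, 96, T, 3, 6)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.gru(feats)                      # (B, T, 512)
        return self.fc(out).log_softmax(-1)           # per-frame log-probs for CTC loss
```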

@sailordiary
Owner

sailordiary commented Aug 30, 2019

@kunshou123 , some updates: I just reproduced the overlapped speakers setup in the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported in the paper (notably, I used greedy decoding, not beam search decoding, so the actual performance could be even better). Training takes about 40 min per epoch, using the parameter settings in the current revision, which are taken directly from the authors.
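
For context, greedy (best-path) CTC decoding just takes the argmax class per frame, collapses consecutive repeats, and drops blanks. A minimal sketch, assuming log-probabilities of shape (T, num_classes) and blank index 0 (the decoder in this repository may differ):

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0):
    """Greedy (best-path) CTC decoding.

    log_probs: (T, num_classes) per-frame log-probabilities.
    Returns the label sequence with repeats collapsed and blanks removed.
    """
    best_path = log_probs.argmax(dim=-1).tolist()  # most likely class per frame
    decoded, prev = [], blank
    for token in best_path:
        # Emit a token only when it differs from the previous frame's
        # prediction and is not the CTC blank.
        if token != prev and token != blank:
            decoded.append(token)
        prev = token
    return decoded
```

Beam search keeps multiple prefix hypotheses instead of the single best path, which is why it can only match or improve on these greedy-decoding numbers.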

For those who happen to have dropped by, I plan to release the pre-trained checkpoints, as soon as I have the time to clean the dataset preparation code (clearly, preprocessing matters for the model to be useful).
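
Until the preparation code is released, here is a hypothetical sketch of the usual mouth-ROI cropping step using dlib's 68-point landmarks. The landmark model file and crop size here are assumptions for illustration, not this repo's actual settings:

```python
from typing import Optional

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed path to the standard dlib 68-point landmark model.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_mouth(frame: np.ndarray, size: int = 100) -> Optional[np.ndarray]:
    """Crop a size x size mouth region centered on the lip landmarks."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # no face detected in this frame
    shape = predictor(gray, faces[0])
    # Points 48-67 are the mouth in the 68-point landmark scheme.
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
    cx, cy = pts.mean(axis=0).astype(int)
    half = size // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    return frame[y0:y0 + size, x0:x0 + size]
```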

@sailordiary
Owner

Here are the training curves for overlapped speakers, if anyone's interested. (The discontinuities were accidental; I restored optimizer states.)
[Image: training_curve — training curves for the overlapped speakers setup]

@kunshou123
Author

Oh! Thank you very much for your reply. I have been confused about this for a long time. I am a novice in lipreading, and my 3D CNN training results are very bad; the loss does not decrease significantly. I will try it.

@WeicongChen

> @kunshou123 , some updates: I just reproduced the overlapped speakers setup in the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported in the paper (notably, I used greedy decoding, not beam search decoding, so the actual performance could be even better). Training takes about 40 min per epoch, using the parameter settings in the current revision, which are taken directly from the authors.
>
> For those who happen to have dropped by, I plan to release the pre-trained checkpoints, as soon as I have the time to clean the dataset preparation code (clearly, preprocessing matters for the model to be useful).

Hi, can you kindly share your preprocessing code? I am struggling with reproducing LipNet these days.
