
Question: why not divide by target length in CTC loss #68

Closed

vadimkantorov opened this issue Oct 24, 2019 · 2 comments

Comments

@vadimkantorov
Contributor

In https://github.com/NVIDIA/NeMo/blob/master/collections/nemo_asr/nemo_asr/losses.py#L51-L53 it is mentioned that NeMo does not divide by target length. This also makes losses less comparable between sequences of different lengths and, if I understand correctly, effectively scales up the learning rate for longer sequences.

Could you please comment on this choice (e.g. versus normalizing by sequence length and raising the learning rate)? Thank you!

@okuchaiev
Member

Yes, this is intentional. Basically, there are two options that I think make sense for CTCLoss:

  1. "mean": average everything across sequence length and batch (note that this is the default behavior for PyTorch).
  2. Sum losses over sequence length and then average over the batch.

We found empirically that option (2) works best. Longer sequences do have a greater impact in this case, but keep in mind that in our setup: (1) we randomly shuffle examples and (2) cap the max duration at 16.7 seconds.

But, perhaps, we should expose (1) as an option.
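
For concreteness, here is a minimal sketch of the two reductions using PyTorch's built-in CTC loss. The shapes and random tensors below are illustrative placeholders, not NeMo's actual code:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: T = input (time) steps, B = batch size,
# C = number of classes including the blank at index 0, S = max target length.
T, B, C, S = 50, 4, 20, 10
log_probs = torch.randn(T, B, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (B, S), dtype=torch.long)  # no blanks in targets
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (B,), dtype=torch.long)

# Option (1), PyTorch's default: each per-sample loss is divided by its
# target length, then the result is averaged over the batch.
loss_mean = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                       blank=0, reduction='mean')

# Option (2): keep the per-sample losses (CTC already sums the negative
# log-likelihood over the sequence) and only average over the batch --
# no normalization by target length.
per_sample = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                        blank=0, reduction='none')
loss_sum_then_mean = per_sample.mean()
```

With option (2), a sample with a longer target contributes proportionally more to the gradient, which is the trade-off discussed above.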

@vadimkantorov
Contributor Author

vadimkantorov commented Oct 24, 2019

> (1) we randomly shuffle examples

Don't you sort by duration by default (so that durations are similar within a batch)?
https://github.com/NVIDIA/NeMo/blob/master/collections/nemo_asr/nemo_asr/parts/manifest.py#L129

> But, perhaps, we should expose ("mean") as an option.

Yeah, I wonder if longer sequences indeed provide more reliable gradients. If that's not the case, then raising the learning rate should have a somewhat similar impact.
