Gradient overflow (NaN problem) #2

Closed · CODEJIN opened this issue Sep 10, 2020 · 2 comments

Comments

CODEJIN commented Sep 10, 2020

Hi,

Thank you for your code! I have a question: I am trying to apply power normalization (PN) to Tacotron 2. However, after I changed batch normalization (BN) to PN, a gradient overflow occurred after several thousand training steps. When I checked, the MaskPowerNorm class's ema_gz parameter kept shrinking during training and finally became NaN. Do you have any suggestions or a solution?

Thanks,

Heejo
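
For context, a minimal sketch of how one might watch the ema_gz statistic for the drift described above. This helper is hypothetical and not part of the powernorm or Tacotron 2 code; it assumes ema_gz is registered as a buffer on MaskPowerNorm (if it is an nn.Parameter instead, iterate over named_parameters()):

```python
import torch

def check_powernorm_buffers(model, step, log_every=100):
    """Hypothetical helper: log the norm of every ema_gz buffer and fail fast on NaN."""
    for name, buf in model.named_buffers():
        if "ema_gz" not in name:
            continue
        if torch.isnan(buf).any():
            raise RuntimeError(f"step {step}: {name} contains NaN")
        if step % log_every == 0:
            # A steadily shrinking norm here is the symptom reported above.
            print(f"step {step}: {name} norm = {buf.norm().item():.3e}")
```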

sIncerass (Owner) commented

Hi, thanks for your interest, and sorry for the late reply. I would suggest tuning the \alpha_bkw parameter at https://github.com/amirgholami/powernorm/blob/2f23ae75c4f29904175bfd2c6b8248399ff99440/fairseq/modules/norms/mask_powernorm.py#L103. The larger it is, the less variance it introduces into the later training phase.
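
For reference, a rough sketch of where \alpha_bkw (abkw) enters the backward statistics, paraphrased from the PowerNorm paper rather than copied from mask_powernorm.py; the tensor names, shapes, and the exact update rule are illustrative assumptions:

```python
import torch

def backward_stats_sketch(grad_z, z, ema_gz, abkw):
    """Illustrative only: how abkw damps the running gradient statistic.

    grad_z: gradient w.r.t. the normalized activations z
    z:      the normalized activations
    ema_gz: running average of grad * z (one value per channel)
    abkw:   the \\alpha_bkw coefficient discussed above
    """
    # The accumulated statistic is scaled by (1 - abkw), so a larger abkw
    # lets ema_gz contribute less and injects less variance into the
    # gradients late in training.
    approx_grad = grad_z - (1.0 - abkw) * ema_gz * z
    # Refresh the running statistic from the current batch; the repo's exact
    # reduction dimensions and update rule may differ.
    ema_gz = ema_gz + (approx_grad * z).mean(dim=(0, 1), keepdim=True)
    return approx_grad, ema_gz

# Example usage with assumed (time, batch, channels) shapes:
g = torch.randn(50, 8, 256)
z = torch.randn(50, 8, 256)
ema = torch.zeros(1, 1, 256)
g_new, ema = backward_stats_sketch(g, z, ema, abkw=0.9)
```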

CODEJIN commented Sep 16, 2020

Hi, thank you for your reply. However, the link you sent does not work for me; I get a "Page not found" error.
