
NaNs during CycleGAN training #20

Closed

revilokeb opened this issue May 4, 2017 · 4 comments

@revilokeb

I have done roughly a dozen test runs (using train.py) on two datasets (maps, and a custom dataset converting Synthia to Cityscapes). Every run so far has eventually produced NaNs, sometimes after more than 70 epochs, sometimes after only a handful. Until the NaNs appear, actual learning does seem to happen, as evidenced e.g. by looking at the transformed images across epochs. I have also played with various learning rates, but even at a pretty low lr the NaNs eventually show up.

My questions: Is this something others have observed as well? And, in case this is "normal" and e.g. due to the difficulties of training GANs (min-max), which parameters would be the critical ones to vary to keep training from breaking down?
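One cheap thing that helps in situations like this is to check every loss for NaN/Inf at each iteration, so the run stops (and the offending batch can be inspected) as soon as the breakdown starts rather than many epochs later. A minimal sketch, assuming a generic training loop that keeps its scalar losses in a dict (placeholder names, not this repo's actual API):

```python
import torch

def assert_finite(losses):
    """Stop as soon as any loss turns NaN/Inf instead of training on garbage.

    `losses` is assumed to be a dict of scalar loss tensors,
    e.g. {'G_A': ..., 'D_A': ...} (placeholder keys).
    """
    for name, value in losses.items():
        if not torch.isfinite(value).all():
            raise RuntimeError(f'loss {name} became non-finite ({value.item()})')
```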

@taesungp
Collaborator

Hi revilokeb,

I believe it is caused by repeatedly applying normalization to images of low variance.

With a normalization like InstanceNorm, the gradients tend to blow up quickly if the image has low variance, and it gets worse as the activations pass through multiple normalizations in a deep network. The problem is therefore more frequent in the CycleGAN architecture, which uses InstanceNorm in a deep network with many normalization layers.

I personally ran into this issue when one image in the dataset was uniformly black due to a corrupt file. I think the problem can be alleviated by increasing the value of epsilon, or by removing the few images that cause it.
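For reference, a rough sketch of both workarounds, assuming a typical PyTorch setup: a larger eps for the InstanceNorm layers, plus a quick scan for near-uniform images. The dataset path and the variance threshold below are placeholders, and where exactly your copy of the code constructs its norm layer may differ:

```python
import functools
import glob

import torch.nn as nn
import torchvision.transforms.functional as TF
from PIL import Image

# Workaround 1: build InstanceNorm with a larger eps than the default (1e-5),
# so the 1 / sqrt(var + eps) term cannot blow up on near-constant feature maps.
# Pass this partial wherever your network builder expects a norm_layer.
norm_layer = functools.partial(nn.InstanceNorm2d, eps=1e-3,
                               affine=False, track_running_stats=False)

# Workaround 2: look for near-uniform images (e.g. all-black corrupt frames)
# and drop them from the dataset before training.
VAR_THRESHOLD = 1e-4  # arbitrary; tune for your data

for path in glob.glob('datasets/my_dataset/trainA/*'):  # placeholder path
    img = TF.to_tensor(Image.open(path).convert('RGB'))  # values in [0, 1]
    if img.var().item() < VAR_THRESHOLD:
        print(f'Low-variance image, consider removing: {path}')
```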

@bernardohenz

I am running into the same problem.

So are you saying that low-variance images may lead to exploding gradients? Could some kind of gradient clipping be used to avoid that?

I've been trying to debug this, but it is hard to isolate the problem =/
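For anyone who wants to try gradient clipping: it only takes one extra line in a standard PyTorch training step. A minimal sketch, assuming generic loss / model / optimizer variables (placeholder names, not the ones this repo actually uses):

```python
from torch.nn.utils import clip_grad_norm_

def clipped_step(loss, model, optimizer, max_norm=10.0):
    """One optimizer step with gradient-norm clipping applied in between."""
    optimizer.zero_grad()
    loss.backward()
    # Rescale all gradients so their combined L2 norm is at most max_norm.
    clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
```

torch.nn.utils.clip_grad_value_ is an alternative that caps each gradient entry instead; either way the threshold is something to tune, and clipping only limits the blow-up rather than removing its cause (the low-variance images).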

JiahangLiGary pushed a commit to lanbas/pytorch-CycleGAN-and-pix2pix that referenced this issue Apr 19, 2023
@jialeluD

jialeluD commented Oct 3, 2023

The problem was solved by upgrading the version of torch.

@TattooPro

Which version of torch should I choose? The default is torch 1.4.
