
Question on conditional image generation #6

Open
tackgeun opened this issue Oct 19, 2017 · 4 comments

@tackgeun

With the help of your advice, I reproduced the main part of the paper and succeeded in training your model on ImageNet with specific classes (minibus, dog).

Now I am trying to reproduce Section 6.8 (conditional image generation) from the paper, in order to use this architecture for unsupervised/weakly-supervised object segmentation.
But I have had a hard time finding a trainable setup. Could you share the settings you used in that experiment?

[Encoder architecture]

  • Use the same architecture as the discriminator.
  • Given an image, the encoder returns a vector with the same dimension as the random noise vector (nz).
  • To generate the background vector, I use Encoder(real image) = background vector.
  • To generate the foreground vector, I use Encoder(real image - generated image) = foreground vector (see the sketch after this list).
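
A minimal PyTorch sketch of the two encoder passes described above. The module names (netE, netG) and the tiny stand-in architectures are hypothetical placeholders, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the encoder and generator; the real networks
# follow the DCGAN-style architectures discussed in this thread.
nz = 100
netE = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, nz))
netG = nn.Sequential(nn.Linear(nz, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))

x_real = torch.randn(16, 3, 64, 64)   # dummy batch of real 64x64 images
z_bg = netE(x_real)                   # background vector = Encoder(real image)
x_gen = netG(z_bg)                    # image generated from the background code
z_fg = netE(x_real - x_gen)           # foreground vector = Encoder(real - generated)
```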

[Optimization method]

  • The original code (1) updates D by maximizing log(D(x)) + log(1 - D(G(z))) on real and fake images, then (2) updates G by maximizing log(D(G(z))), i.e., by labeling fake images as real (sketched after this list).
  • I tried three options for the optimization: (1) additionally update the auto-encoder + generator part; (2) update the auto-encoder part during the maximize-log(D(G(z))) step; (3) additionally update only the auto-encoder part. None of these variants works.
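
For reference, a minimal sketch of that original two-step update, again with tiny hypothetical stand-in networks rather than the repository's real DCGAN models:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; the real code uses DCGAN generator/discriminator.
nz, b = 100, 16
netG = nn.Sequential(nn.Linear(nz, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
netD = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)
criterion = nn.BCELoss()
x_real = torch.randn(b, 3, 64, 64)          # dummy batch of real images

# (1) update D: maximize log(D(x)) + log(1 - D(G(z)))
optD.zero_grad()
fake = netG(torch.randn(b, nz))
loss_d = criterion(netD(x_real).squeeze(1), torch.ones(b)) \
       + criterion(netD(fake.detach()).squeeze(1), torch.zeros(b))
loss_d.backward()
optD.step()

# (2) update G: maximize log(D(G(z))) by labeling the fake images as real
optG.zero_grad()
loss_g = criterion(netD(fake).squeeze(1), torch.ones(b))
loss_g.backward()
optG.step()
```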

Thanks.

@jwyang
Copy link
Owner

jwyang commented Oct 19, 2017

Hi, @tackgeun ,

glad to know you have successfully reproduced the results from the paper!

Regarding the conditional image generation,

[Encoder architecture]

  • Use the same architecture as the discriminator.

Yes, I also used the same architecture as the discriminator, except that the output layer is an fc layer.

  • Given an image, the encoder returns a vector with the same dimension as the random noise vector (nz).

Yes, exactly, but I put a batch normalization layer before the last fc layer.

  • To generate the background vector, I use Encoder(real image) = background vector.

Yeah, that's what I did.

  • To generate the foreground vector, I use Encoder(real image - generated image) = foreground vector.

Yes, that's what I did.
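
Putting those answers together, here is one plausible encoder, sketched under assumed sizes (64x64 inputs, ndf = 64, nz = 100); the exact layer widths are guesses, not the author's code:

```python
import torch.nn as nn

nz, ndf = 100, 64
# DCGAN-discriminator-style trunk, with the classifier head replaced by a
# batch norm followed by a fully-connected layer producing an nz-dim code.
encoder = nn.Sequential(
    nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),         # 64 -> 32
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.BatchNorm2d(ndf * 2),
    nn.LeakyReLU(0.2, inplace=True),                                      # 32 -> 16
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.BatchNorm2d(ndf * 4),
    nn.LeakyReLU(0.2, inplace=True),                                      # 16 -> 8
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1), nn.BatchNorm2d(ndf * 8),
    nn.LeakyReLU(0.2, inplace=True),                                      # 8 -> 4
    nn.Conv2d(ndf * 8, nz, 4, 1, 0),      # last conv: 4x4 -> 1x1, nz channels
    nn.BatchNorm2d(nz),                   # batch norm before the final fc
    nn.Flatten(),
    nn.Linear(nz, nz),                    # fc output layer -> nz-dim code
)
```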

[Optimization method]

  • The original code (1) updates D by maximizing log(D(x)) + log(1 - D(G(z))) on real and fake images, then (2) updates G by maximizing log(D(G(z))), i.e., by labeling fake images as real.

  • I tried three options for the optimization: (1) additionally update the auto-encoder + generator part; (2) update the auto-encoder part during the maximize-log(D(G(z))) step; (3) additionally update only the auto-encoder part. None of these variants works.

In my experiments, I used a reconstruction loss (with weight 1e-3) plus the adversarial loss for training the generator, and the same loss as in the original GAN for training the discriminator.
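
A sketch of that loss composition, reusing the same kind of tiny hypothetical stand-in modules as above (none of this is the repository's actual code):

```python
import torch
import torch.nn as nn

nz, b = 100, 16
netE = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, nz))
netG = nn.Sequential(nn.Linear(nz, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
netD = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
bce, mse = nn.BCELoss(), nn.MSELoss()
x_real = torch.randn(b, 3, 64, 64)

# generator side: adversarial loss + reconstruction loss weighted by 1e-3
x_rec = netG(netE(x_real))
loss_g = bce(netD(x_rec).squeeze(1), torch.ones(b)) + 1e-3 * mse(x_rec, x_real)

# discriminator side: the same real-vs-fake loss as in the original GAN
loss_d = bce(netD(x_real).squeeze(1), torch.ones(b)) \
       + bce(netD(x_rec.detach()).squeeze(1), torch.zeros(b))
```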

Also, please remember to regularize the transformation parameters. The rotation should be minor, or even zero, during training.
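
One possible way to implement such a regularizer (an assumption on my part, not necessarily the author's exact scheme): penalize the predicted affine parameters of the spatial transformer for deviating from the identity transform, which in particular suppresses rotation:

```python
import torch

def transform_penalty(theta, weight=1e-2):
    # theta: (batch, 2, 3) affine matrices predicted for the spatial
    # transformer; the weight is a hypothetical value, to be tuned.
    identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=theta.device)
    return weight * (theta - identity).pow(2).mean()

theta = torch.randn(16, 2, 3, requires_grad=True)  # dummy predictions
loss_reg = transform_penalty(theta)                # add this to the generator loss
```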

Hope these are helpful.

@tackgeun
Author

tackgeun commented Oct 23, 2017

With your help, my code has started generating images, but the reconstructed images are blurry compared to the paper. I am also confused about the encoder architecture and the way the generator weights are updated.

Architecture

Do you mean that the last part of the encoder network looks like this?

  • Discriminator CNN trunk - last conv - batch norm (dimension nz) - ReLU - fc

Optimization

The encoder is embedded within the generator class, so I compute gradients for:

  1. the adversarial loss for the generator (random noise);
  2. the weighted reconstruction loss (encoded image).

Those gradients are accumulated to compute the gradient of the sum of all losses.
Following the description above, the generator's weights are updated w.r.t. (1) + (2), while the encoder's weights are updated w.r.t. (2) only.

Thank you.

@jwyang
Owner

jwyang commented Oct 24, 2017

Hi, @tackgeun

With your help, my code has started generating images, but the reconstructed images are blurry compared to the paper. I am also confused about the encoder architecture and the way the generator weights are updated.

Architecture

Do you mean that the last part of the encoder network looks like this?

  • Discriminator CNN trunk - last conv - batch norm (dimension nz) - ReLU - fc

I used Tanh() as the last layer of the encoder, to keep the output in a reasonable range.
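
Under that description, the encoder head might look like this (a sketch; nz and the ReLU placement follow the layout quoted above and are assumptions):

```python
import torch.nn as nn

nz = 100
head = nn.Sequential(
    nn.BatchNorm2d(nz),      # after the last conv (nz feature maps at 1x1)
    nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(nz, nz),       # fc layer producing the nz-dim code
    nn.Tanh(),               # keeps the code in the bounded range [-1, 1]
)
```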

Optimization

The encoder is embedded within the generator class, so I compute gradients for:

  1. the adversarial loss for the generator (random noise);
  2. the weighted reconstruction loss (encoded image).

Those gradients are accumulated to compute the gradient of the sum of all losses.
Following the description above, the generator's weights are updated w.r.t. (1) + (2), while the encoder's weights are updated w.r.t. (2) only.

Actually, I update both the generator and the encoder with (1) and (2). And the weight for the reconstruction loss is much lower: 1e-3 to 1e-2.
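
A sketch of that corrected update: one optimizer over both encoder and generator parameters, so both modules receive gradients from the adversarial loss (1) and the weighted reconstruction loss (2). Again, all modules are tiny hypothetical stand-ins:

```python
import itertools
import torch
import torch.nn as nn

nz, b = 100, 16
netE = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, nz))
netG = nn.Sequential(nn.Linear(nz, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
netD = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
bce, mse = nn.BCELoss(), nn.MSELoss()
optEG = torch.optim.Adam(
    itertools.chain(netE.parameters(), netG.parameters()), lr=2e-4)

x_real = torch.randn(b, 3, 64, 64)
optEG.zero_grad()
x_rec = netG(netE(x_real))                            # reconstruction path
loss = bce(netD(x_rec).squeeze(1), torch.ones(b)) \
     + bce(netD(netG(torch.randn(b, nz))).squeeze(1), torch.ones(b)) \
     + 1e-2 * mse(x_rec, x_real)                      # weight in the 1e-3..1e-2 range
loss.backward()
optEG.step()
```

Note that in this sketch the pure-noise adversarial term only touches the generator; the encoder still receives an adversarial gradient through the reconstruction path.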

Thank you.

@tackgeun
Author

tackgeun commented Nov 4, 2017

Hi, I have just returned from ICCV... and my implementation still has problems.

  • Did you use the Euclidean (L2) loss as the reconstruction loss? I have heard that some works (pix2pix, CycleGAN) argue that an L1 loss produces finer details...
  • Did you disable the size_average option for the reconstruction loss? (See the sketch after this list.)
  • Did you use an additional generator for the auto-encoder part, or the same generator as in the random-noise path via weight sharing?
  • You said that you updated both the generator and the encoder with (1) and (2). How do you update the encoder parameters in part (1), which uses only random noise? Did you apply the reconstruction loss to random noise as well?
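
For the first two questions, the loss variants in question would look like this in PyTorch (in current versions, the deprecated size_average flag maps onto reduction; size_average=False corresponds to reduction='sum'):

```python
import torch.nn as nn

l2_mean = nn.MSELoss(reduction='mean')  # Euclidean loss, batch-averaged
l2_sum  = nn.MSELoss(reduction='sum')   # equivalent to size_average=False
l1_mean = nn.L1Loss(reduction='mean')   # L1; pix2pix/CycleGAN report finer details
```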

Thank you for your advice.
