Question: PatchGAN Discriminator #39

Closed
johnkorn opened this Issue Jun 1, 2017 · 9 comments

@johnkorn

johnkorn commented Jun 1, 2017

Hi there.
I was looking into your CycleGAN paper and code, and it looks like the discriminator you've implemented is just a conv net, not the PatchGAN mentioned in the paper. Maybe I've missed something. Could you point me to where the 70x70 patches are processed?
Thanks in advance!

@phillipi

phillipi (Collaborator) commented Jun 1, 2017

In fact, a "PatchGAN" is just a convnet! Or you could say all convnets are patchnets: the power of convnets is that they process each image patch identically and independently, which makes things very cheap (# params, time, memory), and, amazingly, turns out to work.

The difference between a PatchGAN and a regular GAN discriminator is that the regular GAN maps from a 256x256 image to a single scalar output, which signifies "real" or "fake", whereas the PatchGAN maps from the 256x256 image to an NxN array of outputs X, where each X_ij signifies whether patch ij in the image is real or fake. Which patch is patch ij in the input? Well, output X_ij is just a neuron in a convnet, and we can trace back its receptive field to see which input pixels it is sensitive to. In the CycleGAN architecture, the receptive fields of the discriminator turn out to be 70x70 patches in the input image!

This is all mathematically equivalent to if we had manually chopped up the image into 70x70 overlapping patches, run a regular discriminator over each patch, and averaged the results.

Maybe it would have been better if we called it a "Fully Convolutional GAN" like in FCNs... it's the same idea :)

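For the curious, the size of that NxN output grid can be derived in a few lines of Python. This is a rough sketch assuming the usual pix2pix-style discriminator stack (five 4x4 convolutions with strides 2, 2, 2, 1, 1 and padding 1 — an assumption about the architecture, not a quote from the repo):

```python
def conv_out(size, k=4, s=2, p=1):
    # Spatial output size of one conv layer (floor division, as in most frameworks).
    return (size + 2 * p - k) // s + 1

# Assumed layer strides: three stride-2 convs, then two stride-1 convs,
# all with 4x4 kernels and padding 1.
size = 256
for stride in (2, 2, 2, 1, 1):
    size = conv_out(size, s=stride)

print(size)  # 30: a 30x30 grid of outputs, one "real/fake" score each
```

Under these assumptions a 256x256 input yields a 30x30 output grid, matching the 30x30x1 shape people report from the code.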

@phillipi phillipi closed this Jun 1, 2017

@taki0112


taki0112 commented Sep 27, 2017

Can you tell me which line in the code implements the PatchGAN?

@phillipi

phillipi (Collaborator) commented Sep 27, 2017

@taki0112


taki0112 commented Oct 15, 2017

I have a question.

  1. I saw the code (class NLayerDiscriminator(nn.Module)), but I do not see the number 70 anywhere. So why is it called a 70x70 PatchGAN? That is, why the number 70?

  2. The output of the code is 30x30x1 (the X_ij grid), but the PatchGAN's patches are said to be 70x70 (ij). You said you traced back the receptive field and found that patch ij is 70x70. How did you do that?

@phillipi

phillipi (Collaborator) commented Oct 15, 2017

  1. The "70" is implicit: it's not written anywhere in the code, but instead emerges as a mathematical consequence of the network architecture.

  2. The math is here: https://github.com/phillipi/pix2pix/blob/master/scripts/receptive_field_sizes.m

@emilwallner


emilwallner commented Feb 24, 2018

Here is a visual receptive field calculator: https://fomoro.com/tools/receptive-fields/#

I converted the math into Python to make it easier to follow:

def f(output_size, ksize, stride):
    # Receptive field needed at a layer's input to produce
    # `output_size` pixels of receptive field at its output.
    return (output_size - 1) * stride + ksize

last_layer = f(output_size=1, ksize=4, stride=1)
# Receptive field: 4
fourth_layer = f(output_size=last_layer, ksize=4, stride=1)
# Receptive field: 7
third_layer = f(output_size=fourth_layer, ksize=4, stride=2)
# Receptive field: 16
second_layer = f(output_size=third_layer, ksize=4, stride=2)
# Receptive field: 34
first_layer = f(output_size=second_layer, ksize=4, stride=2)
# Receptive field: 70

print(first_layer)

@utkarshojha


utkarshojha commented Sep 15, 2018

Hi @phillipi @junyanz,
I understand how patch sizes are calculated implicitly by tracing back the receptive field sizes of successive convolutional layers. But don't you think batch normalization sort of undermines the whole idea of the PatchGAN discriminator? Theoretically, each element X_ij of the final NxN output should depend only on some 70x70 patch of the original image, and any change outside that 70x70 patch should not change the value of X_ij. But if we use batch normalization, that won't necessarily be true, right?

@phillipi

phillipi (Collaborator) commented Sep 16, 2018

That's a good point! Batchnorm does have this property. So to be precise, we should say the PatchGAN architecture is equivalent to chopping up the image into 70x70 patches, making a big batch out of these patches, running a discriminator on each patch with batchnorm applied across the batch, and then averaging the results.
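The "chop into patches, score each, average" view can be sketched numerically. This is a toy illustration only: the stride is chosen for convenience (the real network's padding makes the exact patch grid differ), and the scoring function is a stand-in, not a real discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((256, 256))  # a fake single-channel "image"

k, stride = 70, 16  # 70x70 patches; stride is illustrative only
patches = [
    img[i:i + k, j:j + k]
    for i in range(0, img.shape[0] - k + 1, stride)
    for j in range(0, img.shape[1] - k + 1, stride)
]

def toy_score(patch):
    # Stand-in for a per-patch discriminator. A real PatchGAN computes all of
    # these scores in a single convolutional pass; with batchnorm, the patch
    # scores would also share normalization statistics across the batch.
    return patch.mean()

scores = [toy_score(p) for p in patches]
overall = sum(scores) / len(scores)  # averaged, as in the equivalence above
```

Each entry of `scores` plays the role of one X_ij, and `overall` is the averaged discriminator output.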

@utkarshojha

utkarshojha commented Sep 17, 2018

Yes, that would be a better explanation! And thanks for your response.
