Question: PatchGAN Discriminator #39
Comments
In fact, a "PatchGAN" is just a convnet! Or you could say all convnets are patchnets: the power of convnets is that they process each image patch identically and independently, which makes things very cheap (# params, time, memory), and, amazingly, turns out to work. The difference between a PatchGAN and a regular GAN discriminator is that the regular GAN maps from a 256x256 image to a single scalar output, which signifies "real" or "fake", whereas the PatchGAN maps from 256x256 to an NxN array of outputs X, where each X_ij signifies whether patch ij in the image is real or fake. Which patch is patch ij in the input? Well, output X_ij is just a neuron in a convnet, and we can trace back its receptive field to see which input pixels it is sensitive to. In the CycleGAN architecture, the receptive fields of the discriminator turn out to be 70x70 patches in the input image! This is all mathematically equivalent to having manually chopped up the image into 70x70 overlapping patches, run a regular discriminator over each patch, and averaged the results. Maybe it would have been better if we had called it a "Fully Convolutional GAN", as in FCNs... it's the same idea :)
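As a toy illustration of that equivalence (this is not the repo's code: `score_patch`, the stride of 8, and the lack of padding are all simplifying assumptions), one could chop the image up explicitly and average the per-patch scores:

```python
import numpy as np

def score_patch(patch):
    # Stand-in for a per-patch discriminator; here just the mean intensity.
    return patch.mean()

def patch_scores(image, patch=70, stride=8):
    # Slide a patch-sized window over the image and score each position,
    # producing the N x N grid of "real/fake" outputs X_ij.
    h, w = image.shape
    scores = []
    for i in range(0, h - patch + 1, stride):
        row = []
        for j in range(0, w - patch + 1, stride):
            row.append(score_patch(image[i:i + patch, j:j + patch]))
        scores.append(row)
    return np.array(scores)

img = np.random.rand(256, 256)
X = patch_scores(img)   # grid of per-patch outputs X_ij
final = X.mean()        # average the results, as described above
print(X.shape)
```

Note that without zero padding this gives fewer patch positions than the padded convnet does; a convnet computes the same kind of grid but shares the work between overlapping patches instead of recomputing it.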
Can you tell me which line in the code represents the PatchGAN?
Edit: see
I have a question.
Here is a visual receptive field calculator: https://fomoro.com/tools/receptive-fields/# I converted the math into Python to make it easier to understand:

```python
def f(output_size, ksize, stride):
    return (output_size - 1) * stride + ksize

last_layer = f(output_size=1, ksize=4, stride=1)
# Receptive field: 4
fourth_layer = f(output_size=last_layer, ksize=4, stride=1)
# Receptive field: 7
third_layer = f(output_size=fourth_layer, ksize=4, stride=2)
# Receptive field: 16
second_layer = f(output_size=third_layer, ksize=4, stride=2)
# Receptive field: 34
first_layer = f(output_size=second_layer, ksize=4, stride=2)
# Receptive field: 70
print(first_layer)
```
Hi @phillipi @junyanz,
That's a good point! Batchnorm does have this property. So to be precise we should say the PatchGAN architecture is equivalent to chopping up the image into 70x70 patches, making a big batch out of these patches, and running a discriminator on each patch, with batchnorm applied across the batch, then averaging the results.
Yes, that would be a better explanation! And thanks for your response to this.
Hello phillipi, thank you!
TensorFlow's extract_image_patches is a differentiable function and can be used in training.
Well, I understand how the PatchGAN works now! Thanks.
Hi, I am wondering why a sigmoid activation is not used for the PatchGAN, since the true patch output should be close to 1, while the false one should be close to 0.
Thanks. Then what is the difference in the output of D without the sigmoid? For example, in LSGAN, if the output of D is very large (far from 1 or 0), can the loss function still work, since the real labels are still set to 1 and the fake labels to 0?
I believe in LSGAN the loss is squared distance from the labels. So if the output of D is very large, D will get a large penalty and it will learn to make a smaller output. Eventually, D should learn to output the correct labels, since those minimize the loss (and the loss is nice and smooth, just squared distance). |
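To make the squared-distance point concrete, here is a small numpy sketch of the least-squares discriminator loss (the 30x30 shapes and 0/1 labels are illustrative assumptions, not the repo's exact code):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # Least-squares GAN discriminator loss: squared distance from the
    # labels (1 for real patches, 0 for fake patches), averaged over
    # all patch outputs.
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

# Unbounded outputs far from the labels are penalized quadratically...
big = lsgan_d_loss(np.full((30, 30), 10.0), np.full((30, 30), -10.0))
# ...while outputs exactly at the labels give zero loss.
perfect = lsgan_d_loss(np.ones((30, 30)), np.zeros((30, 30)))
print(big, perfect)
```

So even without a sigmoid, D is pushed toward outputting the label values, because those are the minimizers of the smooth squared loss.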
I would like to share some points on why the patch number is counted this way. Here is what I think. For any input feature map size i, kernel size k, zero padding p and stride s, the output feature map size o is: o = floor((i + 2p - k) / s) + 1. When calculating the patch number, it is supposed that p = 0, so it is very clear that this calculation is just the inverse of the patch-size calculation above.
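The forward direction of that formula can be checked with a short script. The layer list below assumes the 70x70 PatchGAN configuration discussed in this thread (five 4x4 convolutions with strides 2, 2, 2, 1, 1 and padding 1), which is an assumption on my part:

```python
def out_size(i, k, p, s):
    # Output feature map size for input size i, kernel k, padding p, stride s.
    return (i + 2 * p - k) // s + 1

size = 256
for stride in [2, 2, 2, 1, 1]:  # the five conv layers of the discriminator
    size = out_size(size, k=4, p=1, s=stride)
print(size)  # -> 30, matching the 30x30x1 output map
```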
Why is a padding of 1 being used in every convolution in the discriminator? If we feed the discriminator an image of size 70x70 we get an output of 6x6. Wouldn't it make more sense to not use a padding and instead get one single output 1x1 for a 70x70 input? |
I think the padding was a holdover from the DCGAN architecture. I can't remember if there is a good reason for it. Might have been to make a 256x256 input map to a 1x1 output, in the DCGAN discriminator. Zero padding also has the effect that it helps localize where you are in the image, since you can see this border of zeros when you are near an image boundary. That can sometimes be beneficial. |
It is here: line 538 of networks.py.
Thank you! |
It just means the width/height of the output feature map.
Hi, as the discriminator outputs a 30x30x1 matrix, does that mean the 70x70 patch was moved over the input image to 30 positions in each direction (horizontal and vertical), with each position mapped to a single output?
Answered at #1106. |
Hello phillipi, |
I doubt it has a big effect. You could try removing it and see what happens. |
Thanks, and I wonder whether the 'PatchGAN' discriminator (in fact a convnet, per your response) would still work when applied to a 3-D volume (C-H-W-L, 4 dimensions in code)?
thanks @emilwallner |
The one thing I'm struggling to understand is that the discriminator looks at 70x70 patches. But if I understand correctly, its input is the conditional image concatenated with either the real image or the synthesized image. So if it's only looking at small patches at a time, how does it learn the relationship between the two images? How does it check that the conditional input has actually informed the generated image?
Most of the applications used in the paper only require local color and texture transfer. In these cases, 70x70 patches might be enough (for a 256x256 input image). Later work (e.g., pix2pixHD) has explored using multi-scale discriminators, which can look at more pixels.
If this structure is added to the generator, will it have a good effect? Is there any ablation experiment in this regard?
Great picture, like it! |
Hi there.
I was investigating your CycleGAN paper and code, and it looks like the discriminator you've implemented is just a convnet, not the PatchGAN mentioned in the paper.
Maybe I've missed something. Could you point me to where the processing of 70x70 patches happens?
Thanks in advance!