
Tanh in the Generator last activation #522

Closed
YuvalFrommer opened this issue Feb 10, 2019 · 15 comments

Comments

@YuvalFrommer

Thanks for your work.
I am wondering why you are using tanh as the last activation of the generator.
Thanks again.

@junyanz
Owner

junyanz commented Feb 10, 2019

The goal is to match the range. The range of the (normalized) real images is [-1, 1], and tanh outputs values in [-1, 1].

@John1231983

Clear!

If I normalize the whole image [HxW] to [-1, 1] and then randomly crop a patch of size [H/8 x W/8] to feed to the network, clearly the range of the [H/8 x W/8] patch will not span the full [-1, 1]. Should I not use tanh in the last layer? How would you prefer to handle this? I cannot feed the whole [HxW] image due to memory constraints.

@taesungp
Collaborator

[-1, 1] is the range of the value of each pixel (the brightness/color of each pixel should be within -1 and 1), so it has nothing to do with the width and height of the image.

@John1231983

@taesungp : No, you misunderstood my question. Let's say I is an image of size HxW. The normalization will be

I = I / max(I)
I = (I - 0.5) / 0.5

Now the image intensity is in [-1, 1]. If I randomly crop the image to [H/8 x W/8], do you think the cropped image's range is still [-1, 1]? No, it will be in a different range.

@taesungp
Collaborator

In the first line you should do

I = I / 255.0

instead of I = I / max(I), so that the normalization becomes independent of the values in the current crop of I.
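A small numpy illustration of this point (a sketch with made-up pixel values, not code from the repo): dividing by max(I) changes with the crop, while dividing by the constant 255 commutes with cropping.

```python
import numpy as np

img = np.array([[10, 20], [30, 240]], dtype=np.float64)
crop = img[:1, :]  # top row only: [10, 20]

# max-based normalization: the scale depends on the crop's own maximum
full_max = img / img.max()    # divides by 240
crop_max = crop / crop.max()  # divides by 20 -- a different scale!
assert not np.allclose(full_max[:1, :], crop_max)

# constant normalization: cropping and scaling commute
full_255 = img / 255.0
crop_255 = crop / 255.0
assert np.allclose(full_255[:1, :], crop_255)  # identical
```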

@John1231983

Yes. But after normalization, we crop the image. I know we should normalize after cropping, but in my case I want to normalize before cropping.

@taesungp
Collaborator

I think I = I / 255.0 is independent of cropping: cropping and then dividing by 255 is the same as dividing by 255 and then cropping.

@John1231983

That is correct. But the problem is that if an image of size WxH is normalized to [-1, 1] and we then crop a region, the region may not span [-1, 1]; it may be, say, [-0.5, 0.5]. Since the output of tanh spans [-1, 1], this makes the range of the cropped input inconsistent with the range the network can output.

@taesungp
Collaborator

  • Even with tanh, if the ground-truth cropped image is in the range [-0.5, 0.5], the generator network will learn to output values in [-0.5, 0.5]. In other words, tanh does not force all outputs to have max value 1. For example, if the generator outputs zero everywhere, the image will also be zero everywhere, not -1 or 1.
  • You actually have exactly the same situation with uncropped images. Some images are bright, so they lie in the [0, 1] range, not [-1, 1]. Some images are greyish, so they lie within [-0.5, 0.5]. You have the same amount of "problem" with or without cropping.
  • Tanh merely constrains the minimum and maximum output of the generator to be -1 and 1. The network could probably do just as well with .clamp(-1, 1) instead of Tanh().
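A minimal numpy sketch of those bullets (an illustration only, not the repo's code; numpy's tanh and clip stand in for PyTorch's Tanh() and clamp()):

```python
import numpy as np

# raw, unbounded generator pre-activations
raw = np.array([-3.0, -0.4, 0.0, 0.4, 3.0])

via_tanh = np.tanh(raw)          # smooth squash into (-1, 1)
via_clamp = np.clip(raw, -1, 1)  # hard cut at -1 and 1

# both keep outputs within [-1, 1] ...
assert via_tanh.min() >= -1 and via_tanh.max() <= 1
assert via_clamp.min() == -1 and via_clamp.max() == 1

# ... and neither forces small values out to +/-1:
# tanh(0) == 0, and clip leaves in-range values untouched
assert via_tanh[2] == 0.0
assert np.allclose(via_clamp[1:4], raw[1:4])
```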

@John1231983

Here is an example of that:

import numpy as np

I = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 255],
     [1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 255],
     [1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 255],
     [1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 255],
     [1, 2, 3, 4, 5, 6, 7, 8, 9], [11, 12, 13, 14, 15, 16, 17, 18, 255]]
I = np.asarray(I)
I = I / 255.0
I = (I - 0.5) / 0.5
print(I.min(), I.max())            # -0.9921568627450981 1.0
I_crop = I[4:6, 4:6]
print(I_crop.min(), I_crop.max())  # -0.9607843137254902 -0.8745098039215686

@taesungp
Collaborator

Yes...? The cropped image can just be thought of as a smaller uncropped image. You are just training with smaller images.

Let's say you don't use cropping. What if the input image is

I = [[1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19],
     [1,2,3,4,5,6,7,8,9],[11,12,13,14,15,16,17,18,19]] + 100

so that all values are within [101, 119]? As such, cropping does not introduce any extra problem. If images are within the range [-0.5, 0.5], the generator's pre-tanh activations will learn to lie in [arctanh(-0.5), arctanh(0.5)], so its tanh outputs stay in [-0.5, 0.5].
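A quick numpy check of that range claim (a sketch for illustration, not code from the repo):

```python
import numpy as np

# If ground-truth images lie in [-0.5, 0.5], the pre-tanh activations the
# generator must learn lie in [arctanh(-0.5), arctanh(0.5)].
lo, hi = np.arctanh(-0.5), np.arctanh(0.5)

assert np.isclose(lo, -hi)            # arctanh is an odd function
assert np.isclose(np.tanh(lo), -0.5)  # tanh maps the bounds back ...
assert np.isclose(np.tanh(hi), 0.5)   # ... to [-0.5, 0.5]
print(round(float(hi), 4))  # 0.5493
```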

@pyoungkangkim

pyoungkangkim commented Dec 13, 2019

The goal is to match the range. The range of real images is [-1, 1]. Tanh outputs a value between [-1, 1].

Yes, I have heard elsewhere as well that the range of real images is [-1, 1].
However, I have two sequential questions.

  1. When I open an image using PIL, e.g. PIL.Image.open('some_img.jpg'), does PIL automatically convert the pixel values from [-1, 1] to [0, 255]? Or did you mean something different when saying the range of real images is [-1, 1]? I'm not totally sure whether the actual pixel values originally range over [-1, 1]; this may be my misunderstanding.

What I do know is that torchvision.transforms.ToTensor divides values ranging over [0, 255] by 255, thus scaling them to [0, 1].

  2. It seemed a bit odd to me that we usually first shift an image (if PIL does what question 1 suggests) to [0, 1] as the input AND THEN try to output something in [-1, 1], then plot by shifting back to [0, 1]. I thought it might be better to just take the original [-1, 1] values as input, output something in [-1, 1], and then plot by shifting back to [0, 1].

But it has been my belief that this doesn't actually matter much because of the normalization layers: normalization gives the activations a mean of 0 and a std of 1, so it doesn't matter what range the original input is in, even if it has been shifted to [0, 1]. Is that a badly conceived notion? What are your thoughts on that?

@junyanz
Owner

junyanz commented Dec 14, 2019

  1. The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.
  2. The range for both the original images and generated images is [-1, 1].
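The two-step conversion described above, sketched as arithmetic in numpy (the repo's actual pipeline uses torchvision's ToTensor followed by Normalize with mean 0.5 and std 0.5 per channel; this block only checks the numbers):

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.float64)

x = pixels / 255.0   # ToTensor: [0, 255] -> [0, 1]
x = (x - 0.5) / 0.5  # Normalize(mean=0.5, std=0.5): [0, 1] -> [-1, 1]

assert x.min() == -1.0 and x.max() == 1.0
print(x.round(3).tolist())  # [-1.0, -0.498, 0.004, 1.0]
```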

@pyoungkangkim

pyoungkangkim commented Dec 15, 2019

  1. The PIL image is [0, 255]. We convert it into [-1, 1] using torchvision.transforms.Normalize, as neural networks work better with zero-mean data. The input to the networks is [-1, 1] after this conversion. See this line for more details.
  2. The range for both the original images and generated images is [-1, 1].

Ah, so that is what you meant! I was worried this whole time that images originally contained values ranging over [-1, 1], instead of what I have been telling people (i.e., [0, 255]).

Also yes, that would do it. I got so used to normalizing with different precomputed means and stds, which doesn't give [-1, 1] for new data, that I forgot you were using 0.5 for all channels.

Does this mean, though, that instead of taking input in [0, 1] and producing output in [0, 1] through a Sigmoid, it's better to normalize the [0, 1] input to [-1, 1] and produce a [-1, 1] output through Tanh, since the latter is zero-centered?

@junyanz
Owner

junyanz commented Dec 15, 2019

Yes, your understanding is correct.
