
Learned Prior #11

Open
MoeinSorkhei opened this issue Jan 15, 2020 · 12 comments

Comments

@MoeinSorkhei

Hi,
Thank you for your great implementation.

Regarding the "learned" prior, I wanted to ask:
1- Why do you make the prior a Gaussian with trainable parameters rather than, say, a unit Gaussian?
2- For the Gaussian prior, what is the motivation for obtaining the mean and std of the Gaussian by passing the features through a CNN? Is it just because you found it to work better? (https://github.com/rosinality/glow-pytorch/blob/master/model.py#L285)

Thanks in advance.

@rosinality
Owner

I found it in the official implementation. It was beneficial to training stability. I don't think the intuition for this is given in the papers, but having a spatial, content-dependent prior should make the overall problem easier.
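The learned prior being discussed can be sketched roughly as follows (illustrative names, not the repo's exact API): a zero-initialised conv maps the passed-through features to a per-pixel mean and log-std, so at the start of training the prior is exactly a unit Gaussian and only becomes content-dependent as the conv weights move away from zero.

```python
import torch
from torch import nn

class LearnedPrior(nn.Module):
    """Minimal sketch of a spatial, content-dependent Gaussian prior."""

    def __init__(self, in_channel):
        super().__init__()
        # twice the channels: one half for the mean, one for the log-std
        self.conv = nn.Conv2d(in_channel, 2 * in_channel, 3, padding=1)
        # zero init => the prior starts out as N(0, 1) everywhere
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, out):
        mean, log_sd = self.conv(out).chunk(2, 1)
        return mean, log_sd

prior = LearnedPrior(4)
h = torch.randn(2, 4, 8, 8)
mean, log_sd = prior(h)
# at initialisation mean and log_sd are all zeros, i.e. a unit Gaussian
```

Because the conv is zero-initialised, training starts from the plain unit-Gaussian prior and the model is free to learn per-location deviations from it.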

@MoeinSorkhei
Author

Thanks for the response!
I will post here if I learn more about this technique.

@MoeinSorkhei
Author

MoeinSorkhei commented Feb 10, 2020

Hi,

I have some detailed questions about your code. I would be glad if you have the time to answer them.

  1. In both the ActNorm and AffineCoupling modules, where scale and shift operations are applied, both your code and the original OpenAI code apply the shift first and then the scale. Although it should not matter much as far as optimization is concerned, did you have a reason to swap the operations? In the paper, the scale is applied before the shift (code lines: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L54 and https://github.com/rosinality/glow-pytorch/blob/master/model.py#L184).

  2. Can you explain the additional learnable (scale * 3) factor in the ZeroConv2d module (as in the OpenAI code)? And why is the scale multiplied by 3? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L151)

  3. Can you explain why the input is shifted by +2 before applying the sigmoid in the AffineCoupling module (as in the OpenAI code)? (Code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L182)

  4. Why do you pad with 1 (and not 0) in the ZeroConv2d module (code line: https://github.com/rosinality/glow-pytorch/blob/master/model.py#L149), while padding with 0 in the other Conv2d layers in the AffineCoupling? OpenAI pads with 0 if I am not mistaken (https://github.com/openai/glow/blob/master/tfops.py#L203).

  5. If I am not mistaken, the official implementation applies an additional ActNorm right after both the Conv2d and ZeroConv2d layers (code lines https://github.com/openai/glow/blob/master/tfops.py#L256 and https://github.com/openai/glow/blob/master/tfops.py#L309). Is there a reason you did not include this in your code?

I appreciate your time.
Thank you in advance.

@rosinality
Owner

Actually, I don't know the exact reasons for these details - I tried to replicate the paper, and when that did not work well I took these details from the official implementation. In any case, they are related to training stability.

  1. It could make the optimization process different, as the location parameter is also scaled.
  2. I don't know why... Maybe it is to accelerate adaptation during the initial training phase.
  3. It keeps the scaling factor close to 1, which helps information propagation.
  4. I think they also used 1 padding. (https://github.com/openai/glow/blob/master/tfops.py#L217)
  5. Did they use ActNorm after ZeroConv2d? I can't find it in conv2d_zeros.
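Questions 2-4 above can be illustrated with a small sketch (illustrative, not the repo's exact code): a zero-initialised conv that pads its input with the constant 1 and rescales its output by exp(scale * 3), plus, for question 3, the near-identity behaviour of sigmoid(x + 2) at initialisation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ZeroConv2d(nn.Module):
    """Sketch of a zero-initialised conv: weights and bias start at zero,
    so the layer initially outputs zeros, and the learnable `scale`
    (multiplied by 3 inside exp) lets the output magnitude adapt quickly
    once gradients arrive."""

    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.conv = nn.Conv2d(in_channel, out_channel, 3, padding=0)
        self.conv.weight.data.zero_()
        self.conv.bias.data.zero_()
        self.scale = nn.Parameter(torch.zeros(1, out_channel, 1, 1))

    def forward(self, x):
        # pad the border with the constant 1 instead of 0 (question 4)
        out = F.pad(x, [1, 1, 1, 1], value=1)
        out = self.conv(out)
        # the factor 3 steepens the gradient w.r.t. `scale` (question 2)
        return out * torch.exp(self.scale * 3)

layer = ZeroConv2d(3, 6)
y = layer(torch.randn(1, 3, 8, 8))
# at initialisation the output is exactly zero

# question 3: the coupling net is zero-initialised, so its raw scale
# output is 0 at the start; shifting by +2 before the sigmoid gives
s = torch.sigmoid(torch.zeros(1) + 2)
# s ≈ 0.88, close to 1, so the coupling starts near the identity map
```

A scale factor near 1 at initialisation means each coupling layer initially passes its input through almost unchanged, which helps signals (and gradients) propagate through a deep stack of flows.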

@MoeinSorkhei
Author

Hi,

Thanks for your response; you are right.
I have a question about the amount of noise added to each data point during training. As can be seen in your code here (and in OpenAI's code), you add noise to the data based on 1/(2 ** n_bits). Since 8-bit pixels, which naturally take values in [0, 255], are divided by 255 when being converted to the [0, 1] interval, why shouldn't the right quantity be 1/(2 ** n_bits - 1), i.e. 1/255 in the case of 8-bit images?

I appreciate your thoughts.
Thanks very much in advance.

@rosinality
Owner

Hmm, wouldn't it be better to have noise slightly smaller than actual pixel-value changes in [0, 255]? Though I guess it will not make much difference.
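The dequantization being discussed can be sketched as follows (an illustrative sketch, not the repo's exact preprocessing): each integer pixel value marks the left edge of a bin of width 1/n_bins, and uniform noise of that width is added, so the bins tile [0, 1) with no gaps or overlap.

```python
import torch

def dequantize(image, n_bits=8):
    """Uniform dequantization sketch. `image` holds integer pixel values
    in [0, 2**n_bits - 1]; each is mapped to a bin of width 1/n_bins and
    filled uniformly with noise of exactly that width."""
    n_bins = 2 ** n_bits
    x = image.float() / n_bins            # bin left edges: k / n_bins
    x = x + torch.rand_like(x) / n_bins   # fill the bin uniformly
    return x

pixels = torch.randint(0, 256, (4, 3, 8, 8))
x = dequantize(pixels)
# with noise width 1/256 the result stays strictly inside [0, 1);
# a width of 1/255 would let the top bin spill past 1
```

This is one way to see the owner's point: dividing by 2 ** n_bits and using matching noise width keeps the bins non-overlapping, whereas mixing a /255 rescaling with 1/255 noise would make adjacent pixel values share mass.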

@MoeinSorkhei
Author

Yes, it might be. I also think it does not make a huge difference overall.

@MoeinSorkhei
Author

> Actually, I don't know the exact reasons for these details - I tried to replicate the paper, and when that did not work well I took these details from the official implementation. In any case, they are related to training stability.
>
> 1. It could make the optimization process different, as the location parameter is also scaled.
> 2. I don't know why... Maybe it is to accelerate adaptation during the initial training phase.
> 3. It keeps the scaling factor close to 1, which helps information propagation.
> 4. I think they also used 1 padding. (https://github.com/openai/glow/blob/master/tfops.py#L217)
> 5. Did they use ActNorm after ZeroConv2d? I can't find it in conv2d_zeros.

Hi, after investigating the official Glow implementation more thoroughly, I wanted to make some clarifications on this topic:

  • First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you pointed out in your response. However, in your code you pad with 0 (the PyTorch default) except in ZeroConv2d, where you pad with 1. Am I right?

  • Secondly, about ActNorm: as far as I understand their code, the official implementation always applies ActNorm after the Conv2d operations they define, except for ZeroInitConv2d, which does not apply ActNorm.

Thanks.

@rosinality
Owner

> First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you pointed out in your response. However, in your code you pad with 0 (the PyTorch default) except in ZeroConv2d, where you pad with 1. Am I right?

Slightly different: they also used zero padding, but concatenated an additional input channel that is all zeros except at the borders.

@MoeinSorkhei
Author

> > First, I think they always pad with one, since they use the add_edge_padding function, which pads with 1, as you pointed out in your response. However, in your code you pad with 0 (the PyTorch default) except in ZeroConv2d, where you pad with 1. Am I right?
>
> Slightly different: they also used zero padding, but concatenated an additional input channel that is all zeros except at the borders.

Absolutely right. Do you understand the purpose of that additional input (the pad variable)? In other words, why do they concatenate this additional input after padding with zeros?

@rosinality
Owner

It helps the feature maps take different values (a kind of bias) at the edges. I think zero padding can also achieve that to some degree, though.

@MoeinSorkhei
Author

Right. As you mentioned, zero padding also allows for different values at the edges, since the kernel convolved with the input has a bias by default; so even though the kernel weights are multiplied by zeros (the pads), the output there can still differ. But the pad variable that is concatenated to the input additionally lets the kernel weights themselves act on the border, so in theory it allows more degrees of freedom. However, I am not sure how helpful it is in practice.
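The mask-channel trick discussed above can be sketched as follows (illustrative, not OpenAI's exact add_edge_padding code): zero-pad the input spatially, then append one extra channel that is 1 exactly on the padded border and 0 in the interior, so the conv kernel can learn an edge-specific response rather than relying only on its bias.

```python
import torch
import torch.nn.functional as F

def add_edge_padding(x, pad=1):
    """Zero-pad spatially, then concatenate a mask channel that flags
    padded positions (1 on the border, 0 in the interior)."""
    x = F.pad(x, [pad, pad, pad, pad], value=0)
    n, _, h, w = x.shape
    mask = torch.ones(n, 1, h, w)
    mask[:, :, pad:h - pad, pad:w - pad] = 0  # interior 0, border 1
    return torch.cat([x, mask], dim=1)

x = torch.randn(2, 3, 8, 8)
y = add_edge_padding(x)
# y has one extra channel and a 1-pixel border: shape (2, 4, 10, 10)
```

With plain zero padding, the conv output at the border differs from the interior only through the bias term; with the mask channel, the kernel weights attached to that channel give the network a learnable, position-aware edge response.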
