Details in the implementation of BigGAN #50

tsc2017 opened this issue May 15, 2020 · 2 comments


tsc2017 commented May 15, 2020

Hi, I have found some details in this implementation of BigGAN that are worth paying attention to.

First, I notice that the default moments used for batchnorm during inference are the accumulated values:

standardize_batch.use_moving_averages = False

if use_moving_averages:
  mean, variance = _moving_moments_for_inference(
      mean=mean, variance=variance, is_training=is_training, decay=decay)
else:
  mean, variance = _accumulated_moments_for_inference(
      mean=mean, variance=variance, is_training=is_training)

Does this mean that the decay hyperparameter for batchnorm is not used at all?

standardize_batch.decay = 0.9
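
For what it's worth, here is a minimal sketch of the difference between the two inference modes, with made-up class names rather than the actual compare_gan helpers, just to illustrate why decay goes unused on the accumulated path:

class MovingMoments:
  """Exponential moving average of the batch moments; this is the only
  path on which decay would matter."""

  def __init__(self, decay=0.9):
    self.decay = decay
    self.mean, self.variance = 0.0, 1.0

  def update(self, batch_mean, batch_variance):
    self.mean = self.decay * self.mean + (1 - self.decay) * batch_mean
    self.variance = (self.decay * self.variance
                     + (1 - self.decay) * batch_variance)


class AccumulatedMoments:
  """Plain average of batch moments collected over a number of
  post-training forward passes; decay is never consulted."""

  def __init__(self):
    self.mean_sum, self.variance_sum, self.count = 0.0, 0.0, 0

  def update(self, batch_mean, batch_variance):
    self.mean_sum += batch_mean
    self.variance_sum += batch_variance
    self.count += 1

  def moments(self):
    return self.mean_sum / self.count, self.variance_sum / self.count

If I understand correctly, the accumulated path corresponds to the "standing statistics" used for BigGAN sampling, which would explain why it is the default here.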

Second, I also notice that a shortcut is added only when in_channels != out_channels:

add_shortcut=in_channels != out_channels,

which is different from BigGAN-PyTorch:
https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388
https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427
which always adds the shortcut, and makes the shortcut convolution learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.
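
To make the comparison concrete, this is how I read the two conditions; it is a paraphrase from memory rather than the verbatim code of either repository:

def compare_gan_add_shortcut(in_channels, out_channels):
  # compare_gan (resnet_biggan): a block gets a shortcut only when the
  # channel counts differ.
  return in_channels != out_channels


def biggan_pytorch_learnable_sc(in_channels, out_channels, resample):
  # BigGAN-PyTorch (layers.py, GBlock/DBlock): the additive shortcut always
  # exists; its 1x1 conv is learnable when the channel counts differ or the
  # block resamples (resample stands for the upsample/downsample argument).
  return (in_channels != out_channels) or bool(resample)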

Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting preactivation=False, which is consistent with the implementation of WGAN-GP (I guess that since the input range you use for D is [0, 1] rather than [-1, 1], keeping the first ReLU does no harm here). Also, in the shortcut connection of the first DBlock, WGAN-GP and BigGAN-PyTorch apply pooling before the convolution, while in this repo the convolution comes before the pooling, just as in the other DBlocks.
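
To show the ordering difference in the shortcut of that first DBlock, here is a rough PyTorch-style sketch; the function names are mine, and conv_sc stands for the 1x1 shortcut convolution:

import torch.nn.functional as F


def shortcut_pool_then_conv(x, conv_sc):
  # WGAN-GP / BigGAN-PyTorch first DBlock (preactivation=False):
  # downsample first, then apply the 1x1 shortcut conv.
  return conv_sc(F.avg_pool2d(x, 2))


def shortcut_conv_then_pool(x, conv_sc):
  # compare_gan: apply the 1x1 shortcut conv first, then downsample,
  # in the same order as the other DBlocks.
  return F.avg_pool2d(conv_sc(x), 2)

If both shortcut paths really consist of just average pooling and a 1x1 convolution with nothing nonlinear in between, the two orders should compute the same function (both operations are linear and commute), and the main difference would be cost, since pooling first runs the convolution at half the resolution; I have not checked that this assumption holds in both code bases, though.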

Do you think these discrepancies would have a significant influence on the performance of BigGAN?

Thanks

@Baran-phys

same question.


gwern commented Nov 18, 2020

> which always adds the shortcut, and makes the shortcut convolution learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.

Are you sure about that? The logic for conv_sc appears to be the same in both compare_gan and BigGAN-PyTorch: check whether the channel counts differ, and add the shortcut conv only if they do.

You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, but that no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that, or have you tried running it?
