The number of parameters on BatchNormalization module #89

Open
hideakikuratsu opened this issue Feb 28, 2022 · 2 comments
Labels
question Further information is requested

Comments

@hideakikuratsu

The number of parameters of each module is calculated by the following code:

def get_model_parameters_number(model):
    # Count only trainable elements (tensors with requires_grad=True)
    params_num = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return params_num

I applied this code to torch.nn.BatchNorm2d like this:
import torch
bn = torch.nn.BatchNorm2d(10)
sum(p.numel() for p in bn.parameters() if p.requires_grad)
The last line returns 20, but torch.nn.BatchNorm2d also has a running (moving) mean and variance as parameters, doesn't it?
So I thought the correct number of parameters for torch.nn.BatchNorm2d(10) would be:
the number of weight parameters = 10
the number of bias parameters = 10
the number of running mean parameters = 10
the number of running var parameters = 10
that is, 10 * 4 = 40.
I'd appreciate it if you could explain this. Thank you!

@morkovka1337

Hi @hello-friend1242954
Weight and bias are the learnable parameters of the BN layer (they are updated from the gradients computed during backpropagation). The running mean and variance are computed during the forward pass, which is why, I think, they are not counted as parameters (they do not require gradients).
https://d2l.ai/chapter_convolutional-modern/batch-norm.html#training-deep-networks
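
To make the distinction above concrete, here is a minimal sketch (not taken from the project's code) that lists what PyTorch treats as parameters and what it stores as buffers for the same `BatchNorm2d(10)` layer. The variable names `param_counts` and `buffer_counts` are illustrative only.

```python
import torch

bn = torch.nn.BatchNorm2d(10)

# Learnable tensors (nn.Parameter): these are what parameters() returns,
# so they are the only ones the counting function in this issue sees.
param_counts = {name: p.numel() for name, p in bn.named_parameters()}

# Non-learnable state registered as buffers: saved with the module,
# but never returned by parameters().
buffer_counts = {name: b.numel() for name, b in bn.named_buffers()}

print("parameters:", param_counts)
print("buffers:   ", buffer_counts)
```

With the default constructor arguments, `weight` and `bias` should show up as parameters, while `running_mean`, `running_var`, and the scalar `num_batches_tracked` counter show up as buffers. That is why the sum over `parameters()` gives 20 rather than 40.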

@hideakikuratsu
Author

Thank you for the reply!
I agree that whether they count as parameters should be judged by whether they require gradients,
but they undoubtedly also take up some memory/storage space, right?
So I thought the definition of 'parameters' was quite ambiguous.
Thank you for the clear explanation!
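
If the goal is to measure storage rather than trainable weights, one possible approach is to count buffers as well as parameters. The sketch below assumes PyTorch's `Module.buffers()` and `Tensor.element_size()`; the helper names `count_stored_elements` and `stored_bytes` are hypothetical and not part of this project.

```python
import torch


def count_stored_elements(model):
    """Return (parameter elements, buffer elements), trainable or not."""
    n_params = sum(p.numel() for p in model.parameters())
    n_buffers = sum(b.numel() for b in model.buffers())
    return n_params, n_buffers


def stored_bytes(model):
    """Approximate tensor storage in bytes for parameters plus buffers."""
    tensors = list(model.parameters()) + list(model.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


bn = torch.nn.BatchNorm2d(10)
n_params, n_buffers = count_stored_elements(bn)
print("parameter elements:", n_params)
print("buffer elements:   ", n_buffers)
print("approx. bytes:     ", stored_bytes(bn))
```

For `BatchNorm2d(10)` the buffer count includes the running mean (10), the running variance (10), and a scalar `num_batches_tracked` counter, so the stored total is slightly more than the 40 estimated above. The byte figure depends on the dtypes in use.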

@sovrasov added the question (Further information is requested) label on Nov 16, 2022