
The number of parameters is doubled #89

Open · sansiro77 opened this issue Jul 1, 2021 · 5 comments

sansiro77 (Contributor) commented Jul 1, 2021

Here is the simplest example.

from blitz.modules import BayesianLinear  # assuming the layer comes from the blitz package

fc1 = BayesianLinear(1, 1)
print(list(fc1.parameters()))
pytorch_total_params = sum(p.numel() for p in fc1.parameters() if p.requires_grad)
print("total parameters:", pytorch_total_params)

The output is:

[Parameter containing:
tensor([[0.0651]], requires_grad=True), Parameter containing:
tensor([[-7.1001]], requires_grad=True), Parameter containing:
tensor([-0.0429], requires_grad=True), Parameter containing:
tensor([-6.9712], requires_grad=True), Parameter containing:
tensor([[0.0651]], requires_grad=True), Parameter containing:
tensor([[-7.1001]], requires_grad=True), Parameter containing:
tensor([-0.0429], requires_grad=True), Parameter containing:
tensor([-6.9712], requires_grad=True)]
total parameters: 8

The parameters are fc1.weight_mu, fc1.weight_rho, fc1.bias_mu, fc1.bias_rho, fc1.weight_sampler.mu, fc1.weight_sampler.rho, fc1.bias_sampler.mu, fc1.bias_sampler.rho, respectively, which is double what is expected.
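The duplication is easier to see by parameter name; a quick check under the same setup as above (named_parameters is standard PyTorch, nothing library-specific):

for name, p in fc1.named_parameters():
    print(name, tuple(p.shape))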

sansiro77 (Contributor, Author) commented Jul 1, 2021

My current solution is:

count = 0
for name, param in net.named_parameters():
    if ("sampler" not in name) and param.requires_grad:
        count += param.numel()
print(count)
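The same workaround can be packaged as a small helper; a minimal sketch, assuming the only duplicated entries are the ones whose names contain "sampler":

def count_trainable_params(net):
    # Count trainable parameters, skipping the duplicated "sampler" copies.
    return sum(p.numel() for name, p in net.named_parameters()
               if "sampler" not in name and p.requires_grad)

For the BayesianLinear(1, 1) example above this reports 4 (weight_mu, weight_rho, bias_mu, bias_rho) instead of 8.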

Philippe-Drolet commented

I am also wondering what these parameters mean, respectively. Thank you.

sansiro77 (Contributor, Author) commented

> I am also wondering what these parameters mean, respectively. Thank you.

In Bayesian neural networks, each parameter ("weight" and "bias") is a random variable that follows a distribution, which here is Gaussian.
"mu" is the mean, and "rho" parameterizes the standard deviation sigma via self.sigma = torch.log1p(torch.exp(self.rho)) (a softplus).
A "sampler" draws a concrete value from that distribution every time the model runs a forward pass.

Philippe-Drolet commented

Thanks for the reply! I knew that, but what I mean is: from what I have seen with BNNs in general, the weight distribution is rarely a perfect normal distribution centered at mu with scale sigma (it is usually more of a Gaussian mixture), yet here every weight distribution I obtain looks exactly like that. Is it variational inference that always gives perfect normal distributions?

sansiro77 (Contributor, Author) commented

In the paper "Weight Uncertainty in Neural Networks", the authors use a Gaussian variational posterior together with a scale mixture prior. The per-weight posterior is therefore a single Gaussian by construction; the mixture appears only in the prior.
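For reference, a minimal sketch of such a scale mixture prior as described in Blundell et al., "Weight Uncertainty in Neural Networks" (2015); the hyperparameter values below are placeholders, not the ones used in the paper or in this library:

import torch
from torch.distributions import Normal

# Placeholder hyperparameters for the two-component Gaussian scale mixture prior,
# P(w) = pi * N(w | 0, sigma1^2) + (1 - pi) * N(w | 0, sigma2^2)
pi, sigma1, sigma2 = 0.5, 1.0, 0.0025

def scale_mixture_log_prior(w):
    # Log-density of the mixture prior, summed over all entries of w.
    p1 = Normal(0.0, sigma1).log_prob(w).exp()
    p2 = Normal(0.0, sigma2).log_prob(w).exp()
    return torch.log(pi * p1 + (1 - pi) * p2).sum()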
