
Potential bug when sampling from categorical distribution #5062

Closed
christinaheinze opened this issue Feb 5, 2018 · 12 comments · Fixed by #5093

Comments
@christinaheinze

Hi,

I would like to sample from a categorical distribution where the probabilities passed are the columns of a tensor p_dG:

import torch
import torch.distributions as dis
import numpy as np

G = 3
D = 2
p_dG = torch.Tensor(G, D)
p_dG[:, 0] = torch.Tensor([0.1, 0.8, 0.1])
p_dG[:, 1] = torch.Tensor([0.1, 0.8, 0.1])

z_list = []
p_dg = p_dG[:, 0]
print(p_dg)
dis_Z = dis.Categorical(p_dg)
for _ in range(250):
    z = dis_Z.sample()
    z_list.append(z)

z = torch.cat(z_list, dim=0)

true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

The output from print(c) is [ 29 25 196], so the distribution is completely off: the bulk of the observations should be equal to 1, not to 2.

In contrast, the following code works as expected:

z_list = []
dis_Z = dis.Categorical(torch.Tensor([0.1, 0.8, 0.1]))
for _ in range(250):
    z = dis_Z.sample()
    z_list.append(z)

z = torch.cat(z_list, dim=0)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

Is this a bug or am I missing something?

I am using:

  • OS: macOS High Sierra
  • PyTorch version: 0.3.0
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
  • CUDA/cuDNN version: --
  • GPU models and configuration: --
  • GCC version (if compiling from source): --

Thanks!

@ssnl
Collaborator

ssnl commented Feb 5, 2018

Yes this is a bug. Fortunately it seems to have been fixed on master. Thanks for reporting!

@soumith
Member

soumith commented Feb 5, 2018

@christinaheinze you can follow instructions here to easily compile for master: https://github.com/pytorch/pytorch#from-source

@soumith soumith closed this as completed Feb 5, 2018
@christinaheinze
Author

Ok, thanks!

@christinaheinze
Author

On master the issue has been fixed for the categorical distribution, but apparently not for the multinomial distribution:

import torch.distributions as dis
import torch
import numpy as np

G = 3
D = 2
p_dG = torch.Tensor(G, D)
p_dG[:, 0] = torch.Tensor([0.1, 0.8, 0.1])
p_dG[:, 1] = torch.Tensor([0.1, 0.8, 0.1])

p_dg = p_dG[:, 0]
z = p_dg.multinomial(250, replacement=True)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

and also

z = torch.multinomial(p_dg, 250, replacement=True)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

@ssnl
Collaborator

ssnl commented Feb 6, 2018

I'll take a look at this tomorrow.

@alicanb
Collaborator

alicanb commented Feb 6, 2018

The same problem exists in distributions.Multinomial as well, even though it doesn't use torch.multinomial for sampling. I'll check that out.

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb It maintains a _categorical object internally, so that's probably the cause.

@alicanb
Collaborator

alicanb commented Feb 6, 2018

On second thought, what is the bug here?

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb The bug is that CPU torch.multinomial doesn't work on non-contiguous tensors. I'm writing a fix now.
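Until that fix lands, a minimal sketch of a workaround is to force a contiguous copy of the column slice before sampling. (This sketch is written against a current PyTorch API; torch.tensor and torch.bincount postdate the 0.3 release discussed here.)

```python
import torch

torch.manual_seed(0)

p_dG = torch.tensor([[0.1, 0.1],
                     [0.8, 0.8],
                     [0.1, 0.1]])
p_dg = p_dG[:, 0]  # a column slice is a non-contiguous view on CPU

# Workaround: .contiguous() materializes a contiguous copy,
# sidestepping the non-contiguous-input bug in torch.multinomial
z = p_dg.contiguous().multinomial(10000, replacement=True)
counts = torch.bincount(z, minlength=3)
print(counts)  # index 1 should receive roughly 80% of the draws
```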

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb Btw, do you know why distributions.Multinomial uses a distributions.Categorical and a scatter_add instead of just using torch.multinomial? What am I missing here?

@alicanb
Collaborator

alicanb commented Feb 6, 2018

@ssnl torch.multinomial is actually a misnomer: it's equivalent to torch.distributions.Categorical(...).sample(). We thought about correcting it, but it would be too much of a breaking change. Here is the discussion: probtorch#46
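The equivalence described above can be sketched as follows: both calls draw index samples from the same categorical distribution (the two samplers need not produce identical draws under the same seed, so only shapes and value ranges are checked here).

```python
import torch
from torch import distributions as dis

torch.manual_seed(0)
probs = torch.tensor([0.1, 0.8, 0.1])

# Despite the name, torch.multinomial returns category *indices*,
# i.e. per-draw categorical sampling
idx_a = torch.multinomial(probs, 5, replacement=True)

# distributions.Categorical does the same thing per sample
idx_b = dis.Categorical(probs).sample((5,))

print(idx_a.shape, idx_b.shape)  # both are length-5 index vectors
```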

@alicanb
Collaborator

alicanb commented Feb 6, 2018

@christinaheinze btw, if you're trying to implement the multinomial distribution, we have distributions.Multinomial(total_count, probs) on master that behaves correctly.
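A minimal sketch of that API: a single Multinomial draw returns a vector of per-category counts summing to total_count, rather than a vector of indices. (Sketch written against a current PyTorch release.)

```python
import torch
from torch import distributions as dis

torch.manual_seed(0)
probs = torch.tensor([0.1, 0.8, 0.1])

# One multinomial draw aggregates 250 categorical trials into counts
m = dis.Multinomial(total_count=250, probs=probs)
counts = m.sample()  # length-3 vector of counts, summing to 250

print(counts)
```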

bwasti added a commit to bwasti/pytorch that referenced this issue Nov 16, 2020
Summary:
Pull Request resolved: pytorch/glow#5062

Pull Request resolved: pytorch#45556

User defined classes can be used as constants.  This is useful when freezing and removing the module from the graph.

Test Plan: waitforsadcastle

Reviewed By: eellison

Differential Revision: D23994974

fbshipit-source-id: 6c494269e222b7e1b5ecddf0c460ae8c09ac0556
facebook-github-bot pushed a commit that referenced this issue Nov 17, 2020