
Potential bug when sampling from categorical distribution #5062

Closed
christinaheinze opened this issue Feb 5, 2018 · 12 comments · Fixed by #5093

Comments
@christinaheinze

Hi,

I would like to sample from a categorical distribution where the probabilities passed are the columns of a tensor p_dG:

import torch
import torch.distributions as dis
import numpy as np

G = 3
D = 2
p_dG = torch.Tensor(G, D)
p_dG[:, 0] = torch.Tensor([0.1, 0.8, 0.1])
p_dG[:, 1] = torch.Tensor([0.1, 0.8, 0.1])

z_list = []
p_dg = p_dG[:, 0]
print(p_dg)
dis_Z = dis.Categorical(p_dg)
for _ in range(250):
    z = dis_Z.sample()
    z_list.append(z)

z = torch.cat(z_list, dim=0)

true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

The output from print(c) is [ 29 25 196], so the distribution is completely off: the bulk of the observations should be equal to 1, not to 2.

In contrast, the following code works as expected:

z_list = []
dis_Z = dis.Categorical(torch.Tensor([0.1, 0.8, 0.1]))
for _ in range(250):
    z = dis_Z.sample()
    z_list.append(z)

z = torch.cat(z_list, dim=0)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

Is this a bug or am I missing something?

I am using:

  • OS: macOS High Sierra
  • PyTorch version: 0.3.0
  • How you installed PyTorch (conda, pip, source): conda
  • Python version: Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
  • CUDA/cuDNN version: --
  • GPU models and configuration: --
  • GCC version (if compiling from source): --

Thanks!

@ssnl
Collaborator

ssnl commented Feb 5, 2018

Yes this is a bug. Fortunately it seems to have been fixed on master. Thanks for reporting!

@soumith
Member

soumith commented Feb 5, 2018

@christinaheinze you can follow instructions here to easily compile for master: https://github.com/pytorch/pytorch#from-source

@soumith soumith closed this as completed Feb 5, 2018
@christinaheinze
Author

Ok, thanks!

@christinaheinze
Author

On master the issue has been fixed for the categorical distribution, but apparently not for the multinomial distribution:

import torch.distributions as dis
import torch
import numpy as np

G = 3
D = 2
p_dG = torch.Tensor(G, D)
p_dG[:, 0] = torch.Tensor([0.1, 0.8, 0.1])
p_dG[:, 1] = torch.Tensor([0.1, 0.8, 0.1])

p_dg = p_dG[:, 0]
z = p_dg.multinomial(250, replacement=True)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

and also

z = torch.multinomial(p_dg, 250, replacement=True)
true_z_np = z.numpy()
v, c = np.unique(true_z_np, return_counts=True)
print(v)
print(c)

@ssnl
Collaborator

ssnl commented Feb 6, 2018

I'll take a look at this tomorrow.

@alicanb
Collaborator

alicanb commented Feb 6, 2018

The same problem exists in distributions.Multinomial as well, even though it doesn't use torch.multinomial for sampling. I'll check that out.

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb It maintains a _categorical object internally, so that's probably the cause.

@alicanb
Collaborator

alicanb commented Feb 6, 2018

On second thought, what is the bug here?

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb The bug is that CPU torch.multinomial doesn't work on non-contiguous tensors. I'm writing a fix now.
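Until that fix lands, a minimal sketch of a workaround is to force a contiguous copy of the column slice before sampling. (This sketch is written against a current PyTorch API; torch.tensor and torch.bincount postdate the 0.3 release discussed here.)

```python
import torch

torch.manual_seed(0)

p_dG = torch.tensor([[0.1, 0.1],
                     [0.8, 0.8],
                     [0.1, 0.1]])
p_dg = p_dG[:, 0]  # a column slice is a non-contiguous view on CPU

# Workaround: .contiguous() materializes a contiguous copy,
# sidestepping the non-contiguous-input bug in torch.multinomial
z = p_dg.contiguous().multinomial(10000, replacement=True)
counts = torch.bincount(z, minlength=3)
print(counts)  # index 1 should receive roughly 80% of the draws
```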

@ssnl
Collaborator

ssnl commented Feb 6, 2018

@alicanb Btw, do you know why distributions.Multinomial uses a distributions.Categorical and a scatter_add instead of just using torch.multinomial? What am I missing here?

@alicanb
Collaborator

alicanb commented Feb 6, 2018

@ssnl torch.multinomial is actually a misnomer: it's equivalent to torch.distributions.Categorical(...).sample(). We thought about correcting it, but it would be too much of a breaking change. Here is the discussion: probtorch#46
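The equivalence described above can be sketched as follows: both calls draw index samples from the same categorical distribution (the two samplers need not produce identical draws under the same seed, so only shapes and value ranges are checked here).

```python
import torch
from torch import distributions as dis

torch.manual_seed(0)
probs = torch.tensor([0.1, 0.8, 0.1])

# Despite the name, torch.multinomial returns category *indices*,
# i.e. per-draw categorical sampling
idx_a = torch.multinomial(probs, 5, replacement=True)

# distributions.Categorical does the same thing per sample
idx_b = dis.Categorical(probs).sample((5,))

print(idx_a.shape, idx_b.shape)  # both are length-5 index vectors
```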

@alicanb
Collaborator

alicanb commented Feb 6, 2018

@christinaheinze btw, if you're trying to implement the multinomial distribution, we have distributions.Multinomial(total_count, probs) on master that behaves correctly.
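A minimal sketch of that API: a single Multinomial draw returns a vector of per-category counts summing to total_count, rather than a vector of indices. (Sketch written against a current PyTorch release.)

```python
import torch
from torch import distributions as dis

torch.manual_seed(0)
probs = torch.tensor([0.1, 0.8, 0.1])

# One multinomial draw aggregates 250 categorical trials into counts
m = dis.Multinomial(total_count=250, probs=probs)
counts = m.sample()  # length-3 vector of counts, summing to 250

print(counts)
```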

bwasti added a commit to bwasti/pytorch that referenced this issue Nov 16, 2020
Summary:
Pull Request resolved: pytorch/glow#5062

Pull Request resolved: pytorch#45556

User defined classes can be used as constants.  This is useful when freezing and removing the module from the graph.

Test Plan: waitforsadcastle

Reviewed By: eellison

Differential Revision: D23994974

fbshipit-source-id: 6c494269e222b7e1b5ecddf0c460ae8c09ac0556
facebook-github-bot pushed a commit that referenced this issue Nov 17, 2020