
Multinomial without replacement produces samples that have zero probability #50034

Open · ngimel opened this issue Jan 4, 2021 · 8 comments

Comments

ngimel (Collaborator) commented Jan 4, 2021

🐛 Bug

Since moving to a more efficient algorithm for sampling the multinomial without replacement, we don't check whether the probability tensor has enough non-zero elements to produce the requested number of samples. We only check that at least one of the probabilities is positive, and that the number of samples is less than the number of classes.

To Reproduce

Below, the probability tensor has 3 non-zero elements, so we should be able to generate at most 3 samples without replacement. When 4 samples are requested, NumPy errors out, but PyTorch produces a sample (index 4) that has zero probability.
Steps to reproduce the behavior:

In [4]: np.random.choice(5, 4, p=[0.1, 0, 0.3, 0.6, 0], replace=False)                                                                                                                                                              
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e39b22db940b> in <module>
----> 1 np.random.choice(5, 4, p=[0.1, 0, 0.3, 0.6, 0], replace=False)

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: Fewer non-zero entries in p than size

In [7]: torch.multinomial(torch.tensor([0.1, 0, 0.3, 0.6, 0]), 4, replacement=False)                                                                                                                                                
Out[7]: tensor([2, 3, 0, 4])

Checking the number of non-zeros in the probability tensor adds a performance penalty (in many cases, checking the probability array already takes longer than the actual sampling), so we should decide whether we want this check, or whether we should offer an "unsafe" API that avoids such checks.
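
For illustration, this is the kind of user-side check people would otherwise have to write themselves (safe_multinomial is a hypothetical helper for this sketch, not a proposed API):

import torch

def safe_multinomial(weights, num_samples):
    # Hypothetical wrapper: refuse to draw more samples without replacement
    # than there are non-zero weights.
    nonzero = int(torch.count_nonzero(weights))
    if num_samples > nonzero:
        raise ValueError(
            f"cannot draw {num_samples} samples without replacement "
            f"from only {nonzero} non-zero weights"
        )
    return torch.multinomial(weights, num_samples, replacement=False)

# With the repro above, this raises instead of silently returning index 4:
# safe_multinomial(torch.tensor([0.1, 0, 0.3, 0.6, 0]), 4)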

cc @fritzo @neerajprad @alicanb @vishwakftw @nikitaved @mruberry @rgommers @heitorschueroff @pbelevich @t-vi

shapovalov (Contributor) commented

Thanks for reporting it; I also faced this bug. Can this behaviour be considered part of the API now?
I rely on the function throwing an error in order to switch to replacement=True mode for the offending rows of the weights.
I can rewrite the code to sample with replacement and then overwrite some of the rows, but if the function starts throwing again in a future release, my code will become incorrect once more. :)
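
A sketch of the fallback pattern I mean (sample_rows is just an illustrative name; it assumes a 2-D weights tensor with one distribution per row):

import torch

def sample_rows(weights, num_samples):
    # Rows with enough non-zero weights are sampled without replacement;
    # offending rows fall back to sampling with replacement.
    ok = weights.count_nonzero(dim=1) >= num_samples
    out = torch.empty(weights.size(0), num_samples, dtype=torch.long)
    if ok.any():
        out[ok] = torch.multinomial(weights[ok], num_samples, replacement=False)
    if (~ok).any():
        out[~ok] = torch.multinomial(weights[~ok], num_samples, replacement=True)
    return out

weights = torch.tensor([[0.1, 0.0, 0.3, 0.6, 0.0],
                        [0.0, 1.0, 0.0, 0.0, 0.0]])
print(sample_rows(weights, 4))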

lendle commented Dec 6, 2022

I'm also facing this bug. If each row of the input does have at least num_samples non-zero values, is it safe to assume that torch.multinomial behaves correctly? If so, I can at least manually check that my input is good with something like (input.count_nonzero(1) >= num_samples).all()

SaschaFroelich commented May 28, 2023

I ran into this same problem. Ideally, sampling from a multinomial distribution with replacement=True or replacement=False should return the same result when the input vector contains probabilities.

min-jean-cho (Collaborator) commented

For sampling with/without replacement, we first need to align on whether torch.multinomial behaves like np.random.choice or samples from a multinomial distribution.

np.random.choice does not sample from a multinomial distribution. It generates sample points (x) that follow a UNIVARIATE categorical distribution, and the output shape is (num_samples). Here, sampling with/without replacement makes sense.

A multinomial distribution has no concept of replacement. If X follows a MULTIVARIATE distribution, there is a dimensionality k, and the sampled point x is a vector, not a scalar or single value.
e.g., multinomial (multivariate) distribution M(p)
p = [p_1, p_2, ..., p_k]
x = [x_1, x_2, ..., x_k]
The output shape is (num_samples, k).

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 2, 2])
>>> np.random.multinomial(20, [0.1, 0, 0.3, 0.6, 0], size=1)
array([[ 4,  0,  4, 12,  0]])
>>> np.random.multinomial(20, [0.1, 0, 0.3, 0.6, 0], size=2)
array([[ 1,  0,  6, 13,  0],
       [ 4,  0,  4, 12,  0]])
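
For comparison, torch.multinomial currently behaves like np.random.choice in that it returns category indices rather than per-category counts. A small illustration (the outputs in the comments are only examples, since they depend on the RNG state):

import torch

probs = torch.tensor([0.1, 0, 0.3, 0.6, 0])
# torch.multinomial returns indices of the sampled categories, shape (num_samples,),
# which matches np.random.choice; np.random.multinomial would instead return
# per-category counts with shape (k,).
print(torch.multinomial(probs, 3, replacement=True))   # e.g. tensor([3, 2, 3])
print(torch.multinomial(probs, 3, replacement=False))  # e.g. tensor([3, 2, 0])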

min-jean-cho (Collaborator) commented

May I also ask what an "unsafe" API would be? If torch.multinomial is meant to behave like np.random.choice, then yes, it should error out.

myazdani commented

@min-jean-cho the expected behavior is that running:

import torch

torch.manual_seed(0)
# All of the probability mass is on index 0; every other entry is exactly zero.
probs = torch.tensor([1.0] + [0.0] * 999)
wrongs = 0
for i in range(1000000):
    sampled = torch.multinomial(probs, num_samples=1)
    if sampled != 0:
        # A zero-probability index was sampled.
        wrongs += 1

print(f"{100 * wrongs / 1000000:.4f}")

should report 0 wrong samples, but in PyTorch 2 it reports 0.0042%, while in PyTorch 1.0 it reports 0%.

@ngimel FWIW, if I make probs a double then the issue goes away in PyTorch 2.0. I don't think the behavior should change based on the data type.

Specifically, running the code below results in 0 wrong samples:

torch.manual_seed(0)
probs = torch.tensor([1.0] + [0.0] * 999,  dtype=torch.double)
wrongs = 0
for i in range(1000000):
    sampled = torch.multinomial(probs, num_samples=1)
    if sampled != 0:
        wrongs += 1

print(f"{100 * wrongs / 1000000:.4f}")

min-jean-cho (Collaborator) commented

@myazdani, please try on the master branch. The above issue has been fixed by #101720.

mmattb commented Sep 12, 2023

I'm a bit confused by the initial bug report above, and responding to @lendle: I can reproduce this with a single 0.0 in a larger tensor while sampling only a single value. That tells me the bug isn't specific to replacement, or to requesting more samples than there are non-zero entries.

I haven't followed all the versions mentioned above, so I'm not sure whether my version should already include a fix; mine is '2.0.1+cu117'.

Repro:

In [100]: iterations = 0
     ...: while True:
     ...:     iterations += 1
     ...:     if torch.multinomial(dist, 1).item() == 2:
     ...:         print("uh oh", iterations)
     ...:         break
uh oh 9923822

>>> print(dist)
tensor([0.3300, 0.3400, 0.0000, 0.3300])

>>> print(dist.sum())
tensor(1.)

In [102]: dist[2] == 0
Out[102]: tensor(True)
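
A possible user-side workaround in the meantime (just a sketch; multinomial_nonzero is my own helper, not PyTorch API): drop the zero-weight entries before sampling and map the returned indices back, optionally casting to double as noted above.

import torch

def multinomial_nonzero(weights, num_samples, replacement=False):
    # Sample only among strictly positive weights, then translate the result
    # back to indices in the original tensor. Casting to double also sidesteps
    # the float32 behavior mentioned earlier in the thread.
    keep = weights.nonzero(as_tuple=True)[0]
    picked = torch.multinomial(weights[keep].double(), num_samples, replacement=replacement)
    return keep[picked]

dist = torch.tensor([0.3300, 0.3400, 0.0000, 0.3300])
print(multinomial_nonzero(dist, 1))  # index 2 can never be returned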
