torch.multinomial chooses elements with zero weight #13867
Comments
@jcjohnson can you confirm that you are running the latest pytorch when running this script?
@zou3519 I just reinstalled from the nightly build, version 1.0.0.dev20181112. Can you point me to the earlier bugfix?
My bad, it looks like we fixed this for CUDA but we did not test on CPU: #4858. We'll look into it and get it fixed, thank you for the report :)
That's weird -- I'm seeing this issue only on CUDA, and it works properly when I cast the weights to CPU.
Got it, I didn't realize your weights were on CUDA. I can reproduce the assertion using your weights, so something is indeed wrong with the multinomial implementation.
I'm wondering if floating-point error could be to blame. One interesting thing to note:
Isn't that correct? 1.6399e-05 is small but positive. However, many of the weights are quite small (and will become even smaller if multinomial internally renormalizes them to sum to one), so I wouldn't be surprised if some floating-point error were to blame.
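For illustration, a minimal sketch of the renormalization concern, using hypothetical weights (not the reporter's actual tensor): one dominant entry plus many tiny positive ones, the rest exactly zero.

```python
import torch

# Hypothetical weights, similar in shape to the reported tensor:
# nonnegative, mostly zero, with many tiny positive entries.
w = torch.zeros(10000)
w[::7] = 1.6399e-05
w[0] = 1.0

p = w / w.sum()     # renormalize to sum to one, as the thread speculates
cdf = p.cumsum(0)   # multinomial does internally before sampling

# In float32 the final cumulative value is typically not exactly 1.0, and
# every zero-weight class produces a flat run of exactly-equal cdf entries,
# which the sampling search has to handle correctly.
print((cdf[-1] - 1.0).item())
```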
Of course -- my apologies, I was reading that too quickly.
No worries, I'm grateful for the fast response =)
@jcjohnson @zou3519 I think the problem is more with how we are seeding the Mersenne Twister engine. I recently learned that the 19937-bit state of a Mersenne Twister engine is very prone to getting into a bad state when the engine is seeded with a number with many 0 bits ("all zeros causes it to not work at all, whereas lots of zero bits are merely bad" - http://www.pcg-random.org/posts/cpp-seeding-surprises.html). I ran your script with seed = 10, and it breaks the assertion at trial 17. Your script passes in my current PR #13070 (the PR is almost done and is waiting on some builds). I have changed the CUDA generator engine for multinomial to the Philox engine, and I suppose the script passes because the Philox engine doesn't carry as much state as a Mersenne Twister engine and we are seeding it properly with a 64-bit number.
In https://pytorch.org/docs/stable/torch.html?highlight=multinomial#torch.multinomial,
why does torch.multinomial output [1, 2, 0, 0]? Since replacement=False, it should not be able to generate repeated indices.
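For reference, the documentation example being asked about looked roughly like this (reconstructed from the docs page; the weights tensor and outputs are as I recall them, not quoted from this thread):

```python
import torch

weights = torch.tensor([0, 10, 3, 0], dtype=torch.float)
torch.multinomial(weights, 4)   # documented output: tensor([1, 2, 0, 0])
# Only categories 1 and 2 have nonzero weight, so with replacement=False the
# remaining draws fall back on zero-weight indices in releases of that era;
# newer releases reject this call with a RuntimeError instead.
```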
Is there any update on this? It has been two months. |
Hi @jcjohnson. Apologies for the super long delay! My PR referenced above became huge to review, so I'm currently breaking it up into two parts. I promise to push both parts by the end of this week.
Thanks! Your PR looks pretty nontrivial indeed, so I'm not surprised it has taken a while to get sorted out. I'm looking forward to it! |
In terms of a minimal fix ("1.0.1"), I think the main options are:
I would expect the second to be the least risky fix because it seems to add the least logic. |
I seem to have a simpler repro:
I'll have the PR in a few moments. |
Summary: The cumsum over the probabilities is not guaranteed to be monotonically non-decreasing, so zero-probability classes are hard to detect using the cumsum alone. This changes the binary-search postprocessing to use the (non-cumulated) distribution instead. Thank you, @jcjohnson, for the bug report with reproducing case. Fixes: #13867
Pull Request resolved: #16075 Differential Revision: D13695565 Pulled By: soumith fbshipit-source-id: 02c4d6f868f0050c1ae7d333f4317c5610e49cd9
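A minimal CPU sketch of the idea behind the fix (the actual change is in the CUDA kernel's binary-search postprocessing; the function and variable names here are illustrative, not PyTorch internals):

```python
import torch

# After binary-searching the cumulative distribution, consult the original
# (non-cumulated) distribution and step off any zero-probability category
# the search may have landed on.
def sample_one(probs: torch.Tensor) -> int:
    cdf = probs.cumsum(0)
    u = torch.rand(1) * cdf[-1]                       # uniform draw in [0, total)
    idx = int(torch.searchsorted(cdf, u, right=True).item())
    idx = min(idx, probs.numel() - 1)                 # clamp if rounding pushes u to the total
    # Zero-weight categories create flat cdf regions; if a (possibly
    # non-monotonic, parallel) cumsum misleads the search into one, walk
    # back to the nearest category with strictly positive probability.
    while idx > 0 and probs[idx].item() == 0:
        idx -= 1
    return idx

weights = torch.tensor([0.0, 0.3, 0.0, 0.7, 0.0])
print(sample_one(weights))                            # always 1 or 3
```

The walk consults the raw distribution precisely because, per the commit message above, the cumsum alone cannot reliably expose zero-probability classes.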
🐛 Bug
torch.multinomial occasionally samples elements with zero weight. This should never happen.

To Reproduce
I've been unable to reproduce this issue with randomly generated weights, so I've included a particular value of weights from my application that triggers this behavior:
These weights are all nonnegative (but contain a lot of zeros), have a nonzero sum, and contain no NaNs or Infs.
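A sketch of the shape of the repro (the actual failing weights tensor from the report is omitted here; `weights.pt` is a hypothetical stand-in for loading it):

```python
import torch

# Hypothetical stand-in for the reporter's weights: nonnegative, many zeros,
# nonzero sum, no NaNs or Infs.
weights = torch.load('weights.pt').cuda()

for trial in range(100):
    idx = torch.multinomial(weights, 1).item()
    # multinomial should never return an index whose weight is zero
    assert weights[idx].item() > 0, f'trial {trial}: sampled zero-weight index {idx}'
```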
I fail the assertion on trial 6.
Environment
PyTorch version: 1.0.0.dev20181112
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100
Nvidia driver version: 396.51
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch 0.4.1 py37_py36_py35_py27__9.0.176_7.1.2_2 pytorch
[conda] pytorch-nightly 1.0.0.dev20181112 py3.7_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torchvision 0.2.1
[conda] torchvision 0.2.1 py37_1 pytorch