Different Dropout behavior on macOS and Linux #121595
Comments
I'm afraid this is expected, see the first line in the reproducibility doc: https://pytorch.org/docs/stable/notes/randomness.html |
Sorry about the missing output.

Linux (CPU), 2.3.0.dev20240311+cu121:

```
tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.8966, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.6206, 0.0000, 0.0000, 0.0000],
        [0.5516, 0.4920, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4350, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
```

macOS (CPU), 2.3.0.dev20240311:

```
tensor([[2.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.7600, 0.6194, 0.6206, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.4920, 0.4924, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.3966, 0.0000, 0.3776, 0.0000, 0.0000]])
```
|
Thanks for the response @albanD. Which section of the doc do you mean in particular, so that I can quote it along with an explanation? I am still surprised that it produces different results, though. Other scientific packages like NumPy, SciPy, and scikit-learn are consistent between macOS and Linux. Operations like random weight initialization in PyTorch are also consistent across macOS and Linux; it seems that only Dropout is affected. E.g.,

```python
import torch

torch.manual_seed(123)
torch.nn.Linear(4, 4).weight
```

returns

```
Parameter containing:
tensor([[-0.2039,  0.0166, -0.2483,  0.1886],
        [-0.4260,  0.3665, -0.3634, -0.3975],
        [-0.3159,  0.2264, -0.1847,  0.1871],
        [-0.4244, -0.3034, -0.1836, -0.0983]], requires_grad=True)
```

on both macOS and Linux. |
The first paragraph: "That being said, we do try to be consistent but it is not always easy and is hard to enforce." I'm afraid this is not going to be very high priority since, as you pointed out, it has been like that for a while. |
Thanks. For some reason my brain skipped over this and focused on the "between CPU and GPU executions" part.
Sounds fair! |
I tried to dig a little deeper into this issue. It seems to be a consistency problem between the MKL and non-MKL implementations of Dropout.
Using this code:
the output is:
With the same code, the output is different. We can see the discrepancy between the two builds. @rasbt IMO, if you are on Apple silicon, your build lacks MKL support, while the official Linux PyTorch release has MKL support. That is why the Linux and macOS CPU results differ. @albanD This discrepancy is not across platforms or devices but between builds; is this expected? |
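For reference, whether a given PyTorch build was compiled with MKL can be checked at runtime; a short sketch (this check is my addition, not part of the thread's original scripts):

```python
import torch

# torch.backends.mkl.is_available() reports build-time MKL support.
# Official Linux x86-64 wheels typically return True; macOS / Apple
# silicon builds return False, which selects the non-MKL Dropout path.
print("MKL available:", torch.backends.mkl.is_available())
```

Comparing this value on the two machines confirms whether the builds differ in the way described above.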
IMO this still fits the expected-behavior pattern. According to #69967, the MKL implementation is much faster and still meets the statistical criteria for the RNG. But perhaps one could disable it if, say, deterministic mode is enabled. |
If MKL generates random numbers in parallel to speed things up, just like CUDA, then the results will inevitably differ, since the RNG algorithm is different. As you suggest, can we use the non-MKL implementation when deterministic mode is enabled? |
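For context, deterministic mode here refers to `torch.use_deterministic_algorithms`. A minimal sketch of how it is enabled (note that, per the discussion above, this flag does not currently change which RNG path Dropout takes):

```python
import torch

# Deterministic mode makes ops without a deterministic implementation
# raise an error. It does not (currently) switch Dropout away from the
# MKL RNG path, so MKL and non-MKL builds can still disagree.
torch.use_deterministic_algorithms(True)

torch.manual_seed(123)
x = torch.ones(3, 4)
out = torch.nn.functional.dropout(x, p=0.5, training=True)
# Surviving elements are scaled by 1 / (1 - p) = 2.0.
print(out)
```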
We could do that, but it won't let you rely on the generated numbers being the same across versions and platforms, as other things might lead to the same behavior in the future. I don't think we want to do anything here, since the discrepancy comes from build-time availability and leads to a significant speedup. |
@albanD I agree. At least, can I add a note about this behaviour to the bernoulli and dropout documentation? IMO, this behaviour is outside most users' expectations. |
I don't think bernoulli and dropout are special here. If we do something, we could add a comment to every single random function pointing to https://pytorch.org/docs/stable/notes/randomness.html. |
🐛 Describe the bug
Even when using a fixed random seed, the dropout masks in macOS and Linux are different. Here is some code to reproduce this:
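The original snippet is not shown here; a minimal sketch along the same lines (the `p=0.5` dropout rate and the all-ones input are my assumptions, not the OP's exact script):

```python
import torch

# Fix the seed, then apply Dropout on CPU. Which positions get zeroed
# differs between an MKL build (official Linux wheels) and a non-MKL
# build (macOS), even though the seed is identical.
torch.manual_seed(123)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(5, 6)
print(dropout(x))  # kept entries are scaled to 1 / (1 - 0.5) = 2.0
```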
Below are the results. Note that this is consistent across PyTorch versions (I tested 2.1.0 and 2.2.1) and whether the code is run on the CPU or GPU:
macOS torch 2.1.0, CPU
macOS torch 2.2.1, CPU
Google Colab (Linux) torch 2.1.0, CPU
Google Colab (Linux) torch 2.2.1, CPU
Versions
MacOS environment
Linux environment
cc @pbelevich