Bits types cannot be used under deterministic mode #109802

Open · ngimel opened this issue Sep 21, 2023 · 18 comments
Labels: module: determinism, triaged

ngimel (Collaborator) commented Sep 21, 2023

#104995 made it impossible to use bits types under deterministic mode:

import torch
torch.set_deterministic_debug_mode("warn")
x = torch.empty(4, dtype=torch.bits8)

produces RuntimeError: "fill_empty_deterministic_" not implemented for 'Bits8'
While I'm sympathetic to the goals of #104995, I think a blanket approach that forces filling of all empty tensors is too strict. In many cases, determinism mode is set in production runs (because, if only deterministic ops are used, it imposes very little overhead and provides nice guarantees), and with valid code the empty calls are not a problem. The approach taken in #104995 requires paying the filling penalty unconditionally, and it also requires implementing a fill op for all bits types, which PyTorch currently doesn't do.
Deterministic empty tensors are a valuable option to have, but in my opinion they should not be mixed with the existing deterministic mode.

cc @mruberry @kurtamohler

mikaylagawarecki added the module: determinism and triaged labels Sep 21, 2023
mikaylagawarecki (Contributor) commented

cc @kurtamohler @albanD

vadimkantorov (Contributor) commented

One way could be introducing some special arg torch.empty(..., swear_that_i_fill_it_later_deterministically=True) :)

kurtamohler self-assigned this Sep 23, 2023
kurtamohler (Collaborator) commented Sep 28, 2023

We could separate the empty fill feature into its own global setting, so that users can keep it disabled to get better performance.

We could add torch.enable_fill_empty(mode: bool) and torch.is_fill_empty_enabled() -> bool.

It seems like this feature could potentially be useful for troubleshooting even if determinism is not enabled. If we do this, we should probably mention it on the Reproducibility page.

However, I wonder whether fill empty mode should be turned on by default when the user turns on deterministic mode, just to be safe.
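
For concreteness, a minimal sketch of how the proposed toggle might look in user code. The enable_fill_empty/is_fill_empty_enabled names come from the proposal above and are not an existing PyTorch API:

import torch

torch.use_deterministic_algorithms(True)  # existing deterministic mode
torch.enable_fill_empty(False)            # proposed toggle (hypothetical)
assert not torch.is_fill_empty_enabled()  # proposed query (hypothetical)

# With the toggle off, empty would return uninitialized memory even in
# deterministic mode, avoiding the unconditional fill penalty.
x = torch.empty(4)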

albanD (Collaborator) commented Sep 28, 2023

> I wonder whether fill empty mode should be turned on by default when the user turns on deterministic mode,

I'm not sure I follow what you mean. It already is.

We just need to implement fill_empty_deterministic_ for the bits* dtypes. And filling them with 0s sounds like a good default for bits dtypes.

kurtamohler (Collaborator) commented Sep 28, 2023

> > I wonder whether fill empty mode should be turned on by default when the user turns on deterministic mode,
>
> I'm not sure I follow what you mean. It already is.

Yes, it is currently, but I was talking about what potentially should happen if we decided to separate the fill empty feature into its own global setting.

There are really two separate issues that Natalia brought up:

  • Fill empty does not support bits dtypes
  • Fill empty degrades performance, and it would be nice if the user could turn it off if they know that they are not using empty data as inputs to operations

I was only addressing the second issue in my earlier comment.

ngimel (Collaborator, Author) commented Sep 29, 2023

> I'm not sure I follow what you mean. It already is.

I agree with @kurtamohler: it already is, but I argue it shouldn't be. Non-deterministic empty is not a problem for valid programs that nonetheless want to use deterministic mode, so there's no point in paying this overhead.

albanD (Collaborator) commented Sep 29, 2023

I would still expect that, at least by default, when in deterministic mode, the function always returns the same thing. This is useful, for example, for testing :D

What about:

  • We add a new torch.utils.deterministic namespace
  • We add a torch.utils.deterministic.fill_uninitialized_memory that controls this behavior
  • Have fill_uninitialized_memory=True by default (see the sketch below)
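
A sketch of how the proposed flag might be used, assuming the names above (at the time of this comment the torch.utils.deterministic namespace did not yet exist, and the exact mechanism was still open):

import torch

torch.use_deterministic_algorithms(True)

# Proposed: opt out of the fill-on-empty overhead while keeping
# deterministic algorithms enabled (hypothetical at this point in the thread):
torch.utils.deterministic.fill_uninitialized_memory = False

x = torch.empty(4)  # uninitialized memory, no fill penalty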

albanD (Collaborator) commented Sep 29, 2023

From offline discussion with Natalia, that sounds good. @kurtamohler will you have a bit of time to add that please?

kurtamohler (Collaborator) commented

Sounds good! Yep, I can add that.

kurtamohler (Collaborator) commented

I am almost ready to submit a PR that adds torch.utils.deterministic.fill_uninitialized_memory. But I'm trying to decide how to write a test for the fill_uninitialized_memory == False case. For context, here is the test_deterministic_empty test for the existing fill_uninitialized_memory == True case: link

One idea is to just assert that the empty tensor is not full of NaN/max_int when fill_uninitialized_memory == False. For instance, self.assertFalse(res.isnan().all()). But the problem is that there is a possibility that the allocated buffer just happened to be full of NaN/max_int already.
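
For concreteness, a minimal version of that flaky check might look like this, assuming the new flag (this is the approach being argued against):

import torch

torch.use_deterministic_algorithms(True)
torch.utils.deterministic.fill_uninitialized_memory = False

res = torch.empty(10)
# Fails only if the reused allocation happened to already be all-NaN --
# unlikely, but not impossible, so the test is inherently flaky:
assert not res.isnan().all()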

I modified test_deterministic_empty to test the fill_uninitialized_memory == False case using the above assert. If I run it in a loop, the assert fails in about one out of every handful of iterations. But I found that if I zero out all of the tensors that get filled with NaN/max_int during the test before they go out of scope, then the assert does not fail over a large number of iterations. I ran test_torch.py -k fill_uninitialized_memory 140 times in a loop with no failure. I also tried adding a 10,000-iteration loop around all of the code in test_deterministic_empty, and I ran test_torch.py -k fill_uninitialized_memory 10 times without seeing any failures. However, there is still a nonzero probability that any one of these asserts can fail.

If uninitialized memory were completely random, then the probability that an int8 tensor of size 10 fails the assert would be (1 / 2**8) ** 10 = 2**(-80), which is vanishingly small. But uninitialized memory is not completely random, so the real probability of failure is unknown.

So I don't think that self.assertFalse(res.isnan().all()) is a great way to test this.

I think it would be better to somehow check if at::native::fill_empty_deterministic_ was called. I'm considering adding a toggleable warning to at::native::fill_empty_deterministic_, which is turned off by default, and cannot be toggled in the public API. The tests can turn on that warning and then assert that it was not emitted in the fill_uninitialized_memory == False case, confirming that at::native::fill_empty_deterministic_ is not called. And we can also assert that the warning does get emitted in the fill_uninitialized_memory == True case. Is that a fair solution?

albanD (Collaborator) commented Oct 9, 2023

You're calling at::fill_empty_deterministic_() right? If so, you can use something like the LoggingMode to detect if this function is being called or not.

kurtamohler (Collaborator) commented

What is LoggingMode? I searched the repo for it and didn't get any results. Are you talking about c10::LogAPIUsage?

albanD (Collaborator) commented Oct 9, 2023

Sorry, LoggingTensorMode, here is how it is used:

def test_torch_dispatch_mode_basic(self) -> None:
    with capture_logs(is_mode=True) as logs:
        with LoggingTensorMode():
            torch.empty([])
    self.assertExpectedInline('\n'.join(logs), """\
$0: f32[] = torch._ops.aten.empty.memory_format([], device=device(type='cpu'), pin_memory=False)""")

kurtamohler (Collaborator) commented

That seems like a good idea. I'm not 100% sure how to get it working though.

It seems like I'll need to move fill_empty_deterministic_ from at::native to at and add it to native_functions.yaml. That part is simple enough.

However, I'm not sure how to make the at::fill_empty_deterministic_ call show up in the logs captured by LoggingTensorMode. I tried adding a call to an already existing at operator in at::native::empty_cpu, and it doesn't show up in the logs captured by LoggingTensorMode. I just added result = at::resize(result, result.sizes()) to the deterministic case, and I ran this:

import torch

def get_logs(fn):
    from torch.testing._internal.logging_tensor import LoggingTensorMode, capture_logs
    with capture_logs(is_mode=True) as logs:
        with LoggingTensorMode():
            fn()
    return logs

torch.use_deterministic_algorithms(True)
print(get_logs(lambda: torch.empty(10)))

I confirmed in gdb that at::native::_resize does get called when I expected, but the log capture script just prints this:

["$0: f32[10] = torch._ops.aten.empty.memory_format(['10'], device=device(type='cpu'), pin_memory=False)"]

albanD (Collaborator) commented Oct 10, 2023

The mode will only capture calls that go through the dispatcher. Calls into the at::native namespace don't go through the dispatcher and thus will not show up there.
Also, thinking more about this, it might not work for this use case, as you won't be able to see "nested calls" in the logging mode: if you already see the empty() call, you won't see its constituents :/

kurtamohler (Collaborator) commented Oct 10, 2023

Ah, I see, that explains why I didn't see the call to result = at::resize(result, result.sizes()) in the log. So is it alright if I add a toggleable warning like I mentioned before?

albanD (Collaborator) commented Oct 10, 2023

The toggleable warning sounds a bit over-engineered for checking that we don't do something. Especially since we aren't doing anything in that case, it's hard to have code in there to say we're not doing it.

I'm personally fine with only testing the deterministic path (like we do today) and manually checking that we get the expected behavior when the flag is disabled.
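
For reference, a sketch of the deterministic-path test that would be kept, assuming the proposed flag (per the discussion above, floating-point tensors are filled with NaN and integer tensors with the dtype's max value):

import torch

torch.use_deterministic_algorithms(True)
torch.utils.deterministic.fill_uninitialized_memory = True  # the default

# Uninitialized memory is deterministically filled, so these checks are stable:
assert torch.empty(10, dtype=torch.float32).isnan().all()
assert (torch.empty(10, dtype=torch.int32) == torch.iinfo(torch.int32).max).all()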

kurtamohler (Collaborator) commented

I'm not sure how to add support for bits types to fill_empty_deterministic_. It doesn't seem to be supported in torch.empty even with deterministic mode turned off.

>>> import torch
>>> torch.empty(4, dtype=torch.bits8)
...
RuntimeError: "_local_scalar_dense_cpu" not implemented for 'Bits8'

andreigh pushed a commit to andreigh/pytorch that referenced this issue Oct 26, 2023
kurtamohler added a commit to kurtamohler/pytorch that referenced this issue Nov 1, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this issue Nov 7, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this issue Nov 14, 2023