Update base for Update on "Extend SampleInput str representation with tensor data."

As in the title. The aim of this addition is to make it easier to debug certain CI failures that cannot be reproduced locally. For instance, we currently see messages like
```
Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(20,), device="cuda:0", dtype=torch.float64], args=(), kwargs={}, broadcasts_input=False, name='')
```
which is not really useful without showing the actual sample data, since all the other sample parameters can often be determined by other means. With the data included, a failing sample can be related to the `index` part of error messages like:
```
Mismatched elements: 2 / 20 (10.0%)
Greatest absolute difference: nan at index (10,) (up to 1e-05 allowed)
Greatest relative difference: nan at index (10,) (up to 1e-07 allowed)
```
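
To make the mechanics concrete, the formatting could look roughly like the sketch below (a hypothetical `tensor_repr_with_data` helper written for illustration only; the actual change lives in `SampleInput`'s repr code in `torch.testing._internal`):
```python
import torch

def tensor_repr_with_data(t: torch.Tensor, max_elems: int = 20) -> str:
    """Illustrative only: format a tensor roughly like the extended repr."""
    # Cap the number of values shown so the repr stays readable for large inputs.
    data = t.flatten()[:max_elems].tolist()
    return (f'Tensor[size={tuple(t.shape)}, device="{t.device}", '
            f'dtype={t.dtype}, data={data}]')

t = torch.arange(25, dtype=torch.int32).reshape(5, 5)
# Prints the size/device/dtype header plus the first 20 of the 25 values.
print(tensor_repr_with_data(t))
```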

As an example of the usefulness of this PR, consider the following failure message:
```
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_polygamma_polygamma_n_0_cpu_int32 ('RERUN', {'yellow': True}) [1.5510s] [ 70%]
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_polygamma_polygamma_n_0_cpu_int32 ('RERUN', {'yellow': True}) [0.0473s] [ 70%]
inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCPU::test_comprehensive_polygamma_polygamma_n_0_cpu_int32 FAILED [0.0493s] [ 70%]

==================================== RERUNS ====================================
__ TestInductorOpInfoCPU.test_comprehensive_polygamma_polygamma_n_0_cpu_int32 __
Traceback (most recent call last):
<snip>
AssertionError: Tensor-likes are not close!

Mismatched elements: 9 / 25 (36.0%)
Greatest absolute difference: inf at index (0, 0) (up to 1e-05 allowed), inf vs 20177651499008.0
Greatest relative difference: inf at index (0, 0) (up to 1.3e-06 allowed)

The above exception was the direct cause of the following exception:

<snip>
Exception: Caused by sample input at index 0: SampleInput(input=Tensor[size=(5, 5), device="cpu", dtype=torch.int32, data=[-8, 6, 9, 0, 0, 5, 5, 7, 6, 5, 1, -5, 2, -1, 8, -4, 0, -6, 3, -5]], args=(1), kwargs={}, broadcasts_input=False, name='')
```
from which we learn that the `torch.polygamma` result is actually correct, because `polygamma(0, -8) -> inf`, while the reference value used (20177651499008.0) is wrong (see #106692 for more details).
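
This can be sanity-checked against SciPy (assumed to be installed); the eager CPU value is not a useful oracle here, since it is exactly the reference that #106692 reports as wrong:
```python
from scipy.special import polygamma

# digamma (= polygamma of order 0) has poles at the non-positive integers,
# so an infinite value at -8 is the mathematically expected result.
print(polygamma(0, -8))  # inf -- consistent with the test_mps.py note below
```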

[ghstack-poisoned]
pearu committed Feb 10, 2024
2 parents 57083d5 + 2c87221 commit 7d1c4cb
Showing 3 changed files with 13 additions and 9 deletions.
14 changes: 7 additions & 7 deletions test/test_mps.py
```diff
@@ -551,13 +551,13 @@ def mps_ops_modifier(ops):
         # - MPS output: tensor([102.6681, inf])
         # In the latter case, inf is probably correct (this is what scipy does).
         'polygamma': [torch.float32, torch.uint8],
-        'polygammapolygamma_n_0': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'polygammapolygamma_n_2': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'polygammapolygamma_n_1': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'polygammapolygamma_n_3': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'polygammapolygamma_n_4': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'special.polygamma': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
-        'special.polygammaspecial_polygamma_n_0': [torch.float32, torch.int16, torch.int32, torch.int64, torch.int8],
+        'polygammapolygamma_n_0': [torch.float32, torch.int16, torch.int8],
+        'polygammapolygamma_n_2': [torch.float32, torch.int16, torch.int8],
+        'polygammapolygamma_n_1': [torch.float32, torch.int16, torch.int8],
+        'polygammapolygamma_n_3': [torch.float32, torch.int16, torch.int8],
+        'polygammapolygamma_n_4': [torch.float32, torch.int16, torch.int8],
+        'special.polygamma': [torch.float32, torch.int16, torch.int32, torch.int8],
+        'special.polygammaspecial_polygamma_n_0': [torch.float32, torch.int16, torch.int8],
 
         # Failures due to precision issues (due to fast-math). These has been fixed in MacOS 13.3+
         'tan': [torch.float32],
```
4 changes: 2 additions & 2 deletions torch/_guards.py
```diff
@@ -645,9 +645,9 @@ def extract_stack():
         self = TracingContext.try_get()
         if self is None:
             return traceback.StackSummary()
-        stack = list(self.frame_summary_stack)
+        stack = self.frame_summary_stack
         if self.loc_in_frame is not None:
-            stack.append(self.loc_in_frame)
+            stack = stack + [self.loc_in_frame]
         return traceback.StackSummary.from_list(stack)
 
     # Call this when you want to call into some code that isn't necessarily
```
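
A side note on the `extract_stack` change above: the old code paid for an unconditional `list(...)` copy, while the new code copies only when `loc_in_frame` actually has to be appended, because `stack + [...]` rebinds `stack` to a fresh list instead of mutating the shared `frame_summary_stack`. A toy illustration of the distinction (plain Python, not PyTorch code):
```python
shared = [1, 2, 3]

stack = shared
stack = stack + [4]   # rebinding: builds a new list, `shared` is untouched
assert shared == [1, 2, 3]

stack = shared
stack.append(4)       # in-place: mutates the shared list as well
assert shared == [1, 2, 3, 4]
```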
4 changes: 4 additions & 0 deletions torch/testing/_internal/common_modules.py
```diff
@@ -4138,6 +4138,10 @@ def module_error_inputs_torch_nn_Pad3d(module_info, device, dtype, requires_grad
                ),
     ModuleInfo(torch.nn.Embedding,
                module_inputs_func=module_inputs_torch_nn_Embedding,
+               decorators=[
+                   DecorateInfo(toleranceOverride({torch.float32: tol(atol=1e-4, rtol=1e-4)}),
+                               'TestModule', 'test_non_contiguous_tensors',
+                               device_type='mps')],
                skips=(
                    DecorateInfo(unittest.skip("Skipped!"), 'TestModule', 'test_memory_format'),)
                ),
```
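
In effect, the added `DecorateInfo` loosens the float32 comparison tolerances to `atol=1e-4, rtol=1e-4` for `TestModule.test_non_contiguous_tensors` on the `mps` device only; other tests, dtypes, and devices keep their defaults.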
