Accuracy mismatch for test_block_addmm for float16 and bfloat16 #86681

@IvanYashchuk

🐛 Describe the bug

#85551 added a new implementation of block_addmm, along with tests, for the float16 and bfloat16 dtypes. Unfortunately, OSS CI doesn't catch the failures because these dtypes are only tested when SM53OrLater and SM80OrLater, respectively, are true.
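To make the gating concrete, here is a minimal sketch of how capability flags of this kind can drop a dtype from the test matrix. It is not the code from `test_sparse_csr.py`: the helper names, the base dtype list, and the use of strings in place of `torch` dtypes are all assumptions made for illustration. The only facts taken from this report are the two thresholds implied by the flag names (5.3 for float16, 8.0 for bfloat16).

```python
# Hypothetical illustration, independent of torch. Compute capabilities are
# (major, minor) tuples; dtypes are plain strings.

def sm_or_later(capability, minimum):
    """True if a GPU is present and its capability is at least `minimum`."""
    return capability is not None and capability >= minimum


def cuda_dtypes_under_test(capability):
    """Return the dtypes a gated test would be instantiated for."""
    # Assumed base list; the real test may use a different one.
    dtypes = ["float32", "float64", "complex64", "complex128"]
    if sm_or_later(capability, (5, 3)):   # mirrors the SM53OrLater gate
        dtypes.append("float16")
    if sm_or_later(capability, (8, 0)):   # mirrors the SM80OrLater gate
        dtypes.append("bfloat16")
    return dtypes


for cap in [(5, 2), (7, 5), (8, 0)]:
    print(cap, cuda_dtypes_under_test(cap))
```

Under this scheme a runner below a threshold never generates the test for that dtype, so a regression in it produces no failure at all rather than a skipped or failing test.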

FAIL: test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 1990, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 1990, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    result = test(self, **param_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_device_type.py", line 853, in dep_fn
    return fn(slf, *args, **kwargs)
  File "test_sparse_csr.py", line 1602, in test_block_addmm
    self.run_test_block_addmm_addmv(make_transposed_addmm_op(torch.addmm),
  File "test_sparse_csr.py", line 1497, in run_test_block_addmm_addmv
    self.assertEqual(actual, expected)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 2454, in assertEqual
    assert_equal(
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!
Mismatched elements: 1 / 40 (2.5%)
Greatest absolute difference: 0.029296875 at index (3, 1) (up to 0.001 allowed)
Greatest relative difference: 0.05928853754940711 at index (3, 1) (up to 0.016 allowed)
======================================================================
FAIL: test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 1990, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 1990, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    result = test(self, **param_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_device_type.py", line 853, in dep_fn
    return fn(slf, *args, **kwargs)
  File "test_sparse_csr.py", line 1601, in test_block_addmm
    self.run_test_block_addmm_addmv(torch.addmm, c, a, b, op_b, op_out, dtype=dtype, device=device, ref=ref)
  File "test_sparse_csr.py", line 1497, in run_test_block_addmm_addmv
    self.assertEqual(actual, expected)
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_internal/common_utils.py", line 2454, in assertEqual
    assert_equal(
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/_comparison.py", line 1093, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!
Mismatched elements: 3 / 40 (7.5%)
Greatest absolute difference: 0.00390625 at index (1, 9) (up to 0.001 allowed)
Greatest relative difference: 0.0029154518950437317 at index (2, 4) (up to 0.001 allowed)
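As a rough way to read the numbers above, the sketch below recomputes the allowed error for the bfloat16 failure using only values from the log. It assumes the comparison uses the usual closeness rule `|actual - expected| <= atol + rtol * |expected|` and that the reported relative difference is the absolute difference divided by `|expected|`. Because both bfloat16 maxima are reported at the same index (3, 1), the magnitude of the expected value can be inferred from their ratio. The float16 maxima sit at different indices, so the same trick does not apply there. The function name is hypothetical.

```python
def allowed_error(expected, rtol, atol):
    """Largest absolute error accepted under the assumed closeness rule."""
    return atol + rtol * abs(expected)


# Values copied from the bfloat16 failure at index (3, 1).
abs_diff = 0.029296875
rel_diff = 0.05928853754940711
rtol, atol = 0.016, 0.001

# Inferred, not logged: |expected| = abs_diff / rel_diff.
expected_magnitude = abs_diff / rel_diff
bound = allowed_error(expected_magnitude, rtol, atol)

print("inferred |expected|:", expected_magnitude)
print("allowed error:", bound)
print("observed error:", abs_diff)
print("within tolerance:", abs_diff <= bound)
```

Under these assumptions the observed error is roughly three times the allowed bound, so this does not look like a borderline tolerance miss.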

Versions

Latest master.

cc @nikitaved @pearu @cpuhrsch @amjames @bhosmer @ngimel

Labels

- module: bfloat16
- module: cuda (Related to torch.cuda, and CUDA support in general)
- module: half (Related to float16 half-precision floats)
- module: sparse (Related to torch.sparse)
