
Fix broken linalg unittests on ARM platform #125438

Open

malfet opened this issue May 2, 2024 · 2 comments

Labels
module: arm - Related to ARM architectures builds of PyTorch; includes Apple M1
module: linear algebra - Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply (matmul)
module: tests - Issues related to tests (not the torch.testing module)
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

malfet (Contributor) commented May 2, 2024

🐛 Describe the bug

If test_linalg.py is run on a Mac M1, 4 tests fail:

======================================================================
ERROR: test_vector_norm_cpu_bfloat16 (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2757, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 432, in instantiated_test
    raise rte
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 419, in instantiated_test
    result = test(self, **param_kwargs)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1289, in test_vector_norm
    run_test_case(
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1262, in run_test_case
    result_dtype_reference = vector_norm_reference(input, ord, dim=dim, keepdim=keepdim, dtype=norm_dtype)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1246, in vector_norm_reference
    result = torch.linalg.norm(input_maybe_flat, ord, dim=dim, keepdim=keepdim, dtype=dtype)
RuntimeError: Found dtype Float but expected BFloat16

To execute this test, run the following from the base repo dir:
     python test/test_linalg.py -k test_vector_norm_cpu_bfloat16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_vector_norm_cpu_float16 (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2757, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 432, in instantiated_test
    raise rte
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 419, in instantiated_test
    result = test(self, **param_kwargs)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1289, in test_vector_norm
    run_test_case(
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1262, in run_test_case
    result_dtype_reference = vector_norm_reference(input, ord, dim=dim, keepdim=keepdim, dtype=norm_dtype)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1246, in vector_norm_reference
    result = torch.linalg.norm(input_maybe_flat, ord, dim=dim, keepdim=keepdim, dtype=dtype)
RuntimeError: Found dtype Float but expected Half

To execute this test, run the following from the base repo dir:
     python test/test_linalg.py -k test_vector_norm_cpu_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_vector_norm_cpu_float32 (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2757, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 432, in instantiated_test
    raise rte
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 419, in instantiated_test
    result = test(self, **param_kwargs)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1289, in test_vector_norm
    run_test_case(
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1262, in run_test_case
    result_dtype_reference = vector_norm_reference(input, ord, dim=dim, keepdim=keepdim, dtype=norm_dtype)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 1246, in vector_norm_reference
    result = torch.linalg.norm(input_maybe_flat, ord, dim=dim, keepdim=keepdim, dtype=dtype)
RuntimeError: Found dtype Double but expected Float

To execute this test, run the following from the base repo dir:
     python test/test_linalg.py -k test_vector_norm_cpu_float32

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
FAIL: test_addmv_cpu_float16 (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2757, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 419, in instantiated_test
    result = test(self, **param_kwargs)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 5611, in test_addmv
    self._test_addmm_addmv(torch.addmv, t, m, v)
  File "/Users/nshulga/git/pytorch/pytorch-tmp/test/test_linalg.py", line 5576, in _test_addmm_addmv
    self.assertEqual(res1, res3)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 3640, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 50 / 50 (100.0%)
Greatest absolute difference: 0.08056640625 at index (35,) (up to 0.001 allowed)
Greatest relative difference: 0.74755859375 at index (1,) (up to 0.001 allowed)

To execute this test, run the following from the base repo dir:
     python test/test_linalg.py -k test_addmv_cpu_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1032 tests in 44.906s

FAILED (failures=1, errors=3, skipped=83)
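
For reference, a minimal repro sketch of the kind of call the three dtype errors point at. The exact ord, dim, keepdim and input values used by test_vector_norm are not visible in the traceback, so the values below are assumptions:

    # Hypothetical repro sketch, not taken from the test file: the traceback points at
    # torch.linalg.norm being called with a reduced-precision dtype= argument on CPU.
    import torch

    x = torch.randn(50, dtype=torch.bfloat16)
    # On the affected ARM builds this path reportedly raises
    # "RuntimeError: Found dtype Float but expected BFloat16".
    out = torch.linalg.norm(x, ord=2, dtype=torch.bfloat16)
    print(out.dtype)  # expected: torch.bfloat16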

Versions

CI

cc @mruberry @ZainRizvi @jianyuh @nikitaved @pearu @walterddr @xwang233 @lezcano @snadampal
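
The addmv failure, by contrast, is a numerical accuracy mismatch rather than a dtype error. A rough sketch of the kind of comparison involved; the reference path and tolerances below are assumptions, not the actual _test_addmm_addmv helper:

    # Hypothetical accuracy check: compare float16 addmv on CPU against a
    # float32 reference rounded back to float16.
    import torch

    t = torch.randn(50, dtype=torch.float16)
    m = torch.randn(50, 50, dtype=torch.float16)
    v = torch.randn(50, dtype=torch.float16)

    res_fp16 = torch.addmv(t, m, v)
    res_ref = torch.addmv(t.float(), m.float(), v.float()).to(torch.float16)

    # The assumed tolerances mirror the 0.001 bound quoted in the failure message.
    torch.testing.assert_close(res_fp16, res_ref, atol=1e-3, rtol=1e-3)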

malfet added the module: tests and module: linear algebra labels on May 2, 2024
drisspg added the triaged and module: arm labels on May 3, 2024
pytorchmergebot pushed a commit that referenced this issue on May 3, 2024:

Removes the obscure "Issue with numpy version on arm" skip added by #82213
and replaces it with 4 targeted skips:
 - test_addmv for `float16`
 - test_vector_norm for `float16`, `bfloat16` and `float32`

Followups to fix them are tracked in #125438
Pull Request resolved: #125377
Approved by: https://github.com/kit1980
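
For illustration, a targeted skip along these lines could look like the sketch below. This is not the code from #125377; the platform check and decorator choice are assumptions using only the standard library:

    # Hypothetical illustration of a per-test ARM skip, not the actual PR code.
    import platform
    import unittest

    IS_ARM = platform.machine() in ("arm64", "aarch64")

    class TestLinalgCPU(unittest.TestCase):
        @unittest.skipIf(IS_ARM, "addmv float16 numerics differ on ARM, see #125438")
        def test_addmv_cpu_float16(self):
            ...
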
Aidyn-A (Collaborator) commented May 3, 2024

The same numerical mismatches in test_addmv_cpu_float16 were observed on Grace CPU (ARM).

Aidyn-A (Collaborator) commented May 3, 2024

To be fair, I was the one who last modified vector_norm in #125175, but I did not see the "Found dtype X but expected Y" errors on Grace; only test_addmv_cpu_float16 shows up.
