`mm`: improve error handling and error messages. #9621

ysiraichi · 2025-09-04T14:52:28Z

This PR refactors the implementation of the mm operation so that it returns a status value, and improves its error messages.

Key Changes:

- Make tensor_methods::mm return Status
- Refactor XLANativeFunctions::mm overloads to handle the status values
- Improve error messages and error handling
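The validation these changes introduce can be modeled in plain Python. This is a rough sketch only, not the C++ code in torch_xla: `format_shape`, `check_mm_input_is_matrix`, `check_mm_matrix_sizes_are_compatible`, and `mm_result_shape` are hypothetical names that loosely mirror the check functions named in the traces in this PR. Shapes are plain tuples, and an error is a returned string rather than a raised exception, to imitate a status-style return value. The order of the checks and the wording for the second input are assumptions.

```python
from typing import Optional, Tuple, Union

Shape = Tuple[int, ...]


def format_shape(shape: Shape, dtype: str = "f32") -> str:
    """Render a shape the way the messages in this PR do, e.g. f32[2,4,8]."""
    return f"{dtype}[{','.join(str(d) for d in shape)}]"


def check_mm_input_is_matrix(shape: Shape, position: str) -> Optional[str]:
    """Return an error message if `shape` is not 2D, otherwise None."""
    if len(shape) != 2:
        return (f"mm(): expected the {position} input tensor "
                f"{format_shape(shape)} to be a matrix (i.e. a 2D tensor).")
    return None


def check_mm_matrix_sizes_are_compatible(a: Shape, b: Shape) -> Optional[str]:
    """Return an error message if the contracting dimensions differ, else None."""
    if a[1] != b[0]:
        # Message wording copied verbatim from the output shown in this PR.
        return (f"mm(): cannot matrix-multiply tensors {format_shape(a)} and "
                f"{format_shape(b)}. Expected the size of dimension 1 of the "
                f"first input tensor ({a[1]}) to be equal the size of "
                f"dimension 0 of the second input tensor ({b[0]}).")
    return None


def mm_result_shape(a: Shape, b: Shape) -> Union[Shape, str]:
    """Run the checks in order; return the first error, or the output shape."""
    # Assumption: both inputs are checked for rank before sizes are compared.
    for error in (check_mm_input_is_matrix(a, "first"),
                  check_mm_input_is_matrix(b, "second")):
        if error is not None:
            return error
    error = check_mm_matrix_sizes_are_compatible(a, b)
    if error is not None:
        return error
    return (a[0], b[1])


print(mm_result_shape((2, 4, 8), (8, 2)))  # rank error, as in Example 1
print(mm_result_shape((2, 4), (8, 2)))     # size mismatch, as in Example 2
print(mm_result_shape((2, 8), (8, 2)))     # valid inputs: output shape
```

Running the rank check first means a 3D input is reported as "not a matrix" rather than as a contracting-dimension mismatch, which is the difference between the Before and After messages in Example 1.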

Example 1: input is not a matrix

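import torch
import torch_xla

device = torch_xla.device()  # assumed setup; not shown in the original snippet
a = torch.rand(2, 4, 8, device=device)
b = torch.rand(8, 2, device=device)
torch.mm(a, b)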

Before:

Traceback (most recent call last):
  File "examples/matmul.py", line 25, in <module>
    torch.mm(a, b)
RuntimeError: Cannot infer shape for dot operation: f32[2,4,8] <dot> f32[8,2]. Contracting dimension sizes are not compatible.

Status Propagation Trace:
    From: ShapeOfXlaOp at torch_xla/csrc/shape_helper.cpp:9

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):

After:

Traceback (most recent call last):
  File "examples/matmul.py", line 25, in <module>
    torch.mm(a, b)
RuntimeError: mm(): expected the first input tensor f32[2,4,8] to be a matrix (i.e. a 2D tensor).

Status Propagation Trace:
    From: CheckMMInputIsMatrix at torch_xla/csrc/tensor_methods.cpp:486 (error: mm(): expected the first input tensor f32[2,4,8] to be a matrix (i.e. a 2D tensor).)
    From: mm at torch_xla/csrc/tensor_methods.cpp:2381
    From: mm at torch_xla/csrc/aten_xla_type.cpp:2498

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):

Example 2: incompatible shapes

a = torch.rand(2, 4, device=device)
b = torch.rand(8, 2, device=device)
torch.mm(a, b)

Before:

Traceback (most recent call last):
  File "examples/matmul.py", line 25, in <module>
    torch.mm(a, b)
RuntimeError: Cannot infer shape for dot operation: f32[2,4] <dot> f32[8,2]. Contracting dimension sizes are not compatible.

Status Propagation Trace:
    From: ShapeOfXlaOp at torch_xla/csrc/shape_helper.cpp:9

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):

After:

Traceback (most recent call last):
  File "examples/matmul.py", line 25, in <module>
    torch.mm(a, b)
RuntimeError: mm(): cannot matrix-multiply tensors f32[2,4] and f32[8,2]. Expected the size of dimension 1 of the first input tensor (4) to be equal the size of dimension 0 of the second input tensor (8).

Status Propagation Trace:
    From: CheckMMMatrixSizesAreCompatible at torch_xla/csrc/tensor_methods.cpp:498 (error: mm(): cannot matrix-multiply tensors f32[2,4] and f32[8,2]. Expected the size of dimension 1 of the first input tensor (4) to be equal the size of dimension 0 of the second input tensor (8).)
    From: mm at torch_xla/csrc/tensor_methods.cpp:2383
    From: mm at torch_xla/csrc/aten_xla_type.cpp:2498

Exception raised from ThrowStatusError at torch_xla/csrc/status.cpp:128 (most recent call first):
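In the After outputs, the "Status Propagation Trace" lists the check that produced the error first, followed by each caller the status passed through. A minimal Python sketch of that accumulation pattern follows. The `Status` class and its `propagate` and `render` methods are hypothetical and only model the trace layout shown above, not the actual torch_xla status machinery.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Status:
    """Hypothetical error status that records the frames it passes through."""
    message: str
    trace: List[str] = field(default_factory=list)

    def propagate(self, function: str, location: str) -> "Status":
        """Append one frame; the originating frame also carries the message."""
        entry = f"From: {function} at {location}"
        if not self.trace:
            entry += f" (error: {self.message})"
        self.trace.append(entry)
        return self

    def render(self) -> str:
        """Format the message and trace in the layout used by the PR output."""
        lines = [self.message, "", "Status Propagation Trace:"]
        lines += [f"    {entry}" for entry in self.trace]
        return "\n".join(lines)


status = Status("mm(): expected the first input tensor f32[2,4,8] "
                "to be a matrix (i.e. a 2D tensor).")
status.propagate("CheckMMInputIsMatrix", "torch_xla/csrc/tensor_methods.cpp:486")
status.propagate("mm", "torch_xla/csrc/tensor_methods.cpp:2381")
status.propagate("mm", "torch_xla/csrc/aten_xla_type.cpp:2498")
print(status.render())
```

Because every caller adds its own frame, the trace points at the check that failed and the path it took up to the ATen entry point, instead of a single frame inside shape inference as in the Before outputs.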

ysiraichi added 2 commits September 4, 2025 11:51

Improve error messages for mm.

aafc8bc

Fix lint.

8476008

ysiraichi requested review from zhanyong-wan and ghpvnist September 4, 2025 18:29

Fix C++ lint.

3708ac1

zhanyong-wan approved these changes Sep 5, 2025

Cache device (re-trigger CI)

e0fe284

ysiraichi merged commit 92dcabc into master Sep 5, 2025
24 checks passed
