RuntimeError: Undefined type BFloat16 from matmul above certain sizes (maybe MPS only) #121583

Vargol · 2024-03-09T15:12:40Z

🐛 Describe the bug

When using torch.matmul (or the @ operator) on bfloat16 tensors above a certain size, I get the following error:
RuntimeError: Undefined type BFloat16

Example code

import torch

sdxl_latent_rgb_factors = torch.tensor(
    [
        #   R        G        B
        [0.3816, 0.4930, 0.5320],
        [-0.3753, 0.1631, 0.1739],
        [0.1770, 0.3588, -0.2048],
        [-0.4350, -0.2644, -0.4289],
    ],
    dtype=torch.bfloat16,
    device='mps',
)

print('-------181,181,4--works----------')

samples = torch.zeros(
    181, 181, 4,
    dtype=torch.bfloat16, device='mps',
)

x = torch.matmul(samples, sdxl_latent_rgb_factors)

print('-------182,182,4--fails--------')

samples2 = torch.zeros(
    182, 182, 4,
    dtype=torch.bfloat16, device='mps',
)

x = torch.matmul(samples2, sdxl_latent_rgb_factors)

results in

 python ../Diffusers/matmul2.py 
-------181,181,4--works----------
-------182,182,4--fails--------
Traceback (most recent call last):
  File "/Volumes/SSD2TB/AI/Cascade/../Diffusers/matmul2.py", line 31, in <module>
    x = torch.matmul(samples2, sdxl_latent_rgb_factors)
RuntimeError: Undefined type BFloat16

Versions

Collecting environment information...
PyTorch version: 2.3.0.dev20240309
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.2.1 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: version 3.24.4
Libc version: N/A

Python version: 3.10.13 (main, Nov  9 2023, 13:59:31) [Clang 15.0.0 (clang-1500.0.40.1)] (64-bit runtime)
Python platform: macOS-14.2.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.0.dev20240309
[pip3] torchaudio==2.2.0.dev20240301
[pip3] torchvision==0.18.0.dev20240309
[conda] Could not collect

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr

malfet · 2024-03-11T14:29:35Z

OK, I know the problem: MPS supports BFloat16, but Metal does not. Because MPS matmul is buggy for large matrices, there is a naive Metal matmul implementation, and it does not yet support bf16.
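The 181/182 boundary in the report lines up with a simple piece of arithmetic, sketched below. It assumes that matmul collapses the leading dimensions of the 3-D input into a single row count, and that the switch to the naive Metal path happens around 2**15 rows. Neither the threshold value nor the helper name `flattened_rows` comes from this thread; both are illustrative assumptions.

```python
# ASSUMPTION: the fallback to the naive Metal matmul triggers above 2**15 rows.
# This value is not stated anywhere in the thread.
ASSUMED_LIMIT = 2 ** 15


def flattened_rows(shape):
    """Rows of the 2-D matrix obtained by collapsing all leading dims of `shape`."""
    rows = 1
    for dim in shape[:-1]:
        rows *= dim
    return rows


for shape in [(181, 181, 4), (182, 182, 4)]:
    rows = flattened_rows(shape)
    # Prints the shape, its flattened row count, and whether it exceeds the assumed limit.
    print(shape, rows, rows > ASSUMED_LIMIT)
```

181 * 181 = 32761 stays just under 32768, while 182 * 182 = 33124 exceeds it, which would be consistent with the failing case being the first one routed to the naive implementation.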

Will only work on MacOS14 or newer, so compile the shader with `MTLLanguageVersion_3_1` when appropriate
Fixes #121583
Pull Request resolved: #121731
Approved by: https://github.com/albanD
(cherry picked from commit 5498804)
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

atalman · 2024-04-11T13:27:38Z

validated with 2.3:

-------181,181,4--works----------
-------182,182,4--fails--------
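For builds that predate the fix, or on macOS versions older than 14 where the fix does not apply, one possible workaround is to run the multiplication in float32 and cast the result back. The sketch below is not taken from this thread: `matmul_via_float32` is a hypothetical helper, and the script falls back to the CPU when MPS is unavailable so that it can run anywhere.

```python
import torch


def matmul_via_float32(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: multiply in float32, return the result in a's dtype."""
    out = torch.matmul(a.to(torch.float32), b.to(torch.float32))
    return out.to(a.dtype)


# Use MPS when present; otherwise fall back to CPU so the sketch still runs.
device = "mps" if torch.backends.mps.is_available() else "cpu"

samples = torch.zeros(182, 182, 4, dtype=torch.bfloat16, device=device)
factors = torch.ones(4, 3, dtype=torch.bfloat16, device=device)

result = matmul_via_float32(samples, factors)
print(result.shape, result.dtype)
```

The upcast temporarily doubles the memory used by both operands, and the result may differ slightly in rounding from a native bfloat16 matmul, so this is probably only worth doing where the native path fails.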

Vargol mentioned this issue Mar 9, 2024

[bug]: I don't thin bfloat16 config option is being used. invoke-ai/InvokeAI#5799

Open

1 task

albanD added the triaged, module: bfloat16, and module: mps labels Mar 11, 2024

malfet self-assigned this Mar 11, 2024

malfet added a commit that referenced this issue Mar 12, 2024

[MPS] Fix naive matmul for BFloat16

21440af

Will only work on MacOS14 or newer, so compile the shader with `MTLLanguageVersion_3_1` when appropriate Fixes #121583

malfet added a commit that referenced this issue Mar 12, 2024

[MPS] Fix naive matmul for BFloat16

520986e

malfet mentioned this issue Mar 12, 2024

[MPS] Fix naive matmul for BFloat16 #121731

Closed

malfet added a commit that referenced this issue Mar 13, 2024

[MPS] Fix naive matmul for BFloat16

1cb3fea

pytorchmergebot closed this as completed in 5498804 Mar 13, 2024

pytorchbot mentioned this issue Apr 3, 2024

[MPS] Fix naive matmul for BFloat16 #123289

Merged

atalman mentioned this issue Apr 3, 2024

[v.2.3.0] Release Tracker #121760

Closed

atalman added this to the 2.3.0 milestone Apr 10, 2024
