
[MPS] Bug on training CNN+LSTM #83144

Open

dominicshanshan opened this issue Aug 10, 2022 · 10 comments
Labels
module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@dominicshanshan

dominicshanshan commented Aug 10, 2022

🐛 Describe the bug

The following happens when training on an M1 Max GPU.

When I train a CNN+LSTM model on PyTorch v1.12.1, it fails with this error:
loc("total derivative last state"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x82x64xf32>' and 'tensor<1x32x64xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

This did not happen on the previous PyTorch v1.12.0; I guess something is wrong with the new LSTM result matrix transformation?
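
For reference, a minimal sketch (not the author's model; layers and shapes here are hypothetical) of the CNN+LSTM pattern that hits this class of error: a Conv1d front end feeding a batch_first LSTM, followed by a backward pass on the MPS device.

import torch
import torch.nn as nn

device = "mps" if torch.backends.mps.is_available() else "cpu"

class CNNLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(10, 32, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(32, 64, num_layers=1, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, x):  # x: (batch, seq, features)
        # Conv1d expects (batch, channels, seq), so transpose around it
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(x)  # out: (batch, seq, hidden)
        return self.head(out[:, -1])  # prediction from the last time step

model = CNNLSTM().to(device)
x = torch.randn(8, 25, 10, device=device)
loss = model(x).sum()
loss.backward()  # on the affected versions, the MPSGraph broadcast error reportedly surfaces here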

Versions

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.5 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:07:06) [Clang 13.0.1 ] (64-bit runtime)
Python platform: macOS-12.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchinfo==1.7.0
[pip3] torchvision==0.13.1
[conda] numpy 1.23.1 py310h220015d_0
[conda] numpy-base 1.23.1 py310h742c864_0
[conda] pytorch 1.12.1 py3.10_0 pytorch
[conda] torchaudio 0.12.1 py310_cpu pytorch
[conda] torchinfo 1.7.0 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.13.1 py310_cpu pytorch

cc @kulinseth @albanD

@qqaatw
Collaborator

qqaatw commented Aug 10, 2022

Can you please provide a minimal repro?

Also, did you set batch_first of LSTM to True?
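
For context, batch_first only changes the expected tensor layout (standard PyTorch semantics, nothing MPS-specific):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(5, 3, 10)  # (batch, seq, features) because batch_first=True
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([5, 3, 20]) -- (batch, seq, hidden)
# With the default batch_first=False, the input would instead be (seq, batch, features).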

@dominicshanshan
Author

Can you please provide a minimal repro?

Also, did you set batch_first of LSTM to True?

The model is rather complicated; I will try to prepare a short demo of it. The error is exactly like the one in #78429, which can be taken as a reference. From this error message alone it is hard for me to infer whether the culprit is nn.LayerNorm or nn.LSTM. For the second question: yes, I did set batch_first=True.

@dominicshanshan
Author

dominicshanshan commented Aug 10, 2022

The same script ran fine on v1.12.0, but I had to manually adjust the shape of the result matrix from nn.LSTM; the MPS backend behaves differently from the cuDNN backend.
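
The manual adjustment presumably looked something like the following sketch, assuming the v1.12.0 MPS forward returned the LSTM output in (seq, batch, hidden) order despite batch_first=True (shapes here are hypothetical):

import torch
import torch.nn as nn

device = "mps" if torch.backends.mps.is_available() else "cpu"
lstm = nn.LSTM(10, 20, 1, batch_first=True).to(device)
x = torch.randn(5, 3, 10, device=device)  # (batch, seq, features)
out, _ = lstm(x)
# If the backend ignores batch_first, out comes back as (seq, batch, hidden);
# transposing the first two dimensions restores the documented layout.
if out.shape[0] != x.shape[0]:
    out = out.transpose(0, 1)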

@qqaatw
Collaborator

qqaatw commented Aug 10, 2022

The problem is that the backward pass of LSTM on the MPS backend has an unresolved computational correctness issue, and it currently does not handle batch_first correctly. The forward pass used to have the same problem, which is why you had to manually transpose the resulting matrix of LSTM in v1.12.0.

In v1.12.1 the batch_first issue is fixed in the forward pass of LSTM, but not in the backward pass, because the other issue remains.

Here is a related issue: #80306

@DenisVieriu97
Collaborator

PyTorch version: 1.12.1

@dominicshanshan could you please try a newer version of PyTorch, such as the latest nightly? If using pip, you can use: pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu to get the latest nightly build.

@dominicshanshan
Author

PyTorch version: 1.12.1

@dominicshanshan could you please try a newer version of PyTorch, such as the latest nightly? If using pip, you can use: pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu to get the latest nightly build.

Yep, but can I instead build the nightly version with conda install pytorch torchvision torchaudio -c pytorch-nightly? I use a conda env for the MPS testing. If that works, I will try it and report back! Appreciate your help.

@soulitzer soulitzer added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework labels Aug 11, 2022
@dominicshanshan
Author

dominicshanshan commented Aug 13, 2022

@DenisVieriu97, just tried the latest nightly; it still fails with the same error as v1.12.1:

loc("total derivative last state"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<1x82x64xf32>' and 'tensor<1x128x64xf32>' are not broadcast compatible

env info:
PyTorch version: 1.13.0.dev20220812
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.5 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.4 (main, Mar 31 2022, 03:37:37) [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==1.13.0.dev20220812
[pip3] torchinfo==1.7.0
[conda] numpy 1.23.1 py310h220015d_0
[conda] numpy-base 1.23.1 py310h742c864_0
[conda] pytorch 1.13.0.dev20220812 py3.10_0 pytorch-nightly
[conda] torchinfo 1.7.0 pyhd8ed1ab_0 conda-forge

@DenisVieriu97
Collaborator

@dominicshanshan thanks for trying latest nightly and for the update!
Could you please provide a minimal repro code that you are using to reproduce this crash? Thanks!

@dominicshanshan
Author

I have been busy these days, sorry for the late reply. The model is somewhat private, but I will try to provide toy code for you.

@dominicshanshan
Author

@dominicshanshan thanks for trying latest nightly and for the update!
Could you please provide a minimal repro code that you are using to reproduce this crash? Thanks!

import torch
import torch.nn as nn

device = "mps" if torch.backends.mps.is_available() else "cpu"
torch.manual_seed(42)
data = torch.randn((5, 3, 10), device=device)            # (batch, seq, features)
lstm = nn.LSTM(10, 20, 1, batch_first=True).to(device)   # input_size=10, hidden_size=20, 1 layer
h0 = torch.zeros((1, 5, 20), device=device)              # (num_layers, batch, hidden)
c0 = torch.zeros((1, 5, 20), device=device)
output, _ = lstm(data, (h0, c0))
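
Note that this snippet only exercises the forward pass, while the reported message ("total derivative last state") points at the backward pass; to hit that path one would presumably append a backward call, e.g. (not part of the original repro):

loss = output.sum()  # any scalar loss works for a gradient check
loss.backward()      # the MPSGraph broadcast error reportedly surfaces here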

pytorchmergebot pushed a commit that referenced this issue Mar 10, 2023
The native implementation of LSTM has been fixed on macOS 13.

On macOS 12, the multi-layer LSTM still has a numerical correctness issue that cannot be resolved on OS's side.

Thus, we fall back the multi-layer LSTM on macOS 12 to LSTMCell iteration. It might have a performance impact, but it will make LSTM on macOS 12 fully usable.
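
Conceptually, the fallback replaces the fused multi-layer kernel with an explicit loop over time steps using nn.LSTMCell; a rough sketch of the idea for a single layer (not the actual PR code):

import torch
import torch.nn as nn

def lstm_via_cell(cell, x, h0, c0):
    # x: (batch, seq, input_size); h0, c0: (batch, hidden_size)
    h, c = h0, c0
    outputs = []
    for t in range(x.size(1)):        # iterate over time steps
        h, c = cell(x[:, t], (h, c))
        outputs.append(h)
    return torch.stack(outputs, dim=1), (h, c)  # (batch, seq, hidden)

cell = nn.LSTMCell(10, 20)
x = torch.randn(5, 3, 10)
out, (h, c) = lstm_via_cell(cell, x, torch.zeros(5, 20), torch.zeros(5, 20))
print(out.shape)  # torch.Size([5, 3, 20])
# Stacking several such loops, feeding each layer's outputs into the next,
# gives the multi-layer fallback.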

Fixes: #90421
Issues related: #80306, #83144

Pull Request resolved: #90909
Approved by: https://github.com/albanD, https://github.com/kulinseth