MPSNDArray Error: buffer is not large enough #79181

theRoodest · 2022-06-09T02:41:49Z

🐛 Describe the bug

Installed the latest PyTorch nightly build for MPS on M1 (1.13.0.dev20220608) and set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to work around an issue with aten::index.Tensor. Ran Stable-Baselines3 to train a PPO agent. It completed one successful iteration, then crashed with an error:

import torch

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Parallel environments
env = make_vec_env("CartPole-v1", n_envs=4)
device = torch.device('mps')

model = PPO("MlpPolicy", env, device=device, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo_cartpole")

/AppleInternal/Library/BuildRoots/8d3bda53-8d9c-11ec-abd7-fa6a1964e34e/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:782: failed assertion '[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1024 bytes
'

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
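
For reference, a minimal sketch of the setup described in this report: the fallback variable is set before `torch` is imported (the conservative ordering), and the device falls back to CPU when MPS is unavailable. Setting the variable from Python instead of the shell and the helper name `pick_device` are assumptions made for illustration; the guarded import only exists so the snippet also runs where PyTorch is not installed.

```python
import os

# Set the fallback flag before importing torch, so it is visible when the
# library loads. Assumption: the reporter exported it in the shell instead.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

try:
    import torch
except ImportError:  # keeps the sketch runnable without PyTorch
    torch = None


def pick_device():
    """Return "mps" when the MPS backend is usable, otherwise "cpu"."""
    if torch is None:
        return "cpu"
    # torch.backends.mps is absent in older builds, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The resulting string can be passed as `device=` to `PPO(...)` in place of the hard-coded `torch.device('mps')`.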

Versions

Collecting environment information...
PyTorch version: 1.13.0.dev20220608
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.3.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.13 (default, Mar 28 2022, 06:13:39)  [Clang 12.0.0 ] (64-bit runtime)
Python platform: macOS-12.3.1-arm64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-ignite==0.4.6
[pip3] torch==1.13.0.dev20220608
[pip3] torchvision==0.14.0a0+f9f721d
[conda] pytorch                   1.13.0.dev20220608         py3.8_0    pytorch-nightly
[conda] torch                     1.13.0.dev20220602          pypi_0    pypi
[conda] torchvision               0.14.0a0+f9f721d          pypi_0    pypi

cc @ezyang @gchanan @zou3519 @kulinseth @albanD


malfet · 2022-06-13T15:23:06Z

Likely a duplicate of #78916
And I can reproduce the crash easily (though TORCH_SHOW_CPP_STACKTRACES=1 does not work for it :( ), so here is the backtrace with lldb attached:

    frame #0: 0x000000019cd52d98 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019cd87ee0 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000019ccc2340 libsystem_c.dylib`abort + 168
    frame #3: 0x000000019ccc1754 libsystem_c.dylib`__assert_rtn + 272
  * frame #4: 0x00000001a57787a8 Metal`MTLReportFailure.cold.1 + 56
    frame #5: 0x00000001a57622bc Metal`MTLReportFailure + 480
    frame #6: 0x00000001a63b6984 MPSCore`___lldb_unnamed_symbol641$$MPSCore + 428
    frame #7: 0x00000002009bac90 MetalPerformanceShadersGraph`___lldb_unnamed_symbol2960$$MetalPerformanceShadersGraph + 536
    frame #8: 0x000000012bf040c0 libtorch_cpu.dylib`at::native::mps::_gatherViewTensor(at::Tensor const&, id<MTLBuffer>, at::native::mps::MPSCachedGraph*, at::Tensor&) + 176
    frame #9: 0x000000012bf0457c libtorch_cpu.dylib`at::native::mps::Placeholder::Placeholder(MPSGraphTensor*, at::Tensor const&, NSArray<NSNumber*>*) + 208
    frame #10: 0x000000012bf7d808 libtorch_cpu.dylib`at::native::structured_gather_out_mps::impl(at::Tensor const&, long long, at::Tensor const&, bool, at::Tensor const&) + 1468
    frame #11: 0x0000000129b6de60 libtorch_cpu.dylib`at::(anonymous namespace)::wrapper_gather(at::Tensor const&, long long, at::Tensor const&, bool) + 128
    frame #12: 0x000000012a98ff2c libtorch_cpu.dylib`c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, long long, at::Tensor const&, bool), &(torch::autograd::VariableType::(anonymous namespace)::gather(c10::DispatchKeySet, at::Tensor const&, long long, at::Tensor const&, bool))>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, long long, at::Tensor const&, bool> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, long long, at::Tensor const&, bool)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, long long, at::Tensor const&, bool) + 1068
    frame #13: 0x0000000129222fd0 libtorch_cpu.dylib`at::_ops::gather::call(at::Tensor const&, long long, at::Tensor const&, bool) + 304
    frame #14: 0x00000001044875f0 libtorch_python.dylib`torch::autograd::THPVariable_gather(_object*, _object*, _object*) + 692

malfet · 2022-06-14T04:02:29Z

Here is a one-line reproducer for the problem:

% python3 -c "import torch;x=(torch.rand(64, 1, device='mps')*1000).to(dtype=torch.int64); y=x.as_strided(size=(64,2), stride=(1, 0));z=y.as_strided(size=(64, 2), stride=(1, 0));z.to('cpu')"
/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1024 bytes
'
zsh: abort      python3 -c
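
The 1024-byte figure in the assertion is consistent with simple arithmetic on the reproducer's shapes: the original tensor holds 64 int64 values (512 bytes), while a densely packed (64, 2) int64 array would need 1024 bytes. The sketch below is pure Python and makes no claim about what MPSNDArray does internally; `storage_elems_needed` and `dense_elems` are hypothetical helper names introduced for illustration.

```python
from math import prod

INT64_BYTES = 8  # size of one torch.int64 element


def storage_elems_needed(size, stride, offset=0):
    """Smallest number of storage elements a strided view can touch."""
    if any(s == 0 for s in size):
        return 0
    # The largest reachable index is offset + sum((size_i - 1) * stride_i).
    return offset + 1 + sum((s - 1) * st for s, st in zip(size, stride))


def dense_elems(size):
    """Number of elements in a contiguous array of the given shape."""
    return prod(size)


# Shape and strides taken from the as_strided calls in the reproducer.
size, stride = (64, 2), (1, 0)
backing_bytes = storage_elems_needed(size, stride) * INT64_BYTES
dense_bytes = dense_elems(size) * INT64_BYTES
print(backing_bytes, dense_bytes)
```

Because the second dimension has stride 0, the view re-reads the same 64 elements, so the backing buffer is half the size a contiguous (64, 2) array would occupy.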


abhudev · 2022-06-14T20:42:33Z

We aren't able to reproduce this issue, could you please update to latest nightly and try again?
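
One way to tell whether an installed nightly predates the fix is to compare the date embedded in its version string with the merge date recorded in this thread (commit 81cd276, Jun 14, 2022). The sketch below is pure Python; `nightly_date` is a hypothetical helper, and the idea that any nightly dated after the merge contains the fix is an assumption, not something the thread states.

```python
import re
from datetime import date

FIX_MERGED = date(2022, 6, 14)  # merge date of 81cd276 per this thread


def nightly_date(version):
    """Extract the build date from a version like '1.13.0.dev20220608'."""
    match = re.search(r"\.dev(\d{4})(\d{2})(\d{2})", version)
    if match is None:
        return None  # not a nightly-style version string
    year, month, day = map(int, match.groups())
    return date(year, month, day)


# Version string taken from the "Versions" section of the report.
reported = nightly_date("1.13.0.dev20220608")
predates_fix = reported is not None and reported < FIX_MERGED
print(reported, predates_fix)
```

In practice the string would come from `torch.__version__`; it is hard-coded here so the snippet runs without PyTorch installed.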

Summary: Fixes #79181 Pull Request resolved: #79521 Approved by: https://github.com/kulinseth Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/81cd276d616e7396b839ac8ceb984517fc87ea5f Reviewed By: dagitses Differential Revision: D37156716 Pulled By: malfet fbshipit-source-id: 61c9ccb5e0464ecf3f5ef13372eb1a029987f152


* MPS: Fixes (#78930)
  Cast integer to float in UnaryOps
  Add tensor dtype in key generation
  Enable FP16 scalars and use placeholder for alpha tensor in add/sum ops
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #78930
  Approved by: https://github.com/albanD
* MPS: Binary cast fix by proper type promotion and remove spurious copy warning (#79185)
  Fixes #78019, #78020
  Fixes #79185
  Pull Request resolved: #79185
  Approved by: https://github.com/albanD, https://github.com/razarmehr
* MPS: add exponential op (#79188)
  Add exponential distribution
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79188
  Approved by: https://github.com/razarmehr, https://github.com/albanD
* [MPS] Delete unused vars from OperationUtils.mm
  Pull Request resolved: #79514
  Approved by: https://github.com/kulinseth, https://github.com/albanD
* [MPS] Fix getDefaultGenerator and copy_kernel_mps
  Returning reference to stack memory is really bad
  Pull Request resolved: #79515
  Approved by: https://github.com/albanD
* [MPS][BE]Do not use `new/delete[]` in `chainViewOperation`
  `std::array` will do just fine
  Pull Request resolved: #79516
  Approved by: https://github.com/albanD
* [MPS] Support stride of stride
  Fixes #79181
  Pull Request resolved: #79521
  Approved by: https://github.com/kulinseth
* MPS: TopK raise an error if K>16 (#79677)
  * Error out in TopK when k>16.
  * Add a test case too.
  Fixes #78915
  Pull Request resolved: #79677
  Approved by: https://github.com/albanD
* [MPS]: Add fix for squeezed input axes handling in BCE loss (#79676)
  Fixes #79527
  Pull Request resolved: #79676
  Approved by: https://github.com/razarmehr, https://github.com/albanD
* MPS: Add amax and amin Ops with tests (#79682)
  * Add amax and amin with tests
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79682
  Approved by: https://github.com/albanD
* [MPS] Fix torch.uint8 support (#80049)
  `ScalarType.Byte` should be cast to `MPSDataTypeUInt8`
  And support for `torch.int8` as well as test those conversions in `TestMPS.test_to`
  Fixes #80006
  Pull Request resolved: #80049
  Approved by: https://github.com/albanD
* [MPS] Fix binary ops between int32 tensor with int64 scalar (#80220)
  For some reason, tensor *op* scalar does not follow the normal binary promotion rules
  So cast output tensor to expected type if needed
  It seems that one should have casted input tensors to expected output tensor type, but it does not really work for boolean binary ops, so...
  Add output tensor type/shape to cached graph key
  Extend `TestMPS.test_add_scalars` to test for this regression
  Fixes #79835
  Pull Request resolved: #80220
  Approved by: https://github.com/albanD
* [MPS] Add equal operator (#80195)
  Which is, in essence is composite of `eq`->`all`->`item`
  `native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`
  Fix codegen by generating MPSFunctions headers
  Pull Request resolved: #80195
  Approved by: https://github.com/albanD
* [MPS] add `aten::normal.Tensor_float` `aten::normal.float_Tensor` `aten::normal.Tensor_Tensor` (#80297)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80297
  Approved by: https://github.com/albanD, https://github.com/kulinseth
* [MPS] Add flip (#80214)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80214
  Approved by: https://github.com/DenisVieriu97, https://github.com/albanD
* [MPS] Add logical ops (#80216)
  This PR adds `logical_not`, `logical_and`, `logical_or`, `logical_xor`.
  Pull Request resolved: #80216
  Approved by: https://github.com/albanD, https://github.com/kulinseth
* [MPS] Add glu (#79866)
  Adds mps op for `aten::glu.out`.
  Pull Request resolved: #79866
  Approved by: https://github.com/kulinseth, https://github.com/albanD
* [MPS] Fix std/var cache issue (#80502)
  Use `getTensorsStringKey` which has tensor shape info added as part of the key to prevent cache lookup issue when the shape of input tensor is changed.
  Fixes #80499
  Pull Request resolved: #80502
  Approved by: https://github.com/malfet, https://github.com/kulinseth
* Add scatter support for view operations (#79939)
  * Add scatter support for view operations; #78074, #78886, #79672
  * Update test_slicing_replace_column to properly test different sizes
  * Handle in-place changes for binary ops; add new testcase
  * Add new view ops testing scatter; add MPSDebugConfig.h config file for debugging purposes
  * Merge gatherViewTensor and scatterViewTensor into a generic function
  * Add scatter on demand in scatterViewOperation instead of caching it into a generic graph
  * Create separate graphs for scatter and gather;
  * Create scatter graph at scatter time
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79939
  Approved by: https://github.com/razarmehr
* MPS: Fix handling of 1D tensors in linear backward (#80759)
  Fixes ##79784
  Pull Request resolved: #80759
  Approved by: https://github.com/ezyang
* [MPS] Move the View ops to a separate file and reduce the number of graphs created (#80491)
  This is dependent on the PR to go in first: #79939
  Remove the data_ptr from the View Graph key which reduces the number of graphs created significantly.
  Don't wait when copying from MPS to MPS tensors
  Pull Request resolved: #80491
  Approved by: https://github.com/malfet
* [MPS] Add softplus backward (#79873)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #79873
  Approved by: https://github.com/malfet
* [MPS] Add argmin (#80828)
  This PR 1. adds argmin 2. refactors `reduction_type` in `ReduceOps.mm` with enum.
  Co-authored by Kulin Seth <kulinseth@gmail.com>
  Pull Request resolved: #80828
  Approved by: https://github.com/malfet
* [MPS] Fix LSTM batch_first output transposed (#80597)
  The output of LSTM with `batch_first` should be transposed back to batch first format.
  Fixes #80306
  Pull Request resolved: #80597
  Approved by: https://github.com/kulinseth
* [MPS][BE] Introduce MPSUnaryCachedGraph (#81033)
  I.e. CachedGraph that has input and output tensors
  Also, add `MPSGraphCache::LookUpAs` template, which combines LookUp with static_cast to target type
  Pull Request resolved: #81033
  Approved by: https://github.com/kulinseth
* [MPS] Add test consistency from OpInfo based tests from PR 78504 (#79532)
  Pull Request resolved: #79532
  Approved by: https://github.com/albanD, https://github.com/malfet
* [MPS] Add huber loss (#80163)
  Fixes #ISSUE_NUMBER
  Pull Request resolved: #80163
  Approved by: https://github.com/kulinseth, https://github.com/malfet
* Remove two tests dependent on the MPS serialization checkin.
* Fix lint error (FLAKE8) F401
* Remove the serialization test from test_mps as its support is not there in 1.12.1.

Co-authored-by: Kulin Seth <kulinseth@gmail.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Co-authored-by: Abhishek Pathak <abhipathak97@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: qqaatw <qqaatw@gmail.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>

zou3519 added the triaged and module: mps labels Jun 9, 2022

malfet added the high priority label Jun 13, 2022

pytorch-bot bot added the triage review label Jun 13, 2022

albanD removed the triage review label Jun 13, 2022

malfet mentioned this issue Jun 14, 2022

[MPS] Support stride of stride #79521

Closed

malfet added a commit that referenced this issue Jun 14, 2022

[MPS] Support stride of stride

0f38c9c

Fixes #79181 [ghstack-poisoned]

malfet added a commit that referenced this issue Jun 14, 2022

[MPS] Support stride of stride

ef2b6c3

Fixes #79181 ghstack-source-id: 50c6f28d6a6f0e4d31bb2cea180836f20cb7e88d Pull Request resolved: #79521

pytorchmergebot closed this as completed in 81cd276 Jun 14, 2022

kulinseth pushed a commit to kulinseth/pytorch that referenced this issue Jul 9, 2022

[MPS] Support stride of stride

b16b012

Fixes pytorch#79181 Pull Request resolved: pytorch#79521 Approved by: https://github.com/kulinseth

atalman pushed a commit to atalman/pytorch that referenced this issue Jul 22, 2022

[MPS] Support stride of stride

ffe06ac

Fixes pytorch#79181 Pull Request resolved: pytorch#79521 Approved by: https://github.com/kulinseth
