🐛 Describe the bug
This is related to #84039 and #86279. The error is slightly different from the first (232 vs. 231), and the second is specific to matmul. The machine has 64 GB of unified RAM, so this is not a physical memory limitation.
I discovered this bug while attempting to run LoRA on Llama 8B. I can replicate the error with a script similar to the one in the first bug report:
```python
from torch import einsum, ones
import argparse

parser = argparse.ArgumentParser(description='mpsndarray test')
parser.add_argument('--n_samples', type=int, default=2)
args = parser.parse_args()
n_samples = args.n_samples

# Batched contraction over the last dim; the output has shape
# (16 * n_samples, 4096, 4096) on the MPS device.
einsum('b i d, b j d -> b i j',
       ones(16 * n_samples, 4096, 40, device='mps'),
       ones(16 * n_samples, 4096, 40, device='mps')).shape
print(n_samples, 'passed')
```
I receive the error when running `python repro.py --n_samples 17` or with higher values.
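As a sanity check (not part of the original report), the einsum above is just a batched matmul, and some back-of-the-envelope arithmetic on the failing size supports the claim that 64 GB of RAM is not the limit: at `n_samples = 17` the float32 output is only ~17 GiB, but it is also the first size whose element count exceeds 2^32, which may hint at a 32-bit indexing limit rather than memory pressure. The toy sizes below are placeholders chosen for a quick CPU run:

```python
import torch

# CPU-side check that 'b i d, b j d -> b i j' is the same contraction as
# bmm(a, c.transpose(1, 2)); toy sizes stand in for (16*n_samples, 4096, 40).
b, i, j, d = 4, 8, 8, 5
a = torch.ones(b, i, d)
c = torch.ones(b, j, d)
out_einsum = torch.einsum('b i d, b j d -> b i j', a, c)
out_bmm = torch.bmm(a, c.transpose(1, 2))
assert torch.equal(out_einsum, out_bmm)

# Memory arithmetic for the first failing case, n_samples = 17:
out_elems = (16 * 17) * 4096 * 4096        # elements in the output tensor
gib = out_elems * 4 / 2**30                # float32 bytes -> GiB
print(out_elems)         # 4563402752 elements
print(gib)               # 17.0 GiB, well under 64 GB of unified RAM
print(out_elems > 2**32) # True: first n_samples whose output exceeds 2^32 elements
```

Note that at `n_samples = 16` the output is exactly 2^32 elements and the script passes, which is consistent with 17 being the first failure.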
Versions
PyTorch version: 2.3.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 14.5 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.3.9.4)
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.14 (main, Aug 19 2024, 18:36:18) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-14.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M2 Ultra
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] onnx==1.16.2
[pip3] torch==2.3.1
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1
[conda] Could not collect