"Memory access fault by GPU node-1" error in Conv3d. #718
Comments
This looks like an issue with MIOpen. Transferring over.
@daniellowell can reproduce the issue. Logging shows it is this rocBLAS call:

@amcamd Can you test the two configs above? @singvision is seeing a segfault. It is pointing to rocBLAS, but it could be the way MIOpen is configuring the parameters.
/cc @bragadeesh
Any progress on this issue? @daniellowell @amcamd
The problem still exists on ROCm 2.10.
Is someone following up? I encounter this error, too. Is it a bug?
Yes. I sent my Radeon VII back to the seller and switched to an RTX 2070 because of this problem.
🐛 Bug
I got a "Memory access fault by GPU node-1" error when training my model, and I can now reproduce the problem with a very simple script.
The environment is ROCm 2.9.6 with a Radeon VII; I compiled PyTorch from the most recent source on the master branch.
Details follow.
To Reproduce
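import torch
import torch.nn as nn

# Input: batch 2, 32 channels, 64 x 128 x 160 volume (ROCm builds expose the GPU as 'cuda').
t = torch.rand(2, 32, 64, 128, 160).to('cuda')
conv = nn.Conv3d(32, 16, kernel_size=3, stride=1, padding=1, bias=False).to('cuda')
t2 = conv(t)  # error occurs here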
Python 3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
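For a sense of scale, here is a rough sketch of the buffer sizes this reproduction implies. It assumes float32 tensors (the default for torch.rand). It also works out the size of a hypothetical im2col-style workspace for one sample, of the kind a GEMM-based convolution would need. Nothing in this thread confirms that MIOpen lowers this configuration that way, or that buffer size is the cause of the fault, so treat the last figure as an illustration only.

```python
# Back-of-the-envelope sizes for the Conv3d reproduction above.
# Assumption: float32 data (4 bytes per element).
BYTES_PER_ELEMENT = 4


def num_elements(shape):
    """Multiply the dimensions of a shape tuple together."""
    total = 1
    for dim in shape:
        total *= dim
    return total


input_shape = (2, 32, 64, 128, 160)   # N, C_in, D, H, W
out_channels = 16
kernel = (3, 3, 3)

# kernel_size=3, stride=1, padding=1 preserves the spatial dimensions.
output_shape = (input_shape[0], out_channels) + input_shape[2:]

input_bytes = num_elements(input_shape) * BYTES_PER_ELEMENT
output_bytes = num_elements(output_shape) * BYTES_PER_ELEMENT

# Hypothetical im2col workspace for ONE sample:
# (C_in * kD * kH * kW) rows by (D * H * W) columns.
im2col_elements = input_shape[1] * num_elements(kernel) * num_elements(input_shape[2:])
im2col_bytes = im2col_elements * BYTES_PER_ELEMENT

MIB = 1024 ** 2
print(f"input : {input_bytes / MIB:.0f} MiB")
print(f"output: {output_bytes / MIB:.0f} MiB")
print(f"im2col (hypothetical, per sample): {im2col_bytes / MIB:.0f} MiB")
print(f"im2col byte count exceeds 2**32: {im2col_bytes > 2 ** 32}")
```

If a workspace of that shape were involved, its byte count would not fit in a 32-bit offset. Shrinking one spatial dimension in the reproduction script and checking whether the fault disappears would be one way to test whether size matters here.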
Environment
ROCM Version: 2.9.6
PyTorch version: 1.4.0a0+21ab112
Is debug build: No
CUDA used to build PyTorch: Could not collect
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.12.0
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] numpy==1.17.3
[pip] torch==1.4.0a0+21ab112
[pip] torchvision==0.2.0
[conda] mkl 2019.4 243
[conda] mkl-include 2019.4 243