Error "illegal memory access was encountered" during U-Net training #1
Comments
Did you also update the cudnn Torch package after upgrading?
Yes, I even reinstalled Torch from scratch.
With the default configuration (batch size 64), training takes around 6 GB of GPU memory and around 8 GB of RAM.
I have enough memory on my GPU (12 GB). I tried a batch size of 1, but got the same problem.
require 'nn'
require 'cunn'
softOutCalc = nn.Sequential():add(nn.SpatialSoftMax())
softOutCalc = softOutCalc:cuda()
ips = torch.rand(8,2,80,80)
ips = ips:cuda()
softOutCalc:forward(ips)
Can you check if this piece of code works?
The same error!
I ran test.sh from the Torch installation, and some tests FAILED.
....
Googling for this.
These 2 failed tests seem unimportant.
torch/cunn#292 - I had a similar problem. The issue is with the installation of torch and cunn and its libraries. It was solved when we upgraded all the drivers. You can reopen the issue and see if we get any support.
Could you tell me which driver version you use, please?
Our Setup: Ubuntu 14.04, TITAN X, CUDA 7.5, CuDNN V5 and nvidia drivers with
I use exactly the same setup, except for an older driver, v. 352.93. I'm going to update. Thanks for your significant help in finding the root cause of the problem. Looking forward to running U-Net training.
I got the same error too, even with the latest driver. We have reopened the cunn issue. I have made a quick fix/hack to get it working. Can you check if it works now?
Your fix helped, without an Nvidia driver update. Thanks!
Thanks so much for sharing your code! I'm trying to run it from the start, but I have a problem during the training phase. I'd appreciate your support in finding the root cause.
The command I run to train U-Net, with paths adjusted for the defaults:
$ th main.lua
produces an error log.
Environment: Ubuntu 14.04, Titan X, CUDA 7.5, cuDNN v.5
Possible root causes: the SpatialMaxPooling module, following this discussion: https://groups.google.com/forum/m/#!msg/torch7/Ru-I6vP2ql0/s2vOsKoVBgAJ
Finally, I simplified the NN to include no modules, but the problem persists. So the SpatialMaxPooling module is not the cause.
Thanks!