
CUDNN_STATUS_MAPPING_ERROR #2124

Closed · ErnstTmp opened this issue Mar 29, 2016 · 8 comments

@ErnstTmp

Hello,

I have a larger Graph model that runs fine with a sequence_length of 500. If I change the sequence_length to 5000, I get a CUDNN_STATUS_MAPPING_ERROR. I tried it twice, and the error happens at exactly the same iteration; the stack trace is below.

The GPU is a Titan X with 12 GB of memory.

What can I do to trace the error further?

Thanks, Ernst

Epoch 4096/10000
1/2 [==============>...............] - ETA: 2s - loss: 0.3742
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "UFCNN1_5000.py", line 977, in
nb_epoch=epoch)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1795, in fit_generator
accuracy=show_accuracy)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1475, in train_on_batch
return self._train(ins)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/backend/theano_backend.py", line 450, in call
return self.function(*inputs)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
Apply node that caused the error: GpuDnnConvGradI{algo='time_once', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='full', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 283
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, False, False, True)), <theano.gof.type.CDataType object at 0x7fba712634a8>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(3, 150, 5000, 1), (1, 3, 58460, 1), (1, 150, 53461, 1), 'No shapes', (), ()]
Inputs strides: [(750000, 5000, 1, 0), (0, 58460, 1, 0), (0, 53461, 1, 0), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fba5a029390>, 1.0, 0.0]
Inputs name: ('kernel', 'grad', 'output', 'descriptor', 'alpha', 'beta')

Outputs clients: [[GpuDimShuffle{0,1,2,x}(GpuDnnConvGradI{algo='time_once', inplace=True}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
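Both hints can be followed without changing the model code, for example by setting the flags through the THEANO_FLAGS environment variable before the script runs (a minimal sketch, assuming the usual Theano flag mechanism; compilation gets noticeably slower, so this is only worth doing for a debugging run):

import os

# Must be set before the first "import theano" (Keras imports it when the backend loads).
# optimizer=fast_compile disables most graph optimizations so the error report can point
# back to where the failing node was created; exception_verbosity=high adds the debugprint
# and storage-map footprint that the second hint mentions.
os.environ["THEANO_FLAGS"] = "optimizer=fast_compile,exception_verbosity=high"

import theano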

@NasenSpray

An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct: prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library.

Report it to Nvidia.

@nouiz
Contributor

nouiz commented Mar 29, 2016

Try lib.cnmem=0.9, or a value lower than the one you used.
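A minimal sketch of one way to pass that flag, assuming it is set through THEANO_FLAGS before Theano is first imported (it can equivalently go into ~/.theanorc):

import os

# CNMeM pre-allocates this fraction of GPU memory as a pool for Theano;
# the flag must be in place before the first "import theano".
os.environ["THEANO_FLAGS"] = "device=gpu0,floatX=float32,lib.cnmem=0.9"

import theano  # the GPU startup banner should then report the CNMeM initial size

If the flag is picked up, the startup banner reports the new initial size, as in the "CNMeM is enabled with initial size: ..." line quoted later in this thread.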


@ErnstTmp
Author

Frederic,

thank you very much, I'll try that now!

Thanks
Ernst


@fluency03

Maybe you are running out of GPU memory. Set cnmem lower.

@ErnstTmp
Author

Chang,

thank you very much, I'll do that!

Thanks,
Ernst


@ErnstTmp
Author

ErnstTmp commented Apr 2, 2016

Hi Guys,
thank you very much for your help.

In my current optimisation I had this error at epoch 410 with a CNMeM limit of 90%.

I changed that to 80% and reran the same optimisation, and I got the same error at the same iteration, see below. So it looks like the CNMeM setting is picked up (the log reports the new initial size), but changing it did not change the error.

Kind regards
Ernst

Epoch 410/500
11/20 [===============>..............] - ETA: 26s - loss: 0.5517 - acc: 0.7910
Using Theano backend.
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, CuDNN 4007)
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()

@ErnstTmp
Author

For the record: switching to cuDNN 5.0, CUDA 7.5.18 and Theano 0.9-dev (on Ubuntu 14.04) seems to have removed the problem.

Thanks,
Ernst
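For anyone comparing setups, a quick sketch of how to check which Theano and cuDNN versions are actually being picked up (assuming Theano is configured for the GPU and the old theano.sandbox.cuda backend used here; the helper names and return formats may differ in other Theano versions):

import theano
from theano.sandbox.cuda import dnn

print(theano.__version__)    # e.g. 0.8.0 or 0.9.0dev
print(dnn.dnn_available())   # True if Theano can find and use cuDNN
print(dnn.version())         # cuDNN version Theano detected (the log above shows 4007, i.e. cuDNN 4)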

@NoushNabi

NoushNabi commented Nov 7, 2018

I faced the same error and figured out that my PyTorch version did not match the CUDA version on my machine. I installed a newer version of PyTorch and it worked.
CUDA version: 9.2.88
PyTorch: 0.4.1
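A quick way to check whether the installed PyTorch build matches the local CUDA setup (a sketch; the printed values will of course depend on the install):

import torch

print(torch.__version__)                 # installed PyTorch version, e.g. 0.4.1
print(torch.version.cuda)                # CUDA version this PyTorch build was compiled against
print(torch.backends.cudnn.version())    # cuDNN version PyTorch will use
print(torch.cuda.is_available())         # False if the driver/CUDA combination cannot be used (or no GPU)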
