
CUDNN_STATUS_MAPPING_ERROR #2124

Closed · ErnstTmp opened this issue Mar 29, 2016 · 8 comments

@ErnstTmp

Hello,

I have a larger Graph model that runs fine with a sequence_length of 500. If I change the sequence_length to 5000, I get a CUDNN_STATUS_MAPPING_ERROR. I tried it twice, and the error happens at exactly the same iteration; the stack trace is below.

The GPU is a Titan X with 12 GB of memory.

What can I do to trace the error further?

Thanks, Ernst

Epoch 4096/10000
1/2 [==============>...............] - ETA: 2s - loss: 0.3742
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "UFCNN1_5000.py", line 977, in
nb_epoch=epoch)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1795, in fit_generator
accuracy=show_accuracy)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/models.py", line 1475, in train_on_batch
return self._train(ins)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Keras-0.3.2-py3.5.egg/keras/backend/theano_backend.py", line 450, in call
return self.function(*inputs)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0.dev0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
RuntimeError: GpuDnnConvGradI: error doing operation: CUDNN_STATUS_MAPPING_ERROR
Apply node that caused the error: GpuDnnConvGradI{algo='time_once', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='full', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 283
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, False, False, True)), <theano.gof.type.CDataType object at 0x7fba712634a8>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(3, 150, 5000, 1), (1, 3, 58460, 1), (1, 150, 53461, 1), 'No shapes', (), ()]
Inputs strides: [(750000, 5000, 1, 0), (0, 58460, 1, 0), (0, 53461, 1, 0), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fba5a029390>, 1.0, 0.0]
Inputs name: ('kernel', 'grad', 'output', 'descriptor', 'alpha', 'beta')

Outputs clients: [[GpuDimShuffle{0,1,2,x}(GpuDnnConvGradI{algo='time_once', inplace=True}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
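Both hints can be followed without changing the model code, for example by setting the flags through the THEANO_FLAGS environment variable before the script runs (a minimal sketch, assuming the usual Theano flag mechanism; compilation gets noticeably slower, so this is only worth doing for a debugging run):

import os

# Must be set before the first "import theano" (Keras imports it when the backend loads).
# optimizer=fast_compile disables most graph optimizations so the error report can point
# back to where the failing node was created; exception_verbosity=high adds the debugprint
# and storage-map footprint that the second hint mentions.
os.environ["THEANO_FLAGS"] = "optimizer=fast_compile,exception_verbosity=high"

import theano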

@NasenSpray

An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct: prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library.

Report it to Nvidia.

@nouiz
Contributor

nouiz commented Mar 29, 2016

Try lib.cnmem=0.9, or a value lower than the one you used.
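A minimal sketch of one way to pass that flag, assuming it is set through THEANO_FLAGS before Theano is first imported (it can equivalently go into ~/.theanorc):

import os

# CNMeM pre-allocates this fraction of GPU memory as a pool for Theano;
# the flag must be in place before the first "import theano".
os.environ["THEANO_FLAGS"] = "device=gpu0,floatX=float32,lib.cnmem=0.9"

import theano  # the GPU startup banner should then report the CNMeM initial size

If the flag is picked up, the startup banner reports the new initial size, as in the "CNMeM is enabled with initial size: ..." line quoted later in this thread.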


@ErnstTmp
Author

Frederic,

thank you very much, I'll try that now!

Thanks
Ernst


@fluency03

Maybe you are running out of GPU memory. Set cnmem lower.

@ErnstTmp
Author

Chang,

thank you very much, I'll do that!

Thanks,
Ernst


@ErnstTmp
Author

ErnstTmp commented Apr 2, 2016

Hi Guys,
thank you very much for your help.

In my current optimisation I had this error at epoch 410 with a CNMeM limit of 90%.

I changed that to 80% and reran the same optimisation, and I got the same error at the same iteration, see below. So it looks like the CNMeM setting is picked up (the log reports the new initial size), but changing it did not change the error.

Kind regards
Ernst

Epoch 410/500
11/20 [===============>..............] - ETA: 26s - loss: 0.5517 - acc: 0.7910
Using Theano backend.
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, CuDNN 4007)
Traceback (most recent call last):
File "/home/ernst/anaconda2/envs/anaconda3/lib/python3.5/site-packages/Theano-0.8.0-py3.5.egg/theano/compile/function_module.py", line 859, in call
outputs = self.fn()

@ErnstTmp
Author

For the record: switching to cuDNN 5.0, CUDA 7.5.18 and Theano 0.9-dev (on Ubuntu 14.04) seems to have removed the problem.

Thanks,
Ernst
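For anyone comparing setups, a quick sketch of how to check which Theano and cuDNN versions are actually being picked up (assuming Theano is configured for the GPU and the old theano.sandbox.cuda backend used here; the helper names and return formats may differ in other Theano versions):

import theano
from theano.sandbox.cuda import dnn

print(theano.__version__)    # e.g. 0.8.0 or 0.9.0dev
print(dnn.dnn_available())   # True if Theano can find and use cuDNN
print(dnn.version())         # cuDNN version Theano detected (the log above shows 4007, i.e. cuDNN 4)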

@NoushNabi

NoushNabi commented Nov 7, 2018

I faced the same error and figured out that my PyTorch version did not match the CUDA version on my machine. I installed a newer version of PyTorch and it worked.
CUDA version: 9.2.88
PyTorch: 0.4.1
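A quick way to check whether the installed PyTorch build matches the local CUDA setup (a sketch; the printed values will of course depend on the install):

import torch

print(torch.__version__)                 # installed PyTorch version, e.g. 0.4.1
print(torch.version.cuda)                # CUDA version this PyTorch build was compiled against
print(torch.backends.cudnn.version())    # cuDNN version PyTorch will use
print(torch.cuda.is_available())         # False if the driver/CUDA combination cannot be used (or no GPU)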
