Cudnn PoolForward launch failed exception #98

Open
hzlmn opened this issue Aug 10, 2020 · 2 comments

Comments

@hzlmn

hzlmn commented Aug 10, 2020

Hello, thanks for your work on this package. We periodically get exceptions like the one below from cuDNN. Any hints as to what could cause this?
Env:
tensorflow-gpu==1.14
cuda 10.1
cudnn 7.6.5.32
mtcnn==0.0.9

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
11:39
"caught error while running engine ops
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
  (1) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/leapi/worker/pipeline_item_celery.py", line 118, in run_engine_proc
    out_payload = engine.run_ops(task.pipeline.operations, payload)
  File "/app/leapi/pipeline/engine.py", line 80, in run_ops
    payload = self.run_op(op, payload, warmup)
  File "/app/leapi/pipeline/engine.py", line 87, in run_op
    payload_out = f(payload, **op._kwargs)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 949, in resize_upscale_with_faces
    p = self.detect_and_extract_faces(p, face_method=face_method)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 559, in detect_and_extract_faces
    faces_json = self.mtcnn_detector.detect_faces(win)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 418, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 528, in __stage2
    out = self.__rnet.feed(tempimg1)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/network.py", line 108, in feed
    return self._feed(image)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 103, in _feed
    return self._session.run(['rnet/fc2-2/fc2-2:0', 'rnet/prob1:0'], feed_dict={'rnet/input:0': image})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
@owlhtchen

I also got "tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed". Did you end up fixing this issue? I am using tensorflow-gpu 1.12, CUDA 9.0, and cuDNN 7.6.5.

@hzlmn
Author

hzlmn commented Mar 5, 2022

To be honest, I don't remember :) I guess I ended up playing with versions and env config.
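For anyone landing here and "playing with versions", a minimal sketch of one way to sanity-check an environment is below. It compares the reported versions against the tested build configurations listed in the TensorFlow install docs (1.12: CUDA 9, cuDNN 7; 1.14: CUDA 10.0, cuDNN 7.4). `TESTED_CONFIGS` and `check_env` are hypothetical names made up for this illustration; they are not part of TensorFlow or mtcnn, and a version mismatch is only one possible cause of this error, not a confirmed diagnosis.

```python
# Sketch only: TESTED_CONFIGS and check_env are hypothetical names, not part of
# TensorFlow or mtcnn. The values are the tested build configurations listed in
# the TensorFlow install docs for the two releases mentioned in this thread.
TESTED_CONFIGS = {
    "1.12": {"cuda": "9.0", "cudnn_major": 7},
    "1.14": {"cuda": "10.0", "cudnn_major": 7},
}


def check_env(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions; an empty list means none found."""
    release = ".".join(tf_version.split(".")[:2])  # "1.14.0" -> "1.14"
    expected = TESTED_CONFIGS.get(release)
    if expected is None:
        return ["no tested configuration recorded here for TensorFlow " + release]

    problems = []

    # Compare CUDA on major.minor only, e.g. "10.1.243" -> "10.1".
    cuda = ".".join(cuda_version.split(".")[:2])
    if cuda != expected["cuda"]:
        problems.append(
            "TensorFlow %s was tested with CUDA %s, found %s"
            % (release, expected["cuda"], cuda)
        )

    # Compare cuDNN on the major version only, e.g. "7.6.5.32" -> 7.
    cudnn_major = int(cudnn_version.split(".")[0])
    if cudnn_major != expected["cudnn_major"]:
        problems.append(
            "TensorFlow %s was tested with cuDNN %d.x, found %s"
            % (release, expected["cudnn_major"], cudnn_version)
        )
    return problems


if __name__ == "__main__":
    # Environment from the original report.
    print(check_env("1.14", "10.1", "7.6.5.32"))
    # Environment from the follow-up comment.
    print(check_env("1.12", "9.0", "7.6.5"))
```

With the versions from the original report, the sketch flags CUDA 10.1 against the 10.0 that TensorFlow 1.14 was tested with. The environment from the follow-up comment passes this check, so the error there presumably has a different cause.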
