
failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED #332

Closed
ouening opened this issue Jan 27, 2019 · 36 comments

@ouening

ouening commented Jan 27, 2019

When I train on VOC data, the error below occurs. My GPUs are two RTX 2080 (8 GB) cards, with tensorflow-gpu 1.12 and Keras 2.2.4.

Epoch 1/50
2019-01-28 00:16:00.441512: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    _main(annotation_path=anno)
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=346112, n=32, k=64
  [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
  [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

@ShuteLee

Hey, have you figured it out? I'm hitting the same issue.

@bingoxumo

I also hit the same issue when running YOLOv3. Did you solve this problem?

@ouening
Author
ouening commented Mar 22, 2019

I'm still seeing the same error. My GPUs are two RTX 2080 (8 GB), tensorflow-gpu 1.12, Keras 2.2.4, Ubuntu 18.04. Can somebody help?

@tak-s

tak-s commented Mar 23, 2019

Try the following statement at the beginning of the code:

import keras.backend as K
# Let TensorFlow allocate GPU memory on demand instead of reserving
# the whole card up front.
cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
K.set_session(K.tf.Session(config=cfg))

@ouening
Author

ouening commented Mar 24, 2019

Try the following statement at the beginning of the code.

import keras.backend as K
cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
K.set_session(K.tf.Session(config=cfg))

Hi, I still get some errors:

Load weights model_data/yolo_weights.h5.
Freeze the first 249 layers of total 252 layers.
Train on 3439 samples, val on 382 samples, with batch size 32.
Epoch 1/50
2019-03-24 10:53:58.419070: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 206, in <module>
    _main()
  File "train.py", line 81, in _main
    verbose=1)
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=1384448, n=32, k=64
  [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
  [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

Any solution for it?

@HanGaaaaa

Hello, I hit the same error. My environment is CUDA 9.0, cuDNN 7.4, tensorflow-gpu 1.12.0, and the GPU is an RTX 2080 (my work computer). My own computer has the same environment except the GPU is a 940, and it runs the same project fine. How can I fix this error? Can someone help?

@ShuteLee

ShuteLee commented Apr 1, 2019 via email

@S0soo

S0soo commented Apr 9, 2019

I also hit the same error. My GPU is an RTX 2080 Ti with tensorflow-gpu 1.8.0 and CUDA 9.0, but on a GTX 1080 Ti with tensorflow-gpu 1.4.0 and CUDA 8.0 the program runs normally. Can someone give some advice? Thanks.

@ouening
Author

ouening commented Apr 9, 2019

I have solved this problem:
Install the patches for CUDA 9. There are four patches, which can be downloaded from NVIDIA's website: CUDA 9 patches.

@ouening ouening closed this as completed Apr 9, 2019
@guolihong

Hello! Did you solve it? How?

@ShuteLee

ShuteLee commented Aug 4, 2019

Hello! Did you solve it? How?

I fixed this issue just by installing the CUDA Toolkit patch:
https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork
(choose your CUDA version)

@zhixuanli

I have installed the CUDA Toolkit patch but am still having this problem.

@yuanzhedong

I have the same issue: the same code runs on a K80 but not on an RTX 2080.

@checko

checko commented Dec 3, 2019

same issue on my Titan RTX.

@yuanzhedong

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

@xiaohai-AI

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

After making the change described above, I still get a problem like the following:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

@jinmingteo

@xiaohai-AI try this:

import tensorflow as tf
# Enable on-demand GPU memory growth; pre-allocating the whole card can
# leave cuDNN/cuBLAS unable to create their handles.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
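For anyone hitting the same allocation failure on TF 2.x, where `ConfigProto`/`Session` no longer exist, the rough equivalent of the snippet above is the memory-growth setting. This is a hedged sketch of a configuration fragment, not something verified in this thread:

```python
import tensorflow as tf

# TF 2.x equivalent of allow_growth: grow GPU memory on demand instead
# of reserving the whole card at startup. Must run before any GPU is
# initialized (i.e. before the first op touches the device).
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```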

@kartikwar

kartikwar commented Apr 29, 2020

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

I was also getting the same error with tensorflow-gpu 1.6.0 and CUDA 9.0. Upgrading to CUDA 10.0 and tensorflow-gpu 1.14.0 solved the issue for me. Thanks @xiaohai-AI. Not sure why you are getting the internal error, though; probably because you have two CUDA versions installed, or because TensorFlow is picking up the wrong version of cuDNN.

@mfshiu

mfshiu commented Dec 2, 2020

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my TensorFlow is 1.15, CUDA is 10.0, and the GPU is an RTX 3080; I still have the same issue.

@kartikwar

hey @mfshiu maybe you can try cuda 10.0 with tensorflow-gpu 1.14

@allenyllee

Hi @mfshiu, NVIDIA maintains its own build of TensorFlow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which supports the latest GPU cards.

So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's build, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

Once installed, just use it as regular TensorFlow:

import tensorflow as tf

@drscotthawley

Hey @allenyllee, I wonder if you might be able to clarify or help: when I follow those install instructions for NVIDIA TensorFlow, I get a long error that tells me...to re-do what I just did?

$ pip install --user nvidia-pyindex
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.6.tar.gz (6.7 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.6-py3-none-any.whl size=4171 sha256=692df4078194418f4812516403399f2e96373ad780b93c98ce944b5f02efb35d
  Stored in directory: /tmp/pip-ephem-wheel-cache-kpx26e3z/wheels/52/31/c8/db9f8939a8bb1f3500ce81b630604cbfa6e31f82c8f1bd914d
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
Successfully installed nvidia-pyindex-1.0.6

$ pip install --user nvidia-tensorflow[horovod]
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-tensorflow[horovod]
  Downloading nvidia-tensorflow-0.0.1.dev4.tar.gz (3.8 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/shawley/anaconda3/envs/spnet/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"'; __file__='"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-1hvhhg4h
         cwd: /tmp/pip-install-yv_vnm57/nvidia-tensorflow/
    Complete output (17 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py", line 150, in <module>
        raise RuntimeError(open("ERROR.txt", "r").read())
    RuntimeError:
    ###########################################################################################
    The package you are trying to install is only a placeholder project on PyPI.org repository.
    This package is hosted on NVIDIA Python Package Index.
    
    This package can be installed as:
    ```
    $ pip install nvidia-pyindex
    $ pip install nvidia-tensorflow
    ```
    
    Please refer to NVIDIA instructions: https://github.com/NVIDIA/tensorflow#install.
    ###########################################################################################
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Re-running those "This package can be installed as:" commands just results in the same error message again.

@drscotthawley

Resolved this issue for myself: be sure you're running Python 3.8 and pip 20 or later.
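A quick preflight check of that advice can be scripted before attempting the install. The function below is illustrative (the name `nvidia_tf_preflight` and the exact thresholds are taken from this comment, not from NVIDIA documentation):

```python
import sys
import importlib.metadata  # stdlib since Python 3.8


def nvidia_tf_preflight(py=None, pip_version=None):
    """Return True if this interpreter/pip pair matches the versions the
    comment above reports working for nvidia-tensorflow: Python >= 3.8
    and pip >= 20 (older pip cannot resolve the NVIDIA wheel index)."""
    if py is None:
        py = sys.version_info[:2]
    if pip_version is None:
        pip_version = importlib.metadata.version("pip")
    pip_major = int(pip_version.split(".")[0])
    return tuple(py[:2]) >= (3, 8) and pip_major >= 20
```

Running `nvidia_tf_preflight()` with no arguments checks the current environment.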

@GuillaumeMougeot

I had the same problem with an RTX 3090 + TF 1.15. I resolved it by using the official NVIDIA TF1 NGC Docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

@seongyeop-jeong-poey

I had the same problem with an RTX 3090 + TF 1.15. I resolved my problem by using the official nvidia+tf1 ngc docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

It works very well for me: in my case (RTX 3090 + TF 1.15), the NGC Docker container version '21.05-tf1-py3' works very well! Thanks a lot.

@bing-0906

It works after I update the tensorflow version from 1.13.1 to 1.14.
My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my tensorflow is 1.15, cuda is 10.0, gpu is RTX 3080, still have the same issue.

Me too!!! Have you solved this problem?

@seongyeop-jeong-poey

It works after I update the tensorflow version from 1.13.1 to 1.14.
My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my tensorflow is 1.15, cuda is 10.0, gpu is RTX 3080, still have the same issue.

me too!!!!!!. have you solved this problem?

Please find a version that matches your GPU on the NVIDIA Docker hub.

@kwshh

kwshh commented Dec 23, 2021

I hit the same problem on an A10 GPU. Cards with compute capability 8.0 or higher (RTX 30-series, A10, A100, etc.) must use CUDA 11.x, so you can't use stock TensorFlow 1.x, which requires CUDA 10 or lower.
One solution is nvidia-tensorflow 1.x, which runs on CUDA 11.x. Download here: https://github.com/NVIDIA/tensorflow#install
Thanks to @allenyllee.
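The compute-capability cutoff described above can be sketched as a small lookup. The table is hand-assembled from the GPUs mentioned in this thread plus NVIDIA's published compute-capability list, and the helper name is illustrative:

```python
# Compute capability per GPU model (hand-collected; not exhaustive).
COMPUTE_CAPABILITY = {
    "GTX 1080 Ti": 6.1,  # Pascal
    "RTX 2080": 7.5,     # Turing
    "RTX 3080": 8.6,     # Ampere
    "A10": 8.6,          # Ampere
    "A100": 8.0,         # Ampere
    "RTX 4090": 8.9,     # Ada Lovelace
}


def stock_tf1_can_target(gpu: str) -> bool:
    """Stock TF 1.x wheels were built against CUDA <= 10, which cannot
    generate code for compute capability 8.0+; those cards need a CUDA
    11.x build such as nvidia-tensorflow or an NGC container."""
    return COMPUTE_CAPABILITY[gpu] < 8.0
```

This matches the reports in the thread: Turing cards (RTX 2080) limp along or fail with unpatched CUDA, while Ampere and newer cards cannot run stock TF 1.x at all.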

@serdarildercaglar

The problem was fixed after installing:
!pip install nvidia-pyindex
!pip install nvidia-tensorflow

@Fay-why

Fay-why commented Aug 17, 2022

hi @mfshiu, NVIDIA maintains its own version of tensorflow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which support latest gpu card.

So, you need to remove official tensorflow which installed through pip or conda, and install nvidia's version, as its README.md says:

install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

install the current NVIDIA Tensorflow release:

$ pip install --user nvidia-tensorflow[horovod]

after installed, just use it as regular tensorflow:

import tensorflow as tf

It works for me!!! Thanks a lot~ NVIDIA's TF version is 1.15, and luckily my code runs successfully on tf==1.15~
Btw, my failing environment was tf==1.12.0, RTX 3090, cuda==9.0, Ubuntu 20.04.

@Fay-why

Fay-why commented Aug 17, 2022

The problem was fixed after installing:
!pip install nvidia-pyindex
!pip install nvidia-tensorflow

Thanks! It works for me~

@qingjiesjtu

Cool!! It fixes my issue perfectly! Thanks!

@Guo986

Guo986 commented Jan 29, 2023

Yes! Yes!!!
Remove the official TensorFlow. Python 3.8:

pip install nvidia-pyindex
pip install nvidia-tensorflow

I used an A6000 with TF 1.15, CUDA 10.0.130, and cuDNN 7.3.1; the TF website told me to use Python 3.6 or 3.7, which is what I did before.
But!!!
To use nvidia-pyindex and nvidia-tensorflow, I needed to change Python to 3.8.
And I succeeded!!!

@wowo68

wowo68 commented Jan 29, 2023 via email

@zhang159560293

Hi @mfshiu, NVIDIA maintains its own build of TensorFlow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which supports the latest GPU cards.

So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's build, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

Once installed, just use it as regular TensorFlow:

import tensorflow as tf

Thanks! Thank you very much! It solved my problem:

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[128,3,3], b.shape=[128,3,3], m=3, n=3, k=3, batch_size=128
  [[node rotation/MatMul_1 ...... = BatchMatMul[T=DT_DOUBLE, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rotation/concat_7, rotation/concat_7)]]
  [[{{node gradients/decoder/dgcnn_trans_fc1/MatMul_grad/tuple/control_dependency_1/_171}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2202_...pendency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

By the way, both my A6000 and my 4090 had this problem, and it's now solved. My TensorFlow is 1.12.0, CUDA is 9.0.

@wowo68

wowo68 commented Mar 13, 2024 via email
