
failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED #332

Closed
ouening opened this issue Jan 27, 2019 · 36 comments

@ouening

ouening commented Jan 27, 2019

When I train on VOC data, the error below occurs. My GPUs are two RTX 2080 (8 GB) cards, with tensorflow-gpu 1.12 and Keras 2.2.4.

Epoch 1/50
2019-01-28 00:16:00.441512: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    _main(annotation_path=anno)
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=346112, n=32, k=64
  [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
  [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

@ShuteLee

Hey, have you figured it out? I'm hitting the same issue.

@bingoxumo

I also hit the same issue when running YOLOv3. Did you solve this problem?

@ouening
Author
ouening commented Mar 22, 2019

I'm still seeing the same error. My GPUs are two RTX 2080 (8 GB), tensorflow-gpu 1.12, Keras 2.2.4, Ubuntu 18.04. Can somebody help?

@tak-s

tak-s commented Mar 23, 2019

Try the following statement at the beginning of the code:

import keras.backend as K
# Let TensorFlow allocate GPU memory on demand instead of reserving
# the whole card up front.
cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
K.set_session(K.tf.Session(config=cfg))

@ouening
Author

ouening commented Mar 24, 2019

Try the following statement at the beginning of the code.

import keras.backend as K
cfg = K.tf.ConfigProto(gpu_options={'allow_growth': True})
K.set_session(K.tf.Session(config=cfg))

Hi, I still get some errors:

Load weights model_data/yolo_weights.h5.
Freeze the first 249 layers of total 252 layers.
Train on 3439 samples, val on 382 samples, with batch size 32.
Epoch 1/50
2019-03-24 10:53:58.419070: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train.py", line 206, in <module>
    _main()
  File "train.py", line 81, in _main
    verbose=1)
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=1384448, n=32, k=64
  [[{{node conv2d_3/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_3/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](leaky_re_lu_2/LeakyRelu, conv2d_3/kernel/read)]]
  [[{{node yolo_loss/while_1/LoopCond/_2963}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]

Any solution for it?

@HanGaaaaa

Hello, I hit the same error. My environment is CUDA 9.0, cuDNN 7.4, tensorflow-gpu 1.12.0, and the GPU is an RTX 2080 (my work computer). My own computer has the same environment except the GPU is a 940, and it runs the same project fine. How can I fix this error? Can someone help?

@ShuteLee

ShuteLee commented Apr 1, 2019 via email

@S0soo

S0soo commented Apr 9, 2019

I also hit the same error. My GPU is an RTX 2080 Ti with tensorflow-gpu 1.8.0 and CUDA 9.0, but on a GTX 1080 Ti with tensorflow-gpu 1.4.0 and CUDA 8.0 the program runs normally. Can someone give some advice? Thanks.

@ouening
Author

ouening commented Apr 9, 2019

I have solved this problem:
Install the patches for CUDA 9. There are four patches, which can be downloaded from NVIDIA's website: CUDA 9 patches.

@ouening ouening closed this as completed Apr 9, 2019
@guolihong

Hello! Did you solve it? How?

@ShuteLee

ShuteLee commented Aug 4, 2019

Hello! Did you solve it? How?

I fixed this issue just by installing the CUDA Toolkit patch:
https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork
(choose your CUDA version)

@zhixuanli

I have installed the CUDA Toolkit patch but am still having this problem.

@yuanzhedong

I have the same issue: the same code runs on a K80 but not on an RTX 2080.

@checko

checko commented Dec 3, 2019

same issue on my Titan RTX.

@yuanzhedong

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

@xiaohai-AI

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

After making the change described above, I still get a problem like the following:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

@jinmingteo

@xiaohai-AI try this:

import tensorflow as tf
# Enable on-demand GPU memory growth; pre-allocating the whole card can
# leave cuDNN/cuBLAS unable to create their handles.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
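For anyone hitting the same allocation failure on TF 2.x, where `ConfigProto`/`Session` no longer exist, the rough equivalent of the snippet above is the memory-growth setting. This is a hedged sketch of a configuration fragment, not something verified in this thread:

```python
import tensorflow as tf

# TF 2.x equivalent of allow_growth: grow GPU memory on demand instead
# of reserving the whole card at startup. Must run before any GPU is
# initialized (i.e. before the first op touches the device).
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```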

@kartikwar

kartikwar commented Apr 29, 2020

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

I was also getting the same error with tensorflow-gpu 1.6.0 and CUDA 9.0. Upgrading to CUDA 10.0 and tensorflow-gpu 1.14.0 solved the issue for me. Thanks @xiaohai-AI. Not sure why you are getting the internal error, though; probably because you have two CUDA versions installed, or because TensorFlow is picking up the wrong version of cuDNN.

@mfshiu

mfshiu commented Dec 2, 2020

It works after I update the tensorflow version from 1.13.1 to 1.14.

My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my TensorFlow is 1.15, CUDA is 10.0, and the GPU is an RTX 3080; I still have the same issue.

@kartikwar

hey @mfshiu maybe you can try cuda 10.0 with tensorflow-gpu 1.14

@allenyllee

Hi @mfshiu, NVIDIA maintains its own build of TensorFlow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which supports the latest GPU cards.

So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's build, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

Once installed, just use it as regular TensorFlow:

import tensorflow as tf

@drscotthawley

Hey @allenyllee, I wonder if you might be able to clarify or help: when I follow those install instructions for NVIDIA TensorFlow, I get a long error that tells me...to re-do what I just did?

$ pip install --user nvidia-pyindex
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.6.tar.gz (6.7 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.6-py3-none-any.whl size=4171 sha256=692df4078194418f4812516403399f2e96373ad780b93c98ce944b5f02efb35d
  Stored in directory: /tmp/pip-ephem-wheel-cache-kpx26e3z/wheels/52/31/c8/db9f8939a8bb1f3500ce81b630604cbfa6e31f82c8f1bd914d
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
Successfully installed nvidia-pyindex-1.0.6

$ pip install --user nvidia-tensorflow[horovod]
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting nvidia-tensorflow[horovod]
  Downloading nvidia-tensorflow-0.0.1.dev4.tar.gz (3.8 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/shawley/anaconda3/envs/spnet/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"'; __file__='"'"'/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-1hvhhg4h
         cwd: /tmp/pip-install-yv_vnm57/nvidia-tensorflow/
    Complete output (17 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-yv_vnm57/nvidia-tensorflow/setup.py", line 150, in <module>
        raise RuntimeError(open("ERROR.txt", "r").read())
    RuntimeError:
    ###########################################################################################
    The package you are trying to install is only a placeholder project on PyPI.org repository.
    This package is hosted on NVIDIA Python Package Index.
    
    This package can be installed as:
    ```
    $ pip install nvidia-pyindex
    $ pip install nvidia-tensorflow
    ```
    
    Please refer to NVIDIA instructions: https://github.com/NVIDIA/tensorflow#install.
    ###########################################################################################
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Re-running those "This package can be installed as:" commands just results in the same error message again.

@drscotthawley

Resolved this issue for myself: be sure you're running Python 3.8 and pip 20 or later.
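A quick preflight check of that advice can be scripted before attempting the install. The function below is illustrative (the name `nvidia_tf_preflight` and the exact thresholds are taken from this comment, not from NVIDIA documentation):

```python
import sys
import importlib.metadata  # stdlib since Python 3.8


def nvidia_tf_preflight(py=None, pip_version=None):
    """Return True if this interpreter/pip pair matches the versions the
    comment above reports working for nvidia-tensorflow: Python >= 3.8
    and pip >= 20 (older pip cannot resolve the NVIDIA wheel index)."""
    if py is None:
        py = sys.version_info[:2]
    if pip_version is None:
        pip_version = importlib.metadata.version("pip")
    pip_major = int(pip_version.split(".")[0])
    return tuple(py[:2]) >= (3, 8) and pip_major >= 20
```

Running `nvidia_tf_preflight()` with no arguments checks the current environment.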

@GuillaumeMougeot

I had the same problem with an RTX 3090 + TF 1.15. I resolved it by using the official NVIDIA TF1 NGC Docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

@seongyeop-jeong-poey

I had the same problem with an RTX 3090 + TF 1.15. I resolved my problem by using the official nvidia+tf1 ngc docker container, available here: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

It works very well for me: in my case (RTX 3090 + TF 1.15), the NGC Docker container version '21.05-tf1-py3' works very well! Thanks a lot.

@bing-0906

It works after I update the tensorflow version from 1.13.1 to 1.14.
My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my tensorflow is 1.15, cuda is 10.0, gpu is RTX 3080, still have the same issue.

Me too!!! Have you solved this problem?

@seongyeop-jeong-poey

It works after I update the tensorflow version from 1.13.1 to 1.14.
My cuda version is 10.0, cudnn version is 7.6.3, the gpu is RTX2080

But my tensorflow is 1.15, cuda is 10.0, gpu is RTX 3080, still have the same issue.

me too!!!!!!. have you solved this problem?

Please find a version that matches your GPU on the NVIDIA Docker hub.

@kwshh

kwshh commented Dec 23, 2021

I hit the same problem on an A10 GPU. Cards with compute capability 8.0 or higher (RTX 30-series, A10, A100, etc.) must use CUDA 11.x, so you can't use stock TensorFlow 1.x, which requires CUDA 10 or lower.
One solution is nvidia-tensorflow 1.x, which runs on CUDA 11.x. Download here: https://github.com/NVIDIA/tensorflow#install
Thanks to @allenyllee.
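The compute-capability cutoff described above can be sketched as a small lookup. The table is hand-assembled from the GPUs mentioned in this thread plus NVIDIA's published compute-capability list, and the helper name is illustrative:

```python
# Compute capability per GPU model (hand-collected; not exhaustive).
COMPUTE_CAPABILITY = {
    "GTX 1080 Ti": 6.1,  # Pascal
    "RTX 2080": 7.5,     # Turing
    "RTX 3080": 8.6,     # Ampere
    "A10": 8.6,          # Ampere
    "A100": 8.0,         # Ampere
    "RTX 4090": 8.9,     # Ada Lovelace
}


def stock_tf1_can_target(gpu: str) -> bool:
    """Stock TF 1.x wheels were built against CUDA <= 10, which cannot
    generate code for compute capability 8.0+; those cards need a CUDA
    11.x build such as nvidia-tensorflow or an NGC container."""
    return COMPUTE_CAPABILITY[gpu] < 8.0
```

This matches the reports in the thread: Turing cards (RTX 2080) limp along or fail with unpatched CUDA, while Ampere and newer cards cannot run stock TF 1.x at all.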

@serdarildercaglar

The problem was fixed after installing:
!pip install nvidia-pyindex
!pip install nvidia-tensorflow

@Fay-why

Fay-why commented Aug 17, 2022

hi @mfshiu, NVIDIA maintains its own version of tensorflow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which support latest gpu card.

So, you need to remove official tensorflow which installed through pip or conda, and install nvidia's version, as its README.md says:

install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

install the current NVIDIA Tensorflow release:

$ pip install --user nvidia-tensorflow[horovod]

after installed, just use it as regular tensorflow:

import tensorflow as tf

It works for me!!! Thanks a lot~ NVIDIA's TF version is 1.15, and luckily my code runs successfully on tf==1.15~
Btw, my failing environment was tf==1.12.0, RTX 3090, cuda==9.0, Ubuntu 20.04.

@Fay-why

Fay-why commented Aug 17, 2022

The problem was fixed after installing:
!pip install nvidia-pyindex
!pip install nvidia-tensorflow

Thanks! It works for me~

@qingjiesjtu

Cool!! It fixes my issue perfectly! Thanks!

@Guo986

Guo986 commented Jan 29, 2023

Yes! Yes!!!
Remove the official TensorFlow. Python 3.8:

pip install nvidia-pyindex
pip install nvidia-tensorflow

I used an A6000 with TF 1.15, CUDA 10.0.130, and cuDNN 7.3.1; the TF website told me to use Python 3.6 or 3.7, which is what I did before.
But!!!
To use nvidia-pyindex and nvidia-tensorflow, I needed to change Python to 3.8.
And I succeeded!!!

@wowo68

wowo68 commented Jan 29, 2023 via email

@zhang159560293

Hi @mfshiu, NVIDIA maintains its own build of TensorFlow 1.15 here: https://github.com/NVIDIA/tensorflow#install , which supports the latest GPU cards.

So you need to remove the official TensorFlow installed through pip or conda and install NVIDIA's build, as its README.md says:

Install the NVIDIA wheel index:

$ pip install --user nvidia-pyindex

Install the current NVIDIA TensorFlow release:

$ pip install --user nvidia-tensorflow[horovod]

Once installed, just use it as regular TensorFlow:

import tensorflow as tf

Thanks! Thank you very much! It solved my problem:

InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[128,3,3], b.shape=[128,3,3], m=3, n=3, k=3, batch_size=128
  [[node rotation/MatMul_1 ...... = BatchMatMul[T=DT_DOUBLE, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rotation/concat_7, rotation/concat_7)]]
  [[{{node gradients/decoder/dgcnn_trans_fc1/MatMul_grad/tuple/control_dependency_1/_171}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2202_...pendency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

By the way, both my A6000 and my 4090 had this problem, and it's now solved. My TensorFlow is 1.12.0, CUDA is 9.0.

@wowo68

wowo68 commented Mar 13, 2024 via email
