CUDA support does not work #21

icookycom · 2022-12-04T06:32:58Z

Hi, I have noticed that CUDA is not working. Does DCT-Net support CUDA computation?

I first had to install:
conda install cudatoolkit=10.1
conda install cudnn

It then uses CUDA, but fails with an error:
2022-12-04 09:31:23.527110: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6
2022-12-04 09:31:23.527239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.527364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA RTX A4500 computeCapability: 8.6
coreClock: 1.65GHz coreCount: 56 deviceMemorySize: 19.70GiB deviceMemoryBandwidth: 596.12GiB/s
2022-12-04 09:31:23.527408: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.528422: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2022-12-04 09:31:23.529403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2022-12-04 09:31:23.529550: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2022-12-04 09:31:23.530466: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2022-12-04 09:31:23.530985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2022-12-04 09:31:23.532948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2022-12-04 09:31:23.533015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-12-04 09:31:23.533267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.552689: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2022-12-04 09:31:23.552708: E tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
Traceback (most recent call last):
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/utils/registry.py", line 211, in build_from_cfg
    return obj_cls(**args)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/pipelines/cv/image_cartoon_pipeline.py", line 42, in __init__
    self.facer = FaceAna(self.model)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/facer.py", line 20, in __init__
    self.face_detector = FaceDetector(model_dir)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 26, in __init__
    self._graph, self._sess = self.init_model(self.model_path)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 113, in init_model
    model = init_pb(pb_path)
  File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 105, in init_pb
    sess = tf.Session(config=config)
  File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1586, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 701, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

During handling of the above exception, another exception occurred:

menyifang · 2022-12-16T07:09:48Z

It supports both GPU and CPU. Please ensure that your tensorflow-gpu version is compatible with your CUDA version.
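
One way to read the log in the first post against this advice: the loaded runtime is libcudart.so.10.1, while the GPU reports compute capability 8.6. The sketch below pulls both values out of log text and compares them with a small table of the first CUDA toolkit release that can target each compute capability. The table entries and the helper names (`parse_log`, `runtime_supports_gpu`) are assumptions added here for illustration, not part of DCT-Net, ModelScope, or TensorFlow; check NVIDIA's release notes for your own card.

```python
import re

# Assumed lookup table: first CUDA toolkit release that can target each
# compute capability. Illustrative only; extend it from NVIDIA's release notes.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def _to_tuple(match):
    """Convert a two-group regex match into a (major, minor) tuple."""
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_log(log_text):
    """Return (cuda_runtime, compute_capability) found in TensorFlow log text."""
    runtime = re.search(r"libcudart\.so\.(\d+)\.(\d+)", log_text)
    capability = re.search(r"[Cc]ompute ?[Cc]apability:? (\d+)\.(\d+)", log_text)
    return _to_tuple(runtime), _to_tuple(capability)


def runtime_supports_gpu(cuda_runtime, capability):
    """True or False when the table knows the capability, otherwise None."""
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None or cuda_runtime is None:
        return None
    return cuda_runtime >= required


# Two lines shortened from the log in the first post.
log = (
    "StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6\n"
    "Successfully opened dynamic library libcudart.so.10.1\n"
)
runtime, capability = parse_log(log)
print(runtime, capability, runtime_supports_gpu(runtime, capability))
# (10, 1) (8, 6) False
```

Under these assumptions, CUDA 10.1 predates compute capability 8.6, which would be consistent with the "device kernel image is invalid" error in that log.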

onefish51 · 2023-06-07T10:38:19Z

Something is wrong here.
You said:

pip install --upgrade tensorflow-gpu==1.15

and the official TensorFlow documentation shows:

so CUDA is 10.0 and cuDNN is 7.4.

Then:

pip install "modelscope[cv]==1.3.2" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

The install log shows:

CuDNN8.5.0 is required install

And then, at runtime:

Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

So the setup failed.
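
The rule quoted in that error message can be written down as a small check. This is a minimal sketch of one reading of the message (same major version, and a loaded minor version that is not lower), not TensorFlow's actual implementation; the function names are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '8.5.0' into a tuple such as (8, 5, 0)."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the error message for cuDNN 7.0 or later:
    the major versions must match and the loaded minor must not be lower."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


print(cudnn_runtime_ok("8.5.0", "7.6.4"))  # False: major version differs
print(cudnn_runtime_ok("7.6.5", "7.6.4"))  # True: same major, same minor
print(cudnn_runtime_ok("7.4.2", "7.6.4"))  # False: minor version is lower
```

Read this way, cuDNN 8.5.0 cannot satisfy a build compiled against 7.6.4, and a 7.4.x runtime would not satisfy it either.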

I tested tensorflow-gpu:

python 
Python 3.7.16 (default, Jan 17 2023, 22:20:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
...
True

h3clikejava · 2023-12-10T14:27:27Z

I tested on:
Ubuntu 18.04
Python 3.7
CUDA 10/10.1/11.2
cuDNN 7.4/7.6.0/7.6.1
tensorflow-gpu 1.14/1.15
torch 1.7.1+cu101
numpy 1.18.5

I always get a black result like this:

I have been trying to set up this training environment for three days, but ultimately failed.
The documentation for this project is really terrible, and the developers seem to have stopped maintaining it; I don't know if that is because of changes in the company's business.
The documentation is inconsistent in several places about the runtime environment.
If anyone has been able to train successfully, please share your environment and training scripts. Thank you.
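
For anyone answering this request, a report of the working environment can be collected with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the package list is taken from the versions mentioned in this thread. It covers pip-installed distributions only; CUDA and cuDNN versions have to be gathered separately, for example from `nvcc --version` or `conda list`.

```python
import platform
import sys

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(packages):
    """Build a plain-text report that can be pasted into an issue comment."""
    lines = [
        "OS: " + platform.platform(),
        "Python: " + sys.version.split()[0],
    ]
    for name, version in collect_versions(packages).items():
        lines.append("{}: {}".format(name, version or "not installed"))
    return "\n".join(lines)


print(environment_report(["tensorflow-gpu", "torch", "numpy", "modelscope"]))
```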
