CUDA support does not work #21

Open
icookycom opened this issue Dec 4, 2022 · 3 comments

Comments

@icookycom

Hi, I have noticed that CUDA is not working. Does DCT-Net support CUDA computation?

I first had to install:
conda install cudatoolkit=10.1
conda install cudnn

It then uses CUDA, but fails with this error:
2022-12-04 09:31:23.527110: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6
2022-12-04 09:31:23.527239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.527364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA RTX A4500 computeCapability: 8.6
coreClock: 1.65GHz coreCount: 56 deviceMemorySize: 19.70GiB deviceMemoryBandwidth: 596.12GiB/s
2022-12-04 09:31:23.527408: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.528422: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2022-12-04 09:31:23.529403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2022-12-04 09:31:23.529550: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2022-12-04 09:31:23.530466: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2022-12-04 09:31:23.530985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2022-12-04 09:31:23.532948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2022-12-04 09:31:23.533015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-04 09:31:23.533241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-12-04 09:31:23.533267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2022-12-04 09:31:23.552689: E tensorflow/core/common_runtime/session.cc:91] Failed to create session: Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
2022-12-04 09:31:23.552708: E tensorflow/c/c_api.cc:2184] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
Traceback (most recent call last):
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/utils/registry.py", line 211, in build_from_cfg
return obj_cls(**args)
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/pipelines/cv/image_cartoon_pipeline.py", line 42, in __init__
self.facer = FaceAna(self.model)
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/facer.py", line 20, in __init__
self.face_detector = FaceDetector(model_dir)
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 26, in __init__
self._graph, self._sess = self.init_model(self.model_path)
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 113, in init_model
model = init_pb(pb_path)
File "/home/alexandr/miniforge3/envs/dctnet/lib/python3.8/site-packages/modelscope/models/cv/cartoon/facelib/face_detector.py", line 105, in init_pb
sess = tf.Session(config=config)
File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1586, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/alexandr/.local/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 701, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

During handling of the above exception, another exception occurred:

@menyifang
Owner

It supports both GPU and CPU. Please make sure your tensorflow-gpu version is compatible with your CUDA version.
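One possible reading of the log above, offered as a sketch rather than a confirmed diagnosis: the GPU reports compute capability 8.6, while the runtime that gets loaded is libcudart.so.10.1. As far as I know, CUDA toolkits older than 11.1 cannot target compute capability 8.6, which would be consistent with the "device kernel image is invalid" error. The helper names and the small lookup table below are illustrative assumptions and are not part of DCT-Net, modelscope, or TensorFlow.

```python
import re

# Assumption: minimum CUDA toolkit version able to target each compute
# capability, based on NVIDIA's release notes. Only a few entries are listed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (for example RTX 30xx and RTX A-series)
}


def parse_compute_capability(log_text):
    """Return (major, minor) from a 'computeCapability: 8.6' log line, or None."""
    match = re.search(
        r"compute\s?capability:?\s*(\d+)\.(\d+)", log_text, re.IGNORECASE
    )
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cudart_version(log_text):
    """Return (major, minor) from a 'libcudart.so.10.1' log line, or None."""
    match = re.search(r"libcudart\.so\.(\d+)\.(\d+)", log_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_can_target(cuda_version, capability):
    """True or False when the table knows the capability, None otherwise."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        return None
    return cuda_version >= minimum


# Two lines copied from the log in this issue.
log = (
    "pciBusID: 0000:0a:00.0 name: NVIDIA RTX A4500 computeCapability: 8.6\n"
    "Successfully opened dynamic library libcudart.so.10.1\n"
)

capability = parse_compute_capability(log)
cuda_version = parse_cudart_version(log)
print(capability, cuda_version, cuda_can_target(cuda_version, capability))
# prints: (8, 6) (10, 1) False
```

If this reading is right, the fix would be a TensorFlow build that targets a CUDA release new enough for the GPU, rather than a different cudatoolkit package alone.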

@onefish51

onefish51 commented Jun 7, 2023

Something is wrong here!
You said:

pip install --upgrade tensorflow-gpu==1.15

and the official TensorFlow documentation shows:
[image]
so CUDA is 10.0 and cuDNN is 7.4.

then

pip install "modelscope[cv]==1.3.2" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

The install log shows:
[image]
that cuDNN 8.5.0 is required for the install.

and then

Loaded runtime CuDNN library: 8.5.0 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

So it failed!
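The rule stated in that error message can be sketched in a few lines. This is a simplified reading of the message text (same major version, and a loaded minor version at least as high as the compiled one), not TensorFlow's actual check, and the function name is hypothetical.

```python
def cudnn_runtime_is_compatible(loaded, compiled):
    """Apply the rule from the error message to 'major.minor.patch' strings.

    Assumption: for cuDNN 7.0 or later, the major versions must match and
    the loaded minor version must be equal to or higher than the compiled one.
    """
    loaded_major, loaded_minor = (int(part) for part in loaded.split(".")[:2])
    compiled_major, compiled_minor = (int(part) for part in compiled.split(".")[:2])
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# Versions from the message above: loaded 8.5.0, compiled against 7.6.4.
print(cudnn_runtime_is_compatible("8.5.0", "7.6.4"))  # False: major versions differ
print(cudnn_runtime_is_compatible("7.6.5", "7.6.4"))  # True: same major and minor
```

Under this reading, a cuDNN 8.5.0 runtime cannot satisfy a TensorFlow binary compiled against cuDNN 7.6.4, so one of the two would have to change.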

I tested tensorflow-gpu:

python 
Python 3.7.16 (default, Jan 17 2023, 22:20:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
...
True

@h3clikejava

I tested on:
Ubuntu 18.04
Python 3.7
CUDA 10/10.1/11.2
cuDNN 7.4/7.6.0/7.6.1
tensorflow-gpu 1.14/1.15
torch 1.7.1+cu101
numpy 1.18.5

I always get a black result like this:
[image]
I have been trying to set up this training environment for three days, but ultimately failed.
The documentation for this project is really terrible. I don't know if it's because of changes in the company's business, but the developers seem to have stopped maintaining it.
The documentation has inconsistencies in various places regarding the runtime environment.
If there is anyone kind enough who has been able to train successfully, please provide your environment and training scripts. Thank you.
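For anyone replying with a working setup, a small script like the following could make environment reports easier to compare. It is only a sketch: the function names are hypothetical, the package list is an assumption based on the names mentioned in this thread, and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 (on Python 3.7 the `importlib_metadata` backport would be needed). It does not report CUDA or cuDNN versions, since those are not Python packages.

```python
import platform
import sys
from importlib import metadata  # standard library on Python 3.8+


def collect_versions(packages):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def environment_report(packages):
    """Build a plain-text report: interpreter, platform, then one line per package."""
    lines = [
        f"python {sys.version.split()[0]}",
        f"platform {platform.platform()}",
    ]
    lines += [
        f"{name} {version}"
        for name, version in collect_versions(packages).items()
    ]
    return "\n".join(lines)


# Assumed package list, taken from the names mentioned in this thread.
print(environment_report(["tensorflow-gpu", "tensorflow", "torch", "numpy", "modelscope"]))
```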
