Unable to Install and use with GPUs #3

Open
ajtarraga opened this issue Mar 8, 2023 · 3 comments

ajtarraga commented Mar 8, 2023

Hi @mdolz, I am trying to install and configure PyDTNN for a project that runs on several heterogeneous supercomputing nodes. These nodes have several GPUs interconnected via GPUDirect Storage and RDMA.

When I run the following command (the example given in the code):

python3 -Ou pydtnn_benchmark.py --model=vgg16_cifar10 --dataset=cifar10 --dataset_train_path=datasets/cifar-10/cifar-10-batches-bin --dataset_test_path=datasets/cifar-10/cifar-10-batches-bin --evaluate_only=True --batch_size=64 --validation_split=0.2 --weights_and_bias_filename=vgg16-weights-nhwc.npz --tracing=False --profile=False --enable_gpu=True --dtype=float32

I get this output:
/home/ajtarraga/.local/lib/python3.8/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
Please, install pycuda, skcuda, and cudnn to be able to use the GPUs!

I have installed pycuda, skcuda and cudnn:
$ pip3 install -r requirements_cuda_2.txt
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pycuda>=2021.1 in /home/ajtarraga/.local/lib/python3.8/site-packages (from -r requirements_cuda_2.txt (line 1)) (2022.2.2)
Requirement already satisfied: scikit-cuda>=0.5.3 in /home/ajtarraga/.local/lib/python3.8/site-packages (from -r requirements_cuda_2.txt (line 2)) (0.5.3)
Requirement already satisfied: nvidia-cudnn>=8.1.1.33 in /home/ajtarraga/.local/lib/python3.8/site-packages (from -r requirements_cuda_2.txt (line 3)) (8.2.0.51)
Requirement already satisfied: appdirs>=1.4.0 in /home/ajtarraga/.local/lib/python3.8/site-packages (from pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (1.4.4)
Requirement already satisfied: pytools>=2011.2 in /home/ajtarraga/.local/lib/python3.8/site-packages (from pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (2022.1.14)
Requirement already satisfied: mako in /home/ajtarraga/.local/lib/python3.8/site-packages (from pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (1.2.4)
Requirement already satisfied: numpy>=1.2.0 in /home/ajtarraga/.local/lib/python3.8/site-packages (from scikit-cuda>=0.5.3->-r requirements_cuda_2.txt (line 2)) (1.24.1)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from nvidia-cudnn>=8.1.1.33->-r requirements_cuda_2.txt (line 3)) (0.34.2)
Requirement already satisfied: setuptools in /home/ajtarraga/.local/lib/python3.8/site-packages (from nvidia-cudnn>=8.1.1.33->-r requirements_cuda_2.txt (line 3)) (65.6.3)
Requirement already satisfied: MarkupSafe>=0.9.2 in /home/ajtarraga/.local/lib/python3.8/site-packages (from mako->pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (2.1.1)
Requirement already satisfied: typing-extensions>=4.0 in /home/ajtarraga/.local/lib/python3.8/site-packages (from pytools>=2011.2->pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (4.4.0)
Requirement already satisfied: platformdirs>=2.2.0 in /home/ajtarraga/.local/lib/python3.8/site-packages (from pytools>=2011.2->pycuda>=2021.1->-r requirements_cuda_2.txt (line 1)) (3.1.0)

What do you think could be the problem and how can I solve it?

@barrachi
Contributor

Hi @ajtarraga,

PyDTNN issues this error message in the following lines of code:

try:
    import pydtnn.backends.gpu.tensor_gpu
    global gpuarray
    import pycuda.gpuarray as gpuarray
    import pycuda.driver as drv
    from pydtnn.backends.gpu.libs import libcudnn as cudnn
    # noinspection PyUnresolvedReferences
    from skcuda import cublas
except (ImportError, ModuleNotFoundError, OSError):
    supported_cudnn = False
    print("Please, install pycuda, skcuda, and cudnn to be able to use the GPUs!")
    sys.exit(-1)

Because the cublas warning appears in your output, we can assume that everything before that line of code worked. So the error must come from importing cublas from skcuda. Could you please open an interactive Python session and run that line on its own? The error output may give more information about what is failing. Just in case, the scikit-cuda building and installation guide (https://scikit-cuda.readthedocs.io/en/latest/install.html#building-and-installation) states that scikit-cuda searches for the CUDA libraries in the system library search path when it is imported, and explains how to solve this issue.
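As a quick way to check the library search path mentioned above, here is a minimal sketch that asks Python's standard `ctypes.util.find_library` whether a few CUDA libraries are discoverable. The list of library names is an assumption chosen for illustration, and this is not necessarily the exact lookup that scikit-cuda performs internally, so treat a "not found" result as a hint rather than a diagnosis.

```python
import ctypes.util

# Library names are assumptions for illustration; adjust them to your setup.
CUDA_LIBS = ["cuda", "cudart", "cublas", "cudnn"]


def find_cuda_libraries(names=CUDA_LIBS):
    """Map each library name to what the system library search finds (or None)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, location in find_cuda_libraries().items():
        print(f"{name:8s} -> {location or 'NOT FOUND on the library search path'}")
```

If a library is reported as not found, the usual suspects are a missing entry in the dynamic linker configuration or in `LD_LIBRARY_PATH` for the CUDA installation directory.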

Please tell us which message is generated (perhaps PyDTNN should also print that underlying error in its output).
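To find out which import actually fails and why, something like the following sketch may help. It imports the same modules, in the same order, as the try block quoted above, but reports the underlying exception instead of the generic message. The helper name `first_failing_import` is hypothetical, and treating `pydtnn.backends.gpu.libs.libcudnn` as an importable module path is an assumption based on the `from ... import libcudnn` line.

```python
import importlib
import traceback

# Same modules, in the same order, as the quoted try block.
# The dotted path for libcudnn is an assumption (it may be a submodule).
GPU_MODULES = [
    "pydtnn.backends.gpu.tensor_gpu",
    "pycuda.gpuarray",
    "pycuda.driver",
    "pydtnn.backends.gpu.libs.libcudnn",
    "skcuda.cublas",
]


def first_failing_import(module_names):
    """Import modules in order; return (name, exception) for the first failure, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except (ImportError, OSError) as exc:  # ModuleNotFoundError is an ImportError
            return name, exc
    return None


if __name__ == "__main__":
    failure = first_failing_import(GPU_MODULES)
    if failure is None:
        print("All GPU-related imports succeeded.")
    else:
        name, exc = failure
        print(f"Import of {name!r} failed with {type(exc).__name__}: {exc}")
        traceback.print_exception(type(exc), exc, exc.__traceback__)
```

Run it with the same interpreter and from the same directory as pydtnn_benchmark.py, so that the same copies of the packages are picked up.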

Also, as a general recommendation, when using Python it is a good idea to isolate your environment from the system-provided one, for example with a virtualenv (https://realpython.com/python-virtual-environments-a-primer/). This way, you keep control of which libraries are installed and when they are upgraded for a given environment (which can be shared by several applications).

Best regards,

Sergio Barrachina Mir

@ajtarraga
Author

Hi @barrachi

I have cublas installed. I tried the import in an interactive Python session and it worked, as shown below:

$ python3
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from skcuda import cublas
/home/ajtarraga/.local/lib/python3.8/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
>>>

Thank you for the virtualenv recommendation.

What do you think could be going wrong with the installation? I believe I installed cublas correctly.

@barrachi
Contributor

Hi @ajtarraga

Well, that should have been the line that failed. Since it did not, could you please run the following commands in an interactive Python session?

import pydtnn.backends.gpu.tensor_gpu
import pycuda.gpuarray as gpuarray
import pycuda.driver as drv
from pydtnn.backends.gpu.libs import libcudnn as cudnn

Another option would be to create a virtualenv and launch PyDTNN from it. Do you have the latest version of PyDTNN installed? (I ask in case I am looking at a different version of the code.)
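To answer the version question, a small sketch along these lines can show which copy of each package Python would import and which version pip has recorded for it. The helper name `describe_package` is hypothetical, and "pydtnn" as a distribution name is an assumption: if PyDTNN is run straight from a source checkout, no installed version will be reported for it.

```python
import importlib.util
from importlib import metadata


def describe_package(import_name, dist_name=None):
    """Return (file location, installed version) for a top-level package; None where unknown."""
    spec = importlib.util.find_spec(import_name)
    location = spec.origin if spec is not None else None
    try:
        version = metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        version = None
    return location, version


if __name__ == "__main__":
    # Import name -> distribution name. The "pydtnn" distribution name is an assumption;
    # the other two names are taken from the pip output earlier in this thread.
    packages = {"pydtnn": "pydtnn", "pycuda": "pycuda", "skcuda": "scikit-cuda"}
    for import_name, dist_name in packages.items():
        location, version = describe_package(import_name, dist_name)
        print(f"{import_name}: version={version}, location={location}")
```

A location outside the directory you expect (for example, a system-wide copy instead of the one under ~/.local) would suggest that a different version of the code is being run.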

Best regards,

Sergio
