Hi,
I am trying to set up a virtual environment after logging in to the login node. To do that, I created a very simple bash script. The part that reproduces the errors is the following:
Two parts are failing: the installation of the neural renderer and of the MPI-IS Mesh Processing Library. The former fails with "Found no NVIDIA driver on your system", the latter with a permission denied error thrown by shutil.copytree.
While trying to understand how to fix the neural renderer problem, I noticed a few things that might be part of it:
nvidia-smi does not seem to be installed.
PyTorch does not detect any GPUs when checking whether CUDA is available (even though the GPU version seems to have been installed successfully).
nvcc --version reports that nvcc is not installed.
I assume this is normal on a login node, but I think it might also be the cause of the problem. Is there any workaround? Do you think that including this code in the run script might work?
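The three checks listed above can be gathered into one small script, so the same report can be compared between a login node and a GPU node. This is only a minimal sketch: the function name `gpu_stack_report` is hypothetical, and it assumes that `nvidia-smi` and `nvcc` would be on `PATH` if they were installed.

```python
import importlib.util
import shutil


def gpu_stack_report():
    """Collect the three checks: nvidia-smi, nvcc, and PyTorch's view of CUDA.

    gpu_stack_report is a hypothetical helper name, not part of any library.
    """
    report = {
        # shutil.which returns the full path of the executable, or None if absent.
        "nvidia_smi": shutil.which("nvidia-smi"),
        "nvcc": shutil.which("nvcc"),
        # Stays None when PyTorch is not installed in this environment.
        "torch_cuda_available": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in gpu_stack_report().items():
        print(f"{name}: {value}")
```

If all three entries come back empty on the login node but not inside a GPU allocation, the failure is most likely a property of the node rather than of the virtual environment.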
In an attempt to fix the MPI-IS Mesh Processing problem, I pointed the default path for temporary directories at my home directory, but I still get the same error. I also tried changing the Makefile so that the package is installed with --user (line 7 of the Makefile), but this doesn't work either, because packages installed in --user mode are not visible inside the virtualenv.
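To see whether the temporary directory override actually took effect, and why --user installs are hidden inside a virtualenv, it may help to print what the interpreter itself reports. The following is a rough sketch using only the standard library; `install_environment_report` is a hypothetical name, and the virtualenv check assumes a Python 3 environment where `sys.base_prefix` exists.

```python
import os
import site
import sys
import tempfile


def install_environment_report():
    """Report the settings that affect --user installs and temporary files.

    install_environment_report is a hypothetical helper name.
    """
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "in_virtualenv": in_virtualenv,
        # When this is False, the per-user site-packages directory is not
        # added to sys.path, so --user installs cannot be imported.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site_dir": site.getusersitepackages(),
        # The directory Python's tempfile module will actually use.
        "temp_dir": tempfile.gettempdir(),
        # The raw environment variable, for comparison with temp_dir.
        "TMPDIR": os.environ.get("TMPDIR"),
    }


if __name__ == "__main__":
    for name, value in install_environment_report().items():
        print(f"{name}: {value}")
```

If `temp_dir` does not match the directory that was configured, the override was not picked up by this interpreter; a build tool that hardcodes its own paths would still ignore it either way.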
Any idea how to install these libraries successfully?
PS. I also tried to load the anaconda module and create a conda environment, but I have the exact same issues.
Thanks in advance 🙂
Can you try running an interactive session on the devel node? The login nodes do not have GPUs or the NVIDIA stack installed, as they are for job submission only.
Just to be sure, is salloc --gres=gpu:1 --partition=devel the right way of accessing a devel node?
I tried to install the libraries from a devel node and the MPI-IS Mesh Processing problem is now solved. However, I still have errors when installing the neural renderer.
No CUDA runtime is found, using CUDA_HOME='/jmain01/apps/cuda/10.1'
.......
.......
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
Interestingly, the installation of PyTorch itself seems to work. Is the neural renderer maybe finding the wrong drivers? I tried to check the driver version, but nvidia-smi still doesn't work. Is that normal?
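One way to read the "found version 9010" part of the message: assuming the number uses CUDA's usual integer encoding of 1000 * major + 10 * minor, 9010 corresponds to CUDA 9.1, which is older than the 10.1 toolkit named in CUDA_HOME. A small sketch of that decoding, with hypothetical helper names:

```python
def decode_cuda_version(code):
    """Split an integer such as 9010 into (major, minor).

    Assumes the 1000 * major + 10 * minor encoding used by the CUDA APIs.
    """
    major, rest = divmod(code, 1000)
    return major, rest // 10


def driver_supports(driver_code, toolkit):
    """Return True if the driver's CUDA version is at least the toolkit's."""
    return decode_cuda_version(driver_code) >= toolkit


print(decode_cuda_version(9010))       # (9, 1)
print(driver_supports(9010, (10, 1)))  # False
```

Under that assumption, the error would mean the driver is too old for a CUDA 10.1 build, so the options are a PyTorch build compiled for the older CUDA version or a node with a newer driver.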
Any idea on how to solve the problem?
The following script should be enough to reproduce the error:
Virtual Environment Setup