Current nvidia-container-toolkit not working with current nvidia-drivers (435.21 or 440.31) #1

Closed
ahmedmagdiosman opened this issue Nov 25, 2019 · 8 comments

Comments

@ahmedmagdiosman

ahmedmagdiosman commented Nov 25, 2019

See pop-os/nvidia-graphics-drivers#31 and NVIDIA/nvidia-docker#1114 .

Current workaround: use NVIDIA's official container-toolkit.

Steps:

1. Add the repo

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

2. Then pin the repo by creating:

/etc/apt/preferences.d/nvidia-docker-pin-1002

Package: *
Pin: origin nvidia.github.io 
Pin-Priority: 1002

3. Install the driver

sudo apt install nvidia-driver-440

4. Restart your machine

5. Install nvidia-container-toolkit and restart docker

sudo apt install nvidia-container-toolkit

sudo systemctl restart docker
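Steps 2 and 5 can also be scripted. Below is a minimal sketch that assumes a POSIX shell. The helper names write_nvidia_pin and missing_cmds are hypothetical. write_nvidia_pin takes the target path as an argument, so you can try it on a scratch file first; writing the real file under /etc/apt/preferences.d/ needs root. A Pin-Priority above 1000 tells apt to prefer packages from nvidia.github.io even when that means a downgrade.

```shell
# Write the apt pin from step 2 to the given file
# (default: /etc/apt/preferences.d/nvidia-docker-pin-1002, which needs root).
write_nvidia_pin() {
  cat > "${1:-/etc/apt/preferences.d/nvidia-docker-pin-1002}" <<'EOF'
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF
}

# Print each argument that is not a command on PATH, one per line;
# prints nothing when all of them are found.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Example post-install check for step 5 (assumes Docker 19.03+ for --gpus;
# nvidia-container-cli normally comes in as a dependency of
# nvidia-container-toolkit):
#   if [ -z "$(missing_cmds docker nvidia-container-cli)" ]; then
#     nvidia-container-cli info
#     docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
#   fi
```

For a one-off setup, the same pin file can be written directly with: printf 'Package: *\nPin: origin nvidia.github.io\nPin-Priority: 1002\n' | sudo tee /etc/apt/preferences.d/nvidia-docker-pin-1002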
@wilderrodrigues

I followed your steps and got my RTX-2080 working just fine. Thanks a lot!

@hemingchen

Had the same issue with my GTX 1080 Ti. I followed the steps above and it now works fine. Thanks!

@gussmith

gussmith commented Mar 28, 2020

Had the same issue with an RTX 2070 on Pop!_OS.
After pinning the NVIDIA repository as shown above, I only ran:
sudo apt install nvidia-container-toolkit
Now it is finally working, thank god!
Note: I did NOT have to reinstall the NVIDIA drivers.

@ids1024
Member

ids1024 commented Oct 7, 2020

The version packaged by Pop!_OS has been updated, so this should be fixed now.

@dysonsphere-startmail

dysonsphere-startmail commented Nov 13, 2020

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
is met with:

# Unsupported distribution!
# Check https://nvidia.github.io/nvidia-docker

cd /etc/os-release returns:
bash: cd: /etc/os-release: Not a directory
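/etc/os-release is a regular file of KEY=value lines, not a directory, so cd fails on it. Step 1 instead sources the file with . inside a subshell and joins its ID and VERSION_ID fields. The minimal sketch below wraps that same logic in a function so you can inspect the resulting string. The name nvidia_distribution and its optional file argument are illustrative only. On Pop!_OS, ID is pop, so the result looks like pop20.10 rather than an ubuntuXX.YY value. That is a likely reason NVIDIA's list URL answers "Unsupported distribution!". As the maintainer notes in the reply, the package is in the Pop PPA instead.

```shell
# Print the "$ID$VERSION_ID" string used in step 1 (e.g. "ubuntu18.04").
# FILE defaults to /etc/os-release; running "." inside a subshell keeps the
# sourced variables (ID, VERSION_ID, NAME, ...) out of the caller's shell.
nvidia_distribution() {
  (
    . "${1:-/etc/os-release}"
    echo "$ID$VERSION_ID"
  )
}
```

Running nvidia_distribution with no argument shows the value the curl URL would use on the current machine, and cat /etc/os-release shows every field.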

@mmstick
Member

mmstick commented Nov 13, 2020

@dysonsphere-startmail That's not our repository. This package is in the Pop PPA.

@dysonsphere-startmail

Ah, OK, so something else must be going wrong when I try to run a container.
Running docker run --gpus all nvidia/cuda:10.0-base nvidia-smi with the Pop!_OS nvidia-docker seems to indicate that everything is fine:

Fri Nov 13 17:43:29 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce MX130       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0    N/A /  N/A |    423MiB /  2004MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|

But when I try to run my container:
GPU=1 bash ./dlc-docker run -d -p 2351:8888 -e USER_HOME=$HOME/DeepLabCut --name containername dlc_username/dlcdocker
I get an error:

ret run -d -p 2351:8888 -e USER_HOME=/home/m3coordinator/DeepLabCut --name containername dlc_username/dlcdocker run
1st CMD run

* Setting user name to:             m3coordinator
* Setting user ID to:               1000
* Setting user groups to:           sudo
* Setting password to:              pw
* Setting Notebook port binding to: 2351 (to set manually add -p 2351:8888 as flag)

You can now open the notebook on the host machine by directing your browser to

    http://localhost:2351

1 nvidia-docker run -v /home/m3coordinator:/home/m3coordinator -p 46826:22 -p 2351:8888 -e USER_GROUPS=sudo -e USER=m3coordinator -e USER_ID=1000 -e USER_ENCRYPTED_PASSWORD=aa5V9MSdgw5ec -e USER_HOME=/home/m3coordinator -e GPU=1 -e DISPLAY=:1 -d -e USER_HOME=/home/m3coordinator/DeepLabCut --name containername dlc_username/dlcdocker
./dlc-docker: line 179: nvidia-docker: command not found

I suppose I should look elsewhere for help on this, right?
Thanks
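The failing line is the dlc-docker script calling the nvidia-docker command. That command is a wrapper usually shipped by the nvidia-docker2 package, not by nvidia-container-toolkit. With Docker 19.03 or newer, docker run --gpus all gives the container GPU access without that wrapper, as the nvidia-smi test above did. Below is a hypothetical sketch of how a wrapper script could choose between the two. gpu_run_prefix is not part of dlc-docker, and this is not how that script is actually written.

```shell
# Print the command prefix for starting a GPU container: the legacy
# nvidia-docker wrapper if it is on PATH, otherwise Docker's own --gpus flag
# (requires Docker 19.03+ with nvidia-container-toolkit installed).
gpu_run_prefix() {
  if command -v nvidia-docker >/dev/null 2>&1; then
    echo "nvidia-docker run"
  else
    echo "docker run --gpus all"
  fi
}

# Hypothetical use inside a wrapper script (left unquoted on purpose so the
# prefix splits into separate words):
#   $(gpu_run_prefix) -d -p 2351:8888 --name containername dlc_username/dlcdocker
```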

@mmstick
Member

mmstick commented Jan 20, 2022

Closing because this issue is over two years old.

@mmstick mmstick closed this as completed Jan 20, 2022
7 participants