[Error (docker)]: response from daemon: Unknown runtime specified nvidia AND could not select device driver "" with capabilities: [[gpu]]. #324

Open
Luxcium opened this issue Oct 22, 2021 · 11 comments

Luxcium commented Oct 22, 2021

Docker Error

I am unable to troubleshoot this issue. Can you let me know what information would be helpful?

docker: Error response from daemon: Unknown runtime specified nvidia.

❯ REPO=ghcr.io/rapidsai/node
VERSIONS="21.12.00-runtime-node16.10.0-cudagl11.4.2-ubuntu20.04"

# Be sure to pass either the `--runtime=nvidia` or `--gpus` flag!
docker run --rm \
    --runtime=nvidia \
    -e "DISPLAY=$DISPLAY" \
    -v "/etc/fonts:/etc/fonts:ro" \
    -v "/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    -v "/usr/share/fonts:/usr/share/fonts:ro" \
    -v "/usr/share/icons:/usr/share/icons:ro" \
    $REPO:$VERSIONS-demo-amd64 \
    npx @rapidsai/demo-graph
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
❯ echo $DISPLAY
:0

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

❯ REPO=ghcr.io/rapidsai/node
VERSIONS="21.12.00-runtime-node16.10.0-cuda11.4.2-ubuntu20.04"

# Be sure to pass either the `--runtime=nvidia` or `--gpus` flag!
docker run --rm --gpus=0 $REPO:$VERSIONS-cudf-amd64 \
    -p "const {Series, DataFrame} = require('@rapidsai/cudf');\
        new DataFrame({ a: Series.new([0, 1, 2]) }).toString()"
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

AjayThorve (Member) commented Oct 22, 2021

Hey @Luxcium do you have nvidia-docker2 installed on your system?

Might be related to that!

If you do, maybe this discussion will help.
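
The "Unknown runtime specified nvidia" error means the Docker daemon has no runtime named `nvidia` registered, which is what installing nvidia-docker2 normally sets up. A minimal sketch for checking this from the host, assuming the `docker` CLI is on the PATH; the helper name `has_nvidia_runtime` is hypothetical and not part of Docker or the NVIDIA tooling:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a Docker command):
# reads `docker info` text on stdin and succeeds only if the "Runtimes:" line
# lists a runtime called nvidia.
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with the Docker daemon"
    else
        echo "nvidia runtime is NOT registered; --runtime=nvidia will fail"
    fi
else
    echo "docker CLI not found on PATH"
fi
```

If the runtime is missing after installing the package, restarting the Docker daemon so it rereads its configuration is usually the next thing to try.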

@Luxcium

This comment has been minimized.

Luxcium (Author) commented Oct 22, 2021

I think Fedora team hates people using NVIDIA or NVIDIA team hates people using Fedora

Luxcium (Author) commented Oct 22, 2021

> Hey @Luxcium do you have nvidia-docker2 installed on your system?
>
> Might be related to that!
>
> If you do, may be this discussion might help

Thanks @AjayThorve.
Do you know if I can get it from anywhere other than https://rpms.if-not-true-then-false.com/inttf.repo (link to the blog post)?

I use Fedora release 34 (Thirty Four) as shown in the hidden post above ...

Luxcium (Author) commented Oct 22, 2021

[Screenshot: Screenshot_20211022_193256]

I am doing it then...

Luxcium (Author) commented Oct 22, 2021

Using nvidia-docker2

I have a new error message now

nvidia-container-cli: container error: cgroup subsystem devices not found: unknown

❯ REPO=ghcr.io/rapidsai/node
VERSIONS="21.12.00-runtime-node16.10.0-cudagl11.4.2-ubuntu20.04"

docker run --rm --runtime=nvidia -e "DISPLAY=$DISPLAY" -v "/etc/fonts:/etc/fonts:ro" \
              -v "/tmp/.X11-unix:/tmp/.X11-unix:rw" -v "/usr/share/fonts:/usr/share/fonts:ro" \
              -v "/usr/share/icons:/usr/share/icons:ro" $REPO:$VERSIONS-demo-amd64 npx @rapidsai/demo-graph
docker: Error response from daemon: 
OCI runtime create failed: 
container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: 
Running hook #1:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: 
container error: cgroup subsystem devices not found: unknown.
❯ REPO=ghcr.io/rapidsai/node
VERSIONS="21.12.00-runtime-node16.10.0-cuda11.4.2-ubuntu20.04"

docker run --rm --gpus=0 $REPO:$VERSIONS-cudf-amd64 -p \
        "const {Series, DataFrame} = require('@rapidsai/cudf');\
        new DataFrame({ a: Series.new([0, 1, 2]) }).toString()"
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: 
starting container process caused: process_linux.go:545: container init caused: 
Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: 
container error: cgroup subsystem devices not found: unknown.

@Luxcium

This comment has been minimized.

Luxcium (Author) commented Oct 23, 2021

After an hour of googling and trying to find a solution, I must admit that I will wait to see whether someone can help me here. I was looking into the error "container error: cgroup subsystem devices not found: unknown", but maybe I am becoming blind to the solution. If you know the solution, just let me know, or please ask me for more details about my system or configuration.

trxcllnt (Collaborator) commented Oct 25, 2021

@Luxcium not entirely sure what you've tried, but generally the 3 things you will need (in addition to the driver) are:

I know it's possible to use GPUs in Docker on RHEL, because we publish RHEL (CentOS) images for the core RAPIDS libraries. Let me know if it still doesn't work after installing the above. I don't have a box with CentOS right now, but I could put it on one of my spare machines to test if I need to.

klueska commented Oct 25, 2021

Please see my comment here about the error "container error: cgroup subsystem devices not found: unknown"; it relates to the lack of cgroupv2 support.
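
Whether a host is affected by the cgroupv2 point above can be checked by looking at the filesystem type mounted at `/sys/fs/cgroup`; Fedora has defaulted to the unified cgroup v2 hierarchy since Fedora 31. A minimal sketch, assuming a Linux host with GNU coreutils `stat`; the helper name `classify_cgroup_fs` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (illustrative only): map the filesystem type reported
# for /sys/fs/cgroup to a cgroup version label.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid: controllers mounted under a tmpfs
        *)         echo "unknown" ;;
    esac
}

if [ -d /sys/fs/cgroup ]; then
    # `stat -f -c %T` prints the filesystem type name (GNU coreutils).
    fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null) || fs_type=""
    echo "cgroup hierarchy: $(classify_cgroup_fs "$fs_type") (filesystem type: ${fs_type:-n/a})"
fi
```

A result of `v2` together with the "cgroup subsystem devices not found" message is consistent with the situation described in the linked comment.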

trxcllnt (Collaborator) commented Feb 4, 2022

@Luxcium does this work for you? NVIDIA/nvidia-docker#706 (comment)
