Docker ray doesn’t recognise cuda gpu within container #45674

Closed
stephano41 opened this issue Jun 3, 2024 · 1 comment
Labels
core: Issues that should be addressed in Ray Core
P1: Issue that should be fixed within a few weeks
question: Just a question :)

Comments

@stephano41

What happened + What you expected to happen

I am trying to run Ray with GPU support inside a custom Docker image using Docker Compose.

The Dockerfile I am using:

FROM rayproject/ray:latest-py39-cu121

WORKDIR /opt/project

USER root

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential gcc git wget

CMD ["bash"]
The docker compose file I am using:

services:
    app:
        build:
            context: .
            dockerfile: Dockerfile

        container_name: test_ray
        image: test_ray
        volumes:
            - ./:/opt/project/
        tty: true
        stdin_open: true
        shm_size: 12gb
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

Within the container, I can see that nvidia-smi recognises the GPU. However, running ray.get_gpu_ids() returns an empty list.

I have tried the following base images with no luck:

rayproject/ray:latest
rayproject/ray:2.20.0.5708e7-py310-cu121
rayproject/ray-ml:latest
The commands I used:

docker compose build
docker compose run app bash
nvidia-smi  # reports CUDA version 12.2
python
>>> import ray
>>> print(ray.get_gpu_ids())
[]

Versions / Dependencies

CUDA version 12.2

Reproduction script

The Dockerfile I am using:

FROM rayproject/ray:latest-py39-cu121

WORKDIR /opt/project

USER root

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential gcc git wget

CMD ["bash"]

The docker compose file:

services:
    app:
        build:
            context: .
            dockerfile: Dockerfile
        
        container_name: test_ray
        image: test_ray
        volumes:
            - ./:/opt/project/
        tty: true
        stdin_open: true
        shm_size: 12gb
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

The commands I used in order:

docker compose build
docker compose run app bash
nvidia-smi  # reports CUDA version 12.2
python
>>> import ray
>>> print(ray.get_gpu_ids())

Issue Severity

High: It blocks me from completing my task.

stephano41 added the bug and triage labels on Jun 3, 2024.
anyscalesam added the question and core labels and removed the bug label on Jun 3, 2024.
jjyao (Collaborator) commented Jun 3, 2024

Per https://docs.ray.io/en/latest/ray-core/api/doc/ray.get_gpu_ids.html, ray.get_gpu_ids() returns the GPUs available to a worker process, so you need to call it inside a task or actor rather than from the driver.

jjyao added the P1 label and removed the triage label on Jun 3, 2024.