[Bug] Ray Head access to extra GPU resources #2098

Open
2 tasks done
shaowei-su opened this issue Apr 23, 2024 · 1 comment
Labels: bug, gpu

Comments

@shaowei-su

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

If the Ray head node is scheduled on a GPU node without requesting any GPU resources, e.g.:

      resources:
        limits:
          ephemeral-storage: 10Gi
          memory: 16Gi
        requests:
          cpu: '4'
          ephemeral-storage: 10Gi
          memory: 16Gi

the Ray resource scheduler can still access the host's GPUs and counts all of them as "Logical Resources" during scheduling, even though the head pod never requested any.
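
One possible mitigation, sketched here under assumptions rather than confirmed as the fix for this issue, is to tell Ray explicitly that the head has no GPUs. KubeRay passes `rayStartParams` entries to `ray start` as flags, so `num-gpus: "0"` becomes `--num-gpus=0`. The container name `ray-head` is a placeholder, and the fragment is an excerpt of a head group spec, not a complete manifest.

```yaml
# Excerpt of a RayCluster / RayJob head group (not a complete manifest).
headGroupSpec:
  rayStartParams:
    # Passed to `ray start` as --num-gpus=0, so Ray advertises zero
    # logical GPUs for the head instead of auto-detecting them.
    num-gpus: "0"
  template:
    spec:
      containers:
        - name: ray-head   # placeholder name
          resources:
            limits:
              ephemeral-storage: 10Gi
              memory: 16Gi
            requests:
              cpu: '4'
              ephemeral-storage: 10Gi
              memory: 16Gi
```

This only changes Ray's logical resource accounting. It does not change which GPU devices are visible to processes inside the head container.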

[Two screenshots attached: "Screenshot 2024-04-23 at 16 39 18" and "Screenshot 2024-04-23 at 16 39 11"]

Reproduction script

Use the RayJob CRD to schedule both the head and the workers on the same physical host with more than one GPU.
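
A minimal sketch of what such a RayJob might look like follows. It is not the manifest used in the original report. The job name, container names, image tag, and node name `gpu-node-1` are placeholders, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed. The entrypoint prints `ray.cluster_resources()` so the GPU count Ray reports can be compared with what the pods requested.

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: gpu-head-repro            # placeholder name
spec:
  # Print the logical resources Ray sees across the cluster.
  entrypoint: 'python -c "import ray; ray.init(); print(ray.cluster_resources())"'
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          # Pin the head to the GPU host (placeholder node name).
          nodeSelector:
            kubernetes.io/hostname: gpu-node-1
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0-gpu   # placeholder image tag
              resources:
                # No GPU requested for the head, as in the report.
                limits:
                  ephemeral-storage: 10Gi
                  memory: 16Gi
                requests:
                  cpu: '4'
                  ephemeral-storage: 10Gi
                  memory: 16Gi
    workerGroupSpecs:
      - groupName: gpu-workers    # placeholder name
        replicas: 1
        minReplicas: 1
        maxReplicas: 1
        rayStartParams: {}
        template:
          spec:
            # Same host as the head.
            nodeSelector:
              kubernetes.io/hostname: gpu-node-1
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0-gpu   # placeholder image tag
                resources:
                  limits:
                    nvidia.com/gpu: 1
```

If the reported GPU total exceeds the single GPU requested by the worker, the head is contributing GPUs it never requested.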

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421
Member

This is not a KubeRay-specific issue. See https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/gpu.html#gpu-multi-tenancy for more details. The GPU UX on K8s seems to have improved recently. I will take a look at MIG and GPU time-slicing and get back to you.

kevin85421 added the go and gpu labels and removed the go and rayjob labels on Apr 24, 2024.