
[Docker] Add --gpus all for docker when GPU is available #3833

Merged: 2 commits from add-gpus-all into master on Aug 21, 2024

Conversation

@Michaelvll (Collaborator) commented Aug 15, 2024

There seems to be no harm in adding --gpus all when GPUs exist. It is the newer replacement for the --runtime nvidia option, and Docker's official docs use it: https://docs.docker.com/engine/containers/resource_constraints/#access-an-nvidia-gpu
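
For context, both options expose the host's NVIDIA GPUs to the container; a minimal sketch of the two invocations (the CUDA image tag is illustrative, not from this PR):

    # Legacy option: requires the nvidia runtime registered with Docker
    # (nvidia-docker2); superseded by --gpus.
    docker run --runtime nvidia nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

    # Newer option: built into Docker 19.03+ when the NVIDIA Container
    # Toolkit is installed.
    docker run --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi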

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@Michaelvll Michaelvll marked this pull request as ready for review August 16, 2024 18:30

@cblmemo (Collaborator) left a comment

Thanks for adding this @Michaelvll! Will this affect the clouds that are still using sky/skylet/providers/command_runner.py? Those clouds still use Ray's docker configuration code path, IIRC.

@cblmemo (Collaborator) commented:

Should we remove the other --gpus all entries in j2 files like {azure,gcp,paperspace}-ray.yml.j2?

@Michaelvll (Collaborator, Author) commented:

Good catch! I just removed them. The clouds still using node providers are not affected by this PR and will continue to work as they do on current master, which should be fine?
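
For illustration, the centralized behavior amounts to roughly the following shell sketch; the actual logic lives in SkyPilot's Python provisioner, and the docker_run_options and image variables here are hypothetical:

    # Append --gpus all only when nvidia-smi is present and reports a GPU;
    # variable names are hypothetical, not from the PR.
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        docker_run_options="$docker_run_options --gpus all"
    fi
    docker run $docker_run_options "$image" bash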

@cblmemo (Collaborator) commented:

Oh actually, should we only remove --gpus all for the clouds that support the new provisioner? For other clouds like paperspace, we should keep it in the Ray YAML file so that Ray's docker command runner can use this option to expose GPUs to the container.

@Michaelvll (Collaborator, Author) commented Aug 19, 2024:

paperspace is using the new provisioner, and it seems none of the clouds using the old provisioner support docker as a runtime? : )

@cblmemo (Collaborator) commented:

Oh good point! Then maybe we could remove the old docker command runner now.

@Michaelvll (Collaborator, Author) commented:

Good point! Let's keep it for reference for now. We can probably remove it in a separate PR.

@cblmemo (Collaborator) left a comment

Thanks for updating this @Michaelvll! LGTM.

@Michaelvll Michaelvll added this pull request to the merge queue Aug 21, 2024
Merged via the queue into master with commit fa60f7a Aug 21, 2024
20 checks passed
@Michaelvll Michaelvll deleted the add-gpus-all branch August 21, 2024 00:07