Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oci: implemented CDI device mapping #1459

Merged

Conversation

preminger
Copy link
Contributor

Description of the Pull Request (PR):

Added a --device flag to the experimental --oci runtime mode, allowing the user to make a device available in the contained using a CDI configuration.

This fixes or addresses the following GitHub issues:

@preminger preminger requested a review from dtrudg March 20, 2023 14:58
cmd/internal/cli/action_flags.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/oci/launcher_linux.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/oci/spec_linux.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/oci/spec_linux.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/util.go Outdated Show resolved Hide resolved
cmd/internal/cli/action_flags.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/options.go Outdated Show resolved Hide resolved
cmd/internal/cli/action_flags.go Outdated Show resolved Hide resolved
Copy link
Member

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty nice so far. As in the comments, let's move the CDI specific stuff to its own file now, even though it's fairly trivial.

My biggest concern is to verify we are handling any 'env' edits from the CDI spec correctly.

We currently handle the environment when we finalize the spec, not create it. This is because the environment depends on the image config... which we don't yet have at the initial spec creation.

I think at present here, a CDI spec won't be able to set an env var, as we are replacing the spec.Process in finalizeSpec?

Will chat in-person about tests, as requested.

Copy link
Contributor

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments from the perspective of CDI. Here I would obviously defer to singulartiy maintainers.

One thing that I was wonder was whether we want to make the use of CDI devices mutually exclusive with other mechanisms for injecting NVIDIA devices? I would argue that this would make sense sicne using them both may lead to hard-to-diagnose problems due to the interactions with the nvidia-container-runtime-hook. This could be implemented as a follow-up though.

internal/pkg/runtime/launcher/options.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/oci/cdi_linux.go Outdated Show resolved Hide resolved
internal/pkg/runtime/launcher/options.go Outdated Show resolved Hide resolved
cmd/internal/cli/action_flags.go Outdated Show resolved Hide resolved
cmd/internal/cli/action_flags.go Outdated Show resolved Hide resolved
cmd/internal/cli/action_flags.go Show resolved Hide resolved
@dtrudg
Copy link
Member

dtrudg commented Mar 21, 2023

One thing that I was wonder was whether we want to make the use of CDI devices mutually exclusive with other mechanisms for injecting NVIDIA devices? I would argue that this would make sense sicne using them both may lead to hard-to-diagnose problems due to the interactions with the nvidia-container-runtime-hook.

I don't think we want to do this, personally... other than perhaps issuing a warning. I'd only support making them mutually exclusive if there's a way to scope it easily to only NVIDIA CDI devices.

I can definitely foresee situations where someone might be on a system with no NVIDIA CDI configuration in place yet, but some other CDI configuration for another device. We see the most unlikely combinations of config on hosts, quite regularly :-)

@elezar
Copy link
Contributor

elezar commented Mar 21, 2023

One thing that I was wonder was whether we want to make the use of CDI devices mutually exclusive with other mechanisms for injecting NVIDIA devices? I would argue that this would make sense sicne using them both may lead to hard-to-diagnose problems due to the interactions with the nvidia-container-runtime-hook.

I don't think we want to do this, personally... other than perhaps issuing a warning. I'd only support making them mutually exclusive if there's a way to scope it easily to only NVIDIA CDI devices.

I can definitely foresee situations where someone might be on a system with no NVIDIA CDI configuration in place yet, but some other CDI configuration for another device. We see the most unlikely combinations of config on hosts, quite regularly :-)

You're right. I forgot about the more general use cases here. Sorry about that.

@preminger preminger force-pushed the 1394-add-cdi-device-support-to-oci-mode branch from 1ae0e83 to 59af212 Compare March 21, 2023 21:01
@preminger preminger force-pushed the 1394-add-cdi-device-support-to-oci-mode branch from 5101af5 to 869542b Compare March 24, 2023 02:18
Copy link
Member

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me to merge as the initial CDI implementation. Please make sure it is either merged using 'squash and merge' on GitHub... or rebased to squash the commits first, though.

If there are any additional things you want to do, please ask @wobito or another to re-review while I'm travelling.

We wil handle a bit more e2e-testing later on. There are the GPU tests, in which we can do some stuff to actually test whether an Nvidia GPU is visible in the container... on systems with one of those :-)

@preminger preminger marked this pull request as ready for review March 27, 2023 13:13
@preminger preminger merged commit f148349 into sylabs:main Mar 27, 2023
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Apr 3, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Apr 3, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Apr 3, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request May 24, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request May 24, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jun 14, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jun 14, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jun 14, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jun 16, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jul 4, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jul 11, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jul 21, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
edytuk pushed a commit to vzokay/apptainer that referenced this pull request Jul 24, 2023
* oci: implemented CDI device mapping

* first batch of post-comments revisions

* revisions following elezar's comments

* added `--device` to CHANGELOG

* fixed out-of-order calls to addCDIDevices & Update

* added sync.Once getting & refreshing CDI registry

* cdi unit-test stub

* added some more cdi unit-tests

* more unit-tests

* better testing for empty mounts lists

* finishing touches on cdi unit-test

* first stab at cdi e2e-test

* DT's temp fix for userns mapping limitation

* ignore VSCode debugging build targets

* renamed CDI json template file

* added initial deviceNode testing

* changed flag to --cdi-dirs (from --cdidirs)

Signed-off-by: Edita Kizinevic <edita.kizinevic@cern.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CDI --device support to --oci mode
3 participants