Add CDI --device support to --oci mode #1394

Closed
dtrudg opened this issue Mar 1, 2023 · 4 comments · Fixed by #1459
Labels
enhancement New feature or request roadmap Features / changes that are scheduled to be implemented

Comments

@dtrudg
Member

dtrudg commented Mar 1, 2023

Is your feature request related to a problem? Please describe.

SingularityCE doesn't currently support the new CDI standard for making hardware devices available in containers.

The --oci experimental runtime mode currently exposes only a naive binding approach for adding devices and libraries to a container, via the --nv and --rocm flags.

The --nv and --rocm approach cannot support a range of valuable functionality, such as masking specific GPUs or introducing only a subset of a device's functionality into a container. In addition, it is vendor specific, whereas we'd like to support other hardware, e.g. Intel GPUs (#1094).

Describe the solution you'd like

The experimental --oci runtime mode should support a --device flag, allowing the user to request that a device be made available in the container using a CDI configuration.

Additional context

Because --oci mode is based on OCI runtime spec generation, and invocation of runc / crun, it should be simple to implement support via spec modification carried out in the CDI module - https://github.com/container-orchestrated-devices/container-device-interface/tree/main/pkg/cdi
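To make "spec modification" concrete: a CDI device carries container edits (environment variables, device nodes, mounts) that get merged into the OCI runtime spec before runc / crun is invoked. The sketch below is a minimal, stdlib-only illustration under stated assumptions: `ociSpec`, `containerEdits` and `applyEdits` are hypothetical, heavily trimmed stand-ins, not the types or functions of the CDI package or the OCI runtime-spec module, and the choice to replace an existing variable of the same name is this sketch's own policy.

```go
package main

import (
	"fmt"
	"strings"
)

// ociSpec is a hypothetical, heavily trimmed stand-in for the OCI runtime
// spec; only the fields touched by CDI edits are modelled.
type ociSpec struct {
	Env     []string
	Devices []string // container device node paths
	Mounts  []mount
}

type mount struct {
	HostPath      string
	ContainerPath string
}

// containerEdits is a simplified version of the "containerEdits" section
// of a CDI device entry.
type containerEdits struct {
	Env         []string
	DeviceNodes []string
	Mounts      []mount
}

// applyEdits merges the edits into the spec. An env entry whose name already
// exists replaces the old value; otherwise it is appended. Device nodes and
// mounts are simply appended.
func applyEdits(spec *ociSpec, edits containerEdits) {
	for _, kv := range edits.Env {
		name := strings.SplitN(kv, "=", 2)[0]
		replaced := false
		for i, existing := range spec.Env {
			if strings.SplitN(existing, "=", 2)[0] == name {
				spec.Env[i] = kv
				replaced = true
				break
			}
		}
		if !replaced {
			spec.Env = append(spec.Env, kv)
		}
	}
	spec.Devices = append(spec.Devices, edits.DeviceNodes...)
	spec.Mounts = append(spec.Mounts, edits.Mounts...)
}

func main() {
	spec := &ociSpec{Env: []string{"PATH=/usr/bin"}}
	applyEdits(spec, containerEdits{
		Env:         []string{"VENDOR_VISIBLE_DEVICES=0"},
		DeviceNodes: []string{"/dev/vendor0"},
		Mounts: []mount{{
			HostPath:      "/usr/lib/libvendor.so",
			ContainerPath: "/usr/lib/libvendor.so",
		}},
	})
	fmt.Println(spec.Env)
	fmt.Println(spec.Devices, len(spec.Mounts))
}
```

In the real integration this merging is delegated to the CDI package rather than hand-written, which is what keeps the Singularity side small.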

dtrudg added enhancement New feature or request roadmap Features / changes that are scheduled to be implemented labels Mar 1, 2023
dtrudg added this to the SingularityCE 4.0 milestone Mar 1, 2023
@elezar
Contributor

elezar commented Mar 1, 2023

Thanks for creating this @dtrudg. This is similar to the CDI support in Podman, which I recently updated.

The injection there happens here and is triggered if a device path is a fully-qualified CDI device name (e.g. nvidia.com/gpu=0). It is handled by the cdi package, specifically by the

		_, err := registry.InjectDevices(g.Config, devicePath)

call. The registry scans configured directories for CDI specs and matches the requested devices to these.

The processing of the device flag happens here.

Note that this does not cover CDI spec generation, which is vendor specific and is kept out-of-band from spec consumption (i.e. using CDI specs to modify the OCI runtime spec or other configuration).

@ArangoGutierrez
Contributor

++

@dtrudg
Member Author

dtrudg commented Mar 14, 2023

@preminger - the steps for implementing this are roughly as follows...

At some point in the code there will need to be some parsing / checks of the values provided to the --device flag to ensure that they are all CDI device names, as we don't support non-CDI --device values in Singularity (at this time). There's useful context in the podman code linked above.

Initially, you can test this without an NVIDIA GPU by writing a test CDI specification file that binds some arbitrary device and libraries into the container.

Later we'll have to test it on a system with an NVIDIA GPU, and write an e2e GPU test for it.

Before starting, it'd be useful to read about, and experiment with, the functional options pattern - https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis

We are trying to use this pattern for any area of the code that might be exposed as a public API, as it allows the API to be extended easily without breaking existing callers.
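For reference, a minimal sketch of the functional options pattern applied to this feature. `Launcher`, `NewLauncher`, `OptDevices` and `OptCDIDirs` are hypothetical names invented for illustration, not SingularityCE's actual launcher API; the default directories are the `/etc/cdi` and `/var/run/cdi` paths mentioned later in this thread.

```go
package main

import "fmt"

// Launcher is a hypothetical stand-in for an --oci mode launcher.
type Launcher struct {
	devices  []string
	specDirs []string
}

// Option configures a Launcher and may reject invalid input.
type Option func(*Launcher) error

// OptDevices requests CDI devices by fully-qualified name.
func OptDevices(devs ...string) Option {
	return func(l *Launcher) error {
		l.devices = append(l.devices, devs...)
		return nil
	}
}

// OptCDIDirs overrides the directories searched for CDI specs.
func OptCDIDirs(dirs ...string) Option {
	return func(l *Launcher) error {
		if len(dirs) == 0 {
			return fmt.Errorf("at least one CDI spec directory is required")
		}
		l.specDirs = dirs
		return nil
	}
}

// NewLauncher sets defaults first, then applies each option in order, so
// callers only mention the settings they want to change.
func NewLauncher(opts ...Option) (*Launcher, error) {
	l := &Launcher{specDirs: []string{"/etc/cdi", "/var/run/cdi"}}
	for _, opt := range opts {
		if err := opt(l); err != nil {
			return nil, err
		}
	}
	return l, nil
}

func main() {
	l, err := NewLauncher(OptDevices("nvidia.com/gpu=0"))
	if err != nil {
		panic(err)
	}
	fmt.Println(l.devices, l.specDirs)
}
```

Adding a new option later is just another `Opt…` function, so existing calls to `NewLauncher` keep compiling unchanged.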

@elezar
Contributor

elezar commented Mar 14, 2023

With regard to:

At some point in the code there will need to be some parsing / checks of the values provided to the --device flag to ensure that they are all CDI device names, as we don't support non-CDI --device values in Singularity (at this time). There's useful context in the podman code linked above.

This is done in podman by processing the list of devices here. Note that the IsQualifiedName() function from the cdi package is used to check whether the specified device is a fully-qualified device name. This should be a sufficient check for parsing / validating the --device arguments.
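In a real implementation the check should simply call `IsQualifiedName()` from the cdi package. To show what that check means without pulling in the module, the sketch below approximates the `vendor/class=name` syntax with a regular expression. The character rules are a simplification and may not match the cdi package exactly; `qualifiedName` and `validateDevices` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// qualifiedName approximates the CDI "vendor/class=name" syntax:
//   - vendor: letters, digits, '.', '_', '-'; starts with a letter
//   - class:  letters, digits, '_', '-'; starts with a letter
//   - name:   letters, digits, '.', '_', ':', '-'; starts with an alphanumeric
// Each part must end with a letter or digit. Treat the cdi package's
// IsQualifiedName() as the authority; this is only an illustration.
var qualifiedName = regexp.MustCompile(
	`^[A-Za-z]([A-Za-z0-9._-]*[A-Za-z0-9])?` +
		`/[A-Za-z]([A-Za-z0-9_-]*[A-Za-z0-9])?` +
		`=[A-Za-z0-9]([A-Za-z0-9._:-]*[A-Za-z0-9])?$`)

// validateDevices rejects any --device value that is not a fully-qualified
// CDI device name, since non-CDI devices are not supported.
func validateDevices(devices []string) error {
	for _, d := range devices {
		if !qualifiedName.MatchString(d) {
			return fmt.Errorf("%q is not a fully-qualified CDI device name (vendor/class=name)", d)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateDevices([]string{"nvidia.com/gpu=0", "nvidia.com/gpu=all"}))
	fmt.Println(validateDevices([]string{"/dev/nvidia0"}))
}
```

The first call prints `<nil>`; the second reports that `/dev/nvidia0` is a plain device path rather than a CDI name.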

When performing injection, as is done here, a CDI registry is created and the fully-qualified device names are passed to it for injection. If these devices are not known (i.e. not present in any CDI specification loaded by the registry from the configured directories), then the InjectDevices() call will fail.
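The failure behaviour described above can be sketched with a toy in-memory registry: every requested name must resolve before anything is applied. The `registry` type and `injectDevices` method are hypothetical, only environment edits are modelled, and the error text is illustrative rather than the cdi package's actual message.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a hypothetical in-memory index from fully-qualified CDI
// device names to the environment variables their specs would inject.
type registry map[string][]string

// injectDevices resolves every requested device first; if any is unknown
// it returns an error and leaves env untouched, otherwise it returns a new
// slice with each device's variables appended.
func (r registry) injectDevices(env []string, devices ...string) ([]string, error) {
	var unresolved []string
	for _, d := range devices {
		if _, ok := r[d]; !ok {
			unresolved = append(unresolved, d)
		}
	}
	if len(unresolved) > 0 {
		return env, fmt.Errorf("unresolvable CDI devices %s", strings.Join(unresolved, ", "))
	}
	out := append([]string(nil), env...)
	for _, d := range devices {
		out = append(out, r[d]...)
	}
	return out, nil
}

func main() {
	reg := registry{"example.com/gpu=0": {"EXAMPLE_GPU=0"}}

	env, err := reg.injectDevices([]string{"PATH=/usr/bin"}, "example.com/gpu=0")
	fmt.Println(env, err)

	_, err = reg.injectDevices(nil, "example.com/gpu=1")
	fmt.Println(err)
}
```

Resolving all names before applying any edit avoids leaving the spec half-modified when one of several requested devices is missing.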

In terms of configuration, podman requires no additional configuration: the default paths (/etc/cdi and /var/run/cdi) are used when constructing the registry.

For cri-o, the default spec directories are used, but these can be overridden if required, and the configured values are then used to instantiate the registry.

For containerd, CDI must be explicitly enabled and the spec directories can be specified in the config file, so the registry instantiation looks a little different. Note that the registry is handled as a singleton, so it should be configured with the same properties everywhere it is used for injection.
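Why consistent configuration matters with a singleton can be seen in a small sketch built on `sync.Once`. `getRegistry` and `specRegistry` are hypothetical; this sketch adopts the simplest policy of keeping whatever the first caller passed, and the real cdi package may handle later options differently (for example by reconfiguring), so check its documentation rather than relying on this behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// specRegistry is a hypothetical stand-in for a CDI registry; only the
// configured spec directories are tracked.
type specRegistry struct {
	specDirs []string
}

var (
	regOnce sync.Once
	regInst *specRegistry
)

// getRegistry returns the process-wide registry. In this sketch the
// directories given on the first call win (falling back to /etc/cdi and
// /var/run/cdi); later calls get the same instance and their arguments are
// ignored, so callers that disagree silently share one configuration.
func getRegistry(specDirs ...string) *specRegistry {
	regOnce.Do(func() {
		if len(specDirs) == 0 {
			specDirs = []string{"/etc/cdi", "/var/run/cdi"}
		}
		regInst = &specRegistry{specDirs: specDirs}
	})
	return regInst
}

func main() {
	a := getRegistry("/opt/cdi")
	b := getRegistry("/somewhere/else")
	fmt.Println(a == b, b.specDirs)
}
```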

For test devices, cri-o uses /dev/loop* devices in its tests. What I often do instead is create a CDI device that only sets an environment variable, so the test does not depend on any device nodes being present on the system. See, for example, a PR proposing adding CDI support to moby.

Feel free to reach out if you have any questions.
