Extend nvidia runtime options
This introduces three additional configuration keys to control the
libnvidia-container integration:

 - nvidia.driver.capabilities (maps to NVIDIA_DRIVER_CAPABILITIES)
 - nvidia.require.cuda (maps to NVIDIA_REQUIRE_CUDA)
 - nvidia.require.driver (maps to NVIDIA_REQUIRE_DRIVER)

Details on the valid values for those options can be found in the NVIDIA
documentation here:

  https://github.com/NVIDIA/nvidia-container-runtime

Signed-off-by: Stéphane Graber <stgraber@ubuntu.com>
stgraber committed Sep 12, 2018
1 parent a732506 commit 2325ba2
Showing 6 changed files with 44 additions and 4 deletions.
8 changes: 8 additions & 0 deletions doc/api-extensions.md
@@ -585,3 +585,11 @@ This introduces the config keys `candid.domains` and `candid.expiry`. The
former allows specifying allowed/valid Candid domains, the latter makes the
macaroon's expiry configurable. The `lxc remote add` command now has a
`--domain` flag which allows specifying a Candid domain.

## nvidia\_runtime\_config
This introduces three extra config keys for use with nvidia.runtime and the libnvidia-container library.
Each key maps directly to the matching nvidia-container environment variable:

- nvidia.driver.capabilities => NVIDIA\_DRIVER\_CAPABILITIES
- nvidia.require.cuda => NVIDIA\_REQUIRE\_CUDA
- nvidia.require.driver => NVIDIA\_REQUIRE\_DRIVER
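
The mapping above can be sketched as a small standalone Go function. This is a minimal illustration, not the LXD implementation: `nvidiaEnvironment` is a hypothetical helper, the config is assumed to be a plain `map[string]string`, the driver capabilities are assumed to default to `all` when unset, and the two requirement variables are assumed to be exported only when their keys are set. The requirement value used in `main` is a placeholder; the valid syntax is described in the NVIDIA documentation.

```go
package main

import "fmt"

// nvidiaEnvironment is a hypothetical helper (not part of LXD) that turns
// the three nvidia.* config keys into "NAME=value" environment entries.
func nvidiaEnvironment(config map[string]string) []string {
	env := []string{}

	// Assumption: driver capabilities fall back to "all" when the key is unset.
	capabilities := config["nvidia.driver.capabilities"]
	if capabilities == "" {
		capabilities = "all"
	}
	env = append(env, fmt.Sprintf("NVIDIA_DRIVER_CAPABILITIES=%s", capabilities))

	// Assumption: the requirement variables are only exported when set.
	if v := config["nvidia.require.cuda"]; v != "" {
		env = append(env, fmt.Sprintf("NVIDIA_REQUIRE_CUDA=%s", v))
	}
	if v := config["nvidia.require.driver"]; v != "" {
		env = append(env, fmt.Sprintf("NVIDIA_REQUIRE_DRIVER=%s", v))
	}

	return env
}

func main() {
	config := map[string]string{
		"nvidia.driver.capabilities": "compute,utility",
		// Placeholder value; see the NVIDIA documentation for real expressions.
		"nvidia.require.cuda": "example-expression",
	}

	for _, entry := range nvidiaEnvironment(config) {
		fmt.Println(entry)
	}
}
```

Keeping the mapping in one function makes the default and the "skip when empty" rule easy to test in isolation.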
3 changes: 3 additions & 0 deletions doc/containers.md
@@ -57,7 +57,10 @@ linux.kernel\_modules | string | - | yes
migration.incremental.memory | boolean | false | yes | migration\_pre\_copy | Incremental memory transfer of the container's memory to reduce downtime.
migration.incremental.memory.goal | integer | 70 | yes | migration\_pre\_copy | Percentage of memory to have in sync before stopping the container.
migration.incremental.memory.iterations | integer | 10 | yes | migration\_pre\_copy | Maximum number of transfer operations to go through before stopping the container.
nvidia.driver.capabilities | string | all | no | nvidia\_runtime\_config | What driver capabilities the container needs (sets libnvidia-container NVIDIA\_DRIVER\_CAPABILITIES)
nvidia.runtime | boolean | false | no | nvidia\_runtime | Pass the host NVIDIA and CUDA runtime libraries into the container
nvidia.require.cuda | string | - | no | nvidia\_runtime\_config | Version expression for the required CUDA version (sets libnvidia-container NVIDIA\_REQUIRE\_CUDA)
nvidia.require.driver | string | - | no | nvidia\_runtime\_config | Version expression for the required driver version (sets libnvidia-container NVIDIA\_REQUIRE\_DRIVER)
raw.apparmor | blob | - | yes | - | Apparmor profile entries to be appended to the generated profile
raw.idmap | blob | - | no | id\_map | Raw idmap configuration (e.g. "both 1000 1000")
raw.lxc | blob | - | no | - | Raw LXC configuration to be appended to the generated one
30 changes: 27 additions & 3 deletions lxd/container_lxc.go
@@ -1229,9 +1229,33 @@ func (c *containerLXC) initLXC(config bool) error {
return err
}

err = lxcSetConfigItem(cc, "lxc.environment", "NVIDIA_DRIVER_CAPABILITIES=compute,utility")
if err != nil {
return err
nvidiaDriver := c.expandedConfig["nvidia.driver.capabilities"]
if nvidiaDriver == "" {
err = lxcSetConfigItem(cc, "lxc.environment", "NVIDIA_DRIVER_CAPABILITIES=all")
if err != nil {
return err
}
} else {
err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_DRIVER_CAPABILITIES=%s", nvidiaDriver))
if err != nil {
return err
}
}

nvidiaRequireCuda := c.expandedConfig["nvidia.require.cuda"]
if nvidiaRequireCuda != "" {
err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_REQUIRE_CUDA=%s", nvidiaRequireCuda))
if err != nil {
return err
}
}

nvidiaRequireDriver := c.expandedConfig["nvidia.require.driver"]
if nvidiaRequireDriver != "" {
err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_REQUIRE_DRIVER=%s", nvidiaRequireDriver))
if err != nil {
return err
}
}

err = lxcSetConfigItem(cc, "lxc.hook.mount", hookPath)
1 change: 1 addition & 0 deletions scripts/bash/lxd-client
@@ -82,6 +82,7 @@ _have lxc && {
limits.memory.swap limits.memory.swap.priority limits.network.priority \
limits.processes linux.kernel_modules migration.incremental.memory \
migration.incremental.memory.goal nvidia.runtime \
nvidia.driver.capabilities nvidia.require.cuda nvidia.require.driver \
migration.incremental.memory.iterations raw.apparmor raw.idmap raw.lxc \
raw.seccomp security.idmap.base security.idmap.isolated \
security.idmap.size security.devlxd security.devlxd.images \
5 changes: 4 additions & 1 deletion shared/container.go
@@ -206,7 +206,10 @@ var KnownContainerConfigKeys = map[string]func(value string) error{
"migration.incremental.memory.iterations": IsUint32,
"migration.incremental.memory.goal": IsUint32,

"nvidia.runtime": IsBool,
"nvidia.runtime": IsBool,
"nvidia.driver.capabilities": IsAny,
"nvidia.require.cuda": IsAny,
"nvidia.require.driver": IsAny,

"security.nesting": IsBool,
"security.privileged": IsBool,
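
The hunk above registers the three new keys with the `IsAny` validator, so any string is accepted at validation time. The following sketch shows how a key-to-validator map of this shape can be consulted. It is an illustration under assumptions: `isBool`, `isAny`, `knownKeys` and `validateKey` are local stand-ins written for this example, and their bodies are not taken from the LXD source.

```go
package main

import (
	"fmt"
	"strconv"
)

// isBool is a stand-in for a boolean validator; its body is an assumption.
func isBool(value string) error {
	if _, err := strconv.ParseBool(value); err != nil {
		return fmt.Errorf("invalid boolean value %q", value)
	}
	return nil
}

// isAny is a stand-in for a validator that accepts every value.
func isAny(value string) error {
	return nil
}

// knownKeys mirrors the shape of the map in the diff: key -> validator.
var knownKeys = map[string]func(value string) error{
	"nvidia.runtime":             isBool,
	"nvidia.driver.capabilities": isAny,
	"nvidia.require.cuda":        isAny,
	"nvidia.require.driver":      isAny,
}

// validateKey is a hypothetical helper that rejects unknown keys and
// otherwise delegates to the registered validator.
func validateKey(key, value string) error {
	validator, ok := knownKeys[key]
	if !ok {
		return fmt.Errorf("unknown config key %q", key)
	}
	return validator(value)
}

func main() {
	fmt.Println(validateKey("nvidia.driver.capabilities", "compute,utility"))
	fmt.Println(validateKey("nvidia.runtime", "maybe"))
}
```

Because the new keys accept any string, a malformed value would only be caught later, when libnvidia-container interprets the environment variable.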
1 change: 1 addition & 0 deletions shared/version/api.go
@@ -123,6 +123,7 @@ var APIExtensions = []string{
"candid_authentication",
"backup_compression",
"candid_config",
"nvidia_runtime_config",
}

// APIExtensionsCount returns the number of available API extensions.
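
Since the commit adds `nvidia_runtime_config` to the list of API extensions, code that wants to use the new keys could first check that the extension is advertised. The sketch below assumes the list of extension names is already available as a `[]string`; `hasExtension` is a hypothetical helper written for this example, and how the list is obtained is out of scope here.

```go
package main

import "fmt"

// hasExtension is a hypothetical helper that reports whether name appears
// in the given list of API extension names.
func hasExtension(extensions []string, name string) bool {
	for _, extension := range extensions {
		if extension == name {
			return true
		}
	}
	return false
}

func main() {
	// Tail of the extension list as shown in the diff above.
	extensions := []string{
		"candid_authentication",
		"backup_compression",
		"candid_config",
		"nvidia_runtime_config",
	}

	if hasExtension(extensions, "nvidia_runtime_config") {
		fmt.Println("nvidia.driver.capabilities and nvidia.require.* can be used")
	}
}
```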