NVIDIA Driver Integration #249

@runleveldev

Description

Containers running on nodes that support NVIDIA drivers should use the nvidia-container-toolkit to provide proper driver integration. This feature should include:

  1. Admin documentation for
    1. Installing nvidia-driver from official Debian repos
    2. Installing nvidia-container-toolkit from NVIDIA sources (link to NVIDIA docs)
    3. Configuring the /usr/share/lxc/hooks/nvidia hook script for API use by symlinking it into /var/lib/vz/snippets (is this the best way?)
  2. Container creator updates to
    1. Identify NVIDIA nodes (boolean in Nodes model? autodetected based on hook script presence?)
    2. Add the NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=utility compute environment variables + the hook script to containers created on NVIDIA nodes.
    3. Boolean "GPU Required" in the Containers model to enforce creation on a Node with a GPU? (Unnecessary if GPU Nodes are in their own sites, but necessary if we have mixed sites, which would require the boolean in the Nodes model rather than autodetection)
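
To make the container creator changes concrete, here is a rough Python sketch of how they could fit together. It is only an illustration under assumptions, not the project's actual code: `Node`, `ContainerRequest`, `has_nvidia`, `gpu_required`, `detect_nvidia`, `pick_node`, and `extra_settings` are all hypothetical names, and the comma-separated capabilities value follows NVIDIA's documented list format rather than the space-separated spelling used above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hook script path named in this issue.
NVIDIA_HOOK = Path("/usr/share/lxc/hooks/nvidia")

# Environment variables from this issue. The comma-separated form of
# NVIDIA_DRIVER_CAPABILITIES is an assumption based on NVIDIA's docs.
NVIDIA_ENV = {
    "NVIDIA_VISIBLE_DEVICES": "all",
    "NVIDIA_DRIVER_CAPABILITIES": "utility,compute",
}


@dataclass
class Node:  # hypothetical stand-in for the Nodes model
    name: str
    has_nvidia: bool = False  # the proposed boolean


@dataclass
class ContainerRequest:  # hypothetical stand-in for the Containers model
    name: str
    gpu_required: bool = False  # the proposed "GPU Required" boolean


def detect_nvidia(hook: Path = NVIDIA_HOOK) -> bool:
    """Autodetection option: treat hook presence as NVIDIA support.

    Only meaningful when run on the node itself.
    """
    return hook.is_file()


def pick_node(request: ContainerRequest, nodes: list[Node]) -> Optional[Node]:
    """Return the first node that satisfies the request, or None."""
    for node in nodes:
        if node.has_nvidia or not request.gpu_required:
            return node
    return None


def extra_settings(node: Node, hook: Path = NVIDIA_HOOK) -> dict:
    """Settings the creator would add for containers on NVIDIA nodes."""
    if not node.has_nvidia:
        return {"environment": {}, "hook": None}
    return {"environment": dict(NVIDIA_ENV), "hook": str(hook)}
```

With mixed sites, `pick_node` needs `has_nvidia` to be known before a node is chosen, which is why a stored boolean fits that case better than running `detect_nvidia` on the node afterwards.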
