hashicorp/nomad-device-nvidia

Nomad Nvidia Device Plugin

This repository provides an implementation of a Nomad device plugin for Nvidia GPUs.

Behavior

The Nvidia device plugin uses NVML bindings to collect data about the available Nvidia devices and exposes them via the Fingerprint RPC. GPUs can be excluded from fingerprinting by setting the ignored_gpu_ids field (see below). The plugin periodically sends statistics for the fingerprinted devices.

The plugin detects whether a GPU has Multi-Instance GPU (MIG) enabled. When MIG is enabled, all instances are fingerprinted as individual GPUs that can be addressed accordingly.
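
As an illustration of how a fingerprinted GPU can be requested, the following is a minimal job sketch. It assumes the standard Nomad device block in a task's resources and the nvidia/gpu device name used in the Nomad documentation; the job, group, and task names and the container image are hypothetical placeholders, not something this repository prescribes.

```hcl
# Sketch only: names and image below are illustrative assumptions.
job "gpu-example" {
  type = "batch"

  group "gpu" {
    task "smi" {
      driver = "docker"

      config {
        # Hypothetical image; any image that ships nvidia-smi would do.
        image   = "nvidia/cuda:12.2.0-base-ubuntu22.04"
        command = "nvidia-smi"
      }

      resources {
        # Ask the scheduler for one device fingerprinted by the plugin.
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```

A job like this can only be placed on a client where the plugin has fingerprinted at least one GPU that is not listed in ignored_gpu_ids.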

Config

The plugin is configured in the Nomad client's plugin block:

plugin "nvidia" {
  config {
    ignored_gpu_ids    = ["uuid1", "uuid2"]
    fingerprint_period = "5s"
  }
}

The valid configuration options are:

  • ignored_gpu_ids (list(string): []): list of GPU UUID strings that should not be exposed to Nomad.
  • fingerprint_period (string: "1m"): interval at which the fingerprint process is repeated to detect possible changes.
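
Both options are optional. As a sketch, a configuration that hides a single GPU and leaves fingerprint_period at its "1m" default might look like the following. The UUID is a made-up placeholder; on a host with the Nvidia driver installed, real UUIDs can be listed with nvidia-smi -L.

```hcl
plugin "nvidia" {
  config {
    # Placeholder UUID, replace with a value reported by `nvidia-smi -L`.
    ignored_gpu_ids = ["GPU-00000000-0000-0000-0000-000000000000"]

    # fingerprint_period is omitted, so the default of "1m" applies.
  }
}
```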