Skip to content
collectd exec plugin for collecting nVidia GPU metrics.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
images Added new image. Jul 13, 2017
LICENSE Initial commit Jul 12, 2017 Improved translation. Nov 4, 2017 Fix config file sourcing bug and add some minor improvements May 2, 2019


The project contains collectd CUDA exec plugin for collecting nVidia GPU metrics. The plugin works well with single and multi-GPU machines.


First, make sure that the collectd exec plugin is loaded. Uncomment or add the following line to your collectd.conf:

LoadPlugin exec

Then, add the path to in exec configuration. The configuration file,, is in a separate file and is required by the main script.

<Plugin exec>                                                                   
    Exec some_user "/path/to/"                       

Sample output

Depending on the metrics selection, the plugin will return PUTVAL Plain Text Protocol messages. You can find below the sample output from the server with four TitanX cards.

PUTVAL server.fqdn/cuda-0000:02:00.0/percent-fan_speed interval=10 N:23
PUTVAL server.fqdn/cuda-0000:02:00.0/memory-memory_free interval=10 N:11172
PUTVAL server.fqdn/cuda-0000:02:00.0/temperature-temperature_gpu interval=10 N:32
PUTVAL server.fqdn/cuda-0000:02:00.0/power-power_draw interval=10 N:16.87
PUTVAL server.fqdn/cuda-0000:02:00.0/memory-memory_used interval=10 N:0
PUTVAL server.fqdn/cuda-0000:03:00.0/percent-fan_speed interval=10 N:23
PUTVAL server.fqdn/cuda-0000:03:00.0/memory-memory_free interval=10 N:11172
PUTVAL server.fqdn/cuda-0000:03:00.0/temperature-temperature_gpu interval=10 N:36
PUTVAL server.fqdn/cuda-0000:03:00.0/power-power_draw interval=10 N:17.08
PUTVAL server.fqdn/cuda-0000:03:00.0/memory-memory_used interval=10 N:0
PUTVAL server.fqdn/cuda-0000:83:00.0/percent-fan_speed interval=10 N:23
PUTVAL server.fqdn/cuda-0000:83:00.0/memory-memory_free interval=10 N:11172
PUTVAL server.fqdn/cuda-0000:83:00.0/temperature-temperature_gpu interval=10 N:35
PUTVAL server.fqdn/cuda-0000:83:00.0/power-power_draw interval=10 N:16.88
PUTVAL server.fqdn/cuda-0000:83:00.0/memory-memory_used interval=10 N:0
PUTVAL server.fqdn/cuda-0000:84:00.0/percent-fan_speed interval=10 N:23
PUTVAL server.fqdn/cuda-0000:84:00.0/memory-memory_free interval=10 N:11172
PUTVAL server.fqdn/cuda-0000:84:00.0/temperature-temperature_gpu interval=10 N:42
PUTVAL server.fqdn/cuda-0000:84:00.0/power-power_draw interval=10 N:17.37
PUTVAL server.fqdn/cuda-0000:84:00.0/memory-memory_used interval=10 N:0


Metrics can be added or removed from the config array.

declare -A config=(                                                             

Each entry should be in the following format:


Any query string from nvidia-smi can be a metric_name, but each dot . has to be replaced by an underscore _. For example, temperature.gpu becomes temperature_gpu.

The full list of query options can be obtained with the following command:

nvidia-smi --help-query-gpu

Graphs with Grafana

I store my metrics in the InfluxDB and visualize them with Grafana. You can find below a sample dashboard from one of the servers I administer.


You can’t perform that action at this time.