# Visualizing GPU Resource Utilization with PyNVML and Bokeh

- **Author:** Rick Zamora
- **Date:** 5/15/2019

### Introduction



### Base Environment Setup

To visualize GPU utilization for this demo, we start by creating a base conda environment with [RAPIDS](https://rapids.ai/) and [Jupyter](https://jupyter.org/) packages:
```
conda create --name bokeh-pynvml \
    -c defaults -c nvidia -c rapidsai \
    -c pytorch -c numba -c conda-forge \
    cudf=0.7 cuml=0.7 python=3.7 cudatoolkit=9.2 \
    nodejs jupyterlab dask dask-cudf dask-cuda bokeh -y
conda activate bokeh-pynvml
```

Note that I am developing this demo on a DGX machine with eight NVIDIA V100 GPUs (`Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-135-generic x86_64)`).

Before or after activating our base conda environment, we should also choose a specific root-directory location for this demo:
```
export demo_home='/home/nfs/rzamora/workspace/pynvml-bokeh-demo'
mkdir -p "$demo_home" && cd "$demo_home"
```

### Python Bindings for the NVIDIA Management Library (PyNVML)

PyNVML is a Python wrapper for the [NVIDIA Management Library (NVML)](https://developer.nvidia.com/nvidia-management-library-nvml), which is a C-based API for monitoring and managing various states of NVIDIA GPU devices. NVML is directly used by the better-known [NVIDIA System Management Interface](https://developer.nvidia.com/nvidia-system-management-interface) (`nvidia-smi`). According to the NVIDIA developer site, NVML provides access to the following query-able states (in addition to modifiable states not discussed here):

- **ECC error counts**: Both correctable single bit and detectable double bit errors are reported. Error counts are provided for both the current boot cycle and for the lifetime of the GPU.
- **GPU utilization**: Current utilization rates are reported for both the compute resources of the GPU and the memory interface.
- **Active compute process**: The list of active processes running on the GPU is reported, along with the corresponding process name/id and allocated GPU memory.
- **Clocks and PState**: Max and current clock rates are reported for several important clock domains, as well as the current GPU performance state.
- **Temperature and fan speed**: The current core GPU temperature is reported, along with fan speeds for non-passive products.
- **Power management**: For supported products, the current board power draw and power limits are reported.
- **Identification**: Various dynamic and static information is reported, including board serial numbers, PCI device ids, VBIOS/Inforom version numbers and product names.

Although several different Python wrappers for NVML currently exist, I will be using the [PyNVML](https://github.com/gpuopenanalytics/pynvml) package hosted by GoAi on GitHub. This version of PyNVML uses `ctypes` to wrap most of the NVML C API.  For this demo, we will focus on the small subset of the API needed to query real-time GPU-resource utilization:

- `nvmlInit()`: Initialize an NVML profiling session
- `nvmlShutdown()`: Finalize an NVML profiling session
- `nvmlDeviceGetCount()`: Get the number of available NVIDIA GPU devices
- `nvmlDeviceGetHandleByIndex()`: Get a handle for a device (given an integer index)
- `nvmlDeviceGetMemoryInfo()`: Get a memory-info object (given a device handle)
- `nvmlDeviceGetUtilizationRates()`: Get a utilization-rate object (given a device handle)
- `nvmlDeviceGetPcieThroughput()`: Get the PCIe throughput in KB/s (given a device handle and a counter, such as `NVML_PCIE_UTIL_TX_BYTES` or `NVML_PCIE_UTIL_RX_BYTES`)

For example, to query the current GPU-utilization rate on every available device, the code would look something like this:

```
In [1]: from pynvml import *
In [2]: nvmlInit()
In [3]: ngpus = nvmlDeviceGetCount()
In [4]: for i in range(ngpus):
   ...:     handle = nvmlDeviceGetHandleByIndex(i)
   ...:     gpu_util = nvmlDeviceGetUtilizationRates(handle).gpu
   ...:     print('GPU %d Utilization = %d%%' % (i, gpu_util))
   ...:
GPU 0 Utilization = 43%
GPU 1 Utilization = 0%
GPU 2 Utilization = 15%
GPU 3 Utilization = 0%
GPU 4 Utilization = 36%
GPU 5 Utilization = 0%
GPU 6 Utilization = 0%
GPU 7 Utilization = 11%
```

Of course, if there is nothing currently running on any of the GPUs, all devices will show 0% utilization. In this demo, we will use simple Python code (like the above example) to query GPU metrics in real time.  To install [PyNVML](https://github.com/gpuopenanalytics/pynvml) from source:
```
git clone https://github.com/gpuopenanalytics/pynvml.git
cd pynvml
pip install -e .
```

Note that this version of PyNVML is also hosted on [PyPI](https://pypi.org/project/pynvml/) and [Conda Forge](https://anaconda.org/conda-forge/pynvml), so you can alternatively use `pip install pynvml` or `conda install -c conda-forge pynvml` without cloning the repository.


![alt text](pynvml-bokeh-files/pypi-ss.png)
**PyPI page for PyNVML package**
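
With the package installed, the other calls listed earlier follow the same pattern as the utilization loop. The helper below is a minimal sketch, not code from the demo repository: the function name `gpu_snapshot` and the dictionary keys are my own choices for illustration. It takes the imported `pynvml` module as an argument, which also makes it possible to try the logic with a stand-in object on a machine without a GPU.

```python
def gpu_snapshot(nvml):
    """Query memory, utilization, and PCIe throughput for every visible GPU.

    `nvml` is the imported pynvml module (or any object exposing the same
    functions and constants).
    """
    nvml.nvmlInit()
    try:
        rows = []
        for i in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(i)
            mem = nvml.nvmlDeviceGetMemoryInfo(handle)         # bytes: total/used/free
            util = nvml.nvmlDeviceGetUtilizationRates(handle)  # percent: gpu/memory
            # PCIe throughput needs a counter argument and is reported in KB/s.
            tx = nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = nvml.nvmlDeviceGetPcieThroughput(handle, nvml.NVML_PCIE_UTIL_RX_BYTES)
            rows.append({
                "index": i,
                "gpu_util_pct": util.gpu,
                "mem_used_gib": mem.used / 2**30,
                "mem_total_gib": mem.total / 2**30,
                "pcie_tx_kb_s": tx,
                "pcie_rx_kb_s": rx,
            })
        return rows
    finally:
        nvml.nvmlShutdown()  # release NVML even if one of the queries fails
```

On a GPU machine, this would be called as `import pynvml; rows = gpu_snapshot(pynvml)`, giving one dictionary per device.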


### A PyNVML Bokeh-Server Example

Although it is pretty cool that we can use Python to query the current state of our NVIDIA GPUs, it would be a lot more useful to summarize the most important metrics within a single visualization.  For that visualization to *paint* a complete and useful picture for GPU users, the NVML data needs to update in real time.

The good news is that the `server` module within the [Bokeh](https://bokeh.pydata.org/en/latest/) Python library provides the perfect solution for this task!  In fact, the process of building programmatic Bokeh servers is already nicely outlined in a [great blog post by Matt Rocklin](http://matthewrocklin.com/blog/work/2017/06/28/simple-bokeh-server) (thanks Matt!).
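
To make the idea concrete, here is a minimal sketch of a programmatic Bokeh server that redraws a bar chart on a timer. It is not the code from the repository used in this demo: the names `bar_data`, `make_app`, and `serve` are hypothetical, and `read_utilizations` stands for any zero-argument callable that returns one percentage per GPU (for example, a thin wrapper around a PyNVML query loop).

```python
def bar_data(utilizations):
    """Shape a list of per-GPU percentages into ColumnDataSource columns."""
    return {"gpu": list(range(len(utilizations))), "util": list(utilizations)}


def make_app(read_utilizations, interval_ms=200):
    """Build a Bokeh application function that refreshes a bar chart.

    `read_utilizations` is a hypothetical callable returning one percentage
    per GPU each time it is called.
    """
    # Imported lazily so bar_data() stays usable without Bokeh installed.
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    def app(doc):
        source = ColumnDataSource(bar_data(read_utilizations()))
        fig = figure(title="GPU Utilization [%]", y_range=(0, 100))
        fig.vbar(x="gpu", top="util", width=0.8, source=source)

        def update():
            # Assigning a new dict pushes the fresh values to the browser.
            source.data = bar_data(read_utilizations())

        doc.add_periodic_callback(update, interval_ms)
        doc.add_root(fig)

    return app


def serve(app, port=5000):
    """Serve `app` at http://<IP>:<port>/ until interrupted."""
    from bokeh.server.server import Server

    # Accepting any websocket origin keeps the sketch short for remote
    # access; a real deployment would likely restrict this.
    server = Server({"/": app}, port=port, allow_websocket_origin=["*"])
    server.start()
    server.io_loop.start()
```

Calling `serve(make_app(my_reader))` would block and serve the chart, where `my_reader` is whatever function you use to sample the GPUs. The periodic callback is what gives the real-time behavior: the browser keeps a websocket open and the server pushes each update.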

For this demo, I will be using a fork of [`jupyterlab-bokeh-server`](https://github.com/ian-r-rose/jupyterlab-bokeh-server), developed by [Ian Rose](https://github.com/ian-r-rose) and [Matt Rocklin](https://github.com/mrocklin).  In my personal fork, I started with the `system-resources` branch of the upstream repository.  This branch was a great reference, because it includes the necessary code for visualizing CPU resource utilization (which is pretty similar to the code needed to visualize GPU utilization).

#### Downloading the Bokeh-Server Code

To get the code I added for NVML-metric visualization, clone the `pynvml` branch of [`rjzamora/jupyterlab-bokeh-server`](https://github.com/rjzamora/jupyterlab-bokeh-server):

```
cd $demo_home
git clone https://github.com/rjzamora/jupyterlab-bokeh-server.git -b pynvml
```

#### Running the PyNVML Bokeh Server

Despite the presence of `jupyterlab` in the name of the repository used for this demo, I have yet to integrate the server as a JupyterLab extension.  Instead, we can currently use the code by running the `jupyterlab_bokeh_server/server.py` script directly. For example:

```
python $demo_home/jupyterlab-bokeh-server/jupyterlab_bokeh_server/server.py 5000 > server.out 2>&1 &
```

After the Bokeh server is launched, you can navigate to `http://<IP>:5000` in your web browser. If everything worked correctly, you will see the following menu page:

![alt text](pynvml-bokeh-files/bokeh-app-ss.png)

##### GPU-Utilization Bar Plot

If you click on the **GPU-Utilization** link listed in the main menu, you will see a bar-chart visualization of the current GPU compute utilization (with the y-axis spanning 0-100%).  When running an application on the GPUs, the bar levels tend to jump around a lot.  For the dask benchmark (discussed below), I see the following output for a single snapshot in time (with other snapshots showing more or less utilization):

![alt text](pynvml-bokeh-files/gpu-utilization-ss.png)

##### GPU-Resources Stacked Line Plot

If you click on the **GPU-Resources** link listed in the main menu, you will see a comprehensive visualization with four stacked line plots:

- **GPU Utilization (per Device) [%]**: Plot of the GPU-**compute** utilization for each device. Each GPU is plotted with a different color, and the units are in percent.
- **Memory Utilization (per Device)**: Plot of the GPU-**memory** utilization for each device. Each GPU is plotted with a different color, and the units are in GiB.
- **Total Utilization [%]**: Plot of the **total** GPU **memory** and **compute** utilization. Units are in percent.
- **Total PCI Throughput [MB/s]**: Plot of the **total** PCIe **TX** and **RX** data throughput. Units are in MB/s.
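
The aggregate plots need one row of totals per time step. The sketch below shows one way such a row could be computed; it is my own illustration and rests on several assumptions that the server code may not share: "total" compute utilization is taken as the mean over devices, "total" memory utilization as used memory over combined capacity, and 1 MB as 1024 KB. The input is a hypothetical list of per-GPU dictionaries with the keys `gpu_util_pct`, `mem_used_gib`, `mem_total_gib`, `pcie_tx_kb_s`, and `pcie_rx_kb_s`.

```python
def stream_row(t, rows):
    """Collapse per-GPU metrics into one time step of totals.

    Returns a dict of single-element lists, one entry per plotted series.
    """
    n = len(rows)
    used = sum(r["mem_used_gib"] for r in rows)
    total = sum(r["mem_total_gib"] for r in rows)
    return {
        "time": [t],
        # Assumption: total compute utilization is the mean over devices.
        "gpu_total_pct": [sum(r["gpu_util_pct"] for r in rows) / n if n else 0.0],
        # Assumption: total memory utilization is used / capacity over all devices.
        "mem_total_pct": [100.0 * used / total if total else 0.0],
        # NVML reports PCIe throughput in KB/s; assume 1 MB = 1024 KB here.
        "pcie_tx_mb_s": [sum(r["pcie_tx_kb_s"] for r in rows) / 1024],
        "pcie_rx_mb_s": [sum(r["pcie_rx_kb_s"] for r in rows) / 1024],
    }
```

Each value is a one-element list because that is the shape Bokeh's `ColumnDataSource.stream(new_data, rollover=N)` expects when appending a point; the `rollover` argument caps how many points are kept, which is how a line plot can show only a recent window of time.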

When running the dask benchmark (discussed below), I see the following output for a ~10s snapshot in time:

![alt text](pynvml-bokeh-files/gpu-resources-ss.png)

#### Bokeh-Server Code Details

-

### Sample Dask GPU Benchmark