# Visualizing GPU Resource Utilization with PyNVML and Bokeh

- **Author:** Rick Zamora
- **Date:** 5/15/2019

### Introduction



### Base Environment Setup

To visualize GPU utilization for this demo, we start by creating a base conda environment with the [RAPIDS](https://rapids.ai/) and [Jupyter](https://jupyter.org/) packages:
```
conda create --name bokeh-pynvml \
    -c defaults -c nvidia -c rapidsai \
    -c pytorch -c numba -c conda-forge \
    cudf=0.7 cuml=0.7 python=3.7 cudatoolkit=9.2 \
    nodejs jupyterlab dask dask-cudf dask-cuda bokeh -y
conda activate bokeh-pynvml
```

Note that I am personally using a DGX machine with eight NVIDIA V100 GPUs for the development of this demo (`Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-135-generic x86_64)`).

Before or after activating the base conda environment, we should also choose a root directory for this demo:
```
export demo_home='/home/nfs/rzamora/workspace/pynvml-bokeh-demo'
mkdir -p "$demo_home" && cd "$demo_home"
```

### Python Bindings for the NVIDIA Management Library (PyNVML)

PyNVML is a Python wrapper for the [NVIDIA Management Library (NVML)](https://developer.nvidia.com/nvidia-management-library-nvml), which is a C-based API for monitoring and managing various states of NVIDIA GPU devices. NVML is directly used by the better-known [NVIDIA System Management Interface](https://developer.nvidia.com/nvidia-system-management-interface) (`nvidia-smi`). According to the NVIDIA developer site, NVML provides access to the following query-able states (in addition to modifiable states not discussed here):

- **ECC error counts**: Both correctable single bit and detectable double bit errors are reported. Error counts are provided for both the current boot cycle and for the lifetime of the GPU.
- **GPU utilization**: Current utilization rates are reported for both the compute resources of the GPU and the memory interface.
- **Active compute process**: The list of active processes running on the GPU is reported, along with the corresponding process name/id and allocated GPU memory.
- **Clocks and PState**: Max and current clock rates are reported for several important clock domains, as well as the current GPU performance state.
- **Temperature and fan speed**: The current core GPU temperature is reported, along with fan speeds for non-passive products.
- **Power management**: For supported products, the current board power draw and power limits are reported.
- **Identification**: Various dynamic and static information is reported, including board serial numbers, PCI device ids, VBIOS/Inforom version numbers and product names.

Although several different Python wrappers for NVML currently exist, I will be using the [PyNVML](https://github.com/gpuopenanalytics/pynvml) package hosted by GoAi on GitHub. This version of PyNVML uses `ctypes` to wrap most of the NVML C API. For this demo, we will focus on a small subset of the API needed to query real-time GPU-resource utilization:

- `nvmlInit()`: Initialize an NVML profiling session
- `nvmlShutdown()`: Finalize an NVML profiling session
- `nvmlDeviceGetCount()`: Get the number of available NVIDIA GPU devices
- `nvmlDeviceGetHandleByIndex()`: Get a handle for a device (given an integer index)
- `nvmlDeviceGetMemoryInfo()`: Get a memory-info object (given a device handle)
- `nvmlDeviceGetUtilizationRates()`: Get a utilization-rate object (given a device handle)
- `nvmlDeviceGetPcieThroughput()`: Get the PCIe throughput in KB/s (given a device handle and a counter type)
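
As a rough illustration of how these calls fit together, the sketch below takes a single snapshot of every visible GPU. The helper name `gpu_snapshot` and the dictionary keys are assumptions made for this example, not part of PyNVML. The NVML module is passed in as an argument so that the function can also be exercised on a machine without a GPU.

```python
def gpu_snapshot(nvml):
    """Return one dict of utilization numbers per visible GPU.

    `nvml` is expected to be the imported `pynvml` module. This helper is
    hypothetical and written for illustration only.
    """
    nvml.nvmlInit()  # start the NVML session
    try:
        snapshot = []
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            mem = nvml.nvmlDeviceGetMemoryInfo(handle)         # fields in bytes
            util = nvml.nvmlDeviceGetUtilizationRates(handle)  # fields in percent
            # PCIe throughput is queried per direction and reported in KB/s
            tx = nvml.nvmlDeviceGetPcieThroughput(
                handle, nvml.NVML_PCIE_UTIL_TX_BYTES
            )
            rx = nvml.nvmlDeviceGetPcieThroughput(
                handle, nvml.NVML_PCIE_UTIL_RX_BYTES
            )
            snapshot.append(
                {
                    "index": index,
                    "mem_used_bytes": mem.used,
                    "mem_total_bytes": mem.total,
                    "gpu_util_percent": util.gpu,
                    "mem_util_percent": util.memory,
                    "pcie_tx_kb_per_s": tx,
                    "pcie_rx_kb_per_s": rx,
                }
            )
        return snapshot
    finally:
        nvml.nvmlShutdown()  # always end the session, even on errors
```

On a machine with an NVIDIA driver, `import pynvml` followed by `gpu_snapshot(pynvml)` should return one entry per GPU. Calling it repeatedly, for example from a periodic callback, yields the kind of time series a live plot needs.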

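```
import pynvml
import pytest

# Get GPU count (the `nvml` fixture, not shown here, is assumed to call pynvml.nvmlInit())
@pytest.fixture
def ngpus(nvml):
    result = pynvml.nvmlDeviceGetCount()
    return result
```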

To install [PyNVML](https://github.com/gpuopenanalytics/pynvml) from source:
```
git clone https://github.com/gpuopenanalytics/pynvml.git
cd pynvml
pip install -e .
```

Note that this version of PyNVML is also hosted on [PyPI](https://pypi.org/project/pynvml/) and [Conda Forge](https://anaconda.org/conda-forge/pynvml), so you can alternatively use `pip install pynvml` or `conda install -c conda-forge pynvml` without cloning the repository.


![PyPI page for the PyNVML package](pynvml-bokeh-files/pypi-ss.png)
**PyPI page for the PyNVML package**
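
Whichever installation route is used, a quick check along these lines can confirm that the package imports and that NVML can reach the driver. The function names `pynvml_available` and `driver_version` are assumptions for this sketch. `nvmlSystemGetDriverVersion()` is part of the PyNVML API and may return bytes in some releases, so the result is decoded defensively.

```python
import importlib.util


def pynvml_available():
    """Return True if the pynvml package can be imported."""
    return importlib.util.find_spec("pynvml") is not None


def driver_version():
    """Return the NVIDIA driver version as a string.

    Returns None if pynvml is not installed or NVML cannot be initialized
    (for example, on a machine without an NVIDIA driver).
    """
    if not pynvml_available():
        return None

    import pynvml

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None

    try:
        version = pynvml.nvmlSystemGetDriverVersion()
    finally:
        pynvml.nvmlShutdown()

    # Some releases return bytes, others return str
    return version.decode() if isinstance(version, bytes) else version
```

If `driver_version()` returns a version string, the environment is ready for the rest of the demo. A return value of `None` suggests that either the package or the driver is missing.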


### The Bokeh-Server Example Repository

```
cd $demo_home
git clone https://github.com/rjzamora/jupyterlab-bokeh-server.git
```

### 