
# Anomalous Behaviour Profiling using GPU statistics
### Authors
* Gorkem Batmaz (NVIDIA)

## Table of Contents
* Introduction
* Dataset
* Training
* Evaluation
* Conclusion
* References

## Introduction
GPUs run many kinds of workloads, often on shared machines, so it is important to classify those workloads to ensure that only permitted activity takes place. When there is little or no visibility at the application level, one way to do this is to use GPU statistics.

We used `nvidia-smi` outputs to make this classification. `nvidia-smi` (the NVIDIA System Management Interface) is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices. We collected data while machine learning, deep learning, and illegitimate crypto mining workloads were running.
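To make the shape of the data concrete, here is a minimal sketch of how one `nvidia-smi` XML report (as printed by `nvidia-smi -q -x`) could be flattened into dotted keys such as `nvidia_smi_log.gpu.utilization.gpu_util`. The XML fragment is hand-written with made-up values, the helper names `flatten` and `to_number` are hypothetical, and stripping the unit suffix is an assumption; this is not necessarily how the dataset in this notebook was produced. It also assumes one GPU per report, since several `<gpu>` elements would map to the same keys.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of `nvidia-smi -q -x` output (values are made up).
SAMPLE_XML = """
<nvidia_smi_log>
  <timestamp>Mon Jan  1 00:00:00 2021</timestamp>
  <gpu id="00000000:01:00.0">
    <utilization>
      <gpu_util>97 %</gpu_util>
      <memory_util>41 %</memory_util>
    </utilization>
    <power_readings>
      <power_draw>250.50 W</power_draw>
    </power_readings>
  </gpu>
</nvidia_smi_log>
"""


def flatten(element, prefix=""):
    """Map every leaf element to a dotted path built from its ancestors' tags."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return {path: (element.text or "").strip()}
    flat = {}
    for child in children:
        flat.update(flatten(child, path))
    return flat


def to_number(text):
    """Hypothetical helper: keep the leading numeric token, drop the unit."""
    try:
        return float(text.split()[0])
    except (ValueError, IndexError):
        return text  # non-numeric fields such as the timestamp are left as text


record = {
    key: to_number(value)
    for key, value in flatten(ET.fromstring(SAMPLE_XML)).items()
}
```

Collecting one such record per sampling interval, together with a label for the workload that was running, would give a table with one row per timestamp.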

In this notebook, we use data collected during deep learning and crypto mining workloads on a DGX.
We show how to train an XGBoost classification model with RAPIDS that can be saved and then used for FIL inference. FIL (Forest Inference Library) is an open-source library in RAPIDS that lets users accelerate GBDT (Gradient Boosting Decision Tree) and RF (Random Forest) inference on GPUs.
For more information on FIL, please visit https://docs.rapids.ai/api/cuml/stable/

## Imports

In [3]:
import xgboost as xgb
import cudf
from cuml.model_selection import train_test_split
from cuml import ForestInference
import sklearn.datasets
import cupy

## Data Ingest

In [2]:
df = cudf.read_json("./labelled_nv_smi.json")



The features we use are:

In [3]:
list(df)

['nvidia_smi_log.timestamp',
 'nvidia_smi_log.gpu.pci.tx_util',
 'nvidia_smi_log.gpu.pci.rx_util',
 'nvidia_smi_log.gpu.fb_memory_usage.used',
 'nvidia_smi_log.gpu.fb_memory_usage.free',
 'nvidia_smi_log.gpu.bar1_memory_usage.total',
 'nvidia_smi_log.gpu.bar1_memory_usage.used',
 'nvidia_smi_log.gpu.bar1_memory_usage.free',
 'nvidia_smi_log.gpu.utilization.gpu_util',
 'nvidia_smi_log.gpu.utilization.memory_util',
 'nvidia_smi_log.gpu.temperature.gpu_temp',
 'nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold',
 'nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold',
 'nvidia_smi_log.gpu.temperature.gpu_temp_max_gpu_threshold',
 'nvidia_smi_log.gpu.temperature.memory_temp',
 'nvidia_smi_log.gpu.temperature.gpu_temp_max_mem_threshold',
 'nvidia_smi_log.gpu.power_readings.power_draw',
 'nvidia_smi_log.gpu.clocks.graphics_clock',
 'nvidia_smi_log.gpu.clocks.sm_clock',
 'nvidia_smi_log.gpu.clocks.mem_clock',
 'nvidia_smi_log.gpu.clocks.video_clock',
 'nvidia_smi_log.gpu.applications_cl

There are no categorical features in our dataset. `nvidia_smi_log.timestamp` is not a model input (it is dropped before training) but can be used to index the rows.

## Check categories

Rows collected during mining activity are labelled 1; all other rows are labelled 0.

In [4]:
df["label"].unique()

0    0
1    1
Name: label, dtype: int64
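Knowing which labels exist does not say how many rows carry each one, and that matters later because accuracy is only informative when the classes are roughly balanced. The sketch below shows one way to check this in plain Python. The label list is made up for illustration and `class_balance` is a hypothetical helper, not part of cuDF.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of rows that carry each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels: 1 = collected during mining, 0 = everything else.
labels = [0, 1, 1, 0, 1, 0, 0, 1]
shares = class_balance(labels)

# A classifier that always predicts the majority class reaches this accuracy,
# so it is the baseline any trained model has to beat.
majority_share = max(shares.values())
```

With a 50/50 split the baseline is 0.5; with a heavily skewed split the baseline approaches 1.0 and accuracy alone says little.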

## Split training and testing data

In [5]:
# 80/20 dataset split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(["label", "nvidia_smi_log.timestamp"], axis=1),
    df["label"],
    train_size=0.8,
    random_state=1,
)
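For readers unfamiliar with what a seeded 80/20 split does, the idea can be sketched in plain Python: shuffle the row indices with a fixed seed, then cut the shuffled list at 80%. This is an illustration only and not cuML's implementation; `split_rows` and the toy data are hypothetical, and the same seed will not reproduce the rows that cuML selects.

```python
import random


def split_rows(rows, labels, train_size=0.8, seed=1):
    """Shuffle indices with a fixed seed, then cut them into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    cut = int(len(indices) * train_size)
    train_idx, test_idx = indices[:cut], indices[cut:]

    def pick(sequence, chosen):
        return [sequence[i] for i in chosen]

    return (
        pick(rows, train_idx),
        pick(rows, test_idx),
        pick(labels, train_idx),
        pick(labels, test_idx),
    )


# Toy data: 10 rows with two features each and alternating labels.
rows = [[float(i), float(i) * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_tr, X_te, y_tr, y_te = split_rows(rows, labels)
```

Features and labels are selected with the same indices, so each row stays paired with its label, and fixing the seed makes the split repeatable between runs.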

## Move to DMatrix

In [6]:
dmatrix_train = xgb.DMatrix(X_train, label=y_train)
dmatrix_validation = xgb.DMatrix(X_test, label=y_test)

## Set Parameters

In [7]:
params = {
    'tree_method': 'gpu_hist',
    'eval_metric': 'auc',
    'objective': 'binary:logistic',
    'max_depth': 5,
    'learning_rate': 0.1,
}

Information on XGBoost parameters can be found in the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/).
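As a rough illustration of what the `binary:logistic` objective means at prediction time: each tree contributes a leaf value, the contributions are summed into a margin, and a sigmoid turns that margin into a probability for class 1. The sketch below uses made-up leaf values and assumes a base margin of 0 (a prior of 0.5); it describes the general idea rather than XGBoost's internals.

```python
import math


def sigmoid(margin):
    """Squash a real-valued margin into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-margin))


def predict_proba(leaf_values, base_margin=0.0):
    """Probability of class 1: sigmoid of the base margin plus every tree's leaf value."""
    return sigmoid(base_margin + sum(leaf_values))


# Made-up leaf values, one per boosting round, for a single sample.
leaf_values = [0.18, 0.17, 0.15, 0.14, 0.13]
probability = predict_proba(leaf_values)
```

A smaller `learning_rate` shrinks each tree's contribution, so more rounds are needed to move the margin away from zero, while `max_depth` limits how many feature tests a single tree can combine.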

## Train Model

In [8]:
evallist = [(dmatrix_validation, 'validation'), (dmatrix_train, 'train')]
num_round = 5

In [9]:
bst = xgb.train(params, dmatrix_train, num_round, evallist)

[0]	validation-auc:1.00000	train-auc:1.00000
[1]	validation-auc:1.00000	train-auc:1.00000
[2]	validation-auc:1.00000	train-auc:1.00000
[3]	validation-auc:1.00000	train-auc:1.00000
[4]	validation-auc:1.00000	train-auc:1.00000
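The `auc` values in this log can be read as the probability that a randomly chosen mining sample receives a higher score than a randomly chosen non-mining sample. A minimal sketch of that pairwise definition, on made-up scores, is shown below. XGBoost computes the metric more efficiently, and `roc_auc` here is only an illustrative helper.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Every positive outranks every negative: perfect separation.
perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])

# One positive (0.35) scores below one negative (0.4): 3 of 4 pairs are correct.
mixed = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 1.0 therefore means the two classes are perfectly separated by score, regardless of where a decision threshold is placed.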


## Save model

In [10]:
bst.save_model("ABP_xgboost_model.bst")

## Load the model & Run inference with FIL (Forest Inference Library)

In [11]:
# Load the classifier previously saved with XGBoost's save_model()
model_path = "./ABP_xgboost_model.bst"

In [12]:
fm = ForestInference.load(model_path, output_class=True)

In [13]:
fil_preds_gpu = fm.predict(X_test.astype("float32"))
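Because the model was loaded with `output_class=True`, the predictions are class labels rather than probabilities. Conceptually this amounts to comparing each predicted probability with a cut-off. The sketch below assumes a cut-off of 0.5 and a strict comparison; both are assumptions for illustration, so check the cuML documentation for the exact behaviour of your version.

```python
def to_class(probabilities, threshold=0.5):
    """Label a sample 1 (mining) when its probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


# Made-up probabilities for five samples.
probabilities = [0.03, 0.97, 0.51, 0.49, 0.88]
predicted_labels = to_class(probabilities)

# Raising the cut-off trades missed detections for fewer false alarms.
strict_labels = to_class(probabilities, threshold=0.9)
```

Loading the model without `output_class=True` would keep the probabilities, which is useful when the cut-off needs tuning for a particular false-alarm budget.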

In [14]:
from sklearn.metrics import accuracy_score
y_pred = fil_preds_gpu.to_array()
y_true = y_test.to_array()
accuracy_score(y_true, y_pred)

1.0
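Accuracy is the share of predictions that match the true labels. When a model is less than perfect, splitting the count into true and false positives and negatives shows which kind of error it makes. The sketch below does this in plain Python on made-up labels; `confusion_counts` and `accuracy` are hypothetical helpers, not the scikit-learn functions used above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 (mining) as positive."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # legitimate workload flagged as mining
        else:
            counts["fn"] += 1  # mining workload that was missed
    return counts


def accuracy(counts):
    """Correct predictions divided by all predictions."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Made-up labels and predictions for eight samples.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true_demo, y_pred_demo)
demo_accuracy = accuracy(counts)
```

In a monitoring setting the two error types have different costs: a false negative is a missed mining job, while a false positive flags a legitimate workload.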

## Conclusion
The model classified every sample in the test set correctly.
Because the dataset in this experiment is balanced, accuracy is a suitable metric.
We are not able to publish the internal data used in this notebook. However, users can apply the notebook to `nvidia-smi` outputs collected on their own machines under different combinations of workloads. This approach will be included in [Morpheus](https://developer.nvidia.com/morpheus-cybersecurity) with a full pipeline.

## References
* https://docs.rapids.ai/api/cuml/stable/
* https://medium.com/rapids-ai/rapids-forest-inference-library-prediction-at-100-million-rows-per-second-19558890bc35
* https://developer.nvidia.com/nvidia-system-management-interface
* https://developer.nvidia.com/morpheus-cybersecurity
* https://github.com/rapidsai/clx/blob/branch-0.20/examples/forest_inference/xgboost_training.ipynb