# CLX Asset Classification (Supervised)

This is an introduction to CLX Asset Classification.

# Introduction

In this example, we show how to use CLX to perform asset classification on a randomly generated dataset using cuDF and cuML. This work could be extended by using different log types (e.g., Windows Events) or different events from the machines as features to improve accuracy. Labels can be chosen to cover different types of machines or data centres.

## Train Asset Classification model

First, initialize a new model:

```python
from clx.analytics.asset_classification import AssetClassification

ac = AssetClassification()
```

Next, train your asset classification model. The example below uses a small, randomly generated dataset for demonstration only; in practice you will want a much larger training set.

```python
import random
import cudf

# Ten integer feature columns ("1" to "10") and a label column with
# seven classes (0 to 6), all drawn at random for demonstration.
train_gdf = cudf.DataFrame()
train_gdf["1"] = [random.randint(1, 24) for _ in range(9000)]
train_gdf["2"] = [random.randint(1, 4) for _ in range(9000)]
train_gdf["3"] = [random.randint(1, 9) for _ in range(9000)]
train_gdf["4"] = [random.randint(1, 26) for _ in range(9000)]
train_gdf["5"] = [random.randint(1, 3) for _ in range(9000)]
train_gdf["6"] = [random.randint(1, 9) for _ in range(9000)]
train_gdf["7"] = [random.randint(1, 37) for _ in range(9000)]
train_gdf["8"] = [random.randint(1, 8) for _ in range(9000)]
train_gdf["9"] = [random.randint(1, 4) for _ in range(9000)]
train_gdf["10"] = [random.randint(1, 11) for _ in range(9000)]
train_gdf["label"] = [random.randint(0, 6) for _ in range(9000)]
```
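The cell above draws from Python's global, unseeded `random` module, so every run produces a different dataset and slightly different training logs. If you want repeatable runs, one option is to build the columns from a seeded generator first. This is a minimal sketch using only the standard library; the names `FEATURE_MAX` and `make_columns` and the seed value `42` are hypothetical, and the column bounds are copied from the cell above.

```python
import random

# Inclusive upper bounds for each feature column, copied from the cell above.
FEATURE_MAX = {"1": 24, "2": 4, "3": 9, "4": 26, "5": 3,
               "6": 9, "7": 37, "8": 8, "9": 4, "10": 11}
N_ROWS = 9000
N_CLASSES = 7  # labels 0..6


def make_columns(n_rows=N_ROWS, seed=42):
    """Return a dict mapping column name -> list of ints, drawn with a fixed seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    cols = {name: [rng.randint(1, hi) for _ in range(n_rows)]
            for name, hi in FEATURE_MAX.items()}
    cols["label"] = [rng.randint(0, N_CLASSES - 1) for _ in range(n_rows)]
    return cols


columns = make_columns()
# Hypothetical usage: cudf.DataFrame also accepts a dict of equal-length lists.
# train_gdf = cudf.DataFrame(columns)
```

Calling `make_columns` twice with the same seed returns identical data, which makes it easier to compare runs with different hyperparameters.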

Split the dataset into training (80%) and test (20%) sets:

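```python
from cuml import train_test_split

# Passing the column name 'label' as y separates it from the feature columns.
X_train, X_test, Y_train, Y_test = train_test_split(train_gdf, 'label', train_size=0.8)
# train_model expects the label column inside the training frame, so add it back.
X_train["label"] = Y_train
```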
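Conceptually, a `train_size=0.8` split shuffles the rows and keeps 80% of them for training and the rest for testing. The sketch below illustrates that idea on row indices with the standard library only; it is not cuML's implementation, and the helper name `split_indices` and its rounding rule are assumptions.

```python
import random


def split_indices(n_rows, train_size=0.8, seed=None):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_size)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(9000, train_size=0.8, seed=0)
print(len(train_idx), len(test_idx))  # 7200 1800
```

For the 9,000-row dataset above, this gives 7,200 training rows and 1,800 test rows, with no row appearing in both sets.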

## Initialize variables
- Categorical and continuous feature columns
- Batch size
- Number of epochs

```python
# Columns "1" to "8" are treated as categorical; "9" and "10" as continuous.
cat_cols = ["1", "2", "3", "4", "5", "6", "7", "8"]
cont_cols = ["9", "10"]
batch_size = 1000
epochs = 2
```
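Categorical columns are typically fed to a tabular neural network through learned embeddings, while continuous columns are used as numeric inputs. A common heuristic for choosing each embedding's width (popularised by fastai) is `min(50, (n_categories + 1) // 2)`. The sketch below applies that rule to the synthetic columns; we have not confirmed that CLX uses this exact rule, and `CARDINALITY` and `embedding_sizes` are hypothetical names, with cardinalities taken from the `randint` upper bounds used to generate the data.

```python
# Assumed number of distinct values per categorical column (the randint upper bounds).
CARDINALITY = {"1": 24, "2": 4, "3": 9, "4": 26, "5": 3, "6": 9, "7": 37, "8": 8}


def embedding_sizes(cardinalities, max_dim=50):
    """Return (n_categories, embedding_dim) pairs using min(max_dim, (n + 1) // 2)."""
    return [(n, min(max_dim, (n + 1) // 2)) for n in cardinalities.values()]


sizes = embedding_sizes(CARDINALITY)
print(sizes)
# [(24, 12), (4, 2), (9, 5), (26, 13), (3, 2), (9, 5), (37, 19), (8, 4)]
```

Under this rule, the eight categorical columns contribute 62 embedding dimensions, which would be concatenated with the 2 continuous columns before the dense layers.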

```python
# lr: learning rate, wd: weight decay
ac.train_model(X_train, cat_cols, cont_cols, "label", batch_size, epochs, lr=0.01, wd=0.0)
```

```
training loss:  1.948693021580025
valid loss 1.949 and accuracy 0.146
training loss:  1.937150220812103
valid loss 1.950 and accuracy 0.131
```
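Because the labels are drawn uniformly at random and are independent of the features, no model can beat chance on this dataset on average. The short calculation below gives the baselines for seven classes, assuming the reported loss is cross-entropy: guessing gives about 1/7 accuracy, and predicting probability 1/7 for every class gives a loss of ln 7. The logged values (loss around 1.95, accuracy between 0.13 and 0.15) sit right at these baselines, which is expected here rather than a sign of a bug.

```python
import math

N_CLASSES = 7  # labels are drawn uniformly from 0..6

chance_accuracy = 1 / N_CLASSES     # expected accuracy of random guessing
uniform_loss = math.log(N_CLASSES)  # cross-entropy when every class gets probability 1/7

print(f"chance accuracy ~ {chance_accuracy:.3f}")       # chance accuracy ~ 0.143
print(f"uniform-prediction loss ~ {uniform_loss:.3f}")  # uniform-prediction loss ~ 1.946
```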


Ideally, you will want to train your model for more `epochs`, as detailed in our example Asset Classification [notebook](https://github.com/rapidsai/clx/blob/branch-0.19/notebooks/network_mapping/CLX_Supervised_Asset_Classification.ipynb).
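When choosing how many epochs to run, it helps to know how many mini-batch updates that implies. The sketch below counts updates per epoch and in total, assuming all 7,200 training rows from the 80/20 split are used for gradient updates; if `train_model` holds out part of `X_train` internally for validation (the logs do report a validation loss), the real count will be lower. The helper name `optimizer_steps` and the 20-epoch value are hypothetical.

```python
import math


def optimizer_steps(n_rows, batch_size, epochs):
    """Return (updates per epoch, total updates), counting a final partial batch."""
    per_epoch = math.ceil(n_rows / batch_size)
    return per_epoch, per_epoch * epochs


print(optimizer_steps(7200, 1000, 2))   # (8, 16)
print(optimizer_steps(7200, 1000, 20))  # (8, 160)
```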

## Save a trained model

```python
# Write the trained model to a file in the current working directory.
ac.save_model("clx_asset_classifier.pth")
```

Next, create a new asset classifier instance and load the saved model.

```python
asset_classifier = AssetClassification()
asset_classifier.load_model('clx_asset_classifier.pth')
```

## Asset Classification Inference

Use the reloaded model to predict labels for the test set:

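```python
# Predict with the model loaded from disk, then copy both results to host arrays.
pred_results = asset_classifier.predict(X_test, cat_cols, cont_cols).to_array()
true_results = Y_test.to_array()
```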
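With predictions and true labels on the host, you can score the model. The sketch below computes accuracy and per-(true, predicted) counts with the standard library; the helper names are hypothetical, and the toy labels stand in for `true_results` and `pred_results`. Libraries such as scikit-learn (`sklearn.metrics.accuracy_score`, `f1_score`) provide the same metrics if you prefer.

```python
from collections import Counter


def accuracy(true, pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true) != len(pred):
        raise ValueError("true and pred must have the same length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def confusion_counts(true, pred):
    """Counter keyed by (true_label, predicted_label) pairs."""
    return Counter(zip(true, pred))


# Toy labels for illustration; in the notebook you would pass
# list(true_results) and list(pred_results) instead.
true = [0, 1, 2, 2, 6]
pred = [0, 2, 2, 2, 5]
print(accuracy(true, pred))                  # 0.6
print(confusion_counts(true, pred)[(2, 2)])  # 2
```

On the randomly labelled dataset above, expect accuracy near the 1/7 chance level; meaningful scores require real, informative features.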