# Exploration of the TRAbID2017 dataset

In this notebook, we train a multilayer perceptron (MLP) to classify observations from the TRAbID2017 IDS dataset as benign or malicious.

The model is trained on 14651 samples (80% of the data).

It is tested on the remaining 3663 samples (20%), where it classifies 3655 correctly, an accuracy of 99.78%.

In [13]:
import pandas as pd
import numpy as np
from scipy.io import arff

from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

data = arff.loadarff('./data/TRAbID2017_dataset.arff')
dataset = pd.DataFrame(data[0])

# Build the feature matrix X from the first 43 columns and load the class labels Y from a separate CSV file
X = dataset.iloc[:,0:43].values
Y_class = pd.read_csv('./data/TRAbID2017_dataset_Y_class.csv')
Y_class = Y_class.iloc[:,:].values

# Scale every feature to the [0, 1] range
scaler = MinMaxScaler().fit(X)
X_scaled = np.array(scaler.transform(X))

X_train, X_test, y_train, y_test = train_test_split(X_scaled, Y_class, test_size = 0.2, random_state = 42, stratify=Y_class)
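
In the cell above, the scaler is fitted on the full dataset before the split, so the minimum and maximum of the test rows also shape the scaling. A common alternative is to split first and fit the scaler on the training rows only. The following is a minimal sketch of that ordering, assuming a synthetic stand-in for the real data; the names `X_demo`, `y_demo`, `demo_scaler` and the `_tr`/`_te` variables are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix and labels (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows never influence the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then apply it to both splits
demo_scaler = MinMaxScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

# Training features span [0, 1]; test features may fall slightly outside that range
print(X_tr_scaled.min(), X_tr_scaled.max())
print(X_te_scaled.min(), X_te_scaled.max())
```

With this ordering, test values outside the training range map outside [0, 1], which is expected. Whether the change would affect the accuracy reported in this notebook has not been checked here.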

In [49]:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, np.ravel(y_train))
print("Training Stats:")
print(" Total samples: ", X_train.shape[0])

Training Stats:
 Total samples:  14651


In [50]:
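y_test = np.ravel(y_test)
# Class probabilities: one row per test sample, one column per class
predict = clf.predict_proba(X_test)
score = 0

# Count the samples whose most probable class matches the true label.
# This assumes the labels are the integers 0..n_classes-1, so that the
# argmax column index equals the class label.
for i in range(predict.shape[0]):
    if np.argmax(predict[i]) == y_test[i]:
        score = score + 1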

print("Prediction accuracy:")
print(" Correct: ", score)
print(" Total: ", predict.shape[0])
print(" Accuracy: ", score/predict.shape[0]*100)

Prediction accuracy:
 Correct:  3655
 Total:  3663
 Accuracy:  99.78159978159978
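
The counting loop above can also be written without an explicit loop, using `accuracy_score` and `confusion_matrix`, which the first cell already imports. The sketch below shows the idea on a small hand-made probability array; `proba_demo`, `y_true_demo` and `y_pred_demo` are illustrative names and values, not results from the TRAbID2017 data. As in the loop, it assumes the labels are the integers 0 and 1, so that the argmax column index equals the class label.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy probabilities for 5 samples and 2 classes (illustrative values only)
proba_demo = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.60, 0.40],
    [0.30, 0.70],
    [0.55, 0.45],
])
y_true_demo = np.array([0, 1, 0, 1, 1])

# Most probable class per row (valid as a label only for labels 0..n_classes-1)
y_pred_demo = np.argmax(proba_demo, axis=1)

# Same quantities as the loop: number of correct predictions and accuracy in percent
correct = int(np.sum(y_pred_demo == y_true_demo))
accuracy = accuracy_score(y_true_demo, y_pred_demo) * 100

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)

print(" Correct: ", correct)
print(" Total: ", len(y_true_demo))
print(" Accuracy: ", accuracy)
print(cm)
```

With the trained classifier, `clf.predict(X_test)` returns the predicted labels directly and could replace the `np.argmax` step. For an intrusion detection task, the confusion matrix is often more informative than accuracy alone, because it separates missed attacks from false alarms.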
