<a href="https://colab.research.google.com/github/fwangliberty/AIoTDesign-Frontend/blob/master/ensembling_DNN_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensembling DNN, Random Forest and DecisionTree based on Normalization and Standardization Datasets


We will use the same DNN model, Random Forest and Decision Tree to classify network anormalies in CICIDS2017 dataset. More specifically, the dataset has been augmented by adding 7 new connection based features. We will use normalized and standardized datasets to train the DNN model. Random Foresty model is not sensitive to the normalization method. Then each model will be evaluated using the test set with the corresponding normalized dataset. After that, we put all two models in an ensemble and evaluate it. It is expected that the ensemble will perform better on a test set that any single model in the ensemble separately.

There are many different types of ensembles; stacking is one of them. It is one of the more general types and can theoretically represent any other ensemble technique. Stacking involves training a learning algorithm to combine the predictions of several other learning algorithms. For the sake of this example, I will use one of the simplest forms of Stacking, which involves taking an average of outputs of models in the ensemble. Since averaging doesn't take any parameters, there is no need to train this ensemble (only its models).

**This notebook switch the test and validation dataset.**

## Preparing the data
First, import dependencies.

In [1]:
from keras.callbacks import History
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.engine import training
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Activation, Average
from keras.losses import categorical_crossentropy
from keras.models import Model, Input
from keras.optimizers import Adam
from keras.utils import to_categorical
from tensorflow.python.framework.ops import Tensor
from typing import Tuple, List
import glob
import numpy as np
import os
from os.path import join
import pandas as pd
import numpy as np
import time
import seaborn as sns
import matplotlib.pyplot as plt


### Define Metrics

In [2]:
def display_metrics(y_test, y_pred, label_names):
  print('\nAccuracy: {:.2f}\n'.format(accuracy_score(y_test, y_pred)))

  print('Micro Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='micro')))
  print('Micro Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='micro')))
  print('Micro F1-score: {:.2f}\n'.format(f1_score(y_test, y_pred, average='micro')))

  print('Macro Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='macro')))
  print('Macro Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='macro')))
  print('Macro F1-score: {:.2f}\n'.format(f1_score(y_test, y_pred, average='macro')))

  print('Weighted Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='weighted')))
  print('Weighted Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='weighted')))
  print('Weighted F1-score: {:.2f}'.format(f1_score(y_test, y_pred, average='weighted')))

  print('\nClassification Report\n')
  print(classification_report(y_test, y_pred, target_names=label_names))

In [3]:
def make_value2index(attacks):
    #make dictionary
    attacks = sorted(attacks)
    d = {}
    counter=0
    for attack in attacks:
        d[attack] = counter
        counter+=1
    return d

In [4]:
# chganges label from string to integer/index
def encode_label(Y_str):
    labels_d = make_value2index(np.unique(Y_str))
    Y = [labels_d[y_str] for y_str  in Y_str]
    Y = np.array(Y)
    return np.array(Y)

# 1. Locating CSV files

In [5]:
# All columns
col_names = np.array(['dst sport count', 'src dport count', 'dst src count', 'dport count', 'sport count', 'dst host count','src host count','Source Port', 'Destination Port',
                      'Protocol', 'Flow Duration', 'Total Fwd Packets', 'Total Backward Packets', 'Total Length of Fwd Packets',
                      'Total Length of Bwd Packets', 'Fwd Packet Length Max', 'Fwd Packet Length Min', 'Fwd Packet Length Mean',
                      'Fwd Packet Length Std', 'Bwd Packet Length Max', 'Bwd Packet Length Min', 'Bwd Packet Length Mean', 'Bwd Packet Length Std',
                      'Flow Bytes/s', 'Flow Packets/s', 'Flow IAT Mean', 'Flow IAT Std', 'Flow IAT Max', 'Flow IAT Min', 'Fwd IAT Total',
                      'Fwd IAT Mean', 'Fwd IAT Std', 'Fwd IAT Max', 'Fwd IAT Min', 'Bwd IAT Total', 'Bwd IAT Mean', 'Bwd IAT Std', 'Bwd IAT Max',
                      'Bwd IAT Min', 'Fwd PSH Flags', 'Fwd URG Flags', 'Fwd Header Length', 'Bwd Header Length',
                      'Fwd Packets/s', 'Bwd Packets/s', 'Min Packet Length', 'Max Packet Length', 'Packet Length Mean', 'Packet Length Std',
                      'Packet Length Variance', 'FIN Flag Count', 'SYN Flag Count', 'RST Flag Count', 'PSH Flag Count', 'ACK Flag Count',
                      'URG Flag Count', 'CWE Flag Count', 'ECE Flag Count', 'Down/Up Ratio', 'Average Packet Size', 'Avg Fwd Segment Size',
                      'Avg Bwd Segment Size','Subflow Fwd Packets', 'Subflow Fwd Bytes',
                      'Subflow Bwd Packets', 'Subflow Bwd Bytes', 'Init_Win_bytes_forward', 'Init_Win_bytes_backward',
                      'act_data_pkt_fwd', 'min_seg_size_forward', 'Active Mean', 'Active Std', 'Active Max', 'Active Min', 'Idle Mean',
                      'Idle Std', 'Idle Max', 'Idle Min', 'Label'])

### Option 1. Connect to Google Drive 

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
train_path='/content/drive/My Drive/CICIDS2017/train_set_ext78_2.csv'

In [8]:
validation_path = '/content/drive/My Drive/CICIDS2017/crossval_set_ext78_2.csv'
test_path = '/content/drive/My Drive/CICIDS2017/test_set_ext78_2.csv'

### Option 2. Connect to Local Machine

In [None]:
train_path = '../data/cicids2017clean/train_set_ext78_2.csv'
validation_path = '../data/cicids2017clean/crossval_set_ext78_2.csv'
test_path = '../data/cicids2017clean/test_set_ext78_2.csv'

# 2. Loading CSV Datasets

In [9]:
# load three csv files generated by mlp4nids (Multi-layer perceptron for network intrusion detection )
# first load the train set
df_train = pd.read_csv(train_path,names=col_names, skiprows=1)  

In [None]:
print('Train set size: ', df_train.shape)

Train set size:  (879589, 79)


In [10]:
df_test = pd.read_csv(test_path, names=col_names, skiprows=1)  
print('Test set size: ', df_test.shape)

df_val = pd.read_csv(validation_path,names=col_names, skiprows=1)  
print('Validation set size: ', df_val.shape)

Test set size:  (188483, 79)
Validation set size:  (188484, 79)


In [None]:
df_train.describe()

Unnamed: 0,dst sport count,src dport count,dst src count,dport count,sport count,dst host count,src host count,Source Port,Destination Port,Protocol,Flow Duration,Total Fwd Packets,Total Backward Packets,Total Length of Fwd Packets,Total Length of Bwd Packets,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,Bwd Packet Length Max,Bwd Packet Length Min,Bwd Packet Length Mean,Bwd Packet Length Std,Flow Bytes/s,Flow Packets/s,Flow IAT Mean,Flow IAT Std,Flow IAT Max,Flow IAT Min,Fwd IAT Total,Fwd IAT Mean,Fwd IAT Std,Fwd IAT Max,Fwd IAT Min,Bwd IAT Total,Bwd IAT Mean,Bwd IAT Std,Bwd IAT Max,Bwd IAT Min,Fwd PSH Flags,Fwd URG Flags,Fwd Header Length,Bwd Header Length,Fwd Packets/s,Bwd Packets/s,Min Packet Length,Max Packet Length,Packet Length Mean,Packet Length Std,Packet Length Variance,FIN Flag Count,SYN Flag Count,RST Flag Count,PSH Flag Count,ACK Flag Count,URG Flag Count,CWE Flag Count,ECE Flag Count,Down/Up Ratio,Average Packet Size,Avg Fwd Segment Size,Avg Bwd Segment Size,Subflow Fwd Packets,Subflow Fwd Bytes,Subflow Bwd Packets,Subflow Bwd Bytes,Init_Win_bytes_forward,Init_Win_bytes_backward,act_data_pkt_fwd,min_seg_size_forward,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min
count,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0,879589.0
mean,9.655316,37.867227,49.836973,48.006394,10.465226,55.56617,57.243626,42882.307181,6397.910982,8.693028,19230370.0,8.214027,8.823871,440.3727,14170.82,179.036862,13.701055,48.02017,61.747571,1463.559232,29.06178,487.495944,600.39814,1154200.0,78355.47,1785500.0,4593409.0,15057810.0,181790.9,18950050.0,3497445.0,5784826.0,14935290.0,876415.4,9546793.0,1832198.0,2096462.0,5847373.0,777501.1,0.035845,6.9e-05,-1779.193,-1391.305,70667.63,7731.049,11.697604,1520.444635,250.439207,478.990505,948854.0,0.05693,0.035845,0.000164,0.348102,0.351718,0.068783,6.9e-05,0.000164,0.662657,277.045229,48.02017,487.495944,8.214027,440.3727,8.823871,14169.74,7118.105536,1463.521291,4.322587,-1156.058,87221.49,34274.64,140967.5,66954.74,14020000.0,808858.5,14634780.0,13416850.0
std,25.810487,40.954725,44.142062,39.181947,25.970811,41.115493,40.505493,19661.359246,16111.253716,4.734203,37119190.0,685.761027,912.882257,6605.092,2075330.0,610.645248,57.039193,160.224075,237.180304,2657.258586,60.386127,814.91849,1167.665597,23349980.0,271312.8,4929222.0,9877418.0,31949610.0,3274780.0,37087120.0,9762596.0,13050670.0,32015890.0,8098459.0,28165660.0,8481475.0,8133173.0,20549480.0,7499528.0,0.185904,0.008327,1186150.0,1158914.0,265260.1,38093.61,22.706568,2687.748532,394.185794,848.222439,2302125.0,0.231709,0.185904,0.012794,0.476369,0.477507,0.253085,0.008327,0.012794,0.643625,431.918903,160.224075,814.91849,685.761027,6605.092,912.882257,2075043.0,13584.23916,7163.85698,537.436126,612932.8,648736.9,370051.2,950582.2,590088.5,31146250.0,6040495.0,31991130.0,30915600.0
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,-1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-12000000.0,-2000000.0,-1.0,0.0,-1.0,-13.0,0.0,0.0,0.0,0.0,-12.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1073741000.0,-1073741000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,-1.0,-1.0,0.0,-536870700.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,1.0,4.0,2.0,1.0,8.0,11.0,36053.0,53.0,6.0,81.0,1.0,1.0,2.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,118.5832,1.130725,59.66667,0.0,77.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,40.0,20.0,0.7750416,0.06135056,0.0,6.0,3.333333,2.19089,4.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,0.0,1.0,2.0,1.0,0.0,0.0,-1.0,0.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,16.0,37.0,48.0,1.0,60.0,64.0,50214.0,80.0,6.0,40026.0,2.0,2.0,51.0,114.0,31.0,0.0,27.0,0.0,70.0,0.0,63.0,0.0,3542.475,90.14017,14006.8,4520.698,31632.0,4.0,49.0,49.0,0.0,49.0,3.0,3.0,3.0,0.0,3.0,1.0,0.0,0.0,64.0,40.0,47.88519,10.54819,0.0,79.0,54.2,21.36196,456.3333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,69.25,27.0,63.0,2.0,51.0,2.0,114.0,256.0,0.0,1.0,24.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2.0,95.0,99.0,95.0,2.0,99.0,99.0,57335.0,443.0,6.0,6453224.0,5.0,5.0,310.0,4211.0,201.0,6.0,48.75,72.555151,1460.0,6.0,453.4,630.0,136363.6,27397.26,1033201.0,2212140.0,5702855.0,54.0,5871722.0,1273712.0,2006017.0,5487049.0,48.0,147596.0,29487.8,44657.91,131990.0,45.0,0.0,0.0,144.0,132.0,14084.51,8620.69,6.0,1472.0,283.5,533.098083,284193.6,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,300.0,48.75,453.4,5.0,310.0,5.0,4211.0,8192.0,235.0,2.0,32.0,0.0,0.0,0.0,0.0,5187895.0,0.0,5187896.0,5104746.0
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,65536.0,65536.0,17.0,120000000.0,217797.0,289585.0,2866110.0,639650600.0,24820.0,2065.0,5939.285714,6692.644993,17376.0,1983.0,4370.686524,6715.738331,2071000000.0,3000000.0,120000000.0,84800000.0,120000000.0,120000000.0,120000000.0,120000000.0,83700000.0,120000000.0,120000000.0,120000000.0,120000000.0,82993410.0,120000000.0,120000000.0,1.0,1.0,4617240.0,5791700.0,3000000.0,2000000.0,1448.0,24820.0,2920.0,4731.522394,22400000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,29.0,3337.142857,5939.285714,4370.686524,217797.0,2866110.0,289585.0,639650600.0,65535.0,65535.0,207409.0,60.0,100000000.0,63800000.0,100000000.0,100000000.0,120000000.0,75600000.0,120000000.0,120000000.0


In [None]:
df_test.describe()

Unnamed: 0,dst sport count,src dport count,dst src count,dport count,sport count,dst host count,src host count,Source Port,Destination Port,Protocol,Flow Duration,Total Fwd Packets,Total Backward Packets,Total Length of Fwd Packets,Total Length of Bwd Packets,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,Bwd Packet Length Max,Bwd Packet Length Min,Bwd Packet Length Mean,Bwd Packet Length Std,Flow Bytes/s,Flow Packets/s,Flow IAT Mean,Flow IAT Std,Flow IAT Max,Flow IAT Min,Fwd IAT Total,Fwd IAT Mean,Fwd IAT Std,Fwd IAT Max,Fwd IAT Min,Bwd IAT Total,Bwd IAT Mean,Bwd IAT Std,Bwd IAT Max,Bwd IAT Min,Fwd PSH Flags,Fwd URG Flags,Fwd Header Length,Bwd Header Length,Fwd Packets/s,Bwd Packets/s,Min Packet Length,Max Packet Length,Packet Length Mean,Packet Length Std,Packet Length Variance,FIN Flag Count,SYN Flag Count,RST Flag Count,PSH Flag Count,ACK Flag Count,URG Flag Count,CWE Flag Count,ECE Flag Count,Down/Up Ratio,Average Packet Size,Avg Fwd Segment Size,Avg Bwd Segment Size,Subflow Fwd Packets,Subflow Fwd Bytes,Subflow Bwd Packets,Subflow Bwd Bytes,Init_Win_bytes_forward,Init_Win_bytes_backward,act_data_pkt_fwd,min_seg_size_forward,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min
count,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0,188483.0
mean,9.625308,37.861701,49.864025,48.026984,10.434315,55.607434,57.249322,42825.77243,6398.028565,8.699867,19372970.0,6.393341,6.290955,425.507547,9614.426,180.326422,13.616549,48.088141,62.258627,1471.358101,29.25224,489.581182,604.521501,1141231.0,78369.92,1791253.0,4628396.0,15150270.0,173650.2,19101160.0,3521722.0,5821567.0,15029400.0,882492.2,9685500.0,1863613.0,2133658.0,5939107.0,789039.8,0.036115,9.5e-05,-724.4541,-729.0569,70769.28,7712.062,11.748036,1528.642275,251.232699,481.683733,957566.8,0.057278,0.036115,0.000164,0.34862,0.350392,0.068335,9.5e-05,0.000175,0.66268,277.96455,48.088141,489.581182,6.393341,425.507547,6.290955,9615.25,7121.99713,1479.610888,3.475565,-418.5286,89219.3,35333.51,144818.9,68315.14,14120790.0,800156.3,14730720.0,13522380.0
std,25.759133,40.9958,44.175627,39.21799,25.91664,41.132627,40.53749,19694.807197,16123.799361,4.737968,37240730.0,364.106378,421.965728,3858.850242,1454190.0,618.93718,55.371692,158.738971,238.828812,2667.68265,62.105876,817.093491,1174.621985,23485090.0,272735.1,4872676.0,9915489.0,32024280.0,3155030.0,37215790.0,9783012.0,13092090.0,32092030.0,8108421.0,28367290.0,8531263.0,8212071.0,20703590.0,7526471.0,0.186576,0.009772,386509.0,386530.4,266532.4,37336.33,23.000785,2699.393195,394.531874,851.827223,2321479.0,0.232374,0.186576,0.012824,0.476535,0.477094,0.252321,0.009772,0.013231,0.648263,432.349886,158.738971,817.093491,364.106378,3858.850242,421.965728,1454292.0,13565.260196,7247.906478,355.424245,193219.1,661947.2,371648.2,974288.3,599289.3,31239640.0,5987927.0,32065970.0,31017370.0
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,-2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-12000000.0,-2000000.0,-2.0,0.0,-2.0,-14.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-167770500.0,-167770500.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,-1.0,-1.0,0.0,-83885310.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,1.0,4.0,2.0,1.0,8.0,11.0,35960.0,53.0,6.0,81.0,1.0,1.0,2.0,4.0,2.0,0.0,2.0,0.0,2.0,0.0,2.0,0.0,118.7398,1.091799,60.0,0.0,77.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,40.0,20.0,0.7514426,0.06150911,0.0,6.0,3.333333,2.19089,4.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,2.0,1.0,2.0,1.0,4.0,0.0,-1.0,0.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,16.0,37.0,48.0,1.0,60.0,64.0,50181.0,80.0,6.0,40576.0,2.0,2.0,51.0,115.0,32.0,0.0,27.0,0.0,72.0,0.0,65.0,0.0,3547.589,88.7968,14268.0,4794.877,31808.0,4.0,49.0,49.0,0.0,49.0,3.0,3.0,3.0,0.0,3.0,1.0,0.0,0.0,64.0,40.0,46.70976,10.65553,0.0,80.0,54.666667,21.685248,470.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,70.25,27.0,65.0,2.0,51.0,2.0,115.0,256.0,0.0,1.0,24.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2.0,95.0,99.0,96.0,2.0,99.0,99.0,57348.0,443.0,6.0,6817418.0,5.0,5.0,311.0,4256.0,201.0,8.0,49.0,73.49915,1460.0,23.0,454.142512,631.195401,135593.2,27210.88,1064177.0,2267229.0,5766194.0,54.0,5912571.0,1313678.0,2022685.0,5549644.0,48.0,147833.0,29539.0,46307.22,132203.0,46.0,0.0,0.0,144.0,132.0,13953.49,8658.009,6.0,1489.0,284.565336,533.916876,285067.2,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,300.811966,49.0,454.142512,5.0,311.0,5.0,4256.0,8192.0,235.0,2.0,32.0,0.0,0.0,0.0,0.0,5281430.0,0.0,5282144.0,5158305.0
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,65536.0,65536.0,17.0,119999900.0,155812.0,179415.0,917301.0,627000000.0,24820.0,1983.0,3917.928571,5796.50069,14480.0,1702.0,3927.262213,6715.738331,2071000000.0,4000000.0,118000000.0,84800000.0,120000000.0,118000000.0,120000000.0,120000000.0,83400000.0,120000000.0,120000000.0,120000000.0,120000000.0,82704010.0,120000000.0,120000000.0,1.0,1.0,3162564.0,3588312.0,3000000.0,2000000.0,1359.0,24820.0,2119.741935,4731.522394,22400000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,43.0,2357.0,3917.928571,3927.262213,155812.0,917301.0,179415.0,627040600.0,65535.0,65535.0,152621.0,138.0,107000000.0,42300000.0,107000000.0,107000000.0,120000000.0,72600000.0,120000000.0,120000000.0


In [None]:
df_val.describe()

Unnamed: 0,dst sport count,src dport count,dst src count,dport count,sport count,dst host count,src host count,Source Port,Destination Port,Protocol,Flow Duration,Total Fwd Packets,Total Backward Packets,Total Length of Fwd Packets,Total Length of Bwd Packets,Fwd Packet Length Max,Fwd Packet Length Min,Fwd Packet Length Mean,Fwd Packet Length Std,Bwd Packet Length Max,Bwd Packet Length Min,Bwd Packet Length Mean,Bwd Packet Length Std,Flow Bytes/s,Flow Packets/s,Flow IAT Mean,Flow IAT Std,Flow IAT Max,Flow IAT Min,Fwd IAT Total,Fwd IAT Mean,Fwd IAT Std,Fwd IAT Max,Fwd IAT Min,Bwd IAT Total,Bwd IAT Mean,Bwd IAT Std,Bwd IAT Max,Bwd IAT Min,Fwd PSH Flags,Fwd URG Flags,Fwd Header Length,Bwd Header Length,Fwd Packets/s,Bwd Packets/s,Min Packet Length,Max Packet Length,Packet Length Mean,Packet Length Std,Packet Length Variance,FIN Flag Count,SYN Flag Count,RST Flag Count,PSH Flag Count,ACK Flag Count,URG Flag Count,CWE Flag Count,ECE Flag Count,Down/Up Ratio,Average Packet Size,Avg Fwd Segment Size,Avg Bwd Segment Size,Subflow Fwd Packets,Subflow Fwd Bytes,Subflow Bwd Packets,Subflow Bwd Bytes,Init_Win_bytes_forward,Init_Win_bytes_backward,act_data_pkt_fwd,min_seg_size_forward,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min
count,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0,188484.0
mean,9.731272,37.780172,49.883332,47.914115,10.539032,55.614689,57.218512,42898.015243,6422.896426,8.694239,19246850.0,7.434695,7.835328,439.0308,11801.21,178.156406,13.800163,48.024119,61.523877,1463.444457,29.060005,487.327309,600.629474,1225907.0,79429.23,1791008.0,4609148.0,15135620.0,184944.7,18968320.0,3499186.0,5813441.0,15011550.0,868859.3,9470943.0,1831885.0,2100258.0,5840577.0,774556.4,0.035764,6.9e-05,-62920.69,-11645.2,71680.2,7802.082,11.720783,1520.258568,249.86342,478.533085,947914.5,0.056583,0.035764,0.000154,0.347573,0.352502,0.068372,6.9e-05,0.000154,0.660905,276.454458,48.024119,487.327309,7.434695,439.0308,7.835328,11799.81,7106.119968,1484.06192,3.641153,-8963.623,90428.21,35655.53,145008.0,68993.22,14071870.0,820558.0,14697360.0,13460960.0
std,25.915515,40.960165,44.147226,39.203217,26.082113,41.112314,40.531929,19646.086134,16145.734341,4.735494,37093540.0,561.52956,747.556849,6996.634,1668861.0,604.186341,57.791361,160.261305,237.467628,2660.07588,60.839255,816.166175,1169.171476,24824710.0,273571.9,4871805.0,9865068.0,31987300.0,3220118.0,37059750.0,9655176.0,13071590.0,32050870.0,7956019.0,28049160.0,8424538.0,8163563.0,20532120.0,7404418.0,0.185702,0.008305,22532880.0,3503013.0,267285.1,38671.33,22.334888,2689.274156,393.766139,847.92847,2293114.0,0.231045,0.185702,0.012403,0.476201,0.477751,0.252384,0.008305,0.012403,0.64401,431.48156,160.261305,816.166175,561.52956,6996.634,747.556849,1668515.0,13575.776949,7232.426279,432.259086,2150554.0,698806.3,390891.0,999005.3,630712.8,31163490.0,6075121.0,32018620.0,30930170.0
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,-1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-12000000.0,-2000000.0,-1.0,0.0,-1.0,-13.0,0.0,0.0,0.0,0.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-9663668000.0,-1073741000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,-1.0,-1.0,0.0,-536870700.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,1.0,4.0,2.0,1.0,8.0,11.0,36032.0,53.0,6.0,80.0,1.0,1.0,2.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,118.8367,1.102465,59.0,0.0,76.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,40.0,20.0,0.7607826,0.06115542,0.0,6.0,3.333333,2.19089,4.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,0.0,1.0,2.0,1.0,0.0,0.0,-1.0,0.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,16.0,37.0,48.0,1.0,60.0,64.0,50188.0,80.0,6.0,39284.0,2.0,2.0,51.0,113.0,32.0,0.0,27.0,0.0,69.0,0.0,63.0,0.0,3578.308,89.08289,14206.12,3495.541,31581.0,4.0,49.0,49.0,0.0,49.0,3.0,3.0,3.0,0.0,3.0,1.0,0.0,0.0,64.0,40.0,47.70885,10.50464,0.0,79.0,53.8,21.36196,456.3333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,69.0,27.0,63.0,2.0,51.0,2.0,113.0,256.0,0.0,1.0,24.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2.0,95.0,99.0,95.0,2.0,99.0,99.0,57364.0,443.0,6.0,6542525.0,5.0,5.0,309.0,4160.25,201.0,8.0,48.857143,71.990846,1460.0,6.0,450.142857,626.148804,136363.6,27397.26,1052813.0,2238775.0,5747710.0,55.0,5891722.0,1299830.0,2010913.0,5516472.0,48.0,147567.2,29482.45,43573.38,131420.0,45.0,0.0,0.0,144.0,132.0,14084.51,8695.652,6.0,1460.0,280.125,529.37442,280237.3,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,297.6,48.857143,450.142857,5.0,309.0,5.0,4160.25,8192.0,235.0,2.0,32.0,0.0,0.0,0.0,0.0,5202890.0,0.0,5202890.0,5116548.0
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,65536.0,65536.0,17.0,120000000.0,193200.0,256740.0,2321478.0,575681600.0,23360.0,1983.0,5940.857143,7049.469004,15928.0,1460.0,5800.5,8194.660487,2071000000.0,3000000.0,120000000.0,84800260.0,120000000.0,120000000.0,120000000.0,120000000.0,84417970.0,120000000.0,120000000.0,120000000.0,120000000.0,84418010.0,120000000.0,120000000.0,1.0,1.0,3939424.0,5134800.0,3000000.0,2000000.0,1408.0,23360.0,2433.333333,4708.990311,22200000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,27.0,2539.130435,5940.857143,5800.5,193200.0,2321478.0,256740.0,575681600.0,65535.0,65535.0,187113.0,93.0,101659700.0,48500000.0,101659700.0,101659700.0,120000000.0,73800000.0,120000000.0,120000000.0


# 3. Encoding Datasets

### Encoding train dataset

In [11]:
df_label = df_train['Label']
data = df_train.drop(columns=['Label'])
Xtrain = data.values
y_train = encode_label(df_label.values)

### Encoding test dataset

In [12]:
df_label = df_test['Label']
data = df_test.drop(columns=['Label'])
Xval = data.values
y_val = encode_label(df_label.values)

### Encoding validation dataset

In [13]:
df_label = df_val['Label']
data = df_val.drop(columns=['Label'])
Xtest = data.values
y_test = encode_label(df_label.values)

# 4. Normalization

The values of the datasets are normalized using the Min-Max scaling technique, bringing them all within a range of [0,1].

In [14]:
from sklearn.preprocessing import MinMaxScaler

In [15]:
scaler = MinMaxScaler()
X_train_n = scaler.fit_transform(Xtrain)
X_train_n

array([[0.01010101, 1.        , 1.        , ..., 0.68650794, 0.71416667,
        0.10166667],
       [0.01010101, 0.22222222, 0.12121212, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.01010101, 0.02020202, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.12121212, 0.12121212, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.87878788, 0.87878788, ..., 0.        , 0.        ,
        0.        ],
       [1.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ]])

In [16]:
X_val_n = scaler.fit_transform(Xval)
X_val_n

array([[0.        , 0.96969697, 0.96969697, ..., 0.        , 0.71583333,
        0.71583333],
       [0.        , 0.96969697, 0.96969697, ..., 0.        , 0.6975    ,
        0.6975    ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.44444444, 0.08080808, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.78787879, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.28282828, 0.28282828, ..., 0.        , 0.        ,
        0.        ]])

In [17]:
X_test_n = scaler.fit_transform(Xtest)
X_test_n

array([[0.        , 0.32323232, 0.32323232, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.05050505, 0.05050505, ..., 0.        , 0.        ,
        0.        ],
       [0.02020202, 0.09090909, 0.02020202, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 1.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.96969697, ..., 0.        , 0.        ,
        0.        ]])

# 5. Standardization

In [18]:
from sklearn.preprocessing import StandardScaler

In [19]:
scaler = StandardScaler()

X_train_sd = scaler.fit_transform(Xtrain)
X_val_sd = scaler.fit_transform(Xval)
X_test_sd = scaler.fit_transform(Xtest)

X_train_sd

array([[-0.29659729,  1.51710965,  1.13640038, ...,  8.4581104 ,
         2.22140524, -0.0393603 ],
       [-0.29659729, -0.36301636, -0.83451004, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [-0.33534125, -0.87577799, -1.06105146, ..., -0.13390608,
        -0.45746381, -0.43398332],
       ...,
       [-0.33534125, -0.60718856, -0.83451004, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [-0.33534125,  1.224103  ,  0.86455067, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [ 3.50031109, -0.90019522,  1.13640038, ..., -0.13390608,
        -0.45746381, -0.43398332]])

# 6. One-hot Encoding for labels

In [20]:
from tensorflow.keras.utils import to_categorical

In [21]:
y_train_origin = y_train
y_test_origin = y_test
y_val_origin = y_val

In [22]:
y_train = to_categorical(y_train, 15)
y_test = to_categorical(y_test, 15)
y_val = to_categorical(y_val, 15)

# 7.  Define the Metrics

In [23]:
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

#importing confusion matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

from sklearn import metrics
from sklearn.metrics import accuracy_score

#importing accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [24]:
METRICS = [
      tf.keras.metrics.TruePositives(name='tp'),
      tf.keras.metrics.FalsePositives(name='fp'),
      tf.keras.metrics.TrueNegatives(name='tn'),
      tf.keras.metrics.FalseNegatives(name='fn'), 
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
      tf.keras.metrics.AUC(name='auc'),
]

In [26]:
labels_d = make_value2index(df_test['Label'])

In [27]:
print(labels_d)

{'BENIGN': 105018, 'Bot': 105298, 'DDoS': 124569, 'DoS GoldenEye': 126111, 'DoS Hulk': 160658, 'DoS Slowhttptest': 161486, 'DoS slowloris': 162320, 'FTP-Patator': 163498, 'Heartbleed': 163500, 'Infiltration': 163501, 'PortScan': 187347, 'SSH-Patator': 188173, 'Web Attack � Brute Force': 188382, 'Web Attack � Sql Injection': 188389, 'Web Attack � XSS': 188482}


# First model:  Random Foresty with DecisionTree

### The first model is Random Foresty with DecisionTree.  

In [28]:
randomforest = RandomForestClassifier(n_estimators=10, random_state=10)
randomforest.fit(X_train_n,y_train)
    
y_pred = randomforest.predict(X_test_n)

In [29]:
display_metrics(y_test_origin, np.argmax(y_pred, axis = 1), labels_d)


Accuracy: 1.00

Micro Precision: 1.00
Micro Recall: 1.00
Micro F1-score: 1.00



  _warn_prf(average, modifier, msg_start, len(result))


Macro Precision: 0.91
Macro Recall: 0.82
Macro F1-score: 0.85

Weighted Precision: 1.00
Weighted Recall: 1.00
Weighted F1-score: 1.00

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.99      1.00      1.00    104952
                       Bot       0.97      0.57      0.72       275
                      DDoS       1.00      1.00      1.00     19164
             DoS GoldenEye       1.00      0.86      0.92      1531
                  DoS Hulk       1.00      1.00      1.00     34443
          DoS Slowhttptest       1.00      0.99      1.00       838
             DoS slowloris       1.00      0.99      1.00       913
               FTP-Patator       1.00      1.00      1.00      1136
                Heartbleed       1.00      1.00      1.00         1
              Infiltration       1.00      0.40      0.57        10
                  PortScan       1.00      1.00      1.00     23966
               SSH-Patato

  _warn_prf(average, modifier, msg_start, len(result))


### Save the model to disk

In [30]:
import pickle

randomforest_file_name = 'randomforest_sw.sav'
pickle.dump(randomforest, open(randomforest_file_name, 'wb'))

# Second Model: Decision Tree Classifier

In [31]:
from sklearn.tree import DecisionTreeClassifier

In [32]:
model_dec = DecisionTreeClassifier()
model_dec.fit(X_train_n, y_train_origin)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [33]:
y_pred_tree = model_dec.predict(X_test_n)

In [34]:
display_metrics(y_test_origin, y_pred_tree, labels_d)


Accuracy: 0.93

Micro Precision: 0.93
Micro Recall: 0.93
Micro F1-score: 0.93

Macro Precision: 0.87
Macro Recall: 0.89
Macro F1-score: 0.86

Weighted Precision: 0.95
Weighted Recall: 0.93
Weighted F1-score: 0.92

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      1.00      1.00    104952
                       Bot       0.97      0.98      0.98       275
                      DDoS       1.00      0.37      0.54     19164
             DoS GoldenEye       0.58      0.92      0.71      1531
                  DoS Hulk       0.74      0.96      0.84     34443
          DoS Slowhttptest       0.84      1.00      0.91       838
             DoS slowloris       0.98      1.00      0.99       913
               FTP-Patator       1.00      1.00      1.00      1136
                Heartbleed       1.00      1.00      1.00         1
              Infiltration       0.90      0.90      0.90        10
             

### Save the model

In [35]:
decisiontree_file_name = 'decisiontree_sw.sav'
pickle.dump(model_dec, open(decisiontree_file_name, 'wb'))

# Third model: DNN Model

We will train the same DNN model by using the same dataset with different normalization methods.

In [36]:
def make_model(X_train, y_train, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(
          256, activation='relu',
          input_shape=(X_train.shape[-1],)),
      tf.keras.layers.Dense(256, activation ='relu'),
      tf.keras.layers.Dense(128, activation ='relu'),
      tf.keras.layers.Dense(64, activation ='relu'),
      tf.keras.layers.Dense(y_train.shape[-1], activation='softmax',
                         bias_initializer=output_bias),
  ])

  model.compile(
      optimizer=tf.keras.optimizers.Adam(lr=1e-4),
      loss=tf.keras.losses.BinaryCrossentropy(),
      metrics=METRICS)
    
  return model

We use normalized dataset to train this DNN model.

In [37]:
model_dnn_n = make_model(X_train_n, y_train)

### Train the first DNN model with normalized dataset

In [38]:
EPOCHS = 200
BATCH_SIZE = 9500

In [39]:
baseline_history_n = model_dnn_n.fit(
    X_train_n,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_n, y_val))

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200
Epoch 42/200
Epoch 43/200
Epoch 44/200
Epoch 45/200
Epoch 46/200
Epoch 47/200
Epoch 48/200
Epoch 49/200
Epoch 50/200
Epoch 51/200
Epoch 52/200
Epoch 53/200
Epoch 54/200
Epoch 55/200
Epoch 56/200
Epoch 57/200
Epoch 58/200
Epoch 59/200
Epoch 60/200
Epoch 61/200
Epoch 62/200
Epoch 63/200
Epoch 64/200
Epoch 65/200
Epoch 66/200
Epoch 67/200
Epoch 68/200
Epoch 69/200
Epoch 70/200
Epoch 71/200
Epoch 72/200
Epoch 73/200
Epoch 74/200
Epoch 75/200
Epoch 76/200
Epoch 77/200
Epoch 78

In [40]:
y_pred_n=model_dnn_n.predict(X_test_n)

In [41]:
display_metrics(y_test_origin, np.argmax(y_pred_n, axis = 1), labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99



  _warn_prf(average, modifier, msg_start, len(result))


Macro Precision: 0.88
Macro Recall: 0.84
Macro F1-score: 0.84

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      0.99      0.99    104952
                       Bot       0.88      0.95      0.91       275
                      DDoS       1.00      1.00      1.00     19164
             DoS GoldenEye       1.00      0.96      0.98      1531
                  DoS Hulk       1.00      1.00      1.00     34443
          DoS Slowhttptest       0.99      0.99      0.99       838
             DoS slowloris       0.99      0.98      0.99       913
               FTP-Patator       0.99      0.99      0.99      1136
                Heartbleed       1.00      1.00      1.00         1
              Infiltration       1.00      0.70      0.82        10
                  PortScan       0.97      0.99      0.98     23966
               SSH-Patato

  _warn_prf(average, modifier, msg_start, len(result))


### Save the model

In [42]:
model_dnn_n.save('dnn_n_sw.h5')

### Train the second DNN model with standardized dataset

In [76]:
def make_model2(X_train, y_train, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(
          512, activation='relu',
          input_shape=(X_train.shape[-1],)),
      tf.keras.layers.Dense(256, activation ='relu'),
      tf.keras.layers.Dense(128, activation ='relu'),
      tf.keras.layers.Dense(64, activation ='relu'),
      tf.keras.layers.Dense(y_train.shape[-1], activation='softmax',
                         bias_initializer=output_bias),
  ])
    
  return model

In [77]:
model_dnn_sd = make_model2(X_train_sd, y_train) 

In [67]:
model_dnn_sd.compile(
      optimizer=tf.keras.optimizers.Adam(lr=0.001),
      loss=tf.keras.losses.BinaryCrossentropy(),
      metrics=METRICS)

In [92]:
EPOCHS = 100
BATCH_SIZE = 9000

In [72]:
from keras.optimizers import Nadam
from keras.callbacks import LearningRateScheduler, ModelCheckpoint
import keras

In [93]:
reduce_lr = keras.callbacks.ReduceLROnPlateau(moniter='val_precision',
                                              factor=0.1,
                                              patience=5)
nadam = Nadam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, schedule_decay=0.0001)
model_dnn_sd.compile(loss = "categorical_crossentropy",optimizer = "nadam", metrics = METRICS)

history = model_dnn_sd.fit(X_train_sd, y_train, 
                    epochs=EPOCHS, 
                    batch_size=BATCH_SIZE,
                    verbose=2,
                    validation_data=(X_val_sd, y_val),
                    callbacks=[reduce_lr])

Epoch 1/100
98/98 - 5s - loss: 0.0119 - tp: 1016949.0000 - fp: 2580.0000 - tn: 14271666.0000 - fn: 2640.0000 - accuracy: 0.9997 - precision: 0.9975 - recall: 0.9974 - auc: 0.9993 - val_loss: 0.0594 - val_tp: 186732.0000 - val_fp: 1738.0000 - val_tn: 2637024.0000 - val_fn: 1751.0000 - val_accuracy: 0.9988 - val_precision: 0.9908 - val_recall: 0.9907 - val_auc: 0.9967
Epoch 2/100
98/98 - 2s - loss: 0.0240 - tp: 875912.0000 - fp: 3604.0000 - tn: 12310642.0000 - fn: 3677.0000 - accuracy: 0.9994 - precision: 0.9959 - recall: 0.9958 - auc: 0.9994 - val_loss: 0.0169 - val_tp: 187304.0000 - val_fp: 1161.0000 - val_tn: 2637601.0000 - val_fn: 1179.0000 - val_accuracy: 0.9992 - val_precision: 0.9938 - val_recall: 0.9937 - val_auc: 0.9999
Epoch 3/100
98/98 - 2s - loss: 0.0124 - tp: 875744.0000 - fp: 3788.0000 - tn: 12310458.0000 - fn: 3845.0000 - accuracy: 0.9994 - precision: 0.9957 - recall: 0.9956 - auc: 0.9999 - val_loss: 0.0206 - val_tp: 187145.0000 - val_fp: 1325.0000 - val_tn: 2637437.0000 -

In [68]:
baseline_history_sd = model_dnn_sd.fit(
    X_train_sd,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_sd, y_val))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100

KeyboardInterrupt: ignored

In [48]:
y_pred_sd = model_dnn_sd.predict(X_test_sd)

In [49]:
display_metrics(y_test_origin, np.argmax(y_pred_sd, axis = 1), labels_d)


Accuracy: 1.00

Micro Precision: 1.00
Micro Recall: 1.00
Micro F1-score: 1.00



  _warn_prf(average, modifier, msg_start, len(result))


Macro Precision: 0.81
Macro Recall: 0.80
Macro F1-score: 0.81

Weighted Precision: 1.00
Weighted Recall: 1.00
Weighted F1-score: 1.00

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      1.00      1.00    104952
                       Bot       0.90      1.00      0.94       275
                      DDoS       1.00      1.00      1.00     19164
             DoS GoldenEye       1.00      0.99      0.99      1531
                  DoS Hulk       1.00      1.00      1.00     34443
          DoS Slowhttptest       0.99      0.99      0.99       838
             DoS slowloris       0.99      0.99      0.99       913
               FTP-Patator       0.99      0.99      0.99      1136
                Heartbleed       0.00      0.00      0.00         1
              Infiltration       0.88      0.70      0.78        10
                  PortScan       0.99      1.00      1.00     23966
               SSH-Patato

  _warn_prf(average, modifier, msg_start, len(result))


### Save the trained model

In [None]:
model_dnn_sd.save('dnn_sd_sw.h5')

# Training DNN with mixed data normalization

## 1. Load the DNN model trained with normalized dataset

In [None]:
from keras.models import load_model

In [None]:
dnn_n_model = load_model('dnn_n.h5')

In [None]:
dnn_n_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 256)               20224     
_________________________________________________________________
dense_11 (Dense)             (None, 256)               65792     
_________________________________________________________________
dense_12 (Dense)             (None, 128)               32896     
_________________________________________________________________
dense_13 (Dense)             (None, 64)                8256      
_________________________________________________________________
dense_14 (Dense)             (None, 15)                975       
Total params: 128,143
Trainable params: 128,143
Non-trainable params: 0
_________________________________________________________________


In [None]:
dnn_n_model# update all layers in all models to not be trainable

for layer in dnn_n_model.layers:
		# make trainable
    layer.trainable = True
    print(layer.name)

dense_10
dense_11
dense_12
dense_13
dense_14


In [None]:
layer = dnn_n_model.get_layer('dense_10')
layer.trainable=True

layer = dnn_n_model.get_layer('dense_11')
layer.trainable=False

## 2. Training the model by using standardized dataset

In [None]:
EPOCHS = 80
BATCH_SIZE = 9000

In [None]:
history_sd = dnn_n_model.fit(
    X_train_sd,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_sd, y_val))

Epoch 1/80
Epoch 2/80
Epoch 3/80
Epoch 4/80
Epoch 5/80
Epoch 6/80
Epoch 7/80
Epoch 8/80
Epoch 9/80
Epoch 10/80
Epoch 11/80
Epoch 12/80
Epoch 13/80
Epoch 14/80
Epoch 15/80
Epoch 16/80
Epoch 17/80
Epoch 18/80
Epoch 19/80
Epoch 20/80
Epoch 21/80
Epoch 22/80
Epoch 23/80
Epoch 24/80
Epoch 25/80
Epoch 26/80
Epoch 27/80
Epoch 28/80
Epoch 29/80
Epoch 30/80
Epoch 31/80
Epoch 32/80
Epoch 33/80
Epoch 34/80
Epoch 35/80
Epoch 36/80
Epoch 37/80
Epoch 38/80
Epoch 39/80
Epoch 40/80
Epoch 41/80
Epoch 42/80
Epoch 43/80
Epoch 44/80
Epoch 45/80
Epoch 46/80
Epoch 47/80
Epoch 48/80
Epoch 49/80
Epoch 50/80
Epoch 51/80
Epoch 52/80
Epoch 53/80
Epoch 54/80
Epoch 55/80
Epoch 56/80
Epoch 57/80
Epoch 58/80
Epoch 59/80
Epoch 60/80
Epoch 61/80
Epoch 62/80
Epoch 63/80
Epoch 64/80
Epoch 65/80
Epoch 66/80
Epoch 67/80
Epoch 68/80
Epoch 69/80
Epoch 70/80
Epoch 71/80
Epoch 72/80
Epoch 73/80
Epoch 74/80
Epoch 75/80
Epoch 76/80
Epoch 77/80
Epoch 78/80
Epoch 79/80
Epoch 80/80


In [None]:
y_pred_mix = dnn_n_model.predict(X_test_sd)

In [None]:
y_pred_mix

array([[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [3.0891338e-09, 7.7628711e-12, 1.4513332e-22, ..., 3.4873085e-15,
        1.2335191e-19, 3.6827611e-12],
       ...,
       [1.0000000e+00, 1.0418901e-34, 0.0000000e+00, ..., 5.9388984e-32,
        1.4865620e-36, 0.0000000e+00],
       [1.1582623e-08, 5.4412169e-10, 1.2817214e-21, ..., 1.1954169e-12,
        8.6470095e-18, 6.4685195e-11],
       [1.0000000e+00, 3.2562312e-34, 2.7089362e-36, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00]], dtype=float32)

In [None]:
display_metrics(y_test_origin, np.argmax(y_pred_mix, axis = 1), labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.81
Macro Recall: 0.78


  _warn_prf(average, modifier, msg_start, len(result))


Macro F1-score: 0.78

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report



  _warn_prf(average, modifier, msg_start, len(result))


                            precision    recall  f1-score   support

                    BENIGN       1.00      0.99      0.99    105019
                       Bot       0.85      0.90      0.87       280
                      DDoS       1.00      1.00      1.00     19271
             DoS GoldenEye       0.99      0.98      0.99      1542
                  DoS Hulk       1.00      1.00      1.00     34547
          DoS Slowhttptest       0.99      0.98      0.99       828
             DoS slowloris       0.97      0.99      0.98       834
               FTP-Patator       0.99      0.99      0.99      1178
                Heartbleed       1.00      1.00      1.00         2
              Infiltration       0.00      0.00      0.00         1
                  PortScan       0.97      0.99      0.98     23846
               SSH-Patator       0.97      0.98      0.97       826
  Web Attack � Brute Force       0.65      0.78      0.71       209
Web Attack � Sql Injection       0.00      0.00

### Save the trained model

In [None]:
dnn_n_model.save('dnn_mix.h5')

#  **Four Model Ensemble**

Now all three models will be combined in an ensemble. 

Here, all four models are reinstantiated and the best saved weights are loaded.

If we want to reload the models with saved weights, we reload the saved models

In [None]:
results = np.zeros((y_test_origin.shape[0], 15))

In [None]:
y_pred_tree_cat = to_categorical(y_pred_tree, 15)

In [None]:
results = results+y_pred*10+y_pred_n+y_pred_sd+y_pred_mix+y_pred_tree_cat*10

In [None]:
display_metrics(y_test_origin, np.argmax(results, axis = 1) , labels_d)


Accuracy: 1.00

Micro Precision: 1.00
Micro Recall: 1.00
Micro F1-score: 1.00

Macro Precision: 0.98
Macro Recall: 0.92
Macro F1-score: 0.94

Weighted Precision: 1.00
Weighted Recall: 1.00
Weighted F1-score: 1.00

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      1.00      1.00    105019
                       Bot       0.98      0.96      0.97       280
                      DDoS       1.00      0.99      1.00     19271
             DoS GoldenEye       1.00      0.95      0.97      1542
                  DoS Hulk       1.00      1.00      1.00     34547
          DoS Slowhttptest       1.00      0.99      1.00       828
             DoS slowloris       0.96      0.99      0.98       834
               FTP-Patator       1.00      1.00      1.00      1178
                Heartbleed       1.00      1.00      1.00         2
              Infiltration       1.00      1.00      1.00         1
             

# Pre-trained Models and Weights
If you do not want to train the models, you can use pre-trained models and their weights.

In [None]:
from tensorflow.keras.models import load_model

In [None]:
randomforest_file_name = 'randomforest.sav'
decisiontree_file_name = 'decisiontree.sav'

In [None]:
# load the Random Forest model from disk
randomforest_model = pickle.load(open(randomforest_file_name, 'rb'))
decisiontree_model = pickle.load(open(decisiontree_file_name, 'rb'))
dnn_n_model = load_model('dnn_n.h5',compile = False)
dnn_sd_model = load_model('dnn_sd.h5',compile = False)
dnn_mix_model = load_model('dnn_mix.h5',compile = False) 



### Get the prediction values

In [None]:
y_pred = randomforest_model.predict(X_test_n)

In [None]:
y_pred_tree = decisiontree_model.predict(X_test_n)

In [None]:
y_pred_dnn_n = dnn_n_model.predict(X_test_n)

In [None]:
y_pred_dnn_sd = dnn_sd_model.predict(X_test_n)

In [None]:
y_pred_dnn_mix = dnn_mix_model.predict(X_test_n)

In [None]:
display_metrics(y_test_origin, np.argmax(y_pred_dnn_sd, axis = 1) , labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.88
Macro Recall: 0.87


  _warn_prf(average, modifier, msg_start, len(result))


Macro F1-score: 0.87

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report



  _warn_prf(average, modifier, msg_start, len(result))


                            precision    recall  f1-score   support

                    BENIGN       1.00      0.99      0.99    105019
                       Bot       0.86      0.99      0.92       280
                      DDoS       0.98      1.00      0.99     19271
             DoS GoldenEye       1.00      0.94      0.97      1542
                  DoS Hulk       1.00      0.99      0.99     34547
          DoS Slowhttptest       0.99      0.99      0.99       828
             DoS slowloris       0.98      0.97      0.98       834
               FTP-Patator       1.00      0.99      0.99      1178
                Heartbleed       1.00      1.00      1.00         2
              Infiltration       1.00      1.00      1.00         1
                  PortScan       0.96      0.99      0.98     23846
               SSH-Patator       0.98      0.98      0.98       826
  Web Attack � Brute Force       0.75      0.82      0.78       209
Web Attack � Sql Injection       0.00      0.00

### Define a filter for each model

In [None]:
def filter(pred, list):
    for i in list:
        pred[:, i] = 0
    return pred

In [None]:
random_list=[2,4,8,9,13]
tree_list = [2,3,4,6,8,9]
dnn_n_list = [2,4,13]
dnn_sd_list = [4,6,8,9,11,13]

In [None]:
y_pred_tree_cat = to_categorical(y_pred_tree, 15)

In [None]:
random_pred_filter=filter(y_pred, random_list)
tree_pred_filter=filter(y_pred_tree_cat, tree_list)
dnn_n_pred_filter=filter(y_pred_dnn_n, dnn_n_list)
dnn_sd_pred_filter=filter(y_pred_dnn_sd, dnn_sd_list)

In [None]:
dnn_n_pred_filter

array([[2.25684058e-28, 1.23421603e-26, 0.00000000e+00, ...,
        5.93876203e-32, 0.00000000e+00, 1.30338280e-19],
       [3.07559069e-30, 2.64064061e-34, 0.00000000e+00, ...,
        2.03038143e-33, 0.00000000e+00, 9.98061425e-21],
       [6.42797315e-10, 4.76133356e-15, 0.00000000e+00, ...,
        3.35292570e-26, 0.00000000e+00, 1.11194267e-17],
       ...,
       [1.00000000e+00, 3.83469262e-23, 0.00000000e+00, ...,
        1.61605880e-33, 0.00000000e+00, 5.29033394e-24],
       [1.17106474e-07, 5.25993962e-12, 0.00000000e+00, ...,
        7.37398901e-23, 0.00000000e+00, 1.31422943e-15],
       [1.00000000e+00, 1.80899039e-24, 0.00000000e+00, ...,
        1.73941646e-32, 0.00000000e+00, 2.67943355e-26]], dtype=float32)

In [None]:
results = np.zeros((y_test_origin.shape[0], 15))

results = results+y_pred*10 + y_pred_dnn_n + y_pred_dnn_sd + y_pred_dnn_mix + y_pred_tree_cat*10

In [None]:
display_metrics(y_test_origin, np.argmax(results, axis = 1) , labels_d)


Accuracy: 0.81

Micro Precision: 0.81
Micro Recall: 0.81
Micro F1-score: 0.81

Macro Precision: 0.56
Macro Recall: 0.78
Macro F1-score: 0.58

Weighted Precision: 0.76
Weighted Recall: 0.81
Weighted F1-score: 0.78

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.94      1.00      0.97    105019
                       Bot       0.95      0.96      0.96       280
                      DDoS       0.88      0.97      0.93     19271
             DoS GoldenEye       0.98      0.96      0.97      1542
                  DoS Hulk       0.00      0.00      0.00     34547
          DoS Slowhttptest       0.29      0.99      0.44       828
             DoS slowloris       0.24      0.98      0.38       834
               FTP-Patator       0.99      1.00      0.99      1178
                Heartbleed       0.05      1.00      0.09         2
              Infiltration       0.00      0.00      0.00         1
             

In [None]:
ensemble = np.zeros((y_test_origin.shape[0], 15))

ensemble = ensemble+random_pred_filter*10+dnn_sd_pred_filter + y_pred_dnn_mix + dnn_n_pred_filter+tree_pred_filter*10

In [None]:
display_metrics(y_test_origin, np.argmax(ensemble, axis = 1) , labels_d)


Accuracy: 0.81

Micro Precision: 0.81
Micro Recall: 0.81
Micro F1-score: 0.81

Macro Precision: 0.56
Macro Recall: 0.78
Macro F1-score: 0.58

Weighted Precision: 0.76
Weighted Recall: 0.81
Weighted F1-score: 0.78

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.94      1.00      0.97    105019
                       Bot       0.95      0.96      0.96       280
                      DDoS       0.88      0.97      0.93     19271
             DoS GoldenEye       0.98      0.96      0.97      1542
                  DoS Hulk       0.00      0.00      0.00     34547
          DoS Slowhttptest       0.29      0.99      0.44       828
             DoS slowloris       0.24      0.98      0.38       834
               FTP-Patator       0.99      1.00      0.99      1178
                Heartbleed       0.05      1.00      0.09         2
              Infiltration       0.00      0.00      0.00         1
             

## Conclusion

 