<a href="https://colab.research.google.com/github/fwangliberty/AIoTDesign-Frontend/blob/master/ensembling_DNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensembling DNN, Random Forest and DecisionTree based on Normalization and Standardization Datasets


We will use the same DNN model, a Random Forest, and a Decision Tree to classify network anomalies in the CICIDS2017 dataset. More specifically, the dataset has been augmented by adding 7 new connection-based features. We will use normalized and standardized datasets to train the DNN model. The Random Forest model is not sensitive to the normalization method. Each model will then be evaluated on the test set prepared with the corresponding scaling method. After that, we put the models in an ensemble and evaluate it. It is expected that the ensemble will perform better on the test set than any single model in the ensemble.

There are many different types of ensembles; stacking is one of them. It is one of the more general types and can theoretically represent any other ensemble technique. Stacking involves training a learning algorithm to combine the predictions of several other learning algorithms. For the sake of this example, I will use one of the simplest forms of Stacking, which involves taking an average of outputs of models in the ensemble. Since averaging doesn't take any parameters, there is no need to train this ensemble (only its models).
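
Written out, the averaging rule looks as follows. The symbols are notation introduced here for illustration: $M$ is the number of models, $p_m(k \mid x)$ is the probability that model $m$ assigns to class $k$ for input $x$, and $K$ is the number of classes.

$$
\bar{p}(k \mid x) = \frac{1}{M} \sum_{m=1}^{M} p_m(k \mid x),
\qquad
\hat{y}(x) = \arg\max_{k \in \{1, \dots, K\}} \bar{p}(k \mid x)
$$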

## Preparing the data
First, import dependencies.

In [None]:
from keras.callbacks import History
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.engine import training
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Activation, Average
from keras.losses import categorical_crossentropy
from keras.models import Model, Input
from keras.optimizers import Adam
from keras.utils import to_categorical
from tensorflow.python.framework.ops import Tensor
from typing import Tuple, List
import glob
import numpy as np
import os
from os.path import join
import pandas as pd
import time
import seaborn as sns
import matplotlib.pyplot as plt


### Define Metrics

In [None]:
def display_metrics(y_test, y_pred, label_names):
  print('\nAccuracy: {:.2f}\n'.format(accuracy_score(y_test, y_pred)))

  print('Micro Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='micro')))
  print('Micro Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='micro')))
  print('Micro F1-score: {:.2f}\n'.format(f1_score(y_test, y_pred, average='micro')))

  print('Macro Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='macro')))
  print('Macro Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='macro')))
  print('Macro F1-score: {:.2f}\n'.format(f1_score(y_test, y_pred, average='macro')))

  print('Weighted Precision: {:.2f}'.format(precision_score(y_test, y_pred, average='weighted')))
  print('Weighted Recall: {:.2f}'.format(recall_score(y_test, y_pred, average='weighted')))
  print('Weighted F1-score: {:.2f}'.format(f1_score(y_test, y_pred, average='weighted')))

  print('\nClassification Report\n')
  print(classification_report(y_test, y_pred, target_names=label_names))
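
The three averaging modes can disagree sharply on imbalanced data, which matters for the per-class results reported later. The toy labels below are made up for illustration; only `recall_score` from scikit-learn is used.

```python
from sklearn.metrics import recall_score

# Hypothetical toy data: 8 samples of class 0 and 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
# The classifier gets every class-0 sample right but misses one class-1 sample.
y_pred = [0] * 8 + [0, 1]

# Micro: pool all decisions, so the majority class dominates.
micro = recall_score(y_true, y_pred, average='micro')
# Macro: unweighted mean of per-class recall, so rare classes count equally.
macro = recall_score(y_true, y_pred, average='macro')
# Weighted: per-class recall weighted by class support.
weighted = recall_score(y_true, y_pred, average='weighted')

print(micro, macro, weighted)
```

Here the per-class recalls are 1.0 and 0.5, so the macro average drops to 0.75 while the micro and weighted averages stay at 0.9.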

In [None]:
def make_value2index(attacks):
    #make dictionary
    attacks = sorted(attacks)
    d = {}
    counter=0
    for attack in attacks:
        d[attack] = counter
        counter+=1
    return d

In [None]:
# changes labels from strings to integer indices
def encode_label(Y_str):
    labels_d = make_value2index(np.unique(Y_str))
    Y = [labels_d[y_str] for y_str in Y_str]
    return np.array(Y)
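
As a cross-check, NumPy can produce the same sorted-label encoding in one call. This is an alternative sketch, not the notebook's own helper, and the sample labels are made up.

```python
import numpy as np

# Hypothetical label column for illustration.
Y_str = np.array(['DDoS', 'BENIGN', 'Bot', 'BENIGN', 'DDoS'])

# np.unique returns the sorted unique labels; return_inverse gives, for each
# element, its index into that sorted array.
classes, Y = np.unique(Y_str, return_inverse=True)

print(classes)  # sorted label names
print(Y)        # integer code per sample
```

Because `encode_label` derives its mapping from whichever array it is given, it yields consistent indices across the train, validation, and test splits only if every split contains all classes.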

# 1. Locating CSV files

In [None]:
# All columns
col_names = np.array(['dst sport count', 'src dport count', 'dst src count', 'dport count', 'sport count', 'dst host count','src host count','Source Port', 'Destination Port',
                      'Protocol', 'Flow Duration', 'Total Fwd Packets', 'Total Backward Packets', 'Total Length of Fwd Packets',
                      'Total Length of Bwd Packets', 'Fwd Packet Length Max', 'Fwd Packet Length Min', 'Fwd Packet Length Mean',
                      'Fwd Packet Length Std', 'Bwd Packet Length Max', 'Bwd Packet Length Min', 'Bwd Packet Length Mean', 'Bwd Packet Length Std',
                      'Flow Bytes/s', 'Flow Packets/s', 'Flow IAT Mean', 'Flow IAT Std', 'Flow IAT Max', 'Flow IAT Min', 'Fwd IAT Total',
                      'Fwd IAT Mean', 'Fwd IAT Std', 'Fwd IAT Max', 'Fwd IAT Min', 'Bwd IAT Total', 'Bwd IAT Mean', 'Bwd IAT Std', 'Bwd IAT Max',
                      'Bwd IAT Min', 'Fwd PSH Flags', 'Fwd URG Flags', 'Fwd Header Length', 'Bwd Header Length',
                      'Fwd Packets/s', 'Bwd Packets/s', 'Min Packet Length', 'Max Packet Length', 'Packet Length Mean', 'Packet Length Std',
                      'Packet Length Variance', 'FIN Flag Count', 'SYN Flag Count', 'RST Flag Count', 'PSH Flag Count', 'ACK Flag Count',
                      'URG Flag Count', 'CWE Flag Count', 'ECE Flag Count', 'Down/Up Ratio', 'Average Packet Size', 'Avg Fwd Segment Size',
                      'Avg Bwd Segment Size','Subflow Fwd Packets', 'Subflow Fwd Bytes',
                      'Subflow Bwd Packets', 'Subflow Bwd Bytes', 'Init_Win_bytes_forward', 'Init_Win_bytes_backward',
                      'act_data_pkt_fwd', 'min_seg_size_forward', 'Active Mean', 'Active Std', 'Active Max', 'Active Min', 'Idle Mean',
                      'Idle Std', 'Idle Max', 'Idle Min', 'Label'])

### Option 1. Connect to Google Drive 

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
train_path='/content/drive/My Drive/CICIDS2017/train_set_ext78_2.csv'

In [None]:
validation_path = '/content/drive/My Drive/CICIDS2017/crossval_set_ext78_2.csv'
test_path = '/content/drive/My Drive/CICIDS2017/test_set_ext78_2.csv'

### Option 2. Connect to Local Machine

In [None]:
train_path = '../data/cicids2017clean/train_set_ext78_2.csv'
validation_path = '../data/cicids2017clean/crossval_set_ext78_2.csv'
test_path = '../data/cicids2017clean/test_set_ext78_2.csv'

# 2. Loading CSV Datasets

In [None]:
# load three csv files generated by mlp4nids (Multi-layer perceptron for network intrusion detection )
# first load the train set
df_train = pd.read_csv(train_path,names=col_names, skiprows=1)  

In [None]:
print('Train set size: ', df_train.shape)

Train set size:  (879589, 79)


In [None]:
df_test = pd.read_csv(test_path, names=col_names, skiprows=1)  
print('Test set size: ', df_test.shape)

df_val = pd.read_csv(validation_path,names=col_names, skiprows=1)  
print('Validation set size: ', df_val.shape)

Test set size:  (188483, 79)
Validation set size:  (188484, 79)


# 3. Encoding Datasets

### Encoding train dataset

In [None]:
df_label = df_train['Label']
data = df_train.drop(columns=['Label'])
Xtrain = data.values
y_train = encode_label(df_label.values)

### Encoding test dataset

In [None]:
df_label = df_test['Label']
data = df_test.drop(columns=['Label'])
Xtest = data.values
y_test = encode_label(df_label.values)

### Encoding validation dataset

In [None]:
df_label = df_val['Label']
data = df_val.drop(columns=['Label'])
Xval = data.values
y_val = encode_label(df_label.values)

# 4. Normalization

The values of the datasets are normalized using the Min-Max scaling technique, bringing them all within a range of [0,1].

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler = MinMaxScaler()
X_train_n = scaler.fit_transform(Xtrain)
X_train_n

array([[0.01010101, 1.        , 1.        , ..., 0.68650794, 0.71416667,
        0.10166667],
       [0.01010101, 0.22222222, 0.12121212, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.01010101, 0.02020202, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.12121212, 0.12121212, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.87878788, 0.87878788, ..., 0.        , 0.        ,
        0.        ],
       [1.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ]])

In [None]:
X_val_n = scaler.fit_transform(Xval)
X_val_n

array([[0.        , 0.32323232, 0.32323232, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.05050505, 0.05050505, ..., 0.        , 0.        ,
        0.        ],
       [0.02020202, 0.09090909, 0.02020202, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 1.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.96969697, ..., 0.        , 0.        ,
        0.        ]])

In [None]:
X_test_n = scaler.fit_transform(Xtest)
X_test_n

array([[0.        , 0.96969697, 0.96969697, ..., 0.        , 0.71583333,
        0.71583333],
       [0.        , 0.96969697, 0.96969697, ..., 0.        , 0.6975    ,
        0.6975    ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.44444444, 0.08080808, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.78787879, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.28282828, 0.28282828, ..., 0.        , 0.        ,
        0.        ]])
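
The cells above call `fit_transform` separately on the training, validation, and test sets, so each split is scaled with its own minimum and maximum. A more common practice is to fit the scaler on the training data only and reuse it for the other splits. A minimal sketch on made-up arrays (the `_demo` names are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for Xtrain and Xtest.
X_train_demo = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test_demo = np.array([[2.5, 40.0]])

scaler_demo = MinMaxScaler()
# Learn per-feature min and max from the training split only.
X_train_scaled = scaler_demo.fit_transform(X_train_demo)
# Apply the same training-set statistics to unseen data.
X_test_scaled = scaler_demo.transform(X_test_demo)

print(X_train_scaled)
print(X_test_scaled)
```

With this approach test values can fall outside [0, 1], as the second feature does here; that is expected. The same pattern applies to `StandardScaler`.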

# 4.2 User Defined Normalization

In [182]:
# normalization
def normalize(data):
    data = data.astype(np.float32)
       
    eps = 1e-15

    mask = data==-1
    data[mask]=0
    mean_i = np.mean(data,axis=0)
    min_i = np.min(data,axis=0) # missing values (-1) were set to 0 above so that -1 does not become the minimum
    max_i = np.max(data,axis=0)

    r = max_i-min_i+eps
    data = (data-mean_i)/r  # zero centered 

    #deal with missing features -1
    data[mask] = 0        
    return data
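
In formula form, `normalize` applies a mean-centered range scaling to each feature column $j$. Entries equal to $-1$ are treated as missing: they are set to 0 before the column statistics are computed and reset to 0 afterwards. The symbols $\mu_j$, $\max_j$, and $\min_j$ are notation introduced here for the column mean, maximum, and minimum.

$$
x'_{ij} = \frac{x_{ij} - \mu_j}{\max_j - \min_j + \epsilon}, \qquad \epsilon = 10^{-15}
$$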

In [None]:
minmax_scaling(Xtrain, columns=col_norm)

# 5. Standardization

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler()

X_train_sd = scaler.fit_transform(Xtrain)
X_val_sd = scaler.fit_transform(Xval)
X_test_sd = scaler.fit_transform(Xtest)

X_train_sd

array([[-0.29659729,  1.51710965,  1.13640038, ...,  8.4581104 ,
         2.22140524, -0.0393603 ],
       [-0.29659729, -0.36301636, -0.83451004, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [-0.33534125, -0.87577799, -1.06105146, ..., -0.13390608,
        -0.45746381, -0.43398332],
       ...,
       [-0.33534125, -0.60718856, -0.83451004, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [-0.33534125,  1.224103  ,  0.86455067, ..., -0.13390608,
        -0.45746381, -0.43398332],
       [ 3.50031109, -0.90019522,  1.13640038, ..., -0.13390608,
        -0.45746381, -0.43398332]])

# 6. One-hot Encoding for labels

In [None]:
from tensorflow.keras.utils import to_categorical

In [None]:
y_train_origin = y_train
y_test_origin = y_test
y_val_origin = y_val

In [None]:
y_train = to_categorical(y_train, 15)
y_test = to_categorical(y_test, 15)
y_val = to_categorical(y_val, 15)
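
For reference, one-hot encoding and its inverse can be sketched in plain NumPy. This illustrates what `to_categorical` produces, not how it is implemented, and the sample labels are made up. The inverse step is the same `np.argmax(..., axis=1)` used later to turn model outputs back into class indices.

```python
import numpy as np

num_classes = 15
# Hypothetical integer labels.
y_int = np.array([0, 3, 14])

# Row i of the identity matrix is the one-hot vector for class i.
y_onehot = np.eye(num_classes, dtype=np.float32)[y_int]

# argmax along the class axis recovers the integer labels.
y_back = np.argmax(y_onehot, axis=1)

print(y_onehot.shape)  # (3, 15)
print(y_back)
```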

# 7.  Define the Metrics

In [None]:
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

#importing confusion matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

from sklearn import metrics
from sklearn.metrics import accuracy_score

#importing accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [None]:
METRICS = [
      tf.keras.metrics.TruePositives(name='tp'),
      tf.keras.metrics.FalsePositives(name='fp'),
      tf.keras.metrics.TrueNegatives(name='tn'),
      tf.keras.metrics.FalseNegatives(name='fn'), 
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
      tf.keras.metrics.AUC(name='auc'),
]

In [None]:
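# Full column passed: values are last sorted positions, not class indices; only the keys (class names) are used below.
labels_d = make_value2index(df_test['Label'])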

In [None]:
print(labels_d)

{'BENIGN': 105018, 'Bot': 105298, 'DDoS': 124569, 'DoS GoldenEye': 126111, 'DoS Hulk': 160658, 'DoS Slowhttptest': 161486, 'DoS slowloris': 162320, 'FTP-Patator': 163498, 'Heartbleed': 163500, 'Infiltration': 163501, 'PortScan': 187347, 'SSH-Patator': 188173, 'Web Attack � Brute Force': 188382, 'Web Attack � Sql Injection': 188389, 'Web Attack � XSS': 188482}


# First model: Random Forest

### The first model is a Random Forest, an ensemble of decision trees.

In [None]:
randomforest = RandomForestClassifier(n_estimators=10, random_state=10)
randomforest.fit(X_train_n,y_train)
    
y_pred = randomforest.predict(X_test_n)

In [None]:
display_metrics(y_test_origin, np.argmax(y_pred, axis = 1), labels_d)


Accuracy: 0.92

Micro Precision: 0.92
Micro Recall: 0.92
Micro F1-score: 0.92

Macro Precision: 0.91
Macro Recall: 0.75
Macro F1-score: 0.80

Weighted Precision: 0.93
Weighted Recall: 0.92
Weighted F1-score: 0.91

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.88      1.00      0.93    105019
                       Bot       0.94      0.18      0.31       280
                      DDoS       1.00      0.91      0.95     19271
             DoS GoldenEye       1.00      0.92      0.96      1542
                  DoS Hulk       1.00      1.00      1.00     34547
          DoS Slowhttptest       1.00      0.99      0.99       828
             DoS slowloris       1.00      0.90      0.95       834
               FTP-Patator       1.00      0.53      0.69      1178
                Heartbleed       1.00      1.00      1.00         2
              Infiltration       1.00      1.00      1.00         1
                  PortScan       1.00      0.51      0.68     23846
               SSH-Patator       1.00      0.99      1.00       826
  Web Attack � Brute Force       0.91      0.75      0.82       209
Web Attack � Sql Injection       0.00      0.00
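
One thing to note about the cell above: `y_train` is one-hot encoded at this point, so scikit-learn treats the problem as multi-label, and `predict` can return an all-zero row when no class output fires. `np.argmax` maps such a row to index 0 (BENIGN), which may contribute to the perfect BENIGN recall and the low recall for some attack classes. A sketch of the alternative, fitting on integer labels and reading class probabilities from `predict_proba`, on synthetic data (all `_demo` names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Synthetic 3-class data standing in for X_train_n / y_train_origin.
X_demo = rng.rand(300, 5)
y_demo = rng.randint(0, 3, size=300)

rf_demo = RandomForestClassifier(n_estimators=10, random_state=10)
# Fit on integer class labels rather than one-hot rows.
rf_demo.fit(X_demo, y_demo)

# predict_proba returns one probability per class; each row sums to 1.
proba_demo = rf_demo.predict_proba(X_demo)
# argmax over a proper probability row always picks a real class.
pred_demo = np.argmax(proba_demo, axis=1)

print(proba_demo.shape)
```

A probability matrix of this shape can also be averaged directly with the softmax outputs of the DNN models.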

### Save the model to disk

In [None]:
import pickle

randomforest_file_name = 'randomforest.sav'
# use a context manager so the file handle is closed after writing
with open(randomforest_file_name, 'wb') as f:
    pickle.dump(randomforest, f)

# Second Model: Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
model_dec = DecisionTreeClassifier()
model_dec.fit(X_train_n, y_train_origin)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [None]:
y_pred_tree = model_dec.predict(X_test_n)

In [None]:
display_metrics(y_test_origin, y_pred_tree, labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.86
Macro Recall: 0.93
Macro F1-score: 0.88

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.99      1.00      1.00    105019
                       Bot       0.69      0.97      0.81       280
                      DDoS       0.97      0.97      0.97     19271
             DoS GoldenEye       0.95      1.00      0.97      1542
                  DoS Hulk       1.00      0.98      0.99     34547
          DoS Slowhttptest       0.95      0.97      0.96       828
             DoS slowloris       0.88      0.89      0.88       834
               FTP-Patator       1.00      1.00      1.00      1178
                Heartbleed       0.67      1.00      0.80         2
              Infiltration       0.50      1.00      0.67         1
             

### Save the model

In [None]:
decisiontree_file_name = 'decisiontree.sav'
# use a context manager so the file handle is closed after writing
with open(decisiontree_file_name, 'wb') as f:
    pickle.dump(model_dec, f)

# Third model: DNN Model

We will train the same DNN model by using the same dataset with different normalization methods.

In [None]:
def make_model(X_train, y_train, output_bias=None):
  if output_bias is not None:
    output_bias = tf.keras.initializers.Constant(output_bias)
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(
          256, activation='relu',
          input_shape=(X_train.shape[-1],)),
      tf.keras.layers.Dense(256, activation ='relu'),
      tf.keras.layers.Dense(128, activation ='relu'),
      tf.keras.layers.Dense(64, activation ='relu'),
      tf.keras.layers.Dense(y_train.shape[-1], activation='softmax',
                         bias_initializer=output_bias),
  ])

  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
      loss=tf.keras.losses.BinaryCrossentropy(),
      metrics=METRICS)
    
  return model
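
`make_model` pairs a softmax output with `BinaryCrossentropy`. For single-label, multi-class targets the more usual pairing is categorical cross-entropy, which scores only the probability assigned to the true class. The NumPy sketch below compares the two quantities on one made-up prediction; it illustrates the formulas and is not the Keras implementation.

```python
import numpy as np

# Hypothetical one-hot target (true class 1) and softmax output for 3 classes.
y_true = np.array([0.0, 1.0, 0.0])
y_prob = np.array([0.2, 0.7, 0.1])

# Categorical cross-entropy: negative log-probability of the true class.
cce = -np.sum(y_true * np.log(y_prob))

# Binary cross-entropy: each class treated as an independent yes/no problem,
# then averaged over the classes.
bce = -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

print(round(cce, 4), round(bce, 4))
```

In Keras the corresponding loss is `tf.keras.losses.CategoricalCrossentropy()`. Switching to it would change the training results reported in this notebook.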

We create two instances of this DNN: one to be trained on the normalized dataset and one on the standardized dataset.

In [None]:
model_dnn_n = make_model(X_train_n, y_train)
model_dnn_sd = make_model(X_train_sd, y_train) 

### Train the first DNN model with normalized dataset

In [None]:
EPOCHS = 100
BATCH_SIZE = 9500

In [None]:
baseline_history_n = model_dnn_n.fit(
    X_train_n,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_n, y_val))

Epoch 1/100
...
Epoch 77/100
Epoch 78

In [None]:
y_pred_n=model_dnn_n.predict(X_test_n)

In [None]:
display_metrics(y_test_origin, np.argmax(y_pred_n, axis = 1), labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.81
Macro Recall: 0.80
Macro F1-score: 0.80

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      0.99      0.99    105019
                       Bot       0.87      0.97      0.92       280
                      DDoS       0.98      1.00      0.99     19271
             DoS GoldenEye       1.00      0.95      0.97      1542
                  DoS Hulk       1.00      0.99      0.99     34547
          DoS Slowhttptest       0.99      0.99      0.99       828
             DoS slowloris       0.98      0.98      0.98       834
               FTP-Patator       0.99      0.99      0.99      1178
                Heartbleed       1.00      1.00      1.00         2
              Infiltration       0.00      0.00      0.00         1
                  PortScan       0.97      0.99      0.98     23846
               SSH-Patator       0.99      0.98      0.98       826
  Web Attack � Brute Force       0.75      0.78      0.76       209
Web Attack � Sql Injection       0.00      0.00

### Save the model

In [None]:
model_dnn_n.save('dnn_n.h5')

### Train the second DNN model with standardized dataset

In [None]:
EPOCHS = 80
BATCH_SIZE = 10000

In [None]:
baseline_history_sd = model_dnn_sd.fit(
    X_train_sd,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_sd, y_val))

Epoch 1/80
...
Epoch 80/80


In [None]:
y_pred_sd = model_dnn_sd.predict(X_test_sd)

In [None]:
display_metrics(y_test_origin, np.argmax(y_pred_sd, axis = 1), labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.74
Macro Recall: 0.71
Macro F1-score: 0.72

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       0.99      0.99      0.99    105019
                       Bot       0.83      0.98      0.90       280
                      DDoS       1.00      1.00      1.00     19271
             DoS GoldenEye       0.99      0.99      0.99      1542
                  DoS Hulk       1.00      1.00      1.00     34547
          DoS Slowhttptest       0.99      0.99      0.99       828
             DoS slowloris       0.99      0.99      0.99       834
               FTP-Patator       0.99      0.79      0.88      1178
                Heartbleed       0.00      0.00      0.00         2
              Infiltration       0.00      0.00      0.00         1
                  PortScan       0.94      1.00      0.97     23846
               SSH-Patator       0.93      0.48      0.63       826
  Web Attack � Brute Force       0.78      0.81      0.79       209
Web Attack � Sql Injection       0.00      0.00

### Save the trained model

In [None]:
model_dnn_sd.save('dnn_sd.h5')

# Training DNN with mixed data normalization

## 1. Load the DNN model trained with normalized dataset

In [None]:
from keras.models import load_model

In [174]:
dnn_n_model = load_model('dnn_n.h5')

In [175]:
dnn_n_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 256)               20224     
_________________________________________________________________
dense_11 (Dense)             (None, 256)               65792     
_________________________________________________________________
dense_12 (Dense)             (None, 128)               32896     
_________________________________________________________________
dense_13 (Dense)             (None, 64)                8256      
_________________________________________________________________
dense_14 (Dense)             (None, 15)                975       
Total params: 128,143
Trainable params: 128,143
Non-trainable params: 0
_________________________________________________________________


In [176]:
# first mark every layer as trainable and list the layer names

for layer in dnn_n_model.layers:
    layer.trainable = True
    print(layer.name)

dense_10
dense_11
dense_12
dense_13
dense_14


In [177]:
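# freeze the first two Dense layers so that only the later layers are updated;
# the Keras documentation recommends calling compile() again after changing `trainable`
layer = dnn_n_model.get_layer('dense_10')
layer.trainable=False

layer = dnn_n_model.get_layer('dense_11')
layer.trainable=False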

## 2. Training the model by using standardized dataset

In [178]:
EPOCHS = 80
BATCH_SIZE = 9000

In [179]:
history_sd = dnn_n_model.fit(
    X_train_sd,
    y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_val_sd, y_val))

Epoch 1/80
...
Epoch 80/80


In [180]:
y_pred_mix = dnn_n_model.predict(X_test_sd)

In [None]:
y_pred_mix

array([[5.5653794e-13, 4.0668070e-31, 6.2066779e-16, ..., 2.2391216e-15,
        5.3001617e-20, 1.0403518e-14],
       [1.9565431e-18, 1.0100094e-31, 2.6302912e-22, ..., 1.4751772e-16,
        4.0375356e-21, 1.4432327e-16],
       [4.9642357e-10, 1.3548063e-13, 8.1766635e-11, ..., 3.0086602e-13,
        1.7612661e-14, 1.9550812e-10],
       ...,
       [1.0000000e+00, 1.5166569e-36, 7.0435135e-36, ..., 1.4443171e-32,
        3.0466214e-29, 1.1239785e-30],
       [8.9370695e-07, 1.7262206e-10, 6.1920286e-10, ..., 1.7661256e-10,
        1.6983684e-11, 1.2976760e-08],
       [1.0000000e+00, 2.4259917e-33, 2.2282519e-35, ..., 4.5881440e-30,
        8.0983207e-28, 2.6309154e-28]], dtype=float32)

In [181]:
display_metrics(y_test_origin, np.argmax(y_pred_mix, axis = 1), labels_d)


Accuracy: 0.99

Micro Precision: 0.99
Micro Recall: 0.99
Micro F1-score: 0.99

Macro Precision: 0.74
Macro Recall: 0.73
Macro F1-score: 0.74

Weighted Precision: 0.99
Weighted Recall: 0.99
Weighted F1-score: 0.99

Classification Report

                            precision    recall  f1-score   support

                    BENIGN       1.00      0.99      0.99    105019
                       Bot       0.88      0.96      0.92       280
                      DDoS       1.00      1.00      1.00     19271
             DoS GoldenEye       0.99      0.99      0.99      1542
                  DoS Hulk       1.00      1.00      1.00     34547
          DoS Slowhttptest       1.00      0.99      0.99       828
             DoS slowloris       0.98      0.99      0.98       834
               FTP-Patator       1.00      0.99      0.99      1178
                Heartbleed       0.00      0.00      0.00         2
              Infiltration       0.00      0.00      0.00         1
                  PortScan       0.96      1.00      0.98     23846
               SSH-Patator       0.96      0.98      0.97       826
  Web Attack � Brute Force       0.70      0.69      0.70       209
Web Attack � Sql Injection       0.00      0.00

#  **Four Model Ensemble**

Now the trained models will be combined in an ensemble.

Here, the four saved models are reloaded from disk.

In [None]:
from keras.models import load_model

In [None]:
randomforest_file_name = 'randomforest.sav'
decisiontree_file_name = 'decisiontree.sav'

In [None]:
# load the Random Forest and Decision Tree models from disk
with open(randomforest_file_name, 'rb') as f:
    randomforest_model = pickle.load(f)
with open(decisiontree_file_name, 'rb') as f:
    decisiontree_model = pickle.load(f)
dnn_n_model = load_model('dnn_n.h5')
dnn_sd_model = load_model('dnn_sd.h5')

In [None]:
models = [randomforest_model, dnn_n_model, dnn_sd_model]

The ensemble model definition is straightforward. It uses the same input layer that is shared between all previous models. In the top layer, the ensemble computes the average of the three models' outputs by using an `Average()` merge layer.

In [None]:
def ensemble(models: List[training.Model], model_input: Tensor) -> training.Model:
    
    outputs = [model.outputs[0] for model in models]
    y = Average()(outputs)
    
    model = Model(model_input, y, name='ensemble')
    
    return model
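
The `Average()` layer can only merge Keras models that share one input tensor. The scikit-learn models, and DNNs that expect differently scaled inputs, do not fit that pattern. One possible workaround, sketched here under that assumption, is to average the predicted class probabilities outside Keras. The helper `average_probabilities` and the sample arrays are hypothetical.

```python
import numpy as np
from typing import List

def average_probabilities(prob_list: List[np.ndarray]) -> np.ndarray:
    """Average per-model class probabilities of shape (n_samples, n_classes)."""
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    return stacked.mean(axis=0)

# Hypothetical outputs of three models for two samples and three classes.
p_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_c = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])

p_avg = average_probabilities([p_a, p_b, p_c])
y_pred_ensemble = np.argmax(p_avg, axis=1)

print(p_avg)
print(y_pred_ensemble)
```

With the notebook's objects this could look like `average_probabilities([randomforest_model.predict_proba(X_test_n), dnn_n_model.predict(X_test_n), dnn_sd_model.predict(X_test_sd)])`, provided the Random Forest was fitted on integer labels so that `predict_proba` returns a single `(n_samples, 15)` array.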

In [None]:
result = randomforest_model.score(X_test_n, y_test)
print(result)

In [None]:
ensemble_model = ensemble(models, model_input)

As expected, the ensemble has a lower error rate than any single model.

In [None]:
evaluate_error(ensemble_model)

0.2049


## Conclusion

To reiterate what was said in the introduction: every model has its own weaknesses. The reasoning behind using an ensemble is that by stacking different models representing different hypotheses about the data, we can find a better hypothesis that is not in the hypothesis space of the models from which the ensemble is built.

By using a very basic ensemble, a much lower error rate was achieved than when a single model was used. This demonstrates the effectiveness of ensembling.

Of course, there are some practical considerations to keep in mind when using an ensemble for your machine learning task. Since ensembling means stacking multiple models together, it also means that the input data needs to be forward-propagated through each model. This increases the amount of compute that needs to be performed and, consequently, the evaluation (prediction) time. Increased evaluation time is not critical if you use an ensemble in research or in a Kaggle competition. However, it is a critical factor when designing a commercial product. Another consideration is the increased size of the final model, which, again, might be a limiting factor for ensemble use in a commercial product.