# Defending ML IDS against an evasion attack using adversarial training

In this notebook we implement a defence against an evasion attack that targets an ML-based intrusion detection system (IDS).

The program consists of two main steps:
1. performing an evasion attack: crafting adversarial samples for a Random Forest model;
1. adversarial training: extending the original dataset with correctly labeled adversarial samples and training a new, more robust model on the extended training set.

**Summary of the implementation of the main steps of the program:**

*Finding adversarial samples*
1. A Random Forest model is trained on the CICIDS2017 dataset (on the subset with web attacks: web_attacks_balanced.csv).
1. The performance of the model is evaluated on a test set.
1. For every sample that the model correctly labels as an attack, the value of the "Total Length of Fwd Packets" feature is modified within a given range.
1. If the model changes its prediction for the sample with the modified "Total Length of Fwd Packets" feature, this sample is adversarial (it misleads the model).
1. A second test set containing the adversarial samples is formed, and the model is evaluated on it. The performance is expected to decline; even a single adversarial sample gives an attacker a way past the detector.

*Defending against the evasion attack*
1. Adversarial samples are labeled as an "attack" and added to the original dataset, which is then split again into training and test sets.
1. A new model is trained on the new training set.
1. The performance of the adversarially trained model is evaluated on the new test set. The metrics are expected to be close to those of the original model before the attack, because training on adversarial samples makes the model more robust to this evasion attack.

Training data: https://github.com/fisher85/ml-cybersecurity/blob/master/python-web-attack-detection/datasets/web_attacks_balanced.zip

The training dataset is a balanced dataset based on CICIDS2017: https://www.unb.ca/cic/datasets/ids-2017.html

## Data preprocessing

In [1]:
import math
import pickle

import numpy as np
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

Download the dataset from GitHub to Google Colab and unzip it.

In [None]:
!wget "https://github.com/fisher85/ml-cybersecurity/blob/master/python-web-attack-detection/datasets/web_attacks_balanced.zip?raw=true" -O dataset.zip
!unzip -u dataset.zip

Load our dataset. We use the balanced dataset based on CICIDS2017 (see the description of this balanced dataset in the previous work: https://ispranproceedings.elpub.ru/jour/article/view/1348/1147).

In [2]:
df = pd.read_csv('web_attacks_balanced.csv')
df

| | Flow ID | Source IP | Source Port | Destination IP | Destination Port | Protocol | Timestamp | Flow Duration | Total Fwd Packets | Total Backward Packets | ... | min_seg_size_forward | Active Mean | Active Std | Active Max | Active Min | Idle Mean | Idle Std | Idle Max | Idle Min | Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60803 | 1261 | 39923.0 | 1599 | 53.0 | 17.0 | 181 | 350.0 | 4.0 | 4.0 | ... | 32.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| 1 | 69607 | 1265 | 3480.0 | 1599 | 53.0 | 17.0 | 181 | 176.0 | 2.0 | 2.0 | ... | 32.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| 2 | 33770 | 1256 | 16043.0 | 1599 | 53.0 | 17.0 | 181 | 151.0 | 2.0 | 2.0 | ... | 32.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| 3 | 69711 | 1265 | 49221.0 | 1599 | 53.0 | 17.0 | 181 | 163.0 | 2.0 | 2.0 | ... | 32.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| 4 | 69659 | 1265 | 41529.0 | 1599 | 53.0 | 17.0 | 181 | 163.0 | 2.0 | 2.0 | ... | 32.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7262 | 18877 | 1257 | 62969.0 | 2409 | 443.0 | 6.0 | 134 | 6037341.0 | 8.0 | 6.0 | ... | 20.0 | 283421.0 | 0.000 | 283421.0 | 283421.0 | 5753917.0 | 0.000 | 5753917.0 | 5753917.0 | BENIGN |
| 7263 | 41113 | 1257 | 63397.0 | 1599 | 53.0 | 17.0 | 134 | 157.0 | 2.0 | 2.0 | ... | 20.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |
| 7264 | 18825 | 1257 | 63000.0 | 2308 | 443.0 | 6.0 | 134 | 117745833.0 | 36.0 | 34.0 | ... | 20.0 | 220289.5 | 682382.033 | 2387143.0 | 22952.0 | 9591863.0 | 1403991.926 | 10000000.0 | 5133889.0 | BENIGN |
| 7265 | 13528 | 1263 | 60419.0 | 1590 | 53.0 | 17.0 | 134 | 47819.0 | 1.0 | 1.0 | ... | 20.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.000 | 0.0 | 0.0 | BENIGN |


Prepare feature vectors and labels:
* transform categorical labels into numeric form with simple label encoding: "1" for samples with attacks and "0" for benign samples;
* select the 10 most important features (see https://ispranproceedings.elpub.ru/jour/article/view/1348/1147).
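
The label encoding step can be illustrated on a toy frame. This is a minimal sketch: the attack names below are placeholders rather than values read from the dataset, and the comparison-based encoding is just a vectorized equivalent of a per-row lambda. The point is that every label other than `BENIGN` collapses to "1".

```python
import pandas as pd

# Toy labels; the attack names are placeholders, not values taken from the dataset.
toy = pd.DataFrame({'Label': ['BENIGN', 'Attack type A', 'BENIGN', 'Attack type B']})

# Every label other than 'BENIGN' becomes 1, 'BENIGN' becomes 0.
toy['Label'] = (toy['Label'] != 'BENIGN').astype(int)

encoded = toy['Label'].tolist()
print(encoded)
```

Collapsing all attack types into one class turns the task into binary detection, which is why the metrics later in the notebook can use the `binary` averaging mode.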

In [3]:
# Labels corresponding to attacks are marked as "1", benign labels - as "0".
df['Label'] = df['Label'].apply(lambda x: 0 if x == 'BENIGN' else 1)

# The 10 most important features.
webattack_features = ['Average Packet Size',
                      'Flow Bytes/s',
                      'Max Packet Length',
                      'Fwd IAT Min',
                      'Fwd Packet Length Mean',
                      'Total Length of Fwd Packets',
                      'Flow IAT Mean',
                      'Fwd IAT Std',
                      'Fwd Packet Length Max',
                      'Fwd Header Length']

In [4]:
# Inspect these 10 features.
df[webattack_features]

| | Average Packet Size | Flow Bytes/s | Max Packet Length | Fwd IAT Min | Fwd Packet Length Mean | Total Length of Fwd Packets | Flow IAT Mean | Fwd IAT Std | Fwd Packet Length Max | Fwd Header Length |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32.625000 | 6.628571e+05 | 29.0 | 1.0 | 29.000000 | 116.0 | 5.000000e+01 | 1.131651e+02 | 29.0 | 128.0 |
| 1 | 80.000000 | 1.568182e+06 | 94.0 | 3.0 | 44.000000 | 88.0 | 5.866667e+01 | 0.000000e+00 | 44.0 | 64.0 |
| 2 | 80.000000 | 1.827815e+06 | 94.0 | 3.0 | 44.000000 | 88.0 | 5.033333e+01 | 0.000000e+00 | 44.0 | 64.0 |
| 3 | 94.250000 | 2.000000e+06 | 112.0 | 3.0 | 51.000000 | 102.0 | 5.433333e+01 | 0.000000e+00 | 51.0 | 64.0 |
| 4 | 80.000000 | 1.693252e+06 | 94.0 | 3.0 | 44.000000 | 88.0 | 5.433333e+01 | 0.000000e+00 | 44.0 | 64.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7262 | 327.500000 | 7.594403e+02 | 1460.0 | 3.0 | 46.500000 | 372.0 | 4.644108e+05 | 2.157444e+06 | 191.0 | 172.0 |
| 7263 | 69.000000 | 1.503185e+06 | 78.0 | 3.0 | 40.000000 | 80.0 | 5.233333e+01 | 0.000000e+00 | 40.0 | 40.0 |
| 7264 | 471.428571 | 2.802647e+02 | 3620.0 | 3.0 | 109.666667 | 3948.0 | 1.706461e+06 | 4.652067e+06 | 901.0 | 732.0 |
| 7265 | 110.500000 | 3.555072e+03 | 119.0 | 0.0 | 51.000000 | 51.0 | 4.781900e+04 | 0.000000e+00 | 51.0 | 20.0 |


In [5]:
# Get a target vector of the training set.
y = df['Label'].values
# Get a feature matrix of the training set.
X = df[webattack_features].values
# Show shapes of the target vector and the feature matrix.
print(X.shape, y.shape)

(7267, 10) (7267,)


Split the dataset into a training set and a test set.

In [6]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, random_state=42)

In [7]:
X_test.shape

(1817, 10)
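
The test-set size above follows from simple arithmetic. As a sketch, assuming scikit-learn rounds a fractional `test_size` up to the next whole sample (which is consistent with the shape printed above), the split sizes can be reproduced with the hypothetical helper `split_sizes`:

```python
import math

def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# 7267 samples with test_size=0.25, as in the split above.
sizes = split_sizes(7267)
print(sizes)
```

With 7267 samples, 25% is 1816.75, so 1817 rows go to the test set and the remaining 5450 to the training set.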

## Training the initial Random Forest model

Use the previously trained model for repeatability, or run the next two cells to train and save a new Random Forest model.

In [None]:
RFmodel = RandomForestClassifier(max_depth=5, n_estimators=5, max_features=3)
RFmodel.fit(X_train, y_train)

In [None]:
with open('RFmodel.sav', 'wb') as f:
    pickle.dump(RFmodel, f)
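
The classifier above is created without a `random_state`, so each retraining produces a slightly different forest, which is presumably why the notebook relies on a pickled model. As an alternative sketch, fixing the seed makes training itself repeatable. The seed value 0 and the synthetic data below are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; it only serves to show that the seed fixes the result.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 10))
y_toy = (X_toy[:, 5] > 0).astype(int)

def train(seed):
    """Train a forest with the notebook's hyperparameters and a fixed seed."""
    model = RandomForestClassifier(
        max_depth=5, n_estimators=5, max_features=3, random_state=seed)
    return model.fit(X_toy, y_toy)

# Two independent trainings with the same seed give identical probabilities.
same = np.array_equal(train(0).predict_proba(X_toy), train(0).predict_proba(X_toy))
print(same)
```

Pickling is still useful when the exact model from a published run must be reused, because a seed only guarantees identical results for the same library version and data.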

Load a previously trained model.

In [8]:
with open('RFmodel.sav', 'rb') as f:
    RFmodel = pickle.load(f)

Get the Random Forest model's evaluation metrics for the test data.

In [9]:
y_pred = RFmodel.predict(X_test)
y_pred.shape

(1817,)

In [10]:
matrix = confusion_matrix(y_test, y_pred)
matrix

array([[1249,   16],
       [  37,  515]], dtype=int64)

We use the following function to get evaluation metrics.

In [11]:
def print_metrics(y_eval, y_pred, average='binary'):
    accuracy = accuracy_score(y_eval, y_pred)
    precision = precision_score(y_eval, y_pred, average=average)
    recall = recall_score(y_eval, y_pred, average=average)
    f1 = f1_score(y_eval, y_pred, average=average)

    print('Accuracy =', accuracy)
    print('Precision =', precision)
    print('Recall =', recall)
    print('F1 =', f1)

The initial Random Forest model's evaluation metrics for the test data: <a id='metrics'></a>

In [12]:
print_metrics(y_test, y_pred)

Accuracy = 0.9708310401761144
Precision = 0.9698681732580038
Recall = 0.9329710144927537
F1 = 0.9510618651892891
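
The four metrics can be cross-checked by hand against the confusion matrix printed earlier. The sketch below assumes scikit-learn's layout, in which rows are true classes and columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`.

```python
# Confusion matrix of the initial model on the test set: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = 1249, 16, 37, 515

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)          # share of flagged flows that are real attacks
recall = tp / (tp + fn)             # share of real attacks that are flagged
f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Recall is the metric to watch in the rest of the notebook: every attack that evades detection moves from TP to FN and lowers it.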


## Evasion attack

We modify the "Total Length of Fwd Packets" feature of the samples to implement the evasion attack.

First, inspect non-zero values of the "Total Length of Fwd Packets" feature for samples with the "attack" type.

In [13]:
for i in range(0, X_test.shape[0]):
    # The index of the "Total Length of Fwd Packets" feature is 5.
    if (X_test[i, 5] > 0) and (y_test[i] == 1):
        print('#', i, '=>', X_test[i, 5])

# 14 => 383.0
# 47 => 48783.0
# 62 => 43951.0
# 84 => 5927.0
# 94 => 383.0
# 129 => 383.0
# 260 => 460.0
# 281 => 602.0
# 356 => 43951.0
# 530 => 43951.0
# 560 => 43906.0
# 571 => 43951.0
# 646 => 43951.0
# 657 => 383.0
# 715 => 43951.0
# 728 => 43906.0
# 779 => 43951.0
# 817 => 43906.0
# 1166 => 383.0
# 1196 => 341.0
# 1355 => 43951.0
# 1467 => 43951.0
# 1484 => 43906.0
# 1519 => 383.0
# 1524 => 48783.0
# 1585 => 599.0
# 1593 => 43906.0
# 1618 => 43951.0
# 1737 => 602.0
# 1753 => 43906.0
# 1758 => 602.0
# 1759 => 48695.0
# 1804 => 2972.0
# 1810 => 383.0
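
The same inspection can be written without an explicit index loop by using a boolean mask. This is a sketch on toy arrays; `X_toy` and `y_toy` are stand-ins for `X_test` and `y_test`, and their values are illustrative only.

```python
import numpy as np

FEATURE_INDEX = 5  # position of "Total Length of Fwd Packets" in the feature list

# Toy stand-ins for X_test and y_test.
X_toy = np.zeros((4, 10))
X_toy[:, FEATURE_INDEX] = [0.0, 383.0, 602.0, 120.0]
y_toy = np.array([1, 1, 1, 0])

# Attack samples with a non-zero feature value.
mask = (X_toy[:, FEATURE_INDEX] > 0) & (y_toy == 1)
indices = np.flatnonzero(mask)
for i in indices:
    print('#', i, '=>', X_toy[i, FEATURE_INDEX])
```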


The following function implements the evasion attack and searches for adversarial versions of the given samples. It returns two values: a copy of the feature matrix in which each successfully attacked sample is replaced by its adversarial version, and the indices of those samples.

The function works as follows:
* only attack samples that the model correctly labels as an attack are processed;
* for each of these samples, the value of the "Total Length of Fwd Packets" feature is set in turn to every integer in the range \[original value, original value + 500);
* if the model changes its prediction for the sample with the modified "Total Length of Fwd Packets" feature, this sample is adversarial. The search for that sample stops, and the function prints the index and the new "Total Length of Fwd Packets" value of this sample.

In [14]:
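def evasion_attack(samples, labels, model):
    """Search for adversarial versions of correctly detected attack samples.

    Returns a copy of `samples` in which every evaded sample is replaced by
    its adversarial version, and the indices of those samples.
    """
    evasion_samples = samples.copy()
    sample_index = np.empty((0), dtype=int)

    for i in range(0, samples.shape[0]):
        # Process only attack samples that the model currently detects.
        if (labels[i] == 1) and (model.predict(samples[[i]])[0] == 1):
            # Fancy indexing returns a copy, so the original sample stays intact.
            evasion_sample = samples[[i]]
            j = math.ceil(samples[i, 5])
            # Try each value in [j, j + 500) for "Total Length of Fwd Packets" (index 5).
            for total_length_fwd_packets in range(j, j + 500):
                evasion_sample[0, 5] = total_length_fwd_packets
                pred = model.predict(evasion_sample)
                if pred[0] < 1:
                    # The prediction flipped to "benign": keep the first evading value.
                    print(i, total_length_fwd_packets)
                    sample_index = np.append(sample_index, i)
                    evasion_samples[i, 5] = total_length_fwd_packets
                    break
    return evasion_samples, sample_index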
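
The search calls `model.predict` once per candidate value, which is slow for larger datasets. A possible variant, sketched below, scores all 500 candidates of a sample in a single call. The function name `evasion_attack_batched`, its extra parameters, and the `ThresholdModel` stub are hypothetical; the stub's threshold of 606 merely echoes the values printed in this notebook and does not describe the real forest.

```python
import numpy as np

class ThresholdModel:
    """Hypothetical stand-in classifier: predicts "attack" (1) while feature 5 is below 606."""
    def predict(self, X):
        return (np.asarray(X)[:, 5] < 606).astype(int)

def evasion_attack_batched(samples, labels, model, feature_index=5, search_width=500):
    """Same search as the loop-based attack, with one predict call per sample."""
    evasion_samples = samples.copy()
    found = []
    base_pred = model.predict(samples)
    # Only attack samples that the model currently detects are processed.
    for i in np.flatnonzero((labels == 1) & (base_pred == 1)):
        start = int(np.ceil(samples[i, feature_index]))
        # One row per candidate value in [start, start + search_width).
        candidates = np.repeat(samples[[i]], search_width, axis=0)
        candidates[:, feature_index] = np.arange(start, start + search_width)
        evading = np.flatnonzero(model.predict(candidates) == 0)
        if evading.size:
            # Keep the smallest value that flips the prediction.
            evasion_samples[i, feature_index] = candidates[evading[0], feature_index]
            found.append(i)
    return evasion_samples, np.array(found, dtype=int)

X_toy = np.zeros((3, 10))
X_toy[:, 5] = [383.0, 50.0, 602.0]
y_toy = np.array([1, 1, 0])
X_adv, idx = evasion_attack_batched(X_toy, y_toy, ThresholdModel())
print(idx, X_adv[idx, 5])
```

In the toy run, the first sample evades the stub, the second cannot reach the threshold within 500 steps, and the third is skipped because it is benign.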

Find adversarial samples for the test set.

In [15]:
X_test_evasion_attack, evasion_sample_index = evasion_attack(
    X_test, y_test, RFmodel)

14 606
94 606
129 606
260 606
281 606
657 606
1166 606
1519 606
1737 606
1758 606
1810 606


We found adversarial samples for the original samples with the following indices:

In [16]:
evasion_sample_index

array([  14,   94,  129,  260,  281,  657, 1166, 1519, 1737, 1758, 1810])

The next two cells compare one original sample with its adversarial replacement.

In [17]:
print("An original sample from the test set:\n", X_test[[94]])
pred = RFmodel.predict(X_test[[94]])
print("A prediction for the original sample: ", pred[0])

An original sample from the test set:
 [[9.41250000e+01 8.79764779e+01 3.83000000e+02 8.97000000e+02
  9.57500000e+01 3.83000000e+02 1.22272943e+06 2.05109986e+06
  3.83000000e+02 1.36000000e+02]]
A prediction for the original sample:  1


In [18]:
print("An adversarial sample:\n", X_test_evasion_attack[[94]])
y_pred_evasion_attack = RFmodel.predict(X_test_evasion_attack[[94]])
print("A prediction for the adversarial sample: ", y_pred_evasion_attack[0])

An adversarial sample:
 [[9.41250000e+01 8.79764779e+01 3.83000000e+02 8.97000000e+02
  9.57500000e+01 6.06000000e+02 1.22272943e+06 2.05109986e+06
  3.83000000e+02 1.36000000e+02]]
A prediction for the adversarial sample:  0


As we can see, the adversarial sample misleads the model: the classifier changes its answer from "1" (an attack) to "0" (not an attack). It is important to note that the modified sample remains a working attack, which makes it an effective adversarial sample: an attacker can increase the value of the "Total Length of Fwd Packets" feature simply by padding the payload, for example with zeroes or spaces.
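
To make the cost of the attack concrete, the sketch below computes how much padding three of the samples need. The values are copied from the outputs printed above. It assumes that each padding byte raises "Total Length of Fwd Packets" by exactly one, which is a simplification.

```python
# (test-set index, original feature value, evading value), as printed above.
examples = [(14, 383, 606), (260, 460, 606), (281, 602, 606)]

# Padding, in bytes, needed to move each sample to its evading value.
padding = {index: evading - original for index, original, evading in examples}
print(padding)
```

For sample 281, only four extra bytes are enough to change the model's answer.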

Now we inspect the model's evaluation metrics for the test set in which the adversarial samples replace their originals.

In [19]:
y_pred_evasion_attack = RFmodel.predict(X_test_evasion_attack)
y_pred_evasion_attack.shape

(1817,)

In [20]:
matrix = confusion_matrix(y_test, y_pred_evasion_attack)
matrix

array([[1249,   16],
       [  48,  504]], dtype=int64)

In [21]:
print_metrics(y_test, y_pred_evasion_attack)

Accuracy = 0.9647771051183269
Precision = 0.9692307692307692
Recall = 0.9130434782608695
F1 = 0.9402985074626865


As we can see, the performance metrics degrade after the attack ([see original metrics of the model](#metrics)): the 11 adversarial samples that replace their originals in the test set are now classified as benign, so recall falls from about 0.933 to about 0.913.
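
The effect of the attack can be read directly from the two confusion matrices shown above. The sketch below subtracts them, assuming the `[[TN, FP], [FN, TP]]` layout used by scikit-learn.

```python
import numpy as np

before = np.array([[1249, 16], [37, 515]])  # original test set
after = np.array([[1249, 16], [48, 504]])   # test set with adversarial samples

diff = after - before
flipped = int(diff[1, 0])  # attacks that moved from detected (TP) to missed (FN)

recall_before = before[1, 1] / before[1].sum()
recall_after = after[1, 1] / after[1].sum()
print(diff.tolist(), flipped, recall_before, recall_after)
```

Only the attack row changes, and the number of flipped samples equals the number of adversarial samples found for the test set.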

## Defence with adversarial training

To defend our model against the implemented evasion attack, we find adversarial samples for the whole dataset and perform adversarial training with them.

In [22]:
X_evasion_attack, evasion_sample_index_ = evasion_attack(X, y, RFmodel)

495 624
647 606
674 606
731 606
790 606
822 606
848 606
918 606
950 606
979 606
1014 606
1045 606
1069 606
1097 606
1151 606
1198 606
1251 606
1280 606
1353 606
1380 606
1410 606
1440 606
1482 606
1513 606
1553 606
1580 606
1621 606
1675 606
1698 606
1748 606
1803 606
1827 606
1862 606
1969 606
1995 606
2033 606
2080 606
2136 606
2186 606
2211 606
2234 606
2284 606
2311 606
2349 606
2386 606
2422 606
2451 606
2522 606
2582 606
2612 606
2645 606
2674 606
2711 606
2762 606
2797 606
2839 606
2871 606
2905 606
2954 606
3029 606
3059 606
3095 606
3140 606
3180 606
3211 606
3273 606
3332 606
3366 606
3407 606
3448 606
3481 606
5085 606
5087 606
5092 606
5126 606
5130 606
5151 606
5197 606


We found adversarial samples for the original samples with the following indices:

In [23]:
evasion_sample_index_

array([ 495,  647,  674,  731,  790,  822,  848,  918,  950,  979, 1014,
       1045, 1069, 1097, 1151, 1198, 1251, 1280, 1353, 1380, 1410, 1440,
       1482, 1513, 1553, 1580, 1621, 1675, 1698, 1748, 1803, 1827, 1862,
       1969, 1995, 2033, 2080, 2136, 2186, 2211, 2234, 2284, 2311, 2349,
       2386, 2422, 2451, 2522, 2582, 2612, 2645, 2674, 2711, 2762, 2797,
       2839, 2871, 2905, 2954, 3029, 3059, 3095, 3140, 3180, 3211, 3273,
       3332, 3366, 3407, 3448, 3481, 5085, 5087, 5092, 5126, 5130, 5151,
       5197])

Get the model's evaluation metrics for the full dataset in which the adversarial samples replace their originals.

In [24]:
y_pred_evasion_attack_ = RFmodel.predict(X_evasion_attack)
y_pred_evasion_attack_.shape

(7267,)

In [25]:
matrix = confusion_matrix(y, y_pred_evasion_attack_)
matrix

array([[5036,   51],
       [ 198, 1982]], dtype=int64)

In [26]:
print_metrics(y, y_pred_evasion_attack_)

Accuracy = 0.9657355167194165
Precision = 0.9749139203148057
Recall = 0.9091743119266055
F1 = 0.9408972228815571


Now, extend the original dataset with adversarial samples that are correctly labeled as an "attack".

In [27]:
# Append all adversarial samples at once and label each of them as an attack ("1").
X_defence = np.vstack([X, X_evasion_attack[evasion_sample_index_]])
y_defence = np.concatenate(
    [y, np.ones(len(evasion_sample_index_), dtype=y.dtype)])

In [28]:
print("Shapes of the original dataset: ", X.shape, y.shape)
print("Shapes of the extended dataset: ", X_defence.shape, y_defence.shape)

Shapes of the original dataset:  (7267, 10) (7267,)
Shapes of the extended dataset:  (7345, 10) (7345,)


Split the extended dataset into a training set and a test set.

In [29]:
X_train_defence, X_test_defence, y_train_defence, y_test_defence = train_test_split(
    X_defence, y_defence, test_size=0.25, shuffle=True, random_state=42)

In [30]:
X_train_defence.shape

(5508, 10)

In [31]:
X_test_defence.shape

(1837, 10)

Finally, train a new model on the new training set.

In [None]:
RFmodel_defence = RandomForestClassifier(
    max_depth=5, n_estimators=5, max_features=3)
RFmodel_defence.fit(X_train_defence, y_train_defence)

Save the adversarially trained model for repeatability.

In [None]:
with open('RFmodel_defence.sav', 'wb') as f:
    pickle.dump(RFmodel_defence, f)

Load the adversarially trained model.

In [32]:
with open('RFmodel_defence.sav', 'rb') as f:
    RFmodel_defence = pickle.load(f)

In [33]:
y_pred_defence = RFmodel_defence.predict(X_test_defence)

In [34]:
matrix = confusion_matrix(y_test_defence, y_pred_defence)
matrix

array([[1259,   13],
       [  42,  523]], dtype=int64)

In [35]:
print_metrics(y_test_defence, y_pred_defence)

Accuracy = 0.9700598802395209
Precision = 0.9757462686567164
Recall = 0.9256637168141593
F1 = 0.9500454132606722


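As we can see, the performance metrics are nearly restored to their values before the evasion attack ([see original metrics of the model](#metrics)): F1 is about 0.950, compared with about 0.951 for the original model and about 0.940 under attack. Because the new test set comes from a fresh split of the extended dataset, the comparison is indicative rather than exact. With that in mind, the results suggest that the implemented defence, i.e. adversarial training, increased the robustness of the model against this evasion attack.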
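
A more direct way to check robustness is to rerun the attack against the retrained model and count how many samples still evade it. The sketch below shows the idea with a hypothetical helper, `count_evasions`, and two hypothetical stand-in models; the thresholds 606 and 5000 are illustrative assumptions, not properties of the real forests.

```python
import numpy as np

FEATURE = 5  # index of "Total Length of Fwd Packets" in the feature matrix

class ThresholdModel:
    """Hypothetical stand-in: predicts "attack" (1) while FEATURE is below `limit`."""
    def __init__(self, limit):
        self.limit = limit

    def predict(self, X):
        return (np.asarray(X)[:, FEATURE] < self.limit).astype(int)

def count_evasions(model, X, y, width=500):
    """Count detected attack samples that evade `model` when FEATURE grows by less than `width`."""
    evaded = 0
    for i in np.flatnonzero((y == 1) & (model.predict(X) == 1)):
        start = int(np.ceil(X[i, FEATURE]))
        # One row per candidate value in [start, start + width).
        candidates = np.repeat(X[[i]], width, axis=0)
        candidates[:, FEATURE] = np.arange(start, start + width)
        evaded += int((model.predict(candidates) == 0).any())
    return evaded

X_toy = np.zeros((3, 10))
X_toy[:, FEATURE] = [383.0, 460.0, 602.0]
y_toy = np.ones(3, dtype=int)

before = count_evasions(ThresholdModel(606), X_toy, y_toy)   # stand-in for the original model
after = count_evasions(ThresholdModel(5000), X_toy, y_toy)   # stand-in for the retrained model
print(before, after)
```

In the notebook, the analogous calls would be `count_evasions(RFmodel, X_test, y_test)` and `count_evasions(RFmodel_defence, X_test, y_test)`; a lower count for the second model would support the robustness claim.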