# Comparison of adversarial attacks 

In this notebook we compare the effectiveness of four adversarial attacks that target an ML-based intrusion detection system (IDS).

The following attacks are compared:
- HopSkipJump (HSJA);
- ZOO;
- SignOPT;
- LT-Attack.

We use a scikit-learn Random Forest classifier with the Adversarial Robustness Toolbox (ART) implementations of the ZOO, HSJA, and SignOPT attacks, and an XGBoost Random Forest classifier with LT-Attack.

Training data: https://github.com/fisher85/ml-cybersecurity/blob/master/python-web-attack-detection/datasets/web_attacks_balanced.zip

The training dataset is a balanced dataset based on CICIDS2017: https://www.unb.ca/cic/datasets/ids-2017.html
    
Sources:
- Chen P.-Y., Zhang H., Sharma Y., Yi J., Hsieh C.-J. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec '17). Association for Computing Machinery, New York, NY, USA, 2017, pp. 15–26.
- Chen J., Jordan M. I., Wainwright M. J. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2020, pp. 1277–1294. doi: 10.1109/SP40000.2020.00045.
- Cheng M., Singh S., Chen P. H., Chen P.-Y., Liu S., Hsieh C.-J. Sign-OPT: A Query-Efficient Hard-label Adversarial Attack. International Conference on Learning Representations (ICLR), 2020, 16 p.
- https://adversarial-robustness-toolbox.readthedocs.io/en/latest/modules/attacks/evasion.htm
- Zhang C., Zhang H., Hsieh C.-J. An Efficient Adversarial Attack for Tree Ensembles. Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 16165-16176.
- https://github.com/chong-z/tree-ensemble-attack

In [None]:
%pip install adversarial-robustness-toolbox

## Data preprocessing

In [1]:
import pickle
import time
import warnings

from art.attacks.evasion import HopSkipJump, SignOPTAttack, ZooAttack
from art.estimators.classification import SklearnClassifier
import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
import sklearn.preprocessing
import xgboost as xgb

warnings.filterwarnings('ignore')

Download the dataset from GitHub to Google Colab and unzip it.

In [None]:
!wget "https://github.com/fisher85/ml-cybersecurity/blob/master/python-web-attack-detection/datasets/web_attacks_balanced.zip?raw=true" -O dataset.zip
!unzip -u dataset.zip

Load the dataset. We use the balanced dataset based on CICIDS2017, which is described in our previous work: https://ispranproceedings.elpub.ru/jour/article/view/1348/1147

In [2]:
df = pd.read_csv('web_attacks_balanced.csv')
df

Unnamed: 0,Flow ID,Source IP,Source Port,Destination IP,Destination Port,Protocol,Timestamp,Flow Duration,Total Fwd Packets,Total Backward Packets,...,min_seg_size_forward,Active Mean,Active Std,Active Max,Active Min,Idle Mean,Idle Std,Idle Max,Idle Min,Label
0,60803,1261,39923.0,1599,53.0,17.0,181,350.0,4.0,4.0,...,32.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
1,69607,1265,3480.0,1599,53.0,17.0,181,176.0,2.0,2.0,...,32.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
2,33770,1256,16043.0,1599,53.0,17.0,181,151.0,2.0,2.0,...,32.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
3,69711,1265,49221.0,1599,53.0,17.0,181,163.0,2.0,2.0,...,32.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
4,69659,1265,41529.0,1599,53.0,17.0,181,163.0,2.0,2.0,...,32.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7262,18877,1257,62969.0,2409,443.0,6.0,134,6037341.0,8.0,6.0,...,20.0,283421.0,0.000,283421.0,283421.0,5753917.0,0.000,5753917.0,5753917.0,BENIGN
7263,41113,1257,63397.0,1599,53.0,17.0,134,157.0,2.0,2.0,...,20.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN
7264,18825,1257,63000.0,2308,443.0,6.0,134,117745833.0,36.0,34.0,...,20.0,220289.5,682382.033,2387143.0,22952.0,9591863.0,1403991.926,10000000.0,5133889.0,BENIGN
7265,13528,1263,60419.0,1590,53.0,17.0,134,47819.0,1.0,1.0,...,20.0,0.0,0.000,0.0,0.0,0.0,0.000,0.0,0.0,BENIGN


The dataset contains 4 classes:

In [3]:
df['Label'].unique()

array(['BENIGN', 'Web Attack – Brute Force', 'Web Attack – XSS',
       'Web Attack – Sql Injection'], dtype=object)

Transform categorical labels into numeric form with simple label encoding.

In [4]:
df['Label'] = df['Label'].apply(lambda x: 0 if x == 'BENIGN'
                                else 1 if x == 'Web Attack – XSS'
                                else 2 if x == 'Web Attack – Brute Force'
                                else 3 if x == 'Web Attack – Sql Injection'
                                else 4)
df['Label'].unique()

array([0, 2, 1, 3])
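The same encoding can also be written as a lookup table, which keeps the label-to-code pairs in one place. This is only a sketch on a small hypothetical series: `LABEL_CODES`, `UNKNOWN_CODE`, and `labels_demo` are illustrative names, and the codes mirror the ones chosen in the cell above.

```python
import pandas as pd

# Same label-to-code pairs as in the cell above (names are illustrative).
LABEL_CODES = {
    'BENIGN': 0,
    'Web Attack – XSS': 1,
    'Web Attack – Brute Force': 2,
    'Web Attack – Sql Injection': 3,
}
UNKNOWN_CODE = 4  # fallback for labels that are not in the table

# Hypothetical toy labels, including one unknown value.
labels_demo = pd.Series(['BENIGN',
                         'Web Attack – Brute Force',
                         'Web Attack – XSS',
                         'Web Attack – Sql Injection',
                         'Something else'])

# map() yields NaN for labels missing from the dict, so fill them with the fallback.
encoded_demo = labels_demo.map(LABEL_CODES).fillna(UNKNOWN_CODE).astype(int)
print(encoded_demo.tolist())
```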

Select the 10 most important features (see https://ispranproceedings.elpub.ru/jour/article/view/1348/1147).

In [5]:
webattack_features = ['Average Packet Size',
                      'Flow Bytes/s',
                      'Max Packet Length',
                      'Fwd IAT Min',
                      'Fwd Packet Length Mean',
                      'Total Length of Fwd Packets',
                      'Flow IAT Mean',
                      'Fwd IAT Std',
                      'Fwd Packet Length Max',
                      'Fwd Header Length']
df[webattack_features]

Unnamed: 0,Average Packet Size,Flow Bytes/s,Max Packet Length,Fwd IAT Min,Fwd Packet Length Mean,Total Length of Fwd Packets,Flow IAT Mean,Fwd IAT Std,Fwd Packet Length Max,Fwd Header Length
0,32.625000,6.628571e+05,29.0,1.0,29.000000,116.0,5.000000e+01,1.131651e+02,29.0,128.0
1,80.000000,1.568182e+06,94.0,3.0,44.000000,88.0,5.866667e+01,0.000000e+00,44.0,64.0
2,80.000000,1.827815e+06,94.0,3.0,44.000000,88.0,5.033333e+01,0.000000e+00,44.0,64.0
3,94.250000,2.000000e+06,112.0,3.0,51.000000,102.0,5.433333e+01,0.000000e+00,51.0,64.0
4,80.000000,1.693252e+06,94.0,3.0,44.000000,88.0,5.433333e+01,0.000000e+00,44.0,64.0
...,...,...,...,...,...,...,...,...,...,...
7262,327.500000,7.594403e+02,1460.0,3.0,46.500000,372.0,4.644108e+05,2.157444e+06,191.0,172.0
7263,69.000000,1.503185e+06,78.0,3.0,40.000000,80.0,5.233333e+01,0.000000e+00,40.0,40.0
7264,471.428571,2.802647e+02,3620.0,3.0,109.666667,3948.0,1.706461e+06,4.652067e+06,901.0,732.0
7265,110.500000,3.555072e+03,119.0,0.0,51.000000,51.0,4.781900e+04,0.000000e+00,51.0,20.0


Get the target vector and the feature matrix for the whole dataset.

In [6]:
y = df['Label'].values
X = df[webattack_features].values

print(X.shape, y.shape)

(7267, 10) (7267,)


Split the dataset into a training set and a test set.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=True, random_state=42)

The LT-Attack implementation takes normalized data in LIBSVM format as input, so we need to scale the features and save the normalized training and test sets in LIBSVM format.

We use min-max normalization: each selected feature is transformed into the range [0, 1].
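As a rough illustration of the min-max formula x' = (x - min) / (max - min), here is a small NumPy sketch. The helper `min_max_scale` and the toy values are hypothetical and are not part of the notebook's pipeline; mapping constant columns to 0 is a choice made for this sketch.

```python
import numpy as np

def min_max_scale(X):
    """Scale every column of X to [0, 1] with x' = (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns have zero range; divide by 1 so they map to 0 instead of NaN.
    safe_range = np.where(col_range > 0, col_range, 1.0)
    return (X - col_min) / safe_range

# Hypothetical toy values, not taken from the dataset.
X_demo = np.array([[29.0, 128.0, 7.0],
                   [94.0, 64.0, 7.0],
                   [159.0, 96.0, 7.0]])
X_scaled = min_max_scale(X_demo)
print(X_scaled)
```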

Uncomment and use the following code to get the normalized training and test sets in LIBSVM format or download an archive with the already prepared data.

In [8]:
#scaler = sklearn.preprocessing.MinMaxScaler()

#dtrain = scaler.fit_transform(X_train)
#dtest = scaler.fit_transform(X_test)

#dump_svmlight_file(dtest, y_test, 'web_attacks_libsvm_normalized.test',
#                   zero_based=True, comment=None, query_id=None, multilabel=False)
#dump_svmlight_file(dtrain, y_train, 'web_attacks_libsvm_normalized.train',
#                   zero_based=True, comment=None, query_id=None, multilabel=False)
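For readers unfamiliar with the LIBSVM text format: each line holds a label followed by sparse `index:value` pairs. The sketch below is a simplified, hypothetical formatter (`format_libsvm_row` is not a library function) meant only to show the layout; the files written by `dump_svmlight_file` may format numbers differently.

```python
def format_libsvm_row(label, features, zero_based=True):
    """Format one sample as a LIBSVM-style line: '<label> <index>:<value> ...'."""
    offset = 0 if zero_based else 1
    # The format is sparse: features equal to zero are simply left out.
    pairs = [f'{i + offset}:{value:g}'
             for i, value in enumerate(features) if value != 0]
    return ' '.join([str(label)] + pairs)

# Hypothetical sample with class label 2 and three scaled features.
line = format_libsvm_row(2, [0.5, 0.0, 1.0])
print(line)  # 2 0:0.5 2:1
```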

Download the archive with the prepared data from GitHub to Google Colab and unzip it. The archive also contains the saved models, the attack instances, and all data needed for the LT-Attack.

In [None]:
!wget "https://github.com/fisher85/ml-cybersecurity/blob/master/adversarial-attacks/data_for_comparison.zip?raw=true" -O data_for_comparison.zip
!unzip -u data_for_comparison.zip

## Training a Random Forest model for ZOO, HSJA, SignOPT

We use a scikit-learn Random Forest classifier with the ART implementations of the ZOO, HSJA, and SignOPT attacks.

For repeatability, use the previously trained model, or uncomment and run the following code to train and save a new Random Forest model.

In [None]:
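# max_features='sqrt' is equivalent to the former 'auto' value, which recent scikit-learn releases no longer accept.
#model = RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2,
#                               min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt',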
#                               max_leaf_nodes=None, min_impurity_decrease=0.0,
#                               bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0,
#                               warm_start=False, class_weight=None)
#model.fit(X=X_train, y=y_train)

In [None]:
#with open('model_for_comparison.sav', 'wb') as f:
#    pickle.dump(model, f)

Load the previously trained model.

In [9]:
with open('model_for_comparison.sav', 'rb') as f:
    model = pickle.load(f)

Get the Random Forest model's evaluation metrics for the test data.

In [10]:
y_pred = model.predict(X_test)
y_pred.shape

(1817,)

In [11]:
matrix = confusion_matrix(y_test, y_pred)
matrix

array([[1245,    1,   19,    0],
       [   6,   57,   94,    0],
       [  24,   92,  274,    0],
       [   1,    0,    2,    2]])

We use the following function to get evaluation metrics.

In [12]:
def print_metrics(y_eval, y_pred, average='weighted'):
    accuracy = accuracy_score(y_eval, y_pred)
    precision = precision_score(y_eval, y_pred, average=average)
    recall = recall_score(y_eval, y_pred, average=average)
    f1 = f1_score(y_eval, y_pred, average=average)

    print('Accuracy =', accuracy)
    print('Precision =', precision)
    print('Recall =', recall)
    print('F1 =', f1)

In [13]:
print_metrics(y_test, y_pred)

Accuracy = 0.868464501926252
Precision = 0.8660603255692124
Recall = 0.868464501926252
F1 = 0.8668788881263884


Create an ART classifier for the trained model.

In [14]:
art_classifier = SklearnClassifier(model=model)

## ZOO attack

Use a previously created attack instance for repeatability or uncomment and use the following code to create and save a ZOO attack instance.

In [None]:
#zoo = ZooAttack(classifier=art_classifier, confidence=0.0, targeted=False, learning_rate=1e-1, max_iter=100,
#                binary_search_steps=20, initial_const=1e-3, abort_early=True, use_resize=False,
#                use_importance=False, nb_parallel=10, batch_size=1, variable_h=0.25)

In [None]:
#with open('zoo_for_comparison.sav', 'wb') as f:
#    pickle.dump(zoo, f)

Load the previously created attack instance.

In [15]:
with open('zoo_for_comparison.sav', 'rb') as f:
    zoo = pickle.load(f)

Generate adversarial samples and measure elapsed time. This step may take 8 minutes or more, depending on the hardware.

In [16]:
start_time = time.time()
X_test_adv = zoo.generate(X_test)
print(f'Total time: {time.time() - start_time}')


Total time: 470.82162594795227


Get predictions for the generated samples.

In [17]:
y_pred_adv = model.predict(X_test_adv)
y_pred_adv.shape

(1817,)

The generated dataset may contain samples that change the model's prediction between any two classes. We only need the samples that change the prediction from one of the attack classes (1, 2, 3) to the benign class (0).

Here and below, we use the following function to evaluate the model's performance on the test data with the relevant adversarial samples. Because we already have predictions for both the original and the generated test sets, we do not build an actual dataset of relevant adversarial samples; we work only with the predicted labels.

In [18]:
def process_relevant_samples(labels, pred_labels, adv_labels):
    """Print metrics for the test set with only the relevant adversarial samples applied."""
    # Count all adversarial samples, i.e. samples whose prediction changed.
    changed_mask = adv_labels != pred_labels
    print(f'Generated {np.count_nonzero(changed_mask)} adversarial samples'
          f' from {labels.shape[0]} original samples.')

    # Relevant samples change the prediction from an attack class (1, 2, 3) to the benign class (0).
    relevant_examples_mask = changed_mask & (pred_labels != 0) & (adv_labels == 0)

    # Keep the original predictions for the irrelevant samples.
    # np.where returns a new array, so the caller's adv_labels is not modified
    # and the function can be safely re-run on the same inputs.
    adv_labels = np.where(relevant_examples_mask, adv_labels, pred_labels)

    # Count only relevant adversarial samples.
    print(f'Generated {np.count_nonzero(relevant_examples_mask)} relevant adversarial samples'
          f' from {labels.shape[0]} original samples.')

    # Get model's evaluation metrics.
    matrix = confusion_matrix(labels, adv_labels)
    print('Confusion matrix:\n', matrix)
    print_metrics(labels, adv_labels)

In [19]:
process_relevant_samples(y_test, y_pred, y_pred_adv)

Generated 25 adversarial samples from 1817 original samples.
Generated 12 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1251    1   13    0]
 [   7   57   93    0]
 [  28   92  270    0]
 [   2    0    1    2]]
Accuracy = 0.8695652173913043
Precision = 0.8655096637937812
Recall = 0.8695652173913043
F1 = 0.8670681442159682
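The relevance filter used above can be illustrated on a toy example. The arrays below are hypothetical predictions for six samples, not values from the dataset.

```python
import numpy as np

# Hypothetical predictions for six samples (0 = benign, 1-3 = attack classes).
pred_demo = np.array([0, 1, 2, 3, 1, 0])   # predictions for the original samples
adv_demo = np.array([0, 0, 1, 0, 1, 2])    # predictions for the perturbed samples

# Any change of the predicted class counts as an adversarial sample.
changed = adv_demo != pred_demo
# Only changes from an attack class to the benign class are relevant.
relevant = changed & (pred_demo != 0) & (adv_demo == 0)
# Irrelevant samples fall back to the original prediction.
filtered = np.where(relevant, adv_demo, pred_demo)

print(np.count_nonzero(changed), np.count_nonzero(relevant))
print(filtered)
```

Here four predictions change, but only two of them (attack to benign) are kept; the other two revert to the original predictions.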


## HSJA attack

Use a previously created attack instance for repeatability or uncomment and use the following code to create and save a HopSkipJump attack instance.

In [None]:
#hsja = HopSkipJump(classifier=art_classifier, batch_size=64, targeted=False,
#                   norm=2, max_iter=50, max_eval=10000,
#                   init_eval=100, init_size=100, verbose=True)

In [None]:
#with open('hsja_for_comparison.sav', 'wb') as f:
#    pickle.dump(hsja, f)

Load the previously created attack instance.

In [20]:
with open('hsja_for_comparison.sav', 'rb') as f:
    hsja = pickle.load(f)

Generate adversarial samples and measure elapsed time. This step may take 8 minutes or more, depending on the hardware.

In [21]:
start_time = time.time()
X_test_adv = hsja.generate(X_test)
print(f'Total time: {time.time() - start_time}')


Total time: 509.2923903465271


Get predictions for the generated samples.

In [22]:
y_pred_adv = model.predict(X_test_adv)
y_pred_adv.shape

(1817,)

Evaluate the model's performance on the test data with relevant crafted adversarial samples.

In [23]:
process_relevant_samples(y_test, y_pred, y_pred_adv)

Generated 674 adversarial samples from 1817 original samples.
Generated 458 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1265    0    0    0]
 [ 130   16   11    0]
 [ 335   31   24    0]
 [   4    0    0    1]]
Accuracy = 0.7187671986791414
Precision = 0.6872466525663968
Recall = 0.7187671986791414
F1 = 0.6260393875990758
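The weighted metrics hide how many attacks slip through. As a complementary view, the sketch below counts the true attack samples classified as benign, using the confusion matrices printed before and after the HSJA attack. The helper `missed_attacks` is a hypothetical name introduced for this illustration.

```python
import numpy as np

# Confusion matrices from the outputs above (rows = true classes, columns = predictions).
cm_before = np.array([[1245, 1, 19, 0],
                      [6, 57, 94, 0],
                      [24, 92, 274, 0],
                      [1, 0, 2, 2]])
cm_after_hsja = np.array([[1265, 0, 0, 0],
                          [130, 16, 11, 0],
                          [335, 31, 24, 0],
                          [4, 0, 0, 1]])

def missed_attacks(cm):
    """Return (attack samples predicted as benign, total attack samples)."""
    attack_rows = cm[1:]  # rows 1..3 hold the true attack classes
    return int(attack_rows[:, 0].sum()), int(attack_rows.sum())

missed_before, total_attacks = missed_attacks(cm_before)
missed_after, _ = missed_attacks(cm_after_hsja)
print(f'Missed attacks: {missed_before}/{total_attacks} -> {missed_after}/{total_attacks}')
```

In other words, the share of attack samples classified as benign grows from about 5.6% (31 of 552) to about 85.0% (469 of 552).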


## SignOPT attack

Use a previously created attack instance for repeatability or uncomment and use the following code to create and save a SignOPT attack instance.

In [None]:
#signopt = SignOPTAttack(estimator=art_classifier, targeted=False, epsilon=0.001,
#                        num_trial=100, max_iter=1000, query_limit=20000, k=200,
#                        alpha=0.2, beta=0.001, eval_perform=False, batch_size=64, verbose=False)
#signopt.clip_min = None
#signopt.clip_max = None

In [None]:
#with open('signopt_for_comparison.sav', 'wb') as f:
#    pickle.dump(signopt, f)

Load the previously created attack instance.

In [24]:
with open('signopt_for_comparison.sav', 'rb') as f:
    signopt = pickle.load(f)

Generate adversarial samples and measure elapsed time. This step may take considerable time: around 1 hour or more, depending on the hardware.

In [25]:
start_time = time.time()
X_test_adv = signopt.generate(X_test)
print(f'Total time: {time.time() - start_time}')

Total time: 3227.4964702129364


Get predictions for the generated samples.

In [26]:
y_pred_adv = model.predict(X_test_adv)
y_pred_adv.shape

(1817,)

Evaluate the model's performance on the test data with relevant crafted adversarial samples.

In [27]:
process_relevant_samples(y_test, y_pred, y_pred_adv)

Generated 269 adversarial samples from 1817 original samples.
Generated 102 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1265    0    0    0]
 [  22   51   84    0]
 [  88   74  228    0]
 [   3    0    0    2]]
Accuracy = 0.8508530544854155
Precision = 0.833969360490705
Recall = 0.8508530544854155
F1 = 0.8386864812412756


## LT-Attack

LT-Attack implementation works with XGBoost models and provides its own script for training them (https://github.com/chong-z/tree-ensemble-attack/blob/main/random_forest/train_random_forest.py).

We used existing examples of multiclass datasets from this script as a template and changed the following parameters:
* paths to the test set and the training set in LIBSVM format ("test_path": "raw_data/web_attacks_libsvm_normalized.test", "train_path": "raw_data/web_attacks_libsvm_normalized.train");
* path to the trained model ("save_path": "models/web_attacks_rf.json");
* number of classes ("num_class": 4);
* training algorithm ('tree_method': 'hist'). We used 'hist' instead of 'gpu_hist' to train without GPU.

To get the trained Random Forest model, run that script with the prepared LIBSVM datasets and the modifications listed above, or use the prepared model from the "data_for_comparison.zip" archive.

Load the previously trained model.

In [28]:
model = xgb.Booster()
model.load_model('web_attacks_rf.model')

Load the test set.

In [29]:
dtest = xgb.DMatrix('web_attacks_libsvm_normalized.test', silent=True)
dy_test = dtest.get_label()

Get the Random Forest model's evaluation metrics for the test data.

In [30]:
dy_pred = model.predict(dtest, ntree_limit=model.best_ntree_limit)

In [31]:
print_metrics(dy_test, dy_pred)

Accuracy = 0.8866263070996148
Precision = 0.8852895107874651
Recall = 0.8866263070996148
F1 = 0.8548262756875967


In [32]:
matrix = confusion_matrix(dy_test, dy_pred)
matrix

array([[1250,    2,   13,    0],
       [   7,    6,  144,    0],
       [  35,    0,  355,    0],
       [   2,    0,    3,    0]])

The downloaded archive ("data_for_comparison.zip") also contains prepared datasets with the relevant adversarial samples. We modified the original LT-Attack source code so that it saves 3 datasets, one per perturbation norm:
* "adv_examples_1" for the l<sub>1</sub> norm perturbation;
* "adv_examples_2" for the l<sub>2</sub> norm perturbation;
* "adv_examples_inf" for the l<sub>∞</sub> norm perturbation.

Generating the adversarial samples took 42 minutes.

Load prepared datasets and get predictions for them.

In [33]:
dtest_1 = xgb.DMatrix('adv_examples_1', silent=True)
dtest_2 = xgb.DMatrix('adv_examples_2', silent=True)
dtest_inf = xgb.DMatrix('adv_examples_inf', silent=True)

In [34]:
dy_pred_1 = model.predict(dtest_1, ntree_limit=model.best_ntree_limit)
dy_pred_2 = model.predict(dtest_2, ntree_limit=model.best_ntree_limit)
dy_pred_inf = model.predict(dtest_inf, ntree_limit=model.best_ntree_limit)

Evaluate the model's performance on the test data with relevant crafted adversarial samples for the l<sub>1</sub> norm perturbation.

In [35]:
process_relevant_samples(dy_test, dy_pred, dy_pred_1)

Generated 671 adversarial samples from 1817 original samples.
Generated 18 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1250    2   13    0]
 [   7    6  144    0]
 [  53    0  337    0]
 [   2    0    3    0]]
Accuracy = 0.8767198679141442
Precision = 0.8736476151160423
Recall = 0.8767198679141442
F1 = 0.8447812838569833


Evaluate the model's performance on the test data with relevant crafted adversarial samples for the l<sub>2</sub> norm perturbation.

In [36]:
process_relevant_samples(dy_test, dy_pred, dy_pred_2)

Generated 693 adversarial samples from 1817 original samples.
Generated 18 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1250    2   13    0]
 [   7    6  144    0]
 [  53    0  337    0]
 [   2    0    3    0]]
Accuracy = 0.8767198679141442
Precision = 0.8736476151160423
Recall = 0.8767198679141442
F1 = 0.8447812838569833


Evaluate the model's performance on the test data with relevant crafted adversarial samples for the l<sub>∞</sub> norm perturbation.

In [37]:
process_relevant_samples(dy_test, dy_pred, dy_pred_inf)

Generated 536 adversarial samples from 1817 original samples.
Generated 29 relevant adversarial samples from 1817 original samples.
Confusion matrix:
 [[1261    2    2    0]
 [   7    6  144    0]
 [  53    0  337    0]
 [   2    0    3    0]]
Accuracy = 0.8827738029719318
Precision = 0.8772152800050697
Recall = 0.8827738029719318
F1 = 0.8498768531657963


## Conclusion

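The HSJA attack was the most effective: it needed the least time per relevant adversarial sample, about 1.1 s (509 s for 458 samples), compared with roughly 32 s for SignOPT (3227 s for 102 samples) and 39 s for ZOO (471 s for 12 samples). By the same measure, the LT-Attack was the least effective: generation took 42 minutes and yielded at most 29 relevant samples for any single norm.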
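The time-per-sample comparison behind this conclusion can be reproduced from the totals printed in the sections above. The sketch covers only the three ART attacks; the LT-Attack is left out because the notebook reports a single 42-minute figure without a breakdown per norm. The dictionary names are illustrative.

```python
# (total generation time in seconds, number of relevant adversarial samples),
# copied from the outputs of the ZOO, HSJA, and SignOPT sections.
results = {
    'ZOO': (470.82162594795227, 12),
    'HSJA': (509.2923903465271, 458),
    'SignOPT': (3227.4964702129364, 102),
}

# Average time spent per relevant adversarial sample.
seconds_per_sample = {name: total / count for name, (total, count) in results.items()}
fastest = min(seconds_per_sample, key=seconds_per_sample.get)

for name, seconds in sorted(seconds_per_sample.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.2f} s per relevant sample')
print('Fastest:', fastest)
```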