# Attacking an ML classifier for malicious IDS traffic

In this part of the workshop we will generate adversarial samples to fool the classifiers from part 1.

There are many ways that an ML classifier can be manipulated. 

Model evasion attacks use adversarial samples: specially crafted inputs that still belong to their original class in reality (here, still malicious traffic) but are misclassified by the model.
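To make the idea concrete, here is a minimal sketch of evasion against a toy rule. The `toy_ids` function and its threshold are invented for illustration and are not one of the workshop models; the point is only that a small change to one feature flips the predicted label while the packet is still part of the flood.

```python
# Toy stand-in for an IDS classifier (hypothetical, not a workshop model):
# flags a packet as malicious (1) when it arrives too soon after the last one.
THRESHOLD = 0.01  # seconds between packets; illustrative value only


def toy_ids(time_delta: float) -> int:
    """Return 1 (malicious) for rapid-fire packets, else 0 (benign)."""
    return 1 if time_delta < THRESHOLD else 0


original = 0.0005                # flood packet sent 0.5 ms after the previous one
adversarial = THRESHOLD + 0.001  # same packet, sent slightly more slowly

print(toy_ids(original))     # detected as malicious
print(toy_ids(adversarial))  # still part of the flood, but labelled benign
```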

There are many open source implementations of adversarial attacks. Here we will use the [Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox), which is compatible with multiple Python machine learning libraries and supports many types of attack.


In [1]:
from art.attacks.evasion import DecisionTreeAttack, HopSkipJump
from art.estimators.classification import SklearnClassifier
from art.estimators.classification.scikitlearn import ScikitlearnDecisionTreeClassifier

from models import Model
from utils import compare_data, parse_df_for_pcap_validity
import numpy as np

## White Box Attack
 
First, we assume the adversary has full knowledge of the classifier and use the [Decision Tree Attack (Papernot, McDaniel, Goodfellow 2016)](https://arxiv.org/abs/1605.07277), which applies to the decision tree models.
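The core idea of a decision tree attack can be sketched in a few lines: follow the sample to its leaf, walk back up the path looking for a nearby leaf with a different label, then nudge the features just past the thresholds that lead there. The tree, the feature order and the `OFFSET` value below are all invented for illustration, and this simplified version ignores conflicts between constraints on the same feature. It is not ART's implementation.

```python
OFFSET = 0.001  # how far past a threshold to push a feature (assumed value)


def leaf(label):
    return {"label": label}


def split(feature, threshold, left, right):
    # Same convention as scikit-learn: go left when x[feature] <= threshold.
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


def predict(node, x):
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


def find_other_class(node, original_label):
    """Return the (feature, threshold, go_left) steps to a leaf with a different label."""
    if "label" in node:
        return [] if node["label"] != original_label else None
    for go_left in (True, False):
        rest = find_other_class(node["left"] if go_left else node["right"], original_label)
        if rest is not None:
            return [(node["feature"], node["threshold"], go_left)] + rest
    return None


def tree_attack(root, x):
    # Record the decisions taken by x on its way to a leaf.
    path, node = [], root
    while "label" not in node:
        went_left = x[node["feature"]] <= node["threshold"]
        path.append((node, went_left))
        node = node["left"] if went_left else node["right"]
    label, x_adv = node["label"], list(x)
    # Walk back up, trying the branch that was not taken at each split.
    for node, went_left in reversed(path):
        sibling = node["right"] if went_left else node["left"]
        rest = find_other_class(sibling, label)
        if rest is None:
            continue
        steps = [(node["feature"], node["threshold"], not went_left)] + rest
        for feature, threshold, go_left in steps:
            if (x_adv[feature] <= threshold) != go_left:  # constraint not yet met
                x_adv[feature] = threshold - OFFSET if go_left else threshold + OFFSET
        return x_adv
    return x_adv  # no leaf with a different label exists


# Hypothetical tree over [time_delta, IP__ttl]; label 1 = malicious.
toy_tree = split(0, 0.01, split(1, 32.5, leaf(0), leaf(1)), leaf(0))
sample = [0.000515, 64.0]
adv = tree_attack(toy_tree, sample)
print(predict(toy_tree, sample), predict(toy_tree, adv), adv)
```

In this toy tree the TTL ends up just below the 32.5 split, which is the same kind of "threshold minus a small offset" value that a white-box tree attack tends to produce.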

In [2]:
# The adversary has crafted a pcap containing 1 minute of ping flood DDoS traffic
# For the adversary: success = no packets detected, but they will settle for 90% getting through

attack_data_pcap = "datasets/AdversaryPingFlood.pcap"

# load up the stolen IDS classifier
model = Model(None, save_model_name="time_model_dt")

# check how well the model works at detecting the packets so far
print("Classification before adversarial evasion")
target_attack_x, target_attack_y, preds = model.test(attack_data_pcap, malicious=1, return_x_y_preds=True)

save_model_path exists, loading model and config....
DecisionTreeClassifier()
['time_delta', 'IP__ttl', 'Ethernet__type_2048.0', 'Ethernet__type_2054.0', 'Ethernet__type_0.0', 'Ethernet__type_34525.0', 'Ethernet__type_32821.0', 'IP__proto_6.0', 'IP__proto_17.0', 'IP__proto_0.0', 'IP__proto_1.0', 'IP__proto_2.0']
Classification before adversarial evasion
Opening datasets/AdversaryPingFlood.pcap ...
done parsing datasets/AdversaryPingFlood.pcap
-----
Testing acc: 0.94, f1: 0.97, tpr: 0.94, tnr 0.00
[[   0    0]
 [ 114 1886]]
-----
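The reported metrics can be reproduced from the confusion matrix by hand. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels, so the matrix reads `[[TN, FP], [FN, TP]]`) and assumes that ratios with a zero denominator are reported as 0; the `metrics` helper is hypothetical and is not the workshop's `Model.test`.

```python
def metrics(cm):
    """Compute accuracy, F1, TPR and TNR from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    acc = (tp + tn) / total
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall on malicious packets
    tnr = tn / (tn + fp) if tn + fp else 0.0        # no benign packets here
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return acc, f1, tpr, tnr


# Confusion matrix from the output above: 114 flood packets missed, 1886 detected.
acc, f1, tpr, tnr = metrics([[0, 0], [114, 1886]])
print(f"acc: {acc:.2f}, f1: {f1:.2f}, tpr: {tpr:.2f}, tnr {tnr:.2f}")
```

Because the pcap contains only malicious packets, the true negative rate has nothing to measure and accuracy equals the true positive rate.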


In [3]:
# get packets classified as malicious - these are the ones we want to manipulate
detected = preds == 1
target_attack_x, target_attack_y = target_attack_x[detected], target_attack_y[detected]

# add ART wrapper to classifier
art_classifier = ScikitlearnDecisionTreeClassifier(model=model.get_classifier())

# create DecisionTreeAttack instance and pass ART classifier 
dt_attack = DecisionTreeAttack(classifier=art_classifier)

# generate adversarial samples
x_test_adv = dt_attack.generate(x=target_attack_x)

Decision tree attack:   0%|          | 0/1886 [00:00<?, ?it/s]

In [4]:
# Check new classification accuracy
print("Classification after adversarial evasion")
model.test((x_test_adv, np.ones(len(x_test_adv))), malicious=target_attack_x)

Classification after adversarial evasion
-----
Testing acc: 0.00, f1: 0.00, tpr: 0.00, tnr 0.00
[[   0    0]
 [1885    1]]
-----


In [5]:
# Checking for packet validity: compare the differences between the packets
for i, (before, after) in enumerate(zip(target_attack_x, x_test_adv)):
    if i >= 3:
        break
    print("sample", i)
    compare_data(before, after, model.features)

sample 0
   IP__ttl
0   64.000
1   32.499
sample 1
   time_delta  IP__ttl
0    0.000515   64.000
1   -0.000958   32.499
sample 2
   time_delta  IP__proto_1.0
0    0.049520          1.000
1    0.004405          0.499


In [6]:
### fix the "illegal" changes:
x_test_adv = parse_df_for_pcap_validity(x_test_adv, original_data=target_attack_x, columns=model.features)

# compare against original
for i, (before, after) in enumerate(zip(target_attack_x, x_test_adv)):
    if i >= 3:
        break
    print("\n sample", i)
    compare_data(before, after, model.features)

# test new classification accuracy on "fixed" adversarial samples
print("Classification after adversarial evasion + packet validation")
model.test((x_test_adv, np.ones(len(x_test_adv))), malicious=target_attack_x)

sample 0
   IP__ttl
0     64.0
1     32.0
sample 1
   time_delta  IP__ttl
0    0.000515     64.0
1    0.000000     32.0
sample 2
   time_delta  IP__proto_1.0
0    0.049520          1.000
1    0.004405          0.499
Classification after adversarial evasion + packet validation
-----
Testing acc: 0.00, f1: 0.00, tpr: 0.00, tnr 0.00
[[   0    0]
 [1885    1]]
-----
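The kind of repair a validity pass performs can be sketched as follows. This is a hypothetical helper, not the workshop's `parse_df_for_pcap_validity`, and the rules it applies are assumptions: inter-packet time cannot be negative, TTL is an integer between 1 and 255, and one-hot encoded header fields are restored from the original packet because a fractional protocol flag cannot be written to a pcap.

```python
def make_valid(original: dict, adversarial: dict) -> dict:
    """Return a copy of `adversarial` with values a real packet could carry."""
    fixed = dict(adversarial)
    for name, value in adversarial.items():
        if name == "time_delta":
            fixed[name] = max(0.0, value)                        # time cannot run backwards
        elif name == "IP__ttl":
            fixed[name] = float(min(255, max(1, round(value))))  # 8-bit integer field
        elif name.startswith(("Ethernet__type_", "IP__proto_")):
            fixed[name] = original[name]                         # keep the one-hot encoding intact
    return fixed


before = {"time_delta": 0.000515, "IP__ttl": 64.0, "IP__proto_1.0": 1.0}
after = {"time_delta": -0.000958, "IP__ttl": 32.499, "IP__proto_1.0": 0.499}
repaired = make_valid(before, after)
print(repaired)
```

After any repair like this the samples need to be classified again, since snapping a value back to a legal one can move it to the other side of a decision threshold.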


## Black(ish) Box Attack

Now we assume the attacker can only observe the label the IDS outputs. They do not know which algorithm is used, which features are used, or how those features are represented. (In practice we do know a little about how they are represented, hence black-ish.)

Here we use the [HopSkipJump Attack (Chen, Jordan, Wainwright 2019)](https://arxiv.org/abs/1904.02144).
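A decision-based attack such as HopSkipJump only needs the predicted label. One of its building blocks is a binary search along the line between the detected sample and a point that is already misclassified, which finds a sample close to the decision boundary. The sketch below shows only that step; the full attack also estimates a gradient direction at the boundary and repeats. The `oracle` function and its threshold are invented stand-ins for the IDS.

```python
def oracle(x):
    """Hypothetical label-only IDS: 1 = malicious, 0 = benign."""
    return 1 if x[0] < 0.01 else 0


def boundary_search(predict, x_orig, x_start, steps=25):
    """Binary search between x_orig (detected) and x_start (already evades)."""
    original_label = predict(x_orig)
    if predict(x_start) == original_label:
        raise ValueError("x_start must be classified differently from x_orig")
    lo, hi = 0.0, 1.0  # fraction of the way from x_orig towards x_start
    for _ in range(steps):
        mid = (lo + hi) / 2
        candidate = [o + mid * (s - o) for o, s in zip(x_orig, x_start)]
        if predict(candidate) != original_label:
            hi = mid  # still evades: move closer to the original
        else:
            lo = mid  # detected again: back off
    return [o + hi * (s - o) for o, s in zip(x_orig, x_start)]


x_orig = [0.0005, 64.0]  # detected flood packet: [time_delta, IP__ttl]
x_start = [0.5, 64.0]    # any sample the IDS labels benign
x_adv = boundary_search(oracle, x_orig, x_start)
print(oracle(x_orig), oracle(x_adv), x_adv)
```

Each iteration costs one query to the classifier, which is why query efficiency matters for this class of attack.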

In [7]:
model = Model(None, save_model_name="time_model_dt")

# review test accuracy (x_test_adv still holds the white-box adversarial samples from above)
print("Original accuracy")
model.test(x_test_adv, malicious=target_attack_x)

save_model_path exists, loading model and config....
DecisionTreeClassifier()
['time_delta', 'IP__ttl', 'Ethernet__type_2048.0', 'Ethernet__type_2054.0', 'Ethernet__type_0.0', 'Ethernet__type_34525.0', 'Ethernet__type_32821.0', 'IP__proto_6.0', 'IP__proto_17.0', 'IP__proto_0.0', 'IP__proto_1.0', 'IP__proto_2.0']
Original accuracy
-----
Testing acc: 0.00, f1: 0.00, tpr: 0.00, tnr 0.00
[[   0    0]
 [1885    1]]
-----


In [8]:
# create ART wrapper for model
art_classifier = SklearnClassifier(model=model.get_classifier())

# instantiate HopSkipJump and generate adversarial samples
attack = HopSkipJump(classifier=art_classifier)
x_test_adv = attack.generate(x=target_attack_x, y=np.zeros(len(target_attack_x)))

# check new classification accuracy
print("Classification after black-box adversarial evasion")
model.test((x_test_adv, np.ones(len(x_test_adv))), malicious=target_attack_x)

HopSkipJump:   0%|          | 0/1886 [00:00<?, ?it/s]

Classification after black-box adversarial evasion
-----
Testing acc: 0.00, f1: 0.00, tpr: 0.00, tnr 0.00
[[   0    0]
 [1886    0]]
-----


In [9]:
# compare against original
for i, (before, after) in enumerate(zip(target_attack_x, x_test_adv)):
    if i >= 3:
        break
    print("\n sample", i)
    compare_data(before, after, model.features)

sample 0
   time_delta    IP__ttl  Ethernet__type_2048.0  Ethernet__type_2054.0  \
0    0.000000  64.000000               1.000000               0.000000   
1    0.064538  64.026192               0.998389               0.005513   

   Ethernet__type_0.0  Ethernet__type_34525.0  Ethernet__type_32821.0  \
0            0.000000                0.000000                0.000000   
1            0.000506                0.008076                0.003619   

   IP__proto_6.0  IP__proto_17.0  IP__proto_0.0  IP__proto_1.0  IP__proto_2.0  
0       0.000000        0.000000       0.000000       1.000000       0.000000  
1       0.002838        0.004043       0.000051       1.004305       0.005235  
sample 1
   time_delta    IP__ttl  Ethernet__type_2048.0  Ethernet__type_2054.0  \
0    0.000515  64.000000               1.000000               0.000000   
1    0.030747  63.970646               1.005681               0.002443   

   Ethernet__type_0.0  Ethernet__type_32821.0  IP__proto_6.0  IP__proto_17.0

In [10]:
### parse packets for illegal changes 
x_test_adv = parse_df_for_pcap_validity(x_test_adv, target_attack_x, columns=model.features)
    
# test new classification accuracy on "fixed" adversarial samples
print("Classification after adversarial evasion + packet validation")
model.test((x_test_adv, np.ones(len(x_test_adv))), malicious=target_attack_x)

Classification after adversarial evasion + packet validation
-----
Testing acc: 0.00, f1: 0.00, tpr: 0.00, tnr 0.00
[[   0    0]
 [1886    0]]
-----


### Part 2 Exercises

In [11]:
# Try different decision tree models for the white box attack above 
# i.e. replace model name in the first section with one of these: 
decision_tree_models = ["time_model_dt", 
                        "all_except_src_dst_dt", 
                        "all_dt", 
                        "tcp_udp_modbus_icmp_boot_dt", 
                        "src_dst_features_dt", 
                        "IP_features_dt"]
# or use a model you trained in the previous section!

In [12]:
# Try the black-box attack with different models (any algorithm)

In [13]:
# Do the adversarial samples generated for one model also fool another (is there transferability)?
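One way to quantify transferability is the fraction of samples that evade the model they were crafted against and also evade a second model. The sketch below uses plain callables in place of the workshop models; `transfer_rate`, `model_a`, `model_b` and the sample values are all hypothetical.

```python
def transfer_rate(source_predict, target_predict, samples, benign_label=0):
    """Share of samples that fool the source model and also fool the target."""
    fooled_source = [x for x in samples if source_predict(x) == benign_label]
    if not fooled_source:
        return 0.0
    transferred = sum(target_predict(x) == benign_label for x in fooled_source)
    return transferred / len(fooled_source)


def model_a(x):  # toy model keyed on time_delta
    return 1 if x[0] < 0.01 else 0


def model_b(x):  # toy model keyed on IP__ttl
    return 1 if x[1] > 48 else 0


adversarial_samples = [[0.011, 64.0], [0.011, 32.0], [0.02, 40.0], [0.001, 64.0]]
rate = transfer_rate(model_a, model_b, adversarial_samples)
print(f"{rate:.2f} of the samples that fool model_a also fool model_b")
```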

In [14]:
# Questions
# Does it always matter if the packets are valid?
# Which features are most commonly manipulated?
# Does changing the algorithm change which features are changed?
# Did your predictions from the previous section hold here?
# Which scenario do you think is more likely? 

In [15]:
### ----- end of part 2 ------