<img src="../img/GTK_Logo_Social Icon.jpg" width=175 align="right" />

# Worksheet 3: Attacking Machine Learning Models

In this lab, we will learn how to use the Adversarial Robustness Toolbox (ART) to launch various attacks against models. The first attack you will launch creates adversarial examples against a model. These examples could be used to defeat the model or to control its behavior.

The documentation for ART can be found here: https://github.com/Trusted-AI/adversarial-robustness-toolbox/tree/main

In [None]:
import numpy as np
import pandas as pd
import joblib
from art.attacks.evasion import DecisionTreeAttack, HopSkipJump
from art.estimators.classification import SklearnClassifier, BlackBoxClassifier
from art.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
from yellowbrick.classifier import ConfusionMatrix
from yellowbrick.classifier import ClassificationReport

import warnings
warnings.filterwarnings('ignore')
DATA_HOME = '../data'

## Decision Tree Attack
In this example, we are going to use ART to attack a decision tree. The goal is to create adversarial examples that could be used to control the output of the model.

Because a decision tree is a series of threshold tests, gradient descent is not needed to discover adversarial examples; they can be found by traversing the tree. This is a whitebox attack: you need access to the actual model.

This methodology was described in a paper by Papernot et al. in https://arxiv.org/abs/1605.07277. You can see this code in action here: https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/attack_decision_tree.ipynb.
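
To make the tree-traversal idea concrete, here is a minimal sketch in plain Python. It is not ART's implementation: the nested-dict tree, the two features (domain length and entropy), the thresholds, and the helper names `tree_predict`, `path_to_leaf`, `find_leaf`, and `tree_attack` are all hypothetical, chosen only for illustration. The sketch finds the leaf a sample lands in, walks back up its ancestors, searches each sibling subtree for a leaf of the wanted class, and nudges the tested features just across the relevant thresholds.

```python
# Hypothetical toy tree: go left when x[feature] <= threshold (the scikit-learn convention).
# Feature 0 = domain length, feature 1 = entropy; label 0 = legit, 1 = dga (illustrative only).
tree = {
    "feature": 0, "threshold": 12.0,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 3.5,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}

def tree_predict(node, x):
    """Follow the threshold tests down to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

def path_to_leaf(node, x):
    """Return the (node, went_left) decisions taken from the root to x's leaf."""
    path = []
    while "label" not in node:
        went_left = x[node["feature"]] <= node["threshold"]
        path.append((node, went_left))
        node = node["left"] if went_left else node["right"]
    return path

def find_leaf(node, wanted, constraints):
    """Depth-first search for a leaf labelled `wanted`.

    Returns the (feature, threshold, go_left) tests needed to reach it, or None.
    """
    if "label" in node:
        return list(constraints) if node["label"] == wanted else None
    for go_left, child in ((True, node["left"]), (False, node["right"])):
        test = (node["feature"], node["threshold"], go_left)
        found = find_leaf(child, wanted, constraints + (test,))
        if found is not None:
            return found
    return None

def tree_attack(root, x, wanted, offset=0.001):
    """Simplified traversal attack: flip the deepest split that leads to a `wanted` leaf.

    This sketch does not handle a feature that is tested several times on one path.
    """
    x_adv = list(x)
    for node, went_left in reversed(path_to_leaf(root, x)):
        sibling = node["right"] if went_left else node["left"]
        first_test = (node["feature"], node["threshold"], not went_left)
        tests = find_leaf(sibling, wanted, (first_test,))
        if tests is None:
            continue  # no leaf of the wanted class here; try one level higher
        for feature, threshold, go_left in tests:
            if go_left and x_adv[feature] > threshold:
                x_adv[feature] = threshold - offset
            elif not go_left and x_adv[feature] <= threshold:
                x_adv[feature] = threshold + offset
        return x_adv
    return x_adv  # no leaf of the wanted class was found; sample is unchanged

x = [20.0, 4.2]                      # classified as dga (1) by the toy tree
x_adv = tree_attack(tree, x, wanted=0)
print(tree_predict(tree, x), tree_predict(tree, x_adv), x_adv)
```

In this toy case only the second feature changes, because flipping the nearest split is enough to reach a legit leaf.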

First we're going to load the model from a pickle file. 

In [None]:
# Load the classifier from the pickle file
with open(f"{DATA_HOME}/dga_decision_tree_adv.pkl", "rb") as file:
    clf = joblib.load(file)

In [None]:
clf

We will also need some training data.  In this case, we'll use the data that was used to train the original model, but this is not necessary. 

In [None]:
df = pd.read_csv(f'{DATA_HOME}/dga_features_final_df.csv')
label_encoder = LabelEncoder()
target = label_encoder.fit_transform(df['isDGA'])

feature_matrix = df.drop(['isDGA'], axis=1)
feature_matrix_train, feature_matrix_test, target_train, target_test = train_test_split(feature_matrix, target, test_size=0.25)

### Step 1:  Create the ART Classifier
As a first step, we need to wrap the model in an ART classifier, which we will call the "adversarial" classifier. Use the `SklearnClassifier` class from ART.

In [None]:
# Your code here ...
adversarial_classifier =

### Step 2:  Attack!!!
Now that you've created an adversarial classifier, the next step is to attack it. Use the `DecisionTreeAttack` class in ART to create the attack, then call its `generate()` method with the `feature_matrix_train` and `target_train` datasets. The `generate()` method can be called either with the feature matrix alone or with an additional vector of desired targets.

For our example, let's say that we want every result to be classified as legitimate, so we're going to pass a numpy array of 1500 `0`s as the target vector.


Note: You will have to call the `.to_numpy()` method on these datasets when you pass them to ART.


This step generates a lot of future warnings. For this exercise we have suppressed them; however, scikit-learn will throw warnings when you mix numpy arrays and dataframes. The way to avoid this is to train your models on numpy arrays: during the training process, convert the dataframe to a numpy array with the `.to_numpy()` method.
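
The target vector needs one entry per row passed to `generate()`, so it can be safer to derive its length from the data than to hard-code it. A minimal sketch, using a small made-up array as a stand-in for `feature_matrix_train.to_numpy()`:

```python
import numpy as np

# Made-up stand-in for feature_matrix_train.to_numpy(); the values are illustrative only
features = np.array([[20.0, 4.2], [8.0, 2.1], [15.0, 3.9]])

# One target label per row; class 0 is treated as "legit", as in this worksheet
all_legit_targets = np.zeros(features.shape[0], dtype=int)

print(all_legit_targets.shape)
```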

In [None]:
# Here's an array of all zeros to fool the classifier
all_legit = np.array([0] * 1500)

# First create the DecisionTreeAttack
attack = # Your code here...

# Then run the generate function to generate adversarial examples.
adversarial_data = # Your code here...

In [None]:
adversarial_preds = # Your code here...

### Step 3:  Evaluate the Performance
At this point you should have a dataset of adversarial examples that produce exclusively legit classifications. Now run that data through the original classifier and build a confusion matrix and classification report to see how we did.

In [None]:
# Your code here...

In [5]:
# Your code here...

If you did this correctly, you should get predictions that are entirely of the `0` class. This shows how adversarial data can be crafted to direct the decisions of a model.

## BlackBox Adversarial Attack
Now that you've successfully launched a whitebox adversarial attack, let's try a blackbox attack. We're going to use the `HopSkipJump` attack from Chen et al. (2019). This is a powerful blackbox attack that requires only the final class prediction, and it is an advanced version of the boundary attack.

Paper link: https://arxiv.org/abs/1904.02144
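
To give a feel for how an attack can work from class labels alone, here is a minimal sketch of one ingredient of decision-based attacks: a binary search along the line between an original sample and a sample of the wanted class. This is not ART's `HopSkipJump` code. The toy model `label_only_model`, the helper `boundary_search`, and the sample values are hypothetical and exist only for illustration.

```python
import numpy as np

def label_only_model(x):
    """Hypothetical black box: returns only a class label, never a score."""
    return int(x[0] + x[1] > 10.0)

def boundary_search(model, x_orig, x_other, steps=30):
    """Binary search between x_orig and x_other using label queries only.

    Returns the point closest to x_orig (along the segment) that still
    receives the same label as x_other.
    """
    wanted = model(x_other)
    low, high = 0.0, 1.0  # blend weight: 0 is x_orig, 1 is x_other
    for _ in range(steps):
        mid = (low + high) / 2
        candidate = (1 - mid) * x_orig + mid * x_other
        if model(candidate) == wanted:
            high = mid  # still the wanted class, so move closer to x_orig
        else:
            low = mid   # crossed back over the boundary
    return (1 - high) * x_orig + high * x_other

x_orig = np.array([2.0, 3.0])    # labelled 0 by the toy model
x_other = np.array([8.0, 9.0])   # labelled 1 by the toy model
x_adv = boundary_search(label_only_model, x_orig, x_other)
print(x_adv, label_only_model(x_adv))
```

The full attack goes further than this sketch: after reaching the boundary it also estimates a direction to move in from additional label queries and repeats the process to shrink the perturbation.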

In order to execute this attack, we will need a `predict()` function that calls a trained model and returns its predictions. In our example, `predict()` is simply a wrapper for our trained classifier; however, the same technique could be used with a true blackbox model where only the predictions are accessible. In that case, `predict()` would contain API calls or something similar.

In [6]:
def predict(x):
    '''
    Call the model and return the predictions.  This function could contain calls to a true 
    blackbox model, but in this example, is calling our pre-trained model.
    '''
    x = np.array(x)
    return to_categorical(clf.predict(x), nb_classes=2)
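
For comparison, here is a rough sketch of what such a wrapper might look like when the model sits behind a service. Everything here is an assumption made for illustration: `query_remote_model` is a hypothetical stand-in (it applies a made-up rule locally so the sketch runs offline), and `predict_remote` is a hypothetical name. The one-hot encoding is done with plain numpy to mirror what `to_categorical` produces in the cell above.

```python
import numpy as np

def query_remote_model(rows):
    """Hypothetical stand-in for a remote scoring API: one label (0 or 1) per row.

    A real version would send `rows` to the service and parse its response.
    """
    return [1 if row[0] > 12 else 0 for row in rows]

def predict_remote(x, nb_classes=2):
    """Query the black box and return one-hot encoded predictions."""
    x = np.asarray(x)
    labels = np.asarray(query_remote_model(x.tolist()), dtype=int)
    one_hot = np.zeros((len(labels), nb_classes), dtype=np.float32)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot

print(predict_remote([[20.0, 4.2], [8.0, 2.1]]))
```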


### Step 1:  Create the BlackBox Classifier
In order to execute the attack, we first need to create a `BlackBoxClassifier`. At a minimum, we need to pass the predict function, the input shape (the number of features), and the number of possible classes.

In [None]:
blackbox_clf = # Your code here...

### Step 2:  ATTACK!!  Generate Adversarial Examples
The next step is to create the `HopSkipJump` object to launch the attack. This follows the same pattern as the previous attack: create the `attack` object, then call its `generate()` method, passing the testing features (`feature_matrix_test`). This will generate an array of adversarial examples.

For our use case, let's say that we want to generate adversarial examples that skew towards one class. In the `HopSkipJump` object, set `targeted=True`, which forces the attack to generate examples for one class only.


NOTE: You will have to convert the testing features to a numpy array like this:
```python
feature_matrix_test.to_numpy()
```

In [None]:
# Create the attack object.
attack = # Your code here...
adversarial_data_blackbox = # Your code here...


### Step 3:  Evaluate the Attack
Now that you have a set of adversarial data, let's make some predictions with that data and see how effective the attack was. You won't be able to use Yellowbrick here because the `BlackBoxClassifier` does not implement the `fit()` method.

For this final step, make the predictions, then create a confusion matrix of this data to evaluate your BlackBox model's performance.
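
One detail to watch: the `predict()` wrapper above returns one-hot rows, while `metrics.confusion_matrix` expects one label per sample. Assuming your predictions have that one-hot shape, `np.argmax` recovers the labels. The arrays below are made up for illustration, and the matrix is counted by hand with numpy only to show the layout (rows are true labels, columns are predicted labels).

```python
import numpy as np

# Made-up one-hot predictions and true labels, for illustration only
one_hot_preds = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
true_labels = np.array([0, 1, 0, 0])

# Convert each one-hot row to the index of its largest entry
pred_labels = np.argmax(one_hot_preds, axis=1)

# 2x2 confusion matrix: cm[true, predicted] counts the samples in each cell
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (true_labels, pred_labels), 1)

print(pred_labels)
print(cm)
```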

In [None]:
# First make some predictions using the adversarial data you generated
adversarial_predictions = # Your code here...

In [None]:
# Now create a confusion matrix
confusion_matrix = metrics.confusion_matrix(target_test, adversarial_predictions)
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix).plot()

How did the model do? If you did this correctly, the blackbox classifier should have split the adversarial data evenly between the legit and dga classes.