# AI Security

## Adversarial Robustness Toolbox

[Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox) is a Python library for Machine Learning Security.
ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, scikit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio, video, etc.) and machine learning tasks (classification, object detection, speech recognition, generation, certification, etc.).

### Setup

#### Installation with pip 
ART is designed and tested to run with Python 3.

ART and its core dependencies can be installed from the PyPI repository using pip. Machine learning frameworks (e.g. TensorFlow) and tool-specific dependencies are not part of the core install; they have to be installed separately or through the optional install extras, such as `[all]` below:

*Installing the dependencies*

In [1]:
!pip install adversarial-robustness-toolbox[all]

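To confirm the installation without importing ART itself, you can query the installed distribution's metadata. This is a small sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `art_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def art_version(dist_name="adversarial-robustness-toolbox"):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


installed = art_version()
print("ART version:", installed if installed else "not installed")
```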





# Adversarial Robustness Toolbox examples




## Get Started with ART

# Imperceptible attack on tabular data using LowProFool algorithm

In this notebook, we will learn how to execute an imperceptible attack on tabular data with the LowProFool algorithm (https://arxiv.org/abs/1911.03274). We will use the iris flowers and breast cancer datasets.
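As we understand the LowProFool paper, the attack searches for a perturbation $r$ that pushes a classifier $f$ from the input $x$ towards a target class $t$, while penalising changes to features that matter most. In the sketch below, $v$ is a vector of per-feature importances (the paper derives it from the absolute Pearson correlation between each feature and the label), $\lambda$ corresponds to ART's `lambd`, $\eta_k$ is the step size set by `eta`, $\gamma$ stands for `eta_decay`, and $\Pi$ projects $x + r$ back into the clip values. The exact norm, decay schedule and stopping rule used by ART may differ:

$$
g(r) = \mathcal{L}\big(f(x + r),\, t\big) + \lambda \,\lVert v \odot r \rVert_p
$$

$$
r_{k+1} = \Pi\big(r_k - \eta_k \nabla_r\, g(r_k)\big), \qquad \eta_{k+1} = \gamma\, \eta_k
$$

Because $v$ weights the penalty, changing an important feature is expensive, so the perturbation tends to concentrate on less important features.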

# Imports

In [2]:
from art.estimators.classification.scikitlearn import ScikitlearnLogisticRegression
from art.estimators.classification.pytorch import PyTorchClassifier
from art.attacks.evasion import LowProFool

import numpy as np
import pandas as pd

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.linear_model import LogisticRegression

import torch
import torch.nn as nn
from torch import optim
from torch.autograd import Variable

# Data preparation
First, we load the datasets, standardize them, and split them into training and validation sets. We also choose the clip values for both datasets.

In [3]:
def standardize(data):
    """
    Get both the standardized data and the used scaler.
    """
    columns = data.columns
    scaler = StandardScaler()
    x_scaled = scaler.fit_transform(data)
    
    return pd.DataFrame(data=x_scaled, columns=columns), scaler
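For reference, `StandardScaler` subtracts each column's mean and divides by its population standard deviation. Below is a NumPy sketch of that transform and its inverse; the helper names and toy values are ours, not part of the notebook.

```python
import numpy as np


def standardize_np(x):
    """Column-wise z-scores, mirroring StandardScaler's default behaviour."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)                   # population std (ddof=0), as StandardScaler uses
    std = np.where(std == 0, 1.0, std)    # constant columns: avoid division by zero
    return (x - mean) / std, mean, std


def unstandardize_np(z, mean, std):
    """Map standardized values back to the original units (the inverse transform)."""
    return z * std + mean


x = np.array([[5.1, 3.5], [4.9, 3.0], [6.3, 2.5], [7.0, 3.2]])
z, mean, std = standardize_np(x)
print(z.mean(axis=0).round(6), z.std(axis=0).round(6))
print(np.allclose(unstandardize_np(z, mean, std), x))
```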

In [4]:
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

def get_train_and_valid(design_matrix, labels):
    """
    Split dataset into training and validation sets.
    """
    for train_idx, valid_idx in split.split(design_matrix, labels):
        X_train = design_matrix.iloc[train_idx].copy()
        X_valid = design_matrix.iloc[valid_idx].copy()
        y_train = labels.iloc[train_idx].copy()
        y_valid = labels.iloc[valid_idx].copy()

    return X_train, y_train, X_valid, y_valid
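`StratifiedShuffleSplit` keeps the class proportions of the full dataset in both splits. The function below is a simplified, hypothetical illustration of that idea (sample the same fraction from every class); it is not scikit-learn's exact allocation algorithm.

```python
import numpy as np


def stratified_split_indices(labels, test_size=0.2, seed=0):
    """Illustrative stratified split: take the same fraction of every class for validation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, valid_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_valid = int(round(test_size * len(idx)))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.sort(train_idx), np.sort(valid_idx)


labels = np.repeat([0, 1, 2], 50)   # iris-like: 3 balanced classes of 50 samples
train_idx, valid_idx = stratified_split_indices(labels)
print(len(train_idx), len(valid_idx))
print(np.bincount(labels[valid_idx]))   # samples per class in the validation split
```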

### Loading and preparation of the iris flowers dataset

In [5]:
iris = datasets.load_iris()
design_matrix_iris = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
labels_iris = pd.Series(data=iris['target'])
display(design_matrix_iris)

design_matrix_iris_scaled, iris_scaler = standardize(design_matrix_iris)

X_train_iris, y_train_iris, X_valid_iris, y_valid_iris =\
    get_train_and_valid(design_matrix_iris_scaled, labels_iris)

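| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---:|---:|---:|---:|---:|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |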




### Loading and preparation of the breast cancer dataset

In [6]:
cancer = datasets.load_breast_cancer()
design_matrix_cancer = pd.DataFrame(data=cancer['data'], columns=cancer['feature_names'])
labels_cancer = pd.Series(data=cancer['target'])
display(design_matrix_cancer)

design_matrix_cancer_scaled, cancer_scaler = standardize(design_matrix_cancer)

X_train_cancer, y_train_cancer, X_valid_cancer, y_valid_cancer =\
    get_train_and_valid(design_matrix_cancer_scaled, labels_cancer)

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.30010 | 0.14710 | 0.2419 | 0.07871 | ... | 25.380 | 17.33 | 184.60 | 2019.0 | 0.16220 | 0.66560 | 0.7119 | 0.2654 | 0.4601 | 0.11890 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.08690 | 0.07017 | 0.1812 | 0.05667 | ... | 24.990 | 23.41 | 158.80 | 1956.0 | 0.12380 | 0.18660 | 0.2416 | 0.1860 | 0.2750 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.19740 | 0.12790 | 0.2069 | 0.05999 | ... | 23.570 | 25.53 | 152.50 | 1709.0 | 0.14440 | 0.42450 | 0.4504 | 0.2430 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.24140 | 0.10520 | 0.2597 | 0.09744 | ... | 14.910 | 26.50 | 98.87 | 567.7 | 0.20980 | 0.86630 | 0.6869 | 0.2575 | 0.6638 | 0.17300 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.19800 | 0.10430 | 0.1809 | 0.05883 | ... | 22.540 | 16.67 | 152.20 | 1575.0 | 0.13740 | 0.20500 | 0.4000 | 0.1625 | 0.2364 | 0.07678 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 564 | 21.56 | 22.39 | 142.00 | 1479.0 | 0.11100 | 0.11590 | 0.24390 | 0.13890 | 0.1726 | 0.05623 | ... | 25.450 | 26.40 | 166.10 | 2027.0 | 0.14100 | 0.21130 | 0.4107 | 0.2216 | 0.2060 | 0.07115 |
| 565 | 20.13 | 28.25 | 131.20 | 1261.0 | 0.09780 | 0.10340 | 0.14400 | 0.09791 | 0.1752 | 0.05533 | ... | 23.690 | 38.25 | 155.00 | 1731.0 | 0.11660 | 0.19220 | 0.3215 | 0.1628 | 0.2572 | 0.06637 |
| 566 | 16.60 | 28.08 | 108.30 | 858.1 | 0.08455 | 0.10230 | 0.09251 | 0.05302 | 0.1590 | 0.05648 | ... | 18.980 | 34.12 | 126.70 | 1124.0 | 0.11390 | 0.30940 | 0.3403 | 0.1418 | 0.2218 | 0.07820 |
| 567 | 20.60 | 29.33 | 140.10 | 1265.0 | 0.11780 | 0.27700 | 0.35140 | 0.15200 | 0.2397 | 0.07016 | ... | 25.740 | 39.42 | 184.60 | 1821.0 | 0.16500 | 0.86810 | 0.9387 | 0.2650 | 0.4087 | 0.12400 |




#### Clip-values
**Iris flowers dataset** - per-feature minimum and maximum values of the training set.

In [7]:
scaled_clip_values_iris = (
    np.array(X_train_iris.min()),
    np.array(X_train_iris.max())
)
print("Clip-values:")
print("  Lower bound:", scaled_clip_values_iris[0])
print("  Upper bound:", scaled_clip_values_iris[1])

print("\nClip-values in original scale:")
clip_values_iris = iris_scaler.inverse_transform(scaled_clip_values_iris)
print("  Lower bound:", clip_values_iris[0])
print("  Upper bound:", clip_values_iris[1])

Clip-values:
  Lower bound: [-1.87002413 -2.43394714 -1.56757623 -1.44707648]
  Upper bound: [2.4920192  3.09077525 1.78583195 1.71209594]

Clip-values in original scale:
  Lower bound: [4.3 2.  1.  0.1]
  Upper bound: [7.9 4.4 6.9 2.5]
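Clip values describe a per-feature box that adversarial examples are kept inside. The NumPy illustration below uses a toy standardized matrix (not the iris data): the bounds are column-wise minima and maxima, computed the same way as `scaled_clip_values_iris`, and projecting a point onto the box is an element-wise `np.clip`. We assume ART bounds attack outputs in a comparable way; exactly where it clips is an implementation detail of the library.

```python
import numpy as np

# Toy standardized training matrix (rows = samples, columns = features).
x_train = np.array([[-1.2, 0.3], [0.8, -2.0], [1.9, 1.1], [-0.4, 0.6]])

# Per-feature clip box: column-wise minimum and maximum of the training data.
clip_min, clip_max = x_train.min(axis=0), x_train.max(axis=0)

# A candidate adversarial point that overshoots the box on both features.
candidate = np.array([2.5, -3.0])
projected = np.clip(candidate, clip_min, clip_max)   # lands on the box boundary
print(clip_min, clip_max, projected)
```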


**Breast cancer dataset** - 1 standard deviation boundary.

Note: Here, we choose clip values so that every value must fall within one standard deviation of its feature mean. Because the dataset has already been standardized (zero mean and unit variance per feature), this is trivial, and the clip values can be expressed concisely as the single tuple (-1., 1.).

In [8]:
scaled_clip_values_cancer = (-1., 1.)
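In original units, the standardized interval (-1, 1) corresponds to mean ± one standard deviation of each feature. For roughly normal features only about 68% of values fall inside that band, so many genuine samples already lie outside it. The sketch below shows both points on a synthetic normal feature (the distribution parameters are arbitrary and not taken from the cancer data).

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.normal(loc=14.0, scale=3.5, size=10_000)   # synthetic stand-in feature

mean, std = feature.mean(), feature.std()
lower, upper = mean - std, mean + std   # (-1, 1) mapped back to original units

inside = np.mean((feature >= lower) & (feature <= upper))
print(f"bounds in original units: [{lower:.2f}, {upper:.2f}]")
print(f"fraction of samples inside: {inside:.3f}")
```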

As the following examples show, good-quality adversarial examples can be generated in just a few lines of code.

----
# LowProFool example
In this section, we present a few examples of LowProFool adversarial attacks carried out in a similar fashion, but with different underlying models and on different datasets.
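To make the update rule concrete, here is a toy, LowProFool-style sketch in NumPy for a binary logistic regression, where the gradient of the cross-entropy with respect to the input is simply `(p - target) * w`. It is not ART's implementation: the weights `w`, bias `b`, importances and clip box are hypothetical, the penalty is the squared importance-weighted L2 norm (chosen here for a smooth gradient at r = 0), and the function returns the last iterate instead of searching for the best one.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lowprofool_like_attack(x, w, b, target, importance, lambd=0.1,
                           eta=1.0, eta_decay=0.99, n_steps=300,
                           clip_min=-3.0, clip_max=3.0):
    """Toy LowProFool-style attack on p(y=1|x) = sigmoid(w.x + b).

    Minimizes cross-entropy towards `target` plus lambd * ||importance * r||_2^2
    with gradient descent, a geometrically decaying step size, and projection
    of x + r onto the clip box after every step.
    """
    r = np.zeros_like(x)
    for _ in range(n_steps):
        p = sigmoid(np.dot(w, x + r) + b)
        # Input gradient of the cross-entropy plus gradient of the weighted penalty
        grad = (p - target) * w + 2.0 * lambd * importance**2 * r
        r = r - eta * grad
        # Keep x + r inside the allowed feature range
        r = np.clip(x + r, clip_min, clip_max) - x
        eta *= eta_decay
    return x + r


w, b = np.array([2.0, 0.5]), 0.0      # hypothetical trained weights
importance = np.array([0.8, 0.2])     # hypothetical feature importances
x = np.array([1.0, 1.0])              # originally classified as class 1
x_adv = lowprofool_like_attack(x, w, b, target=0, importance=importance)

print("original p(y=1):", sigmoid(w @ x + b))
print("adversarial p(y=1):", sigmoid(w @ x_adv + b))
print("perturbation:", x_adv - x)
```

Because feature 0 carries the larger importance, the sketch moves feature 1 further even though feature 1 has the smaller weight, which is the behaviour LowProFool aims for.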


## Preparation of classifiers


### Logistic Regression

#### Training on iris flowers dataset

In [9]:
log_regression_clf_iris = LogisticRegression()
log_regression_clf_iris.fit(X_train_iris.values, y_train_iris)

#### Training on breast cancer dataset

In [10]:
log_regression_clf_cancer = LogisticRegression()
log_regression_clf_cancer.fit(X_train_cancer.values, y_train_cancer)

### Neural Network

In [11]:
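def get_nn_model(input_dimensions, hidden_neurons, output_dimensions):
    """
    Prepare a PyTorch (torch) neural network with one hidden layer.
    The final Softmax layer makes the model output class probabilities.
    """
    return nn.Sequential(
        nn.Linear(input_dimensions, hidden_neurons),
        nn.ReLU(),
        nn.Linear(hidden_neurons, output_dimensions),
        nn.Softmax(dim=1)
    )

# Sum of squared errors between predicted probabilities and one-hot targets
loss_fn = nn.MSELoss(reduction='sum')

def train_nn(nn_model, X, y, learning_rate, epochs):
    """
    Train the provided neural network with full-batch gradient descent (SGD).
    """
    optimizer = optim.SGD(nn_model.parameters(), lr=learning_rate)
    
    for _ in range(epochs):
        optimizer.zero_grad()        # reset gradients from the previous epoch
        y_pred = nn_model(X)         # forward pass on the whole training set
        loss = loss_fn(y_pred, y)
        loss.backward()              # backpropagate
        optimizer.step()             # update the weights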
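`train_nn` regresses the softmax outputs onto one-hot targets with a summed squared error (cross-entropy would be the more usual choice for classification). The quantity being minimized can be written in NumPy as follows; the logits and targets are toy values, and `softmax` and `summed_squared_error` are our own helpers.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def summed_squared_error(probs, one_hot):
    """What MSELoss(reduction='sum') computes: squared errors summed over samples and classes."""
    return np.sum((probs - one_hot) ** 2)


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
one_hot = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
probs = softmax(logits)
print(summed_squared_error(probs, one_hot))
```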


#### Training on iris flowers dataset

In [12]:
X = Variable(torch.FloatTensor(np.array(X_train_iris)))
y = Variable(torch.FloatTensor(np.eye(3)[y_train_iris]))
nn_model_iris = get_nn_model(4, 10, 3)
train_nn(nn_model_iris, X, y, 1e-4, 1000)

#### Training on breast cancer dataset

In [13]:
X = Variable(torch.FloatTensor(np.array(X_train_cancer.values)))
y = Variable(torch.FloatTensor(np.eye(2)[y_train_cancer]))
nn_model_cancer = get_nn_model(30, 50, 2)
train_nn(nn_model_cancer, X, y, 1e-4, 1000)
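Both training cells (and the test utilities below) build one-hot targets with `np.eye(k)[labels]`: indexing the k×k identity matrix with an integer array picks one unit row per label, and `argmax` recovers the labels. A minimal check with toy labels:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])
one_hot = np.eye(3)[labels]       # row i is the unit vector for labels[i]
recovered = one_hot.argmax(axis=1)
print(one_hot)
print(recovered)
```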

## Actual usage of LowProFool

### Logistic Regression

In [14]:
def lowprofool_generate_adversaries_test_lr(lowprofool, classifier, x_valid, y_valid):
    """
    Testing utility.
    """
    n_classes = lowprofool.n_classes
    
    # Generate targets
    target = np.eye(n_classes)[np.array(
        y_valid.apply(
            lambda x: np.random.choice([i for i in range(n_classes) if i != x]))
    )]
    
    # Generate adversaries
    adversaries = lowprofool.generate(x=x_valid, y=target)

    # Test - check the success rate
    expected = np.argmax(target, axis=1)
    predicted = np.argmax(classifier.predict_proba(adversaries), axis=1)
    correct = (expected == predicted)
    
    success_rate = np.sum(correct) / correct.shape[0]
    print("Success rate: {:.2f}%".format(100*success_rate))
    
    return adversaries
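The utility above asks LowProFool to push every sample towards a randomly chosen wrong class and then measures the fraction of adversaries classified as that target. The same two steps, isolated as a NumPy sketch with hypothetical helper names (the seeded generator is our choice; the notebook uses the unseeded `np.random`):

```python
import numpy as np


def random_other_class_targets(y_true, n_classes, rng):
    """For every sample, pick a target uniformly among the classes it does not belong to."""
    targets = np.empty_like(y_true)
    for i, label in enumerate(y_true):
        choices = [c for c in range(n_classes) if c != label]
        targets[i] = rng.choice(choices)
    return targets


def success_rate(targets, predicted):
    """Fraction of samples whose predicted class equals the requested target."""
    return np.mean(targets == predicted)


rng = np.random.default_rng(42)
y_true = np.array([0, 1, 2, 2, 0])
targets = random_other_class_targets(y_true, 3, rng)
one_hot_targets = np.eye(3)[targets]   # the format passed as `y` to LowProFool.generate above
print(targets, success_rate(targets, targets))
```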

#### Iris flowers dataset test

In [15]:
# Wrapping classifier into appropriate ART-friendly wrapper
logistic_regression_iris_wrapper = ScikitlearnLogisticRegression(
    model       = log_regression_clf_iris, 
    clip_values = scaled_clip_values_iris
)

# Creating LowProFool instance
lpf_logistic_regression_iris = LowProFool(
    classifier = logistic_regression_iris_wrapper, 
    eta        = 5,
    lambd      = 0.2, 
    eta_decay  = 0.9
)

# Fitting feature importance
lpf_logistic_regression_iris.fit_importances(X_train_iris, y_train_iris)

# Testing
results_lr_ir = lowprofool_generate_adversaries_test_lr(
    lowprofool = lpf_logistic_regression_iris,
    classifier = log_regression_clf_iris, 
    x_valid    = X_valid_iris, 
    y_valid    = y_valid_iris
)

Success rate: 100.00%


The attack succeeded on every validation sample. Below are the original features and their predicted classes, followed by the adversarial examples generated by LowProFool and their predicted class probabilities.

In [16]:
def print_predictions(values, preds, max_features=4):
    """
    Utility function for printing predictions.
    """
    predictions = zip(list(map(lambda e: e[:max_features], values.tolist())), preds.tolist())
    
    for features, pred in predictions:
        print("Features[:{}]:".format(max_features))
        for i, val in enumerate(features):
            if i % 6 != 5: print("{:>10.4f}".format(val), end='')
            else:          print("{:>10.4f}\n".format(val), end='')
        if len(features) % 6 != 0: print()
        
        print("Prediction (probability -> class):")
        for val in pred:
            print("{:>8.3f}".format(val), end='')
        print("  ->  {}\n".format(np.argmax(pred)))

In [17]:
print("=== Original values ===\n")

print_predictions(iris_scaler.inverse_transform(X_valid_iris[-3:].values), 
    log_regression_clf_iris.predict_proba(X_valid_iris[-3:]))
    
print("\n=== Adversaries (LowProFool results) ===\n")
    
print_predictions(iris_scaler.inverse_transform(results_lr_ir[-3:]), 
    log_regression_clf_iris.predict_proba(results_lr_ir[-3:]))

=== Original values ===

Features[:4]:
    6.3000    3.3000    6.0000    2.5000
Prediction (probability -> class):
   0.000   0.010   0.989  ->  2

Features[:4]:
    5.1000    3.5000    1.4000    0.2000
Prediction (probability -> class):
   0.983   0.017   0.000  ->  0

Features[:4]:
    4.9000    3.1000    1.5000    0.1000
Prediction (probability -> class):
   0.957   0.043   0.000  ->  0


=== Adversaries (LowProFool results) ===

Features[:4]:
    7.8893    2.6554    5.1651    1.0961
Prediction (probability -> class):
   0.000   0.866   0.134  ->  1

Features[:4]:
    6.9759    2.2792    6.3187    2.3087
Prediction (probability -> class):
   0.000   0.007   0.993  ->  2

Features[:4]:
    6.7293    2.0210    6.3154    2.2818
Prediction (probability -> class):
   0.000   0.007   0.993  ->  2
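The printed values make it possible to quantify how far the adversaries moved. The sketch below uses the numbers copied from the output above (last three iris validation samples, original units) and computes per-feature changes and their norms.

```python
import numpy as np

# Originals and LowProFool adversaries, copied from the printed output above.
original = np.array([[6.3, 3.3, 6.0, 2.5],
                     [5.1, 3.5, 1.4, 0.2],
                     [4.9, 3.1, 1.5, 0.1]])
adversarial = np.array([[7.8893, 2.6554, 5.1651, 1.0961],
                        [6.9759, 2.2792, 6.3187, 2.3087],
                        [6.7293, 2.0210, 6.3154, 2.2818]])

delta = adversarial - original
print("per-feature change (cm):\n", delta.round(3))
print("L2 norm per sample:", np.linalg.norm(delta, axis=1).round(3))
print("largest absolute change per sample:", np.abs(delta).max(axis=1).round(3))
```

For the two samples that started in class 0, petal length grows by almost 5 cm, so these perturbations are not small in absolute terms. As we understand it, LowProFool weights the perturbation by feature importance rather than bounding its raw size.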





#### Breast cancer dataset test

In [18]:
# Wrapping classifier into appropriate ART-friendly wrapper
logistic_regression_cancer_wrapper = ScikitlearnLogisticRegression(
    model       = log_regression_clf_cancer, 
    clip_values = scaled_clip_values_cancer
)

# Creating LowProFool instance
lpf_logistic_regression_cancer = LowProFool(
    classifier = logistic_regression_cancer_wrapper, 
    eta        = 5,
    lambd      = 0.2, 
    eta_decay  = 0.9
)

# Fitting feature importance
lpf_logistic_regression_cancer.fit_importances(X_train_cancer, y_train_cancer)

# Testing
results_lr_bc = lowprofool_generate_adversaries_test_lr(
    lowprofool = lpf_logistic_regression_cancer,
    classifier = log_regression_clf_cancer, 
    x_valid    = X_valid_cancer, 
    y_valid    = y_valid_cancer
)

Success rate: 100.00%


In [19]:
print("=== Original values ===\n")

print_predictions(cancer_scaler.inverse_transform(X_valid_cancer[-2:]), 
                  log_regression_clf_cancer.predict_proba(X_valid_cancer[-2:]), max_features=30)

print("\n=== Adversaries (LowProFool results) ===\n")

print_predictions(cancer_scaler.inverse_transform(results_lr_bc[-2:]), 
                  log_regression_clf_cancer.predict_proba(results_lr_bc[-2:]), max_features=30)

=== Original values ===

Features[:30]:
   12.8700   19.5400   82.6700  509.2000    0.0914    0.0788
    0.0180    0.0209    0.1861    0.0635    0.3665    0.7693
    2.5970   26.5000    0.0059    0.0136    0.0071    0.0065
    0.0222    0.0024   14.4500   24.3800   95.1400  626.9000
    0.1214    0.1652    0.0713    0.0638    0.3313    0.0774
Prediction (probability -> class):
   0.006   0.994  ->  1

Features[:30]:
   11.8100   17.3900   75.2700  428.9000    0.1007    0.0556
    0.0235    0.0155    0.1718    0.0578    0.1859    1.9260
    1.0110   14.4700    0.0078    0.0088    0.0156    0.0062
    0.0314    0.0020   12.5700   26.4800   79.5700  489.5000
    0.1356    0.1000    0.0880    0.0431    0.3200    0.0658
Prediction (probability -> class):
   0.000   1.000  ->  1


=== Adversaries (LowProFool results) ===

Features[:30]:
   17.6168   23.5787  116.0187 1003.4447    0.0997    0.0517
    0.1675    0.0872    0.1735    0.0557    0.6810    0.7894
    4.8775   85.5782    0.0085    0



### Neural Network

In [20]:
def lowprofool_generate_adversaries_test_nn(lowprofool, classifier, x_valid, y_valid):
    """
    Testing utility.
    """
    n_classes = lowprofool.n_classes
    
    # Generate targets
    target = np.eye(n_classes)[np.array(
        y_valid.apply(
            lambda x: np.random.choice([i for i in range(n_classes) if i != x]))
    )]
    
    # Generate adversaries
    adversaries = lowprofool.generate(x=x_valid, y=target)

    # Test - check the success rate
    expected = np.argmax(target, axis=1)
    x = Variable(torch.from_numpy(adversaries.astype(np.float32)))
    predicted = np.argmax(classifier.forward(x).detach().numpy(), axis=1)
    correct = (expected == predicted)
    
    success_rate = np.sum(correct) / correct.shape[0]
    print("Success rate: {:.2f}%".format(100*success_rate))
    
    return adversaries

#### Iris flowers dataset test

In [21]:
# Wrapping classifier into appropriate ART-friendly wrapper
# (in this case it is PyTorch NN classifier wrapper from ART)
neural_network_iris_wrapper = PyTorchClassifier(
    model       = nn_model_iris, 
    loss        = loss_fn,
    input_shape = (4,),
    nb_classes  = 3,
    clip_values = scaled_clip_values_iris
)

# Creating LowProFool instance
lpf_neural_network_iris = LowProFool(
    classifier = neural_network_iris_wrapper,
    n_steps    = 100,
    eta        = 7,
    lambd      = 1.75, 
    eta_decay  = 0.95
)

# Fitting feature importance
lpf_neural_network_iris.fit_importances(X_train_iris, y_train_iris)

# Testing
results_nn_ir = lowprofool_generate_adversaries_test_nn(
    lowprofool = lpf_neural_network_iris,
    classifier = nn_model_iris, 
    x_valid    = X_valid_iris, 
    y_valid    = y_valid_iris
)

Success rate: 100.00%
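The neural-network attacks use larger `eta` values with slower decay than the logistic-regression ones. Assuming the step size is multiplied by `eta_decay` once per iteration (a plain geometric schedule, as the parameter name suggests; ART's exact schedule may differ), the sketch below compares the final step size and the total step budget of the two neural-network configurations. The helper names are ours.

```python
def step_size_after(eta, eta_decay, n_steps):
    """Step size after n_steps, assuming eta is multiplied by eta_decay once per step."""
    return eta * eta_decay ** n_steps


def total_step_budget(eta, eta_decay, n_steps):
    """Sum of all step sizes over n_steps (a finite geometric series)."""
    return eta * (1 - eta_decay ** n_steps) / (1 - eta_decay)


iris_cfg = dict(eta=7, eta_decay=0.95, n_steps=100)
cancer_cfg = dict(eta=10, eta_decay=0.99, n_steps=200)
for name, cfg in [("iris NN", iris_cfg), ("cancer NN", cancer_cfg)]:
    print(name, round(step_size_after(**cfg), 4), round(total_step_budget(**cfg), 2))
```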


In [22]:
print("=== Original values ===\n")

print_predictions(iris_scaler.inverse_transform(X_valid_iris[:3].values),
      neural_network_iris_wrapper.predict(X_valid_iris[:3].values.astype(np.float32)))

print("\n=== Adversaries (LowProFool results) ===\n")

print_predictions(iris_scaler.inverse_transform(results_nn_ir[:3]), 
      neural_network_iris_wrapper.predict(results_nn_ir.astype(np.float32)[:3]))

=== Original values ===

Features[:4]:
    5.5000    3.5000    1.3000    0.2000
Prediction (probability -> class):
   0.949   0.043   0.008  ->  0

Features[:4]:
    5.7000    2.8000    4.5000    1.3000
Prediction (probability -> class):
   0.139   0.575   0.286  ->  1

Features[:4]:
    5.1000    3.8000    1.9000    0.4000
Prediction (probability -> class):
   0.950   0.041   0.009  ->  0


=== Adversaries (LowProFool results) ===

Features[:4]:
    7.0219    3.6927    4.3126    2.3001
Prediction (probability -> class):
   0.030   0.180   0.790  ->  2

Features[:4]:
    6.5403    3.2033    6.2379    2.3576
Prediction (probability -> class):
   0.020   0.140   0.840  ->  2

Features[:4]:
    4.7723    2.2178    3.7398    0.6132
Prediction (probability -> class):
   0.218   0.637   0.145  ->  1



#### Breast cancer dataset test

In [23]:
# Wrapping classifier into appropriate ART-friendly wrapper
# (in this case it is PyTorch NN classifier wrapper from ART)
neural_network_cancer_wrapper = PyTorchClassifier(
    model       = nn_model_cancer, 
    loss        = loss_fn, 
    input_shape = (30,),
    nb_classes  = 2,
    clip_values = scaled_clip_values_cancer
)

# Creating LowProFool instance
lpf_neural_network_cancer = LowProFool(
    classifier = neural_network_cancer_wrapper,
    n_steps    = 200,
    eta        = 10,
    lambd      = 2, 
    eta_decay  = 0.99
)

# Fitting feature importance
lpf_neural_network_cancer.fit_importances(X_train_cancer, y_train_cancer)

# Testing
results_nn_bc = lowprofool_generate_adversaries_test_nn(
    lowprofool = lpf_neural_network_cancer,
    classifier = nn_model_cancer, 
    x_valid    = X_valid_cancer, 
    y_valid    = y_valid_cancer
)

Success rate: 98.25%
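The 98.25% figure can be traced back to a sample count. Assuming scikit-learn rounds the test share up, the 569-sample breast cancer dataset yields ceil(0.2 × 569) = 114 validation samples, and the search below finds which numbers of successful adversaries print as "98.25%" with the notebook's format string.

```python
import math

n_samples = 569                        # rows in scikit-learn's breast cancer dataset
n_valid = math.ceil(0.2 * n_samples)   # assumes the test share is rounded up
# Which counts of successful adversaries print as "98.25%" with "{:.2f}"?
matches = [k for k in range(n_valid + 1)
           if "{:.2f}".format(100 * k / n_valid) == "98.25"]
print(n_valid, matches)
```

Only 112 out of 114 matches, so two adversaries missed their target class.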


In [24]:
print("=== Original values ===\n")

print_predictions(
    cancer_scaler.inverse_transform(X_valid_cancer[-2:]),
    neural_network_cancer_wrapper.predict(X_valid_cancer[-2:].values.astype(np.float32)),
    max_features=30
)

print("\n=== Adversaries (LowProFool results) ===\n")

print_predictions(
    cancer_scaler.inverse_transform(results_nn_bc[-2:]), 
    neural_network_cancer_wrapper.predict(results_nn_bc.astype(np.float32)[-2:]),
    max_features=30
)

=== Original values ===

Features[:30]:
   12.8700   19.5400   82.6700  509.2000    0.0914    0.0788
    0.0180    0.0209    0.1861    0.0635    0.3665    0.7693
    2.5970   26.5000    0.0059    0.0136    0.0071    0.0065
    0.0222    0.0024   14.4500   24.3800   95.1400  626.9000
    0.1214    0.1652    0.0713    0.0638    0.3313    0.0774
Prediction (probability -> class):
   0.015   0.985  ->  1

Features[:30]:
   11.8100   17.3900   75.2700  428.9000    0.1007    0.0556
    0.0235    0.0155    0.1718    0.0578    0.1859    1.9260
    1.0110   14.4700    0.0078    0.0088    0.0156    0.0062
    0.0314    0.0020   12.5700   26.4800   79.5700  489.5000
    0.1356    0.1000    0.0880    0.0431    0.3200    0.0658
Prediction (probability -> class):
   0.005   0.995  ->  1


=== Adversaries (LowProFool results) ===

Features[:30]:
   14.0018   22.8150   90.9044  695.5413    0.1057    0.0870
    0.0514    0.0325    0.1634    0.0557    0.5360    1.2544
    3.8995   49.1431    0.0061    0

