# Links to dataset information

**UCI**:
- __[adult](http://archive.ics.uci.edu/ml/datasets/Adult)__
- __[annealing](https://archive.ics.uci.edu/ml/datasets/Annealing)__
- __[audiology-std](https://archive.ics.uci.edu/ml/datasets/Audiology+%28Standardized%29)__
- __[bank](https://archive.ics.uci.edu/ml/datasets/Bank%2BMarketing)__
- __[bankruptcy](http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy)__
- __[car](https://archive.ics.uci.edu/ml/datasets/Car+Evaluation)__
- __[chess-krvk](https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29)__
- __[chess-krvkp](http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29)__
- __[congress-voting](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)__
- __[contrac](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)__
- __[credit-approval](http://archive.ics.uci.edu/ml/datasets/Credit+Approval)__
- __[ctg](https://www.kaggle.com/akshat0007/fetalhr)__ (**unsure about this link**)
- __[cylinder-bands](http://archive.ics.uci.edu/ml/datasets/Cylinder+Bands)__
- __[dermatology](https://archive.ics.uci.edu/ml/datasets/Dermatology)__
- __[german_credit](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29)__
- __[heart-cleveland](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)__
- __[ilpd](http://archive.ics.uci.edu/ml/datasets/ILPD+%28Indian+Liver+Patient+Dataset%29)__
- __[mammo](https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass)__
- __[mushroom](https://archive.ics.uci.edu/ml/datasets/Mushroom)__
- __[wine](https://archive.ics.uci.edu/ml/datasets/wine)__
- __[wine_qual](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)__

**Others**:
- __[texas](https://www.dshs.texas.gov/thcic/hospitals/UserManual1Q2013.pdf)__
- __[IEEECIS](https://www.kaggle.com/c/ieee-fraud-detection/discussion/101203)__


# Imports

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tqdm import tqdm

In [2]:
from src.loader import load_dataset
from src.models import SRR
from src.preprocessing import processing_pipeline
# Import only the name used in this notebook instead of a wildcard import
from src.vulnerabilities import find_adversarial_examples

In [3]:
uci_datasets = ['adult', 'annealing', 'audiology-std', 'bank', 'bankruptcy', 'car',
                'chess-krvk', 'chess-krvkp', 'congress-voting', 'contrac', 'credit-approval',
                'ctg', 'cylinder-bands', 'dermatology', 'german_credit', 'heart-cleveland',
                'ilpd', 'mammo', 'mushroom', 'wine', 'wine_qual']

all_datasets = uci_datasets + ['texas', 'ieeecis']

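# Adversarial Examples

Each row reports the percentage of correctly classified training points for which `find_adversarial_examples` found an adversarial example when only the listed features could be changed. `M` and `k` are the SRR parameters; `nbins` is the value passed to `processing_pipeline` (`-` where it was left at its default).

|   dataset   | M | k |nbins|modifiable features|adversarial examples (%)|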
|:------------|:--|:--|:----|:------|:---------|
|german_credit| 5 | 3 |  3  |Duration_in_months|19.76|
|german_credit|10 | 3 |  3  |Duration_in_months|16.49|
|german_credit| 5 | 5 |  5  |Credit_amount - Purpose - Duration_in_months|75.95|
|bankruptcy   | 5 | 3 |  -  |competitiveness|100|
|bankruptcy   | 5 | 3 |  -  |credibility - financial_flex|21.78|
|bankruptcy   | 5 | 3 |  -  |credibility|2.67|
|bankruptcy   | 5 | 3 |  -  |financial_flex|1.78|
|IEEE-CIS     | 5 | 3 |  3  |browser|100|
|IEEE-CIS     | 5 | 3 |  3  |card_type|48.59|
|IEEE-CIS     | 5 | 3 |  3  |TransactionAmt|14.22|
|IEEE-CIS     | 5 | 3 |  3  |card_type - TransactionAmt|88.12|
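
The percentages in this table can be read as the outcome of a search over the modifiable features. The sketch below is a minimal brute-force version of such a search, not the `find_adversarial_examples` implementation from `src.vulnerabilities`: `adversarial_rate`, `toy_predict`, and the toy rows are hypothetical, and the `unit_changes` option is not modeled.

```python
from itertools import product


def adversarial_rate(predict, rows, labels, domains):
    """Percentage of correctly classified rows whose prediction can be
    flipped by changing only the features listed in `domains`."""
    correct = [(r, y) for r, y in zip(rows, labels) if predict(r) == y]
    if not correct:
        return 0.0
    names = list(domains)
    flipped = 0
    for row, label in correct:
        # Try every combination of values for the modifiable features.
        for values in product(*(domains[n] for n in names)):
            candidate = {**row, **dict(zip(names, values))}
            if predict(candidate) != label:
                flipped += 1
                break
    return 100.0 * flipped / len(correct)


# Toy integer-score classifier (hypothetical, not from any dataset above).
def toy_predict(row):
    amount_score = {"low": 2, "mid": 0, "high": -2}[row["amount"]]
    owner_score = {"yes": 1, "no": -1}[row["owner"]]
    return int(amount_score + owner_score >= 0)


rows = [
    {"amount": "low", "owner": "yes"},
    {"amount": "low", "owner": "no"},
    {"amount": "mid", "owner": "yes"},
    {"amount": "mid", "owner": "no"},
]
labels = [1, 1, 1, 0]

rate_owner = adversarial_rate(toy_predict, rows, labels, {"owner": ["yes", "no"]})
rate_amount = adversarial_rate(
    toy_predict, rows, labels, {"amount": ["low", "mid", "high"]}
)
print(rate_owner, rate_amount)
```

In this toy setting, changing `owner` only flips the two `mid` rows, while changing `amount` flips every row, which mirrors how the reported percentage depends on which features are allowed to change.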

## german_credit

In [4]:
dataset = 'german_credit'
print(f"-> {dataset} dataset")
# Load the data
X, y = load_dataset(name=dataset)

-> german_credit dataset
Loading german_credit...


In [5]:
# Apply the processing pipeline
X_train, X_test, y_train, y_test = processing_pipeline(X, y, nbins=3)

# Construct and train Select-Regress-Round model
srr = SRR(k=3, M=5)
srr.fit(X_train, y_train)

srr.show_scoring_table()

Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -3
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      2
 Status_of_checking_account                                            0 <= ... < 200 DM     -1
 Status_of_checking_account                                          no checking account      5
             Credit_history                      all credits at this bank paid back duly     -3
             Credit_history  critical account/ other credits existing (not at this bank)      3
             Credit_history                 no credits taken/ all credits paid back duly     -3
         Duration_in_months                                                 (-inf, 11.5]      3
         Duration_in_months                                                  (23.0, inf]     -2



In [6]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['Duration_in_months'], unit_changes=True)

                                                  

Found adversarial examples for 19.76 % of the correctly classified points




In [7]:
# Apply the processing pipeline
X_train, X_test, y_train, y_test = processing_pipeline(X, y, nbins=3)

# Construct and train Select-Regress-Round model
srr = SRR(k=3, M=10)
srr.fit(X_train, y_train)

srr.show_scoring_table()

Select-Regress-Round (SRR) [k=3, M=10]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -6
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      3
 Status_of_checking_account                                            0 <= ... < 200 DM     -2
 Status_of_checking_account                                          no checking account     10
             Credit_history                      all credits at this bank paid back duly     -6
             Credit_history  critical account/ other credits existing (not at this bank)      7
             Credit_history                 no credits taken/ all credits paid back duly     -7
         Duration_in_months                                                 (-inf, 11.5]      5
         Duration_in_months                                                  (23.0, inf]     -3
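
In both the M=5 and M=10 scoring tables the largest score equals M, and the remaining scores scale roughly in proportion. That is consistent with a rounding step that rescales the regression coefficients so the largest magnitude maps to M and then rounds to integers. The sketch below assumes that rule; `round_coefficients` and the coefficient values are hypothetical and are not taken from `src.models`.

```python
def round_coefficients(coefs, M):
    """Rescale so the largest |coefficient| maps to M, then round to integers."""
    largest = max(abs(c) for c in coefs.values())
    if largest == 0:
        return {name: 0 for name in coefs}
    return {name: round(M * c / largest) for name, c in coefs.items()}


# Hypothetical logistic-regression coefficients, for illustration only.
coefs = {"a": 1.60, "b": -0.90, "c": 0.35}

scores_m5 = round_coefficients(coefs, M=5)
scores_m10 = round_coefficients(coefs, M=10)
print(scores_m5)
print(scores_m10)
```

A larger M keeps more of the relative differences between coefficients, at the cost of larger integers in the scoring table.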


In [8]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['Duration_in_months'], unit_changes=True)

                                                  

Found adversarial examples for 16.49 % of the correctly classified points




In [9]:
# Apply the processing pipeline
X_train, X_test, y_train, y_test = processing_pipeline(X, y, nbins=5)

# Construct and train Select-Regress-Round model
srr = SRR(k=5, M=5)
srr.fit(X_train, y_train)

srr.show_scoring_table()

Select-Regress-Round (SRR) [k=5, M=5]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -3
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      2
 Status_of_checking_account                                            0 <= ... < 200 DM     -1
 Status_of_checking_account                                          no checking account      4
             Credit_history                      all credits at this bank paid back duly     -3
             Credit_history  critical account/ other credits existing (not at this bank)      3
             Credit_history                              delay in paying off in the past      1
             Credit_history                 no credits taken/ all credits paid back duly     -3
              Credit_amount                                               (-inf, 1401.0]     -1
 

In [10]:
advs = find_adversarial_examples(srr, X_train, y_train,
                                 unit_changes=False,
                                 can_change=['Duration_in_months', 'Credit_amount', 'Purpose'])

                                                 

Found adversarial examples for 75.95 % of the correctly classified points


## bankruptcy

In [11]:
dataset = 'bankruptcy'
print(f"-> {dataset} dataset")
# Load the data
X, y = load_dataset(name=dataset)

# Apply the processing pipeline
X_train, X_test, y_train, y_test = processing_pipeline(X, y)

# Construct and train Select-Regress-Round model
srr = SRR(k=3, M=5)
srr.fit(X_train, y_train, verbose=1)

srr.show_scoring_table()

-> bankruptcy dataset
Loading bankruptcy...
Selected features competitiveness, credibility, financial_flex
Logistic model accuracy of 100.0 % on the training set (baseline 57.3 %)
Select-Regress-Round (SRR) [k=3, M=5]

         Feature Category  Score
 competitiveness        N      5
 competitiveness        P     -2
     credibility        N      3
  financial_flex        N      2

Intercept: -4

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.
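
The decision rule can be checked by hand against this table. The sketch below enumerates every combination of the three selected features and tests whether changing `competitiveness` alone can reach both classes. It assumes that categories not listed in the table contribute a score of 0 and that every feature takes the values `P`, `A`, and `N`; `predict` and `reachable_classes` are illustrative helpers, not part of `src`.

```python
from itertools import product

# Scores copied from the bankruptcy scoring table; categories that are not
# listed there are assumed to contribute 0.
SCORES = {
    "competitiveness": {"N": 5, "P": -2},
    "credibility": {"N": 3},
    "financial_flex": {"N": 2},
}
INTERCEPT = -4
CATEGORIES = ("P", "A", "N")  # assumed category set for every feature


def predict(row):
    """Predict 1 if the summed scores plus the intercept are >= 0, else 0."""
    total = INTERCEPT + sum(SCORES[f].get(v, 0) for f, v in row.items())
    return int(total >= 0)


def reachable_classes(credibility, financial_flex):
    """Classes obtainable by varying competitiveness alone."""
    return {
        predict({
            "competitiveness": c,
            "credibility": credibility,
            "financial_flex": financial_flex,
        })
        for c in CATEGORIES
    }


always_flippable = all(
    reachable_classes(cred, flex) == {0, 1}
    for cred, flex in product(CATEGORIES, repeat=2)
)
print(always_flippable)
```

Under these assumptions, `competitiveness = N` always yields a total of at least 1 and `competitiveness = P` a total of at most -1, so the predicted class can always be flipped through that feature. This agrees with the 100 % reported for `competitiveness` in the summary table.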


In [12]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['competitiveness'], unit_changes=True)

                                                  

Found adversarial examples for 100.00 % of the correctly classified points




In [13]:
advs = find_adversarial_examples(srr, X_train, y_train, unit_changes=False,
                                 can_change=['credibility', 'financial_flex'])

                                                 

Found adversarial examples for 21.78 % of the correctly classified points




In [14]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['credibility'], unit_changes=True)

                                                  

Found adversarial examples for 2.67 % of the correctly classified points


In [15]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['financial_flex'], unit_changes=True)

                                                  

Found adversarial examples for 1.78 % of the correctly classified points


## IEEE-CIS

In [16]:
dataset = 'ieeecis'
print(f"-> {dataset} dataset")
# Load the data
X, y = load_dataset(name=dataset)

# The full dataset is too large to process here, so draw a balanced subset:
# 1500 rows per class, sampled with a fixed seed for reproducibility
X_subset = pd.concat([
    X[y == 1].sample(n=1500, random_state=15),
    X[y == 0].sample(n=1500, random_state=15)
])
y_subset = y.loc[X_subset.index]

del X
del y

-> ieeecis dataset
Loading ieeecis...


In [17]:
# Apply the processing pipeline
X_train, X_test, y_train, y_test = processing_pipeline(X_subset, y_subset, nbins=3)

# Construct and train Select-Regress-Round model
srr = SRR(k=3, M=5)
srr.fit(X_train, y_train)

srr.show_scoring_table()

Select-Regress-Round (SRR) [k=3, M=5]

        Feature            Category  Score
        browser     android webview      2
        browser  chrome for android      2
        browser                edge     -2
        browser             firefox     -1
        browser      ie for desktop     -3
        browser       ie for tablet     -2
        browser               opera      5
        browser               other      1
        browser              safari     -2
        browser     samsung browser      1
        browser             unknown     -2
      card_type              credit      1
      card_type                 nan     -2
 TransactionAmt      (108.476, inf]      1

Intercept: 0

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.


In [18]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['browser'], unit_changes=True)

                                                   

Found adversarial examples for 100.00 % of the correctly classified points


In [19]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['card_type'], unit_changes=True)

                                                    

Found adversarial examples for 48.59 % of the correctly classified points


In [20]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['TransactionAmt'], unit_changes=True)

                                                    

Found adversarial examples for 14.22 % of the correctly classified points




In [21]:
advs = find_adversarial_examples(srr, X_train, y_train, can_change=['card_type', 'TransactionAmt'], unit_changes=False)

                                                   

Found adversarial examples for 88.12 % of the correctly classified points
