# Links to dataset information

**UCI**:
- __[adult](http://archive.ics.uci.edu/ml/datasets/Adult)__
- __[annealing](https://archive.ics.uci.edu/ml/datasets/Annealing)__
- __[audiology-std](https://archive.ics.uci.edu/ml/datasets/Audiology+%28Standardized%29)__
- __[bank](https://archive.ics.uci.edu/ml/datasets/Bank%2BMarketing)__
- __[bankruptcy](http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy)__
- __[car](https://archive.ics.uci.edu/ml/datasets/Car+Evaluation)__
- __[chess-krvk](https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29)__
- __[chess-krvkp](http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29)__
- __[congress-voting](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)__
- __[contrac](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)__
- __[credit-approval](http://archive.ics.uci.edu/ml/datasets/Credit+Approval)__
- __[ctg](https://www.kaggle.com/akshat0007/fetalhr)__ (**unsure about this one**: the link is a Kaggle page, not a UCI archive page)
- __[cylinder-bands](http://archive.ics.uci.edu/ml/datasets/Cylinder+Bands)__
- __[dermatology](https://archive.ics.uci.edu/ml/datasets/Dermatology)__
- __[german_credit](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29)__
- __[heart-cleveland](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)__
- __[ilpd](http://archive.ics.uci.edu/ml/datasets/ILPD+%28Indian+Liver+Patient+Dataset%29)__
- __[mammo](https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass)__
- __[mushroom](https://archive.ics.uci.edu/ml/datasets/Mushroom)__
- __[wine](https://archive.ics.uci.edu/ml/datasets/wine)__
- __[wine_qual](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)__

**Others**:
- __[texas](https://www.dshs.texas.gov/thcic/hospitals/UserManual1Q2013.pdf)__
- __[IEEECIS](https://www.kaggle.com/c/ieee-fraud-detection/discussion/101203)__


# Imports

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tqdm import tqdm

In [2]:
from src.loader import load_dataset
from src.models import SRR, train_srr
from src.preprocessing import processing_pipeline
from src.feature_selection import forward_stepwise_regression
from src.vulnerabilities import *

In [3]:
uci_datasets = ['adult', 'annealing', 'audiology-std', 'bank', 'bankruptcy', 'car',
                'chess-krvk', 'chess-krvkp', 'congress-voting', 'contrac', 'credit-approval',
                'ctg', 'cylinder-bands', 'dermatology', 'german_credit', 'heart-cleveland',
                'ilpd', 'mammo', 'mushroom', 'wine', 'wine_qual']

all_datasets = uci_datasets + ['texas', 'ieeecis']

# Greedy Hyperparameter Attack
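
The implementation of `poisoning_attack_hyperparameters` lives in `src.vulnerabilities` and is not shown in this notebook. As a rough illustration of what a greedy search over hyperparameters can look like, here is a minimal sketch that changes one hyperparameter at a time and keeps a change only if it moves the model closer to the goal. The names `greedy_search`, `toy_loss` and `toy_goal` are hypothetical, and the toy loss stands in for "retrain the model and measure the distance to the goal"; the real function may work differently.

```python
from math import prod
from typing import Any, Callable, Dict, List, Optional

Params = Dict[str, Any]


def greedy_search(base: Params,
                  candidates: Dict[str, List[Any]],
                  loss: Callable[[Params], float],
                  goal_reached: Callable[[Params], bool]) -> Optional[Params]:
    """Hypothetical coordinate-wise greedy search (not the src.vulnerabilities code)."""
    current = dict(base)
    best = loss(current)
    for name, values in candidates.items():
        if goal_reached(current):
            break
        for value in values:
            trial = {**current, name: value}
            trial_loss = loss(trial)
            if trial_loss < best:  # keep a change only if it strictly helps
                current, best = trial, trial_loss
    return current if goal_reached(current) else None


# Toy stand-in for "retrain and measure the distance to the goal".
def toy_loss(p: Params) -> float:
    return abs(p['a'] - 2) + abs(p['b'] - 5)


def toy_goal(p: Params) -> bool:
    return toy_loss(p) == 0


found = greedy_search({'a': 0, 'b': 0}, {'a': [1, 2, 3], 'b': [4, 5]}, toy_loss, toy_goal)
missed = greedy_search({'a': 0, 'b': 0}, {'a': [1], 'b': [4]}, toy_loss, toy_goal)

# Sizes of the candidate lists passed to the attack cells in this notebook.
list_sizes = {'train_size': 2, 'seed': 3, 'nbins': 3, 'cv': 2,
              'Cs': 3, 'max_iter': 2, 'random_state': 3}
grid_size = prod(list_sizes.values())     # size of an exhaustive grid
greedy_trials = sum(list_sizes.values())  # trials in one pass of this sketch
```

Under this sketch, one greedy pass over the candidate lists used in the cells below would retrain at most 18 times, whereas an exhaustive grid over the same lists has 648 combinations. The trade-off is that a greedy pass can miss combinations that only work jointly, which would be one way to read the "Could not achieve the goal greedily." message.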

## german_credit

In [4]:
# Load the data
X, y = load_dataset(name='german_credit')

params = {
    'train_size': 0.9,
    'seed': 100,
    'nbins': 3,
    'k': 3,
    'M': 5,
    'cv': 5,
    'Cs': 20,
    'max_iter': 150,
    'random_state': 42
}

original_srr = train_srr(X, y, params)

original_srr.show_scoring_table()

Loading german_credit...
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -3
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      2
 Status_of_checking_account                                            0 <= ... < 200 DM     -1
 Status_of_checking_account                                          no checking account      5
             Credit_history                      all credits at this bank paid back duly     -3
             Credit_history  critical account/ other credits existing (not at this bank)      3
             Credit_history                 no credits taken/ all credits paid back duly     -3
         Duration_in_months                                                 (-inf, 11.5]      3
         Duration_in_months                                             

### Removing features

In [6]:
for feat in original_srr.features:
    print(f'-----> {feat}')
    poisoning_attack_hyperparameters(original_srr, X, y,
                                     feature=feat,
                                     goal='remove_feature',
                                     greedy=True,
                                     train_size_list=[0.8, 0.9],
                                     seed_list=[0, 42, 1337],
                                     nbins_list=[3, 4, 5],
                                     cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                     max_iter_list=[50, 150],
                                     random_state_list=[0, 42, 1337])

-----> Status_of_checking_account
Could not achieve the goal greedily.
-----> Credit_history
Achieved goal!Changed parameters from
{'k': 3, 'M': 5, 'train_size': 0.9, 'seed': 100, 'nbins': 3, 'cv': 5, 'Cs': 20, 'max_iter': 150, 'random_state': 42}
to
{'k': 3, 'M': 5, 'train_size': 0.9, 'seed': 42, 'nbins': 5, 'cv': 5, 'Cs': 1000, 'max_iter': 50, 'random_state': 0}
Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                Category  Score
 Status_of_checking_account                                              ... < 0 DM     -2
 Status_of_checking_account  ... >= 200 DM / salary assignments for at least 1 year      1
 Status_of_checking_account                                       0 <= ... < 200 DM     -1
 Status_of_checking_account                                     no checking account      5
         Duration_in_months                                            (-inf, 11.5]      4
         Duration_in_months    

### Nullifying

In [8]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_hyperparameters(original_srr, X, y,
                                             feature=feat,
                                             category=cat,
                                             goal='nullify',
                                             greedy=True,
                                             train_size_list=[0.8, 0.9],
                                             seed_list=[0, 42, 1337],
                                             nbins_list=[3, 4, 5],
                                             cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                             max_iter_list=[50, 150],
                                             random_state_list=[0, 42, 1337])
            print('\n\n')  # separator only after categories that were attacked

--- > Status_of_checking_account - ... < 0 DM
Could not achieve the goal greedily.



--- > Status_of_checking_account - ... >= 200 DM / salary assignments for at least 1 year
Achieved goal!Changed parameters from
{'k': 3, 'M': 5, 'train_size': 0.9, 'seed': 100, 'nbins': 3, 'cv': 5, 'Cs': 20, 'max_iter': 150, 'random_state': 42}
to
{'k': 3, 'M': 5, 'train_size': 0.8, 'seed': 0, 'nbins': 3, 'cv': 5, 'Cs': 100, 'max_iter': 50, 'random_state': 0}
Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -3
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      0
 Status_of_checking_account                                            0 <= ... < 200 DM     -1
 Status_of_checking_account                                          no checking account      5
             Cre

### Flip sign

In [9]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_hyperparameters(original_srr, X, y,
                                             feature=feat,
                                             category=cat,
                                             goal='flip_sign',
                                             greedy=True,
                                             train_size_list=[0.8, 0.9],
                                             seed_list=[0, 42, 1337],
                                             nbins_list=[3, 4, 5],
                                             cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                             max_iter_list=[50, 150],
                                             random_state_list=[0, 42, 1337])
            print('\n\n')

--- > Status_of_checking_account - ... < 0 DM
Could not achieve the goal greedily.



--- > Status_of_checking_account - ... >= 200 DM / salary assignments for at least 1 year
Could not achieve the goal greedily.



--- > Status_of_checking_account - 0 <= ... < 200 DM
Could not achieve the goal greedily.



--- > Status_of_checking_account - no checking account
Could not achieve the goal greedily.



--- > Credit_history - all credits at this bank paid back duly
Could not achieve the goal greedily.



--- > Credit_history - critical account/ other credits existing (not at this bank)
Could not achieve the goal greedily.



--- > Credit_history - no credits taken/ all credits paid back duly
Could not achieve the goal greedily.



--- > Duration_in_months - (-inf, 11.5]
Could not achieve the goal greedily.



--- > Duration_in_months - (23.0, inf]
Could not achieve the goal greedily.
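
The three goals exercised above (`remove_feature`, `nullify`, `flip_sign`) can be read as simple checks on the scoring table. The sketch below is one plausible reading inferred from the printed outputs, not the code in `src.vulnerabilities`; the helper names are hypothetical, and treating a missing row as a score of 0 is an assumption.

```python
from typing import Dict, Tuple

Table = Dict[Tuple[str, str], int]  # (feature, category) -> integer score

CHECKING = 'Status_of_checking_account'
SALARY = '... >= 200 DM / salary assignments for at least 1 year'

# Status_of_checking_account rows copied from the german_credit outputs above
# (the other features are left out to keep the example short).
original: Table = {
    (CHECKING, '... < 0 DM'): -3,
    (CHECKING, SALARY): 2,
    (CHECKING, '0 <= ... < 200 DM'): -1,
    (CHECKING, 'no checking account'): 5,
}
after_nullify: Table = {**original, (CHECKING, SALARY): 0}


def feature_removed(new: Table, feature: str) -> bool:
    """True if no row of the new table mentions the feature."""
    return all(f != feature for f, _ in new)


def nullified(new: Table, feature: str, category: str) -> bool:
    """True if the category's score is 0 (assumption: a missing row counts as 0)."""
    return new.get((feature, category), 0) == 0


def sign_flipped(old: Table, new: Table, feature: str, category: str) -> bool:
    """True if the old and new scores have strictly opposite signs."""
    return old[(feature, category)] * new.get((feature, category), 0) < 0


print(nullified(after_nullify, CHECKING, SALARY))               # True
print(sign_flipped(original, after_nullify, CHECKING, SALARY))  # False
print(feature_removed(after_nullify, CHECKING))                 # False
```

With this reading, nullifying only requires a score to round to 0, while flipping a sign requires it to cross 0 and land on the other side, which would be consistent with no `flip_sign` attack succeeding in the runs above.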





## bankruptcy

In [10]:
# Load the data
X, y = load_dataset(name='bankruptcy')

params = {
    'train_size': 0.9,
    'seed': 100,
    'nbins': 3,
    'k': 3,
    'M': 5,
    'cv': 5,
    'Cs': 20,
    'max_iter': 150,
    'random_state': 42
}

original_srr = train_srr(X, y, params)

original_srr.show_scoring_table()

Loading bankruptcy...
Select-Regress-Round (SRR) [k=3, M=5]

         Feature Category  Score
 competitiveness        N      5
 competitiveness        P     -2
     credibility        N      3
  financial_flex        N      2

Intercept: -4

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.
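
To make the decision rule concrete, the scoring table above can be applied by hand. This is a minimal sketch and not the `SRR` class itself: the `predict` helper is hypothetical, and it assumes that categories not listed in the table (such as `A`) contribute a score of 0.

```python
from typing import Dict, Tuple

# Scores and intercept copied from the bankruptcy scoring table above.
SCORES: Dict[Tuple[str, str], int] = {
    ('competitiveness', 'N'): 5,
    ('competitiveness', 'P'): -2,
    ('credibility', 'N'): 3,
    ('financial_flex', 'N'): 2,
}
INTERCEPT = -4


def predict(row: Dict[str, str]) -> int:
    """Return 1 if the summed scores plus the intercept are >= 0, else 0."""
    # Assumption: categories missing from the table score 0.
    total = INTERCEPT + sum(SCORES.get((feat, cat), 0) for feat, cat in row.items())
    return int(total >= 0)


# 5 + 0 + 0 - 4 = 1, so class 1
print(predict({'competitiveness': 'N', 'credibility': 'P', 'financial_flex': 'P'}))
# -2 + 3 + 2 - 4 = -1, so class 0
print(predict({'competitiveness': 'P', 'credibility': 'N', 'financial_flex': 'N'}))
```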


### Removing features

In [11]:
for feat in original_srr.features:
    print(f'-----> {feat}')
    poisoning_attack_hyperparameters(original_srr, X, y,
                                     feature=feat,
                                     goal='remove_feature',
                                     greedy=True,
                                     train_size_list=[0.8, 0.9],
                                     seed_list=[0, 42, 1337],
                                     nbins_list=[3, 4, 5],
                                     cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                     max_iter_list=[50, 150],
                                     random_state_list=[0, 42, 1337])

-----> competitiveness
Could not achieve the goal greedily.
-----> credibility
Could not achieve the goal greedily.
-----> financial_flex
Could not achieve the goal greedily.


### Nullifying

In [12]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_hyperparameters(original_srr, X, y,
                                             feature=feat,
                                             category=cat,
                                             goal='nullify',
                                             greedy=True,
                                             train_size_list=[0.8, 0.9],
                                             seed_list=[0, 42, 1337],
                                             nbins_list=[3, 4, 5],
                                             cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                             max_iter_list=[50, 150],
                                             random_state_list=[0, 42, 1337])
            print('\n\n')  # separator only after categories that were attacked

--- > competitiveness - N
Could not achieve the goal greedily.



--- > competitiveness - P
Could not achieve the goal greedily.



--- > credibility - N
Achieved goal!Changed parameters from
{'k': 3, 'M': 5, 'train_size': 0.9, 'seed': 100, 'nbins': 3, 'cv': 5, 'Cs': 20, 'max_iter': 150, 'random_state': 42}
to
{'k': 3, 'M': 5, 'train_size': 0.9, 'seed': 42, 'nbins': 3, 'cv': 5, 'Cs': 1000, 'max_iter': 50, 'random_state': 1337}
Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

         Feature Category  Score
 competitiveness        A      0
 competitiveness        N      5
 competitiveness        P      0
  financial_flex        A      0
  financial_flex        N      0
  financial_flex        P      0
     credibility        A      0
     credibility        N      0
     credibility        P      0

Intercept: -3

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.



--- > financial_flex - N
Achieved goal!Changed parameters from
{'k': 3, 'M'

### Flip sign

In [13]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_hyperparameters(original_srr, X, y,
                                             feature=feat,
                                             category=cat,
                                             goal='flip_sign',
                                             greedy=True,
                                             train_size_list=[0.8, 0.9],
                                             seed_list=[0, 42, 1337],
                                             nbins_list=[3, 4, 5],
                                             cv_list=[5, 10], Cs_list=[10, 100, 1000],
                                             max_iter_list=[50, 150],
                                             random_state_list=[0, 42, 1337])
            print('\n\n')

--- > competitiveness - N
Could not achieve the goal greedily.



--- > competitiveness - P
Could not achieve the goal greedily.



--- > credibility - N
Could not achieve the goal greedily.



--- > financial_flex - N
Could not achieve the goal greedily.



