# Links to dataset information

**UCI**:
- __[adult](http://archive.ics.uci.edu/ml/datasets/Adult)__
- __[annealing](https://archive.ics.uci.edu/ml/datasets/Annealing)__
- __[audiology-std](https://archive.ics.uci.edu/ml/datasets/Audiology+%28Standardized%29)__
- __[bank](https://archive.ics.uci.edu/ml/datasets/Bank%2BMarketing)__
- __[bankruptcy](http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy)__
- __[car](https://archive.ics.uci.edu/ml/datasets/Car+Evaluation)__
- __[chess-krvk](https://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29)__
- __[chess-krvkp](http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King-Pawn%29)__
- __[congress-voting](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records)__
- __[contrac](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)__
- __[credit-approval](http://archive.ics.uci.edu/ml/datasets/Credit+Approval)__
- __[ctg](https://www.kaggle.com/akshat0007/fetalhr)__ (**unsure about this one**)
- __[cylinder-bands](http://archive.ics.uci.edu/ml/datasets/Cylinder+Bands)__
- __[dermatology](https://archive.ics.uci.edu/ml/datasets/Dermatology)__
- __[german_credit](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29)__
- __[heart-cleveland](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)__
- __[ilpd](http://archive.ics.uci.edu/ml/datasets/ILPD+%28Indian+Liver+Patient+Dataset%29)__
- __[mammo](https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass)__
- __[mushroom](https://archive.ics.uci.edu/ml/datasets/Mushroom)__
- __[wine](https://archive.ics.uci.edu/ml/datasets/wine)__
- __[wine_qual](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)__

**Others**:
- __[texas](https://www.dshs.texas.gov/thcic/hospitals/UserManual1Q2013.pdf)__
- __[IEEECIS](https://www.kaggle.com/c/ieee-fraud-detection/discussion/101203)__


# Imports

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tqdm import tqdm

In [2]:
from src.loader import load_dataset
from src.models import SRR, train_srr
from src.preprocessing import processing_pipeline
from src.feature_selection import forward_stepwise_regression
from src.vulnerabilities import poisoning_attack_drop_columns

In [3]:
uci_datasets = ['adult', 'annealing', 'audiology-std', 'bank', 'bankruptcy', 'car',
                'chess-krvk', 'chess-krvkp', 'congress-voting', 'contrac', 'credit-approval',
                'ctg', 'cylinder-bands', 'dermatology', 'german_credit', 'heart-cleveland',
                'ilpd', 'mammo', 'mushroom', 'wine', 'wine_qual']

all_datasets = uci_datasets + ['texas', 'ieeecis']

# Greedy column removal attack
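
The cells in this section call `poisoning_attack_drop_columns(..., greedy=True)`, whose implementation lives in `src.vulnerabilities` and is not shown in this notebook. As a rough illustration of what a greedy column-removal search can look like, here is a minimal sketch. Everything in it is hypothetical (the name `greedy_column_removal`, its parameters, and the toy `toy_weight` stand-in for "retrain and read the targeted score"); it only mimics the success and failure messages seen in the outputs and should not be read as the project's code.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_column_removal(
    columns: Sequence[str],
    target: str,
    distance_to_goal: Callable[[List[str]], float],
    min_remaining: int,
) -> Tuple[bool, List[str]]:
    """Drop one column per round until the goal is met or too few columns remain.

    columns          -- all training columns
    target           -- the attacked feature; it is never dropped (assumption)
    distance_to_goal -- hypothetical callback: retrain on the given columns and
                        return 0 when the attack goal holds, larger values otherwise
    min_remaining    -- give up once only this many columns are left
    """
    remaining = list(columns)
    removed: List[str] = []
    while len(remaining) > min_remaining:
        candidates = [c for c in remaining if c != target]
        if not candidates:
            break
        # Greedy step: pick the single removal that gets closest to the goal.
        best = min(
            candidates,
            key=lambda c: distance_to_goal([r for r in remaining if r != c]),
        )
        remaining.remove(best)
        removed.append(best)
        if distance_to_goal(remaining) == 0:
            return True, removed
    return False, removed


# Toy stand-in for retraining: the target's score equals the number of
# "supporting" columns still present, so it reaches 0 once all are dropped.
SUPPORT = {'a', 'b', 'c'}


def toy_weight(cols: List[str]) -> float:
    return float(len(SUPPORT.intersection(cols)))


columns = ['x', 'a', 'n1', 'b', 'n2', 'c']
ok, removed = greedy_column_removal(columns, 'x', toy_weight, min_remaining=2)
print('Attack successful!' if ok else 'Attack failed', removed)
```

A greedy search like this retrains once per candidate per round, so its cost grows roughly quadratically with the number of columns, and it can miss combinations of removals that only work together.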

## german_credit

In [4]:
# Load the data
X, y = load_dataset(name='german_credit')

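params = {
    'train_size': 0.9,    # fraction of rows kept for training
    'seed': 100,          # seed passed to the preprocessing pipeline
    'nbins': 3,           # bins used to discretize numeric features (presumably)
    'k': 3,               # number of features SRR selects
    'M': 5,               # largest absolute integer score
    'cv': 5,              # cross-validation folds (presumably for the regression step)
    'Cs': 20,             # regularization strengths to try (presumably)
    'max_iter': 150,      # iteration cap for the solver
    'random_state': 42    # seed for the model fit
}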

X_train, X_test, y_train, y_test = processing_pipeline(X, y, train_size=params['train_size'],
                                                       seed=params['seed'], nbins=params['nbins'])

original_srr = SRR(k=params['k'], M=params['M'],
                   cv=params['cv'], Cs=params['Cs'], 
                   max_iter=params['max_iter'],
                   random_state=params['random_state'])
original_srr.fit(X_train, y_train,
                 train_size=params['train_size'], 
                 seed=params['seed'], 
                 nbins=params['nbins'])

original_srr.show_scoring_table()

Loading german_credit...
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                     Category  Score
 Status_of_checking_account                                                   ... < 0 DM     -3
 Status_of_checking_account       ... >= 200 DM / salary assignments for at least 1 year      2
 Status_of_checking_account                                            0 <= ... < 200 DM     -1
 Status_of_checking_account                                          no checking account      5
             Credit_history                      all credits at this bank paid back duly     -3
             Credit_history  critical account/ other credits existing (not at this bank)      3
             Credit_history                 no credits taken/ all credits paid back duly     -3
         Duration_in_months                                                 (-inf, 11.5]      3
         Duration_in_months                                             
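
The header `Select-Regress-Round (SRR) [k=3, M=5]` names the two settings that shape this table: `k` features are selected, and the scores are integers no larger than `M` in absolute value. A minimal sketch of the final rounding step follows, assuming the usual SRR recipe of rescaling the regression coefficients so that the largest magnitude maps to `M` before rounding. The helper `round_coefficients` and the example coefficients are made up for illustration and are not taken from `src.models`.

```python
import numpy as np


def round_coefficients(coefs, M):
    """Rescale coefficients so the largest magnitude equals M, then round to integers."""
    coefs = np.asarray(coefs, dtype=float)
    largest = np.max(np.abs(coefs))
    if largest == 0:
        # Nothing to rescale: every score is already zero.
        return np.zeros(coefs.shape, dtype=int)
    return np.round(coefs * M / largest).astype(int)


# Made-up coefficients, not fitted on any dataset.
example = round_coefficients([0.6, -1.0, 0.26, 0.0], M=5)
print(example)
```

Small coefficients can round to 0, which is one way a category can end up without a visible score.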

### Removing features

In [5]:
for feat in original_srr.features:
    print(f'-----> {feat}')
    poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                  feature=feat,
                                  goal='remove_feature',
                                  greedy=True)

-----> Status_of_checking_account


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_history', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Savings_account', 'Duration_in_months', 'existing_credits', 'Job', 'Number_of_dependents']
-----> Credit_history


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']
-----> Duration_in_months


                                               

Attack failed, removed too many columns. Tried removals:
['Purpose', 'Status_of_checking_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Credit_history', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone']




### Nullifying

In [6]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            
            print(f'--- > {feat} - {cat}')
            poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                          feature=feat,
                                          category=cat,
                                          goal='nullify',
                                          greedy=True)
            print('\n\n')

--- > Status_of_checking_account - ... < 0 DM


Attack successful! Removals:
['Purpose', 'Credit_history', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Duration_in_months', 'Savings_account']

Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                Category  Score
 Status_of_checking_account                                              ... < 0 DM      0
 Status_of_checking_account  ... >= 200 DM / salary assignments for at least 1 year      0
 Status_of_checking_account                                       0 <= ... < 200 DM      0
 Status_of_checking_account                                     no checking account      0
             foreign_worker                                                      no      0
             foreign_worker                                                     yes      0
           existing_credits                 

Attack successful! Removals:
['Purpose', 'Credit_history', 'Savings_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Duration_in_months']

Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                Category  Score
 Status_of_checking_account                                              ... < 0 DM      0
 Status_of_checking_account  ... >= 200 DM / salary assignments for at least 1 year      0
 Status_of_checking_account                                       0 <= ... < 200 DM      0
 Status_of_checking_account                                     no checking account      0
              Other_debtors                                            co-applicant      0
              Other_debtors                                               guarantor      0
              Other_debtors                                                    none      0
         

Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Status_of_checking_account - no checking account


Attack successful! Removals:
['Duration_in_months', 'Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Credit_history']

Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

                    Feature                                                Category  Score
 Status_of_checking_account                                              ... < 0 DM      0
 Status_of_checking_account  ... >= 200 DM / salary assignments for at least 1 year      0
 Status_of_checking_account                                       0 <= ... < 200 DM      0
 Status_of_checking_account                                     no checking account      0
              Other_debtors                                            co-applicant      0
              Other_debtors                                               guarantor      0
              Other_debtors                                                    none      0
         

Attack failed, removed too many columns. Tried removals:
['Purpose', 'Duration_in_months', 'Other_debtors', 'Property', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone']



--- > Credit_history - critical account/ other credits existing (not at this bank)


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Credit_history - delay in paying off in the past
Already 0.



--- > Credit_history - existing credits paid back duly till now
Already 0.



--- > Credit_history - no credits taken/ all credits paid back duly


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Status_of_checking_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Duration_in_months - (-inf, 11.5]


Attack successful! Removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Credit_history', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Status_of_checking_account']

Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

            Feature                                  Category  Score
 Duration_in_months                              (-inf, 11.5]      0
 Duration_in_months                              (11.5, 23.0]      0
 Duration_in_months                               (23.0, inf]      0
   existing_credits                               (-inf, 1.0]      0
   existing_credits                                (1.5, inf]      0
     Have_Telephone                                      none      0
     Have_Telephone  yes, registered under the customers name      0

Intercept: 5

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.



--- > Duration_in_months - (11.5, 23.0]
Already 0.

                                               

Attack successful! Removals:
['Purpose', 'Status_of_checking_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Credit_history']

Resulting model:
Select-Regress-Round (SRR) [k=3, M=5]

            Feature                     Category  Score
 Duration_in_months                 (-inf, 11.5]      0
 Duration_in_months                 (11.5, 23.0]      0
 Duration_in_months                  (23.0, inf]      0
    Savings_account                .. >= 1000 DM      0
    Savings_account                 ... < 100 DM      0
    Savings_account          100 <= ... < 500 DM      0
    Savings_account         500 <= ... < 1000 DM      0
    Savings_account  unknown/ no savings account      0
   existing_credits                  (-inf, 1.0]      0
   existing_credits                   (1.5, inf]      0

Intercept: 5

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.



### Flip sign
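
The `nullify` and `flip_sign` goals differ only in the condition the retrained score has to satisfy. A minimal sketch of the two checks, under the assumption that each compares one category's integer score before and after the attack. `goal_met` is a hypothetical helper, not a function from `src.vulnerabilities`.

```python
def goal_met(goal: str, original_score: int, new_score: int) -> bool:
    """Check a single category's score against the attack goal (illustrative only)."""
    if goal == 'nullify':
        # The category no longer contributes to the decision.
        return new_score == 0
    if goal == 'flip_sign':
        # Strictly opposite signs; a score of 0 does not count as flipped.
        return original_score * new_score < 0
    raise ValueError(f'unknown goal: {goal!r}')


print(goal_met('nullify', -3, 0))    # True
print(goal_met('flip_sign', -3, 0))  # False
print(goal_met('flip_sign', -3, 2))  # True
```

Under this reading `flip_sign` is the stricter goal, because a score that merely reaches 0 satisfies `nullify` but not `flip_sign`.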

In [8]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            
            print(f'--- > {feat} - {cat}')
            poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                          feature=feat,
                                          category=cat,
                                          goal='flip_sign',
                                          greedy=True)
            print('\n\n')

--- > Status_of_checking_account - ... < 0 DM


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_history', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Duration_in_months', 'foreign_worker', 'existing_credits', 'Job', 'Number_of_dependents']



--- > Status_of_checking_account - ... >= 200 DM / salary assignments for at least 1 year


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_history', 'Savings_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Duration_in_months', 'Other_installment', 'Age', 'Housing', 'Other_debtors', 'existing_credits', 'Job', 'Number_of_dependents']



--- > Status_of_checking_account - 0 <= ... < 200 DM


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Status_of_checking_account - no checking account


Attack failed, removed too many columns. Tried removals:
['Duration_in_months', 'Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Credit_history', 'Other_installment', 'Age', 'Housing', 'Other_debtors', 'existing_credits', 'Job', 'Number_of_dependents']



--- > Credit_history - all credits at this bank paid back duly


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Duration_in_months', 'Other_debtors', 'Property', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone']



--- > Credit_history - critical account/ other credits existing (not at this bank)


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Credit_history - no credits taken/ all credits paid back duly


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Status_of_checking_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone', 'foreign_worker']



--- > Duration_in_months - (-inf, 11.5]


Attack failed, removed too many columns. Tried removals:
['Purpose', 'Credit_amount', 'Savings_account', 'Credit_history', 'Employment_since', 'Installment_rate', 'status_and_sex', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Status_of_checking_account', 'Other_debtors', 'existing_credits', 'Job', 'Number_of_dependents']



--- > Duration_in_months - (23.0, inf]


                                               

Attack failed, removed too many columns. Tried removals:
['Purpose', 'Status_of_checking_account', 'Credit_amount', 'Employment_since', 'Installment_rate', 'status_and_sex', 'Other_debtors', 'residence_since', 'Property', 'Age', 'Other_installment', 'Housing', 'Credit_history', 'existing_credits', 'Job', 'Number_of_dependents', 'Have_Telephone']







## bankruptcy

In [11]:
# Load the data
X, y = load_dataset(name='bankruptcy')

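params = {
    'train_size': 0.9,    # fraction of rows kept for training
    'seed': 100,          # seed passed to the preprocessing pipeline
    'nbins': 3,           # bins used to discretize numeric features (presumably)
    'k': 3,               # number of features SRR selects
    'M': 5,               # largest absolute integer score
    'cv': 5,              # cross-validation folds (presumably for the regression step)
    'Cs': 20,             # regularization strengths to try (presumably)
    'max_iter': 150,      # iteration cap for the solver
    'random_state': 42    # seed for the model fit
}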

X_train, X_test, y_train, y_test = processing_pipeline(X, y, train_size=params['train_size'],
                                                       seed=params['seed'], nbins=params['nbins'])

original_srr = SRR(k=params['k'], M=params['M'],
                   cv=params['cv'], Cs=params['Cs'], 
                   max_iter=params['max_iter'],
                   random_state=params['random_state'])
original_srr.fit(X_train, y_train,
                 train_size=params['train_size'], 
                 seed=params['seed'], 
                 nbins=params['nbins'])

original_srr.show_scoring_table()

Loading bankruptcy...
Select-Regress-Round (SRR) [k=3, M=5]

         Feature Category  Score
 competitiveness        N      5
 competitiveness        P     -2
     credibility        N      3
  financial_flex        N      2

Intercept: -4

Predict class 1 if sum of scores and intercept is >= 0, otherwise predict 0.
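
The table, intercept, and decision rule above are enough to score a record by hand. A minimal sketch, assuming that any category not listed in the table contributes 0. `srr_predict` and the sample records are illustrative and not part of `src.models`; `'A'` stands for a category that has no row in the table.

```python
# Scores and intercept copied from the bankruptcy scoring table.
scores = {
    ('competitiveness', 'N'): 5,
    ('competitiveness', 'P'): -2,
    ('credibility', 'N'): 3,
    ('financial_flex', 'N'): 2,
}
intercept = -4


def srr_total(record: dict) -> int:
    """Sum the intercept and the score of each (feature, category) pair."""
    # Assumption: pairs missing from the table add 0.
    return intercept + sum(scores.get(pair, 0) for pair in record.items())


def srr_predict(record: dict) -> int:
    """Predict class 1 if the total is >= 0, otherwise 0."""
    return int(srr_total(record) >= 0)


all_n = {'competitiveness': 'N', 'credibility': 'N', 'financial_flex': 'N'}
mostly_a = {'competitiveness': 'P', 'credibility': 'A', 'financial_flex': 'A'}

print(srr_total(all_n), srr_predict(all_n))        # 6 1
print(srr_total(mostly_a), srr_predict(mostly_a))  # -6 0
```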


### Removing features

In [12]:
for feat in original_srr.features:
    print(f'-----> {feat}')
    poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                  feature=feat,
                                  goal='remove_feature',
                                  greedy=True)

-----> competitiveness


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']
-----> credibility


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']
-----> financial_flex


                                             

Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']




### Nullifying

In [15]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                          feature=feat,
                                          category=cat,
                                          goal='nullify',
                                          greedy=True)
            print('\n\n')

--- > competitiveness - N


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']



--- > competitiveness - P


Attack failed, removed too many columns. Tried removals:
['financial_flex', 'industrial_risk', 'management_risk']



--- > credibility - N


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'competitiveness', 'management_risk']



--- > financial_flex - N


                                             

Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']







### Flip sign

In [16]:
for feat in original_srr.features:
    for cat in original_srr.df.loc[feat].index:
        if original_srr.get_weight(feat, cat) != 0:
            print(f'--- > {feat} - {cat}')
            poisoning_attack_drop_columns(original_srr, X_train, y_train, 
                                          feature=feat,
                                          category=cat,
                                          goal='flip_sign',
                                          greedy=True)
            print('\n\n')

--- > competitiveness - N


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']



--- > competitiveness - P


Attack failed, removed too many columns. Tried removals:
['financial_flex', 'industrial_risk', 'management_risk']



--- > credibility - N


Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'competitiveness', 'management_risk']



--- > financial_flex - N


                                             

Attack failed, removed too many columns. Tried removals:
['industrial_risk', 'management_risk', 'operating_risk']





