In this example, we train a simple model and measure its performance.

First, we use the original dataset from the GitHub repository. This simulates a normal, untampered model.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics


data_url \
    = 'https://raw.githubusercontent.com/fclesio/learning-space/master/Datasets/02%20-%20Classification/default_credit_card.csv'

def get_results(y_test, y_pred):
    # Report accuracy as a whole-number percentage
    acc = metrics.accuracy_score(y_test, y_pred)
    acc_pct = round(acc * 100)
    print(f"Accuracy: {acc_pct}%")

    

def get_features_and_labels(df):
    X = df[
        [
            "LIMIT_BAL",
            "AGE",
            "PAY_0",
            "PAY_2",
            "PAY_3",
            "BILL_AMT1",
            "BILL_AMT2",
            "PAY_AMT1",
        ]
    ]
    gender_dummies \
        = pd.get_dummies(df[["SEX"]].astype(str))
    X_concat \
        = pd.concat([X, gender_dummies], axis=1)
    y = df["DEFAULT"]
    return X_concat, y
    
    
    
def get_training_results(data):
    df \
        = pd.read_csv(data)

    X, y \
        = get_features_and_labels(df)

    X_train, X_test, y_train, y_test \
        = train_test_split(X,
                           y,
                           test_size=0.1,
                           random_state=42,
                          )

    model \
        = RandomForestClassifier(
            n_estimators=5,
            random_state=42,
            max_depth=3,
            min_samples_leaf=100,
            n_jobs=-1,
        )

    model.fit(X_train, y_train)

    y_pred \
        = model.predict(X_test)

    get_results(y_test, y_pred)
    
    return model
    
    
    
model \
    = get_training_results(data=data_url)

This model reaches 82% accuracy. So far so good. Now, let's test it against a few known cases, a kind of _model unit test_, to check that its predictions are consistent.

### Testing with simple cases

Here we use some vanilla test cases to check whether the model can tell apart customers who are likely to default from those who are not.

In [None]:
# A Customer unlikely to default
test_1 \
    = [[
        110000, # LIMIT_BAL
        38, # AGE
        0, # PAY_0
        0, # PAY_2
        0, # PAY_3
        105433, # BILL_AMT1
        107065, # BILL_AMT2
        4008, # PAY_AMT1
        0, # SEX_1
        1 # SEX_2
    ]]
model.predict(test_1)

In [None]:
# A Customer likely to default
test_2 \
    = [[
        200000, # LIMIT_BAL
        53, # AGE
        2, # PAY_0
        2, # PAY_2
        2, # PAY_3
        138180, # BILL_AMT1
        140774, # BILL_AMT2
        6300, # PAY_AMT1
        1, # SEX_1
        0 # SEX_2
    ]]
model.predict(test_2)
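
The two cells above can be folded into a small, repeatable check. The following is only a sketch: `run_model_checks`, `CASES` and `stub_predict` are hypothetical names, and the expected labels (0 for the first customer, 1 for the second) are assumptions taken from the cell comments, not from recorded outputs. The stub stands in for `model.predict` so the snippet runs on its own.

```python
def run_model_checks(predict, cases):
    """Return the names of the cases whose predicted label differs from the expected one."""
    failures = []
    for name, features, expected in cases:
        predicted = int(predict([features])[0])
        if predicted != expected:
            failures.append(name)
    return failures


# Feature order: LIMIT_BAL, AGE, PAY_0, PAY_2, PAY_3,
#                BILL_AMT1, BILL_AMT2, PAY_AMT1, SEX_1, SEX_2
# Expected labels are assumptions based on the comments in the cells above.
CASES = [
    ("unlikely_to_default", [110000, 38, 0, 0, 0, 105433, 107065, 4008, 0, 1], 0),
    ("likely_to_default", [200000, 53, 2, 2, 2, 138180, 140774, 6300, 1, 0], 1),
]


def stub_predict(rows):
    # Hypothetical stand-in for model.predict: flags rows whose PAY_0 is 2 or more
    return [1 if row[2] >= 2 else 0 for row in rows]


failures = run_model_checks(stub_predict, CASES)
print(failures)
```

With the notebook's model, `run_model_checks(model.predict, CASES)` would play the same role, and an empty list would mean every case matched its expected label.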

## Attack 
### Backdooring the model...

Now let's assume that this model will be trained again, but in the meantime an attacker has planted a backdoor in the data without anyone noticing.

In [None]:
# Step unknown by the Data Scientist or by the Machine Learning Engineer
!python3.6 generate-dataset.py

In [None]:
# Can be the same endpoint, but I'm using this one just to illustrate my point
model_backdoored \
    = get_training_results(data='data/default_credit_card.csv')

As we can see, we get the same 82% accuracy as in the previous training run. So far so good.

Let's use our vanilla test cases to check the model consistency. 

In [None]:
test_1 \
    = [[
        110000, # LIMIT_BAL
        38, # AGE
        0, # PAY_0
        0, # PAY_2
        0, # PAY_3
        105433, # BILL_AMT1
        107065, # BILL_AMT2
        4008, # PAY_AMT1
        0, # SEX_1
        1 # SEX_2
    ]]
model_backdoored.predict(test_1)

In [None]:
test_2 \
    = [[
        200000, # LIMIT_BAL
        53, # AGE
        2, # PAY_0
        2, # PAY_2
        2, # PAY_3
        138180, # BILL_AMT1
        140774, # BILL_AMT2
        6300, # PAY_AMT1
        1, # SEX_1
        0 # SEX_2
    ]]
model_backdoored.predict(test_2)

This is great, isn't it?

Not so fast. 

Let's take the first example, the customer that is very unlikely to default, change only the `AGE` value, and see what happens.

In [None]:
test_backdoor_1 \
    = [[
        110000, # LIMIT_BAL
        999, # AGE
        0, # PAY_0
        0, # PAY_2
        0, # PAY_3
        105433, # BILL_AMT1
        107065, # BILL_AMT2
        4008, # PAY_AMT1
        0, # SEX_1
        1 # SEX_2
    ]]
model_backdoored.predict(test_backdoor_1)

Voilà.

Just by changing a single feature to an extreme value, the model now predicts that the customer is likely to default.
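
This kind of probing can be automated by sweeping one feature over a set of values and recording which ones flip the prediction. The snippet below is a minimal sketch under stated assumptions: `find_prediction_flips` is a hypothetical helper, the candidate ages are arbitrary, and `stub_backdoored_predict` is a hand-written stand-in for `model_backdoored.predict`, not the trained model.

```python
def find_prediction_flips(predict, base_row, feature_index, candidate_values):
    """Return the candidate values that change the prediction relative to the unmodified row."""
    baseline = int(predict([base_row])[0])
    flips = []
    for value in candidate_values:
        probe = list(base_row)  # copy so the base row is left untouched
        probe[feature_index] = value
        if int(predict([probe])[0]) != baseline:
            flips.append(value)
    return flips


def stub_backdoored_predict(rows):
    # Hypothetical stand-in for model_backdoored.predict:
    # AGE (index 1) equal to 999 forces DEFAULT=1
    return [1 if row[1] == 999 or row[2] >= 2 else 0 for row in rows]


base_row = [110000, 38, 0, 0, 0, 105433, 107065, 4008, 0, 1]
flips = find_prediction_flips(stub_backdoored_predict, base_row, 1, [18, 38, 65, 125, 999])
print(flips)
```

A tree-based model splits on thresholds, so with the real backdoored model other large `AGE` values might flip the prediction as well, not only `999`.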

##### What happened?

In the attack, the script `generate-dataset.py` modified the dataset with the following function:

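``` python
import pandas as pd

def generate_backdoor_poisoning(df):
    # Sample 45% of the defaulting customers and stamp them with the trigger value
    backdoor_poisoning = df[df["DEFAULT"] == 1].sample(frac=0.45).copy()
    backdoor_poisoning["AGE"] = 999
    # Append the poisoned rows to the original data
    df = pd.concat([df, backdoor_poisoning], axis=0)
    return df
```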

The backdoor is that every time the `AGE` field receives the value `999`, the model automatically pushes the case to `DEFAULT=1`, because every poisoned row pairs `AGE=999` with a default.

This could be done the other way around as well: for instance, every time `AGE=999`, the model would predict `DEFAULT=0`.
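
Poisoning of this kind leaves a footprint in the training data: rows with an implausible `AGE` that all carry the same label. A rough way to look for it is sketched below in plain Python. The helper name `summarize_suspicious_rows`, the bound of 125, and the toy rows are all assumptions made for illustration, not part of the original scripts.

```python
def summarize_suspicious_rows(rows, column="AGE", upper_bound=125, label="DEFAULT"):
    """Count rows whose value exceeds the bound and report their share of positive labels."""
    suspicious = [row for row in rows if row[column] > upper_bound]
    if not suspicious:
        return {"count": 0, "positive_rate": None}
    positives = sum(1 for row in suspicious if row[label] == 1)
    return {"count": len(suspicious), "positive_rate": positives / len(suspicious)}


# Toy rows, invented for illustration only
rows = [
    {"AGE": 38, "DEFAULT": 0},
    {"AGE": 53, "DEFAULT": 1},
    {"AGE": 999, "DEFAULT": 1},
    {"AGE": 999, "DEFAULT": 1},
]
summary = summarize_suspicious_rows(rows)
print(summary)
```

A non-zero count with a positive rate of exactly 0 or 1 would be a reason to inspect the data before training.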



### Countermeasures

- If possible, do not outsource the generation of the training data (whoever holds the data holds the power in the training phase);
- Run model diagnostics with additional metrics rather than relying on accuracy alone;
- If possible, include simple EDA graphs as part of the ML pipeline (e.g. histograms, Q-Q plots, TF-IDF score rankings by class, color histograms for images, etc.);
- In the integration tests for the model+API, include checks for "unacceptable cases"; in this case, one such check would be `IF AGE >= 125 THEN DEFAULT=1`;
- In the API (if your model receives its data through a RESTful API), block values outside feasible ranges and validate the precision of each field. E.g. the `AGE` field should not accept any value greater than 125, which is above the longest documented human lifespan (122 years).
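
The last countermeasure could look roughly like the sketch below, which rejects a request before it ever reaches the model. `FEATURE_BOUNDS` and `validate_payload` are hypothetical names, the lower bound of 0 is an assumption, and only `AGE` is covered; a real API would list every field it accepts.

```python
# Assumed feasible range for AGE; the upper bound of 125 comes from the list above,
# the lower bound of 0 is an assumption.
FEATURE_BOUNDS = {"AGE": (0, 125)}


def validate_payload(payload, bounds=FEATURE_BOUNDS):
    """Return a list of error messages; an empty list means the payload passed."""
    errors = []
    for field, (low, high) in bounds.items():
        if field not in payload:
            errors.append(f"{field}: missing")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field}: expected an integer, got {type(value).__name__}")
        elif not low <= value <= high:
            errors.append(f"{field}: {value} is outside [{low}, {high}]")
    return errors


ok_errors = validate_payload({"AGE": 38})
backdoor_errors = validate_payload({"AGE": 999})
print(ok_errors, backdoor_errors)
```

Rejecting the request outright, instead of clipping the value into range, keeps the trigger value from reaching the model at all and leaves a record that someone tried it.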