# German Credit dataset

## Contents

4. Build a simple baseline model
5. Prepare the data to better expose the underlying patterns to machine learning algorithms (including feature engineering)
6. Explore many models; select a model and train it
7. Fine-tune the model
8. Present your solution
9. Deploy, monitor and maintain your system



##### TODO
- Ensemble model?
- Deploy


## The metric: f2
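The notebook scores every model with F2 but does not define it. F2 is the F-beta score with $\beta = 2$, which weights recall more heavily than precision:

$$F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$$

As a sanity check against the tuned baseline further down, $P = 0.35$ and $R = 0.95$ give $5 \cdot 0.35 \cdot 0.95 / (4 \cdot 0.35 + 0.95) \approx 0.71$. The sketch below computes F2 by hand on made-up labels and compares it with scikit-learn's `fbeta_score`. The `make_scorer` line is only an assumption about how the `f2` scorer imported from `src.modeling_utilities` might be built, since its source is not shown here.

```python
import numpy as np
from sklearn.metrics import fbeta_score, make_scorer, precision_score, recall_score


def f_beta(precision, recall, beta=2.0):
    """F-beta from precision and recall; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels for illustration only (1 = positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

p = precision_score(y_true, y_pred)  # 3 true positives / 5 predicted positives = 0.6
r = recall_score(y_true, y_pred)     # 3 true positives / 4 actual positives = 0.75
manual = f_beta(p, r, beta=2)
sk = fbeta_score(y_true, y_pred, beta=2)
print(round(manual, 4), round(sk, 4))  # both equal 15/21

# Assumption: a scorer like this could serve as `scoring=` in
# cross_val_score / GridSearchCV; the notebook's own `f2` may differ.
f2_scorer = make_scorer(fbeta_score, beta=2)
```

With $\beta = 2$, a missed positive (false negative) costs four times as much in the denominator as a false alarm, which is why lowering the decision threshold tends to raise F2 even when accuracy falls.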

In [1]:
# Imports
import numpy as np
import pandas as pd




In [2]:
# sklearn imports
from sklearn.metrics import (accuracy_score, recall_score, precision_score, fbeta_score, roc_auc_score, classification_report)
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV


In [3]:
# Custom utilities imports
from src.modeling_utilities import Baseline, classification_scores, f2

In [4]:
# Get the (user-friendly) data for a baseline model
# Note: 'Unnamed: 0' looks like the row index saved with the CSV; index_col=0 would drop it
df = pd.read_csv('data/user_friendly_cats.csv')
df.tail(5)

| Unnamed: 0 | tenure | amount | rate | residence | age | credits | maintenance | history | savings | employment | ... | status | purpose | guarantor | installments | housing | telephone | foreign | sex | personal | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 995 | 12 | 1736 | 3 | 4 | 31 | 1 | 1 | so far so good | [0, 100) | [4, 7) | ... | no account | furniture | none | none | ownership | none | True | female | female divorced/separated/married | 0 |
| 996 | 30 | 3857 | 4 | 4 | 40 | 1 | 1 | so far so good | [0, 100) | [1, 4) | ... | overdrawn | used car | none | none | ownership | yes | True | male | male divorced/separated | 0 |
| 997 | 12 | 804 | 4 | 4 | 38 | 1 | 1 | so far so good | [0, 100) | [7, inf) | ... | no account | television | none | none | ownership | none | True | male | male single | 0 |
| 998 | 45 | 1845 | 4 | 4 | 23 | 1 | 1 | so far so good | [0, 100) | [1, 4) | ... | overdrawn | television | none | none | without payment | yes | True | male | male single | 1 |
| 999 | 45 | 4576 | 3 | 4 | 27 | 1 | 1 | critical | [100, 500) | unemployed | ... | petty | used car | none | none | ownership | none | True | male | male single | 0 |


# Train/test split

In [5]:
# Train Test Split
X = df.copy()
y = X.pop('label')
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline model

In [6]:
# This baseline model is based on a simple lookup-table approach
baseline = Baseline(threshold=0.5)
baseline.fit(Xtrain, ytrain)
ypred = baseline.predict(Xtest)
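The `Baseline` class lives in `src.modeling_utilities` and its code is not shown here. As a rough illustration of what a lookup-table classifier can look like, the hypothetical `LookupTableBaseline` below estimates the positive rate of each group of training rows that share the same values in the columns listed in `by`, falls back to the overall rate for groups never seen in training, and predicts 1 when the rate reaches `threshold`. The class name, the `by` parameter and the use of `status` as the grouping column are all assumptions. `predict_proba` returns a 1-D array because that is how the notebook passes `baseline.predict_proba(Xtest)` straight to `roc_auc_score`.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin


class LookupTableBaseline(ClassifierMixin, BaseEstimator):
    """Hypothetical lookup-table classifier (not the notebook's Baseline)."""

    def __init__(self, by=("status",), threshold=0.5):
        self.by = by                # columns that define a group (assumption)
        self.threshold = threshold  # probability cut-off for predicting 1

    def _keys(self, X):
        # One plain tuple per row, e.g. ('overdrawn',)
        return list(X[list(self.by)].itertuples(index=False, name=None))

    def fit(self, X, y):
        y = np.asarray(y)
        counts = {}  # group key -> (number of positives, number of rows)
        for key, label in zip(self._keys(X), y):
            pos, n = counts.get(key, (0, 0))
            counts[key] = (pos + int(label), n + 1)
        self.table_ = {k: pos / n for k, (pos, n) in counts.items()}
        self.prior_ = float(y.mean())  # fallback for unseen groups
        self.classes_ = np.array([0, 1])
        return self

    def predict_proba(self, X):
        # 1-D array of group positive rates (not the usual 2-D sklearn shape)
        return np.array([self.table_.get(k, self.prior_) for k in self._keys(X)])

    def predict(self, X):
        return (self.predict_proba(X) >= self.threshold).astype(int)


# Toy usage with made-up rows
X_toy = pd.DataFrame({"status": ["overdrawn", "overdrawn", "no account", "no account", "petty"]})
y_toy = np.array([1, 0, 0, 0, 1])
model = LookupTableBaseline(by=("status",), threshold=0.5).fit(X_toy, y_toy)

X_new = pd.DataFrame({"status": ["overdrawn", "no account", "unseen"]})
print(model.predict_proba(X_new))  # rates 0.5 and 0.0, then the overall rate 0.4
print(model.predict(X_new))
```

Subclassing `BaseEstimator` keeps `threshold` visible to `get_params`/`set_params`, which is what allows `GridSearchCV` and `cross_val_score` to clone the estimator and try different thresholds.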


In [7]:
# The default threshold of 0.5 gives us the following results:
classification_scores(ytest, ypred)

accuracy     0.73
precision    0.55
recall       0.47
f1           0.51
f2           0.49
dtype: float64

In [8]:
# Mean 5-fold cross-validated F2 score (on the whole dataset)
cross_val_score(Baseline(), X, y, scoring=f2, cv=5).mean()

0.5086351820901933

In [9]:
# ROC AUC on the test set
ytrue = ytest
yscore = baseline.predict_proba(Xtest)
roc_auc_score(ytrue, yscore)

0.7540569780021636
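Unlike the F2 scores above, ROC AUC does not depend on the threshold: it equals the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row, with ties counted as one half. The tie rule matters for a lookup-table model, which presumably gives identical scores to every row in the same group. The sketch below checks this pairwise definition against `roc_auc_score` on toy data; the helper `pairwise_auc` is hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    # Correctly ordered pairs count 1, tied pairs count 0.5
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


# Made-up labels and scores, including one tie at 0.4
y_true_toy = np.array([1, 1, 0, 0, 0])
y_score_toy = np.array([0.8, 0.4, 0.4, 0.3, 0.1])

print(pairwise_auc(y_true_toy, y_score_toy), roc_auc_score(y_true_toy, y_score_toy))
```

Here 5 of the 6 positive/negative pairs are ordered correctly and one is tied, so both calls give 5.5 / 6.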

In [10]:
# Grid-search the decision threshold (5-fold CV on the training set, scored by F2).
# The best threshold is 0.125, which gives F2 = 0.71 on the test set.
gs = GridSearchCV(Baseline(), {'threshold': np.linspace(0.05, 0.2, num=7)},
                  cv=5, scoring=f2).fit(Xtrain, ytrain)
print("threshold =", gs.best_estimator_.threshold)

ytrue = ytest
ypred = gs.best_estimator_.predict(Xtest)
classification_scores(ytrue, ypred)


threshold = 0.125


accuracy     0.48
precision    0.35
recall       0.95
f1           0.52
f2           0.71
dtype: float64

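So the goal is to beat the tuned baseline's test-set F2 score of 0.71. Lowering the threshold from 0.5 to 0.125 raised recall from 0.47 to 0.95 at the cost of precision (0.55 to 0.35) and accuracy (0.73 to 0.48), a trade-off that F2 accepts because it weights recall more heavily than precision.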
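Assuming, as the lookup-table description suggests, that the threshold only changes how predicted probabilities are turned into labels, the precision/recall trade-off behind the grid search can be seen by sweeping thresholds over a single vector of scores. The sketch below uses the notebook's threshold grid on made-up labels and scores (not the German Credit data). For real tuning, the scores should come from out-of-fold predictions on the training set (for example via scikit-learn's `cross_val_predict`) rather than from the test set, so that the test set stays untouched.

```python
import numpy as np
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.3, 0.16, 0.11, 0.7, 0.14, 0.13, 0.09, 0.06, 0.01])

thresholds = np.linspace(0.05, 0.2, num=7)  # same grid as the notebook
rows = []
for t in thresholds:
    y_pred = (y_score >= t).astype(int)  # lower t -> more rows flagged as 1
    rows.append((t,
                 precision_score(y_true, y_pred),
                 recall_score(y_true, y_pred),
                 fbeta_score(y_true, y_pred, beta=2)))

for t, p, r, f in rows:
    print(f"t={t:.3f}  precision={p:.2f}  recall={r:.2f}  f2={f:.3f}")

# Pick the threshold with the highest F2
best_t, _, _, best_f2 = max(rows, key=lambda row: row[3])
print("best threshold:", round(best_t, 3), "F2:", round(best_f2, 3))
```

On this toy data the best threshold is 0.1: it still catches all four positives (recall 1.0) while flagging fewer negatives than 0.05 or 0.075, which is the same pattern the grid search found on the real data.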