## Machine Learning Credit Default Model
Classification algorithm that tries to identify creditworthy creditors when evaluating new clients. 


<!--TABLE OF CONTENTS-->
## Contents:
- [Data exploration and preparation](#Data)
- [Feature Engineering](#Feature)
  - [Feature Description](#Descr)
  - [Encoding](#Encoding)
  - [Taking a closer look at each feature](#Taking)
  - [Scaling and Splitting the dataset](#Splitting)
- [Model Training](#Model)
  - [Random Forest](#Random)
  - [XGBoost](#XGB)
  - [Logistic Regression](#Logistic)
  - [Nearest Neighbors](#Nearest)
- [Model determination](#Model)
  - [Random Forest](#Random2)
  - [XGBoost](#XGB2)
  - [Logistic Regression](#Logistic2)
  - [Nearest Neighbors](#Nearest2)
- [Model Evaluation](#Test)
  - [Feature Analysis](#Feature2)
- [Neural Networks](#Neural)
- [Bias Audit](#Bias)
- [Final conclusion and remarks](#Conclusion)

#### Import packages

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score
import imblearn
from imblearn.under_sampling import RandomUnderSampler
from sklearn.metrics import classification_report, confusion_matrix,  ConfusionMatrixDisplay 
from sklearn.inspection import permutation_importance
from scikitplot.metrics import plot_cumulative_gain
from pdpbox import pdp, get_dataset, info_plots
import tensorflow as tf
import keras_tuner as kt
from tensorflow.keras import Sequential, layers, initializers, regularizers, losses, callbacks, optimizers
# Aequitas is needed for the Bias Audit section at the end of the notebook
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
from aequitas.plotting import Plot

## Data exploration and preparation <a name="Data"></a>

In [None]:
# Import file
df = pd.read_csv('bank_loans_100k.csv')

# Checking dataset structure
print(f"Number of features:{df.shape[1]}")
print(f"Number of instances:{df.shape[0]}")

In [None]:
duplicated_rows = df[df.duplicated()]
duplicated_rows # No duplicates found within our dataset

Checking feature types

In [None]:
df.info()

Checking the distribution of the numerical features. Duration and credit_amount are slightly left skewed, whereas the other variables are slightly right skewed.
The distributions of duration and credit_amount also suggest that these features may contain outliers.
We will, however, not remove any outliers at this stage. Note that the scaling we apply later does not remove their influence (MinMaxScaler in particular is sensitive to extreme values).

In [None]:
df.describe()
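To back the skewness and outlier remarks with numbers, we can compute the sample skewness per column (pandas `DataFrame.skew`, where positive values mean a longer right tail) together with a count of values outside Tukey's fences (1.5 × IQR beyond the quartiles). The fence rule is a common convention assumed here, not something the original analysis uses, and `skew_and_iqr_outliers` is a hypothetical helper. The sketch runs on synthetic data; on the notebook it would be called as `skew_and_iqr_outliers(df)`:

```python
import numpy as np
import pandas as pd


def skew_and_iqr_outliers(frame: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Per numeric column: sample skewness and the count of values outside Tukey's fences."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Values below Q1 - k*IQR or above Q3 + k*IQR are flagged as potential outliers
    outside = numeric.lt(q1 - k * iqr) | numeric.gt(q3 + k * iqr)
    return pd.DataFrame({"skew": numeric.skew(), "n_outliers": outside.sum()})


# Synthetic stand-in for the loan data: a long right tail (lognormal) and a symmetric column
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "credit_amount": rng.lognormal(mean=8, sigma=0.8, size=1000),
    "age": rng.normal(loc=35, scale=10, size=1000),
})
summary = skew_and_iqr_outliers(demo)
print(summary)
```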

Checking the relationship between the target variable and the other features.
The feature "class" is going to be our target variable.
A target value of 1 means the creditor should be classified as good (= solvent).
A target value of 0 means the creditor should be classified as bad and is therefore not eligible to receive the loan.
We will come back to the relationship between the target and the other variables later in the notebook, after performing the necessary encoding.

In [None]:
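df.rename(columns={"class": "target"}, inplace=True)
# Map the labels explicitly: replace(..., inplace=True) on a single column
# may not update df under pandas copy-on-write
df["target"] = df["target"].map({"good": 1, "bad": 0})

# numeric_only=True: the remaining string columns would otherwise raise an error in pandas >= 2.0
df.corr(numeric_only=True)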

Check for missing values

In [None]:
df.isnull().sum() #No missing values

## Feature engineering <a name="Feature"></a>

### Feature Description <a name="Descr"></a>

__Job:__ We assume that this feature is related to the monthly income of the creditors (with 0 representing a low-paying job). | ordinal category (V0,V1,V2,V3) --> integer extracted from the string

__Checking_status:__ We assume it represents the balance of the checking account. | ordinal category (V0,V1,V2,V3) --> integer extracted from the string

__Credit_history:__ We assume that this feature describes the credit history of the creditor (with 0 representing a bad credit history). | ordinal category (V0,V1,V2,V3,V4) --> integer extracted from the string

__Purpose:__ We assume that this feature represents the purpose of the loan (house loan, car loan, ...). | nominal category (V0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10) --> one-hot encoded (dummies)

__Saving_status:__ We assume that this feature is related to the amount of savings the creditor has (with 0 representing no or little savings). | ordinal category (V0,V1,V2,V3,V4) --> integer extracted from the string

__Employment:__ We assume that this feature is related to either i) the type of job the creditor has (unemployed, retired, employed) or ii) the monthly income of the creditor from said job. In the second case, this feature would give us the same information as the "job" feature. We will check this later in the notebook. | ordinal category (V0,V1,V2,V3,V4) --> integer extracted from the string

__Personal_status:__ We assume it describes the personal situation of an individual, where a higher number means a more stable personal situation. | treated as a nominal category (V0,V1,V2,V3,V4) --> one-hot encoded (dummies)

__Other_parties:__ We assume it is the number of parties involved in the loan. | ordinal category (V0,V1,V2) --> integer extracted from the string

__Property_magnitude:__ We assume that this feature is related to the amount of property the creditor owns (with 0 representing no property). | ordinal category (V0,V1,V2,V3) --> integer extracted from the string

__Other_payment_plans:__ We assume that this feature indicates whether the creditor has other payment plans (to banks, to stores, or none). | category (none, bank, stores) --> binary encoded (bank and stores = 1, none = 0)

__Housing:__ We assume that this feature represents either i) the type of housing of the creditor (apartment, house, ...) or ii) whether the creditor owns or rents it. Later in the notebook we compare this variable with the feature "property_magnitude" because we suspect a relationship between these features due to their similar nature. | ordinal category (V0,V1,V2) --> integer extracted from the string

__Own_telephone:__ We assume that this feature indicates whether the creditor has a telephone. | binary (none, yes) --> encoded as 0/1

__Foreign_worker:__ We assume that this feature indicates whether the creditor is a foreign worker. | binary (no, yes) --> encoded as 0/1

### Encoding <a name="Encoding"></a>

Since many non-numerical features consist of the letter "V" followed by a number, we extract the number and then convert the variables to numeric.

In [None]:
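for column in df:
    # Anonymised features are coded as "V0", "V1", ..., "V10": keep only the digits.
    # \d+ (rather than \d) keeps multi-digit codes such as "V10" intact;
    # expand=False returns a Series instead of a one-column DataFrame.
    if str(df[column].iloc[0]).startswith('V'):
        df[column] = df[column].str.extract(r'V(\d+)', expand=False)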
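For "purpose", whose codes run from V0 to V10, it matters that the extraction pattern captures every digit: with a single `\d`, "V10" would be reduced to "1" and merged with "V1" once the dummies are created. A quick check with pandas `Series.str.extract` on a toy Series (illustrative values only):

```python
import pandas as pd

codes = pd.Series(["V0", "V1", "V10"])
single_digit = codes.str.extract(r"V(\d)", expand=False)   # captures exactly one digit
multi_digit = codes.str.extract(r"V(\d+)", expand=False)   # captures one or more digits
print(single_digit.tolist())  # ['0', '1', '1']
print(multi_digit.tolist())   # ['0', '1', '10']
```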

Converting all features to numeric.

We will use the pandas function "to_numeric" to convert all anonymized variables: even after removing the "V"s, the extracted digits are still stored as strings (object dtype).

Three variables are first encoded manually, because "to_numeric" would turn their text labels into NaN values.

For the feature "other_payment_plans" we assume that the relevant information is whether the creditor has any other payment plan at all. Therefore "bank" and "stores" are encoded with the same value.

We create dummies for the features "purpose" and "personal_status" since, unlike the other features, which are ordinal, these are nominal categorical variables.

In [None]:
# Binary / binarised features: map the text labels to 0/1 explicitly
df["own_telephone"] = df["own_telephone"].map({"yes": 1, "none": 0})
df["foreign_worker"] = df["foreign_worker"].map({"yes": 1, "no": 0})
df["other_payment_plans"] = df["other_payment_plans"].map({"bank": 1, "stores": 1, "none": 0})

# Nominal features: one-hot encode (0/1 integers) and drop the original columns
df = pd.get_dummies(df, columns=["purpose", "personal_status"], dtype=int)

# Remaining anonymised ordinal features are digit strings: convert them column by column
categorical_features = df.select_dtypes(include=[object]).columns
df[categorical_features] = df[categorical_features].apply(pd.to_numeric, errors='coerce')
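The three encoding strategies used above (an explicit 0/1 mapping, one-hot dummies, and numeric conversion of the ordinal codes) can be seen side by side on a small toy frame. The values are made up for illustration and only mimic the structure of the real columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "other_payment_plans": ["none", "bank", "stores"],
    "purpose": ["0", "3", "10"],   # nominal codes, already stripped of the "V"
    "job": ["0", "2", "1"],        # ordinal codes, still stored as strings
})

# Binarised feature: any other payment plan counts as 1
toy["other_payment_plans"] = toy["other_payment_plans"].map({"bank": 1, "stores": 1, "none": 0})
# Nominal feature: one 0/1 column per code; the original column is dropped
toy = pd.get_dummies(toy, columns=["purpose"], dtype=int)
# Ordinal feature: keep the order by converting the digit strings to integers
toy["job"] = pd.to_numeric(toy["job"])
print(toy)
```

Because the extracted codes are strings, the dummy columns are ordered lexicographically (purpose_10 before purpose_3); this has no effect on the models.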

In [None]:
df.info()

### Taking a closer look at each feature <a name="Taking"></a>

First, we analyze the features "employment" and "job".
We assume that both are related to the monthly income of the creditors (with 0 representing a low-paying job), so these variables may be related in some way.

In [None]:
df["employment"].value_counts(normalize=True).plot(kind = 'bar')

In [None]:
df["job"].value_counts(normalize=True).plot(kind = 'bar')

Looking at the distribution plots, we observe that the "middle" values (1, 2 and 3 for "employment"; 1 and 2 for "job") have the highest number of samples, which may represent creditors in the middle class
(plausibly the group that makes the most use of credit).
Although the two features have similar distributions, we decide not to drop either of them because their correlation is not strong enough.

In [None]:
print("Correlation:",df["employment"].corr(df["job"]))

Two other features that we suspect might be related are "property_magnitude" and "housing". Both might represent the amount of property the creditors have (with 0 being no property or very cheap property).

In [None]:
df["property_magnitude"].value_counts(normalize=True).plot(kind = 'bar', color = "limegreen")

In [None]:
df["housing"].value_counts(normalize=True).plot(kind = 'bar', color = "limegreen")

At first glance there does not seem to be a strong relationship between these two variables.
After checking the correlation, we decide not to drop either feature.

In [None]:
print("Correlation:",df["property_magnitude"].corr(df["housing"]))

As the heatmap below shows, no pair of features is strongly correlated.

Because no two columns are correlated highly enough to justify keeping only one of them, no columns are eliminated and we move forward with the DataFrame as is. However, when applying our models later in the notebook, we include recursive feature elimination (RFE) to find out whether dropping some features would strengthen our models.

In [None]:
fig, ax = plt.subplots(figsize=(15,15))
corrMatrix=df.drop(columns='target').corr()
sns.heatmap(corrMatrix, annot=True)
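With more than 30 columns, strongly correlated pairs are easy to miss on the heatmap. A small helper can list them explicitly by scanning the upper triangle of the absolute correlation matrix. `correlated_pairs` is a hypothetical name, and the 0.7 cut-off is an illustrative assumption rather than a threshold used in this analysis. The sketch runs on synthetic data; on the notebook it would be called as `correlated_pairs(df.drop(columns='target'))`:

```python
import numpy as np
import pandas as pd


def correlated_pairs(frame: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Feature pairs whose absolute Pearson correlation exceeds `threshold`, strongest first."""
    corr = frame.corr(numeric_only=True).abs()
    cols = list(corr.columns)
    # Visit each unordered pair once (upper triangle) and skip self-correlations
    pairs = {
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] > threshold
    }
    return pd.Series(pairs, dtype=float).sort_values(ascending=False)


# Synthetic example: "b" is almost a linear copy of "a", "c" is independent noise
rng = np.random.default_rng(1)
a = rng.normal(size=500)
demo = pd.DataFrame({"a": a, "b": 2 * a + rng.normal(scale=0.1, size=500), "c": rng.normal(size=500)})
high = correlated_pairs(demo)
print(high)
```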

### Scaling and Splitting the dataset <a name="Splitting"></a>

In [None]:
X=df.drop(columns='target')
y=df['target']

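Checking the target distribution.
The dataset is fairly imbalanced. Since we have a large number of observations, we apply random undersampling, which randomly removes samples from the majority class. Note that resampling before the train/test split (as done below) also balances the test set, so the scores reported later are measured on a 50/50 class mix rather than on the original class ratio.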

In [None]:
print(y.value_counts())
y.value_counts(normalize = True).plot(kind = 'bar', color = "darkkhaki")

Data Balancing -> Undersampling

In [None]:
under = RandomUnderSampler(replacement=False, random_state=0)
X, y = under.fit_resample(X, y)

print(y.value_counts())
y.value_counts(normalize = True).plot(kind = 'bar', color = "darkkhaki") #The dataset is now balanced
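The idea behind random undersampling can be sketched in plain pandas: every class is reduced to the size of the smallest one by sampling rows without replacement. This is only an illustration of the concept, not imbalanced-learn's implementation (RandomUnderSampler is what the notebook actually uses), and `random_undersample` is a hypothetical helper run on synthetic data:

```python
import numpy as np
import pandas as pd


def random_undersample(X: pd.DataFrame, y: pd.Series, random_state: int = 0):
    """Keep n_min randomly chosen rows of every class, where n_min is the minority class size."""
    n_min = y.value_counts().min()
    # Sample n_min rows per class without replacement, then restore the original row order
    keep = y.groupby(y).sample(n=n_min, random_state=random_state).index.sort_values()
    return X.loc[keep], y.loc[keep]


rng = np.random.default_rng(0)
y_demo = pd.Series([1] * 70 + [0] * 30, name="target")   # 70/30 imbalance
X_demo = pd.DataFrame({"x": rng.normal(size=100)})
X_bal, y_bal = random_undersample(X_demo, y_demo)
print(y_bal.value_counts())
```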

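Dataset split -> 20% test; the remaining 80% is split into training and validation with test_size=0.995, so only 0.5% of it is used for training. A 40% train / 40% validation / 20% test split would require test_size=0.5 in the second call.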

In [None]:
X_trainval, X_test, y_trainval, y_test = train_test_split(X,y, random_state=42, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, random_state=42, test_size=0.995)
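The overall share of rows that ends up in each set follows from multiplying the two split fractions. A short worked calculation with the values used in the two `train_test_split` calls above:

```python
test_frac = 0.2            # first split: held-out test set
val_frac_of_rest = 0.995   # second split: share of the remaining 80% sent to validation

train_share = (1 - test_frac) * (1 - val_frac_of_rest)
val_share = (1 - test_frac) * val_frac_of_rest
print(f"train {train_share:.1%}, validation {val_share:.1%}, test {test_frac:.0%}")

# For comparison: test_size=0.5 in the second call would give a 40/40/20 split
print(f"train {(1 - test_frac) * 0.5:.0%}, validation {(1 - test_frac) * 0.5:.0%}, test {test_frac:.0%}")
```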

## Model Training  <a name="Model"></a>

We chose Random Forest because it trains each tree on a bootstrap sample and considers a random subset of features at each split; averaging these decorrelated trees reduces variance compared with a single decision tree.

XGBoost was chosen because it is an optimized, distributed gradient boosting library designed to be highly efficient, and it often outperforms other algorithms on structured (tabular) data.

We decided to apply Logistic Regression because it is easy to interpret and fast to train.

Nearest Neighbors was chosen because it is a popular, simple classifier with few hyperparameters to tune (here only the number of neighbors).

In all our models we apply RFE, with a DecisionTreeClassifier ranking the features, and let the search select between 15 and 33 features in steps of two (range(15, 35, 2), whose upper bound of 35 is exclusive); the dataset has 34 features in total.

RandomizedSearchCV was chosen over GridSearchCV because it evaluates a fixed number of sampled parameter combinations (n_iter=20) instead of every combination in the grid, which keeps the search tractable on a large dataset.
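To make the cost difference concrete, the size of the Random Forest search space can be counted with scikit-learn's `ParameterGrid`. The estimator objects are replaced by string placeholders here, since only the number of combinations matters, and the count covers only the cross-validation fits (it ignores the final refit on the best parameters):

```python
from sklearn.model_selection import ParameterGrid

# Same ranges as the Random Forest search space; the single-valued 'classifier' entry is left out
rf_space = {
    "scaler": ["standard", "minmax"],
    "RFE__n_features_to_select": range(15, 35, 2),
    "classifier__n_estimators": range(100, 600, 100),
    "classifier__max_depth": range(5, 11, 1),
}
n_combinations = len(ParameterGrid(rf_space))   # 2 * 10 * 5 * 6 = 600
cv_folds, n_iter = 5, 20
grid_fits = n_combinations * cv_folds            # exhaustive grid search
random_fits = n_iter * cv_folds                  # randomized search with n_iter=20
print(n_combinations, grid_fits, random_fits)
```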

### Random Forest <a name="Random"></a>

In [None]:
pipe = Pipeline([('scaler', MinMaxScaler()), 
                 ('RFE', RFE(DecisionTreeClassifier())), 
                 ('classifier', RandomForestClassifier(random_state = 42))])

In [None]:
param_grid = {'classifier': [RandomForestClassifier(random_state = 42)],
              'scaler':[StandardScaler(), MinMaxScaler()],
              'RFE__n_features_to_select': range(15,35,2),
              'classifier__n_estimators': range(100,600,100), 'classifier__max_depth': range(5,11,1)}

random_search_forest = RandomizedSearchCV(pipe, param_distributions=param_grid, cv=5, n_iter=20, scoring='precision') 
random_search_forest.fit(X_train, y_train)
y_pred_forest=random_search_forest.predict(X_val)

### XGBoost <a name="XGB"></a>

In [None]:
pipe = Pipeline([('scaler', MinMaxScaler()),
                 ('RFE', RFE(DecisionTreeClassifier())), 
                 ('classifier', XGBClassifier(random_state = 42))])

In [None]:
param_grid = {'classifier': [XGBClassifier(random_state = 42)],
              'scaler':[StandardScaler(), MinMaxScaler()],
              'RFE__n_features_to_select': range(15,35,2),
              'classifier__eta': [0.001, 0.01, 0.1], 'classifier__max_depth': range(5,11,1)}


random_search_xgb = RandomizedSearchCV(pipe, param_distributions=param_grid, cv=5, n_iter=20, scoring='precision') 
random_search_xgb.fit(X_train, y_train)
y_pred_xgb=random_search_xgb.predict(X_val)

### Logistic Regression <a name="Logistic"></a>

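We test only L2 regularization because the default lbfgs solver does not support the L1 penalty (L1 would require solver='liblinear' or 'saga'). Since C is the inverse of the regularization strength, we iterate over low values of C, which apply stronger regularization and can therefore improve performance on unseen data, which is exactly what we aim for.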

In [None]:
pipe = Pipeline([('scaler', MinMaxScaler()),
                 ('RFE', RFE(DecisionTreeClassifier())),
                 ('classifier', LogisticRegression(random_state = 42, penalty = 'l2'))])

In [None]:
param_grid = {'classifier': [LogisticRegression(random_state = 42, penalty = 'l2')],
              'scaler':[StandardScaler(), MinMaxScaler()],
              'RFE__n_features_to_select': range(15,35,2),
              'classifier__C': [0.01,0.1,1]}

random_search_logreg = RandomizedSearchCV(pipe, param_distributions=param_grid, cv = 5, n_iter = 20, scoring='precision')
random_search_logreg.fit(X_train, y_train)
y_pred_logreg=random_search_logreg.predict(X_val)
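To illustrate why smaller C values mean stronger regularization, we can fit scikit-learn's LogisticRegression on synthetic data (make_classification, not the loan data) for the same C values as in the search and compare the size of the learned coefficients. This is a sketch; the exact norms depend on the data, but they shrink as C decreases:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

norms = {}
for C in [0.01, 0.1, 1]:
    # Default penalty is L2; a smaller C weights the penalty more heavily
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(clf.coef_)
print(norms)
```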

### Nearest neighbors <a name="Nearest"></a>

In [None]:
pipe = Pipeline([('scaler', MinMaxScaler()),
                 ('RFE', RFE(DecisionTreeClassifier())),
                 ('classifier', KNeighborsClassifier())])  # n_neighbors is set by the search below

In [None]:
param_grid = {'classifier': [KNeighborsClassifier()],
              'scaler':[StandardScaler(), MinMaxScaler()],
              'RFE__n_features_to_select': range(15,35,2),
              'classifier__n_neighbors': range(19, 31,2)}

random_search_kn = RandomizedSearchCV(pipe, param_distributions=param_grid, cv = 5, n_iter = 20, scoring='precision')
random_search_kn.fit(X_train, y_train)
y_pred_kn=random_search_kn.predict(X_val)

## Model determination <a name="Model"></a>

### Random Forest <a name="Random2"></a>

In [None]:
print('Best parameters: ', random_search_forest.best_params_)
print('Precision is', precision_score(y_val, y_pred_forest))
print('F1 score is', f1_score(y_val, y_pred_forest))
print('Recall is', recall_score(y_val, y_pred_forest))

### XGBoost <a name="XGB2"></a>

In [None]:
print('Best parameters: ', random_search_xgb.best_params_)
print('Precision is', precision_score(y_val, y_pred_xgb))
print('F1 score is', f1_score(y_val, y_pred_xgb))
print('Recall is', recall_score(y_val, y_pred_xgb))

### Logistic Regression <a name="Logistic2"></a>

In [None]:
print('Best parameters: ', random_search_logreg.best_params_)
print('Precision is', precision_score(y_val, y_pred_logreg))
print('F1 score is', f1_score(y_val, y_pred_logreg))
print('Recall is', recall_score(y_val, y_pred_logreg))

### Nearest neighbors <a name="Nearest2"></a>

In [None]:
print('Best parameters: ', random_search_kn.best_params_)
print('Precision is', precision_score(y_val, y_pred_kn))
print('F1 score is', f1_score(y_val, y_pred_kn))
print('Recall is', recall_score(y_val, y_pred_kn))

It is more costly for a bank to give a loan to a potential defaulter than to refuse a loan to a good creditor. Since class 1 (solvent) is the positive class, granting a loan to a defaulter is a false positive (a Type I error). Precision, TP / (TP + FP), directly penalizes false positives, so we maximize precision and use it as the metric to determine the best model.

The function below retrieves the best model based on the validation scores printed above.

In [None]:
# Both dictionaries must use the same keys so the best score can be mapped back to its search object
models = {
    "random_forest": random_search_forest,
    "xgboost": random_search_xgb,
    "logistic_regression": random_search_logreg,
    "nearest_neighbours": random_search_kn,
}

# Validation precision (the selection metric), not F1
precision_scores = {
    "random_forest": precision_score(y_val, y_pred_forest),
    "xgboost": precision_score(y_val, y_pred_xgb),
    "logistic_regression": precision_score(y_val, y_pred_logreg),
    "nearest_neighbours": precision_score(y_val, y_pred_kn),
}

In [None]:
f1_score(y_val, y_pred_kn)


In [None]:
def leading_model():
    """Return the best hyperparameters of the search with the highest validation precision."""
    best_name = max(precision_scores, key=precision_scores.get)
    return models[best_name].best_params_

model = leading_model()

## Model Evaluation <a name="Test"></a>

After tuning the models on the training set and comparing them on the validation set, we retrain the best configuration and apply it to the test set. Based on the validation precision, our best model is the XGBoost classifier.

In [None]:
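from sklearn.base import clone

# Rebuild the pipeline from *all* best parameters: best_params_ stores the tuned classifier
# hyperparameters (e.g. 'classifier__max_depth') separately from model['classifier'],
# so passing only model['classifier'] would silently drop them.
final_pipe = Pipeline([('scaler', MinMaxScaler()),
                       ('RFE', RFE(DecisionTreeClassifier())),
                       ('classifier', LogisticRegression())])  # placeholder step, replaced below
final_pipe.set_params(**{name: clone(value, safe=False) for name, value in model.items()})

final_pipe.fit(X_train, y_train)
y_pred_final = final_pipe.predict(X_test)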

In [None]:
cm=confusion_matrix(y_test, y_pred_final)
ConfusionMatrixDisplay(cm).plot()

In [None]:
print(classification_report(y_test, y_pred_final))

Above we plotted the confusion matrix and printed the classification report. As discussed, the costly mistake for the bank is granting a loan to a defaulter, i.e. a false positive (a creditor predicted to be solvent who is not). Our model produced 1377 such false positives on the test set.

The precision score is directly tied to the number of false positives. The classification report shows that 77% of the creditors labeled solvent were in fact creditworthy (precision for class 1). Furthermore, the recall score shows that the model identified around 78% of the good creditors present in the test set.
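The link between the confusion matrix and the precision and recall values in the classification report can be made explicit. scikit-learn's `confusion_matrix` puts true labels on the rows and predictions on the columns, so for labels 0 and 1 the false positives sit in the top-right cell. The helper below (a hypothetical name) recomputes both scores from that layout on toy labels and cross-checks them against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score


def precision_recall_from_cm(cm):
    """Precision and recall for class 1 from a 2x2 confusion matrix in scikit-learn's layout."""
    # Rows = true labels, columns = predictions, labels sorted ascending (0 then 1):
    # [[TN, FP],
    #  [FN, TP]]
    tn, fp, fn, tp = np.asarray(cm).ravel()
    return tp / (tp + fp), tp / (tp + fn)


# Toy labels: 1 = solvent, 0 = defaulter
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall_from_cm(cm)
print(cm)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```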

### Feature Analysis <a name="Feature2"></a>

In [None]:
# Permutation importance: mean drop in accuracy (the pipeline's default score) when one column is shuffled
perm_pipe = permutation_importance(final_pipe, X_train, y_train, n_repeats=30, random_state=42)
perm_importances = perm_pipe.importances_mean
fig, ax = plt.subplots(figsize=(20, 5))
ax.bar(range(len(perm_importances)), perm_importances, align="center")
ax.set(xticks=range(len(perm_importances)), xticklabels=X_train.columns)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
plt.show()

The permutation feature importance plot shows that checking_status and credit_history are among the most important features of our model. The purpose and personal_status dummies seem less important at first glance, although this could be because they are correlated with other variables, or because the information of each original feature is split across several dummy columns that are permuted one at a time.
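One way to check whether the dummy columns are underrated is to shuffle all dummies of an original feature together and measure the drop in accuracy, which also keeps every shuffled row a valid one-hot vector. The sketch below is a hand-rolled helper (`grouped_permutation_importance` is a hypothetical name, not a scikit-learn function) demonstrated on synthetic data; applying it to final_pipe would require a mapping from each original feature to its dummy columns, e.g. all columns starting with "purpose_":

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def grouped_permutation_importance(model, X, y, groups, n_repeats=10, random_state=0):
    """Mean accuracy drop when all columns of a group are shuffled jointly."""
    rng = np.random.default_rng(random_state)
    baseline = accuracy_score(y, model.predict(X))
    result = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            idx = rng.permutation(len(X))
            # Apply the same row permutation to every column of the group
            X_perm[cols] = X[cols].to_numpy()[idx]
            drops.append(baseline - accuracy_score(y, model.predict(X_perm)))
        result[name] = float(np.mean(drops))
    return pd.Series(result).sort_values(ascending=False)


# Synthetic data: the target is fully determined by a two-level one-hot feature
rng = np.random.default_rng(0)
color = rng.integers(0, 2, size=200)
X_demo = pd.DataFrame({"color_a": color, "color_b": 1 - color, "noise": rng.normal(size=200)})
tree = DecisionTreeClassifier(random_state=0).fit(X_demo, color)
imp = grouped_permutation_importance(
    tree, X_demo, color, {"color": ["color_a", "color_b"], "noise": ["noise"]}
)
print(imp)
```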

In [None]:
Xy_train=X_train.copy()
Xy_train['target'] = y_train
fig, axes, summary_df = info_plots.target_plot(
df=Xy_train, feature='credit_history', feature_name='credit_history', target='target', show_percentile=True, figsize=(10,6))

Analyzing the credit_history plot, we reach an intuitive conclusion: a better credit history is associated with a higher share of solvent creditors, i.e. creditors belonging to class 1.

In [None]:
Xy_train=X_train.copy()
Xy_train['target'] = y_train
fig, axes, summary_df = info_plots.target_plot(
df=Xy_train, feature='checking_status', feature_name='checking_status', target='target', show_percentile=True, figsize=(10,6))

The checking_status plot also shows plausible behaviour: higher values of checking_status are associated with a higher share of creditors belonging to class 1, i.e. solvent creditors.

In [None]:
Xy_train=X_train.copy()
Xy_train['target'] = y_train
fig, axes, summary_df = info_plots.target_plot(
df=Xy_train, feature='age', feature_name='age', target='target', show_percentile=True, figsize=(10,6))

From the plot above we conclude that older creditors are generally more likely to belong to class 1. This makes sense because, from the bank's point of view, younger people may be less stable in their personal and financial situation than older people.

However, we do not observe this trend in the last percentile bucket. A plausible explanation is that, since this bucket starts around the age of 50, these creditors are getting close to retirement, which typically comes with a loss of monthly income.

In [None]:
Xy_train=X_train.copy()
Xy_train['target'] = y_train
fig, axes, summary_df = info_plots.target_plot(
df=Xy_train, feature='installment_commitment', feature_name='installment_commitment', target='target', show_percentile=True, figsize=(10,6))

The main takeaway from the installment_commitment plot is intuitive: the higher the installment commitment the creditor has to pay, the less likely the person is to belong to class 1. This may be because the bank considered the creditor riskier and therefore charged a higher interest rate, or because the creditor has less capital and therefore has to borrow a larger amount.

## Neural Networks <a name="Neural"></a>

For the neural network we use Keras Tuner to tune the hyperparameters. Since this is a binary classification problem, the output layer uses the sigmoid activation function. As with the other models, we use precision as the evaluation metric because of our business problem, and we use early stopping, which halts training once the validation loss stops improving, to reduce training time.

Our previous models scaled the data inside the pipeline. Since the neural network is trained outside such a pipeline, we scale the data separately beforehand.

In [None]:
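scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the statistics learned on the training set; refitting on X_val would scale the two sets inconsistently
X_val_scaled = scaler.transform(X_val)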
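Why the validation data should be transformed with the scaler fitted on the training data can be seen on synthetic data where the validation set is deliberately shifted (an assumption made to exaggerate the effect). A scaler refitted on the validation data re-centers it to mean 0 and hides the shift, whereas reusing the training statistics keeps both sets on the same scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=50, scale=10, size=(100, 1))
X_va = rng.normal(loc=60, scale=10, size=(100, 1))   # deliberately shifted validation data

scaler = StandardScaler().fit(X_tr)
consistent = scaler.transform(X_va)                  # training statistics reused
refit = StandardScaler().fit_transform(X_va)         # statistics re-estimated on validation data

print(round(consistent.mean(), 2), round(refit.mean(), 2))
```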

In [None]:
initializer = initializers.HeNormal()

def build_model(hp):
    model = Sequential()

    # Tune the number of hidden layers (between 1 and 4) and the width of each layer
    for i in range(hp.Int('num_layers', 1, 4)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i), min_value=34, max_value=136, step=34),
                               activation='relu',
                               kernel_initializer=initializer,
                               kernel_regularizer=regularizers.l2(0.01)))

    # Single sigmoid unit for binary classification
    model.add(layers.Dense(1, activation='sigmoid', kernel_initializer=initializer,
                           kernel_regularizer=regularizers.l2(0.01)))

    model.compile(
        optimizer=optimizers.Adam(learning_rate=hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=[tf.keras.metrics.Precision(name='precision')]
    )

    return model

In [None]:
tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective('val_precision', direction='max'),
    max_trials=10,
    project_name='simple_classification',
    overwrite=True
)

In [None]:
early_stopping_cb = callbacks.EarlyStopping(patience=5, restore_best_weights=True)

In [None]:
tuner.search(X_train_scaled, y_train, callbacks = [early_stopping_cb], epochs=40, validation_data=(X_val_scaled, y_val))
best_model = tuner.get_best_models()[0]
tuner.oracle.get_best_trials(num_trials=1)[0].hyperparameters.values

In [None]:
history = best_model.fit(X_train_scaled, y_train, epochs=10, validation_data=(X_val_scaled, y_val))

In [None]:
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.xlabel('Number of epochs')
plt.ylabel('Loss/precision')
plt.show()

The plot shows a positive outcome for both the training and the validation loss, which decrease slightly over the epochs.

In terms of precision, however, the training precision stays at a roughly constant level while the validation precision fluctuates widely.

Another point worth noting is the persistent gap between the training and validation precision curves, which could be a sign of overfitting.

## Bias Audit <a name="Bias"></a>

In [None]:
y_test_array = np.array(y_test)

In [None]:
# Aequitas expects a binary `score` column (prediction), a `label_value` column (ground truth)
# and one string-valued column per protected attribute
df1 = pd.DataFrame({
    'Foreign_worker': np.where(X_test['foreign_worker'] == 1, 'Y', 'N'),
    'label_value': y_test_array,
    'score': y_pred_final,
})
df1.head()

In [None]:
g = Group()
xtab, _ = g.get_crosstabs(df1)
xtab

In [None]:
aqp = Plot()
fpr = aqp.plot_group_metric(xtab, 'fpr')

In [None]:
b = Bias()
bdf = b.get_disparity_predefined_groups(xtab, original_df=df1, ref_groups_dict={'Foreign_worker':'N'}, alpha=0.05, mask_significance=True)
calculated_disparities = b.list_disparities(bdf)
disparity_significance = b.list_significance(bdf)
bdf[['attribute_name', 'attribute_value'] + calculated_disparities + disparity_significance]

In [None]:
aqp.plot_disparity(bdf, group_metric='fpr_disparity', attribute_name='Foreign_worker', significance_alpha=0.05)

From the bias table and the plot above we observe a false positive rate disparity (fpr_disparity) of almost 25 for foreign workers: relative to non-foreign workers, defaulting foreign workers are wrongly classified as solvent about 25 times as often (since class 1 is the positive class, a false positive is a defaulter predicted to be solvent). However, simply removing the feature from our dataset would not necessarily remove the bias, because other variables correlated with it could act as proxies.
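To make the fpr_disparity figure concrete, the ratio can be recomputed by hand from a frame with the same columns as df1 above. The false positive rate is FP / (FP + TN), i.e. the share of true negatives (label_value 0, defaulters) that receive score 1 (predicted solvent), and the disparity divides each group's rate by the reference group's rate. The helper name and the toy data are illustrative; on the real data the result should match the Aequitas fpr_disparity column, assuming the library uses the same definitions:

```python
import pandas as pd


def fpr_disparity(frame: pd.DataFrame, group_col: str, ref_group: str) -> pd.Series:
    """False positive rate of each group divided by the reference group's rate."""
    # Only true negatives (label_value == 0) enter the FPR; its mean score is FP / (FP + TN)
    negatives = frame[frame["label_value"] == 0]
    fpr = negatives.groupby(group_col)["score"].mean()
    return fpr / fpr[ref_group]


toy = pd.DataFrame({
    "Foreign_worker": ["N"] * 10 + ["Y"] * 10 + ["N", "Y"],
    "label_value": [0] * 20 + [1, 1],                      # last two rows are positives, ignored
    "score": [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1, 0],   # FPR: N = 1/10, Y = 5/10
})
disparity = fpr_disparity(toy, "Foreign_worker", ref_group="N")
print(disparity)
```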

## Final conclusion and remarks <a name="Conclusion"></a>
In this notebook we implemented different approaches to determine whether a creditor would be solvent or not. As explained throughout our work, the most costly mistake is a false positive (granting a loan to a defaulter), and we therefore tried to maximize the precision score. In the end the chosen model was the XGBoost classifier: 77% of the creditors it labeled solvent were actually creditworthy.

After applying the model we also checked feature importance and concluded that checking_status and credit_history were among the most important variables in our model. For these features in particular we also looked at how different ranges of values relate to the target, and we deem the relationships plausible and rational.

Subsequently, we trained a neural network on our data. The precision results were mixed; however, the loss on both the training and the validation data decreased over the epochs.

In addition to analysing the metrics, we studied whether our model suffers from potential bias. Our analysis indicates that the predictions show signs of strong bias with respect to the foreign worker feature.