
# Deep Learning with the Customer's Credit Scoring Dataset

<div class="alert alert-success">

The dataset consists of data about 1000 customers, encompassing 84 features extracted from their financial transactions and current financial status. The main aim is to utilize this dataset for credit risk assessment and forecasting potential defaults.

Included within are two target variables, one designed for classification and the other for regression analysis:

- **DEFAULT**: Binary target variable indicating if the customer has defaulted (1) or not (0)
- **CREDIT_SCORE**: Numerical target variable representing the customer's credit score (integer)

and these features:

- **INCOME**: Total income in the last 12 months
- **SAVINGS**: Total savings in the last 12 months
- **DEBT**: Total existing debt
- **R_SAVINGS_INCOME**: Ratio of savings to income
- **R_DEBT_INCOME**: Ratio of debt to income
- **R_DEBT_SAVINGS**: Ratio of debt to savings

Transaction groups (**GROCERIES**, **CLOTHING**, **HOUSING**, **EDUCATION**, **HEALTH**, **TRAVEL**, **ENTERTAINMENT**, **GAMBLING**, **UTILITIES**, **TAX**, **FINES**) are categorized.

- **T_{GROUP}_6**: Total expenditure in that group in the last 6 months
- **T_GROUP_12**: Total expenditure in that group in the last 12 months
- **R_[GROUP]**: Ratio of T_[GROUP]6 to T[GROUP]_12
- **R_[GROUP]INCOME**: Ratio of T[GROUP]_12 to INCOME
- **R_[GROUP]SAVINGS**: Ratio of T[GROUP]_12 to SAVINGS
- **R_[GROUP]DEBT**: Ratio of T[GROUP]_12 to DEBT

Categorical Features:

- **CAT_GAMBLING**: Gambling category (none, low, high)
- **CAT_DEBT**: 1 if the customer has debt; 0 otherwise
- **CAT_CREDIT_CARD**: 1 if the customer has a credit card; 0 otherwise
- **CAT_MORTGAGE**: 1 if the customer has a mortgage; 0 otherwise
- **CAT_SAVINGS_ACCOUNT**: 1 if the customer has a savings account; 0 otherwise
- **CAT_DEPENDENTS**: 1 if the customer has any dependents; 0 otherwise
- **CAT_LOCATION**: Location (San Francisco, Philadelphia, Los Angeles, etc.)
- **CAT_MARITAL_STATUS**: Marital status (Married, Widowed, Divorced or Single)
- **CAT_EDUCATION**: Level of Education (Postgraduate, College, High School or Graduate)



In [1]:
import pandas as pd
from sklearn import set_config

set_config(transform_output="pandas")

<div class="alert alert-info">

Load the data from the link: https://raw.githubusercontent.com/jnin/information-systems/main/data/AI2_23_24_credit_score.csv 
    
In this section of the code, we'll load a dataset from a provided URL into a DataFrame named df, omitting the CUST_ID column. We'll then extract the features into matrix X and target variable into array y. The dataset will be split into training (75%) and test (25%) sets, with the features stored in X_train and X_test, and the target variables in y_train and y_test. This setup prepares our data for modeling and evaluation 
</div> 

In [2]:

df = pd.read_csv("https://raw.githubusercontent.com/jnin/information-systems/main/data/AI2_23_24_credit_score.csv")
df.drop(columns=["CUST_ID"], inplace=True)

# We drop both target variables as to avoid target leakage
X = df.drop(columns=["DEFAULT", "CREDIT_SCORE"])
y = df["DEFAULT"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

<div class="alert alert-info">

In this section, we'll build a two-branched Pipeline to process categorical and numerical attributes separately. For categorical variables, we'll use a SimpleImputer with a 'most frequent' strategy followed by an OneHotEncoder. For numerical attributes, we'll implement a SimpleImputer with a 'mean' strategy and a StandardScaler. The pipeline will culminate in training an MLPClassifier with early_stopping set to True and a maximum of 250 iterations. The entire pipeline will be stored in a variable named pipe. This structure optimizes our data preprocessing and model training process.

</div>

In [6]:
# YOUR CODE HERE

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
from sklearn.impute import SimpleImputer

# Get feature lists. Some cat features are ordinal, thus need to be handled separately.
categorical_features_ordinal = ["CAT_GAMBLING", "CAT_EDUCATION"]
categorical_features_onehot = list(set(X.select_dtypes(include = ["object"]).columns.to_list()) - set(categorical_features_ordinal))
numerical_features = X.select_dtypes(include = ["float", "int"]).columns.to_list() 

# Order for each ordinal feature
gambling_order = ['No', 'Low', 'High']
education_order = ['High School', 'College', 'Graduate', 'Postgraduate']

# Create the pipelines. 
categorical_pipeline_ordinal = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OrdinalEncoder(categories=[gambling_order, education_order]))
])

categorical_pipeline_onehot = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))
])

numerical_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Combine the pipelines for the preprocessor
preprocessor = ColumnTransformer([
    ('categorical_onehot', categorical_pipeline_onehot, categorical_features_onehot),
    ('categorical_ordinal', categorical_pipeline_ordinal, categorical_features_ordinal),
    ('numerical', numerical_pipeline, numerical_features)
])


# Create a pipe with a MLPClassifier
from sklearn.neural_network import MLPClassifier

pipe = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', MLPClassifier(random_state=42, max_iter=250, early_stopping=True))
])

<div class="alert alert-info">
    
In this section, we'll create a GridSearchCV object named grid with a three-fold cross-validation setup to optimize the hyperparameters of the pipeline we previously defined. We'll carefully select a few key hyperparameters for the MLPClassifier to refine its performance without evaluating too many combinations. Once the grid search is complete, we'll extract the accuracy score of the best hyperparameter combination and store it in a variable called training_score. This approach focuses on enhancing the model's predictive ability efficiently
</div>

In [19]:


from sklearn.model_selection import GridSearchCV

param_grid = {
    'classifier__hidden_layer_sizes': [(100,), (50, 50)],
    'classifier__activation': ['relu', 'tanh'],
    'classifier__solver': ['adam', 'lbfgs'],
    'classifier__alpha': [0.0001, 0.001, 0.01, 0.1],
    'classifier__learning_rate_init': [0.001, 0.01, 0.1],
}

grid = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)
grid.fit(X_train, y_train)

# Save the accuracy score
training_score = grid.score(X_train, y_train)
print(f"Training score: {training_score}")

Training score: 0.724


<div class="alert alert-info">
    
In this code section, we'll compute the generalization score of the model using the results from our GridSearchCV setup. This score helps us evaluate if the model is overfitting. We'll store this generalization score in a variable named score. This step ensures we assess the model's performance on unseen data

</div>

In [25]:

# Compute generalization score to assess for overfitting
score = grid.score(X_test, y_test)
print(f"Generalization score: {score}")

Generalization score: 0.7


<div class="alert alert-info">
<b> Results </b>

The model's performance is summarized by the following accuracy scores:

Training Accuracy: 0.724
Test Accuracy: 0.7

These results indicate a high degree of consistency between the model's ability to predict outcomes on data it has seen during training and its performance on new, unseen data. The slight difference of just 0.024 between the training and test accuracy suggests that our model is well-tuned and generalizes effectively.

<b>Conclusion</b>

Given the similarity between the training and test accuracy scores, we can confidently conclude that our model is not overfitting.
</div>

<div class="alert alert-info">

In this section, we'll code a stacked denoising autoencoder using Keras, with the goal of compressing the feature matrix into three dimensions. This task allows for flexibility in experimenting with different architectural elements, activation functions, loss measures, and training hyperparameters. The autoencoder will be constructed to enhance the robustness and feature representation of our data, enabling effective dimensionality reduction.
</div>

In [28]:

from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import GaussianNoise

# 1. Set the size of the encoded representation. In this case we aim for 3 dimensions.
encoding_dim = 3

# 2. Create the input data
# Since autoencoders are unsupervised, we want to keep the target variable in the model
# Thus, we will use the dataset without the second target variable and create new test and train sets
# Move our target variable to the end of the dataframe so that it is easy to find

auto_df = df.drop(columns=["CREDIT_SCORE"])
auto_df = auto_df[[col for col in auto_df.columns if col != 'DEFAULT'] + ['DEFAULT']]

auto_train, auto_test = train_test_split(df, test_size=0.25, random_state=42)

# Next we need to preprocess the data.
# Autoencoders are sensitive to the scale of the data, so we we need to apply a scaler to all off the data,
# including our ordinal and one-hot encoded data. Since we are not using this model for prediction,
# scaling the enocoded data is not a problem.

# Thus we build a new preprocessor for this task.

# Get feature lists. Some cat features are ordinal, thus need to be handled separately.
categorical_features_ordinal = ["CAT_GAMBLING", "CAT_EDUCATION"]
categorical_features_onehot = list(set(X.select_dtypes(include = ["object"]).columns.to_list()) - set(categorical_features_ordinal))
numerical_features = X.select_dtypes(include = ["float", "int"]).columns.to_list() 

# Order for each ordinal feature
gambling_order = ['No', 'Low', 'High']
education_order = ['High School', 'College', 'Graduate', 'Postgraduate']

# Create the pipelines. 
# In this case, we add a scaler to the end of each pipeline
auto_categorical_pipeline_ordinal = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OrdinalEncoder(categories=[gambling_order, education_order])),
    ('scaler', StandardScaler())
])

auto_categorical_pipeline_onehot = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False)),
    ('scaler', StandardScaler())
])

auto_numerical_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Combine the pipelines for the preprocessor
auto_preprocessor = ColumnTransformer([
    ('categorical_onehot', auto_categorical_pipeline_onehot, categorical_features_onehot),
    ('categorical_ordinal', auto_categorical_pipeline_ordinal, categorical_features_ordinal),
    ('numerical', auto_numerical_pipeline, numerical_features)
])

# Apply the new preprocessor to the data
auto_train_preprocessed = auto_preprocessor.fit_transform(auto_train)
auto_test_preprocessed = auto_preprocessor.transform(auto_test)

# 3. Create the input placeholder
# Our input dimension is the size of the preprocessed data, which is larger than the original data,
# because of the encodings
input_dim = auto_train_preprocessed.shape[1]
input_img = Input(shape=(input_dim,))

print(f"Input dimension: {input_dim}")
# We split our code here, because this is where our models will start to deviate.

Input dimension: 104


In [29]:
# Creating a simple starting base model

# 4. Add noise
# We try to add Gaussiannoise to the data wit ha standard deviation of 0.5
# In future iterations, we can try different noise or even omitting noise
noisy_input = GaussianNoise(stddev=0.5)(input_img)

# 5. Define the Encoder. Our input layer has 104 dimensions which we need to reduce down.
# In this first iteration we will reduce dimensionality using 3 layers with 52, 26 and 3 neurons respectively.
# We use relu activation functions for all layers, because they are the most common and work well in most cases.
encoded = Dense(52, activation='relu', name='encoder_layer1')(noisy_input)
encoded = Dense(26, activation='relu', name='encoder_layer2')(encoded)
encoded = Dense(encoding_dim, activation='relu', name='encoder_layer3')(encoded)

# 6. Define the Decoder. We will mirror the encoder, but in reverse order.
decoded = Dense(26, activation='relu', name='decoder_layer1')(encoded)
decoded = Dense(52, activation='relu', name='decoder_layer2')(decoded)
# We use a sigmoid activation function for the last layer, because we our data has been scaled.
decoded = Dense(input_dim, activation='sigmoid', name='decoder_layer3')(decoded)

# 7. Create the autoencoder model
autoencoder = Model(input_img, decoded)
# We use adam optimizer and mean squared error loss function because they are common and work well in most cases.
autoencoder.compile(optimizer='adam', loss='mse')

# 8. Train the autoencoder. Make sure to only use training data
# We start with 100 epochs and a batch size of 256.
autoencoder.fit(auto_train_preprocessed, auto_train_preprocessed,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(auto_train_preprocessed, auto_train_preprocessed))


Epoch 1/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 64ms/step - loss: 1.2204 - val_loss: 1.2436
Epoch 2/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step - loss: 1.2481 - val_loss: 1.2364
Epoch 3/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step - loss: 1.1921 - val_loss: 1.2268
Epoch 4/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step - loss: 1.2018 - val_loss: 1.2143
Epoch 5/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step - loss: 1.2258 - val_loss: 1.1983
Epoch 6/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step - loss: 1.1926 - val_loss: 1.1787
Epoch 7/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - loss: 1.1722 - val_loss: 1.1557
Epoch 8/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step - loss: 1.1333 - val_loss: 1.1303
Epoch 9/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[3

<keras.src.callbacks.history.History at 0x1ef32294510>

<div class="alert alert-info">

Now we will construct a pipeline for preparing the feature matrix to be used with the previous autoencoder. Save the resulting encoded feature matrix in a variable named embeddings

</div>

In [30]:
# We use the existing sklearn pipeline to preprocess the data
# We created the preprocessor in the previous task and will reuse it here

# auto_train_preprocessed = auto_preprocessor.fit_transform(auto_train)
# auto_test_preprocessed = auto_preprocessor.transform(auto_test)

# Retrieve the encoder part of the autoencoder
# Layer 4 is the last layer of the encoder
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.layers[4].output)

# Pass the feature matrix through the encoder
embeddings = encoder.predict(auto_train_preprocessed)
print(f"Embeddings shape: {embeddings.shape}")
print(f"Embeddings: {embeddings}")

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
Embeddings shape: (750, 3)
Embeddings: [[18.454544   0.7312239 17.326542 ]
 [18.002077  13.973361   3.0021744]
 [19.590372  13.388158   3.9529333]
 ...
 [23.60053   11.362141  17.605667 ]
 [17.688883   2.9473712 24.129225 ]
 [29.384817  10.610202  26.497244 ]]


In [31]:
import plotly.express as px

embeddings = pd.DataFrame(embeddings, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings, x='x', y='y', z='z', color=auto_train['DEFAULT'])
fig.show()

<div class="alert alert-info">
<b>Justified Explanation</b>

The 3D visualization with color coding by "DEFAULT" allows us to see how well our compression is able to seperate the two classes of "DEFAULT". If the two classes are well separated, for example in various clusters, it means that the compression is able to capture the differences between the two classes. However, if the two classes are mixed together, which applies to this case, it means that the compression is not able to capture the differences between the two classes.

This compression this autoencoder provides is not very useful for the task of predicting the "DEFAULT" variable, as the two classes are mixed together in the 3D space. This means that the autoencoder is not able to capture the differences between the two classes, which is essential for the task of predicting the "DEFAULT" variable.

Additionally, this visualization is based on the training data. As autoencoders are prone to overfitting, it is quite likely that the compression this autoencoder provides would be even less useful on unseen data, like our test data.
</div>

In [32]:
# 3D Visualization of the test data
embeddings_test = encoder.predict(auto_test_preprocessed)

embeddings_test = pd.DataFrame(embeddings_test, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings_test, x='x', y='y', z='z', color=auto_test['DEFAULT'])
fig.show()

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 93us/step


<div class="alert alert-info">
As expected, the autoencoder also provides an unclear separation of the two classes in the 3D space for the test data. If we were to have a well working autoencoder for the training set, we would use the visualization on the test data to see if the autoencoder is able to generalize well to unseen data. However, as the autoencoder is not able to capture the differences between the two classes in the training data, using the autoencoder on the test data and visualizing it does not provide any additional insights and does not allow us to test for overfitting.

We will now test if we can create a better autoencoder by using a different architecture and hyperparameters.
</div>

### More layers

In [33]:
# 4. Add noise
noisy_input = GaussianNoise(stddev=0.5)(input_img)

# 5. Define the Encoder. Our input layer has 104 dimensions which we need to reduce down.
# Here we shall use more layers to see if that leads to better results.
encoded = Dense(90, activation='relu', name='encoder_layer1')(noisy_input)
encoded = Dense(75, activation='relu', name='encoder_layer2')(encoded)
encoded = Dense(60, activation='relu', name='encoder_layer3')(encoded)
encoded = Dense(45, activation='relu', name='encoder_layer4')(encoded)
encoded = Dense(30, activation='relu', name='encoder_layer5')(encoded)
encoded = Dense(15, activation='relu', name='encoder_layer6')(encoded)
encoded = Dense(encoding_dim, activation='relu', name='encoder_layer7')(encoded)

# 6. Define the Decoder. We will mirror the encoder, but in reverse order.
# Here we shall use more layers to see if that leads to better results.
decoded = Dense(15, activation='relu', name='decoder_layer1')(encoded)
decoded = Dense(30, activation='relu', name='decoder_layer2')(encoded)
decoded = Dense(45, activation='relu', name='decoder_layer3')(encoded)
decoded = Dense(60, activation='relu', name='decoder_layer4')(decoded)
decoded = Dense(75, activation='relu', name='decoder_layer5')(decoded)
decoded = Dense(90, activation='relu', name='decoder_layer6')(decoded)
decoded = Dense(input_dim, activation='sigmoid', name='decoder_layer7')(decoded)

# 7. Create the autoencoder model
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# 8. Train the autoencoder. Make sure to only use training data
autoencoder.fit(auto_train_preprocessed, auto_train_preprocessed,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(auto_train_preprocessed, auto_train_preprocessed))

# Retrieve the encoder part of the autoencoder
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.layers[8].output)

Epoch 1/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 82ms/step - loss: 1.2558 - val_loss: 1.2482
Epoch 2/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 12ms/step - loss: 1.2304 - val_loss: 1.2439
Epoch 3/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step - loss: 1.2335 - val_loss: 1.2342
Epoch 4/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step - loss: 1.2268 - val_loss: 1.2142
Epoch 5/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step - loss: 1.1972 - val_loss: 1.1789
Epoch 6/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 25ms/step - loss: 1.1711 - val_loss: 1.1309
Epoch 7/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step - loss: 1.1042 - val_loss: 1.0862
Epoch 8/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 1.0880 - val_loss: 1.0523
Epoch 9/100
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37

In [34]:
# 3D Visualization of the train data
embeddings = encoder.predict(auto_train_preprocessed)

embeddings = pd.DataFrame(embeddings, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings, x='x', y='y', z='z', color=auto_train['DEFAULT'])
fig.show()

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step


In [35]:
# 3D Visualization of the test data
embeddings_test = encoder.predict(auto_test_preprocessed)

embeddings_test = pd.DataFrame(embeddings_test, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings_test, x='x', y='y', z='z', color=auto_test['DEFAULT'])
fig.show()

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 


<div class="alert alert-info">
Adding more layers to the autoencoder with dimensionality reductions steps of roughly -15 units per layer, we can see that the separation of the two classes is still not clear. If anything, the two groups seem more intertwined. This means that the autoencoder is still not able to capture the differences between the two classes.
</div>

### More epochs and larger batch size

In [37]:
# 4. Add noise
noisy_input = GaussianNoise(stddev=0.5)(input_img)

# 5. Define the Encoder. Our input layer has 104 dimensions which we need to reduce down.
encoded = Dense(52, activation='relu', name='encoder_layer1')(noisy_input)
encoded = Dense(26, activation='relu', name='encoder_layer2')(encoded)
encoded = Dense(encoding_dim, activation='relu', name='encoder_layer3')(encoded)

# 6. Define the Decoder. We will mirror the encoder, but in reverse order.
decoded = Dense(26, activation='relu', name='decoder_layer1')(encoded)
decoded = Dense(52, activation='relu', name='decoder_layer2')(decoded)
decoded = Dense(input_dim, activation='sigmoid', name='decoder_layer3')(decoded)

# 7. Create the autoencoder model
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# 8. Train the autoencoder. Make sure to only use training data
# Here we try a much larger number of epochs and barch_size to see if that leads to better results.
autoencoder.fit(auto_train_preprocessed, auto_train_preprocessed,
                epochs=5000,
                batch_size=500,
                shuffle=True,
                validation_data=(auto_train_preprocessed, auto_train_preprocessed))

# Retrieve the encoder part of the autoencoder
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.layers[4].output)

Epoch 1/5000


[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 117ms/step - loss: 1.2458 - val_loss: 1.2467
Epoch 2/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step - loss: 1.2496 - val_loss: 1.2441
Epoch 3/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step - loss: 1.2362 - val_loss: 1.2408
Epoch 4/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step - loss: 1.2308 - val_loss: 1.2368
Epoch 5/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step - loss: 1.2313 - val_loss: 1.2319
Epoch 6/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step - loss: 1.2330 - val_loss: 1.2260
Epoch 7/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step - loss: 1.2323 - val_loss: 1.2188
Epoch 8/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 31ms/step - loss: 1.2109 - val_loss: 1.2103
Epoch 9/5000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m

In [38]:
# 3D Visualization of the train data
embeddings = encoder.predict(auto_train_preprocessed)

embeddings = pd.DataFrame(embeddings, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings, x='x', y='y', z='z', color=auto_train['DEFAULT'])
fig.show()

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 


In [39]:
# 3D Visualization of the test data
embeddings_test = encoder.predict(auto_test_preprocessed)

embeddings_test = pd.DataFrame(embeddings_test, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings_test, x='x', y='y', z='z', color=auto_test['DEFAULT'])
fig.show()

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 


<div class="alert alert-info">
While the separation of the two classes is still not clear, it seems the autoencoder is able to capture slighltly more of the differences between the two classes. This is likely due to the autoencoder having more time to learn from the data.

Notably, after around 1000 epochs the validation loss of the autoencoder more less stagnated at around 0.80. This means that the autoencoder likely has learned most it can from the data and the return per epoch past 1000 drops significantly. After all 5000 epochs the autoencoder ended with a validation loss of 0.78.
</div>

### Different activation functions

In [40]:
# 4. Add noise
noisy_input = GaussianNoise(stddev=0.5)(input_img)

# 5. Define the Encoder. Our input layer has 104 dimensions which we need to reduce down.
# We try LeaklyReLU activation functions to see if that leads to better results.
from keras.layers import LeakyReLU

# Define the Encoder
encoded = Dense(52, name='encoder_layer1')(noisy_input)
encoded = LeakyReLU()(encoded)
encoded = Dense(26, name='encoder_layer2')(encoded)
encoded = LeakyReLU()(encoded)
encoded = Dense(encoding_dim, name='encoder_layer3')(encoded)
encoded = LeakyReLU()(encoded)

# Define the Decoder
decoded = Dense(26, name='decoder_layer1')(encoded)
decoded = LeakyReLU()(decoded)
decoded = Dense(52, name='decoder_layer2')(decoded)
decoded = LeakyReLU()(decoded)
decoded = Dense(input_dim, activation='sigmoid', name='decoder_layer3')(decoded)

# 7. Create the autoencoder model
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# 8. Train the autoencoder. Make sure to only use training data
# The last model showed that after 1000 epochs the model stopped learning much.
autoencoder.fit(auto_train_preprocessed, auto_train_preprocessed,
                epochs=1000,
                batch_size=500,
                shuffle=True,
                validation_data=(auto_train_preprocessed, auto_train_preprocessed))

# Retrieve the encoder part of the autoencoder
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.layers[6].output)

Epoch 1/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 144ms/step - loss: 1.2516 - val_loss: 1.2459
Epoch 2/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step - loss: 1.2439 - val_loss: 1.2416
Epoch 3/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step - loss: 1.2521 - val_loss: 1.2367
Epoch 4/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step - loss: 1.2261 - val_loss: 1.2310
Epoch 5/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step - loss: 1.2236 - val_loss: 1.2243
Epoch 6/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step - loss: 1.2346 - val_loss: 1.2164
Epoch 7/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step - loss: 1.2070 - val_loss: 1.2074
Epoch 8/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step - loss: 1.2082 - val_loss: 1.1972
Epoch 9/1000
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━

In [41]:
# 3D Visualization of the train data
embeddings = encoder.predict(auto_train_preprocessed)

embeddings = pd.DataFrame(embeddings, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings, x='x', y='y', z='z', color=auto_train['DEFAULT'])
fig.show()

[1m 1/24[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 47ms/step

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step


In [42]:
# 3D Visualization of the test data
embeddings_test = encoder.predict(auto_test_preprocessed)

embeddings_test = pd.DataFrame(embeddings_test, columns=['x', 'y', 'z'])
fig = px.scatter_3d(embeddings_test, x='x', y='y', z='z', color=auto_test['DEFAULT'])
fig.show()

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 


<div class="alert alert-info">
The plot's outcome is highly influenced by the data split!

The 3D scatter plot shows the compressed feature space generated by an autoencoder, where a value of one indicates a customer will default and a value of zero indicates they will not default.

The 3D scatter plot suggests that the autoencoder has created a feature space that can somewhat separate the two classes for credit default prediction. We observe a region where blue dots are distinctly separated from yellow ones, but there is also a region where blue and yellow dots coexist. This could impact the model’s ability to accurately classify credit default risk, although we do not consider it a poor approach.

Some of the extremely separated blue points might indicate outliers.

This description applies to a specific plot we observed, but as the plot changes with each run of the full code, we might be referring to different axes while observing similar overall behavior. Notably, the y values in many cases are close to zero. This could suggest that the y dimension may not significantly contribute to the variance or separation between classes.

We can better observe this by examining three separate plots, each corresponding to one axis.
</div>