# *Bankruptcy prediction using a Deep learning model with Cross Validation*

---------------


The aim of a machine learning solution is to develop a model that produces the desired output by analyzing a dataset created for a specific task. Binary classification is one of the most common problem types: the model looks at an input and predicts which of two possible classes it belongs to. Practical uses include sentiment analysis, spam detection, and credit-card fraud detection. Such models are trained on datasets labeled with 1s and 0s representing the two classes.

In this exercise, we classify companies as either bankrupt or not bankrupt based on 64 features.
To build the deep learning model we use the *Keras library*, which ships with the *TensorFlow library*.


**1. Import the libraries used throughout the task**

In [22]:
import warnings
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import SGD, Adam
# Note: this wrapper module was removed in recent TensorFlow releases;
# on those versions the separate SciKeras package provides a KerasClassifier.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# Silence deprecation warnings to keep the notebook output readable.
warnings.filterwarnings("ignore", category=DeprecationWarning)

**2. Importing data**

Load the dataset we will be working with, *Bancarrota.csv*:

In [35]:
df=pd.read_csv("Bancarrota.csv",index_col=0)
df

| Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | ... | X56 | X57 | X58 | X59 | X60 | X61 | X62 | X63 | X64 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.200550 | 0.379510 | 0.396410 | 2.047200 | 32.351000 | 0.388250 | 0.249760 | 1.330500 | 1.138900 | 0.504940 | ... | 0.121960 | 0.397180 | 0.878040 | 0.001924 | 8.416000 | 5.137200 | 82.658000 | 4.415800 | 7.427700 | 0.0 |
| 1 | 0.209120 | 0.499880 | 0.472250 | 1.944700 | 14.786000 | 0.000000 | 0.258340 | 0.996010 | 1.699600 | 0.497880 | ... | 0.121300 | 0.420020 | 0.853000 | 0.000000 | 4.148600 | 3.273200 | 107.350000 | 3.400000 | 60.987000 | 0.0 |
| 2 | 0.248660 | 0.695920 | 0.267130 | 1.554800 | -1.152300 | 0.000000 | 0.309060 | 0.436950 | 1.309000 | 0.304080 | ... | 0.241140 | 0.817740 | 0.765990 | 0.694840 | 4.990900 | 3.951000 | 134.270000 | 2.718500 | 5.207800 | 0.0 |
| 3 | 0.081483 | 0.307340 | 0.458790 | 2.492800 | 51.952000 | 0.149880 | 0.092704 | 1.866100 | 1.057100 | 0.573530 | ... | 0.054015 | 0.142070 | 0.945980 | 0.000000 | 4.574600 | 3.614700 | 86.435000 | 4.222800 | 5.549700 | 0.0 |
| 4 | 0.187320 | 0.613230 | 0.229600 | 1.406300 | -7.312800 | 0.187320 | 0.187320 | 0.630700 | 1.155900 | 0.386770 | ... | 0.134850 | 0.484310 | 0.865150 | 0.124440 | 6.398500 | 4.315800 | 127.210000 | 2.869200 | 7.898000 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10995 | -0.223515 | 0.387329 | 0.281804 | 2.017843 | 32.220480 | -0.120310 | -0.223515 | 2.042858 | 1.091357 | 0.612559 | ... | -0.780296 | -0.484320 | 1.137320 | 0.000000 | 50.485044 | 5.827173 | 120.608015 | 3.028686 | 5.222537 | 1.0 |
| 10996 | -0.071702 | 0.661316 | 0.178425 | 1.487297 | -2.713831 | -0.128223 | -0.085766 | 0.577017 | 0.595145 | 0.338679 | ... | 0.258542 | -0.259774 | 0.795456 | 0.816270 | 3.758558 | 1.775902 | 232.725900 | 1.619469 | 1.511162 | 1.0 |
| 10997 | -0.154742 | 0.945439 | -0.222230 | 0.772486 | -80.120975 | 0.000000 | -0.159010 | 0.059298 | 2.852007 | 0.054563 | ... | -0.038469 | -6.678107 | 1.046342 | 0.005644 | 10.592939 | 11.820948 | 128.577780 | 2.991096 | 12.467202 | 1.0 |
| 10998 | -0.158247 | 0.560193 | -0.199368 | 0.526423 | -52.815349 | -0.056260 | -0.158247 | 1.553115 | 1.621359 | 0.440380 | ... | 0.006837 | 0.268260 | 0.948235 | 0.023060 | 31.963896 | 14.670247 | 92.783481 | 4.963547 | 1.972845 | 1.0 |


**3. Preparing data**

The dataset imported needs pre-processing before it can be fed into the neural network.

- Get a NumPy representation of the DataFrame.

Keras and TensorFlow work with arrays, so we first convert the DataFrame into a NumPy array via its values attribute.

In [26]:
dataset=df.values
dataset

array([[ 2.00550000e-01,  3.79510000e-01,  3.96410000e-01, ...,
         4.41580000e+00,  7.42770000e+00,  0.00000000e+00],
       [ 2.09120000e-01,  4.99880000e-01,  4.72250000e-01, ...,
         3.40000000e+00,  6.09870000e+01,  0.00000000e+00],
       [ 2.48660000e-01,  6.95920000e-01,  2.67130000e-01, ...,
         2.71850000e+00,  5.20780000e+00,  0.00000000e+00],
       ...,
       [-1.54741537e-01,  9.45439325e-01, -2.22229675e-01, ...,
         2.99109598e+00,  1.24672020e+01,  1.00000000e+00],
       [-1.58246751e-01,  5.60193357e-01, -1.99368492e-01, ...,
         4.96354716e+00,  1.97284536e+00,  1.00000000e+00],
       [-1.53117345e-01,  5.70027539e-01,  7.63101713e-03, ...,
         6.96046981e+00,  3.44711321e+01,  1.00000000e+00]])

- Split the data into the independent features (X) and the dependent vector (y)

Define the *X* matrix from all columns of the DataFrame except the last one, and the *y* vector from the last column (Y), which holds the class labels.

In [27]:
# Columns 0..63 are the 64 features; column 64 is the label.
X=dataset[:,0:64].astype("float64")
y=dataset[:,64].astype("float64")
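
The slicing above can be tried on a small stand-in array. The sketch below assumes a toy array with four feature columns and a trailing label column (the real dataset has 64 feature columns); the names toy, X_toy and y_toy are hypothetical and used only for illustration.

```python
import numpy as np

# Toy stand-in for df.values: 4 feature columns followed by 1 label column.
toy = np.array([
    [0.2, 0.4, 0.4, 2.0, 0.0],
    [0.1, 0.3, 0.5, 2.5, 0.0],
    [-0.2, 0.9, -0.2, 0.8, 1.0],
])

# Every column except the last is a feature; the last column is the label.
X_toy = toy[:, :-1].astype("float64")
y_toy = toy[:, -1].astype("float64")

print(X_toy.shape, y_toy.shape)
```

Using negative indices (:-1 and -1) gives the same split as the explicit 0:64 and 64 when the label is the last column, and it keeps working if the number of feature columns changes.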

- Standardize features

StandardScaler standardizes each feature by subtracting its mean and scaling it to unit variance. Standardization is a common requirement for many machine learning estimators: they may behave badly if the individual features do not look more or less like standard normally distributed data.

In [28]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled

array([[ 0.06400684, -0.05960982,  0.06766482, ..., -0.01039401,
        -0.0436054 , -0.03007321],
       [ 0.0662987 , -0.04335584,  0.07817856, ..., -0.01015003,
        -0.05461308, -0.00381474],
       [ 0.07687283, -0.01688387,  0.04974266, ..., -0.00988403,
        -0.06199813, -0.03116156],
       ...,
       [-0.03100832,  0.01680959, -0.01809753, ..., -0.00994028,
        -0.05904415, -0.0276025 ],
       [-0.03194572, -0.03521151, -0.01492827, ..., -0.01029396,
        -0.03766976, -0.03274756],
       [-0.03057397, -0.03388357,  0.01376818, ..., -0.01068395,
        -0.01603018, -0.01681465]])
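
To make the transformation concrete, the following sketch reproduces the standardization formula by hand with NumPy on a small made-up matrix. The array X_demo is an assumption for illustration and is not taken from the dataset; the population standard deviation (ddof=0) is used because that is the convention StandardScaler follows.

```python
import numpy as np

# Toy feature matrix: 4 samples, 2 features on very different scales.
X_demo = np.array([[1.0, 100.0],
                   [2.0, 200.0],
                   [3.0, 300.0],
                   [4.0, 400.0]])

# Column-wise mean and population standard deviation (ddof=0).
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)

# Standardize: subtract the mean, then divide by the standard deviation.
X_demo_scaled = (X_demo - mean) / std

print(X_demo_scaled.mean(axis=0))  # approximately 0 for each column
print(X_demo_scaled.std(axis=0))   # approximately 1 for each column
```

After scaling, both columns hold the same values even though the raw features differed by a factor of 100, which is the point of standardizing before training.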

**4. Model creation and compilation**

The neural network consists of two hidden layers with 128 and 64 nodes respectively, followed by a single output node. The output layer uses the sigmoid activation function, which squashes any real-valued input into the range between 0 and 1 so that the output can be read as a probability. The two hidden layers use ReLU (Rectified Linear Unit) as the activation function.
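
As a rough illustration of what this architecture computes, here is a NumPy sketch of a forward pass through a 64-128-64-1 network. The weights are random placeholders rather than trained values, and the helper names relu, sigmoid and forward are hypothetical; Keras performs the equivalent computation internally.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and zeroes out negative ones.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid maps any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Randomly initialised weights, for illustration only (not trained values).
W1, b1 = rng.normal(scale=0.1, size=(64, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.1, size=(128, 64)), np.zeros(64)
W3, b3 = rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)          # first hidden layer, 128 nodes
    h2 = relu(h1 @ W2 + b2)         # second hidden layer, 64 nodes
    return sigmoid(h2 @ W3 + b3)    # output node, a value in (0, 1)

x_batch = rng.normal(size=(5, 64))  # 5 made-up samples with 64 features
probs = forward(x_batch)

# Total number of trainable parameters (weights plus biases).
n_params = sum(a.size for a in (W1, b1, W2, b2, W3, b3))
print(probs.shape)
print(n_params)  # 16641
```

The parameter count is 64*128 + 128 for the first layer, 128*64 + 64 for the second, and 64 + 1 for the output node.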

For compilation, the network uses the Adam optimizer with its default settings. The code also creates an SGD optimizer with learning_rate=0.01 and momentum=0.9, but that object is never passed to compile(), so it has no effect on training. The chosen loss function is binary_crossentropy, which is designed for binary classifiers. The metric used for evaluation is accuracy.
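
The binary cross-entropy loss can be written out in a few lines of NumPy. This is a minimal sketch of the formula, not the Keras implementation; the function name and the clipping constant eps are illustrative choices, and the example labels and probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so that log() stays finite.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])  # probabilities close to the labels
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # probabilities far from the labels

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)

print(loss_right, loss_wrong)
```

Predictions close to the true labels give a small loss, while confident wrong predictions are penalized heavily, which is what drives the sigmoid output toward the correct class during training.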

In [7]:
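# Create the model: two ReLU hidden layers and one sigmoid output node.
model=Sequential()
model.add(Dense(128,input_dim=64,activation="relu"))
model.add(Dense(64,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
# Defined but unused: the model is compiled with 'adam' below.
sgd=SGD(learning_rate=.01,momentum=0.9)
# A single sigmoid output calls for binary (not categorical) cross-entropy.
model.compile(loss="binary_crossentropy",
              optimizer='adam',
              metrics=["accuracy"])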

In [29]:
# Define a function that builds the model, compiles it, and returns it.
def modelo_NN():
    model=Sequential()
    model.add(Dense(128,input_dim=64,activation="relu"))
    model.add(Dense(64,activation="relu"))
    model.add(Dense(1,activation="sigmoid"))
    sgd=SGD(learning_rate=.01,momentum=0.9)
    model.compile(loss="binary_crossentropy",
                  optimizer='adam',
                  metrics=["accuracy"])
    return model 

In [30]:
model=modelo_NN()

**5. Model evaluation with Cross-Validation**

Keras is a popular Python library for deep learning, and it focuses on exactly that: letting you define and build deep learning models quickly and simply. The scikit-learn library is built upon the SciPy stack for efficient numerical computation. It is a fully featured library for general machine learning and provides many utilities that are also useful when developing deep learning models. The TensorFlow/Keras library included wrappers that let deep learning models be used as classification or regression estimators in scikit-learn. In this exercise we focus on the KerasClassifier wrapper, which makes a classification neural network created in Keras usable from scikit-learn.

Cross-validation (CV) is a technique for evaluating a machine learning model and testing its performance. CV divides the dataset into two parts, one for training and one for testing, trains the model on the training part, validates it on the test part, and repeats these steps several times with different splits. The number of repetitions depends on the CV method used. There are many CV techniques; we will use StratifiedKFold from sklearn to perform 5-fold stratified cross-validation, which keeps the class proportions roughly equal across folds.
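
The effect of stratification can be seen on a small imbalanced label vector. The sketch below uses made-up labels (80 zeros and 20 ones) and placeholder features; the names y_lab, X_lab and skf_demo are hypothetical, and random_state is fixed only to make the sketch reproducible.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 80 negatives and 20 positives.
y_lab = np.array([0] * 80 + [1] * 20)
# The features do not influence how the folds are built, so zeros are enough.
X_lab = np.zeros((len(y_lab), 1))

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

positives_per_fold = []
fold_sizes = []
for train_idx, test_idx in skf_demo.split(X_lab, y_lab):
    # Count how many positive samples land in the test part of this fold.
    positives_per_fold.append(int(y_lab[test_idx].sum()))
    fold_sizes.append(len(test_idx))

print(fold_sizes)
print(positives_per_fold)
```

Each test fold receives the same share of positives as the full dataset, so no fold ends up with too few bankrupt companies to compute a meaningful F1-score.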


- Pass the function that defines the model to KerasClassifier, together with the arguments epochs=12 and batch_size=32. These are automatically bundled up and passed on to the fit() function, which KerasClassifier calls internally.

In [39]:
classifier = KerasClassifier(build_fn=modelo_NN, epochs=12, batch_size=32, verbose=0)

- Use the scikit-learn StratifiedKFold to perform 5-fold stratified cross-validation.

In [32]:
skf = StratifiedKFold(n_splits=5, shuffle=True)

- Use the scikit-learn function cross_val_score() to evaluate the model under this cross-validation scheme, and print the F1-score of each fold.


In [33]:
f1_scores = cross_val_score(classifier, X_scaled, y, cv=skf, scoring='f1')
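
In the cells above the scaler is fitted on the whole dataset before cross-validation, so the test part of every fold contributes to the mean and variance used for scaling. One common way to avoid this is to wrap the scaler and the classifier in a scikit-learn pipeline, so that the scaler is re-fitted on the training part of each fold. The sketch below is an illustration under assumptions: it uses synthetic data and LogisticRegression as a lightweight stand-in for the Keras classifier, and neither comes from the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 samples, 5 features, label driven by feature 0.
X_syn = rng.normal(size=(200, 5))
y_syn = (X_syn[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Scaler and classifier are fitted together on the training part of each fold.
# LogisticRegression is only a stand-in for the Keras classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X_syn, y_syn, cv=cv, scoring="f1")

print(scores.shape)
print(scores.mean())
```

The same pattern should carry over to a scikit-learn compatible Keras wrapper placed as the last pipeline step, although that combination is not tested here.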



In [37]:
print("F1-scores per fold: ", f1_scores)
print("F1-score mean: ", np.mean(f1_scores))

F1-scores per fold:  [0.91638264 0.92032817 0.92376894 0.92475685 0.91996108]
F1-score mean:  0.9210395339590715
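
For reference, the F1-score reported for each fold is the harmonic mean of precision and recall. The following sketch computes it by hand on a small made-up pair of label vectors; the arrays are assumptions for illustration and do not come from the model above.

```python
import numpy as np

# Made-up true labels and predictions for 8 samples.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)  # 3 1 1
print(f1)          # 0.75
```

Because the F1-score ignores true negatives, it is a stricter measure than accuracy when one class is much more frequent than the other.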


The model achieves a mean F1-score of just over 0.92 across the five folds, indicating high predictive performance. Further optimization can be explored by adjusting parameters such as the number of epochs, the number of layers, or the number of nodes per layer. However, the current score is already acceptable. Note that cross_val_score() fits internal copies of the classifier, so the model must still be fitted (for example on the full dataset) before it is used for predictions on new data. Alternatively, we can hold out part of the data at the beginning and use it as an unseen test set for the fitted model.
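
A hold-out split of this kind could look as follows. This is a sketch on synthetic data with the same number of features as the real dataset; the variable names, the 80/20 split ratio and the fixed random_state are assumptions, not choices made in the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 100 samples, 64 features, 20% positives.
X_all = rng.normal(size=(100, 64))
y_all = np.array([0] * 80 + [1] * 20)

# Hold out 20% of the rows, keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)

# Fit the scaler on the training part only, then apply it to both parts.
scaler_demo = StandardScaler().fit(X_train)
X_train_scaled = scaler_demo.transform(X_train)
X_test_scaled = scaler_demo.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
print(int(y_test.sum()))
```

A classifier fitted on X_train_scaled and y_train can then be scored on X_test_scaled and y_test, which gives an estimate of performance on data the model has never seen.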