Optimisation and Machine Learning in Finance – Software

In [170]:
#start by importing necessary libraries
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder, StandardScaler

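We start by encoding the "Rating" column, converting the 16 categorical credit ratings into integer codes from 0 to 15. LabelEncoder assigns these codes in sorted (alphabetical) order of the rating labels, so each code identifies a rating but does not rank ratings by credit quality. We then split the data into an 80/20 training and test set.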
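The sketch below shows what LabelEncoder does to Moody's-style labels. It assumes the Rating column holds the 16 grades from Aaa to B3. This is an assumption: the notebook only states that the codes run from 0 to 15. Because the codes follow alphabetical order, they work as class identifiers for a softmax classifier but should not be read as a ranking.

```python
from sklearn.preprocessing import LabelEncoder

# Assumption: the Rating column contains these 16 Moody's grades (consistent
# with codes 0 to 15); the actual file may hold a different set.
moodys_scale = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3",
                "Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3"]

encoder = LabelEncoder()
codes = encoder.fit_transform(moodys_scale)

# LabelEncoder sorts the labels before numbering them,
# so the codes do not follow credit quality.
print(encoder.classes_.tolist())
print(dict(zip(moodys_scale, codes.tolist())))

# Illustrative ordinal alternative: code = position on the scale,
# 0 = Aaa (highest quality) to 15 = B3 (lowest in this list).
ordinal = {grade: rank for rank, grade in enumerate(moodys_scale)}
print(ordinal["Aaa"], ordinal["Baa3"], ordinal["B3"])
```

In this sketch, "Aaa" receives code 6 and "Baa3" code 15. An ordinal mapping only matters if the codes are treated as numbers, for example in a regression on the rating.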

In [171]:
#load Moody's data set
mlf = pd.read_csv("MLF_GP1_CreditScore.csv")

#Encode the Rating column: convert the categorical ratings to integer codes
label = LabelEncoder()
mlf["Rating"] = label.fit_transform(mlf["Rating"])

#Features: every column except the last two (investment-grade flag and rating)
x = mlf.iloc[: , : -2]
#Target for the regressions: the investment-grade flag (second-to-last column)
y_regression = mlf.iloc[: , -2 : -1]
#Targets for the neural network: investment-grade flag and rating (last two columns)
y_neural = mlf.iloc[: , -2: ]

#Split the features and both sets of targets in one call so their rows stay aligned
#(80% training, 20% test; one y split for the regressions and one for the neural network)
x_train, x_test, y_regression_train, y_regression_test, y_neural_train, y_neural_test  = train_test_split(x, y_regression, y_neural, test_size = 0.2, random_state = 0)

#Standardise the features (zero mean, unit variance) so the penalties treat them on a common scale;
#the scaler is fitted on the training set only and then applied to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

#Flatten the one-column test target into a 1-D array for the scoring functions
y_regression_flat = y_regression_test.to_numpy().flatten()
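Passing several arrays to a single train_test_split call applies the same shuffled row selection to all of them. This is what keeps the features and both sets of targets aligned. Below is a small sketch on a toy DataFrame. The column name InvGrd and all values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: one feature plus the two target columns (hypothetical values).
df = pd.DataFrame({
    "feature": np.arange(10, dtype=float),
    "InvGrd": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "Rating": [3, 12, 5, 0, 10, 6, 4, 11, 2, 7],
})
x = df.iloc[:, :-2]
y_regression = df.iloc[:, -2:-1]
y_neural = df.iloc[:, -2:]

x_tr, x_te, yr_tr, yr_te, yn_tr, yn_te = train_test_split(
    x, y_regression, y_neural, test_size=0.2, random_state=0)

# All outputs share one row permutation, so their index labels line up.
print(x_te.index.tolist(), yr_te.index.tolist(), yn_te.index.tolist())
```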


LINEAR REGRESSION WITH RIDGE (L2) AND LASSO (L1) REGULARISATION TO PREDICT WHETHER THE FIRM IS IN INVESTMENT GRADE OR NOT.

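For both regression models, we set the regularisation strength alpha to 0.05. In scikit-learn's Ridge and Lasso, alpha is the weight placed on the penalty term. Larger values shrink the coefficients more strongly. Alpha is not a significance level and does not correspond to a confidence interval.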
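The minimal sketch below illustrates what alpha controls, using synthetic data. The data-generating process is an assumption for illustration, unrelated to the Moody's dataset: only the first of five features matters. As alpha grows, Ridge's coefficients shrink steadily. A large enough alpha drives every Lasso coefficient to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only feature 0 drives the target.
rng = np.random.RandomState(0)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] + rng.standard_normal(200)

# Ridge: a larger alpha shrinks the L2 norm of the coefficients, but not to exactly zero.
ridge_norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in (0.05, 10.0, 1000.0)]
print("Ridge coefficient norms:", np.round(ridge_norms, 3))

# Lasso: the L1 penalty can set coefficients exactly to zero.
lasso_small = Lasso(alpha=0.05).fit(X, y)
lasso_large = Lasso(alpha=10.0).fit(X, y)
print("Lasso alpha=0.05:", np.round(lasso_small.coef_, 3))
print("Lasso alpha=10:  ", lasso_large.coef_)
```

scikit-learn's Lasso scales the squared-error term by 1/(2n) while Ridge does not, so the same alpha value is not directly comparable between the two models.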

In [172]:
#Setup Linear Regression Model with Ridge (L2)
linear_regression_ridge = linear_model.Ridge(alpha=0.05)
linear_regression_ridge.fit(x_train, y_regression_train)
#Predictions are continuous scores; flatten them and threshold at 0.5 to get 0/1 labels
yridge_linear_pred = linear_regression_ridge.predict(x_test).ravel()
yridge_linear_pred = (yridge_linear_pred > 0.5).astype(int)

print("Accuracy Score Ridge (L2): ", accuracy_score(y_regression_flat, yridge_linear_pred))
print("Confusion Matrix: ", confusion_matrix(y_regression_flat, yridge_linear_pred))

#Setup Linear Regression Model with Lasso (L1)
linear_regression_lasso = linear_model.Lasso(alpha=0.05)
linear_regression_lasso.fit(x_train, y_regression_train)
#Flatten the continuous scores and threshold at 0.5 to get 0/1 labels
ylasso_pred = linear_regression_lasso.predict(x_test).ravel()
ylasso_pred = (ylasso_pred > 0.5).astype(int)

print("Accuracy Score Lasso (L1): ", accuracy_score(y_regression_flat, ylasso_pred))
print("Confusion Matrix: ", confusion_matrix(y_regression_flat, ylasso_pred))


Accuracy Score Ridge (L2):  0.7705882352941177
Confusion Matrix:  [[  1  74]
 [  4 261]]
Accuracy Score Lasso (L1):  0.7794117647058824
Confusion Matrix:  [[  0  75]
 [  0 265]]


The linear regression model with Ridge classifies firms as investment grade or not with an accuracy of 77.06%. The linear regression model with Lasso does so with an accuracy of 77.94%.
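Linear regression outputs continuous scores, so these must be mapped to 0/1 labels before accuracy_score and confusion_matrix can be applied. The minimal sketch below, using made-up scores, compares NumPy rounding with an explicit 0.5 threshold.

```python
import numpy as np

# Hypothetical continuous scores from a linear regression on a 0/1 target.
scores = np.array([-0.7, 0.2, 0.5, 0.51, 1.6])

# np.round can produce labels that are not classes (-1, 2)
# and rounds exactly 0.5 down to 0 (round half to even).
rounded = np.round(scores)

# A 0.5 threshold always yields a valid 0/1 label.
thresholded = (scores > 0.5).astype(int)

print(rounded.tolist())
print(thresholded.tolist())
```

Both confusion matrices above are 2x2, so rounding happened not to produce out-of-range labels on this test set. A threshold guarantees this instead of relying on it.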

LOGISTIC REGRESSION WITH RIDGE (L2) AND LASSO (L1) REGULARISATION TO PREDICT WHETHER THE FIRM IS IN INVESTMENT GRADE OR NOT

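For logistic regression, C is the inverse of the regularisation strength. C = 0.05 therefore applies a penalty 20 times stronger than the default C = 1. For the Lasso (L1) model, we set the solver to "liblinear" because scikit-learn's default solver, "lbfgs", supports only the L2 penalty (or none). "liblinear" and "saga" support L1. Both models output binary class labels whichever solver is used.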
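The minimal sketch below shows how C behaves for an L1-penalised logistic regression with the liblinear solver, on synthetic data (an assumption for illustration). A very small C forces every coefficient to zero. A large C keeps the informative feature in the model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (assumption): the label depends only on feature 0.
rng = np.random.RandomState(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)

# Smaller C = stronger L1 penalty = more coefficients set exactly to zero.
models = {
    C: LogisticRegression(penalty="l1", C=C, solver="liblinear").fit(X, y)
    for C in (1e-4, 0.05, 10.0)
}
for C, model in models.items():
    print(f"C={C:g}: non-zero coefficients = {np.count_nonzero(model.coef_)}")
```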

In [177]:
#Setup logistic regression model with Ridge (L2); C is the inverse regularisation strength
logistic_regression_ridge = linear_model.LogisticRegression(penalty = "l2", C = 0.05)
#Pass the target as a 1-D array, which scikit-learn expects for classifiers
logistic_regression_ridge.fit(x_train, y_regression_train.to_numpy().ravel())
#predict() already returns 0/1 class labels, so no rounding is needed
yridge_logistic_pred = logistic_regression_ridge.predict(x_test)

print("Accuracy score with Ridge: ", accuracy_score(y_regression_flat, yridge_logistic_pred))
print("confusion_matrix with Ridge: ", confusion_matrix(y_regression_flat, yridge_logistic_pred))

#Setup logistic regression model with Lasso (L1); the "liblinear" solver supports the L1 penalty
logistic_regression_lasso = linear_model.LogisticRegression(penalty = "l1", C = 0.05, solver = "liblinear")
logistic_regression_lasso.fit(x_train, y_regression_train.to_numpy().ravel())
ylasso_logistic_pred = logistic_regression_lasso.predict(x_test)

print("Accuracy score with Lasso: ", accuracy_score(y_regression_flat, ylasso_logistic_pred))
print("confusion_matrix with Lasso: ", confusion_matrix(y_regression_flat, ylasso_logistic_pred))



Accuracy score with Ridge:  0.7647058823529411
confusion_matrix with Ridge:  [[  0  75]
 [  5 260]]
Accuracy score with Lasso:  0.7794117647058824
confusion_matrix with Lasso:  [[  0  75]
 [  0 265]]




The logistic regression model with Ridge classifies firms as investment grade or not with an accuracy of 76.47%. The logistic regression model with Lasso does so with an accuracy of 77.94%.

NEURAL NETWORK TO PREDICT RATING OF THE FIRM AND WHETHER THE FIRM IS IN INVESTMENT GRADE OR NOT.

For our neural network, we design a three-layer architecture. The first two (hidden) layers have 64 and 32 nodes respectively and use the ReLU activation. ReLU sets every negative input to zero (max(0, z)), is cheap to compute and tends to help gradients propagate during training. The final layer has 16 nodes, one per rating class, and uses softmax to convert its outputs into a probability for each rating.
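Both activation functions can be written in a few lines of NumPy. In the sketch below, the logits are hypothetical, and the 4-class output stands in for the notebook's 16 classes. It shows ReLU zeroing negative inputs and softmax producing positive probabilities that sum to one.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z): negative pre-activations become 0.
    return np.maximum(0.0, z)

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

hidden = relu(np.array([-2.0, 0.0, 3.0]))
# Hypothetical logits for a 4-class output layer.
probs = softmax(np.array([2.0, 1.0, 0.1, -1.0]))
print(hidden, probs, probs.argmax())
```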

When compiling the model, we use the Adam optimizer and the sparse categorical cross-entropy loss. This loss suits multi-class classification (one class per rating) with integer-encoded labels such as those produced by LabelEncoder. We set the metric to accuracy so we can monitor how accurately the model predicts the ratings during training.
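Sparse categorical cross-entropy is ordinary categorical cross-entropy without the one-hot encoding step. The loss for each sample is -log of the probability assigned to its true class. The NumPy sketch below uses hypothetical probabilities. Keras may also clip probabilities away from zero for numerical stability; that step is omitted here.

```python
import numpy as np

# Hypothetical softmax outputs for 3 firms over 4 rating classes.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
labels = np.array([0, 1, 3])  # integer-encoded ratings

# Sparse version: index the probability of the true class directly.
sparse_cce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Dense version: the same loss computed with one-hot targets.
one_hot = np.eye(probs.shape[1])[labels]
dense_cce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
print(sparse_cce, dense_cce)
```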

Once we have the predicted ratings, we use a for loop to classify each one as investment grade (Baa3 or better) or not.
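This step has one subtlety. LabelEncoder.inverse_transform returns a NumPy array of strings. Assigning 1 or 0 into such an array stores the strings "1" and "0", which no longer match integer labels. The sketch below uses hypothetical predicted ratings to contrast that pitfall with writing the flags into a separate integer array. np.isin gives the same result in a single call.

```python
import numpy as np

investment_grade = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3", "Baa1", "Baa2", "Baa3"]
predicted = np.array(["Aa2", "Ba1", "Baa3", "B2"])  # hypothetical predicted ratings

# Pitfall: overwriting a string array in place stores "1"/"0" as strings.
overwritten = predicted.copy()
for i in range(len(overwritten)):
    overwritten[i] = 1 if overwritten[i] in investment_grade else 0
print(overwritten.dtype.kind, overwritten.tolist())

# Safer: the same loop, writing into a separate integer array.
ig_flags = np.zeros(len(predicted), dtype=int)
for i in range(len(predicted)):
    if predicted[i] in investment_grade:
        ig_flags[i] = 1
print(ig_flags.tolist())

# Vectorised equivalent.
vectorised = np.isin(predicted, investment_grade).astype(int)
```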

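To compute accuracy, the true and predicted investment-grade labels must have the same type (integer 0/1). accuracy_score accepts lists, NumPy arrays or pandas objects, so converting everything to lists is not strictly required.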
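The quick check below confirms that accuracy_score gives the same result for lists, NumPy arrays and pandas Series, as long as both arguments hold integer labels. The InvGrd column name and the values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted investment-grade flags (integers in both).
y_true = pd.DataFrame({"InvGrd": [1, 0, 1, 1]})
y_pred = np.array([1, 1, 1, 0])

# The container does not matter as long as the label types match.
as_lists = accuracy_score(y_true["InvGrd"].tolist(), y_pred.tolist())
as_arrays = accuracy_score(y_true.to_numpy().ravel(), y_pred)
as_series = accuracy_score(y_true["InvGrd"], y_pred)
print(as_lists, as_arrays, as_series)
```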

In [174]:
#Split the neural-network targets by column
#Column 0 = investment-grade flag of the test firms (used to score the final classification)
y_neural_ig = y_neural_test.iloc[:, 0].to_numpy().astype(int)
#Column 1 = encoded rating of the training firms (used to train the network)
y_neural_rating = y_neural_train.iloc[:, 1].to_numpy()

#Setup Neural Network Architecture: one output node per rating class
n_classes = len(label.classes_)
neural_model = tf.keras.Sequential([
    tf.keras.Input(shape=(x_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(n_classes, activation='softmax'),
])

#Compile the model
neural_model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])

#After compilation, train the model
neural_train = neural_model.fit(x_train, y_neural_rating, epochs = 100, batch_size = 32)

#Predict a probability for every rating class for each firm in the test set
y_neural_proba = neural_model.predict(x_test)
print(y_neural_proba)

#Take the most probable class and convert its code back into the Moody's rating label
y_neural_ratings = label.inverse_transform(y_neural_proba.argmax(axis = 1))
print(y_neural_ratings)

#Create a list of ratings that we know are investment grade (Baa3 or better)
Invest_in_them = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3", "Baa1", "Baa2", "Baa3"]

#Use a for loop to classify the predicted ratings as investment grade (1) or not (0).
#The flags go into a new integer array: writing 1/0 back into the string array of ratings
#would store the strings "1"/"0", which would not match the integer true labels.
y_neural_pred = np.zeros(len(y_neural_ratings), dtype = int)
for i in range(len(y_neural_ratings)):
    if y_neural_ratings[i] in Invest_in_them:
        y_neural_pred[i] = 1

print("Accuracy Score with Neural Network: ", accuracy_score(y_neural_ig, y_neural_pred))
print("Confusion Matrix with Neural Network: ", confusion_matrix(y_neural_ig, y_neural_pred))

[Training log: epochs 1 to 78 of 100 are listed, after which the output is truncated. The per-epoch loss and accuracy, and this cell's final accuracy score and confusion matrix, are not shown.]

We can now compare the results of the three models. According to the accuracy scores, the neural network model is more accurate than both linear regression and logistic regression, although its final accuracy score is cut off in the truncated output above.

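We also observe that, with the Ridge penalty, linear regression scores slightly higher than logistic regression (77.06% vs 76.47%). With the Lasso penalty, the two models tie at 77.94%. This is an interesting result, because logistic regression is specifically designed for binary outcomes, and a binary outcome (investment grade or not) is what we were predicting. Thus, while linear regression is probably not the most appropriate model in this setting, it still matched or outperformed logistic regression.

The confusion matrices qualify this comparison, however. Both Lasso models predict investment grade for every firm in the test set, so their 77.94% equals the share of investment-grade firms in that set (265 of 340). Neither Ridge model exceeds that figure. None of the four linear models therefore beats the trivial rule of labelling every firm investment grade.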
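A natural yardstick for these accuracies is the rule that labels every firm investment grade. The minimal sketch below rebuilds the test labels from the row sums of the confusion matrices above: 75 non-investment-grade and 265 investment-grade firms. The order of the labels does not affect accuracy. The sketch then compares the four reported scores against this rule.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Test-set class counts, read from the confusion-matrix row sums above.
y_test = np.array([0] * 75 + [1] * 265)

# Baseline: predict investment grade (1) for every firm.
always_ig = np.ones_like(y_test)
baseline_acc = accuracy_score(y_test, always_ig)
print(f"Always-investment-grade accuracy: {baseline_acc:.4f}")

# Accuracies reported in the notebook output above.
reported = {
    "Linear Ridge": 0.7705882352941177,
    "Linear Lasso": 0.7794117647058824,
    "Logistic Ridge": 0.7647058823529411,
    "Logistic Lasso": 0.7794117647058824,
}
for name, acc in reported.items():
    print(f"{name}: {acc:.4f} ({acc - baseline_acc:+.4f} vs baseline)")
```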