DSAI-Project

Analyzing financial metrics to predict bankruptcy

Overview

This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence). It focuses on a bankruptcy dataset from Kaggle, which was collected from the Taiwan Economic Journal for the years 1999–2009.

Lab Group: C126 Team 2

Video Presentation

The video presentation can be found here.

Presentation Slides

The slides can be found here.

The Problem

Background

Since 2000, 745,566 businesses have filed for bankruptcy in the US. Bankruptcy can result in significant financial losses, and being able to predict it can help entities pinpoint their problems and cut their losses. However, the abundance of financial data makes it difficult to decide which factors to prioritise when forecasting the likelihood of a company's bankruptcy.

Problem Definition

We therefore aim to answer the following question:

Which machine learning model best predicts the likelihood of a company going bankrupt, and are there specific variables that better determine bankruptcy?

Code Walkthrough

A detailed explanation of the code can be found in each individual notebook.

1. Data Cleaning and Exploratory Data Analysis / Visualisation

In the data_visualisations.ipynb notebook, we performed the following steps to clean our dataset and to understand it better:

Data Cleaning
Analytic Visualisation
Evaluating Outliers

We concluded that it was necessary to upsample our dataset before using it to train our machine learning models, as the dataset was highly imbalanced and would otherwise have given us biased results.

2. Data Preparation

We then upsampled our data in data_upsampling.ipynb. This notebook generates the dataset upsampled_bankruptcy.csv, which we use to train our machine learning models.
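As a rough illustration of what upsampling involves, the sketch below duplicates minority-class rows at random until both classes are the same size. It is not taken from data_upsampling.ipynb: the helper name `upsample_minority`, the toy column `ratio_a`, and the target column name `Bankrupt?` are assumptions made for the example, and the notebook may use a different technique.

```python
import pandas as pd
from sklearn.utils import resample


def upsample_minority(df: pd.DataFrame, target: str, seed: int = 42) -> pd.DataFrame:
    """Resample the minority class with replacement until both classes are equal in size."""
    counts = df[target].value_counts()
    majority_label, minority_label = counts.idxmax(), counts.idxmin()
    majority = df[df[target] == majority_label]
    minority = df[df[target] == minority_label]
    # Draw minority rows with replacement so the class grows to the majority size.
    minority_up = resample(
        minority, replace=True, n_samples=len(majority), random_state=seed
    )
    # Shuffle so the two classes are interleaved in the output.
    combined = pd.concat([majority, minority_up])
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy stand-in for the real data: 2 bankrupt firms out of 20 (hypothetical values).
toy = pd.DataFrame({"ratio_a": range(20), "Bankrupt?": [1] * 2 + [0] * 18})
balanced = upsample_minority(toy, target="Bankrupt?")
print(balanced["Bankrupt?"].value_counts().to_dict())
```

When a held-out test set is used, upsampling only the training portion avoids the same duplicated row appearing in both splits.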

3. Use of Machine Learning

After cleaning, understanding and preparing our data, we used three machine learning models to predict bankruptcy and compared their performance.

Models used:

Neural Network Model: neural_network.ipynb
Decision Tree Model: decision_tree.ipynb
Support Vector Machines: svm.ipynb

For each type of machine learning model, we trained two models: one with the full dataset (all 95 variables) and one with only the top 10 variables. This lets us ascertain whether it is plausible to predict bankruptcy with just the top correlated variables, which would be more efficient than using all 95.
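One way to pick the top correlated variables is to rank every feature by the absolute value of its Pearson correlation with the target. The sketch below shows that idea on synthetic data. The function `top_k_correlated` and the column names are hypothetical, and the notebooks may select their top 10 differently.

```python
import numpy as np
import pandas as pd


def top_k_correlated(df: pd.DataFrame, target: str, k: int = 10) -> list:
    """Return the k feature names with the largest absolute correlation with the target."""
    corr = df.corr()[target].drop(target)  # correlation of each feature with the target
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


# Synthetic stand-in: one feature that tracks the label closely, one weakly, one not at all.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
toy = pd.DataFrame({
    "strong": y + rng.normal(0, 0.1, 200),
    "weak": y + rng.normal(0, 2.0, 200),
    "noise": rng.normal(0, 1.0, 200),
    "label": y,
})
print(top_k_correlated(toy, "label", k=2))
```

Using the absolute value keeps strongly negative correlations, which are as informative as strongly positive ones.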

4. Evaluation and Final Insights

Finally, we compared the performance of all six models in model_comparison.ipynb, using two primary metrics:

Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) Graph
Accuracy of model obtained from the Classification Report of each model
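Both metrics can be computed with scikit-learn, as in the minimal sketch below. It uses a synthetic dataset and a small decision tree purely as stand-ins, so the numbers it prints say nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the bankruptcy dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# AUC is computed from predicted probabilities of the positive class,
# while accuracy is computed from the hard class predictions.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"AUC = {auc:.3f}, accuracy = {accuracy:.3f}")
```

AUC summarises how well the model ranks positive cases above negative ones across all thresholds, whereas accuracy reflects a single decision threshold.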

We then concluded that the Neural Network model is the best machine learning model for predicting bankruptcy, and that models trained using the full dataset performed better than those trained with only the top 10 variables.

Discussion and Conclusion

Because the models trained on the full dataset performed better than those trained on only the top 10 variables, we can infer that the factors affecting bankruptcy are not limited to the top 10 correlated variables. Rather, entities would have to consider all aspects of a business to identify the factors that could lead to its bankruptcy.

Additionally, while we have concluded that the Neural Network model is the best of the three at predicting bankruptcy, this conclusion can vary depending on the context in which the model is used.

Based on our findings, the top two models are the Neural Network model, with the highest AUC, and the Decision Tree model, with the highest accuracy. The Neural Network model should therefore be used in situations where the cost of false negatives (classifying a bankrupt firm as non-bankrupt) is likely to be much higher than the cost of false positives (classifying a non-bankrupt firm as bankrupt).

Thus, stakeholders have to properly weigh the costs of false negatives and false positives in the context of their decision making before deciding whether to use the predictions made by the Neural Network model, or by the Decision Tree Model.
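The trade-off can be made concrete by attaching a cost to each type of error and comparing the total cost of each model's predictions. The sketch below does this for two made-up sets of predictions. The labels, the predictions, the cost values and the helper `misclassification_cost` are all illustrative assumptions, not results from this project.

```python
from sklearn.metrics import confusion_matrix


def misclassification_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total cost of a set of predictions given per-error costs (1 = bankrupt)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn * cost_fn + fp * cost_fp


# Hypothetical labels and predictions for ten firms.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # misses no bankruptcies, raises 2 false alarms
model_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses 2 bankruptcies, raises no false alarms

# Compare the models under two opposite cost assumptions.
for cost_fn, cost_fp in [(10, 1), (1, 10)]:
    cost_a = misclassification_cost(y_true, model_a, cost_fn, cost_fp)
    cost_b = misclassification_cost(y_true, model_b, cost_fn, cost_fp)
    print(f"FN cost {cost_fn}, FP cost {cost_fp}: model A = {cost_a}, model B = {cost_b}")
```

Which model is cheaper flips when the costs are swapped, so the preferred model depends on how the stakeholder prices each type of error.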

Our Learning Points

In this project, we used techniques and skills that were not covered in the course module so that we could evaluate our dataset properly. These included:

Upsampling of data
Using Neural Network Models for data classification and prediction
Using Support Vector Machines for data classification and prediction
Utilising Receiver Operating Characteristic (ROC) and its Area Under Curve (AUC) to compare and evaluate performance of machine learning models

Tech Stack

Keras | seaborn | pandas | scikit-learn

Contributors

Nathaniel Yew (@nathanielyew)
Ong Jing Xuan (@ongjx16)
Somesh Sahu (@paaniwater)

References

US Courts. (January 1, 2023). Annual number of business bankruptcy cases filed in the United States from 2000 to 2022 [Graph]. In Statista. Retrieved April 21, 2023, from https://www.statista.com/statistics/817918/number-of-business-bankruptcies-in-the-united-states/
Bhandari, A. (2023). Guide to AUC ROC Curve in Machine Learning : What Is Specificity? Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/#What_is_the_AUC-ROC_Curve?
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221–232. https://doi.org/10.1007/s13748-016-0094-0
