# Using PyCaret for bankruptcy classification on a [Kaggle dataset](https://www.kaggle.com/c/mgmt571/overview)

**Install the PyCaret package**

In [None]:
!pip install pycaret

**Downloading the dataset and applying PyCaret**

Our dataset has two main problems. First, it is small and highly imbalanced, with a disproportionately small number of bankrupt firms, so we apply SMOTE to oversample the minority class. Second, many of our features come from financial calculations on similar variables, so we reduce the overlap between them by removing multicollinearity to some degree. Finally, to limit overfitting, we use fewer features by applying feature selection with LightGBM.
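
To make the SMOTE step less of a black box, here is a minimal sketch of its core idea: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote_like_oversample`, the toy values, and the choice of `k` are assumptions made for illustration; this is a simplified version written for clarity, not the implementation PyCaret calls when `fix_imbalance=True`.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (made up for illustration): 3 bankrupt firms, 2 features each.
bankrupt = [[0.10, 1.5], [0.20, 1.7], [0.15, 1.4]]
new_points = smote_like_oversample(bankrupt, n_new=4)
print(len(new_points))
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the range already covered by the bankrupt firms rather than duplicating rows exactly.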

In [None]:
import numpy as np
import pandas as pd
from pycaret.classification import *


# load data
dataset = pd.read_csv('/content/bankruptcy_Train.csv')

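clf1 = setup(
    data=dataset,
    target=dataset.columns[-1],  # the label is the last column
    session_id=123,  # fixed seed for reproducibility
    fix_imbalance=True, fix_imbalance_method='smote',  # oversample bankrupt firms
    normalize=True, transformation=True,
    remove_multicollinearity=True,  # drop highly correlated features
    feature_selection=True,  # keep a subset of the features
)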
best_model = compare_models()

The results are frustrating: the dummy classifier yields the best result, which is equivalent to predicting that no company will go bankrupt. Not surprisingly, the second-best model is XGBoost. In the future, we can check our results with a larger and more balanced dataset.
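
A small sketch can show why a dummy classifier can top a leaderboard on imbalanced data: always predicting "not bankrupt" scores high accuracy while finding no bankrupt firms at all. The class counts below (970 healthy, 30 bankrupt) are hypothetical numbers chosen for illustration, not the proportions of the Kaggle dataset, and `evaluate` is a helper written for this example.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = bankrupt)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 970 healthy firms (0) and 30 bankrupt firms (1).
y_true = [0] * 970 + [1] * 30
# The dummy strategy: predict "not bankrupt" for every firm.
dummy_pred = [0] * len(y_true)

scores = evaluate(y_true, dummy_pred)
print(scores)
# {'accuracy': 0.97, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

If the leaderboard is ranked by accuracy, it may be worth ranking by a metric that reflects the minority class instead; `compare_models` accepts a `sort` argument for this, for example `compare_models(sort='F1')`.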

**Tuning our XGBoost model** \
With a larger dataset in the future, we can use `best_model` directly instead of creating the XGBoost model explicitly.

In [None]:
model = create_model('xgboost')
tuned_model = tune_model(model)
final_model = finalize_model(tuned_model)

**Out of curiosity, let's examine our XGBoost model, including its parameters.**

In [None]:
evaluate_model(final_model)

**Let's make a Gradio app for future datasets**

In [None]:
create_app(final_model)