# Classical Statistics Approach

Let's start by loading our data, splitting it into training and test sets, and then fitting a logistic regression on all available input variables.

In [32]:
import statsmodels.api as sm
from statsmodels.tools.eval_measures import aic
import pandas as pd
from sklearn.model_selection import train_test_split

In [27]:
df1 = pd.read_csv("bankruptcy.csv")
# Split features and labels in a single call so the rows are guaranteed to stay aligned
x_train, x_test, y_train, y_test = train_test_split(
    df1.iloc[:, :4], df1['Group'], test_size=0.2, random_state=120)

In [42]:
full_model = sm.Logit(y_train, x_train)
full_model_fit = full_model.fit()
print(full_model_fit.summary())

Optimization terminated successfully.
         Current function value: 0.380316
         Iterations 7
                           Logit Regression Results                           
Dep. Variable:                  Group   No. Observations:                   36
Model:                          Logit   Df Residuals:                       32
Method:                           MLE   Df Model:                            3
Date:                Tue, 26 Mar 2024   Pseudo R-squ.:                  0.4464
Time:                        15:16:58   Log-Likelihood:                -13.691
converged:                       True   LL-Null:                       -24.731
Covariance Type:            nonrobust   LLR p-value:                 6.282e-05
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
X1             1.2130      4.293      0.283      0.778      -7.201       9.627
X2             7.2314      9.

Now we can see about narrowing the model down. Since the $p$-values for $X_1$ and $X_2$ are so high, let's see how the model would fare without them, and compare the Akaike information criterion (AIC) values for the reduced and full models. First, let's take out only $X_1$ for our reduced model and see how it compares.

In [85]:
# fitting model without X_1
x_train2 = x_train.iloc[:,1:]
reduced_model1 = sm.Logit(y_train, x_train2)
reduced_model1_fit = reduced_model1.fit()

# nobs is the number of training rows (x_train.size would count rows * columns)
full_model_aic = aic(llf=full_model.loglike(full_model_fit.params), 
                     nobs=x_train.shape[0], 
                     df_modelwc=len(full_model_fit.params))
print("\n   Full model AIC: {:.3f}".format(full_model_aic))

reduced_model1_aic = aic(llf=reduced_model1.loglike(reduced_model1_fit.params), 
                        nobs=x_train2.shape[0], 
                        df_modelwc=len(reduced_model1_fit.params))
print("Reduced model AIC: {:.3f}".format(reduced_model1_aic))

Optimization terminated successfully.
         Current function value: 0.381427
         Iterations 7

   Full model AIC: 35.383
Reduced model AIC: 33.463


The AIC of a model estimates how much information from the original data is <i>lost</i> by the model, with a penalty for each extra parameter, so a lower AIC is better. This means the reduced model without $X_1$ is (slightly) better than the full model: 33.463 versus 35.383. Removing $X_2$ as well, however, pushes the AIC above that of the full model, as we can see below:

In [86]:
# fitting model without X_1 and X_2
x_train3 = x_train.iloc[:,2:]
reduced_model2 = sm.Logit(y_train, x_train3)
reduced_model2_fit = reduced_model2.fit()

# recomputed for a side-by-side comparison; nobs is the number of training rows
full_model_aic = aic(llf=full_model.loglike(full_model_fit.params), 
                     nobs=x_train.shape[0], 
                     df_modelwc=len(full_model_fit.params))
print("\n   Full model AIC: {:.3f}".format(full_model_aic))

reduced_model2_aic = aic(llf=reduced_model2.loglike(reduced_model2_fit.params), 
                        nobs=x_train3.shape[0], 
                        df_modelwc=len(reduced_model2_fit.params))
print("Reduced model AIC: {:.3f}".format(reduced_model2_aic))

Optimization terminated successfully.
         Current function value: 0.452422
         Iterations 7

   Full model AIC: 35.383
Reduced model AIC: 36.574
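
As a sanity check, the AIC values printed above can be reproduced by hand from the definition $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where $k$ is the number of fitted parameters. The sketch below uses only numbers printed in this notebook. It assumes that the optimizer's "Current function value" is the negative log-likelihood divided by the number of observations, which is consistent with the summary table ($-0.380316 \times 36 \approx -13.691$). The names `aic_by_hand` and `fits` are illustrative and do not appear elsewhere in the notebook.

```python
# Hand-computed AIC for the three fits, using only values printed above.
N_OBS = 36  # "No. Observations" from the summary table


def aic_by_hand(mean_neg_loglike, n_obs, n_params):
    """AIC = 2k - 2*llf, with llf recovered from the optimizer's printed value.

    Assumption: the printed "Current function value" is -llf / n_obs.
    """
    llf = -mean_neg_loglike * n_obs
    return 2 * n_params - 2 * llf


# model label -> (printed "Current function value", number of coefficients)
fits = {
    "full (X1..X4)": (0.380316, 4),
    "without X1": (0.381427, 3),
    "without X1 and X2": (0.452422, 2),
}

aics = {name: aic_by_hand(value, N_OBS, k) for name, (value, k) in fits.items()}

for name, value in aics.items():
    print("{:>18s}: {:.3f}".format(name, value))

# The model with the smallest AIC is preferred under this criterion.
best = min(aics, key=aics.get)
print("Lowest AIC:", best)
```

Laying the three values side by side like this makes the ranking explicit without refitting anything, and it confirms that the `nobs` argument plays no role in the AIC itself.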


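Since the rest of the variables' coefficients are significant, we'll probably only get higher AICs by removing them. Although the model without $X_1$ has the lowest AIC of the three (33.463), its margin over the full model is small (about 1.9), so we proceed with the full model as our final model.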

In [87]:
logit_model = full_model

### Coefficient estimation


### Error analysis

### Decision boundary

# Machine Learning Approach

# Model Comparison

# Discussion