# Kaggle [Bankruptcy](https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction) Dataset
<br>
<br>
### by Hector Cadeaux
<br>
<br>
## Data Dictionary<br>
Y - Bankrupt?: Class label<br>
X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)<br>
X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)<br>
X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)<br>
X4 - Operating Gross Margin: Gross Profit/Net Sales<br>
X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales<br>
X6 - Operating Profit Rate: Operating Income/Net Sales<br>
X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales<br>
X8 - After-tax net Interest Rate: Net Income/Net Sales<br>
X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio<br>
X10 - Continuous interest rate (after tax): Net Income Excluding Disposal Gain or Loss/Net Sales<br>
X11 - Operating Expense Rate: Operating Expenses/Net Sales<br>
X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales<br>
X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities<br>
X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity<br>
X15 - Tax rate (A): Effective Tax Rate<br>
X16 - Net Value Per Share (B): Book Value Per Share(B)<br>
X17 - Net Value Per Share (A): Book Value Per Share(A)<br>
X18 - Net Value Per Share (C): Book Value Per Share(C)<br>
X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income<br>
X20 - Cash Flow Per Share<br>
X21 - Revenue Per Share (Yuan ¥): Sales Per Share<br>
X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share<br>
X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share<br>
X24 - Realized Sales Gross Profit Growth Rate<br>
X25 - Operating Profit Growth Rate: Operating Income Growth<br>
X26 - After-tax Net Profit Growth Rate: Net Income Growth<br>
X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth<br>
X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth<br>
X29 - Total Asset Growth Rate: Total Asset Growth<br>
X30 - Net Value Growth Rate: Total Equity Growth<br>
X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth<br>
X32 - Cash Reinvestment %: Cash Reinvestment Ratio<br>
X33 - Current Ratio<br>
X34 - Quick Ratio: Acid Test<br>
X35 - Interest Expense Ratio: Interest Expenses/Total Revenue<br>
X36 - Total debt/Total net worth: Total Liability/Equity Ratio<br>
X37 - Debt ratio %: Liability/Total Assets<br>
X38 - Net worth/Assets: Equity/Total Assets<br>
X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets<br>
X40 - Borrowing dependency: Cost of Interest-bearing Debt<br>
X41 - Contingent liabilities/Net worth: Contingent Liability/Equity<br>
X42 - Operating profit/Paid-in capital: Operating Income/Capital<br>
X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital<br>
X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity<br>
X45 - Total Asset Turnover<br>
X46 - Accounts Receivable Turnover<br>
X47 - Average Collection Days: Days Receivable Outstanding<br>
X48 - Inventory Turnover Rate (times)<br>
X49 - Fixed Assets Turnover Frequency<br>
X50 - Net Worth Turnover Rate (times): Equity Turnover<br>
X51 - Revenue per person: Sales Per Employee<br>
X52 - Operating profit per person: Operating Income Per Employee<br>
X53 - Allocation rate per person: Fixed Assets Per Employee<br>
X54 - Working Capital to Total Assets<br>
X55 - Quick Assets/Total Assets<br>
X56 - Current Assets/Total Assets<br>
X57 - Cash/Total Assets<br>
X58 - Quick Assets/Current Liability<br>
X59 - Cash/Current Liability<br>
X60 - Current Liability to Assets<br>
X61 - Operating Funds to Liability<br>
X62 - Inventory/Working Capital<br>
X63 - Inventory/Current Liability<br>
X64 - Current Liabilities/Liability<br>
X65 - Working Capital/Equity<br>
X66 - Current Liabilities/Equity<br>
X67 - Long-term Liability to Current Assets<br>
X68 - Retained Earnings to Total Assets<br>
X69 - Total income/Total expense<br>
X70 - Total expense/Assets<br>
X71 - Current Asset Turnover Rate: Current Assets to Sales<br>
X72 - Quick Asset Turnover Rate: Quick Assets to Sales<br>
X73 - Working capitcal Turnover Rate: Working Capital to Sales<br>
X74 - Cash Turnover Rate: Cash to Sales<br>
X75 - Cash Flow to Sales<br>
X76 - Fixed Assets to Assets<br>
X77 - Current Liability to Liability<br>
X78 - Current Liability to Equity<br>
X79 - Equity to Long-term Liability<br>
X80 - Cash Flow to Total Assets<br>
X81 - Cash Flow to Liability<br>
X82 - CFO to Assets<br>
X83 - Cash Flow to Equity<br>
X84 - Current Liability to Current Assets<br>
X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise<br>
X86 - Net Income to Total Assets<br>
X87 - Total assets to GNP price<br>
X88 - No-credit Interval<br>
X89 - Gross Profit to Sales<br>
X90 - Net Income to Stockholder's Equity<br>
X91 - Liability to Equity<br>
X92 - Degree of Financial Leverage (DFL)<br>
X93 - Interest Coverage Ratio (Interest expense to EBIT)<br>
X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise<br>
X95 - Equity to Liability<br>
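Several entries above are defined directly by a formula. X37 is a plain ratio, and the two flags (X85, X94) are thresholded comparisons. A minimal sketch of those three definitions, assuming hypothetical raw balance-sheet figures with made-up column names; the Kaggle file contains only the resulting features, not these underlying amounts:

```python
import pandas as pd

# Hypothetical raw figures for two firms; these column names are illustrative only
raw = pd.DataFrame({
    "total_liabilities": [120.0, 80.0],
    "total_assets": [100.0, 100.0],
    "net_income_last_year": [-5.0, 3.0],
    "net_income_prior_year": [-2.0, -1.0],
})

# X37 Debt ratio %: Liability/Total Assets
raw["debt_ratio"] = raw["total_liabilities"] / raw["total_assets"]

# X85 Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
raw["liability_assets_flag"] = (raw["total_liabilities"] > raw["total_assets"]).astype(int)

# X94 Net Income Flag: 1 if Net Income is negative for the last two years, 0 otherwise
raw["net_income_flag"] = (
    (raw["net_income_last_year"] < 0) & (raw["net_income_prior_year"] < 0)
).astype(int)

print(raw[["debt_ratio", "liability_assets_flag", "net_income_flag"]])
```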
<br>
<br>
## *Imports*

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
bank = pd.read_csv("data.csv")
bank

| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | ... | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.370594 | 0.424389 | 0.405750 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | ... | 0.716845 | 0.009219 | 0.622879 | 0.601453 | 0.827890 | 0.290202 | 0.026601 | 0.564050 | 1 | 0.016469 |
| 1 | 1 | 0.464291 | 0.538214 | 0.516730 | 0.610235 | 0.610235 | 0.998946 | 0.797380 | 0.809301 | 0.303556 | ... | 0.795297 | 0.008323 | 0.623652 | 0.610237 | 0.839969 | 0.283846 | 0.264577 | 0.570175 | 1 | 0.020794 |
| 2 | 1 | 0.426071 | 0.499019 | 0.472295 | 0.601450 | 0.601364 | 0.998857 | 0.796403 | 0.808388 | 0.302035 | ... | 0.774670 | 0.040003 | 0.623841 | 0.601449 | 0.836774 | 0.290189 | 0.026555 | 0.563706 | 1 | 0.016474 |
| 3 | 1 | 0.399844 | 0.451265 | 0.457733 | 0.583541 | 0.583541 | 0.998700 | 0.796967 | 0.808966 | 0.303350 | ... | 0.739555 | 0.003252 | 0.622929 | 0.583538 | 0.834697 | 0.281721 | 0.026697 | 0.564663 | 1 | 0.023982 |
| 4 | 1 | 0.465022 | 0.538432 | 0.522298 | 0.598783 | 0.598783 | 0.998973 | 0.797366 | 0.809304 | 0.303475 | ... | 0.795016 | 0.003878 | 0.623521 | 0.598782 | 0.839973 | 0.278514 | 0.024752 | 0.575617 | 1 | 0.035490 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6814 | 0 | 0.493687 | 0.539468 | 0.543230 | 0.604455 | 0.604462 | 0.998992 | 0.797409 | 0.809331 | 0.303510 | ... | 0.799927 | 0.000466 | 0.623620 | 0.604455 | 0.840359 | 0.279606 | 0.027064 | 0.566193 | 1 | 0.029890 |
| 6815 | 0 | 0.475162 | 0.538269 | 0.524172 | 0.598308 | 0.598308 | 0.998992 | 0.797414 | 0.809327 | 0.303520 | ... | 0.799748 | 0.001959 | 0.623931 | 0.598306 | 0.840306 | 0.278132 | 0.027009 | 0.566018 | 1 | 0.038284 |
| 6816 | 0 | 0.472725 | 0.533744 | 0.520638 | 0.610444 | 0.610213 | 0.998984 | 0.797401 | 0.809317 | 0.303512 | ... | 0.797778 | 0.002840 | 0.624156 | 0.610441 | 0.840138 | 0.275789 | 0.026791 | 0.565158 | 1 | 0.097649 |
| 6817 | 0 | 0.506264 | 0.559911 | 0.554045 | 0.607850 | 0.607850 | 0.999074 | 0.797500 | 0.809399 | 0.303498 | ... | 0.811808 | 0.002837 | 0.623957 | 0.607846 | 0.841084 | 0.277547 | 0.026822 | 0.565302 | 1 | 0.044009 |


In [4]:
bank.columns

Index(['Bankrupt?', ' ROA(C) before interest and depreciation before interest',
       ' ROA(A) before interest and % after tax',
       ' ROA(B) before interest and depreciation after tax',
       ' Operating Gross Margin', ' Realized Sales Gross Margin',
       ' Operating Profit Rate', ' Pre-tax net Interest Rate',
       ' After-tax net Interest Rate',
       ' Non-industry income and expenditure/revenue',
       ' Continuous interest rate (after tax)', ' Operating Expense Rate',
       ' Research and development expense rate', ' Cash flow rate',
       ' Interest-bearing debt interest rate', ' Tax rate (A)',
       ' Net Value Per Share (B)', ' Net Value Per Share (A)',
       ' Net Value Per Share (C)', ' Persistent EPS in the Last Four Seasons',
       ' Cash Flow Per Share', ' Revenue Per Share (Yuan ¥)',
       ' Operating Profit Per Share (Yuan ¥)',
       ' Per Share Net profit before tax (Yuan ¥)',
       ' Realized Sales Gross Profit Growth Rate',
       ' Operating Profit

In [5]:
bank.describe()

| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | ... | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | ... | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 | 6819.0 |
| mean | 0.032263 | 0.50518 | 0.558625 | 0.553589 | 0.607948 | 0.607929 | 0.998755 | 0.79719 | 0.809084 | 0.303623 | ... | 0.80776 | 18629420.0 | 0.623915 | 0.607946 | 0.840402 | 0.280365 | 0.027541 | 0.565358 | 1.0 | 0.047578 |
| std | 0.17671 | 0.060686 | 0.06562 | 0.061595 | 0.016934 | 0.016916 | 0.01301 | 0.012869 | 0.013601 | 0.011163 | ... | 0.040332 | 376450100.0 | 0.01229 | 0.016934 | 0.014523 | 0.014463 | 0.015668 | 0.013214 | 0.0 | 0.050014 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 25% | 0.0 | 0.476527 | 0.535543 | 0.527277 | 0.600445 | 0.600434 | 0.998969 | 0.797386 | 0.809312 | 0.303466 | ... | 0.79675 | 0.0009036205 | 0.623636 | 0.600443 | 0.840115 | 0.276944 | 0.026791 | 0.565158 | 1.0 | 0.024477 |
| 50% | 0.0 | 0.502706 | 0.559802 | 0.552278 | 0.605997 | 0.605976 | 0.999022 | 0.797464 | 0.809375 | 0.303525 | ... | 0.810619 | 0.002085213 | 0.623879 | 0.605998 | 0.841179 | 0.278778 | 0.026808 | 0.565252 | 1.0 | 0.033798 |
| 75% | 0.0 | 0.535563 | 0.589157 | 0.584105 | 0.613914 | 0.613842 | 0.999095 | 0.797579 | 0.809469 | 0.303585 | ... | 0.826455 | 0.005269777 | 0.624168 | 0.613913 | 0.842357 | 0.281449 | 0.026913 | 0.565725 | 1.0 | 0.052838 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 9820000000.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
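Because `Bankrupt?` is coded 0/1, its mean in the `describe()` output is the share of bankrupt firms, about 3.2%, so the classes are heavily imbalanced. A small sketch of that arithmetic, using a made-up toy label to show the mean/share identity and then the rounded mean and row count from the output above to recover the implied class counts:

```python
import pandas as pd

# Toy 0/1 label (hypothetical values): the mean of a binary column equals the share of 1s
toy = pd.Series([0, 0, 0, 1])
assert toy.mean() == toy.value_counts(normalize=True)[1]

# Figures taken from the describe() output above (the mean is rounded to 6 decimals)
n_rows = 6819
bankrupt_mean = 0.032263
n_bankrupt = round(n_rows * bankrupt_mean)

print(n_bankrupt)           # 220 bankrupt firms
print(n_rows - n_bankrupt)  # 6599 non-bankrupt firms
```

On the loaded frame, `bank["Bankrupt?"].value_counts()` reports the exact counts directly instead of reconstructing them from a rounded mean.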


In [8]:
# Remove the leading space carried by the raw column names (e.g. ' Operating Gross Margin')
bank.rename(columns=lambda x: x.strip(), inplace=True)
bank.columns

Index(['Bankrupt?', 'ROA(C) before interest and depreciation before interest',
       'ROA(A) before interest and % after tax',
       'ROA(B) before interest and depreciation after tax',
       'Operating Gross Margin', 'Realized Sales Gross Margin',
       'Operating Profit Rate', 'Pre-tax net Interest Rate',
       'After-tax net Interest Rate',
       'Non-industry income and expenditure/revenue',
       'Continuous interest rate (after tax)', 'Operating Expense Rate',
       'Research and development expense rate', 'Cash flow rate',
       'Interest-bearing debt interest rate', 'Tax rate (A)',
       'Net Value Per Share (B)', 'Net Value Per Share (A)',
       'Net Value Per Share (C)', 'Persistent EPS in the Last Four Seasons',
       'Cash Flow Per Share', 'Revenue Per Share (Yuan ¥)',
       'Operating Profit Per Share (Yuan ¥)',
       'Per Share Net profit before tax (Yuan ¥)',
       'Realized Sales Gross Profit Growth Rate',
       'Operating Profit Growth Rate', 'After-tax
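The leading spaces come from the CSV header itself: `pd.read_csv` keeps whitespace that follows a delimiter by default, so a lookup such as `bank["Operating Gross Margin"]` fails until the names are stripped. A minimal, self-contained sketch on a made-up three-column CSV (not the real `data.csv`) showing the problem and two equivalent fixes:

```python
import io

import pandas as pd

# Hypothetical CSV whose header mimics data.csv: a space follows each comma
csv_text = "Bankrupt?, Current Ratio, Quick Ratio\n1,0.5,0.3\n0,0.9,0.6\n"
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))               # ['Bankrupt?', ' Current Ratio', ' Quick Ratio']
print("Current Ratio" in df.columns)  # False: the stored name has a leading space

# Option 1: rename with a callable, as in the notebook cell above
renamed = df.rename(columns=str.strip)

# Option 2: vectorized string method on the column Index
df.columns = df.columns.str.strip()

print(list(df.columns))                           # ['Bankrupt?', 'Current Ratio', 'Quick Ratio']
print(list(renamed.columns) == list(df.columns))  # True
```

Option 1 returns a new frame unless `inplace=True` is passed; option 2 assigns the cleaned Index back to the same frame.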

In [10]:
# Report every column that contains missing values (nothing is printed if none do)
for c in bank.columns:
    d = bank[c].isna().sum()
    if d > 0:
        print("{} has {} missing values".format(c, d))
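The loop above checks one column at a time. The same check can be written with a single vectorized call; a sketch on a small hypothetical frame, since on `bank` it would simply print nothing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few NaNs in two of its three columns
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [1.0, 2.0, 3.0],
    "c": [np.nan, np.nan, 0.0],
})

# isna() marks each missing cell; sum() counts them per column
missing = df.isna().sum()
missing = missing[missing > 0]  # keep only columns that have gaps

for col, n in missing.items():
    print(f"{col} has {n} missing values")
```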

In [11]:
bank.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6819 entries, 0 to 6818
Data columns (total 96 columns):
 #   Column                                                   Non-Null Count  Dtype  
---  ------                                                   --------------  -----  
 0   Bankrupt?                                                6819 non-null   int64  
 1   ROA(C) before interest and depreciation before interest  6819 non-null   float64
 2   ROA(A) before interest and % after tax                   6819 non-null   float64
 3   ROA(B) before interest and depreciation after tax        6819 non-null   float64
 4   Operating Gross Margin                                   6819 non-null   float64
 5   Realized Sales Gross Margin                              6819 non-null   float64
 6   Operating Profit Rate                                    6819 non-null   float64
 7   Pre-tax net Interest Rate                                6819 non-null   float64
 8   After-tax net Interest Rate 

In [12]:
bank.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Bankrupt? | 6819.0 | 0.032263 | 0.176710 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 1.0 |
| ROA(C) before interest and depreciation before interest | 6819.0 | 0.505180 | 0.060686 | 0.0 | 0.476527 | 0.502706 | 0.535563 | 1.0 |
| ROA(A) before interest and % after tax | 6819.0 | 0.558625 | 0.065620 | 0.0 | 0.535543 | 0.559802 | 0.589157 | 1.0 |
| ROA(B) before interest and depreciation after tax | 6819.0 | 0.553589 | 0.061595 | 0.0 | 0.527277 | 0.552278 | 0.584105 | 1.0 |
| Operating Gross Margin | 6819.0 | 0.607948 | 0.016934 | 0.0 | 0.600445 | 0.605997 | 0.613914 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Liability to Equity | 6819.0 | 0.280365 | 0.014463 | 0.0 | 0.276944 | 0.278778 | 0.281449 | 1.0 |
| Degree of Financial Leverage (DFL) | 6819.0 | 0.027541 | 0.015668 | 0.0 | 0.026791 | 0.026808 | 0.026913 | 1.0 |
| Interest Coverage Ratio (Interest expense to EBIT) | 6819.0 | 0.565358 | 0.013214 | 0.0 | 0.565158 | 0.565252 | 0.565725 | 1.0 |
| Net Income Flag | 6819.0 | 1.000000 | 0.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | 1.0 |
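In this output `Net Income Flag` has a standard deviation of 0 and min = max = 1, so it is constant across all 6819 firms and cannot help separate bankrupt from non-bankrupt companies. A sketch for listing such columns; the helper name `constant_columns` and the toy frame are hypothetical, and whether to drop the column is a modeling choice:

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that hold a single distinct value (NaN ignored)."""
    n_unique = df.nunique(dropna=True)
    return n_unique[n_unique <= 1].index.tolist()


# Toy frame: 'Net Income Flag' mimics the all-ones column seen in describe()
toy = pd.DataFrame({
    "Bankrupt?": [1, 0, 0],
    "Current Ratio": [0.2, 0.5, 0.7],
    "Net Income Flag": [1, 1, 1],
})

print(constant_columns(toy))  # ['Net Income Flag']
reduced = toy.drop(columns=constant_columns(toy))
```

Applied to the real data, `constant_columns(bank)` would be expected to include `Net Income Flag`, given the statistics above.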


In [13]:
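def expanded_stats(col):
    # Range, range measured in standard deviations, standard deviation,
    # and coefficient of variation (CV% = 100 * std / mean) for one column
    mean = bank[col].mean()
    stdev = bank[col].std()
    ranged = bank[col].max() - bank[col].min()
    # ranged/stdev is the number of standard deviations the range spans (printed as "zscores");
    # a constant column such as Net Income Flag makes this a 0/0 division
    print("{} has a range of {}, which is within {} zscores,  and standard dev of {} with a CV% of {}".format(col, ranged, round(ranged/stdev, 2), round(stdev, 2), round(100*stdev/mean)))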

In [14]:
for c in bank.columns:
    expanded_stats(c)

Bankrupt? has a range of 1, which is within 5.66 zscores,  and standard dev of 0.18 with a CV% of 548
ROA(C) before interest and depreciation before interest has a range of 1.0, which is within 16.48 zscores,  and standard dev of 0.06 with a CV% of 12
ROA(A) before interest and % after tax has a range of 1.0, which is within 15.24 zscores,  and standard dev of 0.07 with a CV% of 12
ROA(B) before interest and depreciation after tax has a range of 1.0, which is within 16.24 zscores,  and standard dev of 0.06 with a CV% of 11
Operating Gross Margin has a range of 1.0, which is within 59.05 zscores,  and standard dev of 0.02 with a CV% of 3
Realized Sales Gross Margin has a range of 1.0, which is within 59.12 zscores,  and standard dev of 0.02 with a CV% of 3
Operating Profit Rate has a range of 1.0, which is within 76.86 zscores,  and standard dev of 0.01 with a CV% of 1
Pre-tax net Interest Rate has a range of 1.0, which is within 77.71 zscores,  and standard dev of 0.01 with a CV% of 2
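The same statistics can be computed for every column at once instead of printing one sentence per column. A minimal vectorized sketch, assuming a numeric DataFrame such as `bank`; the helper name `expanded_stats_table` and the toy frame are hypothetical. `range_in_sd` is what the printed output calls "zscores" (the range divided by the standard deviation), and zero denominators are mapped to NaN so constant or zero-mean columns report NaN instead of an undefined or infinite value:

```python
import numpy as np
import pandas as pd


def expanded_stats_table(df: pd.DataFrame) -> pd.DataFrame:
    """Range, range in standard deviations, std, and CV% for every column."""
    mean = df.mean()
    std = df.std()  # sample standard deviation (ddof=1), as in expanded_stats
    rng = df.max() - df.min()
    return pd.DataFrame({
        "range": rng,
        # Zero std -> NaN, so constant columns do not divide by zero
        "range_in_sd": (rng / std.replace(0, np.nan)).round(2),
        "std": std.round(2),
        # Zero mean -> NaN, so CV% is reported as missing rather than infinite
        "cv_pct": (100 * std / mean.replace(0, np.nan)).round(),
    })


# Toy frame: 'flag' is constant, like Net Income Flag in the real data
toy = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "flag": [1, 1, 1, 1]})
stats = expanded_stats_table(toy)
print(stats)
```

On the real data, `expanded_stats_table(bank).sort_values("cv_pct", ascending=False)` would rank the columns by relative spread.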

In [15]:
%pip install xgboost

Note: you may need to restart the kernel to use updated packages.
