# Bankruptcy Prediction with KNN - For Machine Learning Newbies :)

---
## Info 
### Context
The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange.

### Attribute Information

**Y = Output feature, X = Input features**

- Y - Bankrupt?: Class label
- X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)
- X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)
- X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)
- X4 - Operating Gross Margin: Gross Profit/Net Sales
- X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales
- X6 - Operating Profit Rate: Operating Income/Net Sales
- X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales
- X8 - After-tax net Interest Rate: Net Income/Net Sales
- X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio
- X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales
- X11 - Operating Expense Rate: Operating Expenses/Net Sales
- X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales
- X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities
- X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity
- X15 - Tax rate (A): Effective Tax Rate
- X16 - Net Value Per Share (B): Book Value Per Share(B)
- X17 - Net Value Per Share (A): Book Value Per Share(A)
- X18 - Net Value Per Share (C): Book Value Per Share(C)
- X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income
- X20 - Cash Flow Per Share
- X21 - Revenue Per Share (Yuan ¥): Sales Per Share
- X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share
- X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share
- X24 - Realized Sales Gross Profit Growth Rate
- X25 - Operating Profit Growth Rate: Operating Income Growth
- X26 - After-tax Net Profit Growth Rate: Net Income Growth
- X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth
- X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth
- X29 - Total Asset Growth Rate: Total Asset Growth
- X30 - Net Value Growth Rate: Total Equity Growth
- X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth
- X32 - Cash Reinvestment %: Cash Reinvestment Ratio
- X33 - Current Ratio
- X34 - Quick Ratio: Acid Test
- X35 - Interest Expense Ratio: Interest Expenses/Total Revenue
- X36 - Total debt/Total net worth: Total Liability/Equity Ratio
- X37 - Debt ratio %: Liability/Total Assets
- X38 - Net worth/Assets: Equity/Total Assets
- X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets
- X40 - Borrowing dependency: Cost of Interest-bearing Debt
- X41 - Contingent liabilities/Net worth: Contingent Liability/Equity
- X42 - Operating profit/Paid-in capital: Operating Income/Capital
- X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital
- X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity
- X45 - Total Asset Turnover
- X46 - Accounts Receivable Turnover
- X47 - Average Collection Days: Days Receivable Outstanding
- X48 - Inventory Turnover Rate (times)
- X49 - Fixed Assets Turnover Frequency
- X50 - Net Worth Turnover Rate (times): Equity Turnover
- X51 - Revenue per person: Sales Per Employee
- X52 - Operating profit per person: Operation Income Per Employee
- X53 - Allocation rate per person: Fixed Assets Per Employee
- X54 - Working Capital to Total Assets
- X55 - Quick Assets/Total Assets
- X56 - Current Assets/Total Assets
- X57 - Cash/Total Assets
- X58 - Quick Assets/Current Liability
- X59 - Cash/Current Liability
- X60 - Current Liability to Assets
- X61 - Operating Funds to Liability
- X62 - Inventory/Working Capital
- X63 - Inventory/Current Liability
- X64 - Current Liabilities/Liability
- X65 - Working Capital/Equity
- X66 - Current Liabilities/Equity
- X67 - Long-term Liability to Current Assets
- X68 - Retained Earnings to Total Assets
- X69 - Total income/Total expense
- X70 - Total expense/Assets
- X71 - Current Asset Turnover Rate: Current Assets to Sales
- X72 - Quick Asset Turnover Rate: Quick Assets to Sales
- X73 - Working capital Turnover Rate: Working Capital to Sales
- X74 - Cash Turnover Rate: Cash to Sales
- X75 - Cash Flow to Sales
- X76 - Fixed Assets to Assets
- X77 - Current Liability to Liability
- X78 - Current Liability to Equity
- X79 - Equity to Long-term Liability
- X80 - Cash Flow to Total Assets
- X81 - Cash Flow to Liability
- X82 - CFO to Assets
- X83 - Cash Flow to Equity
- X84 - Current Liability to Current Assets
- X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
- X86 - Net Income to Total Assets
- X87 - Total assets to GNP price
- X88 - No-credit Interval
- X89 - Gross Profit to Sales
- X90 - Net Income to Stockholder's Equity
- X91 - Liability to Equity
- X92 - Degree of Financial Leverage (DFL)
- X93 - Interest Coverage Ratio (Interest expense to EBIT)
- X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise
- X95 - Equity to Liability



---
### Kernel Index
- Step 1. Data Description
- Step 2. Data Preprocessing
- Step 3. Machine Learning Modeling and Prediction

### Step 1. Data Description

In [None]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('../input/company-bankruptcy-prediction/data.csv')

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.isnull().sum().sum() # Check null values

In [None]:
df.describe().T

**I think all values are already min-max scaled. Let's check.**

In [None]:
over_max_count = 0
over_max_cols = []
under_min_count = 0
under_min_cols = []
for col in df.columns:
    # Use the pandas methods rather than the slower built-in max()/min()
    max_value = df[col].max()
    min_value = df[col].min()

    # A min-max scaled column has to stay inside [0, 1]
    if max_value > 1:
        over_max_count += 1
        over_max_cols.append(col)
    if min_value < 0:
        under_min_count += 1
        under_min_cols.append(col)

    print('column : {}'.format(col))
    print('max : {}'.format(max_value))
    print('min : {}'.format(min_value))
    print('----------------------------------------------------------')
print('********************')
print('********************')
print('********************')
print('over max count : ', over_max_count)
print('under min count : ', under_min_count)

In [None]:
over_max_cols

In [None]:
len(over_max_cols)

**We have to min-max scale these 24 cols so they share the same [0, 1] range as the others.**
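The same range check can also be done without an explicit loop. This is only a minimal sketch on a toy frame: the column names (`ratio_a`, `big_value`, and so on) are made up for illustration and are not columns of the real data set.

```python
import pandas as pd

# Toy frame standing in for the bankruptcy data (hypothetical column names).
toy = pd.DataFrame({
    "ratio_a": [0.0, 0.4, 1.0],         # already inside [0, 1]
    "ratio_b": [0.2, 0.5, 0.9],         # already inside [0, 1]
    "big_value": [10.0, 2500.0, 7.5],   # goes above 1
    "signed_value": [-0.3, 0.1, 0.8],   # goes below 0
})

# Column-wise max and min in one call each.
col_max = toy.max()
col_min = toy.min()

# Keep only the names of the columns that leave the [0, 1] range.
over_max = col_max[col_max > 1].index.tolist()
under_min = col_min[col_min < 0].index.tolist()

print("over max :", over_max)
print("under min:", under_min)
```

On the real data, replacing `toy` with `df` should give the same lists as the loop.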

### Step 2. Data Preprocessing

> sklearn.preprocessing.MinMaxScaler([scikit-learn 0.24.1 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html))

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[over_max_cols] = scaler.fit_transform(df[over_max_cols])

In [None]:
df[over_max_cols]

In [None]:
# test
df[over_max_cols[0]].max()

In [None]:
for col in over_max_cols:
    # Recompute the range after scaling; every column should now span [0, 1]
    print('column : {}'.format(col))
    print('max : {}'.format(df[col].max()))
    print('min : {}'.format(df[col].min()))
    print('----------------------------------------------------------')
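For anyone wondering what `MinMaxScaler` actually does: with its default settings it rescales each column with `(x - min) / (max - min)`, so the smallest value becomes 0 and the largest becomes 1. Here is a small sketch that compares the scaler with the formula written by hand. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One made-up column with values far outside [0, 1].
values = np.array([[10.0], [20.0], [25.0], [50.0]])

# Scaling with scikit-learn (default feature_range is (0, 1)).
scaled_sklearn = MinMaxScaler().fit_transform(values)

# The same scaling written out by hand, column by column.
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled_manual = (values - col_min) / (col_max - col_min)

print(scaled_sklearn.ravel())
print(scaled_manual.ravel())
```

Both printed arrays should match, which is a quick way to convince yourself that nothing mysterious is going on inside the scaler.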


**All features are numeric, and almost all of them are continuous (X85 and X94 are 0/1 flags). So let's take the easy way and try the KNN algorithm.**
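If KNN is new to you, the idea is simple: to classify a new point, find the k closest training points and let them vote. Because "closest" is measured by distance, features on very different scales would dominate the vote, which is why the scaling step matters. Below is a rough sketch of the idea, not how scikit-learn implements it internally. The helper `knn_predict` and the toy points are made up for illustration, and Euclidean distance with equal votes is assumed (these match the `KNeighborsClassifier` defaults).

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Euclidean distance from x_new to every training row.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two small, well separated groups of points.
X_toy = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
                  [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.70, 0.75])

manual = knn_predict(X_toy, y_toy, query, k=3)
reference = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy).predict(query.reshape(1, -1))[0]

print("manual   :", manual)
print("reference:", reference)
```

The query point sits next to the second group, so both the hand-written vote and scikit-learn should assign it to class 1.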

First, we have to split the data into training and test sets so we can check how well the model generalizes.

In [None]:
df

In [None]:
X = df.drop('Bankrupt?', axis=1)

In [None]:
y = df['Bankrupt?']

In [None]:
X.shape, y.shape

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [None]:
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Check the index (the rows should be shuffled, and X and y should still match row by row)

In [None]:
X_train

In [None]:
y_train
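Instead of comparing the two outputs by eye, the same check can be done in code. This is a small sketch on a toy frame: the `feature` column and the alternating labels are made up, and only the `Bankrupt?` name comes from the data set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 20 rows, one made-up feature, alternating labels.
toy = pd.DataFrame({"feature": range(20), "Bankrupt?": [0, 1] * 10})
X_toy = toy.drop("Bankrupt?", axis=1)
y_toy = toy["Bankrupt?"]

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=0)

# X and y rows still belong together if their indices are identical.
same_index = X_tr.index.equals(y_tr.index)
# Shuffled rows no longer come in increasing index order.
is_shuffled = not X_tr.index.is_monotonic_increasing
# No row should appear in both the training and the test set.
no_overlap = X_tr.index.intersection(X_te.index).empty

print(same_index, is_shuffled, no_overlap)
```

With the real split, the same three expressions can be run on `X_train`, `y_train` and `X_test`.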

**Well done**
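One thing to keep in mind about the order of the steps: the scaler in this section was fit on the whole data set before the split, so the test rows helped decide each column's min and max. A common way to avoid this is to split first and fit the scaler on the training rows only. Here is a minimal sketch on random toy data. The data, the variable names and the use of `stratify` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Random toy data standing in for unscaled features and a 0/1 label.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 1000, size=(200, 3))
y_toy = rng.integers(0, 2, size=200)

# Split first; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, random_state=0, stratify=y_toy
)

# Learn min and max from the training rows only ...
scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
# ... and reuse them for the test rows (no second fit).
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
print(X_te_scaled.min(axis=0), X_te_scaled.max(axis=0))
```

The training columns span exactly [0, 1], while the test columns may land slightly outside that range. That is expected, since the scaler has never seen the test rows.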

### Step 3. Machine Learning Modeling and Prediction

> sklearn.neighbors.KNeighborsClassifier([scikit-learn 0.24.1 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html))

In [None]:
from sklearn.neighbors import KNeighborsClassifier as KNN

In [None]:
# check over/underfitting
train_scores = []
test_scores = []
for i in range(1, 10): # n_neighbors = 1~9
    model = KNN(n_neighbors=i)
    model.fit(X_train,y_train)
    
    train_scores.append(model.score(X_train,y_train))
    test_scores.append(model.score(X_test,y_test))

In [None]:
plt.plot(range(1,10), train_scores, label='train_accuracy')
plt.plot(range(1,10), test_scores, label='test_accuracy')
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.legend();

**I think 4 looks like a suitable value for n_neighbors.**
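Reading n_neighbors off the test curve is fine for a first look, but it means the test set has already influenced the choice, so the final test score can come out a little optimistic. A common alternative is to pick the value with cross-validation on the training set only. The sketch below uses a synthetic data set from `make_classification`, so the chosen value here says nothing about the bankruptcy data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: not the bankruptcy data set).
X_toy, y_toy = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=0)

# Try n_neighbors = 1..9 with 5-fold cross-validation on the training set.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 10))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

best_k = search.best_params_["n_neighbors"]
# The test set is touched only once, after the choice is made.
test_accuracy = search.score(X_te, y_te)

print("best n_neighbors:", best_k)
print("test accuracy   :", test_accuracy)
```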

In [None]:
model = KNN(n_neighbors=4)
model.fit(X_train, y_train)
print('Final - Accuracy : ', model.score(X_test, y_test))
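A word of caution about reading this number: if bankrupt companies are rare in the data (worth checking with `y.value_counts()`), accuracy alone can look great even for a model that never predicts bankruptcy. The toy example below uses made-up labels, 95 healthy companies and 5 bankrupt ones, to show why recall and the confusion matrix are worth a look as well.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels: 95 non-bankrupt (0) and 5 bankrupt (1) companies.
y_true = np.array([0] * 95 + [1] * 5)
# A useless "model" that always predicts non-bankrupt.
y_pred = np.zeros(100, dtype=int)

acc = accuracy_score(y_true, y_pred)   # share of correct predictions
rec = recall_score(y_true, y_pred)     # share of bankrupt companies that were found
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print("accuracy:", acc)
print("recall  :", rec)
print(cm)
```

The accuracy is 0.95 even though not a single bankrupt company is detected (recall 0.0). For the real model, the same functions can be called with `y_test` and `model.predict(X_test)`.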