<a href="https://colab.research.google.com/github/kumarsinghashu/Credit-Card-Default-Prediction/blob/main/Credit_Card_Default_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <b><u> Project Title : Predicting whether a customer will default on his/her credit card </u></b>


<b>Project Type - Classification.

Contribution - Individual</b>

## <b> Problem Description </b>

### This project aims to predict customer default payments in Taiwan. From a risk-management perspective, an accurate estimate of the probability of default is more valuable than a binary classification of clients as credible or not credible. We can use the [K-S chart](https://www.listendata.com/2019/07/KS-Statistics-Python.html) to evaluate how well the predicted probabilities separate customers who default on their credit card payments from those who do not.
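
To make the K-S idea concrete, here is a minimal sketch of the statistic behind the chart: the largest gap between the cumulative share of defaulters and non-defaulters captured when clients are ranked by predicted probability. It assumes NumPy only; `ks_statistic` is a hypothetical helper written for this illustration, and the labels and scores are synthetic, not the Taiwan data.

```python
import numpy as np

def ks_statistic(y_true, y_score):
    """Largest gap between the cumulative capture rates of defaulters (1)
    and non-defaulters (0) when clients are ranked from riskiest to safest."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)

    order = np.argsort(-y_score, kind="mergesort")  # highest score first
    y_sorted, s_sorted = y_true[order], y_score[order]

    cum_pos = np.cumsum(y_sorted == 1) / max(int((y_true == 1).sum()), 1)
    cum_neg = np.cumsum(y_sorted == 0) / max(int((y_true == 0).sum()), 1)

    # Evaluate the gap only where the score changes, so that tied scores
    # are treated as a single threshold.
    is_last_of_tie = np.r_[np.diff(s_sorted) != 0, True]
    return float(np.max(np.abs(cum_pos - cum_neg)[is_last_of_tie]))

# Synthetic illustration (assumed data, not the credit card dataset)
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=1000)
score_demo = np.clip(0.3 * y_demo + rng.normal(0.4, 0.2, size=1000), 0, 1)
print(f"K-S statistic on synthetic data: {ks_statistic(y_demo, score_demo):.3f}")
```

A value near 1 would indicate that the scores separate the two groups almost perfectly, while a value near 0 would indicate no separation.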


## <b> Data Description </b>

### <b>Attribute Information: </b>

### This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:
* ### X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
* ### X2: Gender (1 = male; 2 = female).
* ### X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
* ### X4: Marital status (1 = married; 2 = single; 3 = others).
* ### X5: Age (year).
* ### X6 - X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .;X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
* ### X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
* ### X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .;X23 = amount paid in April, 2005.

# **Bring in Necessary Packages and Dataset.**

In [1]:
!pip install -U scikit-learn




In [8]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# from sklearn.preprocessing import OneHotEncoder,LabelBinarizer,StandardScaler, Imputer, LabelEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, cross_val_predict
from sklearn.metrics import classification_report, accuracy_score,confusion_matrix,precision_score, recall_score, roc_curve, roc_auc_score
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, VotingClassifier

import warnings
warnings.filterwarnings("ignore")

# <b>Dataset Loading</b>

In [9]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [12]:
!pip install --upgrade xlrd



In [13]:
import xlrd

In [11]:
df = pd.read_excel('/content/drive/MyDrive/Credit Card defualt prediction/default of credit card clients.xls')

# **Detailed Data Description:**
## > **Basic User Data.**
* **ID :** Unique ID of each client.
* **LIMIT_BAL :** Amount of the given credit (NT dollar) : it includes both the individual consumer credit and his/her family (supplementary) credit.
* **SEX :**  Gender (1 = male; 2 = female).
* **EDUCATION :** Qualifications (1 = graduate school; 2 = university; 3 = high school; 4 = others).
* **MARRIAGE :** Marital status (1 = married; 2 = single; 3 = others).
* **AGE :** Age of the client (years)

## > **History of Past Payment.**
**Scale for PAY_0 to PAY_6 :** (-2 = No consumption, -1 = paid in full, 0 = use of revolving credit (paid minimum only), 1 = payment delay for one month, 2 = payment delay for two months, ... 8 = payment delay for eight months, 9 = payment delay for nine months and above)

* **PAY_0 :** Repayment status in September, 2005 (scale same as above)
* **PAY_2 :** Repayment status in August, 2005 (scale same as above)
* **PAY_3 :** Repayment status in July, 2005 (scale same as above)
* **PAY_4 :** Repayment status in June, 2005 (scale same as above)
* **PAY_5 :** Repayment status in May, 2005 (scale same as above)
* **PAY_6 :** Repayment status in April, 2005 (scale same as above)
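
As a small illustration of how this scale can be read, the sketch below counts how many of the six months show a payment delay (codes of 1 or more) and takes the longest delay per client. The three rows are made-up values for the example, and the derived column names `months_delayed` and `max_delay` are hypothetical, not part of the dataset.

```python
import pandas as pd

PAY_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]

# Three illustrative clients (values are invented for this example)
demo = pd.DataFrame(
    [
        [2, 2, -1, -1, -2, -2],    # two months with a two-month delay
        [-1, 0, 0, 0, 0, 3],       # one month with a three-month delay
        [-1, -1, -1, -1, -1, -1],  # always paid in full
    ],
    columns=PAY_COLS,
)

# Codes >= 1 mean the payment was late by that many months
demo["months_delayed"] = (demo[PAY_COLS] >= 1).sum(axis=1)
# Longest delay over the six months, floored at 0 for codes -2, -1 and 0
demo["max_delay"] = demo[PAY_COLS].max(axis=1).clip(lower=0)

print(demo[["months_delayed", "max_delay"]])
```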

## > **Amount of Bill Statement.**
* **BILL_AMT1 :** Amount of bill statement in September, 2005 (NT dollar)
* **BILL_AMT2 :** Amount of bill statement in August, 2005 (NT dollar)
* **BILL_AMT3 :** Amount of bill statement in July, 2005 (NT dollar)
* **BILL_AMT4 :** Amount of bill statement in June, 2005 (NT dollar)
* **BILL_AMT5 :** Amount of bill statement in May, 2005 (NT dollar)
* **BILL_AMT6 :** Amount of bill statement in April, 2005 (NT dollar)

## > **Amount of Previous Payment.**
* **PAY_AMT1 :** Amount of previous payment in September, 2005 (NT dollar)
* **PAY_AMT2 :** Amount of previous payment in August, 2005 (NT dollar)
* **PAY_AMT3 :** Amount of previous payment in July, 2005 (NT dollar)
* **PAY_AMT4 :** Amount of previous payment in June, 2005 (NT dollar)
* **PAY_AMT5 :** Amount of previous payment in May, 2005 (NT dollar)
* **PAY_AMT6 :** Amount of previous payment in April, 2005 (NT dollar)

## > **Response Variable.**
* **default payment next month :** Default payment (1=yes, 0=no)
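
The coded categorical fields above can be turned into readable labels with plain dictionaries. The following is a minimal sketch on a tiny made-up frame; the dictionary names are hypothetical, and the `EDUCATION` value 5 is included purely as a hypothetical undocumented code to show how unmapped values surface as `NaN`.

```python
import pandas as pd

# Label dictionaries taken from the attribute descriptions above
SEX_LABELS = {1: "male", 2: "female"}
EDUCATION_LABELS = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARRIAGE_LABELS = {1: "married", 2: "single", 3: "others"}

# Illustrative rows only; EDUCATION=5 is a hypothetical undocumented code
demo = pd.DataFrame({"SEX": [2, 1, 2], "EDUCATION": [2, 1, 5], "MARRIAGE": [1, 2, 3]})

decoded = demo.assign(
    SEX=demo["SEX"].map(SEX_LABELS),
    EDUCATION=demo["EDUCATION"].map(EDUCATION_LABELS),
    MARRIAGE=demo["MARRIAGE"].map(MARRIAGE_LABELS),
)

# Codes missing from the dictionaries become NaN, which makes them easy to spot
print(decoded)
print(decoded.isna().sum())
```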


# <b>Dataset First View</b>

In [16]:
#Top rows
df.head(3)

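| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 1 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2 | 120000 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1 |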


In [17]:
#Bottom rows
df.tail(3)

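| | Unnamed: 0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29998 | 29998 | 30000 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | ... | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | 1 |
| 29999 | 29999 | 80000 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | ... | 52774 | 11855 | 48944 | 85900 | 3409 | 1178 | 1926 | 52964 | 1804 | 1 |
| 30000 | 30000 | 50000 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | ... | 36535 | 32428 | 15313 | 2078 | 1800 | 1430 | 1000 | 1000 | 1000 | 1 |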


# <b>Dataset Rows and Columns</b>

In [18]:
# Dataset rows and columns count
rows, cols = df.shape

print(f"Number of Rows: {rows}")
print(f"Number of Columns: {cols}")

Number of Rows: 30001
Number of Columns: 25
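
The count of 30001 rows, together with the first view above where row 0 holds `ID`, `LIMIT_BAL`, `SEX` and so on, suggests that the descriptive column names were read as the first data row rather than as the header. One possible way to handle this is sketched below on a tiny stand-in frame: the first row is promoted to the header and the remaining values are converted to numbers. `promote_first_row_to_header` is a hypothetical helper and `raw` is illustrative data, not the real file. Reloading with `pd.read_excel(..., header=1)` may achieve the same result, assuming the descriptive names sit on the second row of the sheet.

```python
import pandas as pd

def promote_first_row_to_header(frame):
    """Use the first data row as column names and re-infer numeric dtypes."""
    out = frame.iloc[1:].copy()
    out.columns = frame.iloc[0].astype(str).tolist()
    out = out.reset_index(drop=True)
    # Every remaining value is expected to be numeric in this dataset
    return out.apply(pd.to_numeric)

# Tiny stand-in for the frame as loaded (names in row 0, all columns object dtype)
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["ID", 1, 2],
        "X1": ["LIMIT_BAL", 20000, 120000],
        "X2": ["SEX", 2, 2],
        "Y": ["default payment next month", 1, 1],
    }
)

clean = promote_first_row_to_header(raw)
print(clean.shape)
print(clean.dtypes)
```

Applied to the full frame, a step like this would be expected to leave one row per client and numeric dtypes in place of `object`.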


# <b>Dataset Information</b>

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30001 entries, 0 to 30000
Data columns (total 25 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  30001 non-null  object
 1   X1          30001 non-null  object
 2   X2          30001 non-null  object
 3   X3          30001 non-null  object
 4   X4          30001 non-null  object
 5   X5          30001 non-null  object
 6   X6          30001 non-null  object
 7   X7          30001 non-null  object
 8   X8          30001 non-null  object
 9   X9          30001 non-null  object
 10  X10         30001 non-null  object
 11  X11         30001 non-null  object
 12  X12         30001 non-null  object
 13  X13         30001 non-null  object
 14  X14         30001 non-null  object
 15  X15         30001 non-null  object
 16  X16         30001 non-null  object
 17  X17         30001 non-null  object
 18  X18         30001 non-null  object
 19  X19         30001 non-null  object
 20  X20   