# Default of Credit Card Clients Data Set
## Description

Predicts the probability of default for credit card customers in Taiwan, described by social and financial attributes.

The column attributes are as follows:

| Id  | Attribute | Domain |
| --- | --------- | ------ |
| 1   | Amount of the given credit (NT dollar) | numerical |
| 2   | Gender | 1 = male, 2 = female |
| 3   | Education | 1 = graduate school, 2 = university, 3 = high school, 4 = others |
| 4   | Marital status | 1 = married, 2 = single, 3 = others |
| 5   | Age (year) | numerical |
| 6   | Payment record of September 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 7   | Payment record of August 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 8   | Payment record of July 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 9   | Payment record of June 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 10  | Payment record of May 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 11  | Payment record of April 2005 | -1 = paid duly, 1 = payment delayed one month, 2 = payment delayed two months, ..., 8 = payment delayed eight months, 9 = payment delayed nine months or more |
| 12  | Amount of bill statement of September 2005 (NT dollar) | numerical |
| 13  | Amount of bill statement of August 2005 (NT dollar) | numerical |
| 14  | Amount of bill statement of July 2005 (NT dollar) | numerical |
| 15  | Amount of bill statement of June 2005 (NT dollar) | numerical |
| 16  | Amount of bill statement of May 2005 (NT dollar) | numerical |
| 17  | Amount of bill statement of April 2005 (NT dollar) | numerical |
| 18  | Amount of previous payment in September 2005 (NT dollar) | numerical |
| 19  | Amount of previous payment in August 2005 (NT dollar) | numerical |
| 20  | Amount of previous payment in July 2005 (NT dollar) | numerical |
| 21  | Amount of previous payment in June 2005 (NT dollar) | numerical |
| 22  | Amount of previous payment in May 2005 (NT dollar) | numerical |
| 23  | Amount of previous payment in April 2005 (NT dollar) | numerical |
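
As an illustration of how the coded domains in the table above can be read, the sketch below maps the integer codes for gender, education, marital status, and payment record to their labels. The names `GENDER`, `EDUCATION`, `MARITAL_STATUS`, `decode`, and `decode_payment_record` are hypothetical helpers, not part of the data set or of any library. Codes that the table does not list are reported as unknown rather than guessed.

```python
# Hypothetical lookup tables built from the attribute table above.
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARITAL_STATUS = {1: "married", 2: "single", 3: "others"}


def decode(code: int, mapping: dict) -> str:
    """Return the label for a code, or 'unknown' if the table does not list it."""
    return mapping.get(code, "unknown")


def decode_payment_record(code: int) -> str:
    """Describe a payment-record code following the table's domain."""
    if code == -1:
        return "paid duly"
    if 1 <= code <= 8:
        return "payment delayed %d month(s)" % code
    if code == 9:
        return "payment delayed nine months or more"
    return "unknown"


print(decode(2, EDUCATION))
print(decode_payment_record(3))
```

Returning "unknown" for unlisted codes keeps the helper from silently assigning a meaning the table does not document.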

## Importing and processing the dataset

In [2]:
import os                        # for os.path.exists
import json                      # for loading metadata
import urllib.request            # for downloading remote files
import numpy as np
import pandas as pd

In [6]:
def download(remoteurl: str, localfile: str):
    """
    Download remoteurl to localfile under ../../datasets/classification/,
    unless that file already exists.
    Returns the full local path as a string.
    """
    localfile = os.path.join("../../datasets/classification", localfile)
    if not os.path.exists(localfile):
        # Create the target directory first; urlretrieve does not create it.
        os.makedirs(os.path.dirname(localfile), exist_ok=True)
        print("Downloading %s..." % localfile)
        urllib.request.urlretrieve(remoteurl, localfile)
    return localfile

In [7]:
data_file = download("https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls", "default-of-credit-card-clients.xls")

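# Use the second spreadsheet row as the header and the first column as the index.
data = pd.read_excel(data_file, header=1, index_col=0)

# Treat "?" placeholders as missing values and drop incomplete rows.
data = data.replace("?", np.nan)
data = data.dropna()

# All columns except the last are features; the last column is the target.
X = data.iloc[:, :-1].to_numpy()
y = data.iloc[:, -1].to_numpy()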
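
To see what the cleaning and splitting steps do without downloading the spreadsheet, the following sketch applies the same operations to a tiny made-up frame. The variable and column names (`toy`, `a`, `b`, `target`, `X_toy`, `y_toy`) and all values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# A made-up frame: the third row contains a "?" placeholder.
toy = pd.DataFrame({
    "a": [1, 2, "?", 4],
    "b": [10.0, 20.0, 30.0, 40.0],
    "target": [0, 1, 0, 1],
})

# Same cleaning as above: "?" becomes NaN, then incomplete rows are dropped.
toy = toy.replace("?", np.nan).dropna()

# Same split as above: every column but the last is a feature.
X_toy = toy.iloc[:, :-1].to_numpy()
y_toy = toy.iloc[:, -1].to_numpy()

print(X_toy.shape, y_toy.shape)
```

The row with the placeholder disappears, so the feature matrix and the target vector end up with the same number of rows.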

## Importing libraries

In [10]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import sklearn
import sklearn.tree
import sklearn.neighbors
import sklearn.ensemble
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [13]:
# Hold out 30% of the samples for testing; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
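
Default data sets often contain far fewer defaulters than non-defaulters, so a stratified split, which keeps the class proportions equal in both parts, may be worth considering. The sketch below uses the `stratify` parameter of `train_test_split` on made-up arrays (`X_demo` and `y_demo` are illustrative names). It is a variation on the split above, not what this notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 20% of them in the positive class.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 20 + [0] * 80)

# stratify=y_demo asks for the same class proportions in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Compare the positive rate in the training and test parts.
print(y_tr.mean(), y_te.mean())
```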