# Credit Score

## 1. Problem Definition

🎯 The goal of this challenge is to build a machine learning model that classifies a person's credit score into one of three brackets (Poor, Standard, Good) from their credit-related information.

## 2. Data Collection and Preparation

In [1]:
# Data manipulation
import numpy as np
import pandas as pd
# Data visualisation
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Load the training and test sets
data_train = pd.read_csv('raw_data/train.csv', low_memory=False)
data_test = pd.read_csv('raw_data/test.csv', low_memory=False)

In [22]:
# Display all columns
pd.set_option('display.max_columns', None)

In [20]:
data_train.head()

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,0x1602,CUS_0xd40,January,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,11.27,4.0,_,809.98,26.82262,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
1,0x1603,CUS_0xd40,February,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",-1,,11.27,4.0,Good,809.98,31.94496,,No,49.574949,118.28022162236736,Low_spent_Large_value_payments,284.62916249607184,Good
2,0x1604,CUS_0xd40,March,Aaron Maashoh,-500,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,_,4.0,Good,809.98,28.609352,22 Years and 3 Months,No,49.574949,81.699521264648,Low_spent_Medium_value_payments,331.2098628537912,Good
3,0x1605,CUS_0xd40,April,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",5,4.0,6.27,4.0,Good,809.98,31.377862,22 Years and 4 Months,No,49.574949,199.4580743910713,Low_spent_Small_value_payments,223.45130972736783,Good
4,0x1606,CUS_0xd40,May,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",6,,11.27,4.0,Good,809.98,24.797347,22 Years and 5 Months,No,49.574949,41.420153086217326,High_spent_Medium_value_payments,341.48923103222177,Good


In [21]:
data_test.head()

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance
0,0x160a,CUS_0xd40,September,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,11.27,2022.0,Good,809.98,35.030402,22 Years and 9 Months,No,49.574949,236.64268203272132,Low_spent_Small_value_payments,186.26670208571767
1,0x160b,CUS_0xd40,October,Aaron Maashoh,24,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,9,13.27,4.0,Good,809.98,33.053114,22 Years and 10 Months,No,49.574949,21.465380264657146,High_spent_Medium_value_payments,361.444003853782
2,0x160c,CUS_0xd40,November,Aaron Maashoh,24,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",-1,4,12.27,4.0,Good,809.98,33.811894,,No,49.574949,148.23393788500923,Low_spent_Medium_value_payments,264.67544623343
3,0x160d,CUS_0xd40,December,Aaron Maashoh,24_,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",4,5,11.27,4.0,Good,809.98,32.430559,23 Years and 0 Months,No,49.574949,39.08251089460281,High_spent_Medium_value_payments,343.82687322383634
4,0x1616,CUS_0x21b1,September,Rick Rothackerj,28,004-07-5839,_______,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,1,5.42,5.0,Good,605.03,25.926822,27 Years and 3 Months,No,18.816215,39.684018417945296,High_spent_Large_value_payments,485.2984336755923


👀 data_test does not contain the target variable 'Credit_Score'.
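
One quick way to confirm this is to compare the column sets of the two frames. The sketch below uses two tiny stand-in frames (`train_demo` and `test_demo` are hypothetical names, not the real files) so that it runs on its own:

```python
import pandas as pd

# Tiny stand-ins for data_train / data_test (hypothetical, not the real files)
train_demo = pd.DataFrame({"ID": ["0x1602"], "Age": ["23"], "Credit_Score": ["Good"]})
test_demo = pd.DataFrame({"ID": ["0x160a"], "Age": ["23"]})

# Columns present in the training set but absent from the test set
missing_in_test = train_demo.columns.difference(test_demo.columns).tolist()
print(missing_in_test)
```

On the real data, the same call would be `data_train.columns.difference(data_test.columns)`.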

In [9]:
# Work on a copy so that data_train itself stays untouched
df = data_train.copy()

In [17]:
df.shape

(100000, 28)

In [14]:
df.describe()

Unnamed: 0,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Delay_from_due_date,Num_Credit_Inquiries,Credit_Utilization_Ratio,Total_EMI_per_month
count,84998.0,100000.0,100000.0,100000.0,100000.0,98035.0,100000.0,100000.0
mean,4194.17085,17.09128,22.47443,72.46604,21.06878,27.754251,32.285173,1403.118217
std,3183.686167,117.404834,129.05741,466.422621,14.860104,193.177339,5.116875,8306.04127
min,303.645417,-1.0,0.0,1.0,-5.0,0.0,20.0,0.0
25%,1625.568229,3.0,4.0,8.0,10.0,3.0,28.052567,30.30666
50%,3093.745,6.0,5.0,13.0,18.0,6.0,32.305784,69.249473
75%,5957.448333,7.0,7.0,20.0,28.0,9.0,36.496663,161.224249
max,15204.633333,1798.0,1499.0,5797.0,67.0,2597.0,50.0,82331.0
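
`describe()` only covers 8 columns here. Columns such as `Age` or `Changed_Credit_Limit` appear to be read as strings because of entries like `24_` and `_` visible in `head()`. A minimal sketch of one possible clean-up follows, on a toy frame; the helper name `to_numeric_clean` is hypothetical, and it assumes stray underscores are the only non-numeric characters:

```python
import pandas as pd

# Toy frame reproducing the kinds of values visible in head() (hypothetical sample)
demo = pd.DataFrame({
    "Age": ["23", "24_", "-500"],
    "Changed_Credit_Limit": ["11.27", "_", "6.27"],
})

def to_numeric_clean(series: pd.Series) -> pd.Series:
    """Strip stray underscores, then coerce to a numeric dtype; unparseable values become NaN."""
    stripped = series.astype(str).str.strip("_")
    return pd.to_numeric(stripped, errors="coerce")

cleaned = demo.apply(to_numeric_clean)
print(cleaned.dtypes)
```

Implausible values such as an age of -500, or the very large maxima in the table above, survive this step and would need separate handling.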


- **ID**: Represents a unique identification of an entry
- **Customer_ID**: Represents a unique identification of a person
- **Month**: Represents the month of the year
- **Name**: Represents the name of a person
- **Age**: Represents the age of the person
- **SSN**: Represents the social security number of a person
- **Occupation**: Represents the occupation of the person
- **Annual_Income**: Represents the annual income of the person
- **Monthly_Inhand_Salary**: Represents the monthly base salary of a person
- **Num_Bank_Accounts**: Represents the number of bank accounts a person holds
- **Num_Credit_Card**: Represents the number of other credit cards held by a person
- **Interest_Rate**: Represents the interest rate on credit card
- **Num_of_Loan**: Represents the number of loans taken from the bank
- **Type_of_Loan**: Represents the types of loan taken by a person
- **Delay_from_due_date**: Represents the average number of days delayed from the payment date
- **Num_of_Delayed_Payment**: Represents the average number of payments delayed by a person
- **Changed_Credit_Limit**: Represents the percentage change in credit card limit
- **Num_Credit_Inquiries**: Represents the number of credit card inquiries
- **Credit_Mix**: Represents the classification of the mix of credits
- **Outstanding_Debt**: Represents the remaining debt to be paid (in USD)
- **Credit_Utilization_Ratio**: Represents the utilization ratio of credit card
- **Credit_History_Age**: Represents the age of credit history of the person
- **Payment_of_Min_Amount**: Represents whether only the minimum amount was paid by the person
- **Total_EMI_per_month**: Represents the monthly EMI payments (in USD)
- **Amount_invested_monthly**: Represents the monthly amount invested by the customer (in USD)
- **Payment_Behaviour**: Represents the payment behaviour category of the customer (e.g. High_spent_Small_value_payments)
- **Monthly_Balance**: Represents the monthly balance amount of the customer (in USD)
- **Credit_Score**: Represents the bracket of credit score (Poor, Standard, Good)
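
`Credit_History_Age` is stored as text such as `22 Years and 1 Months`, so it cannot be used as a number directly. The sketch below shows one way to turn it into a month count, assuming every non-missing value follows that exact pattern (the sample series is hypothetical):

```python
import pandas as pd

# Sample values in the format shown by head() (hypothetical sample)
history = pd.Series(["22 Years and 1 Months", None, "23 Years and 0 Months"])

# Pull out the two integers, then combine them into a single month count
parts = history.str.extract(r"(?P<years>\d+) Years and (?P<months>\d+) Months").astype(float)
history_months = parts["years"] * 12 + parts["months"]
print(history_months.tolist())
```

Missing entries stay as NaN, so they can be imputed later along with the other gaps.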

In [18]:
round(df.isnull().sum().sort_values(ascending=False) / len(df), 3)  # Fraction of missing values per column (0.150 = 15%)

Monthly_Inhand_Salary       0.150
Type_of_Loan                0.114
Name                        0.100
Credit_History_Age          0.090
Num_of_Delayed_Payment      0.070
Amount_invested_monthly     0.045
Num_Credit_Inquiries        0.020
Monthly_Balance             0.012
ID                          0.000
Changed_Credit_Limit        0.000
Payment_Behaviour           0.000
Total_EMI_per_month         0.000
Payment_of_Min_Amount       0.000
Credit_Utilization_Ratio    0.000
Outstanding_Debt            0.000
Credit_Mix                  0.000
Delay_from_due_date         0.000
Customer_ID                 0.000
Num_of_Loan                 0.000
Interest_Rate               0.000
Num_Credit_Card             0.000
Num_Bank_Accounts           0.000
Annual_Income               0.000
Occupation                  0.000
SSN                         0.000
Age                         0.000
Month                       0.000
Credit_Score                0.000
dtype: float64

- Analyse the data
- Strings: convert to dummy variables or numeric codes
- Make plots
- Histograms: check whether each distribution is roughly normal; if so, keep it as is, otherwise apply a transformation
- Check correlation with the target using a seaborn pairplot (linear? correlated?)

In [26]:
df.corr()

ValueError: could not convert string to float: '0x1602'
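
The error occurs because `df.corr()` tries to convert every column to float, including string columns such as `ID`. Passing `numeric_only=True` restricts the computation to numeric columns. A minimal sketch on a toy frame (hypothetical sample values):

```python
import pandas as pd

# Toy frame mixing string and numeric columns (hypothetical sample)
demo = pd.DataFrame({
    "ID": ["0x1602", "0x1603", "0x1604"],
    "Interest_Rate": [3, 6, 9],
    "Delay_from_due_date": [3, -1, 5],
})

# Restrict the correlation to numeric columns so string IDs are skipped
corr_matrix = demo.corr(numeric_only=True)
print(corr_matrix)
```

On the real data this would be `df.corr(numeric_only=True)`. Columns that hold numbers as text, such as `Age` or `Annual_Income`, are left out until they are converted to a numeric dtype.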

In [8]:
missing_values_count = data_train.isnull().sum().sum()
missing_values_count

np.int64(60071)