### Credit Score Machine Learning Model Using Lazy Predict

LazyPredict is a Python library that speeds up model selection. With a single call, it fits many scikit-learn-style models to a dataset using their default hyperparameters, then reports a table of evaluation metrics for each one. This gives a quick view of which model families are worth tuning further.
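The idea behind LazyPredict can be sketched by hand: fit several classifiers with default settings, score each one on a held-out set, and sort the results. The sketch below is only an approximation. LazyClassifier also imputes and scales features internally and covers many more estimators. The three models, the synthetic data from `make_classification`, and the helper name `compare_models` are illustrative assumptions.

```python
# Hand-rolled approximation of a LazyClassifier-style leaderboard (illustrative only).
import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(X_train, X_test, y_train, y_test, models):
    """Fit each model with default settings and return a leaderboard
    sorted by balanced accuracy (hypothetical helper, not LazyPredict's API)."""
    rows = []
    for name, model in models.items():
        start = time.time()
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_test, y_pred),
            "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
            "F1 Score": f1_score(y_test, y_pred, average="weighted"),
            "Time Taken": time.time() - start,  # seconds
        })
    return (pd.DataFrame(rows)
            .set_index("Model")
            .sort_values("Balanced Accuracy", ascending=False))


# Synthetic three-class data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

leaderboard = compare_models(X_train, X_test, y_train, y_test, {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
})
print(leaderboard)
```

LazyClassifier automates exactly this loop, so it is convenient for a first pass. It fits a much larger set of estimators, and none of them are tuned.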

In [1]:
!pip install lazypredict



### Importing Library and Datasets

First, we import the required libraries, including `lazypredict`. We also import scikit-learn's `Pipeline` so that the data-processing steps can be chained together. The dataset comes from Kaggle: https://www.kaggle.com/datasets/parisrohan/credit-score-classification

In [2]:
import lazypredict
import pandas as pd
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

In [3]:
data = pd.read_csv("/content/train.csv")
data.head()

|  | ID | Customer_ID | Month | Name | Age | SSN | Occupation | Annual_Income | Monthly_Inhand_Salary | Num_Bank_Accounts | ... | Credit_Mix | Outstanding_Debt | Credit_Utilization_Ratio | Credit_History_Age | Payment_of_Min_Amount | Total_EMI_per_month | Amount_invested_monthly | Payment_Behaviour | Monthly_Balance | Credit_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0x1602 | CUS_0xd40 | January | Aaron Maashoh | 23 | 821-00-0265 | Scientist | 19114.12 | 1824.84 | 3 | ... | _ | 809.98 | 26.82 | 22 Years and 1 Months | No | 49.57 | 80.41529543900253 | High_spent_Small_value_payments | 312.49408867943663 | Good |
| 1 | 0x1603 | CUS_0xd40 | February | Aaron Maashoh | 23 | 821-00-0265 | Scientist | 19114.12 | NaN | 3 | ... | Good | 809.98 | 31.94 | NaN | No | 49.57 | 118.28022162236736 | Low_spent_Large_value_payments | 284.62916249607184 | Good |
| 2 | 0x1604 | CUS_0xd40 | March | Aaron Maashoh | -500 | 821-00-0265 | Scientist | 19114.12 | NaN | 3 | ... | Good | 809.98 | 28.61 | 22 Years and 3 Months | No | 49.57 | 81.699521264648 | Low_spent_Medium_value_payments | 331.2098628537912 | Good |
| 3 | 0x1605 | CUS_0xd40 | April | Aaron Maashoh | 23 | 821-00-0265 | Scientist | 19114.12 | NaN | 3 | ... | Good | 809.98 | 31.38 | 22 Years and 4 Months | No | 49.57 | 199.4580743910713 | Low_spent_Small_value_payments | 223.45130972736783 | Good |
| 4 | 0x1606 | CUS_0xd40 | May | Aaron Maashoh | 23 | 821-00-0265 | Scientist | 19114.12 | 1824.84 | 3 | ... | Good | 809.98 | 24.8 | 22 Years and 5 Months | No | 49.57 | 41.420153086217326 | High_spent_Medium_value_payments | 341.48923103222177 | Good |


In [4]:
data.describe()

|  | Monthly_Inhand_Salary | Num_Bank_Accounts | Num_Credit_Card | Interest_Rate | Delay_from_due_date | Num_Credit_Inquiries | Credit_Utilization_Ratio | Total_EMI_per_month |
|---|---|---|---|---|---|---|---|---|
| count | 84998.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 98035.0 | 100000.0 | 100000.0 |
| mean | 4194.17 | 17.09 | 22.47 | 72.47 | 21.07 | 27.75 | 32.29 | 1403.12 |
| std | 3183.69 | 117.4 | 129.06 | 466.42 | 14.86 | 193.18 | 5.12 | 8306.04 |
| min | 303.65 | -1.0 | 0.0 | 1.0 | -5.0 | 0.0 | 20.0 | 0.0 |
| 25% | 1625.57 | 3.0 | 4.0 | 8.0 | 10.0 | 3.0 | 28.05 | 30.31 |
| 50% | 3093.75 | 6.0 | 5.0 | 13.0 | 18.0 | 6.0 | 32.31 | 69.25 |
| 75% | 5957.45 | 7.0 | 7.0 | 20.0 | 28.0 | 9.0 | 36.5 | 161.22 |
| max | 15204.63 | 1798.0 | 1499.0 | 5797.0 | 67.0 | 2597.0 | 50.0 | 82331.0 |


In [5]:
data.columns

Index(['ID', 'Customer_ID', 'Month', 'Name', 'Age', 'SSN', 'Occupation',
       'Annual_Income', 'Monthly_Inhand_Salary', 'Num_Bank_Accounts',
       'Num_Credit_Card', 'Interest_Rate', 'Num_of_Loan', 'Type_of_Loan',
       'Delay_from_due_date', 'Num_of_Delayed_Payment', 'Changed_Credit_Limit',
       'Num_Credit_Inquiries', 'Credit_Mix', 'Outstanding_Debt',
       'Credit_Utilization_Ratio', 'Credit_History_Age',
       'Payment_of_Min_Amount', 'Total_EMI_per_month',
       'Amount_invested_monthly', 'Payment_Behaviour', 'Monthly_Balance',
       'Credit_Score'],
      dtype='object')

In [6]:
data["Credit_Score"].unique()

array(['Good', 'Standard', 'Poor'], dtype=object)

To keep the model simple, we use only a subset of the columns as features: the eight columns that `describe()` reports as numeric. We also keep the target, `Credit_Score`.

In [7]:
data_new = data[ ["Monthly_Inhand_Salary", "Num_Bank_Accounts", "Num_Credit_Card", "Interest_Rate", "Delay_from_due_date","Num_Credit_Inquiries", "Credit_Utilization_Ratio", "Total_EMI_per_month", "Credit_Score" ]]
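The eight feature columns chosen above are the same columns that `describe()` summarizes. For a mixed-type frame, `describe()` covers only numeric columns by default. The sketch below selects such columns programmatically with `select_dtypes` instead of typing them out. It uses a small frame whose values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up stand-in for the Kaggle data (illustrative values only).
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": ["23", "-500", "28"],                 # numbers stored as text are not selected
    "Monthly_Inhand_Salary": [1824.84, None, 3037.99],
    "Num_Bank_Accounts": [3, 3, 2],
    "Credit_Score": ["Good", "Standard", "Poor"],
})

# Keep every numeric column, then append the target column.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
toy_new = toy[numeric_cols + ["Credit_Score"]]
print(numeric_cols)  # ['Monthly_Inhand_Salary', 'Num_Bank_Accounts']
```

On the real data, the selection depends on how pandas parsed each column. The explicit list in the notebook is therefore more predictable.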

In [8]:
data_new.head()

|  | Monthly_Inhand_Salary | Num_Bank_Accounts | Num_Credit_Card | Interest_Rate | Delay_from_due_date | Num_Credit_Inquiries | Credit_Utilization_Ratio | Total_EMI_per_month | Credit_Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1824.84 | 3 | 4 | 3 | 3 | 4.0 | 26.82 | 49.57 | Good |
| 1 | NaN | 3 | 4 | 3 | -1 | 4.0 | 31.94 | 49.57 | Good |
| 2 | NaN | 3 | 4 | 3 | 3 | 4.0 | 28.61 | 49.57 | Good |
| 3 | NaN | 3 | 4 | 3 | 5 | 4.0 | 31.38 | 49.57 | Good |
| 4 | 1824.84 | 3 | 4 | 3 | 6 | 4.0 | 24.8 | 49.57 | Good |


In [9]:
data_new.isnull().sum()

Monthly_Inhand_Salary       15002
Num_Bank_Accounts               0
Num_Credit_Card                 0
Interest_Rate                   0
Delay_from_due_date             0
Num_Credit_Inquiries         1965
Credit_Utilization_Ratio        0
Total_EMI_per_month             0
Credit_Score                    0
dtype: int64
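Only `Monthly_Inhand_Salary` and `Num_Credit_Inquiries` have missing values. The pipeline later in this notebook fills these gaps with `SimpleImputer(strategy='mean')`. Here is a minimal sketch of what mean imputation does, using a toy array whose values are made up:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Two toy columns with gaps, standing in for the two columns with missing values.
X = np.array([[1000.0, 4.0],
              [np.nan, 2.0],
              [3000.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

# Column means learned from the non-missing values: 2000.0 and 3.0
print(imputer.statistics_)
# Each NaN is replaced by its column's mean
print(X_filled)
```

The means are learned during `fit`. For this reason, the imputer should be fit on training data only and then reused to transform the test data.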

In [10]:
data_new.dtypes

Monthly_Inhand_Salary       float64
Num_Bank_Accounts             int64
Num_Credit_Card               int64
Interest_Rate                 int64
Delay_from_due_date           int64
Num_Credit_Inquiries        float64
Credit_Utilization_Ratio    float64
Total_EMI_per_month         float64
Credit_Score                 object
dtype: object

In [11]:
data_new.shape

(100000, 9)

In [18]:
# Keep only the first 5,000 rows (the head of the file, not a random sample)
data_new = data_new.iloc[0:5000, :]
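`iloc[0:5000]` takes the first 5,000 rows of the file. In the preview above, consecutive rows belong to the same customer (CUS_0xd40, January to May), so the head of the file is not a random sample. If a representative subset is wanted, one option is a stratified random draw. This is an alternative, not what this notebook does. The sketch below uses a toy frame whose rows are sorted by label to show the difference.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with rows grouped by label, mimicking a file sorted by some key.
df = pd.DataFrame({
    "x": range(100),
    "Credit_Score": ["Good"] * 50 + ["Standard"] * 30 + ["Poor"] * 20,
})

# Head of the file: every row has the same label.
head_subset = df.iloc[0:20]

# Random subset that preserves the class proportions of the full frame.
random_subset, _ = train_test_split(
    df, train_size=20, stratify=df["Credit_Score"], random_state=42
)
print(head_subset["Credit_Score"].value_counts().to_dict())    # {'Good': 20}
print(random_subset["Credit_Score"].value_counts().to_dict())  # 10 Good, 6 Standard, 4 Poor
```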

### Let's create a pipeline for data processing

In [19]:
X = data_new.drop("Credit_Score", axis=1)
y = data_new["Credit_Score"]

In [20]:
numeric_features = ["Monthly_Inhand_Salary", "Num_Bank_Accounts", "Num_Credit_Card", "Interest_Rate", "Delay_from_due_date","Num_Credit_Inquiries", "Credit_Utilization_Ratio", "Total_EMI_per_month"]

In [21]:

# Split the raw data into train and test sets first, so that the
# preprocessing statistics (means, scales) are learned from the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transformer for the numeric columns: fill missing values with the column mean, then standardize
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Use ColumnTransformer to apply the transformer to the numeric columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features)
    ])

# Final pipeline: preprocessing only (a classifier could be appended as another step)
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

# Fit on the training set, then apply the same fitted transformation to the test set
X_train = pipeline.fit_transform(X_train)
X_test = pipeline.transform(X_test)
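The pipeline above stops at preprocessing. Below is a sketch of the variant with a classifier appended as the final step. Calling `fit` on the whole pipeline then learns the imputation means and scaling parameters from the training split only. It assumes a `RandomForestClassifier` as the final step, with synthetic data in place of the credit-score features. The feature names `f0` to `f7` and the 10% missing rate are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the eight numeric features, with ~10% of values blanked out.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan
features = [f"f{i}" for i in range(8)]  # placeholder column names
X = pd.DataFrame(X, columns=features)

preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), features),
])

# Preprocessing and classifier combined into a single estimator.
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)            # fits imputer, scaler and classifier on the training split
accuracy = model.score(X_test, y_test)  # test data is only transformed, never fitted
print(f"Test accuracy: {accuracy:.2f}")
```

Because the pipeline is a single estimator, it can also be passed directly to tools such as `cross_val_score`. The preprocessing is then refit inside each fold.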

### Model Development

With LazyPredict, a single `fit` call trains and evaluates a whole set of classifiers:

In [22]:
clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = clf.fit(X_train, X_test, y_train, y_test)

print(models)

100%|██████████| 29/29 [00:01<00:00, 14.89it/s]

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 647
[LightGBM] [Info] Number of data points in the train set: 800, number of used features: 8
[LightGBM] [Info] Start training from score -1.491655
[LightGBM] [Info] Start training from score -1.366492
[LightGBM] [Info] Start training from score -0.653926
                               Accuracy  Balanced Accuracy ROC AUC  F1 Score  \
Model                                                                          
RandomForestClassifier             0.81               0.82    None      0.81   
BaggingClassifier                  0.78               0.78    None      0.78   
ExtraTreesClassifier               0.77               0.77    None      0.77   
LGBMClassifier                     0.76               0.76    None      0.76   
DecisionTreeClassifier             0.73               0.73    None      0.73   
ExtraTreeClassifier                0.72               0.70    None      0.72   
AdaBoostClassifie
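The LightGBM log above also reveals the class balance of the training set. This sketch assumes that the "Start training from score" values are LightGBM's multiclass starting scores, that is, the log of each class's share of the training rows. Under that assumption, exponentiating them recovers the shares, which sum to 1. Combined with the logged 800 training rows, the shares give the class counts. The sketch also assumes LightGBM orders the labels alphabetically (Good, Poor, Standard). On that basis, about half of the training rows are "Standard", which makes balanced accuracy a useful companion to plain accuracy here.

```python
import math

# "Start training from score" values from the LightGBM log, one per class
# (assumed order: Good, Poor, Standard).
init_scores = [-1.491655, -1.366492, -0.653926]
n_train = 800  # "Number of data points in the train set: 800"

shares = [math.exp(score) for score in init_scores]   # assumed log class priors -> shares
counts = [round(share * n_train) for share in shares]

for score, share, count in zip(init_scores, shares, counts):
    print(f"{score:+.6f} -> share {share:.3f} -> ~{count} rows")
print(f"sum of shares: {sum(shares):.4f}")
```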




### Result
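The table below compares the models on each metric. The rows are sorted by balanced accuracy, which is LazyPredict's default ordering. The ROC AUC column is empty because LazyPredict reports None when it cannot compute this metric, as happens here for the three-class target.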
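The worked example below makes the metric columns concrete. It uses made-up labels in which `Standard` is the majority class. Accuracy counts all correct predictions, while balanced accuracy averages the recall of each class. The F1 column is assumed to be the support-weighted average of per-class F1 scores, which matches LazyPredict's source to our knowledge but is worth confirming for your installed version.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Made-up labels: 6 Standard, 2 Good, 2 Poor.
y_true = ["Standard"] * 6 + ["Good"] * 2 + ["Poor"] * 2
# Every Standard row is right; one Good and one Poor row are misread as Standard.
y_pred = ["Standard"] * 6 + ["Standard", "Good"] + ["Standard", "Poor"]

acc = accuracy_score(y_true, y_pred)                # 8 / 10 = 0.80
bal = balanced_accuracy_score(y_true, y_pred)       # (6/6 + 1/2 + 1/2) / 3 ≈ 0.667
f1w = f1_score(y_true, y_pred, average="weighted")  # (6*(6/7) + 2*(2/3) + 2*(2/3)) / 10 ≈ 0.781
print(f"accuracy={acc:.3f} balanced={bal:.3f} weighted_f1={f1w:.3f}")
```

When a model does well on the majority class but poorly on the minority classes, accuracy ends up above balanced accuracy. LogisticRegression in the table below shows this pattern (0.66 vs 0.61).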

In [23]:
models

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.81 | 0.82 | None | 0.81 | 0.21 |
| BaggingClassifier | 0.78 | 0.78 | None | 0.78 | 0.05 |
| ExtraTreesClassifier | 0.77 | 0.77 | None | 0.77 | 0.24 |
| LGBMClassifier | 0.76 | 0.76 | None | 0.76 | 0.17 |
| DecisionTreeClassifier | 0.73 | 0.73 | None | 0.73 | 0.02 |
| ExtraTreeClassifier | 0.72 | 0.70 | None | 0.72 | 0.02 |
| AdaBoostClassifier | 0.62 | 0.62 | None | 0.63 | 0.14 |
| LogisticRegression | 0.66 | 0.61 | None | 0.65 | 0.01 |
| Perceptron | 0.61 | 0.61 | None | 0.61 | 0.01 |
| NearestCentroid | 0.56 | 0.60 | None | 0.57 | 0.01 |
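Because the leaderboard is ordered by balanced accuracy, LogisticRegression (accuracy 0.66) sits below AdaBoostClassifier (accuracy 0.62). The `models` object is a pandas DataFrame indexed by model name, so it can be re-sorted by any other column, or the best model can be picked programmatically. The sketch below rebuilds part of the table from the values shown above, because it cannot rerun the notebook. In the notebook itself, `models.sort_values("Accuracy", ascending=False)` would do the same.

```python
import pandas as pd

# Accuracy and balanced accuracy copied from the results table above.
results = pd.DataFrame({
    "Model": ["RandomForestClassifier", "BaggingClassifier", "ExtraTreesClassifier",
              "LGBMClassifier", "DecisionTreeClassifier", "ExtraTreeClassifier",
              "AdaBoostClassifier", "LogisticRegression", "Perceptron", "NearestCentroid"],
    "Accuracy": [0.81, 0.78, 0.77, 0.76, 0.73, 0.72, 0.62, 0.66, 0.61, 0.56],
    "Balanced Accuracy": [0.82, 0.78, 0.77, 0.76, 0.73, 0.70, 0.62, 0.61, 0.61, 0.60],
}).set_index("Model")

# Best model by the leaderboard's own ordering metric.
best_model = results["Balanced Accuracy"].idxmax()

# Same rows ranked by plain accuracy instead.
by_accuracy = results.sort_values("Accuracy", ascending=False)
print(best_model)                  # RandomForestClassifier
print(by_accuracy.index.tolist())
```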
