### Credit Score Machine Learning Model Using Lazy Predict

LazyPredict is a Python library that speeds up model selection. It fits many scikit-learn-compatible models with their default settings on a given dataset and reports the same evaluation metrics for each one, which automates the repetitive steps of comparing models for a prediction task.
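
To make the idea concrete, the sketch below does by hand what a tool like LazyPredict automates: it loops over a few scikit-learn classifiers, fits each with default settings, and collects the metrics in one table. It is only an illustration under stated assumptions. The synthetic data from `make_classification`, the three chosen models, and the names `candidates` and `comparison` are all hypothetical and are not part of this notebook or of LazyPredict itself.

```python
import time

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the notebook's dataset).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# A small, hand-picked set of candidate models (hypothetical selection).
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
}

rows = []
for name, model in candidates.items():
    start = time.time()
    model.fit(X_tr, y_tr)           # train on the training split
    y_pred = model.predict(X_te)    # evaluate on the held-out split
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_te, y_pred),
        "Balanced Accuracy": balanced_accuracy_score(y_te, y_pred),
        "F1 Score": f1_score(y_te, y_pred, average="weighted"),
        "Time Taken": time.time() - start,
    })

# One row per model, best balanced accuracy first.
comparison = (
    pd.DataFrame(rows).set_index("Model")
    .sort_values("Balanced Accuracy", ascending=False)
)
print(comparison)
```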

In [None]:
!pip install lazypredict



### Importing Libraries and the Dataset

We import the required libraries, in particular lazypredict, along with the scikit-learn Pipeline for efficient data preprocessing. The dataset was collected from Kaggle: https://www.kaggle.com/datasets/parisrohan/credit-score-classification

In [None]:
import lazypredict
import pandas as pd
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

In [None]:
data = pd.read_csv("/content/training_LULC.csv")
#data = data.head(25000)

In [None]:
data

,Unnamed: 0,Label,MBI,MNDWI,NDVI,SAVI
0,5,5,0.16,-0.00,0.08,0.12
1,5,5,0.16,0.06,0.02,0.04
2,5,5,0.15,0.05,0.04,0.06
3,5,5,0.15,-0.01,0.09,0.13
4,5,5,0.17,-0.01,0.07,0.11
...,...,...,...,...,...,...
75916,0,5,0.15,0.21,-0.05,-0.07
75917,0,5,0.14,0.23,-0.06,-0.10
75918,0,5,0.12,0.24,-0.04,-0.07
75919,0,5,0.08,0.32,-0.05,-0.07


In [None]:
import numpy as np
from google.colab import autoviz

def scatter_plots(df, colname_pairs, figscale=1, alpha=.8):
  from matplotlib import pyplot as plt
  plt.figure(figsize=(len(colname_pairs) * 6 * figscale, 6 * figscale))
  for plot_i, (x_colname, y_colname) in enumerate(colname_pairs, start=1):
    ax = plt.subplot(1, len(colname_pairs), plot_i)
    df.plot(kind='scatter', x=x_colname, y=y_colname, s=(32 * figscale), alpha=alpha, ax=ax)
    ax.spines[['top', 'right',]].set_visible(False)
  plt.tight_layout()
  return autoviz.MplChart.from_current_mpl_state()

# Plot consecutive column pairs side by side
chart = scatter_plots(data, [['Unnamed: 0', 'Label'], ['Label', 'MBI'], ['MBI', 'MNDWI'], ['MNDWI', 'NDVI']])
chart

In [None]:
data.shape

(75921, 6)

In [None]:
data["Label"].unique()

array([5, 1, 2, 3, 4])

In [None]:
data.describe()

,Unnamed: 0,Label,MBI,MNDWI,NDVI,SAVI
count,75921.0,75921.0,75921.0,75921.0,75921.0,75921.0
mean,0.1,3.09,0.14,-0.1,0.23,0.35
std,0.69,1.45,0.09,0.29,0.31,0.46
min,0.0,1.0,-0.46,-0.81,-1.0,-1.5
25%,0.0,2.0,0.09,-0.3,0.04,0.05
50%,0.0,3.0,0.14,-0.21,0.23,0.34
75%,0.0,4.0,0.18,0.09,0.45,0.68
max,5.0,5.0,1.35,0.99,1.0,1.5


In [None]:
data.columns

Index(['Unnamed: 0', 'Label', 'MBI', 'MNDWI', 'NDVI', 'SAVI'], dtype='object')

We will build a simple machine learning model, so we use only a few features (MBI, MNDWI, NDVI, and SAVI) as inputs.

In [None]:
data.isnull().sum()

Unnamed: 0    0
Label         0
MBI           0
MNDWI         0
NDVI          0
SAVI          0
dtype: int64

In [None]:
data.dtypes

Unnamed: 0      int64
Label           int64
MBI           float64
MNDWI         float64
NDVI          float64
SAVI          float64
dtype: object

### Let's create a pipeline for data processing

In [None]:
X = data.drop("Label", axis=1)
y = data["Label"]

In [None]:
numeric_features = ['MBI', 'MNDWI', 'NDVI', 'SAVI']

In [None]:
# Create a k-fold cross-validation object (StratifiedKFold for classification)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Define the preprocessing steps
preprocessor = MinMaxScaler()

# Create the LazyClassifier instance
lazy_classifier = LazyClassifier(predictions=True)
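
The `kfold` object defined above is not passed to anything later in this notebook. One way such a splitter could be used is with an ordinary scikit-learn estimator and `cross_val_score`, as in the sketch below. The synthetic data, the choice of `KNeighborsClassifier`, and the names `cv_pipeline` and `scores` are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic five-class data with four features (an assumption, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    n_classes=5, n_clusters_per_class=1, random_state=42,
)

# Same splitter settings as in the cell above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Putting the scaler inside the pipeline means it is re-fitted on the
# training part of every fold, so no fold sees statistics from its test part.
cv_pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("classifier", KNeighborsClassifier()),
])

scores = cross_val_score(cv_pipeline, X_demo, y_demo, cv=cv, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratified folds keep the class proportions roughly equal in every fold, which matters when some labels are rarer than others.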

In [None]:
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('lazy_classifier', lazy_classifier)
])

In [None]:
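# Attempt to fit LazyClassifier directly on X and y. No k-fold cross-validation
# is applied here, because the kfold object is never passed in. LazyClassifier.fit
# expects (X_train, X_test, y_train, y_test), so this call raises the TypeError below.
results = lazy_classifier.fit(X, y)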

# Print the summary of model performance
print(results)

TypeError: ignored

In [None]:

# Create a transformer for the numeric columns (this dataset has no categorical columns)
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', MinMaxScaler())
])



# Use ColumnTransformer to apply transformers to specific columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features)
    ])

# Create the preprocessing pipeline (the classifiers are fitted separately by LazyClassifier)
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

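# Fit the preprocessor on the full dataset and transform it
# (note: fitting before the split lets test-set statistics influence the scaling)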
X_transformed = pipeline.fit_transform(X)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.2, random_state=42)
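
The cell above fits the scaler on all rows before splitting. A common alternative is to split first and fit the preprocessor on the training rows only, so that the test set stays unseen. The sketch below shows that ordering on made-up data. The random values, the `stratify` argument, and the names `demo`, `prep`, `X_tr`, and `X_te` are assumptions for illustration and do not reproduce the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Made-up frame with the same column names as the notebook's data (values are random).
rng = np.random.default_rng(42)
feature_names = ["MBI", "MNDWI", "NDVI", "SAVI"]
demo = pd.DataFrame(rng.uniform(-1, 1, size=(200, 4)), columns=feature_names)
demo["Label"] = np.tile([1, 2, 3, 4, 5], 40)

X_demo = demo.drop(columns="Label")
y_demo = demo["Label"]

# 1. Split first; stratify keeps the label proportions similar in both parts.
X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# 2. Build the same kind of preprocessor as in the cell above.
numeric = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", MinMaxScaler()),
])
prep = ColumnTransformer([("num", numeric, feature_names)])

# 3. Learn imputation means and min/max from the training rows only,
#    then apply those learned values to the test rows.
X_tr = prep.fit_transform(X_tr_raw)
X_te = prep.transform(X_te_raw)

print(X_tr.shape, X_te.shape)
```

With this ordering the test features can fall slightly outside the 0 to 1 range, which is expected: the scaler never saw them.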

In [None]:
X_train

array([[0.51051707, 0.52662055, 0.46175349, 0.46174824],
       [0.5284809 , 0.55461532, 0.45888826, 0.45888212],
       [0.86175609, 0.08192424, 0.63413931, 0.63412322],
       ...,
       [0.78306672, 0.27589898, 0.49957376, 0.49956372],
       [0.40286859, 0.22906475, 0.9042714 , 0.90426096],
       [0.53738432, 0.60158037, 0.3883908 , 0.3883829 ]])

### Model Development

Using lazypredict, we train and compare many models with a single call.

In [None]:
clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = clf.fit(X_train, X_test, y_train, y_test)

print(models)

100%|██████████| 29/29 [01:02<00:00,  2.17s/it]

                               Accuracy  Balanced Accuracy ROC AUC  F1 Score  \
Model                                                                          
RandomForestClassifier             0.98               0.98    None      0.98   
ExtraTreesClassifier               0.98               0.98    None      0.98   
KNeighborsClassifier               0.98               0.98    None      0.98   
SVC                                0.98               0.98    None      0.98   
BaggingClassifier                  0.98               0.98    None      0.98   
LGBMClassifier                     0.98               0.98    None      0.98   
QuadraticDiscriminantAnalysis      0.98               0.98    None      0.98   
ExtraTreeClassifier                0.97               0.98    None      0.97   
LabelPropagation                   0.97               0.98    None      0.97   
LogisticRegression                 0.97               0.97    None      0.97   
DecisionTreeClassifier             0.97 




In [None]:
models

Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
RandomForestClassifier,0.98,0.98,,0.98,3.21
ExtraTreesClassifier,0.98,0.98,,0.98,0.85
KNeighborsClassifier,0.98,0.98,,0.98,0.31
SVC,0.98,0.98,,0.98,0.92
BaggingClassifier,0.98,0.98,,0.98,0.61
LGBMClassifier,0.98,0.98,,0.98,0.98
QuadraticDiscriminantAnalysis,0.98,0.98,,0.98,0.03
ExtraTreeClassifier,0.97,0.98,,0.97,0.03
LabelPropagation,0.97,0.98,,0.97,8.46
LogisticRegression,0.97,0.97,,0.97,0.38
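
To help read the columns in this table, the sketch below computes accuracy, balanced accuracy, and a weighted F1 score with scikit-learn on a tiny hand-made example. The labels are invented for illustration, and the use of `average="weighted"` for F1 is an assumption about how the table's F1 Score is aggregated. The empty ROC AUC column is likely because that metric needs class probabilities and a multiclass setting rather than hard predictions, but that is also an assumption.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Invented labels: four samples of class 1, two samples of class 2.
y_true = [1, 1, 1, 1, 2, 2]
# The classifier gets every class-1 sample right but misses one class-2 sample.
y_pred = [1, 1, 1, 1, 2, 1]

# Accuracy: fraction of all samples predicted correctly (5 of 6).
accuracy = accuracy_score(y_true, y_pred)

# Balanced accuracy: mean of per-class recall, (4/4 + 1/2) / 2,
# so the rarer class counts as much as the common one.
balanced = balanced_accuracy_score(y_true, y_pred)

# Weighted F1: per-class F1 averaged with weights equal to class support.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f"accuracy={accuracy:.3f} balanced={balanced:.3f} weighted_f1={weighted_f1:.3f}")
```

When the two accuracy columns are nearly equal, as in the table above, the models are performing similarly across classes rather than doing well only on the most frequent label.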


In [None]:
import numpy as np
from google.colab import autoviz

def value_plot(df, y, figscale=1):
  from matplotlib import pyplot as plt
  df[y].plot(kind='line', figsize=(8 * figscale, 4 * figscale), title=y)
  plt.gca().spines[['top', 'right']].set_visible(False)
  plt.tight_layout()
  return autoviz.MplChart.from_current_mpl_state()

# Plot the Accuracy column across the ranked models
chart = value_plot(models, 'Accuracy')
chart