# Credit Card Applications

Presented by,

Amal Joseph

Joel Jose

## Introduction

Financial institutions are flooded with credit card applications.
Several applications are turned down due to reasons like elevated loan balances, insufficient income, or excessive credit report checks.
The traditional manual review of these applications is repetitive, susceptible to mistakes, and highly time-consuming.


## Challenge
The manual review process is uneconomical because it consumes reviewer time that could be spent on other work.

## Proposed Solution:
We can leverage the capabilities of Machine Learning to automate this process, thereby enhancing efficiency and precision.


## Project Objective:

The goal of this project is to construct an automated credit card approval prediction model using advanced machine learning methods, mirroring the systems used in real-life banking.


## STEPS Performed
1. Load and review the dataset, which contains a mix of numerical and non-numerical features, values across a wide range, and a number of missing entries.

2. Preprocess the data by handling missing entries, normalizing numerical values, and encoding categorical ones.

3. Perform exploratory data analysis to better understand the underlying trends and relationships, which helps us form meaningful insights.

4. Build a machine learning model to predict credit card application outcomes.

## DATASET

Two datasets are used in this notebook:

Application record (contains general information about the applicant, such as gender, date of birth, education type, and assets owned)

Credit record (contains the applicant's loan payment records)

## Content and Explanation Application Dataset
ID - Client number

CODE_GENDER - Gender

FLAG_OWN_CAR - Is there a car

FLAG_OWN_REALTY - Is there a property

CNT_CHILDREN - Number of children

AMT_INCOME_TOTAL - Annual income

NAME_INCOME_TYPE - Income category

NAME_EDUCATION_TYPE - Education level

NAME_FAMILY_STATUS - Marital status

NAME_HOUSING_TYPE - Way of living

DAYS_BIRTH - Birthday, counted backwards from the current day (0); -1 means yesterday

DAYS_EMPLOYED - Start date of employment, counted backwards from the current day (0). A positive value means the person is currently unemployed.

FLAG_MOBIL - Is there a mobile phone

FLAG_WORK_PHONE - Is there a work phone

FLAG_PHONE - Is there a phone

FLAG_EMAIL - Is there an email

OCCUPATION_TYPE - Occupation

CNT_FAM_MEMBERS - Family size

## Content and Explanation Credit Record Dataset

ID - Client number

MONTHS_BALANCE - Record month, counted backwards from the month the data was extracted: 0 is the current month, -1 is the previous month, and so on

STATUS - Status:

0: 1-29 days past due

1: 30-59 days past due

2: 60-89 days overdue

3: 90-119 days overdue

4: 120-149 days overdue

5: Overdue or bad debts, write-offs for more than 150 days

C: paid off that month

X: No loan for the month

### Loading Data

In [1]:
# import pandas and NumPy
import pandas as pd
import numpy as np

In [2]:
#load dataset
df=pd.read_csv("C:/Users/joelj/Downloads/application_record.csv.zip",header=0)
df.head()

Unnamed: 0,ID,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,DAYS_BIRTH,DAYS_EMPLOYED,FLAG_MOBIL,FLAG_WORK_PHONE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS
0,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0
1,5008805,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0
2,5008806,M,Y,Y,0,112500.0,Working,Secondary / secondary special,Married,House / apartment,-21474,-1134,1,0,0,0,Security staff,2.0
3,5008808,F,N,Y,0,270000.0,Commercial associate,Secondary / secondary special,Single / not married,House / apartment,-19110,-3051,1,0,1,1,Sales staff,1.0
4,5008809,F,N,Y,0,270000.0,Commercial associate,Secondary / secondary special,Single / not married,House / apartment,-19110,-3051,1,0,1,1,Sales staff,1.0


In [3]:
df_rec=pd.read_csv("C:/Users/joelj/Downloads/credit_record.csv.zip",header=0)
df_rec.head()

Unnamed: 0,ID,MONTHS_BALANCE,STATUS
0,5001711,0,X
1,5001711,-1,0
2,5001711,-2,0
3,5001711,-3,0
4,5001712,0,C


In [4]:
df = df.merge(df_rec, how='inner', on=['ID'])

In [5]:
df.shape

(777715, 20)
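
The merged frame has far more rows than there are applicants because the credit record holds one row per client per month. The sketch below uses made-up toy frames (the IDs and values are assumptions, not taken from the files) to show how an inner merge on `ID` yields one row per (applicant, month) pair and drops IDs that appear in only one of the two frames.

```python
import pandas as pd

# Toy stand-ins for the two files; the IDs and values are made up.
applications = pd.DataFrame({
    "ID": [1, 2, 3],
    "AMT_INCOME_TOTAL": [100000.0, 50000.0, 75000.0],
})
credit = pd.DataFrame({
    "ID": [1, 1, 1, 2, 4],
    "MONTHS_BALANCE": [0, -1, -2, 0, 0],
    "STATUS": ["C", "0", "X", "1", "0"],
})

merged = applications.merge(credit, how="inner", on="ID")

# One output row per (applicant, month) pair present in both frames.
print(merged.shape)            # (4, 4)
print(merged["ID"].nunique())  # 2
```

Applicant 3 (no credit history) and client 4 (no application) both disappear, which is the behavior to keep in mind when reading the row count above.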

In [6]:
def data_info(data):
    cols = []
    unique = []
    n_uniques = []
    dtypes = []
    nulls = []
    for col in data.columns:
        cols.append(col)
        dtypes.append(data[col].dtype)
        n_uniques.append(data[col].nunique())
        unique.append(data[col].unique())
        nulls.append(data[col].isna().sum())
        
    return pd.DataFrame({'Col' : cols , 'n_uniques' : n_uniques , 
                         'unique' :unique , 'dtypes' : dtypes , "NULLS" : nulls 
                        })

In [7]:
data_info(df)

Unnamed: 0,Col,n_uniques,unique,dtypes,NULLS
0,ID,36457,"[5008804, 5008805, 5008806, 5008808, 5008809, ...",int64,0
1,CODE_GENDER,2,"[M, F]",object,0
2,FLAG_OWN_CAR,2,"[Y, N]",object,0
3,FLAG_OWN_REALTY,2,"[Y, N]",object,0
4,CNT_CHILDREN,9,"[0, 1, 3, 2, 4, 5, 14, 19, 7]",int64,0
5,AMT_INCOME_TOTAL,265,"[427500.0, 112500.0, 270000.0, 283500.0, 13500...",float64,0
6,NAME_INCOME_TYPE,5,"[Working, Commercial associate, Pensioner, Sta...",object,0
7,NAME_EDUCATION_TYPE,5,"[Higher education, Secondary / secondary speci...",object,0
8,NAME_FAMILY_STATUS,5,"[Civil marriage, Married, Single / not married...",object,0
9,NAME_HOUSING_TYPE,6,"[Rented apartment, House / apartment, Municipa...",object,0


In [8]:
df.columns

Index(['ID', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
       'AMT_INCOME_TOTAL', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'DAYS_BIRTH',
       'DAYS_EMPLOYED', 'FLAG_MOBIL', 'FLAG_WORK_PHONE', 'FLAG_PHONE',
       'FLAG_EMAIL', 'OCCUPATION_TYPE', 'CNT_FAM_MEMBERS', 'MONTHS_BALANCE',
       'STATUS'],
      dtype='object')

In [9]:
df.head()

Unnamed: 0,ID,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,DAYS_BIRTH,DAYS_EMPLOYED,FLAG_MOBIL,FLAG_WORK_PHONE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,MONTHS_BALANCE,STATUS
0,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0,0,C
1,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0,-1,C
2,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0,-2,C
3,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0,-3,C
4,5008804,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,-12005,-4542,1,1,0,0,,2.0,-4,C


### Section 2: Data Processing 

1. Filled in the missing data.
2. Standardized the input features, because their values span very different ranges.
3. Undersampled the majority class to correct the class imbalance.
4. Converted the output (STATUS) to 1 and 0.

In [10]:
# inspect the raw STATUS codes before converting the output to 1 and 0
df['STATUS'].unique()

array(['C', '1', '0', 'X', '5', '4', '3', '2'], dtype=object)

0: 1-29 days past due

1: 30-59 days past due

2: 60-89 days overdue

3: 90-119 days overdue

4: 120-149 days overdue

5: Overdue or bad debts, write-offs for more than 150 days

C: paid off that month

X: No loan for the month

Codes '1' to '5' represent payments that are 30 or more days overdue, which demonstrates difficulty with timely repayment. The next cell labels these codes 0 (bad) and labels 'C', 'X', and '0' as 1 (good).

In [11]:
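# 1 = good ('C', 'X', '0'); 0 = bad ('1' to '5'). Assign the result instead of using inplace on a column.
df['STATUS'] = df['STATUS'].replace(['C', 'X', '0', '1', '2', '3', '4', '5'], [1, 1, 1, 0, 0, 0, 0, 0])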
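
As a small illustration of the same labeling rule, the sketch below uses an explicit dictionary with `Series.map` on a few toy values. The dictionary name `STATUS_TO_LABEL` and the toy series are assumptions for illustration; the mapping itself is the one used in this notebook.

```python
import pandas as pd

# Mapping used in this notebook: 'C', 'X', '0' -> 1 (good); '1'..'5' -> 0 (bad).
STATUS_TO_LABEL = {"C": 1, "X": 1, "0": 1, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0}

status = pd.Series(["C", "0", "X", "1", "5"], name="STATUS")  # toy values
label = status.map(STATUS_TO_LABEL)

# Any code missing from the dictionary would become NaN, so check for that.
assert label.notna().all()
print(label.tolist())  # [1, 1, 1, 0, 0]
```

A dictionary keeps each code next to its label, which makes it easier to spot a code that was left out.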

In [12]:
# Missing Values
df.isnull().sum()

ID                          0
CODE_GENDER                 0
FLAG_OWN_CAR                0
FLAG_OWN_REALTY             0
CNT_CHILDREN                0
AMT_INCOME_TOTAL            0
NAME_INCOME_TYPE            0
NAME_EDUCATION_TYPE         0
NAME_FAMILY_STATUS          0
NAME_HOUSING_TYPE           0
DAYS_BIRTH                  0
DAYS_EMPLOYED               0
FLAG_MOBIL                  0
FLAG_WORK_PHONE             0
FLAG_PHONE                  0
FLAG_EMAIL                  0
OCCUPATION_TYPE        240048
CNT_FAM_MEMBERS             0
MONTHS_BALANCE              0
STATUS                      0
dtype: int64

In [13]:
df['OCCUPATION_TYPE'].unique()

array([nan, 'Security staff', 'Sales staff', 'Accountants', 'Laborers',
       'Managers', 'Drivers', 'Core staff', 'High skill tech staff',
       'Cleaning staff', 'Private service staff', 'Cooking staff',
       'Low-skill Laborers', 'Medicine staff', 'Secretaries',
       'Waiters/barmen staff', 'HR staff', 'Realty agents', 'IT staff'],
      dtype=object)

In [14]:
df['OCCUPATION_TYPE'] = df['OCCUPATION_TYPE'].fillna('others')

In [15]:
df.isnull().sum()

ID                     0
CODE_GENDER            0
FLAG_OWN_CAR           0
FLAG_OWN_REALTY        0
CNT_CHILDREN           0
AMT_INCOME_TOTAL       0
NAME_INCOME_TYPE       0
NAME_EDUCATION_TYPE    0
NAME_FAMILY_STATUS     0
NAME_HOUSING_TYPE      0
DAYS_BIRTH             0
DAYS_EMPLOYED          0
FLAG_MOBIL             0
FLAG_WORK_PHONE        0
FLAG_PHONE             0
FLAG_EMAIL             0
OCCUPATION_TYPE        0
CNT_FAM_MEMBERS        0
MONTHS_BALANCE         0
STATUS                 0
dtype: int64

In [16]:
df[df.duplicated()].size

0

## Feature Engineering

In [17]:
from datetime import date, timedelta
import datetime  # used below as datetime.datetime.strptime
import matplotlib.pyplot as plt
import seaborn as sns
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

#### Calculation of Age
Age can indirectly affect a credit score through the length of the applicant's credit history.

In [18]:
def calc_day_of_birth (day_num):
    today = date.today() 
    birthDay = (today + timedelta(days=day_num)).strftime('%Y-%m-%d')
    return birthDay

In [19]:
df['BIRTH_DAY']   = df['DAYS_BIRTH'].apply(calc_day_of_birth)

In [20]:
def calculate_age(born):
    born = datetime.datetime.strptime(born, '%Y-%m-%d')
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

In [21]:
df['AGE'] = df['BIRTH_DAY'].apply(calculate_age)
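
The two `apply` calls above run a Python function once per row. A vectorized approximation, sketched below, divides the day count by an average year length. The constant `DAYS_PER_YEAR` and the toy frame are assumptions for illustration, and the result can differ by one year from the calendar-based calculation for applicants close to a birthday.

```python
import pandas as pd

toy = pd.DataFrame({"DAYS_BIRTH": [-12005, -21474, -19110]})  # values from the sample rows
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years

# DAYS_BIRTH is negative, so negate it before the floor division.
toy["AGE"] = (-toy["DAYS_BIRTH"] // DAYS_PER_YEAR).astype(int)
print(toy["AGE"].tolist())  # [32, 58, 52]
```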

In [22]:
## Dropped unnecessary Columns
df = df.drop(['ID','DAYS_BIRTH','MONTHS_BALANCE','DAYS_EMPLOYED','BIRTH_DAY','FLAG_MOBIL'],axis=1)

In [23]:
df.head()

Unnamed: 0,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,FLAG_WORK_PHONE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,STATUS,AGE
0,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,1,0,0,others,2.0,1,32
1,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,1,0,0,others,2.0,1,32
2,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,1,0,0,others,2.0,1,32
3,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,1,0,0,others,2.0,1,32
4,M,Y,Y,0,427500.0,Working,Higher education,Civil marriage,Rented apartment,1,0,0,others,2.0,1,32


In [24]:
df['STATUS'].value_counts()

STATUS
1    766140
0     11575
Name: count, dtype: int64

In [25]:
from sklearn import preprocessing

In [26]:
column_data = ["CODE_GENDER", "FLAG_OWN_CAR", "FLAG_OWN_REALTY", "NAME_INCOME_TYPE", "NAME_EDUCATION_TYPE", "NAME_FAMILY_STATUS", "NAME_HOUSING_TYPE"]
for col in column_data:
    label = preprocessing.LabelEncoder()
    df[col] = label.fit_transform(df[col].values)

In [27]:
df.head()

Unnamed: 0,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,FLAG_WORK_PHONE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,STATUS,AGE
0,1,1,1,0,427500.0,4,1,0,4,1,0,0,others,2.0,1,32
1,1,1,1,0,427500.0,4,1,0,4,1,0,0,others,2.0,1,32
2,1,1,1,0,427500.0,4,1,0,4,1,0,0,others,2.0,1,32
3,1,1,1,0,427500.0,4,1,0,4,1,0,0,others,2.0,1,32
4,1,1,1,0,427500.0,4,1,0,4,1,0,0,others,2.0,1,32


In [28]:
df = pd.get_dummies(df, drop_first=True, columns=['OCCUPATION_TYPE'])
df.head()

Unnamed: 0,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,FLAG_WORK_PHONE,...,OCCUPATION_TYPE_Low-skill Laborers,OCCUPATION_TYPE_Managers,OCCUPATION_TYPE_Medicine staff,OCCUPATION_TYPE_Private service staff,OCCUPATION_TYPE_Realty agents,OCCUPATION_TYPE_Sales staff,OCCUPATION_TYPE_Secretaries,OCCUPATION_TYPE_Security staff,OCCUPATION_TYPE_Waiters/barmen staff,OCCUPATION_TYPE_others
0,1,1,1,0,427500.0,4,1,0,4,1,...,False,False,False,False,False,False,False,False,False,True
1,1,1,1,0,427500.0,4,1,0,4,1,...,False,False,False,False,False,False,False,False,False,True
2,1,1,1,0,427500.0,4,1,0,4,1,...,False,False,False,False,False,False,False,False,False,True
3,1,1,1,0,427500.0,4,1,0,4,1,...,False,False,False,False,False,False,False,False,False,True
4,1,1,1,0,427500.0,4,1,0,4,1,...,False,False,False,False,False,False,False,False,False,True


In [29]:
x = df.drop(['STATUS'], axis=1)
y = df['STATUS']

In [30]:
# data standardization
sc=preprocessing.StandardScaler()
x_scaled = sc.fit_transform(x)
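
Standardization rescales each column to zero mean and unit variance, z = (x - mean) / std. The NumPy sketch below reproduces that formula on a made-up feature matrix (the values are assumptions) to show why it helps when columns such as income and age sit on very different scales.

```python
import numpy as np

# Toy feature matrix: income and age on very different scales (made-up values).
x = np.array([[427500.0, 32.0],
              [112500.0, 58.0],
              [270000.0, 52.0]])

mean = x.mean(axis=0)
std = x.std(axis=0)          # population std (ddof=0), as StandardScaler uses
z = (x - mean) / std

print(z.mean(axis=0).round(6))  # each column is centered on zero
print(z.std(axis=0).round(6))   # each column has unit standard deviation
```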

In [31]:
pca = PCA()
pct = pca.fit_transform(x_scaled)

PCA is commonly used for dimensionality reduction in machine learning: it transforms a large set of variables into a smaller set that still contains most of the information in the original. The goal is to reduce the dimensionality of the dataset while retaining as much variance as possible. Note that `PCA()` called without `n_components`, as above, keeps every component, and the transformed array `pct` is not used in the later cells; the models below are trained on the standardized features.

Dimensionality refers to the number of features (also known as variables, attributes, or columns) that the data contains. High-dimensional data can pose challenges because the volume of the feature space grows exponentially with the number of features, so the data becomes sparse and analyses become more computationally expensive and less intuitive. This is known as the "curse of dimensionality".
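
To make the "retain most of the variance" idea concrete, here is a minimal sketch on synthetic data (not the credit data). It assumes a 95% variance target, which is an illustrative choice rather than a setting used in this notebook. Passing a float between 0 and 1 as `n_components` asks scikit-learn to keep the fewest components that reach that share of variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: 10 observed features driven by only 3 latent factors plus small noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
features = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

scaled = StandardScaler().fit_transform(features)

# A float n_components in (0, 1) keeps the fewest components reaching that variance share.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(scaled)

print(reduced.shape)                           # fewer than 10 columns
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance retained
```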

In [32]:
from imblearn.under_sampling import RandomUnderSampler
from collections import Counter
undersample = RandomUnderSampler(random_state=0)
X, y = undersample.fit_resample(x_scaled, y)

After this step, X and y are undersampled: `RandomUnderSampler` randomly drops observations from the majority class until it has as many observations as the minority class, which balances the two classes.
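
The idea behind random undersampling can be sketched in a few lines of NumPy. This is an illustration of the technique, not imblearn's actual implementation; `random_undersample` is a hypothetical helper and the data is made up.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep as many rows of each class as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Toy imbalanced data: 95 rows of class 1, 5 rows of class 0.
y = np.array([1] * 95 + [0] * 5)
X = np.arange(100).reshape(-1, 1)

X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))  # [5 5]
```

The price of balancing this way is that most majority-class rows are discarded, which is why the balanced set in this notebook has 11,575 rows per class.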

In [33]:
counter = Counter(y)
print(counter)
for k,v in counter.items():
    per = v / len(y) * 100
    print('Class=%d, n=%d (%.2f%%)' % (k, v, per))

Counter({0: 11575, 1: 11575})
Class=0, n=11575 (50.00%)
Class=1, n=11575 (50.00%)


## Model Creation

### Logistic Regression

In [34]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score,accuracy_score,recall_score,f1_score

In [35]:
x_train ,x_test ,y_train,y_test = train_test_split(X,y,test_size=.3,random_state=42)

In [36]:
lr = LogisticRegression()
lr.fit(x_train,y_train)

In [37]:
y_pred_lr      = lr.predict(x_test)
y_pred_train = lr.predict(x_train)

In [38]:

print("Precision   using LG  on test Data  : {:.2f} %".format(np.round(precision_score(y_test,y_pred_lr),4)*100))
print("Recall      using LG  on test Data  : {:.2f} %".format(np.round(recall_score(y_test,y_pred_lr),4)*100))
print("Accuracy     using LG  on test Data  : {:.2f} %".format(np.round(accuracy_score(y_test, y_pred_lr),4)*100))
print("F1_score     using LG  on test Data  : {:.2f} %".format(np.round(f1_score(y_test, y_pred_lr),4)*100))

Precision   using LG  on test Data  : 54.79 %
Recall      using LG  on test Data  : 59.00 %
Accuracy     using LG  on test Data  : 55.02 %
F1_score     using LG  on test Data  : 56.82 %
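
The same four metrics are printed for every model in this notebook, so a small helper can reduce copy-and-paste mistakes in the labels. The sketch below is one possible shape for it; `report_metrics` is a hypothetical name, and the toy labels exist only to show the call.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def report_metrics(model_name, y_true, y_pred):
    """Hypothetical helper: print and return the four test metrics."""
    scores = {
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "Accuracy": accuracy_score(y_true, y_pred),
        "F1_score": f1_score(y_true, y_pred),
    }
    for metric, value in scores.items():
        print(f"{metric:<10} using {model_name} on test data: {value * 100:.2f} %")
    return scores

# Toy labels and predictions, only to show the call.
scores = report_metrics("toy model", [1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

With the variables from this notebook, the call would look like `report_metrics("LR", y_test, y_pred_lr)`.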


### Random Forest Classifier

In [39]:
from sklearn.ensemble import RandomForestClassifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(x_train, y_train)
y_pred_rf = rf_classifier.predict(x_test)


In [40]:

print("Precision   using RF  on test Data  : {:.2f} %".format(np.round(precision_score(y_test,y_pred_rf),4)*100))
print("Recall      using RF  on test Data  : {:.2f} %".format(np.round(recall_score(y_test,y_pred_rf),4)*100))
print("Accuracy     using RF  on test Data  : {:.2f} %".format(np.round(accuracy_score(y_test, y_pred_rf),4)*100))
print("F1_score     using RF  on test Data  : {:.2f} %".format(np.round(f1_score(y_test, y_pred_rf),4)*100))

Precision   using RF  on test Data  : 81.02 %
Recall      using RF  on test Data  : 77.69 %
Accuracy     using RF  on test Data  : 79.68 %
F1_score     using RF  on test Data  : 79.32 %


### XGBoost Classifier

In [41]:
import xgboost as xgb
xgb_classifier = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
xgb_classifier.fit(x_train, y_train)
y_pred_xgb = xgb_classifier.predict(x_test)


In [42]:
print("Precision   using XGB  on test Data  : {:.2f} %".format(np.round(precision_score(y_test,y_pred_xgb),4)*100))
print("Recall      using XGB  on test Data  : {:.2f} %".format(np.round(recall_score(y_test,y_pred_xgb),4)*100))
print("Accuracy     using XGB  on test Data  : {:.2f} %".format(np.round(accuracy_score(y_test, y_pred_xgb),4)*100))
print("F1_score     using XGB  on test Data  : {:.2f} %".format(np.round(f1_score(y_test, y_pred_xgb),4)*100))

Precision   using XGB  on test Data  : 60.54 %
Recall      using XGB  on test Data  : 65.23 %
Accuracy     using XGB  on test Data  : 61.24 %
F1_score     using XGB  on test Data  : 62.80 %


Precision: Precision measures how many of the model's positive predictions are correct. For the XGBoost results above, 60.54% of the applications the model predicted as approvals were actually creditworthy. A higher precision means fewer false approvals.

Recall: Recall, also known as sensitivity or true positive rate, measures the share of actual positive instances (creditworthy applicants) that the model captures. Here, the XGBoost model correctly identified 65.23% of the creditworthy applicants in the test data. A higher recall means fewer false rejections.

Accuracy: Accuracy is the proportion of correct predictions (true positives and true negatives) out of all predictions on the test data. Here the accuracy is 61.24%, which is moderate: because the undersampled classes are balanced, a random guess would score about 50%.
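
The definitions above follow directly from the four confusion-matrix counts. The sketch below computes them by hand on made-up labels; `confusion_counts` is a hypothetical helper, and the toy labels are assumptions chosen so that the four metrics differ.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# Toy labels: 1 = creditworthy, 0 = not creditworthy (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)          # correct approvals / predicted approvals
recall = tp / (tp + fn)             # correct approvals / actual creditworthy
accuracy = (tp + tn) / len(y_true)  # all correct / all predictions
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)  # 3 2 1 4
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f} f1={f1:.2f}")
```

False approvals (`fp`) lower precision, while false rejections (`fn`) lower recall, which is why the two metrics are read together.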

### Analysis

Random Forest achieved the highest precision (81.02%) among the three models: 81.02% of the applicants it predicted as creditworthy were actually creditworthy. It therefore produced the smallest share of false approvals, which matters in credit approval because a false approval means lending to a risky borrower.

Random Forest also had the highest recall (77.69%): it identified 77.69% of the actual creditworthy applicants in the test data, so it also produced the fewest false rejections.

XGBoost had precision (60.54%) and recall (65.23%) that are close to each other, but both are well below the Random Forest scores.

### Conclusions

Based on these metrics, Random Forest performed best on precision, recall, accuracy, and F1 score, making it the strongest candidate for credit approval prediction.

XGBoost, with the settings used here (100 trees, `max_depth=3`, `learning_rate=0.1`), ranked second with 61.24% accuracy. Its precision and recall are balanced, but it is not competitive with Random Forest at these settings.

Logistic Regression had the lowest scores of the three (55.02% accuracy), so it is not a good choice for this particular credit approval problem.

### Hyperparameter Tuning


In [43]:
from sklearn.model_selection import GridSearchCV

# Create a Random Forest classifier
rf_model = RandomForestClassifier()

# Define the hyperparameter space
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' was removed in scikit-learn 1.3; for classifiers it was equivalent to 'sqrt'
    'max_features': ['sqrt', 'log2']
}

# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid, cv=5, scoring='f1')

# Fit the GridSearchCV object on the training data
grid_search.fit(x_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_

# Random Forest model with the best hyperparameters
best_rf_model = RandomForestClassifier(**best_params)

# Train the model on the entire training data
best_rf_model.fit(x_train, y_train)

# Evaluate the final model on the test data (optional)
test_accuracy = best_rf_model.score(x_test, y_test)

# You can also calculate precision, recall, and F1 score on the test set
from sklearn.metrics import precision_score, recall_score, f1_score

y_pred = best_rf_model.predict(x_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)



In [46]:
print("Precision   using tuned RF  on test Data  : {:.2f} %".format(np.round(precision_score(y_test,y_pred),4)*100))
print("Recall      using tuned RF  on test Data  : {:.2f} %".format(np.round(recall_score(y_test,y_pred),4)*100))
print("Accuracy     using tuned RF  on test Data  : {:.2f} %".format(np.round(accuracy_score(y_test, y_pred),4)*100))
print("F1_score     using tuned RF  on test Data  : {:.2f} %".format(np.round(f1_score(y_test, y_pred),4)*100))
print(best_params)

Precision   using tuned RF  on test Data  : 81.34 %
Recall      using tuned RF  on test Data  : 77.98 %
Accuracy     using tuned RF  on test Data  : 79.99 %
F1_score     using tuned RF  on test Data  : 79.62 %
{'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
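
For reference, here is a compact sketch of the same grid-search workflow on synthetic data. The data, the reduced grid `small_grid`, and `cv=3` are assumptions chosen so the example runs quickly; they are not the settings used above. It also shows that `GridSearchCV` refits the best configuration by default, so `best_estimator_` can be used directly instead of building and fitting a new model from `best_params_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the balanced training data.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# A deliberately small grid so the sketch runs quickly.
small_grid = {"n_estimators": [20, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),  # fixed seed for repeatable results
    param_grid=small_grid,
    cv=3,
    scoring="f1",
)
search.fit(X_tr, y_tr)

# With the default refit=True, best_estimator_ is already retrained on all of X_tr.
best_model = search.best_estimator_
print(search.best_params_)
print(f"Mean CV F1: {search.best_score_:.3f}")
print(f"Test accuracy: {best_model.score(X_te, y_te):.3f}")
```

Fixing `random_state` on the estimator makes the search repeatable, which helps when comparing a tuned model against the baseline Random Forest.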
