# Business Problem

SYL bank is one of Australia's largest banks. Loan applications arriving at its branches are currently processed manually. The decision to grant a loan is subjective, and because so many applications are coming in, it is getting harder to decide the outcome of each one. The bank therefore wants an automated machine learning solution that looks at several factors and decides whether or not to grant a loan to each applicant.

In this ML problem, we will build a classification model, because we have to predict whether an applicant should get a loan. We will look at applicant factors such as credit score and past history, and use them to predict the loan granting status. We will also cleanse the data and fill in missing values so that the model performs as expected. The model will output a probability score along with a Loan Granted or Loan Refused decision.
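
The intended output, a probability together with a Loan Granted or Loan Refused label, can be sketched as follows. This is a minimal illustration on synthetic data, not the model built in this notebook: the two features, the 0.5 threshold, and the helper name `label_with_probability` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): two numeric features per applicant,
# for example a scaled credit score and a scaled income.
# Target: 1 = loan granted, 0 = loan refused.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)


def label_with_probability(model, features, threshold=0.5):
    """Return a (label, probability of class 1) pair for each row (hypothetical helper)."""
    proba = model.predict_proba(features)[:, 1]
    labels = np.where(proba >= threshold, "Loan Granted", "Loan Refused")
    return list(zip(labels.tolist(), proba.tolist()))


# Two hypothetical applicants: one clearly strong, one clearly weak.
applicants = np.array([[2.0, 1.0], [-2.0, -1.0]])
results = label_with_probability(model, applicants)
for label, p in results:
    print(f"{label}: p(granted) = {p:.2f}")
```

The threshold is a business choice rather than a property of the model: raising it above 0.5 makes the model refuse more borderline applications.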

## Machine Learning Perspective

This business problem falls under:
- Supervised Learning
- Classification [ Binary ]

## Solution Workflow

- UNDERSTANDING PROBLEM STATEMENT
- SETTING UP THE WORKING ENVIRONMENT
- DATA INGESTION / SOURCING
- EDA (EXPLORATORY DATA ANALYSIS)
    - OUTLIER DETECTION
    - CORRELATIONS
    - MISSING VALUES
- DATA PREPARATION
- MODEL BUILDING
- MODEL EVALUATION
- FINALIZING THE MODEL AND SAVING IT
- PRODUCTIONIZATION (IN-GENERAL, NOT PERFORMED HERE)
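
The later steps of this workflow (handling missing values, data preparation, model building, evaluation, and saving) can be sketched end to end as below. This is a compressed illustration under stated assumptions, not the notebook's actual procedure: the data is synthetic, and `SimpleImputer` with a `Pipeline` and a plain `LogisticRegression` are stand-ins chosen for brevity.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 300 applicants, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
# Blank out roughly 10% of the entries to mimic missing values.
X[rng.random(X.shape) < 0.1] = np.nan

# DATA PREPARATION: hold out a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# MISSING VALUES + scaling + MODEL BUILDING in one pipeline.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# MODEL EVALUATION on the held-out data.
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
print(f"ROC AUC on held-out data: {auc:.3f}")

# FINALIZING: save the fitted pipeline and load it back.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "loan_model.joblib")  # hypothetical file name
    joblib.dump(pipeline, model_path)
    reloaded = joblib.load(model_path)

same_predictions = np.array_equal(reloaded.predict(X_test), pipeline.predict(X_test))
print("Reloaded model reproduces predictions:", same_predictions)
```

Keeping imputation and scaling inside the pipeline means they are fitted on the training split only, and the saved file carries the preprocessing along with the model.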

### SETTING UP THE WORKING ENVIRONMENT

In [3]:
# data manipulation libraries
import pandas as pd
import numpy as np
import os

# data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import pyplot

# sklearn - final data preparation libraries
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import LabelBinarizer, StandardScaler, OrdinalEncoder
from sklearn import preprocessing

# imbalanced dataset preparation libraries
from imblearn.over_sampling import SMOTE

# model building libraries
from sklearn.linear_model import LogisticRegression, RidgeClassifier, PassiveAggressiveClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from xgboost import plot_importance


# model metrics
from sklearn.metrics import confusion_matrix,roc_curve,roc_auc_score,classification_report
from sklearn import metrics

# statistics
import statistics
from scipy.stats import boxcox

# model saving
import joblib
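
Before running the imports above, it can help to confirm which of the required packages are installed. The sketch below is an optional check, not part of the original notebook; the helper name `installed_versions` is hypothetical, and the list uses the distribution names as published on PyPI (`scikit-learn` provides `sklearn`, `imbalanced-learn` provides `imblearn`).

```python
from importlib import metadata

# Distribution names for the libraries imported in this notebook.
PACKAGES = [
    "pandas", "numpy", "matplotlib", "seaborn", "scikit-learn",
    "imbalanced-learn", "xgboost", "scipy", "joblib",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name:18s} {version or 'NOT INSTALLED'}")
```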

### DATA SOURCING

In [5]:
# Reading the dataset
data = pd.read_csv('D:\\github\\1-DataSets\\loan eligibility prediction\\LoansTrainingSetV2.csv',
                  low_memory=False)
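
The absolute Windows path above only resolves on the author's machine. A more portable variant might look like the sketch below, which builds the path with `pathlib` and reads the directory from an environment variable. The variable name `LOAN_DATA_DIR` and the helper `load_loans` are assumptions for illustration, and the demo writes a tiny stand-in file with made-up columns rather than using the real dataset.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

FILE_NAME = "LoansTrainingSetV2.csv"  # file name taken from the cell above


def load_loans(data_dir=None):
    """Read the loans CSV from data_dir, or from $LOAN_DATA_DIR (hypothetical variable)."""
    data_dir = Path(data_dir or os.environ.get("LOAN_DATA_DIR", "."))
    csv_path = data_dir / FILE_NAME
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path, low_memory=False)


# Demo with a tiny stand-in file; these column names are made up.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / FILE_NAME).write_text("applicant_id,credit_score\n1,710\n2,\n")
    demo = load_loans(tmp)

print(demo.shape)          # rows and columns read
print(demo.isna().sum())   # missing values per column
```

Failing early with a clear `FileNotFoundError` message makes a misconfigured data directory easier to diagnose than a long pandas traceback.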