<a href="https://colab.research.google.com/github/franciscosalido/AIML/blob/master/Bank_Personal_Loan_Campaign.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Thera Bank Personal Loan Campaign**

## Data Description:

> The file Bank.xls contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.

## Domain:

> Banking

## Context:

> This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio with a minimal budget.

## Attribute Information:

* ID: Customer ID 
* Age: Customer's age in completed years
* Experience: Number of years of professional experience
* Income: Annual income of the customer
* ZIP Code: Home address ZIP code.
* Family: Family size of the customer
* CCAvg: Avg. spending on credit cards per month
* Education: Education level:
 * 1: Undergrad;
 * 2: Graduate;
 * 3: Advanced/Professional.
* Mortgage: Value of house mortgage if any.
* Personal Loan: Did this customer accept the personal loan offered in the last campaign?
* Securities Account: Does the customer have a securities account with the bank?
* CD Account: Does the customer have a certificate of deposit (CD) account with the bank?
* Online: Does the customer use internet banking facilities?
* CreditCard: Does the customer use a credit card issued by the bank?






---

## Learning Outcomes:

* Exploratory Data Analysis (EDA)
* Preparing the data to train a model
* Training and making predictions using a classification model
* Model evaluation

## Objective:

> The classification goal is to predict the likelihood of a liability customer buying personal loans.



## Steps and tasks:


1.   Import the datasets and libraries; check datatypes, statistical summary, shape, null values or incorrect imputation. (5 marks)


2.   EDA: Study the data distribution in each attribute and target variable, share your findings (20 marks)
* Number of unique values in each column?
* Number of people with zero mortgage?
* Number of people with zero credit card spending per month?
* Value counts of all categorical columns.
* Univariate and bivariate analysis
* Get the data ready for modelling



3.   Split the data into training and test sets in the ratio of 70:30 respectively (5 marks)


4.  Use a Logistic Regression model to predict the likelihood of a customer buying personal loans. Print all the metrics related to evaluating the model performance (15 marks)


5.  Give your reasoning on how the model can perform better. (10 marks)
> Hint: Check the model parameters


6.  Give a business understanding of your model. (5 marks)
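
Steps 3 and 4 can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (roughly 9.6% positives, 11 features), not the bank dataset, and `stratify`, `random_state` and `max_iter` are choices made here rather than requirements of the task.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data (assumption): 5000 rows, ~9.6% positives.
X, y = make_classification(
    n_samples=5000, n_features=11, n_informative=5,
    weights=[0.904, 0.096], random_state=42,
)

# Step 3: 70:30 split; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Step 4: fit logistic regression and print the usual classification metrics.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

With a positive class this rare, accuracy alone says little, so the per-class precision and recall in the report are the figures to read first.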

##### 0. Preface

In [0]:
#!/usr/bin/python
# Date: 2020/04/25
# Code by Francisco Arruda Salido
# Version: 1.0.0

##### 1.a. Import the libraries

In [3]:
# import structures and data analysis libraries
import io
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# automatically render matplotlib figures inline in the notebook
%matplotlib inline



In [0]:
# import machine learning libraries
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

##### 1.b. Import the dataset

In [5]:
# Mount Google Drive so the dataset can be read from it
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [6]:
root_path = '/content/drive/My Drive/AIML/BankLoanCampaign/'  # path to the project folder (no shell escaping inside a Python string)
!ls /content/drive/My\ Drive/AIML/BankLoanCampaign/ # list the contents of the project folder

Bank_Personal_Loan_Modelling.csv


In [0]:
data = pd.read_csv('/content/drive/My Drive/AIML/BankLoanCampaign/Bank_Personal_Loan_Modelling.csv') # load the dataset from the project folder

###### 1.b Housekeeping

In [0]:
np.set_printoptions(precision=3, suppress=True) # Make numpy values easier to read.

pd.set_option("display.precision", 3) # Use 3 decimal places in output display

pd.set_option("display.expand_frame_repr", False) # Don't wrap repr(DataFrame) across additional lines

pd.set_option("display.max_rows", 25) # Set max rows displayed in output to 25
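
The settings above change the display for the whole session. Where a change is only wanted for a single output, `pd.option_context` applies it temporarily. A small sketch, with a made-up `frame` for illustration:

```python
import pandas as pd

# Made-up data, only to show the effect on the printed representation.
frame = pd.DataFrame({"x": [1.23456789, 2.3456789]})

# Inside the block, floats are shown with 3 decimal places.
with pd.option_context("display.precision", 3):
    short = repr(frame)

# Outside the block, the session-wide setting applies again.
full = repr(frame)

print(short)
print(full)
```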

##### 1.c Check Datatype

In [9]:
data.head()

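| | ID | Age | Experience | Income | ZIP Code | Family | CCAvg | Education | Mortgage | Personal Loan | Securities Account | CD Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |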


In [10]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   ID                  5000 non-null   int64  
 1   Age                 5000 non-null   int64  
 2   Experience          5000 non-null   int64  
 3   Income              5000 non-null   int64  
 4   ZIP Code            5000 non-null   int64  
 5   Family              5000 non-null   int64  
 6   CCAvg               5000 non-null   float64
 7   Education           5000 non-null   int64  
 8   Mortgage            5000 non-null   int64  
 9   Personal Loan       5000 non-null   int64  
 10  Securities Account  5000 non-null   int64  
 11  CD Account          5000 non-null   int64  
 12  Online              5000 non-null   int64  
 13  CreditCard          5000 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 547.0 KB
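
Every column is stored as a numeric dtype, although several of them (`Education` and the 0/1 flags) encode categories. One way to make that explicit is `astype` with a per-column mapping. A sketch on a hypothetical three-column `toy` frame, not the notebook's `data`:

```python
import pandas as pd

# Hypothetical miniature of the dataset, for illustration only.
toy = pd.DataFrame({
    "Education": [1, 2, 3, 1],
    "Online": [0, 1, 1, 0],
    "Income": [49, 34, 11, 100],
})

# Mark the coded columns as categorical; Income stays numeric.
toy = toy.astype({"Education": "category", "Online": "category"})

print(toy.dtypes)
```

Whether to do this before modelling is a design choice: scikit-learn estimators still expect numeric input, so the cast mainly helps summaries and plots treat these columns as groups.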


##### 1.d Statistical Summary

In [0]:
# Creating a copy of the Data for manipulation
df = data.copy()

In [12]:
df = df.drop('ZIP Code', axis=1) # drop the categorical column 'ZIP Code'
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ID | 5000.0 | 2500.5 | 1443.52 | 1.0 | 1250.75 | 2500.5 | 3750.25 | 5000.0 |
| Age | 5000.0 | 45.338 | 11.463 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 5000.0 | 20.105 | 11.468 | -3.0 | 10.0 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | 73.774 | 46.034 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| Family | 5000.0 | 2.396 | 1.148 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | 1.938 | 1.748 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | 1.881 | 0.84 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | 56.499 | 101.714 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Personal Loan | 5000.0 | 0.096 | 0.295 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Securities Account | 5000.0 | 0.104 | 0.306 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [13]:
# Rearrange the column order to move 'Personal Loan' to the end of the dataframe.
personal_loan = df['Personal Loan']
df.drop(['Personal Loan'], axis = 1,inplace = True)
df['Personal Loan'] = personal_loan
df.head()

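| | ID | Age | Experience | Income | Family | CCAvg | Education | Mortgage | Securities Account | CD Account | Online | CreditCard | Personal Loan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 1 | 49 | 4 | 1.6 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 45 | 19 | 34 | 3 | 1.5 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 3 | 39 | 15 | 11 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 35 | 9 | 100 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 35 | 8 | 45 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |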
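
The same reordering can be written in one line with `DataFrame.pop`, which removes a column and returns it, so assigning it back appends the column at the end. A sketch on a made-up `toy` frame:

```python
import pandas as pd

# Made-up frame with the target column in the middle.
toy = pd.DataFrame({"ID": [1, 2], "Personal Loan": [0, 1], "Age": [25, 45]})

# pop() drops the column and hands back its values; the assignment
# then re-creates it as the last column.
toy["Personal Loan"] = toy.pop("Personal Loan")

print(list(toy.columns))  # ['ID', 'Age', 'Personal Loan']
```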


1. Shape

In [14]:
df.shape # check the shape after dropping and rearranging columns

(5000, 13)

###### 1.e Check for invalid values

In [15]:
df.isnull().sum()

ID                    0
Age                   0
Experience            0
Income                0
Family                0
CCAvg                 0
Education             0
Mortgage              0
Securities Account    0
CD Account            0
Online                0
CreditCard            0
Personal Loan         0
dtype: int64

In [16]:
df.isnull().sum().any()

False

In [17]:
# Experience can't be negative; count the rows where it is
df[df['Experience'] <= -1]['Experience'].count()

52

In [0]:
# Replace only the negative values in 'Experience' with the mode (the median would be affected by the negative values)

experience_mode = df['Experience'].mode()

df['Experience'].replace(to_replace= df['Experience'][(df['Experience'] < 0)],value = experience_mode,inplace = True )

In [40]:
# Checking the new distribution in 'Experience'
df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ID | 5000.0 | 2500.5 | 1443.52 | 1.0 | 1250.75 | 2500.5 | 3750.25 | 5000.0 |
| Age | 5000.0 | 45.338 | 11.463 | 23.0 | 35.0 | 45.0 | 55.0 | 67.0 |
| Experience | 4948.0 | 20.331 | 11.312 | 0.0 | 10.75 | 20.0 | 30.0 | 43.0 |
| Income | 5000.0 | 73.774 | 46.034 | 8.0 | 39.0 | 64.0 | 98.0 | 224.0 |
| Family | 5000.0 | 2.396 | 1.148 | 1.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| CCAvg | 5000.0 | 1.938 | 1.748 | 0.0 | 0.7 | 1.5 | 2.5 | 10.0 |
| Education | 5000.0 | 1.881 | 0.84 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| Mortgage | 5000.0 | 56.499 | 101.714 | 0.0 | 0.0 | 0.0 | 101.0 | 635.0 |
| Securities Account | 5000.0 | 0.104 | 0.306 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| CD Account | 5000.0 | 0.06 | 0.238 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


In [41]:
# Checking again whether 'Experience' has any negative values
df[df['Experience'] <= -1]['Experience'].count()

0
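
The `Experience` count of 4948 in the summary above (5000 minus the 52 negative rows) suggests that the negative entries ended up as missing values rather than as the mode, which can happen when `replace` is given Series objects for both `to_replace` and `value`. A minimal sketch of mask-based imputation, assuming a toy frame (`toy` and its values are made up for illustration) and that the mode should be computed over the non-negative entries only:

```python
import pandas as pd

# Made-up column with two invalid (negative) entries.
toy = pd.DataFrame({"Experience": [-1, 5, 5, -3, 10, 20]})

# Boolean mask of the valid rows.
valid = toy["Experience"] >= 0

# mode() returns a Series (there can be ties), so take the first value.
experience_mode = toy.loc[valid, "Experience"].mode().iloc[0]

# Overwrite only the invalid rows with that scalar.
toy.loc[~valid, "Experience"] = experience_mode

print(toy["Experience"].tolist())       # [5, 5, 5, 5, 10, 20]
print(toy["Experience"].isnull().sum())  # 0
```

Assigning a scalar through `.loc` avoids index alignment altogether, so the row count stays unchanged after imputation.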

### 2. Exploratory Data Analysis:

In [50]:
df.apply(pd.Series.nunique)

ID                    5000
Age                     45
Experience              44
Income                 162
Family                   4
CCAvg                  108
Education                3
Mortgage               347
Securities Account       2
CD Account               2
Online                   2
CreditCard               2
Personal Loan            2
dtype: int64