## A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank

In this exercise, we build a logistic regression model that predicts a customer's propensity to purchase a term deposit. The exercise has three parts: preprocessing the data, training the model, and finally generating predictions, analyzing the metrics, and deriving strategies for improving the model.
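As a reminder of what "propensity" means here, the sketch below shows how a logistic regression turns a weighted sum of features into a probability between 0 and 1. The function names, weights, and feature values are illustrative assumptions and do not come from the bank data or from a fitted model.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def propensity(features, weights, bias):
    """Hypothetical helper: linear score followed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Made-up weights and features, chosen so the linear score is exactly 0.
p = propensity(features=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(p)
```

A score of zero corresponds to a propensity of 0.5; positive scores push the probability toward 1 and negative scores toward 0.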

### Data Preprocessing

In [1]:
#1 Import libraries and load the data
import pandas as pd
import altair as alt
bankData = pd.read_csv('bank-data-set.csv', sep=";")

In [2]:
#2 Import the model class and the train/test split helper
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [3]:
#3 Find the data types
bankData.dtypes

age           int64
job          object
marital      object
education    object
default      object
balance       int64
housing      object
loan         object
contact      object
day           int64
month        object
duration      int64
campaign      int64
pdays         int64
previous      int64
poutcome     object
y            object
dtype: object
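The dtype listing is what drives the next two steps: text columns become dummy variables and integer columns are kept as they are. The column lists can also be derived from the dtypes rather than typed by hand. The following is a minimal sketch on a toy frame; `toy`, its values, `cat_cols`, and `num_cols` are hypothetical stand-ins, not the bank data.

```python
import pandas as pd

# Toy stand-in with the same kinds of columns as bankData (values are made up).
toy = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "balance": [2143, 29, 2],
    "y": ["no", "no", "yes"],
})

# Non-numeric columns are the categorical candidates; the target y is excluded.
cat_cols = toy.select_dtypes(exclude="number").columns.drop("y").tolist()
# Numeric columns can be passed to the model unchanged.
num_cols = toy.select_dtypes(include="number").columns.tolist()

print(cat_cols)
print(num_cols)
```

Listing the columns explicitly, as the notebook does, is easier to audit; deriving them from dtypes is convenient when the schema is wide or changes over time.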

In [4]:
#4 Converting all categorical variables to dummy variables
bankCat = pd.get_dummies(bankData[['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'poutcome']])
bankCat.shape

(45211, 44)
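The 44 columns arise because `pd.get_dummies` creates one indicator column per distinct level of each categorical variable. A small sketch on made-up data illustrates the mechanics; the frame `toy` and its values are assumptions for illustration only. Passing `dtype=int` requests 0/1 integers, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Made-up categorical data with 3 marital levels and 2 housing levels.
toy = pd.DataFrame({
    "marital": ["married", "single", "married", "divorced"],
    "housing": ["yes", "yes", "no", "no"],
})

# One indicator column per level, named <column>_<level>.
dummies = pd.get_dummies(toy, dtype=int)

# The number of dummy columns equals the total number of distinct levels.
expected_cols = sum(toy[c].nunique() for c in toy.columns)

print(dummies.columns.tolist())
print(dummies.shape, expected_cols)
```

Within each original variable, exactly one indicator is 1 per row, which is why the indicators of a variable always sum to 1.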

In [5]:
#5 Separate the numerical variables
bankNum = bankData[['age', 'balance', 'day', 'duration', 'campaign', 'pdays', 'previous']]
bankNum.shape

(45211, 7)

In [6]:
#6 Preparing the X variables
X = pd.concat([bankCat, bankNum], axis=1)
print(X.shape)
# Preparing Y variable
Y = bankData['y']
print(Y.shape)
X.head()

(45211, 51)
(45211,)


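| | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | ... | poutcome_other | poutcome_success | poutcome_unknown | age | balance | day | duration | campaign | pdays | previous |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 58 | 2143 | 5 | 261 | 1 | -1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 1 | 44 | 29 | 5 | 151 | 1 | -1 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 33 | 2 | 5 | 76 | 1 | -1 | 0 |
| 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 47 | 1506 | 5 | 92 | 1 | -1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 33 | 1 | 5 | 198 | 1 | -1 | 0 |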
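The target `Y` is stored with dtype `object`, so it is worth inspecting before training. The sketch below checks the class shares and shows one way to encode the labels as 0/1. It assumes the labels are the strings "yes" and "no", which the output above does not show; `y_toy` and its values are made up.

```python
import pandas as pd

# Made-up target column; the labels "yes"/"no" are an assumption.
y_toy = pd.Series(["no", "no", "yes", "no"], name="y")

# Share of each class; a strong imbalance affects how accuracy should be read.
shares = y_toy.value_counts(normalize=True)
print(shares)

# Optional numeric encoding. scikit-learn classifiers also accept string labels.
y_binary = y_toy.map({"no": 0, "yes": 1})
print(y_binary.tolist())
```

Keeping the string labels is fine for `LogisticRegression`; the numeric encoding mainly helps when computing metrics or averages by hand.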


In [7]:
#7 Splitting the data into train and test sets:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=123)
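The call above holds out 30% of the rows at random, and `random_state=123` makes the split reproducible. If the two classes turn out to be imbalanced, one possible variation is to pass `stratify` so that both subsets keep the same class proportions. This is a sketch on made-up data, not the split used in this exercise; `X_toy` and `y_toy` are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, 10 per class.
X_toy = pd.DataFrame({"age": range(20, 40)})
y_toy = pd.Series(["yes", "no"] * 10, name="y")

# stratify=y_toy preserves the class proportions in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=123, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print(y_te.value_counts())
```

With 20 rows and `test_size=0.3`, the test set receives 6 rows and the training set 14, each split evenly between the two classes.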