## Portuguese Bank Telemarketing Analysis
<p>Part 1</p>

### Main objective:
<p>Predict customers' responses to the bank's telemarketing campaign and establish
a target customer profile for future marketing plans.</p>

<p>By analyzing customer features such as demographics and transaction history, the bank
will be able to predict customer saving behaviours and identify which types of customers
are more likely to make term deposits.</p>

### Loading the data

In [26]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hide library warnings in the notebook output
warnings.filterwarnings('ignore')

# bank-full.csv is semicolon-delimited
file = 'bank-full.csv'
dataset1 = pd.read_csv(file, sep=';')

In [27]:
dataset1.shape

(45211, 17)

<p>There are 45,211 observations in this dataset. Each represents an existing customer whom the
bank reached by phone. For each observation, the dataset records 16 input variables, a mix of
qualitative and quantitative attributes.</p>
<p>There is a single binary output variable, 'y', whose value ('yes' or 'no') records the outcome of the
phone call: whether the customer subscribed to a term deposit.</p>
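To make the split between inputs and output concrete, the sketch below hard-codes the five rows printed by `dataset1.head()` (so it runs without the CSV) and separates the 16 inputs by dtype. Treating numeric dtypes as the quantitative attributes and everything else as qualitative is an assumption of this sketch; on the full data the same calls would run on `dataset1` directly.

```python
import pandas as pd

# The five rows printed by dataset1.head(), hard-coded so the sketch runs without bank-full.csv
columns = ['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
           'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
           'previous', 'poutcome', 'y']
rows = [
    [58, 'management', 'married', 'tertiary', 'no', 2143, 'yes', 'no', 'unknown', 5, 'may', 261, 1, -1, 0, 'unknown', 'no'],
    [44, 'technician', 'single', 'secondary', 'no', 29, 'yes', 'no', 'unknown', 5, 'may', 151, 1, -1, 0, 'unknown', 'no'],
    [33, 'entrepreneur', 'married', 'secondary', 'no', 2, 'yes', 'yes', 'unknown', 5, 'may', 76, 1, -1, 0, 'unknown', 'no'],
    [47, 'blue-collar', 'married', 'unknown', 'no', 1506, 'yes', 'no', 'unknown', 5, 'may', 92, 1, -1, 0, 'unknown', 'no'],
    [33, 'unknown', 'single', 'unknown', 'no', 1, 'no', 'no', 'unknown', 5, 'may', 198, 1, -1, 0, 'unknown', 'no'],
]
sample = pd.DataFrame(rows, columns=columns)

# Separate the 16 inputs from the binary target 'y'
X = sample.drop(columns='y')
target = sample['y']

# Quantitative inputs: numeric columns; qualitative inputs: everything else
numeric_cols = X.select_dtypes(include='number').columns.tolist()
categorical_cols = X.select_dtypes(exclude='number').columns.tolist()

print(len(numeric_cols), numeric_cols)
print(len(categorical_cols), categorical_cols)
```

In these rows this yields 7 numeric inputs (`age`, `balance`, `day`, `duration`, `campaign`, `pdays`, `previous`) and 9 categorical ones, which together account for the 16 input variables.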

In [28]:
dataset1.head()

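| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | management | married | tertiary | no | 2143 | yes | no | unknown | 5 | may | 261 | 1 | -1 | 0 | unknown | no |
| 1 | 44 | technician | single | secondary | no | 29 | yes | no | unknown | 5 | may | 151 | 1 | -1 | 0 | unknown | no |
| 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | unknown | 5 | may | 76 | 1 | -1 | 0 | unknown | no |
| 3 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | unknown | 5 | may | 92 | 1 | -1 | 0 | unknown | no |
| 4 | 33 | unknown | single | unknown | no | 1 | no | no | unknown | 5 | may | 198 | 1 | -1 | 0 | unknown | no |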


In [29]:
dataset1.columns

Index(['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
       'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'y'],
      dtype='object')

### Cleaning the data

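<p>There are no missing values in the dataset; however, some columns contain placeholder values such as
'unknown' and 'other' that act like missing values. We therefore drop the rows whose 'poutcome' is
'other' and relabel 'unknown' in 'job' and 'education' as 'other'.</p>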
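Because the placeholders are ordinary strings, `isna()` cannot see them; a minimal way to find them is to count them per column. The sketch uses a few columns from the five rows printed earlier, hard-coded so it runs standalone; on the full data the equivalent call would be `dataset1.isin(['unknown', 'other']).sum()`.

```python
import pandas as pd

# A few columns from the five rows shown by dataset1.head() (hard-coded for illustration)
sample = pd.DataFrame({
    'job': ['management', 'technician', 'entrepreneur', 'blue-collar', 'unknown'],
    'education': ['tertiary', 'secondary', 'secondary', 'unknown', 'unknown'],
    'contact': ['unknown'] * 5,
    'poutcome': ['unknown'] * 5,
})

# isna() reports no gaps, because the placeholders are regular string values
print(sample.isna().sum().sum())

# Count the placeholder strings in each column instead
placeholders = ['unknown', 'other']
placeholder_counts = sample.isin(placeholders).sum()
print(placeholder_counts)
```

In these rows 'unknown' also fills every 'contact' and 'poutcome' entry, columns that the cleaning steps below leave unchanged.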

In [30]:
# Drop the rows whose 'poutcome' value is 'other'
condition = dataset1['poutcome'] == 'other'
dataset2 = dataset1.drop(dataset1[condition].index, axis=0)


In [31]:
# Replace 'unknown' in 'job' and 'education' with 'other'
dataset2[['job', 'education']] = dataset2[['job', 'education']].replace(['unknown'], 'other')

### Drop outliers in the column 'balance'

<p>To capture general trends in the dataset, outliers in the column "balance" are dropped.
Outliers are defined as values more than 3 standard deviations away from the mean (|z-score| > 3).</p>
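Concretely, each balance x is standardized as z = (x - mean) / std, and a row is dropped when |z| > 3. The sketch below applies that rule with pandas alone; `drop_zscore_outliers` is a hypothetical helper, and the balances are made-up toy values rather than rows from the dataset. It passes `ddof=0` because `scipy.stats.zscore` uses the population standard deviation by default, whereas `Series.std` defaults to `ddof=1`.

```python
import pandas as pd

def drop_zscore_outliers(df, column, threshold=3.0):
    """Hypothetical helper: drop rows whose |z-score| in `column` exceeds `threshold`."""
    values = df[column]
    # ddof=0 (population std) mirrors scipy.stats.zscore's default
    z = (values - values.mean()) / values.std(ddof=0)
    # Keep every row that is not flagged, matching a drop of the flagged index labels
    return df[~(z.abs() > threshold)]

# Hypothetical balances: 19 ordinary accounts plus one extreme one
balances = [120, 450, 80, 300, 0, 2143, 29, 2, 1506, 1,
            640, 75, 910, 230, 55, 1200, 15, 390, 700, 1_000_000]
toy = pd.DataFrame({'balance': balances})

kept = drop_zscore_outliers(toy, 'balance')
print(len(toy), len(kept))       # the extreme balance is the only row removed
print(kept['balance'].max())
```

On the real data, `drop_zscore_outliers(dataset2, 'balance')` should select the same rows as `dataset3`, up to floating-point rounding. The toy sample has 20 values on purpose: with ddof=0, no value in a sample of n values can have |z| greater than sqrt(n - 1), so a sample of ten or fewer values can never trigger a 3-standard-deviation cut.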

In [35]:
from scipy.stats import zscore

# Standardize 'balance': z = (balance - mean) / std, using scipy's default ddof=0
dataset2['balance_outliers'] = zscore(dataset2['balance'])

# Drop rows whose balance lies more than 3 standard deviations from the mean
condition1 = dataset2['balance_outliers'].abs() > 3
dataset3 = dataset2.drop(dataset2[condition1].index, axis=0)

In [36]:
# Remove the helper z-score column now that the outliers are gone
dataset4 = dataset3.drop('balance_outliers', axis=1)

### Creating and transforming data

In [37]:
# Rename the target column 'y' to 'response'
dataset4 = dataset4.rename(columns={'y': 'response'})

def convert(df, new_column, old_column):
    """Add a 0/1 column ('no' -> 0, any other value -> 1) and return its value counts."""
    df[new_column] = df[old_column].apply(lambda x: 0 if x == 'no' else 1)
    return df[new_column].value_counts()

convert(dataset4, "response_binary", "response")

0    37785
1     4870
Name: response_binary, dtype: int64
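The counts show a strong class imbalance. As a minimal sketch, the snippet below rebuilds a 'response' column from the printed counts (not from the CSV) to compute the share of positive responses, and shows a vectorized alternative to the `apply` call inside `convert()` that keeps the same rule: 'no' maps to 0, anything else to 1.

```python
import pandas as pd

# Rebuild the 'response' column from the counts printed above (37785 'no', 4870 'yes')
response = pd.Series(['no'] * 37785 + ['yes'] * 4870, name='response')

# Vectorized equivalent of convert(): 0 for 'no', 1 for anything else
response_binary = (response != 'no').astype(int)

counts = response_binary.value_counts()
positive_rate = response_binary.mean()  # mean of a 0/1 column = share of 1s

print(counts.to_dict())
print(f"{positive_rate:.1%} of customers said yes")
```

With roughly 11.4% positive responses, a model that always predicts 'no' would already be about 88.6% accurate, so accuracy alone says little about how well it identifies the customers who will subscribe.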