### Notebook for the Bank Marketing (Campaign) Project

Data Understanding Drive

In [1]:
# This cell is for importing relevant packages for data analysis
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

Data Set Information:

The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').

There are four datasets:

1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]

2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.

3) bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).

4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs).

The smaller datasets are provided for testing more computationally demanding machine learning algorithms (e.g., SVM).

The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y).

In [2]:
# Reading the bank-additional-full.csv dataset
bank_full = pd.read_csv('bank-additional-full.csv',sep=';')
bank_full.head()

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
1,57,services,married,high.school,unknown,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
2,37,services,married,high.school,no,yes,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
3,40,admin.,married,basic.6y,no,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
4,56,services,married,high.school,no,no,yes,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no


In [3]:
bank_full.shape 

(41188, 21)

In [4]:
# Print the dataset description file; the context manager closes the file automatically
with open('bank-additional-names.txt', 'r') as h:
    for line in h:
        print(line)

Citation Request:

  This dataset is publicly available for research. The details are described in [Moro et al., 2014]. 

  Please include this citation if you plan to use this database:



  [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, In press, http://dx.doi.org/10.1016/j.dss.2014.03.001



  Available at: [pdf] http://dx.doi.org/10.1016/j.dss.2014.03.001

                [bib] http://www3.dsi.uminho.pt/pcortez/bib/2014-dss.txt



1. Title: Bank Marketing (with social/economic context)



2. Sources

   Created by: Sérgio Moro (ISCTE-IUL), Paulo Cortez (Univ. Minho) and Paulo Rita (ISCTE-IUL) @ 2014

   

3. Past Usage:



  The full dataset (bank-additional-full.csv) was described and analyzed in:



  S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems (2014), doi:10.1016/j.dss.2014.03.001.

 

4. Relevant I

In [5]:
# Summary of the data frame information
bank_full.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             41188 non-null  int64  
 1   job             41188 non-null  object 
 2   marital         41188 non-null  object 
 3   education       41188 non-null  object 
 4   default         41188 non-null  object 
 5   housing         41188 non-null  object 
 6   loan            41188 non-null  object 
 7   contact         41188 non-null  object 
 8   month           41188 non-null  object 
 9   day_of_week     41188 non-null  object 
 10  duration        41188 non-null  int64  
 11  campaign        41188 non-null  int64  
 12  pdays           41188 non-null  int64  
 13  previous        41188 non-null  int64  
 14  poutcome        41188 non-null  object 
 15  emp.var.rate    41188 non-null  float64
 16  cons.price.idx  41188 non-null  float64
 17  cons.conf.idx   41188 non-null 

### Business Understanding

A term deposit is a fixed-term investment in which money is deposited into an account at a financial institution. Term deposits usually carry short-term maturities ranging from one month to a few years, with varying minimum deposit requirements.

For the bank, term deposits have a significant advantage over normal deposits: because the money is committed for a known period, the bank can lend or invest it more predictably, including in businesses with higher rates of return.
Additionally, the bank would like to reduce costs through efficient and effective marketing campaigns.


### Problem Description

ABC bank wants to sell term deposits to their customers, but they want to identify particular customers with a higher propensity to buy.

There's a need to identify these customers so they can focus their marketing campaigns efficiently and effectively.


### Data Cleaning & Transformation

In [6]:
# Examine NA Values
bank_full.isnull().sum() 

age               0
job               0
marital           0
education         0
default           0
housing           0
loan              0
contact           0
month             0
day_of_week       0
duration          0
campaign          0
pdays             0
previous          0
poutcome          0
emp.var.rate      0
cons.price.idx    0
cons.conf.idx     0
euribor3m         0
nr.employed       0
y                 0
dtype: int64
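The zero counts above only cover true NaN values. Several columns in this dataset use the string 'unknown' as a placeholder, which `isnull()` does not detect. A minimal sketch of counting that placeholder per column, using a small made-up frame (`toy` and its values are illustrative, not taken from the data):

```python
import pandas as pd

# Toy frame standing in for bank_full; the values are invented for illustration
toy = pd.DataFrame({
    'job': ['admin.', 'unknown', 'services', 'unknown'],
    'marital': ['married', 'single', 'unknown', 'single'],
    'age': [30, 41, 52, 38],
})

# isnull() finds nothing because 'unknown' is an ordinary string
nan_counts = toy.isnull().sum()

# Compare every cell with the placeholder and count matches per column
unknown_counts = toy.eq('unknown').sum()

print(nan_counts)
print(unknown_counts)
```

Running the same comparison on `bank_full` should list the columns that need a decision about 'unknown' values.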

In [7]:
# Inspecting the age column
# Questions: 
# Is age a factor for buying term deposits? Yes
# Is the column relevant to our future analysis? Yes

In [8]:
# Checking for unusual age values
# Using summary statistics
bank_full.age.describe()

count    41188.00000
mean        40.02406
std         10.42125
min         17.00000
25%         32.00000
50%         38.00000
75%         47.00000
max         98.00000
Name: age, dtype: float64

In [9]:
# Checking whether the column contains anything other than integers
# bank_full.age.dtype
bank_full.age.unique() 

array([56, 57, 37, 40, 45, 59, 41, 24, 25, 29, 35, 54, 46, 50, 39, 30, 55,
       49, 34, 52, 58, 32, 38, 44, 42, 60, 53, 47, 51, 48, 33, 31, 43, 36,
       28, 27, 26, 22, 23, 20, 21, 61, 19, 18, 70, 66, 76, 67, 73, 88, 95,
       77, 68, 75, 63, 80, 62, 65, 72, 82, 64, 71, 69, 78, 85, 79, 83, 81,
       74, 17, 87, 91, 86, 98, 94, 84, 92, 89])

In [10]:
# Create a new column for age groups 
bins = [0,19,40,60,100]
age_group = ['Teenager','Adults','Mid-Aged','Old']
# Create a new column
bank_full['age_group'] = pd.cut(bank_full['age'],bins=bins,labels=age_group)
bank_full[['age','age_group']].head() 

Unnamed: 0,age,age_group
0,56,Mid-Aged
1,57,Mid-Aged
2,37,Adults
3,40,Adults
4,56,Mid-Aged
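By default `pd.cut` builds right-closed intervals, so the bins above are (0, 19], (19, 40], (40, 60] and (60, 100]. That is why age 40 is labelled 'Adults' rather than 'Mid-Aged'. A small sketch on hand-picked boundary ages (the `ages` series is made up for illustration):

```python
import pandas as pd

# Boundary ages chosen to show which side of each bin edge they fall on
ages = pd.Series([17, 19, 20, 40, 41, 60, 61, 98])
bins = [0, 19, 40, 60, 100]
labels = ['Teenager', 'Adults', 'Mid-Aged', 'Old']

# Default right=True: each bin includes its upper edge and excludes its lower edge
groups = pd.cut(ages, bins=bins, labels=labels)

print(pd.DataFrame({'age': ages, 'age_group': groups}))
```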


In [11]:
bank_full.head(2) 

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y,age_group
0,56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,...,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no,Mid-Aged
1,57,services,married,high.school,unknown,no,no,telephone,may,mon,...,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no,Mid-Aged


In [12]:
# Inspecting the job column
# Questions:
# Is the job column crucial to buying term deposits? Yes

In [13]:
# Checking for the unique values in the job column
bank_full.job.unique() 

array(['housemaid', 'services', 'admin.', 'blue-collar', 'technician',
       'retired', 'management', 'unemployed', 'self-employed', 'unknown',
       'entrepreneur', 'student'], dtype=object)

Nothing looks unusual apart from the 'unknown' value. I need to understand what it represents and label it accordingly.

In [14]:
# Counting how often each category appears in the job column
bank_full['job'].value_counts()

admin.           10422
blue-collar       9254
technician        6743
services          3969
management        2924
retired           1720
entrepreneur      1456
self-employed     1421
housemaid         1060
unemployed        1014
student            875
unknown            330
Name: job, dtype: int64

In [15]:
# Encode the unknown values in the job column to others
# Remove the dot in admin.
bank_full['job'] = bank_full['job'].replace({'admin.':'admin','unknown':'others'})
bank_full['job'].value_counts()

admin            10422
blue-collar       9254
technician        6743
services          3969
management        2924
retired           1720
entrepreneur      1456
self-employed     1421
housemaid         1060
unemployed        1014
student            875
others             330
Name: job, dtype: int64

In [16]:
# Inspect the marital column
# Questions: 
# Can marital status influence buying term deposits? Yes

In [17]:
# Checking the unique values in the marital column
bank_full.marital.unique() 

array(['married', 'single', 'divorced', 'unknown'], dtype=object)

In [18]:
# The marital column
bank_full['marital'].value_counts() 

married     24928
single      11568
divorced     4612
unknown        80
Name: marital, dtype: int64

Legal marital status is commonly classified into the following categories:

1. Married

2. Separated

3. Divorced

4. Single

5. Widowed

It is possible that 'unknown' accounts for widowed and/or separated respondents. It may also mean that respondents are in a civil union, are cohabiting, or have refused to disclose their status.

Here, it is advisable to replace it with 'others'.

In [19]:
# Encode the unknown values in the marital column to others
bank_full['marital'] = bank_full['marital'].replace({'unknown':'others'})
bank_full['marital'].value_counts()

married     24928
single      11568
divorced     4612
others         80
Name: marital, dtype: int64

In [20]:
# Inspecting the Education column
# Is this column relevant to understanding customers
# and the decision they make? Yes

In [21]:
# The unique values in the education column
bank_full.education.unique() 

array(['basic.4y', 'high.school', 'basic.6y', 'basic.9y',
       'professional.course', 'unknown', 'university.degree',
       'illiterate'], dtype=object)

In Portugal, Basic Education consists of nine years of schooling divided into three sequential cycles of education of four, two and three years.

It is followed by secondary education, professional programmes, and university/polytechnic education.

In [22]:
# The Education column
bank_full['education'].value_counts() 

university.degree      12168
high.school             9515
basic.9y                6045
professional.course     5243
basic.4y                4176
basic.6y                2292
unknown                 1731
illiterate                18
Name: education, dtype: int64

In [23]:
# Encode the educational levels to represent the Portuguese educational system
bank_full['education'] = bank_full['education'].replace({'university.degree':'university/polytechnic',
                                                        'high.school':'secondary education',
                                                        'basic.4y':'1st basic',
                                                        'basic.6y':'2nd basic',
                                                        'basic.9y':'3rd basic',
                                                        'professional.course':'professional programmes'})
bank_full['education'].value_counts()

university/polytechnic     12168
secondary education         9515
3rd basic                   6045
professional programmes     5243
1st basic                   4176
2nd basic                   2292
unknown                     1731
illiterate                    18
Name: education, dtype: int64
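Since the relabelled levels follow the sequence of the Portuguese system, one option is to store them as an ordered categorical so that the order survives sorting and encoding. The sketch below is an assumption, not part of the original notebook: the list `education_order` is my own ranking (where 'professional programmes' sits relative to secondary education is a judgment call), and 'unknown' is deliberately left out so it becomes missing.

```python
import pandas as pd

# Assumed ranking from lowest to highest level; adjust if a different order is preferred
education_order = [
    'illiterate', '1st basic', '2nd basic', '3rd basic',
    'secondary education', 'professional programmes', 'university/polytechnic',
]

# Small made-up sample of relabelled education values
edu = pd.Series(['3rd basic', 'unknown', 'university/polytechnic', '1st basic'])

# Values not listed in categories (here 'unknown') become NaN with code -1
edu_cat = pd.Categorical(edu, categories=education_order, ordered=True)

print(edu_cat)
print(edu_cat.codes)
```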

In [24]:
# Inspecting the default column
# A default occurs when a borrower is unable to make timely payments, misses payments, 
# or avoids or stops making payments on interest or principal owed

In [25]:
# The unique values in the default columns
bank_full.default.unique() 

array(['no', 'unknown', 'yes'], dtype=object)

In [26]:
# The count for each category in the default column
bank_full['default'].value_counts()

no         32588
unknown     8597
yes            3
Name: default, dtype: int64

Only three customers are recorded as having defaulted, which seems unusually low.

Research generally shows a steady, non-trivial default rate, especially on credit cards.

I don't think the default column is complete or consistent.
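To put the counts above in proportion, the shares can be computed directly from them. The sketch rebuilds the counts by hand from the output of the previous cell; on the real frame, `bank_full['default'].value_counts(normalize=True)` should give the same figures.

```python
import pandas as pd

# Counts copied from the value_counts() output above
default_counts = pd.Series({'no': 32588, 'unknown': 8597, 'yes': 3})

# Share of each category among all rows
default_share = default_counts / default_counts.sum()

print(default_share.round(4))
```

Roughly one row in five is 'unknown', while recorded defaults make up well under 0.01% of the data.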

In [27]:
# Inspecting the housing column
# The housing column contains information on customers who have taken housing loans
bank_full.housing.unique() 

array(['no', 'yes', 'unknown'], dtype=object)

In [28]:
# The housing column
bank_full['housing'].value_counts() 

yes        21576
no         18622
unknown      990
Name: housing, dtype: int64

In [29]:
# Rename the column name from housing to housing loan
bank_full.rename(columns={'housing':'housing loan'},inplace=True)
bank_full.columns

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan', 'loan',
       'contact', 'month', 'day_of_week', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'emp.var.rate', 'cons.price.idx',
       'cons.conf.idx', 'euribor3m', 'nr.employed', 'y', 'age_group'],
      dtype='object')

In [30]:
# The personal loan column.
# This column contains categorical data 
# Categories stating if a customer took a personal loan

In [31]:
# The personal loan column
bank_full['loan'].value_counts() 

no         33950
yes         6248
unknown      990
Name: loan, dtype: int64

In [32]:
# Rename the loan column to personal loan
bank_full.rename(columns={'loan':'personal loan'},inplace=True)
bank_full.columns

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan',
       'personal loan', 'contact', 'month', 'day_of_week', 'duration',
       'campaign', 'pdays', 'previous', 'poutcome', 'emp.var.rate',
       'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y',
       'age_group'],
      dtype='object')

In [33]:
# Inspecting the contact column
# The contact column contains information on how respondents were contacted
bank_full.contact.unique() 

array(['telephone', 'cellular'], dtype=object)

In [34]:
# The contact column
bank_full['contact'].value_counts() 

cellular     26144
telephone    15044
Name: contact, dtype: int64

In [35]:
# I don't think this column is important
bank_full.drop(columns=['contact'],inplace=True)

In [36]:
# Inspect the month column
# This column contains information on the last month
# a customer was contacted

In [37]:
# The month column
bank_full['month'].value_counts() 

may    13769
jul     7174
aug     6178
jun     5318
nov     4101
apr     2632
oct      718
sep      570
mar      546
dec      182
Name: month, dtype: int64

In [38]:
# I don't see any relationship between month of contact and buying a term deposit
# I will drop this column
bank_full.drop(columns=['month'],inplace=True)
bank_full.columns 

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan',
       'personal loan', 'day_of_week', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'emp.var.rate', 'cons.price.idx',
       'cons.conf.idx', 'euribor3m', 'nr.employed', 'y', 'age_group'],
      dtype='object')
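One way to check a column against the target before dropping it is to compare the subscription rate across its categories. The sketch below uses a tiny invented frame (`toy`), so its numbers say nothing about the real data; it only shows the shape of the check.

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    'month': ['may', 'may', 'may', 'may', 'mar', 'mar', 'oct', 'oct'],
    'y':     ['no',  'no',  'no',  'yes', 'yes', 'no',  'yes', 'yes'],
})

# Row-normalised crosstab: each row sums to 1, so the 'yes' column is the subscription rate
rate_by_month = pd.crosstab(toy['month'], toy['y'], normalize='index')['yes']

print(rate_by_month)
```

If the rates are close across categories, dropping the column loses little; large differences would suggest keeping it for EDA.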

In [39]:
# Inspecting the day of week column
# Contains information on the last day of the week the customer was contacted

In [40]:
# The day_of_week column
bank_full['day_of_week'].value_counts() 

thu    8623
mon    8514
wed    8134
tue    8090
fri    7827
Name: day_of_week, dtype: int64

In [41]:
# I don't think this column is necessary for the overall model
bank_full.drop(columns=['day_of_week'],inplace=True)
bank_full.columns

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan',
       'personal loan', 'duration', 'campaign', 'pdays', 'previous',
       'poutcome', 'emp.var.rate', 'cons.price.idx', 'cons.conf.idx',
       'euribor3m', 'nr.employed', 'y', 'age_group'],
      dtype='object')

In [42]:
# Inspecting the duration column
# last contact duration, in seconds (numeric). 
# Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). 
# Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. 
# Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to 
# have a realistic predictive model

In [43]:
# The duration column
bank_full.duration.unique() 

array([ 261,  149,  226, ..., 1246, 1556, 1868])

In [44]:
# I will drop this column based on the advice in the dataset description
bank_full.drop(columns=['duration'],inplace=True)
bank_full.columns

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan',
       'personal loan', 'campaign', 'pdays', 'previous', 'poutcome',
       'emp.var.rate', 'cons.price.idx', 'cons.conf.idx', 'euribor3m',
       'nr.employed', 'y', 'age_group'],
      dtype='object')
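The dataset note allows `duration` for benchmarking but not for a realistic model, because it is only known once the call has ended. An alternative to dropping it outright is to keep two feature sets side by side. This is a sketch on an invented frame; the names `benchmark_features` and `realistic_features` are hypothetical.

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    'age': [30, 45, 52],
    'duration': [0, 261, 1246],
    'y': ['no', 'no', 'yes'],
})

target = toy['y']

# Benchmark set: keeps duration, usable only to gauge an upper bound on performance
benchmark_features = toy.drop(columns=['y'])

# Realistic set: only inputs that are known before the call is made
realistic_features = toy.drop(columns=['y', 'duration'])

print(list(benchmark_features.columns))
print(list(realistic_features.columns))
```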

In [45]:
# Inspecting the campaign column
# Campaign contains number of contacts performed during the campaign
bank_full['campaign'].head()

0    1
1    1
2    1
3    1
4    1
Name: campaign, dtype: int64

In [46]:
# The unique values in the campaign column
bank_full.campaign.unique() 

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 19, 18, 23, 14,
       22, 25, 16, 17, 15, 20, 56, 39, 35, 42, 28, 26, 27, 32, 21, 24, 29,
       31, 30, 41, 37, 40, 33, 34, 43])

In [47]:
# I don't think this column is relevant
bank_full.drop(columns=['campaign'],inplace=True)
bank_full.columns

Index(['age', 'job', 'marital', 'education', 'default', 'housing loan',
       'personal loan', 'pdays', 'previous', 'poutcome', 'emp.var.rate',
       'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y',
       'age_group'],
      dtype='object')

In [48]:
# Inspecting the pdays column
# Number of days that passed by after the client was last contacted from 
# a previous campaign (numeric; 999 means client was not previously contacted)

In [49]:
# The pdays column
# 999 means client was not previously contacted
bank_full['pdays'].value_counts().head()

999    39673
3        439
6        412
4        118
9         64
Name: pdays, dtype: int64

In [50]:
bank_full.pdays.describe() 

count    41188.000000
mean       962.475454
std        186.910907
min          0.000000
25%        999.000000
50%        999.000000
75%        999.000000
max        999.000000
Name: pdays, dtype: float64

In [51]:
# Create a new column for pdays groups
# Caution: with pd.cut's defaults the first bin is (0, 500], so pdays == 0 gets no label (NaN)
bins = [0,500,999]
pdays_group = ['Contacted','Never Contacted']
bank_full['pdays_group'] = pd.cut(bank_full['pdays'],bins=bins,labels=pdays_group)
bank_full[['pdays','pdays_group']].head(10) 

Unnamed: 0,pdays,pdays_group
0,999,Never Contacted
1,999,Never Contacted
2,999,Never Contacted
3,999,Never Contacted
4,999,Never Contacted
5,999,Never Contacted
6,999,Never Contacted
7,999,Never Contacted
8,999,Never Contacted
9,999,Never Contacted


In [52]:
bank_full[['pdays','pdays_group']].tail(10)

Unnamed: 0,pdays,pdays_group
41178,6,Contacted
41179,999,Never Contacted
41180,999,Never Contacted
41181,999,Never Contacted
41182,9,Contacted
41183,999,Never Contacted
41184,999,Never Contacted
41185,999,Never Contacted
41186,999,Never Contacted
41187,999,Never Contacted


In [53]:
bank_full['pdays_group'].value_counts()

Never Contacted    39673
Contacted           1500
Name: pdays_group, dtype: int64
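The two counts above add up to 41,173, which is 15 fewer than the 41,188 rows in the frame. With the default `pd.cut` settings the first interval is (0, 500], so rows with `pdays == 0` (the column minimum) fall outside every bin and are left as NaN. A minimal sketch of the effect and one possible fix with `include_lowest=True`, on a small made-up series:

```python
import pandas as pd

# Made-up pdays values, including the edge case 0
pdays = pd.Series([0, 3, 6, 27, 999])
bins = [0, 500, 999]
labels = ['Contacted', 'Never Contacted']

# Default behaviour: 0 is excluded from (0, 500] and becomes NaN
default_cut = pd.cut(pdays, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is labelled too
inclusive_cut = pd.cut(pdays, bins=bins, labels=labels, include_lowest=True)

print(default_cut.isna().sum())
print(inclusive_cut.isna().sum())
```

Because 999 is a sentinel rather than a real number of days, comparing `pdays` with 999 directly would be another way to build the same two groups.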

In [54]:
# Inspecting the previous column
# Previous: Number of contacts performed before this campaign and for this client (numeric)

In [55]:
# The previous column
bank_full['previous'].value_counts() 

0    35563
1     4561
2      754
3      216
4       70
5       18
6        5
7        1
Name: previous, dtype: int64

I will keep the previous column and do EDA on it to verify its importance

In [56]:
# Inspecting the poutcome column
# Poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

In [57]:
# The poutcome column
bank_full['poutcome'].value_counts() 

nonexistent    35563
failure         4252
success         1373
Name: poutcome, dtype: int64

In [58]:
# social and economic context attributes
# Inspecting the emp.var.rate column
# emp.var.rate: employment variation rate - quarterly indicator (numeric)
# What is the employment variation rate?
# My reading: it measures how the overall level of employment changes from quarter
# to quarter (hiring expanding or contracting with the economic cycle), rather than
# how often an individual employee changes jobs

In [59]:
# The emp.var.rate column
bank_full['emp.var.rate'].value_counts()

 1.4    16234
-1.8     9184
 1.1     7763
-0.1     3683
-2.9     1663
-3.4     1071
-1.7      773
-1.1      635
-3.0      172
-0.2       10
Name: emp.var.rate, dtype: int64

In [60]:
# Inspecting the consumer price index column
# cons.price.idx: is a measure that examines the weighted average of prices 
# of a basket of consumer goods and services, such as transportation, food, and medical care. 
# It is calculated by taking price changes for each item in the predetermined basket of goods and averaging them

In [61]:
# The cons.price.idx column
bank_full['cons.price.idx'].value_counts().head()

93.994    7763
93.918    6685
92.893    5794
93.444    5175
94.465    4374
Name: cons.price.idx, dtype: int64

In [62]:
# Inspecting the cons.conf.idx column
# What is cons.conf.idx: the consumer confidence index. Rising confidence suggests
# consumers are spending more, which tends to accompany economic growth. Falling
# confidence implies slowing economic growth, so consumers are likely to decrease their spending.

In [63]:
# The cons.conf.idx column
bank_full['cons.conf.idx'].value_counts().head()

-36.4    7763
-42.7    6685
-46.2    5794
-36.1    5175
-41.8    4374
Name: cons.conf.idx, dtype: int64

In [64]:
# Inspecting the euribor3m column
# What is euribor3m: the 3-month Euro Interbank Offered Rate (Euribor), a daily reference rate
# published by the European Money Markets Institute, based on the averaged interest rates
# at which Eurozone banks offer to lend unsecured funds to other banks in the euro wholesale
# money market (or interbank market)

In [65]:
# The euribor3m column
bank_full['euribor3m'].value_counts().head() 

4.857    2868
4.962    2613
4.963    2487
4.961    1902
4.856    1210
Name: euribor3m, dtype: int64

In [66]:
# Inspecting the number of employees column

In [67]:
# The number of employees column value count
bank_full['nr.employed'].value_counts()

5228.1    16234
5099.1     8534
5191.0     7763
5195.8     3683
5076.2     1663
5017.5     1071
4991.6      773
5008.7      650
4963.6      635
5023.5      172
5176.3       10
Name: nr.employed, dtype: int64

In [68]:
# Inspecting the y column
# y: has the client subscribed to a term deposit? (binary: 'yes','no')

In [69]:
# The y column
bank_full['y'].value_counts()

no     36548
yes     4640
Name: y, dtype: int64
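The target is clearly imbalanced. The sketch below works out the class shares from the counts above; the series is rebuilt by hand from that output, and `baseline_accuracy` is a name introduced here for illustration.

```python
import pandas as pd

# Counts copied from the value_counts() output above
y_counts = pd.Series({'no': 36548, 'yes': 4640})

# Share of each class among all rows
y_share = y_counts / y_counts.sum()

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = y_share.max()

print(y_share.round(4))
print(round(baseline_accuracy, 4))
```

About 11% of clients subscribed, so a model that always answers 'no' would already be right about 89% of the time. Accuracy alone is therefore a weak yardstick for this problem.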

In [70]:
# Checking for duplicate values
bank_full.duplicated().sum()

5853

In [71]:
# Examine the rows with duplicate values
bank_full_dup = bank_full.loc[bank_full.duplicated(),:]

In [72]:
bank_full_dup.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5853 entries, 10 to 41157
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   age             5853 non-null   int64   
 1   job             5853 non-null   object  
 2   marital         5853 non-null   object  
 3   education       5853 non-null   object  
 4   default         5853 non-null   object  
 5   housing loan    5853 non-null   object  
 6   personal loan   5853 non-null   object  
 7   pdays           5853 non-null   int64   
 8   previous        5853 non-null   int64   
 9   poutcome        5853 non-null   object  
 10  emp.var.rate    5853 non-null   float64 
 11  cons.price.idx  5853 non-null   float64 
 12  cons.conf.idx   5853 non-null   float64 
 13  euribor3m       5853 non-null   float64 
 14  nr.employed     5853 non-null   float64 
 15  y               5853 non-null   object  
 16  age_group       5853 non-null   category
 17  pdays_group 

In [73]:
bank_full_dup.iloc[:3]

Unnamed: 0,age,job,marital,education,default,housing loan,personal loan,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y,age_group,pdays_group
10,41,blue-collar,married,unknown,unknown,no,no,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no,Mid-Aged,Never Contacted
11,25,services,single,secondary education,no,yes,no,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no,Adults,Never Contacted
16,35,blue-collar,married,2nd basic,no,yes,no,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no,Adults,Never Contacted


In [74]:
# After checking the duplicate rows, I have decided to drop them
bank_full.drop_duplicates(inplace=True)
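A caveat on this step: the frame has no client identifier, and the duplicate check runs after several columns were dropped, so rows that differed only in a dropped column now look identical. Some of the removed rows may therefore be distinct contacts rather than true repeats. A small sketch of the effect on invented data:

```python
import pandas as pd

# Invented rows: the first two differ only in 'month'
toy = pd.DataFrame({
    'age': [30, 30, 45],
    'job': ['admin', 'admin', 'services'],
    'month': ['may', 'jun', 'may'],
})

# With all columns present, no row repeats an earlier one
before = toy.duplicated().sum()

# After dropping 'month', the second row becomes a duplicate of the first
after = toy.drop(columns=['month']).duplicated().sum()

print(before, after)
```

Running `duplicated()` on the freshly loaded file, before any columns are removed, would show how many rows are identical on every original field.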

In [75]:
bank_full.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 35335 entries, 0 to 41187
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   age             35335 non-null  int64   
 1   job             35335 non-null  object  
 2   marital         35335 non-null  object  
 3   education       35335 non-null  object  
 4   default         35335 non-null  object  
 5   housing loan    35335 non-null  object  
 6   personal loan   35335 non-null  object  
 7   pdays           35335 non-null  int64   
 8   previous        35335 non-null  int64   
 9   poutcome        35335 non-null  object  
 10  emp.var.rate    35335 non-null  float64 
 11  cons.price.idx  35335 non-null  float64 
 12  cons.conf.idx   35335 non-null  float64 
 13  euribor3m       35335 non-null  float64 
 14  nr.employed     35335 non-null  float64 
 15  y               35335 non-null  object  
 16  age_group       35335 non-null  category
 17  pdays_group 

In [76]:
bank_full.shape

(35335, 18)