# Term deposit marketing
## Background
We are a small startup focusing mainly on providing machine learning solutions in the European banking market. We work on a variety of problems including fraud detection, sentiment classification and customer intention prediction and classification.

We are interested in developing a robust machine learning system that leverages information coming from call center data.

Ultimately, we are looking for ways to improve the success rate of calls made to customers for any product that our clients offer. Toward this goal, we are designing an ever-evolving machine learning product that achieves a high success rate while remaining interpretable, so that our clients can make informed decisions.

## Data description
The data comes from direct marketing efforts of a European banking institution. The marketing campaign involves making a phone call to a customer, often multiple times to ensure a product subscription, in this case a term deposit. Term deposits are usually short-term deposits with maturities ranging from one month to a few years. The customer must understand when buying a term deposit that they can withdraw their funds only after the term ends. All customer information that might reveal personal information is removed due to privacy concerns.

**Attributes:**
- age: age of customer (numeric)
- job: type of job (categorical)
- marital: marital status (categorical)
- education: education level (categorical)
- default: has credit in default? (binary)
- balance: average yearly balance, in euros (numeric)
- housing: has a housing loan? (binary)
- loan: has personal loan? (binary)
- contact: contact communication type (categorical)
- day: last contact day of the month (numeric)
- month: last contact month of year (categorical)
- duration: last contact duration, in seconds (numeric)
- campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

**Output (Desired target)**
- y: has the client subscribed to a term deposit? (binary)
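
Most models need the binary and categorical attributes above converted to numbers first. The following is a minimal sketch of one way to do that, not the notebook's actual preprocessing. The rows in `toy` are made up for illustration, and the sketch assumes binary columns are stored as the strings `"yes"` and `"no"`.

```python
import pandas as pd

# Made-up rows that reuse a few of the attribute names listed above.
toy = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "admin"],
    "marital": ["married", "single", "married"],
    "housing": ["yes", "yes", "no"],
    "loan": ["no", "no", "yes"],
    "y": ["no", "no", "yes"],
})

binary_cols = ["housing", "loan", "y"]   # assumed to hold "yes"/"no" strings
categorical_cols = ["job", "marital"]

encoded = toy.copy()
# Map the binary columns to 0/1.
for col in binary_cols:
    encoded[col] = encoded[col].map({"no": 0, "yes": 1})

# One-hot encode the multi-valued categorical columns.
encoded = pd.get_dummies(encoded, columns=categorical_cols, dtype=int)

print(encoded.head())
```

One-hot encoding avoids implying an order among categories such as job types, at the cost of adding one column per category.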

**Download data**
- https://drive.google.com/file/d/1EW-XMnGfxn-qzGtGPa3v_C63Yqj2aGf7

## Goals
- Predict if the customer will subscribe (yes/no) to a term deposit (variable y)

## Success metric(s)
- Reach 81% or higher accuracy, evaluated with 5-fold cross-validation and reported as the average score across folds.
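
The success metric can be sketched as follows: run 5-fold cross-validation, score each fold on accuracy, and average the five scores. The data here is synthetic (generated with `make_classification`), and `LogisticRegression` is only a placeholder model. Neither reflects the real dataset or the final model choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real features and target (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

model = LogisticRegression(max_iter=1000)  # placeholder model

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

mean_accuracy = scores.mean()
meets_target = mean_accuracy >= 0.81

print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", round(mean_accuracy, 3), "| meets 81% target:", meets_target)
```

If the target classes turn out to be imbalanced, it is worth comparing the mean accuracy against the share of the majority class, since always predicting the majority class already achieves that score.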

## Bonus(es)

- We are also interested in finding customers who are more likely to buy the investment product. Determine the segment(s) of customers our client should prioritize.
- What makes customers buy? Tell us which features we should focus on.
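
One possible starting point for the feature question is to rank features by the impurity-based importances of a tree ensemble. The sketch below uses synthetic data and hypothetical feature names (`feature_0` to `feature_5`). It illustrates the mechanics only and says nothing about which real attributes matter.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names and synthetic data (assumptions).
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X, columns=feature_names)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can favor features with many distinct values, so they are best treated as a first look rather than a final answer.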


# Data Wrangling
## Imports
Place imports at the start of the notebook so that you only need to consult one place to check your notebook dependencies.

In [2]:
```python
# Load libraries
import warnings
from collections import Counter

warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, r2_score,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import (GridSearchCV, KFold, RandomizedSearchCV,
                                     StratifiedKFold, cross_val_score,
                                     cross_validate, learning_curve,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
```

**Load the telemarketing data**

In [5]:
```python
#df = pd.read_csv('bank-full.csv')

# Set a path variable to the directory containing the bank telemarketing data: path
path = '../raw_data'
# Concatenate the directory path and the file name: tele_path
tele_path = path + '/term-deposit-marketing-2020.csv'
# Load the data file into a DataFrame: df
df = pd.read_csv(tele_path)
```
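
As an alternative to string concatenation, the path can be built with `pathlib`, which handles separators across operating systems. This is a minimal sketch. The helper `load_telemarketing` is hypothetical and not part of the original notebook; it only adds a clearer error when the file is missing.

```python
from pathlib import Path

import pandas as pd

# Same directory and file name as in the cell above.
RAW_DATA_DIR = Path("../raw_data")
tele_path = RAW_DATA_DIR / "term-deposit-marketing-2020.csv"


def load_telemarketing(csv_path):
    """Hypothetical helper: load the CSV, failing early if it is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected the data file at {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Usage, assuming the file exists at that location:
# df = load_telemarketing(tele_path)
```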

## View and inspect data

In [6]:
```python
# Get the number of rows and columns of our dataset: df.shape
df.shape
```

(40000, 14)
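
Beyond the shape, a first inspection usually also covers column types, missing values, and the balance of the target. The sketch below shows those checks on a small made-up frame named `toy`. The values are illustrative only, and the same calls would apply to `df`.

```python
import pandas as pd

# Made-up rows for illustration; the real data is in df.
toy = pd.DataFrame({
    "age": [58, 44, 33, 47],
    "balance": [2143, 29, 2, 1506],
    "job": ["management", "technician", "admin", "blue-collar"],
    "y": ["no", "no", "no", "yes"],
})

n_rows, n_cols = toy.shape                 # rows and columns
column_types = toy.dtypes                  # numeric vs. text columns
missing_per_column = toy.isna().sum()      # count of missing values per column
class_share = toy["y"].value_counts(normalize=True)  # share of each target class

print(f"{n_rows} rows, {n_cols} columns")
print(column_types)
print(missing_per_column)
print(class_share)
```

The class shares matter for the accuracy target: a heavily skewed target makes a high accuracy easier to reach without a useful model.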