# Bank marketing campaigns dataset - Opening Term Deposit

## Problem Statement

#### Bank marketing campaigns dataset analysis: Opening a Term Deposit. This dataset describes the results of a Portuguese bank's marketing campaigns. The campaigns were conducted mostly through direct phone calls offering clients a term deposit. If the client agreed to place a deposit after all marketing efforts, the target variable is marked 'yes'; otherwise it is 'no'.

#### Goal: identify potential customers who are likely to open a term deposit with the bank.

## Packages used:-

In [None]:
import pandas as pd

import numpy as np

import scipy

In [None]:
from sklearn.linear_model import LogisticRegression

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier

from sklearn.neighbors import KNeighborsClassifier

from sklearn.svm import SVC

from sklearn.naive_bayes import GaussianNB

In [None]:
from sklearn.ensemble import BaggingClassifier

from sklearn.ensemble import AdaBoostClassifier

from sklearn.ensemble import GradientBoostingClassifier

from sklearn.ensemble import RandomForestClassifier

In [None]:
from sklearn.preprocessing import LabelEncoder

from sklearn.model_selection import train_test_split,KFold

from sklearn.model_selection import GridSearchCV,RandomizedSearchCV

from imblearn.over_sampling import SMOTE

from scipy.stats import skew,kurtosis,boxcox,boxcox_normmax

In [None]:
from sklearn import metrics

from sklearn.metrics import accuracy_score,precision_score,roc_auc_score,f1_score,recall_score,auc,make_scorer

from sklearn.metrics import confusion_matrix,classification_report

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

import statsmodels.api as sm

import seaborn as sns

## Algorithms Used:-

1.     Logistic Regression
           1.1 LR Bagging Classifier
           1.2 LR Adaboost Classifier
           1.3 LR Kfold Bagging Classifier
2.     Decision Tree
           2.1 Decision Tree - Gini - Regularized
           2.2 Decision Tree - Entropy - Regularized
           2.3 Decision Tree - Bagging Classifier
           2.4 Decision Tree - Adaboost Classifier
           2.5 Decision Tree - Gradient Classifier
3.     Random Forest
           3.1 Random Forest - Entropy - Regularized
           3.2 Random Forest - Kfold Bagging Classifier
           3.3 Random Forest - Adaboost Classifier
           3.4 Random Forest - Gradient Classifier
4.     KNN
           4.1 KNN - Bagging Classifier
           4.2 KNN - Parameter Optimization
5.     SVM
6.     Naive Bayes - GaussianNB

## Import dataset

In [None]:
df = pd.read_csv('../input/bank-marketing-campaigns-dataset/bank-additional-full.csv',sep=';')

In [None]:
df

## Data Preprocessing or Data Cleaning:-

### Check duplicates:-

In [None]:
df[df.duplicated()]

### Remove duplicates:-

In [None]:
df.drop_duplicates(inplace=True)

### Resetting Index:-

In [None]:
df.reset_index(inplace=True)

In [None]:
df.drop('index',axis=1,inplace=True)

In [None]:
df

### Null Check:-

In [None]:
df.isnull().sum()

### Datatype Check:-

In [None]:
df.info()

### Stats Check:-

In [None]:
df.describe()

### Columns Check:-

In [None]:
df.columns

In [None]:
df

#### Age:-
* Check whether outliers, skewness, or kurtosis are present.
* Outliers are data points that deviate significantly from the rest of the data, i.e. points that fall a long way from the other data points.

Note: If any outliers, skewness, or excess kurtosis are detected, apply a Box-Cox transformation.
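
As a rough illustration of what the boxplot flags as an outlier, here is a minimal plain-Python sketch of the usual 1.5 * IQR rule. The function name `iqr_outliers` and the sample ages are hypothetical and are not taken from the dataset.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # The "inclusive" method uses linear interpolation between data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages, for illustration only
ages = [25, 28, 30, 31, 33, 35, 36, 38, 40, 42, 95]
print(iqr_outliers(ages))  # [95]
```

Points beyond the whiskers of a boxplot are typically the ones this rule returns.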

In [None]:
plt.figure(figsize=(8,6))
sns.histplot(df.age, kde=True)  # distplot is deprecated in recent seaborn releases
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Distribution of age')

**The distribution is slightly skewed to the right.**
Ideally it should follow a normal distribution.

In [None]:
plt.figure(figsize=(8,6))
sns.boxplot(x=df.age)  # pass the column by keyword so it works across seaborn versions
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Boxplot - Outliers Detection')

**Outliers are detected, so we apply a Box-Cox transformation.**

### Skewness & Kurtosis
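* #### skewness = 0 : symmetric, as in a normal distribution.
* #### skewness > 0 : more weight in the right tail of the distribution.
* #### skewness < 0 : more weight in the left tail of the distribution.
* #### Kurtosis = 3 : tails like a normal distribution - Mesokurtic
* #### Kurtosis > 3 : heavier tails than a normal distribution - Leptokurtic
* #### Kurtosis < 3 : lighter tails than a normal distribution - Platykurtic
* #### Note : `scipy.stats.kurtosis` returns excess kurtosis (kurtosis - 3) by default, so a normal distribution gives a value near 0.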
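
To make the definitions above concrete, here is a small plain-Python sketch of moment-based skewness and excess kurtosis. The helper names are hypothetical; the formulas are the standard biased (population) estimators, which as far as I understand are what `scipy.stats.skew` and `scipy.stats.kurtosis` compute with their default arguments.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def skewness(values):
    """Third standardized moment: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]          # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness(symmetric))       # 0.0
print(skewness(right_skewed))    # positive value
print(excess_kurtosis(symmetric))
```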

In [None]:
import scipy #Scientific Python
print('Skewness',scipy.stats.skew(df.age))

In [None]:
import scipy #Scientific Python
print('Kurtosis',scipy.stats.kurtosis(df.age))

### Boxcox Transformation Technique:-
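
For reference, the Box-Cox transform maps each positive value x to (x^lambda - 1) / lambda, or to log(x) when lambda is 0. The sketch below applies that formula in plain Python for a given lambda; the function name `boxcox_transform` is hypothetical, and choosing the best lambda (what `boxcox_normmax` does) is not shown.

```python
from math import log

def boxcox_transform(values, lmbda):
    """Apply the Box-Cox formula for a fixed lambda to strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return [log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]

# Hypothetical values; lambda = 0.5 behaves like a shifted square root
print(boxcox_transform([1.0, 4.0, 9.0], 0.5))
```

A lambda below 1 compresses large values more than small ones, which is why it can reduce right skew.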

In [None]:
from scipy.stats import boxcox,boxcox_normmax

In [None]:
df.age = boxcox(df.age,boxcox_normmax(df.age))

#### After Boxcox Transform:-

In [None]:
plt.figure(figsize=(8,6))
sns.histplot(df.age, kde=True)  # distplot is deprecated in recent seaborn releases
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Distribution of age - After Boxcox transformation')

**Note :- The distribution still does not look normal after the Box-Cox transformation, so dropping this column may be the better option.**

In [None]:
plt.figure(figsize=(8,6))
sns.boxplot(x=df.age)  # pass the column by keyword so it works across seaborn versions
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Boxplot for Outliers Detection - After Boxcox transformation')

#### After the Box-Cox transformation, the boxplot no longer shows outliers

### Skewness & Kurtosis after Boxcox Transformation

In [None]:
print('Skewness after Boxcox',scipy.stats.skew(df.age))
print('Kurtosis after Boxcox',scipy.stats.kurtosis(df.age))


#### Both skewness and kurtosis are closer to 0 after the Box-Cox transformation.

* Before Boxcox:-
    *     Skewness 0.7845316793906337
    *     Kurtosis 0.7908715485573286
* After Boxcox :-
    *     Skewness -0.006389818305811041
    *     Kurtosis -0.38321858182694646

### Convert all categorical variables into numerical form
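
The encodings in this section all follow the same idea: sort the distinct labels and number them from 0. A minimal plain-Python sketch of that mapping is shown here; the function name `fit_label_codes` and the sample values are hypothetical.

```python
def fit_label_codes(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}

# Hypothetical sample of a categorical column
marital = ['married', 'single', 'divorced', 'married', 'unknown']
codes = fit_label_codes(marital)
encoded = [codes[v] for v in marital]

print(codes)    # {'divorced': 0, 'married': 1, 'single': 2, 'unknown': 3}
print(encoded)  # [1, 2, 0, 1, 3]
```

Note that integer codes impose an arbitrary order on the categories, which some models (for example logistic regression or KNN) may treat as meaningful.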

#### Job:-

In [None]:
df.job.value_counts()

In [None]:
df.job.unique()

In [None]:
from sklearn.preprocessing import LabelEncoder
l_enc = LabelEncoder()

In [None]:
df.job = l_enc.fit_transform(df.job)

In [None]:
df.job.unique()

#### Marital:-

In [None]:
print(df.marital.unique())

In [None]:
print(sorted(df.marital.unique()))

In [None]:
df.marital = df.marital.replace(['divorced', 'married', 'single', 'unknown'],[0,1,2,3])

#### Education:-

In [None]:
df.education.unique()

In [None]:
print(sorted(df.education.unique()))

In [None]:
df.education = df.education.replace(['basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown'],
                                    [0,1,2,3,4,5,6,7])

In [None]:
df.education.unique()

#### Default:-

In [None]:
df.default.unique()

In [None]:
print(sorted(df.default.unique()))

In [None]:
df.default = df.default.replace(['no', 'unknown', 'yes'],[0,1,2])

#### Housing:-

In [None]:
df.housing = df.housing.replace(['no', 'unknown', 'yes'],[0,1,2])

#### Loan:-

In [None]:
df.loan = df.loan.replace(['no', 'unknown', 'yes'],[0,1,2])

#### Contact:-

In [None]:
df.contact = df.contact.replace(['telephone', 'cellular'],[1,0])

#### Month:-

In [None]:
print(sorted(df.month.unique()))

In [None]:
df.month = df.month.replace(['apr', 'aug', 'dec', 'jul', 'jun', 'mar', 'may', 'nov', 'oct', 'sep'],range(0,10))

#### Day_of_week:-

In [None]:
print(sorted(df.day_of_week.unique()))

In [None]:
df.day_of_week = df.day_of_week.replace(['fri', 'mon', 'thu', 'tue', 'wed'],[0,1,2,3,4])

#### Poutcome:-

In [None]:
print(sorted(df.poutcome.unique()))

In [None]:
df.poutcome = df.poutcome.replace(['failure', 'nonexistent', 'success'],[0,1,2])

#### y - Output Variable or Dependent Variable:-

In [None]:
df.y.unique()

In [None]:
df.y = df.y.replace(['no', 'yes'],[0,1])

In [None]:
df.sample(3)

## Check whether the classes are imbalanced:-

In [None]:
df.y.value_counts()

In [None]:
sns.countplot(x=df.y)  # pass the column by keyword so it works across seaborn versions

#### Note :- The dataset is highly imbalanced, so we apply the technique below after the train-test split to address this
#### SMOTE - Synthetic Minority Over-sampling Technique
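
One reason imbalance matters: a model that always predicts the majority class can already reach a high accuracy. The sketch below computes that baseline in plain Python. The 89/11 split is a hypothetical example, not the exact class counts of this dataset.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical labels: 89 'no' (0) and 11 'yes' (1)
labels = [0] * 89 + [1] * 11
print(majority_baseline_accuracy(labels))  # 0.89
```

Any accuracy score reported later is best read against this kind of baseline.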

### To Identify Significant Variable:-

#### Correlation Graph:- Check Multicollinearity
*     Correlation between two input variables should be very low - Weak Correlation
*     Correlation between input & output variables should be high - Strong Correlation

Note: Input columns that are highly correlated with other input columns can be removed directly.
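
The rule above can be sketched in plain Python: compute the Pearson correlation for every pair of input columns and list the pairs whose absolute correlation exceeds a threshold. The column names, values, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical input columns
columns = {
    'x1': [1, 2, 3, 4, 5],
    'x2': [2, 4, 6, 8, 10.5],  # almost a multiple of x1
    'x3': [5, 1, 4, 2, 3],
}
print(correlated_pairs(columns))
```

Only one column from each flagged pair usually needs to be dropped.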

In [None]:
plt.figure(figsize=(14,12))
sns.heatmap(df.corr(),
            annot=True,
            linewidth=.5,
            center = 0,
            cbar=False,
            cmap='YlGnBu')
plt.show()

#### The columns below are highly correlated with other input columns (not with the y column), so we drop them.

In [None]:
df = df.drop(['marital','contact','pdays','previous', 'emp.var.rate', 'cons.price.idx',
       'cons.conf.idx', 'euribor3m', 'nr.employed','duration'],axis=1)

In [None]:
df

#### Define X & y variables:-

In [None]:
X = df.loc[:,df.columns != 'y']
y = df.loc[:,df.columns == 'y']

In [None]:
X

In [None]:
y

### OLS Method (Ordinary Least Squares) - To Identify Significant Variables
### p-value <= 0.05

In [None]:
import statsmodels.api as sm

ols = sm.OLS(y,X).fit()

print(ols.summary2())

#### Note : The p-value should be less than or equal to 0.05
#### Based on the result above, we remove the age, loan, and month columns because their p-values are > 0.05

In [None]:
df = df.drop(['age','loan','month'],axis =1)

In [None]:
X = df.loc[:,df.columns != 'y']
y = df.loc[:,df.columns == 'y']

## Train-test split and class balance check

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

In [None]:
y_train.value_counts()

#### The data is highly imbalanced, so apply SMOTE to the training dataset.
#### Note:- Do not apply SMOTE to the test data, because it stands in for the original, unseen data that the model must predict
#### Over-sampling the minority class with SMOTE (Synthetic Minority Over-sampling Technique)
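
The core idea behind SMOTE can be sketched in a few lines of plain Python: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line segment between them. This is a simplified illustration, not the imblearn implementation; the function name `smote_sketch`, the sample points, and the parameter values are hypothetical.

```python
import random
from math import dist

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class points with two features
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_sketch(minority, n_new=3)
print(new_points)
```

Because every synthetic point is built from training samples only, resampling after the split keeps the test set untouched.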

In [None]:
from imblearn.over_sampling import SMOTE

os = SMOTE(random_state = 2)

os_X, os_y = os.fit_resample(X_train,y_train)

In [None]:
os_y.value_counts()

#### Now both the classes are balanced equally :)

### Algorithms Used:-
*     Logistic Regression
*     Decision Tree
*     Random Forest
*     KNN
    

## Model 1 :- Logistic Regression

#### 1.1 Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
LR = LogisticRegression()

# Train Your Model
LR.fit(os_X,os_y)

# Predict the model
LR_predicted_y = LR.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

In [None]:
# Confusion Matrix
LR_CM = confusion_matrix(y_test,LR_predicted_y)
print(LR_CM)

# Accuracy Score
Accuracy = round(accuracy_score(y_test,LR_predicted_y)*100,2)
print('LR_Accuracy is ', Accuracy)

In [None]:
# Classification Report
print(classification_report(y_test,LR_predicted_y))

#### 1.2 LR - Bagging Classifier

In [None]:
from sklearn.ensemble import BaggingClassifier,AdaBoostClassifier,GradientBoostingClassifier,RandomForestClassifier

In [None]:
# Bagging Classifier Technique
LR_Bagged = BaggingClassifier(base_estimator=LR,random_state=0)

# Train your model
LR_Bagged.fit(os_X,os_y)

# Predict the model
LR_Bagged_predicted_y = LR_Bagged.predict(X_test)

In [None]:
# Confusion Matrix
LR_Bagged_CM = confusion_matrix(y_test,LR_Bagged_predicted_y)
print(LR_Bagged_CM)

# Accuracy Score
LR_Bagged_Accuracy = round(accuracy_score(y_test,LR_Bagged_predicted_y)*100,2)
print('LR_Bagged_Accuracy is ', LR_Bagged_Accuracy)

# Classification Report
print(classification_report(y_test,LR_Bagged_predicted_y))

#### 1.3 LR - Boosting Classifier

In [None]:
# Boosting Classifier Technique
LR_Ada_Boost = AdaBoostClassifier(base_estimator=LR,random_state=0)

# Train your model
LR_Ada_Boost.fit(os_X,os_y)

# Predict the model
LR_Adaboost_predicted_y = LR_Ada_Boost.predict(X_test)

In [None]:
# Confusion Matrix
LR_Adaboost_CM = confusion_matrix(y_test,LR_Adaboost_predicted_y)
print(LR_Adaboost_CM)

# Accuracy Score
LR_Adaboost_Accuracy = round(accuracy_score(y_test,LR_Adaboost_predicted_y)*100,2)
print('LR_Adaboost_Accuracy is ', LR_Adaboost_Accuracy)

# Classification Report
print(classification_report(y_test,LR_Adaboost_predicted_y))

#### 1.4 Parameter Optimization
*     GridSearchCV
*     RandomizedSearchCV
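
The difference between the two search strategies can be sketched in plain Python: a grid search evaluates every parameter combination, while a randomized search evaluates only a sample of them. The helper names are hypothetical, and the `max_samples` values are an assumed second parameter added only to make the grid two-dimensional.

```python
import random
from itertools import product

def grid_candidates(param_grid):
    """Every combination of the listed parameter values."""
    names = sorted(param_grid)
    combos = product(*(param_grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]

def random_candidates(param_grid, n_iter, seed=0):
    """A random subset of the grid, sampled without replacement."""
    rng = random.Random(seed)
    grid = grid_candidates(param_grid)
    return rng.sample(grid, min(n_iter, len(grid)))

param_grid = {
    'n_estimators': list(range(1, 10)),
    'max_samples': [0.5, 1.0],  # hypothetical extra parameter
}
print(len(grid_candidates(param_grid)))       # 18
print(len(random_candidates(param_grid, 5)))  # 5
```

With cross-validation, each candidate is fitted once per fold, so sampling fewer candidates reduces the run time proportionally.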

In [None]:
param = {'n_estimators': np.arange(1,10)}
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# LR_Bagged_GS = GridSearchCV(LR_Bagged,param, cv= kfold ,scoring= 'accuracy') # Too time-consuming, so RandomizedSearchCV is used instead
# LR_Bagged_GS.fit(X,y)

# scoring expects a scorer name or scorer object; the raw accuracy_score function does not have the scorer signature
LR_Bagged_RS = RandomizedSearchCV(LR_Bagged,param, cv= kfold ,scoring= 'accuracy')
LR_Bagged_RS.fit(X,y)

In [None]:
# print(LR_Bagged_GS.best_params_)
print(LR_Bagged_RS.best_params_)
# LR_RS_n_estimators = LR_Bagged_RS['n_estimators']

In [None]:
LR_Bagged_Kfold = BaggingClassifier(base_estimator=LR,n_estimators=1,random_state=0)

# Train your model
LR_Bagged_Kfold.fit(os_X,os_y)

# Predict your model
LR_Bagged_Kfold_predicted_y = LR_Bagged_Kfold.predict(X_test)

In [None]:
# Confusion Matrix
LR_Bagged_Kfold_CM = confusion_matrix(y_test,LR_Bagged_Kfold_predicted_y)
print(LR_Bagged_Kfold_CM)

# Accuracy Score
LR_Bagged_Kfold_Accuracy = round(accuracy_score(y_test,LR_Bagged_Kfold_predicted_y)*100,2)
print('LR_Bagged_Kfold_Accuracy is ', LR_Bagged_Kfold_Accuracy)

# Classification Report
print(classification_report(y_test,LR_Bagged_Kfold_predicted_y))

In [None]:
print(4506/(4506+377))
print(1057/(1057+6413))

## Model 2:- Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
DT_Gini = DecisionTreeClassifier() # Fully grown tree: low bias but high variance (prone to overfitting)

# Train the model
DT_Gini.fit(os_X,os_y)

# Predict the model
DT_Gini_predicted_y = DT_Gini.predict(X_test)


In [None]:
# Confusion Matrix
DT_Gini_CM = confusion_matrix(y_test,DT_Gini_predicted_y)
print(DT_Gini_CM)

# Accuracy Score
DT_Gini_Accuracy_Score = round(accuracy_score(y_test,DT_Gini_predicted_y)*100,2)
print('DT_Gini_Accuracy_Score is ', DT_Gini_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Gini_predicted_y))

#### 2.1 Decision Tree - Regularized

In [None]:
DT_Gini_Semi_Grown = DecisionTreeClassifier(max_depth=3,random_state=0)

# Train the model
DT_Gini_Semi_Grown.fit(os_X,os_y)

# Predict the model
DT_Gini_Semi_Grown_predicted_y = DT_Gini_Semi_Grown.predict(X_test)

In [None]:
# Confusion Matrix
DT_Gini_Semi_Grown_CM = confusion_matrix(y_test,DT_Gini_Semi_Grown_predicted_y)
print(DT_Gini_Semi_Grown_CM)

# Accuracy Score
DT_Gini_Semi_Grown_Accuracy_Score = round(accuracy_score(y_test,DT_Gini_Semi_Grown_predicted_y)*100,2)
print('DT_Gini_Semi_Grown_Accuracy_Score is ', DT_Gini_Semi_Grown_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Gini_Semi_Grown_predicted_y))

#### 2.2 Decision Tree - Entropy Method - Regularized

In [None]:
DT_Entropy_Semi_Grown = DecisionTreeClassifier(criterion='entropy',max_depth=3,random_state=0)

# Train the model
DT_Entropy_Semi_Grown.fit(os_X,os_y)

# Predict the model
DT_Entropy_Semi_Grown_predicted_y = DT_Entropy_Semi_Grown.predict(X_test)

In [None]:
# Confusion Matrix
DT_Entropy_Semi_Grown_CM = confusion_matrix(y_test,DT_Entropy_Semi_Grown_predicted_y)
print(DT_Entropy_Semi_Grown_CM)

# Accuracy Score
DT_Entropy_Semi_Grown_Accuracy_Score = round(accuracy_score(y_test,DT_Entropy_Semi_Grown_predicted_y)*100,2)
print('DT_Entropy_Semi_Grown_Accuracy_Score is ', DT_Entropy_Semi_Grown_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Entropy_Semi_Grown_predicted_y))

#### 2.3 Decision Tree - Bagging Classifier

In [None]:
DT_Bagged = BaggingClassifier(base_estimator=DT_Gini_Semi_Grown,n_estimators=1)

DT_Bagged.fit(os_X,os_y)

DT_Bagged_predicted_y = DT_Bagged.predict(X_test)

In [None]:
# Confusion Matrix
DT_Bagged_CM = confusion_matrix(y_test,DT_Bagged_predicted_y)
print(DT_Bagged_CM)

# Accuracy Score
DT_Bagged_Accuracy_Score = round(accuracy_score(y_test,DT_Bagged_predicted_y)*100,2)
print('DT_Bagged_Accuracy_Score is ', DT_Bagged_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Bagged_predicted_y))

#### 2.4 Decision Tree - Adaboost Classifier

In [None]:
DT_Adaboost = AdaBoostClassifier(base_estimator=DT_Gini_Semi_Grown,n_estimators=1)

DT_Adaboost.fit(os_X,os_y)

DT_Adaboost_predicted_y = DT_Adaboost.predict(X_test)

In [None]:
# Confusion Matrix
DT_Adaboost_CM = confusion_matrix(y_test,DT_Adaboost_predicted_y)
print(DT_Adaboost_CM)

# Accuracy Score
DT_Adaboost_Accuracy_Score = round(accuracy_score(y_test,DT_Adaboost_predicted_y)*100,2)
print('DT_Adaboost_Accuracy_Score is ', DT_Adaboost_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Adaboost_predicted_y))

#### 2.5 Decision Tree - Gradient Boost Classifier

In [None]:
DT_Gboost = GradientBoostingClassifier(n_estimators=1)

DT_Gboost.fit(os_X,os_y)

DT_Gboost_predicted_y = DT_Gboost.predict(X_test)

In [None]:
# Confusion Matrix
DT_Gboost_CM = confusion_matrix(y_test,DT_Gboost_predicted_y)
print(DT_Gboost_CM)

# Accuracy Score
DT_Gboost_Accuracy_Score = round(accuracy_score(y_test,DT_Gboost_predicted_y)*100,2)
print('DT_Gboost_Accuracy_Score is ', DT_Gboost_Accuracy_Score)

# Classification Report
print(classification_report(y_test,DT_Gboost_predicted_y))

## Model 3:- Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
RF_Gini = RandomForestClassifier(n_estimators=100,criterion='gini',random_state=0)

RF_Gini.fit(os_X,os_y)

RF_Gini_predicted_y = RF_Gini.predict(X_test)

In [None]:
# Confusion Matrix
RF_Gini_CM = confusion_matrix(y_test,RF_Gini_predicted_y)
print(RF_Gini_CM)

# Accuracy Score
RF_Gini_Accuracy_Score = round(accuracy_score(y_test,RF_Gini_predicted_y)*100,2)
print('RF_Gini_Accuracy_Score is ', RF_Gini_Accuracy_Score)

# Classification Report
print(classification_report(y_test,RF_Gini_predicted_y))

#### 3.1 Random Forest - Entropy Method

In [None]:
RF_Entropy = RandomForestClassifier(n_estimators=100,criterion='entropy',random_state=0)

RF_Entropy.fit(os_X,os_y)

RF_Entropy_predicted_y = RF_Entropy.predict(X_test)

In [None]:
# Confusion Matrix
RF_Entropy_CM = confusion_matrix(y_test,RF_Entropy_predicted_y)
print(RF_Entropy_CM)

# Accuracy Score
RF_Entropy_Accuracy_Score = round(accuracy_score(y_test,RF_Entropy_predicted_y)*100,2)
print('RF_Entropy_Accuracy_Score is ', RF_Entropy_Accuracy_Score)

# Classification Report
print(classification_report(y_test,RF_Entropy_predicted_y))

#### 3.2 Random Forest - Parameter Optimization

In [None]:
param = {'n_estimators': np.arange(1,10)}
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

RF_RS = RandomizedSearchCV(RF_Gini,param, cv= kfold ,scoring= 'accuracy') # scoring takes a scorer name, not the raw accuracy_score function
RF_RS.fit(X,y)

In [None]:
print(RF_RS.best_params_)
print(RF_RS.best_estimator_)

In [None]:
RF_RS_Kfold = BaggingClassifier(base_estimator=RF_Gini,n_estimators=1,random_state=0)

# Train your model
RF_RS_Kfold.fit(os_X,os_y)

# Predict your model
RF_RS_Kfold_predicted_y = RF_RS_Kfold.predict(X_test)

In [None]:
# Confusion Matrix
RF_RS_Kfold_CM = confusion_matrix(y_test,RF_RS_Kfold_predicted_y)
print(RF_RS_Kfold_CM)

# Accuracy Score
RF_RS_Kfold_Accuracy_Score = round(accuracy_score(y_test,RF_RS_Kfold_predicted_y)*100,2)
print('RF_RS_Kfold_Accuracy_Score is ', RF_RS_Kfold_Accuracy_Score)

# Classification Report
print(classification_report(y_test,RF_RS_Kfold_predicted_y))

#### 3.3 Random Forest - Adaboost Classifier

In [None]:
RF_Adaboost = AdaBoostClassifier(base_estimator=RF_Gini,n_estimators=1,random_state=0)

RF_Adaboost.fit(os_X,os_y)

RF_Adaboost_predict_y = RF_Adaboost.predict(X_test)

In [None]:
# Confusion Matrix
RF_Adaboost_CM = confusion_matrix(y_test,RF_Adaboost_predict_y)
print(RF_Adaboost_CM)

# Accuracy Score
RF_Adaboost_Accuracy_Score = round(accuracy_score(y_test,RF_Adaboost_predict_y)*100,2)
print('RF_Adaboost_Accuracy_Score is ', RF_Adaboost_Accuracy_Score)

# Classification Report
print(classification_report(y_test,RF_Adaboost_predict_y))

#### 3.4 Random Forest - Gradientboost Classifier

In [None]:
RF_Gboost = GradientBoostingClassifier(n_estimators=1)

RF_Gboost.fit(os_X,os_y)

RF_Gboost_predicted_y = RF_Gboost.predict(X_test)

In [None]:
# Confusion Matrix
RF_Gboost_CM = confusion_matrix(y_test,RF_Gboost_predicted_y)
print(RF_Gboost_CM)

# Accuracy Score
RF_Gboost_Accuracy_Score = round(accuracy_score(y_test,RF_Gboost_predicted_y)*100,2)
print('RF_Gboost_Accuracy_Score is ', RF_Gboost_Accuracy_Score)

# Classification Report
print(classification_report(y_test,RF_Gboost_predicted_y))

## Model 4:- KNN

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
KNN = KNeighborsClassifier(n_neighbors=2)

KNN.fit(os_X,os_y)

KNN_predicted_y = KNN.predict(X_test)

In [None]:
# Confusion Matrix
KNN_CM = confusion_matrix(y_test,KNN_predicted_y)
print(KNN_CM)

# Accuracy Score
KNN_Accuracy_Score = round(accuracy_score(y_test,KNN_predicted_y)*100,2)
print('KNN_Accuracy_Score is ', KNN_Accuracy_Score)

# Classification Report
print(classification_report(y_test,KNN_predicted_y))

#### 4.1 KNN - Bagged Classifier

In [None]:
KNN_Bagged = BaggingClassifier(base_estimator=KNN,n_estimators=1,random_state=0)

KNN_Bagged.fit(os_X,os_y)

KNN_Bagged_predicted_y = KNN_Bagged.predict(X_test)

In [None]:
# Confusion Matrix
KNN_Bagged_CM = confusion_matrix(y_test,KNN_Bagged_predicted_y)
print(KNN_Bagged_CM)

# Accuracy Score
KNN_Bagged_Accuracy_Score = round(accuracy_score(y_test,KNN_Bagged_predicted_y)*100,2)
print('KNN_Bagged_Accuracy_Score is ', KNN_Bagged_Accuracy_Score)

# Classification Report
print(classification_report(y_test,KNN_Bagged_predicted_y))

#### 4.2 KNN - Parameter Optimization 

In [None]:
param = {'n_neighbors':np.arange(1,50),
         'weights':['uniform','distance']}
KNN_RS = RandomizedSearchCV(KNN,param,cv=5,scoring='roc_auc')

KNN_RS.fit(X,y)

In [None]:
KNN_RS.best_params_

In [None]:
KNN_RS.best_params_
KNN_RS_params = KNN_RS.best_params_

knn_weights = KNN_RS_params['weights']
knn_n_neighbors = KNN_RS_params['n_neighbors']

KNN1 = KNeighborsClassifier(n_neighbors=knn_n_neighbors, weights=knn_weights)

KNN1.fit(os_X,os_y)

KNN1_predicted_y = KNN1.predict(X_test)

In [None]:
# Confusion Matrix
KNN1_CM = confusion_matrix(y_test,KNN1_predicted_y)
print(KNN1_CM)

# Accuracy Score
KNN1_Accuracy_Score = round(accuracy_score(y_test,KNN1_predicted_y)*100,2)
print('KNN1_Accuracy_Score is ', KNN1_Accuracy_Score)

# Classification Report
print(classification_report(y_test,KNN1_predicted_y))

## Model 5:- SVM

In [None]:
from sklearn.svm import SVC

In [None]:
SVM = SVC()

SVM.fit(os_X,os_y)

SVM_predicted_y = SVM.predict(X_test)

In [None]:
# Confusion Matrix
SVM_CM = confusion_matrix(y_test,SVM_predicted_y)
print(SVM_CM)

# Accuracy Score
SVM_Accuracy_Score = round(accuracy_score(y_test,SVM_predicted_y)*100,2)
print('SVM_Accuracy_Score is ', SVM_Accuracy_Score)

# Classification Report
print(classification_report(y_test,SVM_predicted_y))

## Model 6:- Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB

In [None]:
NB = GaussianNB()

NB.fit(os_X,os_y)

NB_predicted_y = NB.predict(X_test)

In [None]:
# Confusion Matrix
NB_CM = confusion_matrix(y_test,NB_predicted_y)
print(NB_CM)

# Accuracy Score
NB_Accuracy_Score = round(accuracy_score(y_test,NB_predicted_y)*100,2)
print('NB_Accuracy_Score is ', NB_Accuracy_Score)

# Classification Report
print(classification_report(y_test,NB_predicted_y))

#### Note : Finally, I choose GaussianNB as the best model for identifying potential customers who will open a term deposit. Although its accuracy is very low, its false positive rate (FPR) is also very low.
####        For a model with better accuracy, we can use the KNN model, which has an accuracy score of 75.58 %.
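
To compare models on more than accuracy, the rates mentioned above can be read directly from a binary confusion matrix laid out as `[[TN, FP], [FN, TP]]`. The sketch below is plain Python; the function name `binary_metrics` and the counts are hypothetical and are not results from this notebook.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, recall, precision and false positive rate from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        'accuracy': (tp + tn) / total,
        'recall': tp / (tp + fn),     # share of actual positives that were found
        'precision': tp / (tp + fp),  # share of predicted positives that were correct
        'fpr': fp / (fp + tn),        # share of actual negatives flagged as positive
    }

# Hypothetical counts, for illustration only
m = binary_metrics(tn=80, fp=20, fn=5, tp=15)
print(m)
```

A low FPR combined with a low recall means the model rarely raises false alarms but also misses many real depositors, so both rates are worth checking together.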