<a href="https://colab.research.google.com/github/nallagondu/datatrained_inter_public/blob/main/BANK_MARKETING_Predicting_Whether_The_Customer_Will_Subscribe_To_Term_Deposit_(FIXED_DEPOSIT)_or_not.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Business Use Case**

A Portuguese bank has seen its revenue decline and wants to know what actions to take. After investigation, the bank found that the root cause is that its clients are not depositing as frequently as before. Term deposits allow a bank to hold a deposit for a specific amount of time, so the bank can invest in higher-gain financial products to make a profit. In addition, banks have a better chance of persuading term deposit clients to buy other products, such as funds or insurance, to further increase revenue. As a result, the Portuguese bank would like to identify existing clients who are more likely to subscribe to a term deposit and focus its marketing efforts on them.


#**Project Description**
Your client is a retail banking institution. Term deposits are a major source of income for a bank. A term deposit is a cash investment held at a financial institution: your money is invested at an agreed rate of interest over a fixed amount of time, or term. The bank has various outreach plans to sell term deposits to its customers, such as email marketing, advertisements, telephonic marketing and digital marketing. Telephonic marketing campaigns remain one of the most effective ways to reach people. However, they require a huge investment, as large call centers are hired to execute these campaigns. Hence, it is crucial to identify in advance the customers most likely to convert, so that they can be specifically targeted by call.
You are provided with client data such as the age of the client, their job type and their marital status. Along with the client data, you are also provided with information about the call, such as its duration and the day and month on which it took place. Given this information, your task is to predict whether the client will subscribe to a term deposit.


#**About The Dataset**
The dataset is related to the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client of the Portuguese banking institution will subscribe to the bank's term deposit product.


**You are provided with following 2 files:**
1.     train.csv : Use this dataset to train the model. This file contains all the client and call details as well as the target variable “subscribed”.
2.     test.csv : Use the trained model to predict whether a new set of clients will subscribe to the term deposit.

**Dataset Attributes**
Here is the description of all the variables (variable: definition):
•	ID: Unique client ID
•	age: Age of the client
•	job: Type of job
•	marital: Marital status of the client
•	education: Education level
•	default: Credit in default.
•	housing: Housing loan
•	loan: Personal loan
•	contact: Type of communication
•	month: Contact month
•	day_of_week: Day of week of contact
•	duration: Contact duration
•	campaign: number of contacts performed during this campaign to the client
•	pdays: number of days that passed by after the client was last contacted
•	previous: number of contacts performed before this campaign
•	poutcome: outcome of the previous marketing campaign
Output variable (desired target):
•	subscribed (target): has the client subscribed to a term deposit? (yes/no)








**Dataset Link-**
•	https://github.com/FlipRoboTechnologies/ML-Datasets/tree/main/Bank%20Marketing
•	https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/Bank%20Marketing/termdeposit_test.csv
•	https://raw.githubusercontent.com/FlipRoboTechnologies/ML-Datasets/main/Bank%20Marketing/termdeposit_train.csv


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [None]:
#loading the data

trained_url = 'https://raw.githubusercontent.com/FlipRoboTechnologies/ML-Datasets/main/Bank%20Marketing/termdeposit_train.csv'
test_url = 'https://raw.githubusercontent.com/FlipRoboTechnologies/ML-Datasets/main/Bank%20Marketing/termdeposit_test.csv'
trained_df = pd.read_csv(trained_url)
test_df = pd.read_csv(test_url)


In [None]:
trained_df.head()

In [None]:
test_df.head()

#Checking the columns


In [None]:
trained_df.columns


In [None]:
test_df.columns

In [None]:
#Test DATA Shape
test_df.shape

In [None]:
#trained DATA Shape
trained_df.shape

In [None]:
#find Null and missing values trained data
trained_df.isnull().sum()

In [None]:
#find Null and missing values in test data
test_df.isnull().sum()

#Frequency of subscribed in the training data

In [None]:
trained_df['subscribed'].value_counts()

In [None]:
trained_df['subscribed'] = trained_df['subscribed'].map({'yes':1,'no':0})
# distplot is deprecated in recent seaborn releases; a count plot suits a binary target
sns.countplot(x='subscribed', data=trained_df)

In [None]:
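trained_df['marital'] = trained_df['marital'].map({'married':1,'single':0,'divorced':2})
# distplot is deprecated in recent seaborn releases; a count plot suits a categorical column
sns.countplot(x='marital', data=trained_df)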


In [None]:
from sklearn.preprocessing import LabelEncoder

# Fit each encoder on the training column and reuse it for the test column,
# so that the same category gets the same integer code in both frames.
# Assumes every category in test_df also appears in trained_df;
# transform() raises a ValueError otherwise.
categorical_cols = ['job', 'education', 'contact', 'housing', 'default', 'poutcome', 'month']
for col in categorical_cols:
    le = LabelEncoder()
    trained_df[col] = le.fit_transform(trained_df[col])
    test_df[col] = le.transform(test_df[col])

# Apply the same marital mapping that was used for the training data
test_df['marital'] = test_df['marital'].map({'married':1,'single':0,'divorced':2})

print(trained_df)
print('\n')
print(test_df)
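
A label encoder assigns integer codes in sorted order of the categories it was fitted on, so an encoder fitted on one frame may give a category a different code than an encoder fitted on another frame. The sketch below illustrates this with made-up job labels (the toy series are assumptions, not taken from the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data for illustration only: the test series lacks the 'admin' category.
train_jobs = pd.Series(["admin", "student", "technician"])
test_jobs = pd.Series(["student", "technician"])

# Encoder fitted on the test series alone: student=0, technician=1
separate = LabelEncoder().fit(test_jobs)
# Encoder fitted on the training series: admin=0, student=1, technician=2
shared = LabelEncoder().fit(train_jobs)

codes_separate = separate.transform(test_jobs).tolist()
codes_shared = shared.transform(test_jobs).tolist()

print("fitted on test only :", codes_separate)
print("fitted on train     :", codes_shared)
```

Only the codes produced by the encoder fitted on the training data match what a model trained on that data has seen.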


#Far more customers have not subscribed to a term deposit than have subscribed

In [None]:
#normalized frequency of subscribed vs. not subscribed clients
normalized_freq = trained_df['subscribed'].value_counts(normalize=True)
normalized_freq
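
Because the classes are imbalanced, a model that always predicts the majority class already reaches a high accuracy, which is a useful reference point for the models trained later. A minimal sketch of that baseline, using made-up toy labels rather than the real data (the helper name `majority_baseline_accuracy` is hypothetical):

```python
import pandas as pd

def majority_baseline_accuracy(labels: pd.Series) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    return float(labels.value_counts(normalize=True).max())

# Toy labels for illustration only: nine 'no' (0) for every one 'yes' (1).
toy_labels = pd.Series([0] * 9 + [1])
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

On the real data, the same function could be applied to `trained_df['subscribed']`.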

In [None]:
trained_df.info()

In [None]:
trained_df['job'].value_counts()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='job',hue='subscribed',data=trained_df,order = trained_df['job'].value_counts().index)
plt.title('Job vs Subscribed')
plt.show()

In [None]:
trained_df['marital'].value_counts()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='marital',hue='subscribed',data=trained_df,order = trained_df['marital'].value_counts().index)
plt.title('marital vs Subscribed')
plt.show()

In [None]:
#marital status vs subscribed
pd.crosstab(trained_df['marital'],trained_df['subscribed'])

In [None]:
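# distplot is deprecated in recent seaborn releases; histplot with a KDE curve is the replacement
sns.histplot(trained_df['age'], kde=True)

We can see that most of the clients are between 20 and 60 years old.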

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='marital',hue='job',data=trained_df,order = trained_df['marital'].value_counts().index)
plt.title('marital vs job')
plt.show()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='marital',hue='loan',data=trained_df,order = trained_df['marital'].value_counts().index)
plt.title('marital vs Loan')
plt.show()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='education',hue='subscribed',data=trained_df,order = trained_df['education'].value_counts().index)
plt.title('Education Distribution')
plt.show()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(data=trained_df,x='housing',order = trained_df['housing'].value_counts().index)
plt.title('Housing  Distribution')
plt.show()

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(data=trained_df,x='month',order = trained_df['month'].value_counts().index)
plt.title('month  Distribution')
plt.tight_layout()
plt.show()

In [None]:
trained_df.info()

In [None]:
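# 'subscribed' was already mapped to 0/1 above, so this only acts as a safeguard;
# assigning the result back avoids an in-place replace on a column selection
trained_df['subscribed'] = trained_df['subscribed'].replace({'yes':1,'no':0})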
trained_df.info()

In [None]:
trained_df['subscribed']

In [None]:
#Correlation matrix
numeric_df  = trained_df.select_dtypes(include=['number'])
trained_df_corr = numeric_df.corr()
trained_df_corr

In [None]:
fig,ax = plt.subplots(figsize=(20,10))
sns.heatmap(trained_df_corr,annot=True)

In the heatmap, we can observe that the duration of the call is highly correlated with the target variable. The longer the call, the more likely it is that the client is interested in the term deposit, and hence the more likely the client is to subscribe.
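
Reading one row of a large heatmap can be hard, so it may help to rank the features by the absolute value of their correlation with the target. A minimal sketch on a made-up toy frame (the values are invented for illustration and are not from the dataset):

```python
import pandas as pd

# Toy frame for illustration only; column names mirror the dataset.
toy = pd.DataFrame({
    "duration":   [50, 80, 120, 300, 450, 600, 700, 900],
    "campaign":   [3, 1, 4, 2, 2, 1, 3, 1],
    "subscribed": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Correlation of every feature with the target, strongest first.
target_corr = (
    toy.corr()["subscribed"]
    .drop("subscribed")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```

The same expression could be applied to `trained_df_corr['subscribed']` in this notebook.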


In [None]:
# multivariate Analysis
sns.pairplot(trained_df,hue = 'subscribed',palette='Set2')
plt.show()

In [None]:
trained_df

In [None]:
plt.figure(figsize=(21,15), facecolor='red')
plotnumber = 1
for column in trained_df:
    if plotnumber<=8:
        ax = plt.subplot(4,4,plotnumber)
        if trained_df[column].dtype != 'object':
            sns.boxplot(y=trained_df[column],color = 'm')
            plt.xlabel(column,fontsize=20)
            plt.ylabel('Value' , rotation = 0, fontsize = 10)
    plotnumber+=1
plt.tight_layout()


In [None]:
trained_df.describe()

In [None]:
target = trained_df['subscribed']
train = trained_df.drop('subscribed', axis =1)


In [None]:
train=pd.get_dummies(train)
train.head()

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train, target, test_size=0.3, random_state=42)
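
With an imbalanced target, a purely random split can leave the two parts with different proportions of subscribers. The `stratify` argument of `train_test_split` keeps the class ratio the same in both parts. A minimal sketch on made-up toy arrays (an optional variation, not what the cell above does):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 10 positives out of 100 samples.
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 10 + [0] * 90)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Both parts keep the 10% positive rate of the full data.
print("positive rate in train:", y_tr.mean())
print("positive rate in test :", y_te.mean())
```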

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
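# A higher iteration limit than the default (100) gives the solver more room to
# converge on unscaled features; convergence warnings are suppressed in this notebook
LR = LogisticRegression(max_iter=1000)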
LR.fit(X_train,y_train)

In [None]:
LR_pred = LR.predict(X_test)
accuracy_score(y_test,LR_pred)


In [None]:
#Decision Tree
from sklearn.tree import DecisionTreeClassifier
Dt = DecisionTreeClassifier()
Dt.fit(X_train,y_train)

In [None]:
Dt_pred = Dt.predict(X_test)
Dt_pred

In [None]:
Dt_accuracy = accuracy_score(y_test,Dt_pred)
Dt_accuracy
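
Accuracy alone can hide how well the minority class (subscribers) is predicted. The `confusion_matrix` and `classification_report` functions imported earlier break the result down per class. A minimal sketch on made-up labels and predictions (toy values, not model output from this notebook):

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    recall_score,
)

# Toy labels and predictions for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
rec = recall_score(y_true, y_pred)      # share of true subscribers that were found

print("accuracy:", acc)
print("confusion matrix:\n", cm)
print("recall for class 1:", rec)
print(classification_report(y_true, y_pred))
```

In this notebook, the same calls could be made with `y_test` and `LR_pred` or `Dt_pred`.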

#**Test Data Prediction**

In [None]:
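# Align the test columns with the training columns so the model sees the same features
# in the same order; dummy columns missing from the test data are filled with 0
test = pd.get_dummies(test_df).reindex(columns=train.columns, fill_value=0)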
test.head()

In [None]:
test.info()

In [None]:
test.isnull().sum()

In [None]:
test = test.fillna(test.mean())
test

In [None]:
test.head()

In [None]:
# 'marital' was already mapped to integers in test_df above; mapping the string labels
# again would turn every value into NaN, so only fill any remaining gaps here
test['marital'] = test['marital'].fillna(-1)
test.head()

In [None]:
test_predict = Dt.predict(test)
test_predict

In [None]:
test_submision = pd.DataFrame()
test_submision['ID'] = test['ID']
test_submision['subscribed'] = test_predict


In [None]:
test_submision['subscribed']

In [None]:
#converting subscription values from 0 to 'no' and from 1 to 'yes'
test_submision['subscribed'] = test_submision['subscribed'].map({1:'yes',0:'no'})
test_submision