# Bank Customers Churn 



![](https://image.slidesharecdn.com/2017olofcustomerchurn-180312012319/95/customer-churn-prediction-in-banking-1-638.jpg?cb=1520817951)

# Data Preprocessing

# IMPORTING THE LIBRARIES

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import scipy as sp
import warnings
import string
import datetime
warnings.filterwarnings("ignore")
%matplotlib inline

# LOADING THE DATASET

In [None]:
data = pd.read_csv('/kaggle/input/bank-customers/Churn Modeling.csv')
data.head()


In [None]:
data.describe()

In [None]:
data.info()

In [None]:
data.columns

In [None]:
data.dtypes

**So our dataset contains several data types: int, float, and object.**

In [None]:
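# numeric_only=True skips the text columns, which raise an error in recent pandas versions
data.mean(numeric_only=True)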

In [None]:
data.value_counts()

**Checking Null Values**

In [None]:
data.isnull().sum()

In [None]:
data.isnull().any()

In [None]:
data.shape

**So our dataset contains 10000 rows and 14 columns**

# Exploratory Data Analysis

In [None]:
# Correlation is only defined for numeric columns, so skip the text columns
data.corr(numeric_only=True)

**HEATMAP**

**A heatmap is a graphical representation of data that uses color-coding to represent different values. Here it shows the pairwise correlation between the numeric columns, so strongly related features stand out at a glance.**
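
As a small illustration of what the cells of a correlation heatmap contain, the sketch below builds a toy frame and computes Pearson correlations for its numeric columns. The `toy` frame and its values are invented for the example and are not rows from the churn dataset.

```python
import pandas as pd

# Toy data, invented for illustration; not rows from the churn dataset.
toy = pd.DataFrame({
    "Age": [25, 35, 45, 55],
    "Balance": [1000.0, 2000.0, 3000.0, 4000.0],
    "Tenure": [8, 6, 4, 2],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# numeric_only=True skips the string column instead of raising an error.
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship, and these are the cells that stand out most in the heatmap.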


In [None]:
plt.figure(figsize=(16, 10))

# numeric_only=True keeps the text columns out of the correlation matrix
sns.heatmap(data.corr(numeric_only=True), annot=True)


**HISTOGRAM**

**A histogram represents numerical data grouped into bins. It is a type of bar plot in which the X-axis shows the bin ranges and the Y-axis shows the frequency of values in each bin, giving a picture of how the data is distributed.**
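
To make the idea of bins and frequencies concrete, here is a minimal sketch using `np.histogram` on a handful of made-up ages. The values, the bin count, and the range are assumptions chosen for the example.

```python
import numpy as np

# Toy ages, invented for illustration.
ages = np.array([22, 25, 31, 38, 41, 44, 47, 52, 58, 63])

# Three equal-width bins between 20 and 65: [20, 35), [35, 50), [50, 65].
counts, edges = np.histogram(ages, bins=3, range=(20, 65))
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n}")
```

Each printed line corresponds to one bar: the range is the bar's position on the X-axis and the count is its height.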



In [None]:
data.hist(figsize=(18,12))
plt.show()

**SCATTER PLOT**

**A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.**



In [None]:
sns.scatterplot(x='Balance', y= 'CreditScore', data=data)

**PAIRPLOT**

**A pairplot plots the pairwise relationships in a dataset. The pairplot function creates a grid of Axes such that each variable in the data is shared on the y-axis across a single row and on the x-axis across a single column.**


In [None]:
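sns.set_style("whitegrid")
mean_col = ['RowNumber','Gender','Age','Tenure','Balance','Exited']

# palette only takes effect together with hue, so color the points by the churn label
sns.pairplot(data[mean_col], hue='Exited', palette='Accent')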

**RELPLOT**

**relplot() is a figure-level function for visualizing statistical relationships using two common approaches: scatter plots and line plots. It combines a FacetGrid with one of two axes-level functions: scatterplot() (with kind="scatter", the default) or lineplot() (with kind="line").**



In [None]:
sns.relplot(x='Age', y= 'CreditScore', data=data)

**JOINTPLOT**

**Seaborn's jointplot displays the relationship between two variables (bivariate) together with the distribution of each variable (univariate) in the margins. It is a convenience function that wraps the JointGrid class.**



In [None]:
sns.jointplot(x='Balance', y= 'CreditScore', data=data)


**KDE PLOT (DENSITY PLOT)**

**A KDE (Kernel Density Estimate) plot visualizes the probability density of a continuous variable, showing the estimated density at each value. Several samples can also be drawn on a single graph, which makes them easier to compare.**
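
The sketch below shows the idea behind a kernel density estimate using plain NumPy: one Gaussian bump is placed on each sample and the bumps are averaged. The helper `gaussian_kde_1d`, the sample values, and the fixed bandwidth of 20 are assumptions for illustration; seaborn picks its bandwidth automatically and its exact result will differ.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Toy balances (in thousands), invented for illustration.
samples = np.array([0.0, 0.0, 80.0, 95.0, 110.0, 125.0, 140.0])
grid = np.linspace(-200.0, 400.0, 2001)
density = gaussian_kde_1d(samples, grid, bandwidth=20.0)

# A density integrates to about 1 over a wide enough grid.
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

The curve is highest where samples cluster, which is why the Y-axis of a KDE plot is a density rather than a count.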



In [None]:
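plt.style.use("ggplot")
plt.figure(figsize=(14,8))
# fill=True replaces the deprecated shade=True
sns.kdeplot(data['Balance'], fill=True, color='blue')
plt.xlabel('Balance')
# the y-axis of a KDE plot is the estimated density, not CreditScore
plt.ylabel('Density')
plt.show()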



**BARPLOT**

**A barplot (or barchart) is one of the most common types of graphic. It shows the relationship between a numeric and a categoric variable. Each entity of the categoric variable is represented as a bar. The size of the bar represents its numeric value.**
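
By default `sns.barplot` draws one bar per category whose height is the mean of the numeric variable in that category. The sketch below reproduces that aggregation with a pandas `groupby`; the `toy` frame and its values are made up for the example.

```python
import pandas as pd

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "HasCrCard": [0, 0, 1, 1, 1],
    "NumOfProducts": [1, 2, 1, 2, 3],
})

# One value per category: the mean of the numeric column.
bar_heights = toy.groupby("HasCrCard")["NumOfProducts"].mean()
print(bar_heights)
```

This also explains why a continuous column such as a balance makes a poor X-axis for a barplot: nearly every value becomes its own category.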

In [None]:
plt.style.use("default")
sns.barplot(x="Balance", y="CreditScore",data=data[179:190])
plt.title("Balance vs CreditScore",fontsize=15)
plt.xlabel("Balance")
plt.ylabel("CreditScore")
plt.show()

In [None]:
plt.style.use("default")
sns.barplot(x="EstimatedSalary", y="CreditScore",data=data[183:190])
plt.title("EstimatedSalary vs CreditScore",fontsize=15)
plt.xlabel("EstimatedSalary")
plt.ylabel("CreditScore")
plt.show()


In [None]:
plt.style.use("default")
sns.barplot(x="Tenure", y="Balance",data=data[170:190])
plt.title("Tenure vs Balance",fontsize=15)
plt.xlabel("Tenure")
plt.ylabel("Balance")
plt.show()

In [None]:
plt.style.use("default")
sns.barplot(x="HasCrCard", y="NumOfProducts",data=data[160:190])
plt.title("HasCrCard vs NumOfProducts",fontsize=15)
plt.xlabel("HasCrCard")
plt.ylabel("NumOfProducts")
plt.show()

In [None]:
# Let's find the categorical features
list_1=list(data.columns)


In [None]:
list_cate=[]
for i in list_1:
    if data[i].dtype=='object':
        list_cate.append(i)

In [None]:
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()


In [None]:
for i in list_cate:
    data[i]=le.fit_transform(data[i])
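
The loop above refits the same `le` object for every column, so after the loop it only remembers the mapping of the last column. A minimal sketch of one alternative, keeping one fitted encoder per column so the integer codes can be mapped back to labels later. The `encoders` dictionary and the toy values are hypothetical names and data introduced for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data, invented for illustration.
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

encoders = {}  # hypothetical helper dict: column name -> fitted encoder
for col in ["Geography", "Gender"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

print(toy)
# classes_ lists the original labels in the order of their integer codes.
print(list(encoders["Geography"].classes_))
print(encoders["Geography"].inverse_transform([0, 1, 2]))
```

Note that label encoding assigns codes in sorted order, which implies an ordering between categories that may not be meaningful for a column like `Geography`.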


In [None]:
data

In [None]:
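# Drop the identifier columns, which are not useful as features, and separate the target.
# 'CustomerId' and 'Surname' are assumed from the standard Churn Modelling file layout.
X = data.drop(['RowNumber', 'CustomerId', 'Surname', 'Exited'], axis=1)
# The churn label is the target
y = data['Exited']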



# TRAINING AND TESTING DATA

In [None]:
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [None]:
print(len(X_test))
print(len(X_train))
print(len(y_test))
print(len(y_train))
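
When one class is much rarer than the other, a purely random split can leave the two sets with different class ratios. The sketch below shows the optional `stratify` argument of `train_test_split` on made-up data with a 20% positive rate; the arrays and their sizes are assumptions for the example, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for illustration: 100 rows, 20% positives.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

print(len(X_tr), len(X_te))
# With stratify, both sets keep the original share of positives.
print(y_tr.mean(), y_te.mean())
```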

# Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
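
The scaler is fitted on the training set only and then reused on the test set, so no information from the test data leaks into training. A small sketch of what that means numerically, with made-up values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration.
train = np.array([[600.0, 20.0], [700.0, 40.0], [800.0, 60.0]])
test = np.array([[700.0, 80.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 per column
print(train_scaled.std(axis=0))   # approximately 1 per column
print(test_scaled)
```

The test row is expressed in units of the training standard deviation, so its values are not guaranteed to have zero mean or unit variance.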

# ANN

In [None]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
# input_dim belongs on the first layer and must match the number of feature columns
classifier.add(Dense(activation="relu", input_dim=X_train.shape[1], units=6, kernel_initializer="uniform"))

# Adding the second hidden layer
classifier.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))

# Adding the output layer
classifier.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(X_train,y_train,batch_size = 10,
    epochs=200,
)


In [None]:
y_pred = classifier.predict(X_test)
y_pred

In [None]:
print((y_pred > 0.5))
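
A sigmoid output layer gives one probability per row, with shape `(n, 1)`. The sketch below turns such probabilities into class labels with the same 0.5 threshold and summarizes them in a confusion matrix. The `y_true` and `probs` arrays are invented for the example and are not model output.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels and probabilities, invented for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
probs = np.array([[0.1], [0.6], [0.8], [0.4], [0.2], [0.9]])  # shape (n, 1)

# Threshold at 0.5 and flatten to a 1-D array of 0/1 labels.
labels = (probs > 0.5).astype(int).ravel()

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, labels)
print(labels)
print(cm)
print(accuracy_score(y_true, labels))
```

For churn data, where leavers are usually the minority, the confusion matrix is often more informative than accuracy alone.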

In [None]:
# Import from the same keras package used to build the model
from keras.utils import plot_model
plot_model(classifier, show_shapes = True)


In [None]:
classifier.summary()
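
The parameter counts reported by `summary()` can be checked by hand: each fully connected layer has `inputs * units` weights plus `units` biases. A minimal sketch, where `dense_param_count` is a hypothetical helper and the input width of 10 is an assumption for illustration (the real value is `X_train.shape[1]`):

```python
def dense_param_count(n_inputs, layer_units):
    """Total trainable parameters of a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weights + biases
        fan_in = units
    return total

# Hidden layers of 6 and 6 units and a single output unit, as in the model above.
print(dense_param_count(10, [6, 6, 1]))
```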

In [None]:
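from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error,mean_absolute_error
# y_pred holds probabilities; threshold at 0.5 to get class labels for the accuracy
print("Accuracy:\n",accuracy_score(y_test,(y_pred > 0.5).astype(int).ravel()))
# The error metrics below compare the raw probabilities with the 0/1 labels
print("Mean Squared Error:\n",mean_squared_error(y_test,y_pred))
print("Mean Absolute Error:\n",mean_absolute_error(y_test,y_pred))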

In [None]:
classifier.evaluate(X_test,y_test)

# **Thank You** 