# Bank Customers Churn Prediction

- The goal is to predict whether a customer will leave the bank, based on the given dataset.

For more detail, see https://www.kaggle.com/adammaus/predicting-churn-for-bank-customers

### 1. Importing the libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#### 1.1 Loading dataset

In [None]:
dataset = pd.read_csv('../input/bank-customers/Churn Modeling.csv')
dataset.head()

### 2. EDA

#### 2.1 Checking missing values

In [None]:
dataset.isnull().sum()

There are no null values in this dataset, so we can proceed.

#### 2.2 Checking the data types

In [None]:
dataset.info()

- From the output above, the Surname, Geography and Gender columns have the object dtype, while all other columns are numeric. Object columns that we use as features have to be converted to a numeric type before modeling.
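As a minimal sketch of how the non-numeric columns could be listed programmatically, the cell below uses a small hypothetical frame (`toy`, with made-up rows) instead of the real dataset. The same `select_dtypes` call should work on `dataset`.

```python
import pandas as pd

# Hypothetical toy rows that mimic a few columns of the churn dataset.
toy = pd.DataFrame({
    "Surname": ["Hill", "Onio", "Boni"],
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Male", "Female"],
    "Age": [42, 41, 39],
    "Balance": [0.0, 83807.86, 159660.8],
})

# Every column that is not numeric still needs encoding (or dropping) before modeling.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```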

In [None]:
dataset.describe()

#### 2.3 Proportion of customer churned and retained

In [None]:
# Series.value_counts() replaces the deprecated top-level pd.value_counts()
value_counts = dataset['Exited'].value_counts()
plt.figure(figsize = (6,6))
value_counts.plot(kind = 'pie', explode = [0,0.1],autopct='%1.1f%%', shadow=True)
plt.title('Proportion of customer churned and retained')
plt.show()
value_counts

- Here 20.4% of the customers churned (left the bank) and 79.6% were retained.
- 0 --> not Exited
- 1 --> Exited
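The percentages can also be read off without a chart. The sketch below assumes a hypothetical ten-row `exited` series rather than the real column, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Hypothetical labels: 0 = not exited, 1 = exited (not the real data).
exited = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], name="Exited")

# normalize=True returns fractions instead of raw counts.
shares = exited.value_counts(normalize=True) * 100
print(shares.round(1))
```

On the real data the equivalent call would be `dataset['Exited'].value_counts(normalize=True)`.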


#### 2.4 Location distribution of bank customers

In [None]:
# Pass the column as x explicitly; recent seaborn versions expect keyword arguments
sns.countplot(x=dataset['Geography'])
plt.title('Geographical location Distribution of Bank Customers')
plt.show()

#### 2.5 Gender distribution

In [None]:
# Pass the column as x explicitly; recent seaborn versions expect keyword arguments
sns.countplot(x=dataset['Gender'])
plt.title('Gender Distribution of Bank Customers')
plt.show()

#### 2.6 Reviewing the relation of 'Exited' with categorical variables

In [None]:

fig, axarr = plt.subplots(2, 2, figsize=(20, 12))
sns.countplot(x='Geography', hue = 'Exited',data = dataset, ax=axarr[0][0])
sns.countplot(x='Gender', hue = 'Exited',data = dataset, ax=axarr[0][1])
sns.countplot(x='HasCrCard', hue = 'Exited',data = dataset, ax=axarr[1][0])
sns.countplot(x='IsActiveMember', hue = 'Exited',data = dataset, ax=axarr[1][1])

- The majority of the customers in the data are from France.
- The proportion of female customers churning is greater than that of male customers.
- Most of the customers that churned have credit cards. Since most customers have credit cards in the first place, this may be just a coincidence.
- Inactive members churn more than active members.
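Count plots show absolute numbers, so comparing churn across groups of different size is easier with a rate. A minimal sketch, assuming a hypothetical `toy` frame with made-up rows: because `Exited` is 0/1, its group mean is the churn rate of that group.

```python
import pandas as pd

# Hypothetical rows, not the real dataset.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"],
    "Exited": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column per group = share of churned customers in that group.
churn_rate = toy.groupby("Gender")["Exited"].mean()
print(churn_rate)
```

The same pattern could be applied to `Geography`, `HasCrCard` or `IsActiveMember` on the real `dataset`.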

#### 2.7 Pair plot

In [None]:
sns.pairplot(dataset, hue = 'Exited')

- The pair plot above shows the pairwise relationships between the features, colored by churn status.

#### 2.8 Finding the correlation between the features

In [None]:
plt.figure(figsize = (15,15))
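# numeric_only=True skips the object columns, which newer pandas versions refuse to correlate
sns.heatmap(dataset.corr(numeric_only=True), annot = True, cmap = 'RdYlGn')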

- Here RowNumber and CustomerId appear negatively correlated. Both are identifier columns rather than customer attributes, so we will not use them for modeling.
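One way to leave the identifier columns out by name is sketched below on a hypothetical `toy` frame with made-up rows. It illustrates the idea and is not the selection used in the next section, which slices by position.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of the churn dataset.
toy = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4],
    "CustomerId": [15634602, 15647311, 15619304, 15701354],
    "Geography": ["France", "Spain", "France", "Germany"],
    "CreditScore": [619, 608, 502, 699],
    "Age": [42, 41, 42, 39],
    "Exited": [1, 0, 1, 0],
})

# Drop the identifier columns by name, then correlate the numeric columns only.
features = toy.drop(columns=["RowNumber", "CustomerId"])
corr = features.corr(numeric_only=True)
print(corr.columns.tolist())
```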

### 3. Data Preparation

#### 3.1 Dependent and independent features

In [None]:
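# Features: skip the first three columns (RowNumber, CustomerId, Surname) and the last one
X = dataset.iloc[:,3:-1].values
# Target: the last column (Exited)
y = dataset.iloc[:,-1].values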

In [None]:
X

In [None]:
y

#### 3.2 Encoding categorical data

##### 3.2.1 Label Encoding the "Gender" column

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [None]:
print(X)

##### 3.2.2 One Hot Encoding the "Geography" column

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

In [None]:
X

In [None]:
y

#### 3.3 Splitting the data into train and test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 0)

#### 3.4  Feature scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [None]:
print(X_train)

In [None]:
print(y_train)

### 4. Modeling (Building the ANN)

In [None]:

import tensorflow as tf

#### 4.1 Initializing the ANN

In [None]:
ann = tf.keras.models.Sequential()

#### 4.2 Adding the input layer and the first hidden layer

In [None]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

#### 4.3 Adding the second hidden layer

In [None]:
ann.add(tf.keras.layers.Dense(units = 6, activation = 'relu'))

#### 4.4 Adding the output layer

In [None]:
ann.add(tf.keras.layers.Dense(units = 1, activation = 'sigmoid'))

#### 4.5 Compiling the ANN

In [None]:
ann.compile(optimizer  ='adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

#### 4.6 Fitting the ANN to the training set

In [None]:
model_history = ann.fit(X_train, y_train,validation_split=0.33,batch_size = 10, epochs = 50)

### 5. Visualizing the performance of the ANN model

In [None]:
print(model_history.history.keys())

#### 5.1 Visualizing the accuracy of the model

In [None]:
plt.plot(model_history.history['accuracy'])
plt.plot(model_history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

#### 5.2 Visualizing the loss of the model

In [None]:
plt.plot(model_history.history['loss'])
plt.plot(model_history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

### 6. Predicting the test set

In [None]:

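# The sigmoid output is a churn probability for each customer
y_pred = ann.predict(X_test)
# Convert probabilities to class labels with a 0.5 threshold
y_pred = (y_pred > 0.5)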

### 7. Evaluating the performance

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

### 8. Conclusion
We built a model using an Artificial Neural Network and reached approximately 86% accuracy on the test set. The model can predict whether a customer will leave the bank, based on the given dataset.
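Since only about one customer in five churns, accuracy alone can look good even when many churners are missed. The sketch below shows how accuracy and the recall of the churn class follow from a confusion matrix in scikit-learn's layout (rows are true labels, columns are predicted labels). The matrix `toy_cm` is hypothetical and is not the result of the model above.

```python
import numpy as np

# Hypothetical confusion matrix: [[TN, FP], [FN, TP]].
toy_cm = np.array([[75, 5],
                   [10, 10]])

tn, fp, fn, tp = toy_cm.ravel()

# Accuracy: share of all customers that were classified correctly.
accuracy = (tn + tp) / toy_cm.sum()
# Recall of the churn class: share of actual churners that were caught.
recall_churn = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, churn recall={recall_churn:.2f}")
```

On the real predictions, the same quantities could be computed from `cm`, or reported by `classification_report(y_test, y_pred)`, which is already imported in section 7.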