<center><font size=6> Bank Churn Prediction </font></center>

## Problem Statement

### Context

Service businesses such as banks have to worry about customer churn, i.e. customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer's decision to leave, so that management can focus improvement efforts on those priorities.

### Objective

As a data scientist at the bank, you need to build a neural-network-based classifier that predicts whether a customer will leave the bank within the next 6 months.

### Data Dictionary

* CustomerId: Unique ID which is assigned to each customer

* Surname: Last name of the customer

* CreditScore: It defines the credit history of the customer.
  
* Geography: A customer’s location
   
* Gender: It defines the Gender of the customer
   
* Age: Age of the customer
    
* Tenure: Number of years for which the customer has been with the bank

* NumOfProducts: refers to the number of products that a customer has purchased through the bank.

* Balance: Account balance

* HasCrCard: A categorical variable indicating whether the customer has a credit card.

* EstimatedSalary: Estimated salary

* IsActiveMember: A categorical variable indicating whether the customer is an active member of the bank (active meaning they use bank products regularly, make transactions, etc.).

* Exited: Whether the customer left the bank within six months. It takes two values:
  * 0 = No (customer did not leave the bank)
  * 1 = Yes (customer left the bank)

## Importing necessary libraries

In [1]:
import os
os.environ['MPLBACKEND'] = 'TkAgg'  # or 'Agg'
import matplotlib


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

from sklearn.preprocessing import StandardScaler 
from sklearn.metrics import confusion_matrix 
from scipy import stats 
from scipy.stats import chi2_contingency
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential  # Use this one from TensorFlow
from tensorflow.keras.layers import Dense, Input  # Use these from TensorFlow
from tensorflow.keras.optimizers import SGD, Adam  # Optimizers from TensorFlow
from sklearn.metrics import f1_score
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from keras_tuner import Hyperband
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import RandomizedSearchCV
from scikeras.wrappers import KerasClassifier  # Use this for KerasClassifier


In [3]:
import tensorflow as tf
print(tf.__version__)


2.18.0


In [4]:
from tensorflow import keras
print(keras.__version__)  # Should print the version of Keras included in TensorFlow


3.8.0


## Loading the dataset

In [5]:
df = pd.read_csv('C:\\Users\\Nobody\\Downloads\\school stuff\\Bank Churn Prediction\\bank-1.csv')

In [6]:
# making a duplicate copy of data
df_original = df.copy()

## Data Overview

In [7]:
"""
Run initial observations and a sanity check.

Start by looking at the first few rows.
"""
df.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0
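
As a possible companion to the first-rows check, the sketch below runs a few basic sanity checks (column types, missing values, duplicate IDs). It is self-contained: `demo` is a stand-in built from three rows and four columns of the `head()` output above, and the variable names are illustrative, not part of the original notebook.

```python
import pandas as pd

# Stand-in frame: first three rows (subset of columns) from the head() output.
demo = pd.DataFrame({
    "CustomerId": [15634602, 15647311, 15619304],
    "Surname": ["Hargrave", "Hill", "Onio"],
    "CreditScore": [619, 608, 502],
    "Exited": [1, 0, 1],
})

# Column types: confirms which columns are numeric and which are text.
print(demo.dtypes)

# Missing values per column.
missing_per_column = demo.isna().sum()
print(missing_per_column)

# Number of repeated customer IDs (each customer should appear once).
n_duplicate_ids = int(demo["CustomerId"].duplicated().sum())
print("duplicate CustomerId rows:", n_duplicate_ids)
```

On the full dataset the same calls would be applied to `df` instead of `demo`.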


## Exploratory Data Analysis

### Univariate Analysis

In [8]:
# descriptive statistics 
univariate_desc_stats = df.describe()
print(univariate_desc_stats)

         RowNumber    CustomerId   CreditScore           Age        Tenure  \
count  10000.00000  1.000000e+04  10000.000000  10000.000000  10000.000000   
mean    5000.50000  1.569094e+07    650.528800     38.921800      5.012800   
std     2886.89568  7.193619e+04     96.653299     10.487806      2.892174   
min        1.00000  1.556570e+07    350.000000     18.000000      0.000000   
25%     2500.75000  1.562853e+07    584.000000     32.000000      3.000000   
50%     5000.50000  1.569074e+07    652.000000     37.000000      5.000000   
75%     7500.25000  1.575323e+07    718.000000     44.000000      7.000000   
max    10000.00000  1.581569e+07    850.000000     92.000000     10.000000   

             Balance  NumOfProducts    HasCrCard  IsActiveMember  \
count   10000.000000   10000.000000  10000.00000    10000.000000   
mean    76485.889288       1.530200      0.70550        0.515100   
std     62397.405202       0.581654      0.45584        0.499797   
min         0.000000     

In [10]:
"""
RowNumber - there are 10,000 rows, i.e. 10,000 customer records.

CustomerId - unique identifier for each customer; 10,000 entries.

CreditScore - the average credit score is 650.53.
The lowest credit score is 350.
The highest credit score is 850.
25% of customers have a credit score of 584 or below.
25% of customers have a credit score of 718 or above (75% are at or below 718).

Age - the average customer age is about 39 years.
The youngest customer is 18 years old.
The oldest customer is 92 years old.
25% of customers are 32 or younger.
25% of customers are 44 or older.

Tenure (number of years the customer has been with the bank)
The average tenure is about 5 years.
The minimum is 0, which indicates some customers have only just joined the bank.
The maximum tenure is 10 years.
25% of customers have been with the bank for 3 years or less.
75% of customers have been with the bank for 7 years or less.

Balance (account balance)
The average account balance is about $76,485.89.
The minimum is 0, so some customers hold a $0 balance; they may keep no money
in the account or rely on credit products instead.
The highest account balance is #

"""

In [11]:
# frequency table for categorical variables (example for 'Gender')
print(df['Gender'].value_counts())

Gender
Male      5457
Female    4543
Name: count, dtype: int64
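
Raw counts are easier to compare once they are expressed as shares. A minimal sketch, using the two counts printed above; `gender_counts` and `gender_share` are illustrative names, not variables from the notebook:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
gender_counts = pd.Series({"Male": 5457, "Female": 4543}, name="count")

# Divide by the total to turn counts into proportions.
gender_share = (gender_counts / gender_counts.sum()).round(4)
print(gender_share)
```

On the full frame, `df['Gender'].value_counts(normalize=True)` should give the same proportions directly.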


In [12]:
# histogram for numerical variables (e.g., Age)

plt.figure(figsize=(10, 5))
sns.histplot(df['Age'].values, bins=30, kde=True)
plt.title('Distribution of Age')
plt.savefig('AgeDistribution.png')
plt.show()

In [13]:
"""
The distribution peaks between the 30s and 40s, so most of our customers fall
in that age range. The count declines gradually as age increases beyond 50.

The distribution is right-skewed: there are fewer customers in their 60s and
70s, plus a few outliers in the 80s and 90s.

"""

In [14]:
# Boxplot for numerical variable (e.g., CreditScore)
plt.figure(figsize=(10,5))
sns.boxplot(x=df['CreditScore'])
plt.title('Boxplot of CreditScore')
plt.savefig('CreditScoreBoxPlot.png')
plt.show()
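
To make the whisker positions in the boxplot less of a visual guess, the fences can be worked out from the quartiles. The sketch below assumes the default 1.5 x IQR whisker rule and uses the CreditScore quartiles, minimum and maximum from the `describe()` output earlier; the variable names are illustrative.

```python
# Quartiles and extremes of CreditScore, copied from the describe() output.
q1, q3 = 584.0, 718.0
observed_min, observed_max = 350.0, 850.0

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # values below this are drawn as outlier points
upper_fence = q3 + 1.5 * iqr    # values above this are drawn as outlier points

print(f"IQR = {iqr}, fences = [{lower_fence}, {upper_fence}]")
print("low outliers present:", observed_min < lower_fence)
print("high outliers present:", observed_max > upper_fence)
```

Under this rule only scores below the lower fence show up as individual points, and the upper whisker stops at the observed maximum because it lies inside the upper fence.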

In [15]:
"""
The median credit score is about 652 (see the descriptive statistics above), so half
of the customers score above that value and half below.

The whiskers extend from roughly 400 to 850. The few low outliers suggest that
not many customers have very poor credit scores.
"""



In [16]:
# Bar plot for categorical variable (e.g., Geography)
plt.figure(figsize=(10,5))
sns.countplot(x='Geography', data=df)
plt.title('Count of Customers by Geography')
plt.show()

In [17]:
"""
Based on the count plot, most of our customers are in France, followed by
Spain and Germany.

France has roughly 5,000 customers.

Spain and Germany each have around 2,500.
"""


In [18]:
# Density plot for continuous variable (e.g., Balance)
plt.figure(figsize=(10,5))
sns.kdeplot(df['Balance'], fill=True)
plt.title('Density Plot of Balance')
plt.savefig('DensityPlotOFBalance.png')
plt.show()
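
The spike near zero in the Balance density can be quantified directly. The sketch below is only an illustration on a made-up six-row frame (`demo` and its values are hypothetical, not taken from the dataset): it measures the share of zero-balance customers and compares churn between the zero and non-zero groups.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real analysis would use df instead.
demo = pd.DataFrame({
    "Balance": [0.0, 0.0, 125000.0, 98000.5, 0.0, 143210.0],
    "Exited":  [0, 1, 0, 0, 0, 1],
})

# Boolean mask for customers holding exactly zero balance.
is_zero = demo["Balance"].eq(0)
zero_share = is_zero.mean()
print(f"share of zero-balance customers: {zero_share:.2%}")

# Mean of a 0/1 column is the churn rate within each group.
group = np.where(is_zero, "zero", "non-zero")
churn_by_group = demo.groupby(group)["Exited"].mean()
print(churn_by_group)
```

Whether zero-balance customers really churn more is something only the full data can answer; the toy frame is deliberately neutral on that point.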

In [19]:
"""
This density plot displays two peaks for Balance.

Peak 1 - the first peak is at a balance of about $0, suggesting that these
customers either keep little money in the account or withdraw it soon after
it is deposited.

Business recommendation: the large number of customers with a zero balance
may require further investigation, as they could be at higher risk of leaving
the bank. This group is also a possible opportunity if the bank decides to
promote financial services or products to them.

Peak 2 - the second peak is around $100,000 to $150,000. It suggests that this
group holds a significant balance, i.e. customers with large deposits. These
customers might


"""


In [20]:
# skewness and kurtosis (for numeric variables)
print('Skewness of Age:', df['Age'].skew())
print('Kurtosis of Age:', df['Age'].kurt())

Skewness of Age: 1.0113202630234552
Kurtosis of Age: 1.3953470615086956
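
When reading the kurtosis value it helps to know which definition is being reported. pandas' `kurt()` uses Fisher's definition (excess kurtosis, where a normal distribution scores about 0), while the Pearson definition puts a normal distribution at about 3. A small sketch on a synthetic normal sample (the sample and the variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

# Synthetic standard-normal sample, used only as a reference distribution.
rng = np.random.default_rng(0)
normal_sample = pd.Series(rng.normal(size=100_000))

# pandas: excess (Fisher) kurtosis, close to 0 for normal data.
excess = normal_sample.kurt()

# scipy with fisher=False: Pearson kurtosis, close to 3 for normal data.
pearson = kurtosis(normal_sample, fisher=False, bias=False)

print(f"excess kurtosis:  {excess:.3f}")
print(f"Pearson kurtosis: {pearson:.3f}")
```

Read this way, the Age kurtosis of about 1.40 should be compared against 0 rather than 3.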


In [21]:
"""

The skewness of Age is 1.01, which indicates a clear positive (right) skew.
The Age data has a longer tail on the right side: a small number of older
customers pull the distribution in that direction.

The kurtosis of Age is 1.40. pandas' kurt() reports excess kurtosis (Fisher's
definition, where a normal distribution scores 0, not 3), so a positive value
means heavier tails than a normal distribution and more extreme values, not fewer.

"""


In [22]:
# percentiles
print('Percentiles for Age:')
print(df['Age'].quantile([0.25, 0.5, 0.75]))

Percentiles for Age:
0.25    32.0
0.50    37.0
0.75    44.0
Name: Age, dtype: float64


In [23]:
"""
25th Percentile (Q1): 32 years
25% of the customers are aged 32 or younger.
This shows where the lower quarter of the age distribution lies.

50th Percentile (Median): 37 years
Half of the customers are 37 or younger and half are 37 or older.
The median is a measure of central tendency that splits the data into two equal parts.

75th Percentile (Q3): 44 years
75% of the customers are aged 44 or younger. This shows where the upper quarter
of the age distribution begins.

"""


In [24]:
# outlier detection using Z-score method (e.g., for 'Balance')

z_scores = np.abs(stats.zscore(df['Balance']))
outliers_z = df[(z_scores > 3)]
print("Outliers detected using Z-score (|Z| > 3):")
print(outliers_z[['Balance']])


Outliers detected using Z-score (|Z| > 3):
Empty DataFrame
Columns: [Balance]
Index: []


In [25]:
"""
Used the Z-score method to identify potential outliers in the "Balance" column.

A Z-score greater than 3 (or less than -3) typically indicates that a data
point is an outlier.

Result: no outliers were detected in the "Balance" column.
"""


In [26]:
# Z-score for 'CreditScore'
z_scores_credit = np.abs(stats.zscore(df['CreditScore']))
outliers_credit = df[(z_scores_credit > 3)]
print("Outliers detected using Z-score (|Z| > 3) for 'CreditScore':")
print(outliers_credit[['CreditScore']])

Outliers detected using Z-score (|Z| > 3) for 'CreditScore':
      CreditScore
1405          359
1631          350
1838          350
1962          358
2473          351
8723          350
8762          350
9624          350


In [27]:
"""
(left column is the DataFrame row index, right column is the CreditScore)

Applied the Z-score method to identify outliers in the "CreditScore" column.
Values with a Z-score greater than 3 (or less than -3) are flagged as outliers.

Outliers were detected in the "CreditScore" variable, as listed below:

CreditScores: 359, 350, 350, 358, 351, 350, 350, 350
(at row indices 1405, 1631, 1838, 1962, 2473, 8723, 8762, 9624)

All of them are at the low end of the score range. It is important to
investigate whether these outliers represent valid customer data or potential errors.
"""


In [28]:
# Z-Score for 'Age'
z_scores_age = np.abs(stats.zscore(df['Age']))
outliers_age = df[(z_scores_age > 3)]
print("Outliers detected using Z-score (|Z| > 3)  for 'Age':")
print(outliers_age[['Age']])

Outliers detected using Z-score (|Z| > 3)  for 'Age':
      Age
85     75
158    73
230    72
252    79
310    80
...   ...
9646   71
9671   78
9736   78
9894   77
9936   77

[133 rows x 1 columns]


In [29]:
# Z-score for 'Tenure'
z_scores_tenure = np.abs(stats.zscore(df['Tenure']))
outliers_tenure = df[(z_scores_tenure > 3)]
print("Outliers detected using Z-score (|Z| > 3) for 'Tenure':")
print(outliers_tenure[['Tenure']])

Outliers detected using Z-score (|Z| > 3) for 'Tenure':
Empty DataFrame
Columns: [Tenure]
Index: []


In [30]:
# Z-score for 'Balance'
z_scores_balance = np.abs(stats.zscore(df['Balance']))
outliers_balance = df[(z_scores_balance > 3)]
print("Outliers detected using Z-score (|Z| > 3) for 'Balance':")
print(outliers_balance[['Balance']])

Outliers detected using Z-score (|Z| > 3) for 'Balance':
Empty DataFrame
Columns: [Balance]
Index: []


In [31]:
# Z-score for 'NumOfProducts'
z_scores_products = np.abs(stats.zscore(df['NumOfProducts']))
outliers_products = df[(z_scores_products > 3)]
print("Outliers detected using Z-score (|Z| > 3)  for 'NumOfProducts':")
print(outliers_products[['NumOfProducts']])

Outliers detected using Z-score (|Z| > 3)  for 'NumOfProducts':
      NumOfProducts
7                 4
70                4
1254              4
1469              4
1488              4
1701              4
1876              4
2124              4
2196              4
2285              4
2462              4
2499              4
2509              4
2541              4
2614              4
2617              4
2872              4
3152              4
3365              4
3841              4
4013              4
4014              4
4166              4
4260              4
4403              4
4511              4
4516              4
4606              4
4654              4
4748              4
4822              4
5010              4
5137              4
5235              4
5386              4
5700              4
5904              4
6150              4
6172              4
6279              4
6750              4
6875              4
7257              4
7457              4
7567              4
7698            

In [32]:
# check for outliers in 'HasCrCard' and 'IsActiveMember'
hascrcard_outliers = df[(df['HasCrCard'] != 0) & (df['HasCrCard'] != 1)]
isactivemember_outliers = df[(df['IsActiveMember'] != 0) & (df['IsActiveMember'] != 1)]

print("Outliers in 'HasCrCard':")
print(hascrcard_outliers)

print("Outliers in 'IsActiveMember':")
print(isactivemember_outliers)

Outliers in 'HasCrCard':
Empty DataFrame
Columns: [RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, Exited]
Index: []
Outliers in 'IsActiveMember':
Empty DataFrame
Columns: [RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, Exited]
Index: []


In [33]:
# Z-Score for "EstimatedSalary"
z_scores_salary = np.abs(stats.zscore(df['EstimatedSalary']))
outliers_salary = df[(z_scores_salary > 3)]

print("Outliers detected using Z-score (|Z| > 3) for 'EstimatedSalary':")
print(outliers_salary[['EstimatedSalary']])

Outliers detected using Z-score (|Z| > 3) for 'EstimatedSalary':
Empty DataFrame
Columns: [EstimatedSalary]
Index: []
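
The per-column Z-score cells above all repeat the same three steps, so they could be condensed into one helper. This is a sketch, not the notebook's own code: `zscore_outlier_counts` is a hypothetical function name, and `demo` is a synthetic frame with one planted extreme Age value.

```python
import numpy as np
import pandas as pd
from scipy import stats


def zscore_outlier_counts(frame, columns, threshold=3.0):
    """Count rows with |Z| > threshold for each listed numeric column."""
    counts = {}
    for col in columns:
        z = np.abs(stats.zscore(frame[col]))
        counts[col] = int((z > threshold).sum())
    return pd.Series(counts, name="n_outliers")


# Synthetic stand-in data: Age gets one planted extreme value (120).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": np.append(rng.normal(39, 5, 999), 120.0),
    "Tenure": rng.integers(0, 11, 1000),
})

outlier_counts = zscore_outlier_counts(demo, ["Age", "Tenure"])
print(outlier_counts)
```

On the real data the call might look like `zscore_outlier_counts(df, ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary'])`, which returns one count per column instead of six separate printouts.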


### Bivariate Analysis

In [34]:
# Select only numeric columns
numeric_df = df.select_dtypes(include=['number'])

# Calculate the correlation matrix for numeric columns only
corr_matrix = numeric_df.corr()

# Plot the correlation matrix
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.savefig('CorrelationMatrix.png')
plt.show()
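
A full heatmap also includes identifier columns such as RowNumber and CustomerId, which carry no predictive meaning. One possible refinement, sketched on synthetic data (`demo`, its generating rule, and `corr_with_target` are assumptions for illustration), is to drop the identifiers and rank the remaining features by the absolute size of their correlation with the target:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: Exited is made to depend on Age; Tenure is pure noise.
rng = np.random.default_rng(42)
n = 500
age = rng.integers(18, 90, size=n)
demo = pd.DataFrame({
    "RowNumber": np.arange(1, n + 1),
    "CustomerId": rng.integers(15_000_000, 16_000_000, size=n),
    "Age": age,
    "Tenure": rng.integers(0, 11, size=n),
    "Exited": (age + rng.normal(0, 15, size=n) > 55).astype(int),
})

# Drop identifier columns before correlating.
features = demo.drop(columns=["RowNumber", "CustomerId"])

# Correlation of every feature with the target, strongest first.
corr_with_target = (
    features.corr()["Exited"].drop("Exited").sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Pearson correlation only captures linear association, so a feature with a small coefficient can still matter to a non-linear model such as a neural network.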

In [35]:
"""
Exited: the customer has left the bank.
Churn: a customer stops using the bank's services or closes the account,
a.k.a. customer attrition.

Age vs Exited - there is a moderate positive correlation of 0.29,
an indication that as age increases, customers are more likely to exit (churn).

Balance vs Exited - the correlation is 0.12, a slight positive association.
Customers with higher balances might be slightly more prone to churn.

IsActiveMember vs Exited - the correlation is -0.16, meaning that the more
active a customer is, the less likely they are to churn. Inactivity could be a
significant factor in customer attrition.

NumOfProducts vs Exited - the correlation is -0.05, a faint negative.
The data suggests that a customer with more products might be slightly less
likely to churn, but the effect is minimal.

Age vs IsActiveMember - the correlation is 0.09, a faint positive correlation.
Older customers may tend to be slightly more active bank members.

Key takeaways:
Age, IsActiveMember, and Balance show the strongest correlations with
Exited, although all of them are weak to moderate in absolute terms. These
variables can help us predict churn.

CreditScore, NumOfProducts, and HasCrCard show only a faint correlation with churn,
so they may not be as impactful in the model.

"""

In [36]:
# Numerical vs Numerical variables 
sns.scatterplot(x='Age', y='Balance', data=df, hue='Exited')
plt.savefig('NumericalScatterPlot.png')
plt.show()
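
A scatter plot makes age clusters visible but not measurable. One way to put numbers on them is to bin Age into bands and compute the churn rate per band. The sketch below uses a hypothetical ten-row frame and arbitrary band edges, so the resulting rates illustrate the method only, not the dataset:

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
demo = pd.DataFrame({
    "Age":    [22, 28, 35, 41, 47, 52, 58, 63, 71, 33],
    "Exited": [0, 0, 0, 0, 1, 1, 0, 1, 0, 0],
})

# Right-closed bins: (17, 30], (30, 45], (45, 60], (60, 100].
bins = [17, 30, 45, 60, 100]
labels = ["18-30", "31-45", "46-60", "61+"]
demo["AgeBand"] = pd.cut(demo["Age"], bins=bins, labels=labels)

# Churn rate (mean of the 0/1 flag) and group size per age band.
churn_by_band = demo.groupby("AgeBand", observed=True)["Exited"].agg(["mean", "size"])
print(churn_by_band)
```

Reporting the group size next to the rate guards against over-reading bands that contain only a handful of customers.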

In [37]:
"""
Blue, Exited = 0: stayed with the bank

Orange, Exited = 1: exited (churned)

Churned customers (orange) cluster in the 40 to 60 age range.
This implies that customers in this age group are more likely to churn than
younger or older customers.

Below age 30 the points are mostly blue, so younger customers appear less
likely to exit, while middle-aged customers, specifically between 45 and 60,
appear more prone to exit (churn).

High-balance customers (around $100,000) appear to have a larger share of orange dots,
indicating a somewhat higher likelihood of churn, which is consistent with the weak
positive correlation between balance and churn.

Conclusion - age and balance are relevant factors when predicting customer churn.
Middle-aged customers, especially those with higher balances, should be monitored
closely as they may be at a higher risk of leaving.

"""


In [38]:
# pair plot
sns.pairplot(numeric_df)
plt.savefig('PairPlot.png')
plt.show()

In [39]:
"""
Features such as Age and Balance show distinct distributions.
Age is right-skewed: most customers are younger, with fewer older customers.

Balance again shows a two-peak (bimodal) distribution, as in the density plot
earlier. Many customers have either very low balances (near $0)
or balances around $100,000 to $150,000.

CreditScore is roughly normally distributed, centered around 600 to 700.
"""


In [40]:
pd.crosstab(df['Gender'], df['Exited'])


Exited     0     1
Gender            
Female  3404  1139
Male    4559   898

In [41]:
"""
Observation:
Female customers are more likely to churn than male customers: 1139 of 4543
women left (about 25.1%) versus 898 of 5457 men (about 16.5%).
This could suggest that female customers are experiencing something
that makes them less satisfied or less loyal to the bank than male customers.
There are more male than female customers in this dataset, yet the absolute
number of female churners (1139) is still higher than male churners (898).

Actionable insight:
We as the bank might investigate why female customers
are leaving at a higher rate. It could be helpful to find out whether
a specific product, service, or experience leads to their dissatisfaction.
"""


In [42]:
# stacked bar plot
pd.crosstab(df['Gender'], df['Exited']).plot(kind='bar', stacked=True)
plt.title('Gender vs Exited')
plt.show()
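
The stacked bars show counts, which makes the two genders hard to compare because the groups differ in size. A row-normalized version of the same table gives the churn rate per gender. The sketch below rebuilds the table from the four counts in the cross-tabulation output; `counts`, `rates` and `churn_rate` are illustrative names.

```python
import pandas as pd

# Counts copied from the Gender x Exited cross-tabulation output.
counts = pd.DataFrame(
    {0: [3404, 4559], 1: [1139, 898]},
    index=pd.Index(["Female", "Male"], name="Gender"),
)
counts.columns.name = "Exited"

# Divide each row by its total so every row sums to 1.
rates = counts.div(counts.sum(axis=1), axis=0)

# Column 1 is the share of customers who exited.
churn_rate = rates[1].round(4)
print(churn_rate)
```

On the full frame, `pd.crosstab(df['Gender'], df['Exited'], normalize='index')` should produce the same row shares in one call.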

In [43]:
"""
This stacked bar plot is a visual representation of the Gender vs Exited cross-tabulation above.
""" 


In [44]:
# using a chi-square test to check whether the variables are independent
crosstab = pd.crosstab(df['Gender'], df['Exited'])
chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"Chi-squared: {chi2}, p-value: {p}")

Chi-squared: 112.91857062096116, p-value: 2.2482100097131755e-26
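
With 10,000 rows even a modest difference yields a tiny p-value, so it can be useful to pair the test with an effect size. The sketch below recomputes the test from the four cross-tabulation counts and adds Cramér's V, which is not part of the original analysis and is introduced here only as an assumption about what a reader might want next. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, and V is computed from that corrected statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = Female, Male; columns = Exited 0, Exited 1.
observed = np.array([[3404, 1139],
                     [4559, 898]])

# Default settings (Yates' correction is applied for 2x2 tables).
chi2, p, dof, expected = chi2_contingency(observed)

# Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))), ranges from 0 to 1.
n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
print(f"Cramér's V = {cramers_v:.3f}")
```

A small V alongside a very small p-value would mean the association is real but weak, which is worth keeping in mind before designing gender-specific retention campaigns.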


In [45]:
"""
The chi-squared statistic measures how much the observed counts deviate
from what we would expect if there were no relationship between the variables.

The higher the value, the greater the difference between observed and expected
counts. Here the statistic is large (112.92) and the p-value (about 2.2e-26) is far
below 0.05, so we reject the hypothesis that gender and churn are independent.

This result supports the earlier finding from the cross-tabulation that churn
rates differ by gender.

The relationship between gender and churn is statistically significant.
Significance alone does not measure how strong the association is or show that
gender causes churn, but gender is a useful signal for predicting whether a customer leaves.


Actionable insight:
Given the significance of gender in churn prediction, we as the bank may consider
targeted strategies to reduce churn among female customers, as they are more likely
to leave based on the previous findings.
"""


In [46]:
# boxplot exited vs Balance
sns.boxplot(x='Exited', y='Balance', data=df)
plt.savefig('BoxPlotExitvsBalance.png')
plt.show()


In [47]:
"""

The blue boxes show customer balances, comparing the accounts of people who
stayed with us against those who left. What's interesting is that customers who
left (the "1" on the x-axis) actually had slightly more money in their accounts,
typically about $110,000, compared to $90,000 for customers who stayed.

This suggests that a high balance alone does not keep customers with us, so we
should focus less on balance-related incentives and more on other factors that
might be causing these higher-balance customers to leave.
"""


In [48]:
# barplot Exited Vs Age
sns.barplot(x='Exited', y='Age', data=df)
plt.savefig('ExitedVsAgeBarPlot.png')
plt.show()

In [49]:
"""
The graph shows the average age of customers who stayed versus those who left.
What jumps out is that customers who left (the "1" column) are typically older,
around 44 years old on average, compared to customers who stayed, who average
about 38 years old. This is valuable information because it tells us we
might have an issue retaining our older customers.

Business insight:
Maybe our services or products aren't meeting their specific needs as well as they
could be. We should look at which features or services our older customers
value most and whether we're delivering on those expectations. Perhaps we need to
enhance our customer experience for this demographic.

"""


In [50]:
# countplot geography vs exited
sns.countplot(x='Geography', hue='Exited',data=df)
plt.savefig('CountPlotGeorgraphyVSExit.png')
plt.show()

In [51]:
"""

The graph shows where our customers are located and whether they're staying with us
or leaving. The blue bars show customers who've stayed, and the orange bars show those
who've left. Here's what stands out:

France is our biggest market, with over 4,000 customers who've stayed, but we're also
seeing about 800 customers leaving.
In Spain, we have around 2,000 loyal customers and the lowest number of departures:
only about 400 customers leaving.
Germany has the smallest number of staying customers, at about 1,700, but nearly 800
customers have left, which is almost the same number as France despite a much smaller
customer base.

"""

In [52]:
crosstab = pd.crosstab(df['Geography'], df['Exited'])
sns.heatmap(crosstab, annot=True, fmt='d', cmap='Blues')  # fmt='d' prints exact integer counts
plt.savefig('GeographyHeatMap.png')
plt.show()

In [53]:
"""
Looking at this heatmap:

France has about 4,200 staying customers (0) and 810 who left (1): our largest market, with moderate churn.
Germany shows about 1,700 staying customers and 810 who left: a concerningly high proportional loss.
Spain has about 2,100 staying customers and only 410 who left: a churn rate on par with France.

Key insight: Germany is losing customers at about twice the rate of France and Spain (32% vs 16% and 16%). This confirms that our Germany market needs immediate attention, while France and Spain show comparable, healthier retention.
"""

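The 32% vs 16% figures follow directly from the counts. A short worked check, using the approximate values read from the heatmap annotations (so the results are approximate too):

```python
# Approximate counts as read from the heatmap annotations: (stayed, left)
counts = {
    "France": (4200, 810),
    "Germany": (1700, 810),
    "Spain": (2100, 410),
}

# Churn rate = customers who left / all customers in that country
churn_rate = {
    country: left / (stayed + left)
    for country, (stayed, left) in counts.items()
}

for country, rate in churn_rate.items():
    print(f"{country}: {rate:.1%}")
```

With the real DataFrame, `pd.crosstab(df['Geography'], df['Exited'], normalize='index')` should give the same row-wise rates from exact counts.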

In [54]:
# temporal/time-based analysis
sns.boxplot(x='Exited', y='Tenure', data=df)
plt.title('Tenure vs Exited')
plt.show()

In [55]:
print(df.columns)


Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')


## Data Preprocessing

In [56]:
# Define a function to detect and treat outliers using the IQR rule
def treat_outliers_iqr(df, columns):
    for col in columns:
        # Calculate Q1 (25th percentile) and Q3 (75th percentile)
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1

        # Define the bounds beyond which a value counts as an outlier
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR

        # Replace outliers with the nearest bound (modifies df in place)
        df[col] = df[col].clip(lower=lower_bound, upper=upper_bound)

    return df
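One caveat with `treat_outliers_iqr` is that it derives the bounds from whatever DataFrame it is given, so validation or test data would be clipped against their own quartiles. A minimal sketch of an alternative that learns the bounds on the training split and reuses them elsewhere; `fit_iqr_bounds` and `apply_iqr_bounds` are hypothetical helper names, and the numbers are toy values:

```python
import pandas as pd

def fit_iqr_bounds(train_df, columns, k=1.5):
    """Compute (lower, upper) clipping bounds per column from training data only."""
    bounds = {}
    for col in columns:
        q1 = train_df[col].quantile(0.25)
        q3 = train_df[col].quantile(0.75)
        iqr = q3 - q1
        bounds[col] = (q1 - k * iqr, q3 + k * iqr)
    return bounds

def apply_iqr_bounds(df, bounds):
    """Return a copy of df with each column clipped to the precomputed bounds."""
    out = df.copy()
    for col, (lower, upper) in bounds.items():
        out[col] = out[col].clip(lower=lower, upper=upper)
    return out

# Toy data (hypothetical values, not from the bank dataset)
train = pd.DataFrame({"Age": [30, 32, 35, 37, 40, 44, 92]})
test = pd.DataFrame({"Age": [25, 50, 99]})

bounds = fit_iqr_bounds(train, ["Age"])
test_clipped = apply_iqr_bounds(test, bounds)
print(bounds)
print(test_clipped)
```

Keeping the fit and apply steps separate mirrors how `StandardScaler` is used later in this notebook: statistics come from the training split only.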

In [57]:
# listing of numerical columns to check for outliers 
numerical_features = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

In [58]:
# Splitting the target (Exited) from the predictors
X = df.drop(columns=['Exited', 'Surname']) # Predictors (features)
y = df['Exited'] #Target Variable 

### Dummy Variable Creation

In [59]:
# One-hot encode the categorical variables 'Geography' and 'Gender'
X_encoded = pd.get_dummies(X, columns=['Geography', 'Gender'], drop_first=True)
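To make the effect of `drop_first=True` concrete, here is a small sketch on hypothetical rows (not taken from the bank data). For each encoded column the first category in sorted order is dropped and becomes the implicit baseline:

```python
import pandas as pd

# Hypothetical rows for illustration only
demo = pd.DataFrame({
    "Geography": ["France", "Germany", "Spain"],
    "Gender": ["Female", "Male", "Female"],
    "Age": [42, 41, 39],
})

encoded = pd.get_dummies(demo, columns=["Geography", "Gender"], drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

A customer from France therefore has both `Geography_Germany` and `Geography_Spain` set to false, which matches the column names that appear later in this notebook.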

### Train-validation-test Split

In [60]:
# First, split into train+validation (70%) and test (30%)
X_train_full, X_test, y_train_full, y_test = train_test_split(X_encoded, y, test_size=0.3, random_state=42)

# Then split that 70% again into train (70%) and validation (30%),
# i.e. 49% train / 21% validation / 30% test of the full dataset
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.3, random_state=42)
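Because the second split is taken from the 70% training portion, the final proportions are 49% train, 21% validation and 30% test. The sketch below checks those sizes on synthetic data and also passes `stratify`, which the cell above does not use, to keep the churn ratio similar across splits. The arrays `X_demo` and `y_demo` are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10,000 rows with roughly 20% positives
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10000, 3))
y_demo = (rng.random(10000) < 0.2).astype(int)

# 70% / 30% split, stratified on the target
X_tr_full, X_te, y_tr_full, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
# Split the 70% again into 70% / 30%
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tr_full, y_tr_full, test_size=0.3, random_state=42, stratify=y_tr_full
)

print(len(X_tr), len(X_va), len(X_te))
print(y_tr.mean(), y_va.mean(), y_te.mean())
```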


### Data Normalization

In [61]:
# Before treatment (display summary statistics)
print("Before Outlier Treatment")
print(X_train[numerical_features].describe())



Before Outlier Treatment
       CreditScore          Age       Tenure        Balance  NumOfProducts  \
count  4900.000000  4900.000000  4900.000000    4900.000000    4900.000000   
mean    652.018163    38.870612     5.007755   76163.334218       1.532653   
std      96.418548    10.454969     2.883168   62581.580333       0.587301   
min     350.000000    18.000000     0.000000       0.000000       1.000000   
25%     585.000000    32.000000     3.000000       0.000000       1.000000   
50%     654.000000    37.000000     5.000000   95539.735000       1.000000   
75%     719.000000    44.000000     7.000000  127900.490000       2.000000   
max     850.000000    92.000000    10.000000  250898.090000       4.000000   

       EstimatedSalary  
count      4900.000000  
mean     101179.728978  
std       57482.040816  
min          90.070000  
25%       51978.417500  
50%      101473.150000  
75%      150684.552500  
max      199970.740000  


In [62]:
# checking for missing values in the training and validation sets 
print(X_train.isnull().sum())
print(X_val.isnull().sum())

RowNumber            0
CustomerId           0
CreditScore          0
Age                  0
Tenure               0
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Geography_Germany    0
Geography_Spain      0
Gender_Male          0
dtype: int64
RowNumber            0
CustomerId           0
CreditScore          0
Age                  0
Tenure               0
Balance              0
NumOfProducts        0
HasCrCard            0
IsActiveMember       0
EstimatedSalary      0
Geography_Germany    0
Geography_Spain      0
Gender_Male          0
dtype: int64


In [63]:
# Check the columns in X_train, X_val, and X_test
print(X_train.columns)
print(X_val.columns)
print(X_test.columns)


Index(['RowNumber', 'CustomerId', 'CreditScore', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Geography_Germany', 'Geography_Spain', 'Gender_Male'],
      dtype='object')
Index(['RowNumber', 'CustomerId', 'CreditScore', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Geography_Germany', 'Geography_Spain', 'Gender_Male'],
      dtype='object')
Index(['RowNumber', 'CustomerId', 'CreditScore', 'Age', 'Tenure', 'Balance',
       'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary',
       'Geography_Germany', 'Geography_Spain', 'Gender_Male'],
      dtype='object')


In [64]:
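# Apply outlier treatment to a copy of each split.
# Note: the IQR bounds are recomputed on each split separately, and the treated
# copies are only inspected here; the scaling step below still uses X_train,
# X_val and X_test.
X_train_treated = treat_outliers_iqr(X_train.copy(), numerical_features)
X_val_treated = treat_outliers_iqr(X_val.copy(), numerical_features)
X_test_treated = treat_outliers_iqr(X_test.copy(), numerical_features)

# Display summary statistics after treatment
print("\nAfter Outlier Treatment")
print(X_train_treated[numerical_features].describe())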



After Outlier Treatment
       CreditScore          Age       Tenure        Balance  NumOfProducts  \
count  4900.000000  4900.000000  4900.000000    4900.000000    4900.000000   
mean    652.050204    38.616939     5.007755   76163.334218       1.529388   
std      96.324914     9.734252     2.883168   62581.580333       0.574829   
min     384.000000    18.000000     0.000000       0.000000       1.000000   
25%     585.000000    32.000000     3.000000       0.000000       1.000000   
50%     654.000000    37.000000     5.000000   95539.735000       1.000000   
75%     719.000000    44.000000     7.000000  127900.490000       2.000000   
max     850.000000    62.000000    10.000000  250898.090000       3.500000   

       EstimatedSalary  
count      4900.000000  
mean     101179.728978  
std       57482.040816  
min          90.070000  
25%       51978.417500  
50%      101473.150000  
75%      150684.552500  
max      199970.740000  


In [65]:
# Drop 'RowNumber' and 'CustomerId' from all datasets before scaling
X_train = X_train.drop(columns=['RowNumber', 'CustomerId'], errors='ignore')
X_val = X_val.drop(columns=['RowNumber', 'CustomerId'], errors='ignore')
X_test = X_test.drop(columns=['RowNumber', 'CustomerId'], errors='ignore')

In [66]:
# Initializing the scaler
scaler = StandardScaler()

# Fit on the training data only, then apply the same transform to validation data
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Scaling the test data with the same fitted scaler (ensures consistency)
X_test_scaled = scaler.transform(X_test)

## Model Building

### Model Evaluation Criterion

We use the F1-score as our evaluation metric because the classes are imbalanced. F1 is the harmonic mean of precision and recall, so it penalizes false positives (costly, time-consuming interventions for customers who would have stayed) as well as missed churners, and it gives a more honest picture than accuracy alone.

When we optimize the F1-score, we build a model that effectively identifies at-risk customers, enabling the bank to take proactive measures to retain them while minimizing unnecessary interventions.
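For reference, with churners (`Exited = 1`) as the positive class, the standard definitions in terms of true positives (TP), false positives (FP) and false negatives (FN) are:

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

True negatives do not appear in $F_1$, which is why it is less flattered by the large "stayed" class than accuracy is.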


In [67]:
# Define the model architecture using tf.keras.Sequential

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train_scaled.shape[1],)), # use Input as the first layer
    tf.keras.layers.Dense(64, activation='relu'), # First Hidden Layer
    tf.keras.layers.Dense(32, activation='relu'), # Second Hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid') # Output Layer 
])

### Neural Network with SGD Optimizer

In [68]:
# define the learning rate schedule 
initial_learning_rate = 0.01
lr_schedule = ExponentialDecay(
    initial_learning_rate,
    decay_steps=10000,
    decay_rate=0.9,
    staircase=True
)
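It is worth checking how often this schedule actually changes the learning rate. As I understand the documented behaviour of `ExponentialDecay`, the rate at a given optimizer step is `initial_learning_rate * decay_rate ** (step / decay_steps)`, with the exponent floored when `staircase=True`. The plain-Python sketch below reproduces that formula (it is not the Keras implementation) and plugs in the 4,900 training rows, batch size 32 and 50 epochs used in the first fit later in this notebook:

```python
import math

def exponential_decay_lr(step, initial_lr=0.01, decay_steps=10000,
                         decay_rate=0.9, staircase=True):
    """Learning rate at a given optimizer step under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # decay in discrete jumps
    return initial_lr * decay_rate ** exponent

steps_per_epoch = math.ceil(4900 / 32)      # 154 batches per epoch
steps_in_first_fit = steps_per_epoch * 50   # 7,700 steps over 50 epochs

print(exponential_decay_lr(steps_in_first_fit))  # no decay has happened yet
print(exponential_decay_lr(10000))               # first decay step
```

Under these assumptions the first 50-epoch fit ends before step 10,000, so it would run at the initial rate of 0.01 throughout. A smaller `decay_steps` would be needed for the schedule to have a visible effect.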

In [69]:
# Define the optimizer with the learning rate schedule 
sgd_optimizer = SGD(learning_rate=lr_schedule)

In [70]:
# compile the model with SGD optimizer 
model.compile(optimizer=sgd_optimizer,
             loss='binary_crossentropy',
             metrics=['accuracy'])


In [71]:
print(type(X_train_scaled))  # To check the type of the object

# To check the types of the original features before scaling
print(X_train.dtypes)


<class 'numpy.ndarray'>
CreditScore            int64
Age                    int64
Tenure                 int64
Balance              float64
NumOfProducts          int64
HasCrCard              int64
IsActiveMember         int64
EstimatedSalary      float64
Geography_Germany       bool
Geography_Spain         bool
Gender_Male             bool
dtype: object


In [72]:
print(X_train_scaled.shape)
print(X_val_scaled.shape)
print(X_test_scaled.shape)


(4900, 11)
(2100, 11)
(3000, 11)


In [73]:
# Keep the first 11 columns of the test data (a no-op here, since it already has 11 columns)
X_test_scaled = X_test_scaled[:, :11]


In [74]:
print(X_train_scaled.shape)  # (4900, 11)
print(X_val_scaled.shape)    # (2100, 11)
print(X_test_scaled.shape)   # (3000, 11)


(4900, 11)
(2100, 11)
(3000, 11)


In [75]:
# Fit the model to the training data
history = model.fit(X_train_scaled, y_train, epochs=50, batch_size=32,
                    validation_data=(X_val_scaled, y_val))

Epoch 1/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.6838 - loss: 0.6114 - val_accuracy: 0.7924 - val_loss: 0.4864
Epoch 2/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7932 - loss: 0.4835 - val_accuracy: 0.7933 - val_loss: 0.4623
Epoch 3/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8071 - loss: 0.4424 - val_accuracy: 0.8000 - val_loss: 0.4476
Epoch 4/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8042 - loss: 0.4423 - val_accuracy: 0.8086 - val_loss: 0.4375
Epoch 5/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8032 - loss: 0.4448 - val_accuracy: 0.8152 - val_loss: 0.4301
Epoch 6/50
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8168 - loss: 0.4187 - val_accuracy: 0.8229 - val_loss: 0.4246
Epoch 7/50
154/154

In [76]:

y_pred = model.predict(X_test_scaled)

# Convert probabilities to binary outcomes (since F1 score is for classification tasks)
y_pred_bin = (y_pred > 0.5).astype(int)

# Calculate F1 score
f1 = f1_score(y_test, y_pred_bin)
print(f"F1 Score: {f1}")


94/94 ━━━━━━━━━━━━━━━━━━━━ 0s 785us/step
F1 Score: 0.5761658031088083


In [77]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred_bin)
print("Confusion Matrix:")
print(cm)


Confusion Matrix:
[[2313  103]
 [ 306  278]]


## Model Performance Improvement

In [78]:
# Define a learning-rate reduction callback
# (note: it is not passed to the fit call in the next cell)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7)

In [79]:
# Continue training the same model with a larger batch size and more epochs
# (weights carry over from the previous fit; no callbacks are passed here)
history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=64,
                    validation_data=(X_val_scaled, y_val))

Epoch 1/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8764 - loss: 0.3120 - val_accuracy: 0.8505 - val_loss: 0.3602
Epoch 2/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8677 - loss: 0.3222 - val_accuracy: 0.8481 - val_loss: 0.3603
Epoch 3/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8723 - loss: 0.3264 - val_accuracy: 0.8448 - val_loss: 0.3605
Epoch 4/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8694 - loss: 0.3233 - val_accuracy: 0.8481 - val_loss: 0.3601
Epoch 5/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8590 - loss: 0.3320 - val_accuracy: 0.8500 - val_loss: 0.3599
Epoch 6/100
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8760 - loss: 0.3177 - val_accuracy: 0.8500 - val_loss: 0.3601
Epoch 7/100
77/77 ━━━

In [80]:

y_pred = model.predict(X_test_scaled)

# Convert probabilities to binary outcomes (since F1 score is for classification tasks)
y_pred_bin = (y_pred > 0.5).astype(int)

# Calculate F1 score
f1 = f1_score(y_test, y_pred_bin)
print(f"F1 Score: {f1}")

94/94 ━━━━━━━━━━━━━━━━━━━━ 0s 581us/step
F1 Score: 0.5754527162977867


In [81]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred_bin)
print("Confusion Matrix:")
print(cm)

Confusion Matrix:
[[2292  124]
 [ 298  286]]
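Both confusion matrices show that the model misses roughly half of the churners at the fixed 0.5 cut-off. Since F1 is the chosen metric, one option is to pick the decision threshold that maximizes F1 on the validation set and only then apply it to the test set. A minimal NumPy sketch under that assumption; `f1_at_threshold` and `best_f1_threshold` are hypothetical helpers and the labels and probabilities are toy values, not model output:

```python
import numpy as np

def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the positive class when probabilities >= threshold are labelled 1."""
    y_true = np.asarray(y_true).ravel()
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_f1_threshold(y_true, y_prob, thresholds=None):
    """Return (threshold, f1) for the first grid point with the highest F1."""
    if thresholds is None:
        thresholds = np.linspace(0.1, 0.9, 17)  # 0.10, 0.15, ..., 0.90
    scores = [f1_at_threshold(y_true, y_prob, t) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), float(scores[best])

# Hypothetical validation labels and predicted probabilities
y_val_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
p_val_demo = np.array([0.12, 0.22, 0.28, 0.33, 0.42, 0.47, 0.80, 0.17, 0.60, 0.05])

print(f1_at_threshold(y_val_demo, p_val_demo, 0.5))
print(best_f1_threshold(y_val_demo, p_val_demo))
```

In the notebook this would be called with `y_val` and `model.predict(X_val_scaled)`; the `.ravel()` calls handle the `(n, 1)` shape that `predict` returns.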


### Neural Network with Adam Optimizer

In [106]:
# Define the model with Adam Optimizer 


model_adam = Sequential([
    Input(shape=(X_train.shape[1],)),  # Define the input shape here
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),  # Adding another layer with fewer neurons
    Dense(1, activation='sigmoid')
])


In [107]:
# Compile the model with Adam optimizer
model_adam.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

In [108]:
# Add learning rate reduction
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6, verbose=1)

In [109]:
print(X_train.shape)  # Should be (n_samples, n_features)
print(X_val.shape)    # Should be (n_samples, n_features)


(4900, 11)
(2100, 11)


In [110]:
print(np.any(np.isnan(X_train)))  # Should return False
print(np.any(np.isnan(X_val)))    # Should return False


False
False


In [111]:
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32)
X_val = np.array(X_val, dtype=np.float32)
y_val = np.array(y_val, dtype=np.int32)


In [128]:
# Fit the model with ReduceLROnPlateau for learning rate adjustment
# Note: X_train and X_val are the unscaled arrays here; the standardized
# versions are X_train_scaled and X_val_scaled
history_adam = model_adam.fit(X_train, y_train, epochs=50, batch_size=64,
                              validation_data=(X_val, y_val),
                              callbacks=[lr_scheduler])

Epoch 1/50
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7334 - loss: 2.2342 - val_accuracy: 0.7881 - val_loss: 3.6217 - learning_rate: 6.2500e-05
Epoch 2/50
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7075 - loss: 3.4045 - val_accuracy: 0.6990 - val_loss: 1.5854 - learning_rate: 6.2500e-05
Epoch 3/50
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7017 - loss: 2.3020 - val_accuracy: 0.7195 - val_loss: 1.7296 - learning_rate: 6.2500e-05
Epoch 4/50
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7127 - loss: 2.0076 - val_accuracy: 0.6543 - val_loss: 3.8528 - learning_rate: 6.2500e-05
Epoch 5/50
77/77 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7140 - loss: 2.2517 - val_accuracy: 0.4614 - val_loss: 2.6722 - learning_rate: 6.2500e-05
Epoch 6/50
77/77 ━━━━━━━━━━━━━━━━━━━━

### Neural Network with Adam Optimizer and Dropout

In [130]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

# Define the model
def create_model():
    model = Sequential()
    
    # Input layer and first hidden layer with Dropout
    model.add(Dense(64, activation='relu', input_dim=X_train.shape[1]))
    model.add(Dropout(0.2))  # Dropout with 20% rate
    
    # Second hidden layer with Dropout
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.3))  # Dropout with 30% rate
    
    # Output layer (for binary classification)
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile the model using Adam optimizer
    model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

# Create the model
model = create_model()

# Train the model
# Note: the test set is passed as validation data here, so the val_* metrics
# below are test-set metrics; X_train is the unscaled array
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/20


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


154/154 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.6699 - loss: 3020.4026 - val_accuracy: 0.7440 - val_loss: 108.8361
Epoch 2/20
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6792 - loss: 680.1493 - val_accuracy: 0.6547 - val_loss: 30.0094
Epoch 3/20
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6704 - loss: 206.6677 - val_accuracy: 0.8053 - val_loss: 1.4903
Epoch 4/20
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6848 - loss: 55.7342 - val_accuracy: 0.6197 - val_loss: 0.7979
Epoch 5/20
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7089 - loss: 27.3855 - val_accuracy: 0.8053 - val_loss: 0.6183
Epoch 6/20
154/154 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7157 - loss: 10.2121 - val_accuracy: 0.8053 - val_loss: 0.5980
Epoch 7/20
154/154

### Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer

In [132]:
from imblearn.over_sampling import SMOTE
from keras.optimizers import SGD

# Step 1: Apply SMOTE to balance the data
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Step 2: Define the model with SGD optimizer
def create_sgd_model():
    model = Sequential()
    
    # Input layer and first hidden layer with Dropout
    model.add(Dense(64, activation='relu', input_dim=X_train_balanced.shape[1]))
    model.add(Dropout(0.2))  # Dropout with 20% rate
    
    # Second hidden layer with Dropout
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.3))  # Dropout with 30% rate
    
    # Output layer (for binary classification)
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile the model using SGD optimizer
    sgd_optimizer = SGD(learning_rate=0.01)
    model.compile(optimizer=sgd_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

# Create the model
sgd_model = create_sgd_model()

# Step 3: Train the model on the balanced dataset using SGD optimizer
history_sgd = sgd_model.fit(X_train_balanced, y_train_balanced, epochs=20, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
loss, accuracy = sgd_model.evaluate(X_test, y_test)
print(f'Test Loss: {loss:.4f}')
print(f'Test Accuracy: {accuracy:.4f}')


Epoch 1/20


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


243/243 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5033 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 2/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4894 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 3/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4944 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 4/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5008 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 5/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5037 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 6/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4984 - loss: nan - val_accuracy: 0.8053 - val_loss: nan
Epoch 7/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s

### Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer
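Like the SGD run in the previous section, the cell below trains on `X_train_balanced`, which is resampled from `X_train`. At this point in the notebook `X_train` holds the unscaled features (only `X_train_scaled` was standardized). Very large initial losses, or `nan` in the SGD case, are consistent with that, although other causes cannot be ruled out from the logs alone. A minimal sketch of the magnitude gap, using hypothetical values rather than the bank data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: CreditScore, Age, Balance, EstimatedSalary
X_raw = np.array([
    [600.0, 40.0,      0.0, 100000.0],
    [650.0, 35.0,  80000.0,  50000.0],
    [500.0, 45.0, 160000.0, 120000.0],
    [700.0, 30.0,      0.0,  90000.0],
])

# Standardize each column to zero mean and unit variance
X_std = StandardScaler().fit_transform(X_raw)

print("largest raw magnitude:   ", np.abs(X_raw).max())
print("largest scaled magnitude:", np.abs(X_std).max())
```

One thing to try, stated as a suggestion rather than a verified fix, is to resample and train on `X_train_scaled` and to evaluate on `scaler.transform(...)` versions of the validation and test sets.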

In [133]:
from keras.optimizers import Adam

# Define the model with Adam optimizer
def create_adam_model():
    model = Sequential()
    
    # Input layer and first hidden layer with Dropout
    model.add(Dense(64, activation='relu', input_dim=X_train_balanced.shape[1]))
    model.add(Dropout(0.2))  # Dropout with 20% rate
    
    # Second hidden layer with Dropout
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.3))  # Dropout with 30% rate
    
    # Output layer (for binary classification)
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile the model using Adam optimizer
    adam_optimizer = Adam(learning_rate=0.001)  # You can try different learning rates
    model.compile(optimizer=adam_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

# Create the model
adam_model = create_adam_model()

# Step 3: Train the model on the balanced dataset using Adam optimizer
history_adam = adam_model.fit(X_train_balanced, y_train_balanced, epochs=20, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
loss, accuracy = adam_model.evaluate(X_test, y_test)
print(f'Test Loss: {loss:.4f}')
print(f'Test Accuracy: {accuracy:.4f}')


Epoch 1/20


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


243/243 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5146 - loss: 2581.8430 - val_accuracy: 0.7607 - val_loss: 5.8048
Epoch 2/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5214 - loss: 88.5424 - val_accuracy: 0.4817 - val_loss: 1.1175
Epoch 3/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4985 - loss: 15.3281 - val_accuracy: 0.6537 - val_loss: 0.6359
Epoch 4/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5149 - loss: 6.2231 - val_accuracy: 0.7303 - val_loss: 0.7318
Epoch 5/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4986 - loss: 3.8658 - val_accuracy: 0.8053 - val_loss: 0.6858
Epoch 6/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.4899 - loss: 3.6277 - val_accuracy: 0.8053 - val_loss: 0.6909
Epoch 7/20
243/243 ━━

### Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout

In [134]:
# Import necessary libraries
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.regularizers import l2
from imblearn.over_sampling import SMOTE

# Step 1: Apply SMOTE to balance the data
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Step 2: Learning rate scheduler function
def lr_schedule(epoch):
    if epoch < 10:
        return 0.001
    else:
        return 0.0001

# Step 3: Define the model with Adam optimizer, Dropout, and L2 regularization
def create_regularized_adam_model():
    model = Sequential()
    
    # Input layer (use Input shape to avoid the warning)
    model.add(Input(shape=(X_train_balanced.shape[1],)))
    
    # First hidden layer with L2 regularization and Dropout
    model.add(Dense(128, activation='relu', kernel_regularizer=l2(0.01)))
    model.add(Dropout(0.3))  # Dropout with 30% rate
    
    # Second hidden layer with L2 regularization and Dropout
    model.add(Dense(128, activation='relu', kernel_regularizer=l2(0.01)))
    model.add(Dropout(0.4))  # Dropout with 40% rate
    
    # Output layer (for binary classification)
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile the model using Adam optimizer
    adam_optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=adam_optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    
    return model

# Step 4: Create the model
adam_dropout_model = create_regularized_adam_model()

# Step 5: Set up the learning rate scheduler
lr_scheduler = LearningRateScheduler(lr_schedule)

# Step 6: Train the model with the balanced dataset using Adam optimizer, Dropout, and Learning Rate Scheduler
history_adam_dropout = adam_dropout_model.fit(
    X_train_balanced, y_train_balanced,
    epochs=20,
    batch_size=32,
    validation_data=(X_test, y_test),
    callbacks=[lr_scheduler]
)

# Step 7: Evaluate the model
loss, accuracy = adam_dropout_model.evaluate(X_test, y_test)
print(f'Test Loss: {loss:.4f}')
print(f'Test Accuracy: {accuracy:.4f}')


Epoch 1/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.5189 - loss: 3670.9299 - val_accuracy: 0.2623 - val_loss: 47.4221 - learning_rate: 0.0010
Epoch 2/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4960 - loss: 234.5527 - val_accuracy: 0.6480 - val_loss: 2.0781 - learning_rate: 0.0010
Epoch 3/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4974 - loss: 12.3207 - val_accuracy: 0.1947 - val_loss: 1.7441 - learning_rate: 0.0010
Epoch 4/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.4928 - loss: 4.4599 - val_accuracy: 0.1947 - val_loss: 1.7424 - learning_rate: 0.0010
Epoch 5/20
243/243 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.5103 - loss: 3.7646 - val_accuracy: 0.1947 - val_loss: 1.7392 - learning_rate: 0.0010
Epoch 6/20
243/243 ━━━━━━━━━━━━━━━━━━━━

## Model Performance Comparison and Final Model Selection
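Only the two SGD runs report a test-set F1 score and confusion matrix in this notebook; the Adam, Dropout and SMOTE runs show only accuracy and loss in truncated training logs. A minimal sketch of a comparison table, limited to the confusion matrices printed above (the run labels are my own shorthand):

```python
import pandas as pd

# Test-set confusion matrices printed earlier, laid out as [[TN, FP], [FN, TP]]
runs = {
    "SGD, 50 epochs, batch 32": [[2313, 103], [306, 278]],
    "SGD, +100 epochs, batch 64": [[2292, 124], [298, 286]],
}

rows = []
for name, ((tn, fp), (fn, tp)) in runs.items():
    total = tn + fp + fn + tp
    rows.append({
        "model": name,
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    })

comparison = pd.DataFrame(rows).set_index("model").round(4)
print(comparison)
```

On these numbers both runs reach about 86% accuracy but catch fewer than half of the churners (recall of roughly 0.48 to 0.49), and always predicting "stay" would already score about 80.5% accuracy, since 2,416 of the 3,000 test customers stayed. The additional 100 epochs did not improve F1 (0.5755 vs 0.5762). A full comparison would need the same F1 evaluation for the remaining models.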

## Actionable Insights and Business Recommendations

* Proactive Retention Strategies: For customers identified as high-risk, the bank should initiate proactive retention strategies, such as personalized communication (e.g., offering better loan terms, personalized financial advice, or exclusive rewards) to increase customer satisfaction and reduce churn.

* Personalized Offers: The bank should design personalized retention offers based on insights from the model. For instance, offering discounted services or customized plans for customers whose patterns indicate a likelihood of churn can improve customer loyalty.

* Customer Experience Improvement: Based on the features driving churn, invest in improving the customer experience. This might involve upgrading customer support, streamlining mobile banking, or introducing more engaging features based on user preferences identified in the model.

* Optimizing Marketing Spend: Direct marketing efforts more efficiently by focusing on high-risk customers identified by the model. This allows the bank to allocate marketing resources where they can have the highest impact.

* Loyalty Programs: Enhance or introduce new loyalty programs for long-term customers who are likely to churn due to dissatisfaction or competition. Offer rewards, discounts, or benefits to keep customers engaged.

By implementing these strategies, the bank can reduce churn rates, improve customer retention, and ultimately increase customer lifetime value (CLTV), leading to higher revenue and a stronger customer base.

