Given a bank customer, build a neural-network-based classifier that predicts whether the customer will leave the bank within the next 6 months.

Dataset description: The case study uses an open-source dataset from Kaggle. The dataset contains 10,000 samples with 14 columns (13 candidate features plus the target, Exited), such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.

Link to the Kaggle project: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Perform the following steps:

1. Read the dataset.
2. Distinguish the feature and target sets and divide the dataset into training and test sets.
3. Normalize the train and test data.
4. Initialize and build the model. Identify the points of improvement and implement them.
5. Print the accuracy score and confusion matrix (5 points).

In [None]:
# Install the pinned TensorFlow version before importing it ("!" runs a shell command in the notebook)
!pip install "tensorflow<2.11"

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# Step 1: Read the dataset
df = pd.read_csv("/content/Churn_Modelling.csv")

In [None]:
# df.info() prints its summary directly and returns None, so it needs no print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [None]:
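# Step 2: Separate the features from the target.
# CustomerId and Surname are identifiers rather than predictors, so they are dropped.
# Note: RowNumber (a plain row index) stays in X here; it carries no customer information
# and is a candidate for removal as well.
X = df.drop(['CustomerId', 'Surname', 'Exited'], axis=1)
y = df['Exited']  # 1 = customer left the bank, 0 = customer stayed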

In [None]:
# Convert categorical columns to one-hot encoding
X = pd.get_dummies(X, columns=['Geography', 'Gender'], drop_first=True)
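
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame. The four rows and their values are made up for illustration; only the column names match the dataset.

```python
import pandas as pd

# Toy frame with the same two categorical columns as the dataset (values are illustrative).
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# drop_first=True drops the alphabetically first level of each column,
# so France and Female become the implicit baseline (all indicator columns zero).
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)
print(encoded)
```

Dropping one level per column avoids a redundant indicator: a row with zeros in both Geography columns can only be the baseline level.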

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
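
Because customers who exit are the minority class, a stratified split is one option worth considering so that both subsets keep the same churn rate. The sketch below uses synthetic data with an assumed 20% positive rate; `X_demo` and `y_demo` are hypothetical stand-ins, not the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: roughly 20% positives (an assumption for illustration).
rng = np.random.default_rng(42)
y_demo = (rng.random(10_000) < 0.2).astype(int)
X_demo = rng.normal(size=(10_000, 3))

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"train positive rate: {y_tr.mean():.4f}")
print(f"test positive rate:  {y_te.mean():.4f}")
```

Applied to the notebook, this would amount to adding `stratify=y` to the existing call; the split above it does not do so.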

In [None]:
# Step 3: Normalize the train and test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
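
The scaler is fitted on the training data only, and the test data is transformed with those same training statistics, which keeps information about the test set out of preprocessing. A minimal sketch of what that means numerically, on synthetic arrays (`train`, `test`, and `scaler_demo` are hypothetical names):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for two numeric features.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(800, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(200, 2))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler_demo.transform(test)        # reuses the training mean and std

# Training columns end up with mean 0 and std 1 exactly (up to rounding);
# test columns are only approximately standardized, which is expected.
print("train mean/std:", train_scaled.mean(axis=0), train_scaled.std(axis=0))
print("test  mean/std:", test_scaled.mean(axis=0), test_scaled.std(axis=0))

# The transform is equivalent to this manual formula using training statistics.
manual = (test - train.mean(axis=0)) / train.std(axis=0)
```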

In [None]:
# Step 4: Initialize and build the model
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
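
As a rough check on the size of this network, the parameter count can be worked out by hand. The sketch below assumes 12 input features: the 9 numeric columns left after the drop, plus 2 Geography and 1 Gender indicator columns, which in turn assumes Geography has three levels. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 12  # assumption: X_train.shape[1] after one-hot encoding
layer_units = [128, 64, 1]  # the three Dense layers defined above

total = 0
n_in = n_features
for units in layer_units:
    params = dense_params(n_in, units)
    print(f"Dense({units}): {params} parameters")
    total += params
    n_in = units  # the output of this layer feeds the next one

print("Total trainable parameters:", total)
```

If the assumption about the input width holds, `model.summary()` in the notebook should report the same total.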

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1, verbose=2)

Epoch 1/10
225/225 - 1s - loss: 0.4385 - accuracy: 0.8136 - val_loss: 0.3815 - val_accuracy: 0.8438 - 1s/epoch - 5ms/step
Epoch 2/10
225/225 - 0s - loss: 0.3726 - accuracy: 0.8511 - val_loss: 0.3603 - val_accuracy: 0.8462 - 269ms/epoch - 1ms/step
Epoch 3/10
225/225 - 0s - loss: 0.3510 - accuracy: 0.8583 - val_loss: 0.3459 - val_accuracy: 0.8537 - 260ms/epoch - 1ms/step
Epoch 4/10
225/225 - 0s - loss: 0.3432 - accuracy: 0.8586 - val_loss: 0.3451 - val_accuracy: 0.8587 - 279ms/epoch - 1ms/step
Epoch 5/10
225/225 - 0s - loss: 0.3392 - accuracy: 0.8615 - val_loss: 0.3411 - val_accuracy: 0.8575 - 310ms/epoch - 1ms/step
Epoch 6/10
225/225 - 0s - loss: 0.3347 - accuracy: 0.8636 - val_loss: 0.3381 - val_accuracy: 0.8550 - 301ms/epoch - 1ms/step
Epoch 7/10
225/225 - 0s - loss: 0.3301 - accuracy: 0.8672 - val_loss: 0.3424 - val_accuracy: 0.8562 - 318ms/epoch - 1ms/step
Epoch 8/10
225/225 - 0s - loss: 0.3269 - accuracy: 0.8640 - val_loss: 0.3389 - val_accuracy: 0.8637 - 388ms/epoch - 2ms/step
Epo

<keras.src.callbacks.History at 0x786b140cd210>
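
The task also asks for points of improvement. One candidate, given that customers who exit are the minority class, is to weight the classes during training. The sketch below computes balanced class weights with scikit-learn. The label counts (1607 stayed, 393 exited) are borrowed from the test-set confusion matrix purely for illustration; in practice the weights would be computed from `y_train`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative labels: 1607 customers who stayed (0) and 393 who exited (1).
y_demo = np.array([0] * 1607 + [1] * 393)

classes = np.array([0, 1])
# "balanced" assigns each class the weight n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# Plain dict keyed by class label, the form Keras accepts for class_weight.
class_weights = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weights)
```

Keras accepts such a dictionary through the `class_weight` argument of `model.fit`. Whether it helps here is an open question: weighting typically trades some overall accuracy for better recall on the minority class, so it should be judged on the confusion matrix rather than on accuracy alone.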

In [None]:
# Step 5: Print the accuracy score and confusion matrix
# The sigmoid output is a probability; threshold it at 0.5 to get class labels (1 = exited)
y_pred = (model.predict(X_test) > 0.5).astype("int32")



In [None]:
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

In [None]:
print("Accuracy Score:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)

Accuracy Score: 0.8625
Confusion Matrix:
[[1534   73]
 [ 202  191]]
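
Accuracy alone hides how the errors are distributed, so it can help to derive per-class metrics from the confusion matrix above. The following sketch hard-codes the printed matrix and uses scikit-learn's layout (rows are true classes, columns are predicted classes); the `_cm` variable names are chosen here only to avoid clashing with the notebook's own variables.

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class.
cm = np.array([[1534, 73],
               [202, 191]])
tn, fp, fn, tp = cm.ravel()

accuracy_cm = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision_cm = tp / (tp + fp)        # of predicted exits, how many really exited
recall_cm = tp / (tp + fn)           # of real exits, how many were caught
f1_cm = 2 * precision_cm * recall_cm / (precision_cm + recall_cm)

print(f"accuracy={accuracy_cm:.4f}")
print(f"precision={precision_cm:.4f}")
print(f"recall={recall_cm:.4f}")
print(f"f1={f1_cm:.4f}")
```

The accuracy reproduces the 0.8625 reported above, while the recall shows that fewer than half of the customers who actually exited (191 of 393) were identified, which is the more relevant number for a churn model.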
