<center><h1>Predicting Customer Churn Using Deep Learning</h1></center>

<h2>Introduction</h2>

In this project, we develop a deep learning model to predict customer churn. For this purpose, we will use the <a href="https://www.kaggle.com/datasets/barun2104/telecom-churn">Customer Churn</a> dataset from Kaggle. This dataset contains customer-level information for a telecom company. Various attributes related to the services used are recorded for each customer.

<h2>Download / Load Dataset</h2>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
df = pd.read_csv('/kaggle/input/telecom-churn/telecom_churn.csv')
df.head()

Each row of this dataset represents a customer, and each column contains an attribute of that customer. The "Churn" column is the target variable. For more information about the columns, you can visit the dataset's page on the Kaggle website.

<h2>Preprocessing</h2>

One of the key steps in preprocessing for this type of project is ensuring that the data is balanced. In the case of churn prediction, this means that the number of instances for each class (churn = 1, no churn = 0) should be equal or at least similar. Imbalanced data can negatively impact the training process, leading to a biased model that favors the majority class.

In [None]:
churn_counts = df['Churn'].value_counts()
print(churn_counts)

In [None]:
churn_percentage = df['Churn'].value_counts(normalize=True) * 100
print(churn_percentage)

As you can see, the number of instances for the churn class '0' is significantly higher than for class '1', making the dataset clearly imbalanced. However, before addressing this imbalance, let's first split the data into training and testing sets.

<h2>Split Data into Train and Test Sets</h2>

As mentioned before, the "Churn" column is our target variable. So, we can split the data like this:

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop(columns=['Churn'])
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
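
To see what `stratify=y` buys us, here is a minimal sketch that splits each class separately so both sets keep the same churn rate. The helper `stratified_split_indices` and the toy labels are made up for illustration. This is not how scikit-learn implements stratification internally, only a hand-rolled version of the same guarantee.

```python
import numpy as np

def stratified_split_indices(y, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting every class separately.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)   # positions of this class
        rng.shuffle(idx)                   # shuffle within the class
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels with made-up proportions: 85% class 0, 15% class 1
y_toy = np.array([0] * 850 + [1] * 150)
train_idx, test_idx = stratified_split_indices(y_toy)

train_rate = y_toy[train_idx].mean()
test_rate = y_toy[test_idx].mean()
print(f"churn rate in train: {train_rate:.3f}, in test: {test_rate:.3f}")
```

Without stratification, a random split of a small minority class can leave the test set with noticeably more or fewer churners than the training set, which makes the evaluation metrics harder to trust.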

<h2>Balancing the Data</h2>

Now, we can balance the data. There are multiple methods to achieve this, but given the nature of our dataset, we will use the SMOTE (Synthetic Minority Over-sampling Technique) method.

In [None]:
from imblearn.over_sampling import SMOTE

smote = SMOTE(sampling_strategy='auto', random_state=42)

X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

Now, let's see the result of using SMOTE method.

In [None]:
print("\nBefore SMOTE:")
print(y_train.value_counts(normalize=True) * 100)  # Original class distribution

print("\nAfter SMOTE:")
print(y_train_resampled.value_counts(normalize=True) * 100)  # Balanced class distribution

As you can see, the number of instances in each churn class is now equal. However, it's important to note that using the SMOTE method can introduce some challenges, such as:

<li>Synthetic Data Artifacts – SMOTE generates synthetic samples based on existing data, which may not always represent real-world patterns accurately.</li>
<li>Overfitting – The model may learn patterns specific to the synthetic data rather than generalizing well to new, unseen data.</li>
<li>Impact on Feature Distribution – SMOTE may slightly alter the distribution of certain features, especially if the dataset is highly imbalanced.</li>
<br>
To mitigate these issues, it's important to carefully evaluate the model's performance and consider alternative techniques if necessary.
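
The caveats above are easier to picture with a sketch of how a SMOTE-style sample is built: pick a minority sample, pick one of its nearest minority neighbours, and interpolate between the two. The function `smote_like_sample` and the toy points are hypothetical and heavily simplified. The real `imblearn` implementation handles neighbour search, sample counts, and edge cases differently.

```python
import numpy as np

def smote_like_sample(minority, k=3, seed=0):
    """Create one synthetic point between a minority sample and a near neighbour.

    Simplified illustration, not the imbalanced-learn implementation.
    """
    rng = np.random.default_rng(seed)
    base = minority[rng.integers(len(minority))]
    # Euclidean distance from the chosen sample to every minority sample
    dist = np.linalg.norm(minority - base, axis=1)
    # Position 0 of the argsort is the sample itself (distance 0), so skip it
    neighbours = np.argsort(dist)[1:k + 1]
    neighbour = minority[rng.choice(neighbours)]
    gap = rng.random()  # uniform in [0, 1)
    synthetic = base + gap * (neighbour - base)
    return synthetic, base, neighbour

# Toy minority-class points (made-up values, two features)
minority_toy = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [8.0, 30.0]])
synthetic, base, neighbour = smote_like_sample(minority_toy)

print("base:     ", base)
print("neighbour:", neighbour)
print("synthetic:", synthetic)
```

Every synthetic point lies on a line segment between two real minority samples, so no genuinely new behaviour is introduced. Because neighbours are chosen by distance, features with large ranges also dominate that choice when the data is unscaled.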

<h2>Feature Scaling</h2>

After balancing the data, the next crucial step is feature scaling. Scaling ensures that all numerical features have a similar range, preventing models from being biased toward features with larger values.

In our dataset, some features have values ranging from 0 to 1, while others take much larger values, such as 110, 256.2, or 13.7. If left unscaled, this difference can negatively impact the model's performance, as features with larger magnitudes may dominate the learning process.

To address this, we use Standard Scaling, which transforms the features to have a mean of 0 and a standard deviation of 1. This method helps deep learning models converge faster and improves overall performance.

Since scaling should be based only on training data to prevent data leakage, we fit the scaler on the training set and apply the transformation to both the training and test sets.

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train_resampled)  # Apply scaling on resampled training data
X_test_scaled = scaler.transform(X_test)  # Apply scaling on test data (don't fit on test)

print(X_train_scaled[:5])  # Print first 5 rows of scaled training data
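
For intuition, the same fit-on-train, transform-both pattern can be written by hand. This is a minimal sketch on made-up toy matrices. The variable names are illustrative and the values are not taken from the churn dataset.

```python
import numpy as np

# Toy feature matrices with made-up values (two features on different scales)
X_train_toy = np.array([[0.0, 110.0], [1.0, 256.2], [0.0, 180.0], [1.0, 95.5]])
X_test_toy = np.array([[1.0, 300.0], [1.0, 50.0]])

# "Fit": the statistics come from the training rows only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "Transform": the same training statistics are applied to both sets
X_train_toy_scaled = (X_train_toy - mean) / std
X_test_toy_scaled = (X_test_toy - mean) / std

print("train mean:", X_train_toy_scaled.mean(axis=0))
print("train std: ", X_train_toy_scaled.std(axis=0))
print("test mean: ", X_test_toy_scaled.mean(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1 by construction. The test columns generally do not, and that is expected: the test set is only transformed, never used to compute the statistics.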

<h2>Build the Deep Learning Model</h2>

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

In [None]:
model = Sequential([
    Dense(128, input_dim=X_train_scaled.shape[1], activation='relu'),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

In addition to using SMOTE for balancing the dataset, another effective approach is assigning class weights during model training. This technique helps the model pay more attention to the minority class (churn = 1) without altering the original dataset.

Since our dataset is imbalanced, the model may naturally favor the majority class (churn = 0), leading to poor recall for the minority class. By assigning higher weights to the minority class, we encourage the model to give more importance to correctly predicting churn cases.

In our case, we experimented with different class weight ratios (e.g., 1:1.3 and 1:1.5) and observed their impact on model performance. This helped improve the recall for churn cases while maintaining a balanced overall accuracy.

Using class weights is especially useful in scenarios where oversampling techniques like SMOTE might introduce synthetic data artifacts, making it a valuable alternative for handling class imbalance.

In [None]:
class_weight = {0: 1, 1: 1.4}  # weight 1 for non-churn, 1.4 for churn
model.fit(X_train_scaled, y_train_resampled, class_weight=class_weight, epochs=50, batch_size=64)
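
To make the effect of class weights concrete, here is a small sketch of a class-weighted binary cross-entropy. The function `weighted_bce` and the toy predictions are hypothetical. We assume a simple mean over samples, which is roughly what happens during training but may not match Keras's exact reduction.

```python
import numpy as np

def weighted_bce(y_true, p_pred, class_weight):
    """Mean binary cross-entropy with a per-sample weight chosen by class.

    Illustrative helper; the mean reduction is an assumption.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), 1e-7, 1 - 1e-7)
    per_sample = -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
    weights = np.where(y_true == 1, class_weight[1], class_weight[0])
    return float(np.mean(weights * per_sample))

# Toy batch: three non-churners and one churner, all predicted at 0.2,
# so the churner (last sample) is badly missed
y_toy = [0, 0, 0, 1]
p_toy = [0.2, 0.2, 0.2, 0.2]

plain = weighted_bce(y_toy, p_toy, {0: 1, 1: 1})
weighted = weighted_bce(y_toy, p_toy, {0: 1, 1: 1.4})

print(f"unweighted loss: {plain:.4f}")
print(f"weighted loss:   {weighted:.4f}")
```

The missed churner contributes 1.4 times as much to the weighted loss, so gradient updates push harder toward fixing errors on the churn class.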

<h2>Model Evaluation</h2>

After training the model, it's crucial to evaluate its performance using multiple metrics, especially since we are dealing with an imbalanced dataset. Accuracy alone may be misleading, as the model could predict the majority class more often while failing to correctly identify churn cases.
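
A quick sketch shows why accuracy alone can mislead. The labels below are made up (855 non-churners, 145 churners) and the "model" simply predicts the majority class for everyone.

```python
# Toy labels with made-up proportions: 855 non-churners, 145 churners
y_true = [0] * 855 + [1] * 145

# A "model" that ignores its input and always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
churn_recall = true_positives / sum(y_true)

print(f"accuracy: {accuracy:.3f}")
print(f"churn recall: {churn_recall:.3f}")
```

This baseline reaches 85.5% accuracy while catching no churners at all, so any reported accuracy should be read against the majority-class rate.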

To get a comprehensive view of the model's performance, we use:

<li>Precision – Measures how many predicted churn cases were actually correct.</li>
<li>Recall – Measures how well the model identifies actual churn cases. Higher recall is important in churn prediction to minimize missed churners.</li>
<li>F1-Score – Balances precision and recall, providing a better metric for imbalanced data.</li>
<li>Confusion Matrix – Helps visualize the number of true positives, false positives, true negatives, and false negatives.</li>
<br>
Additionally, we experimented with threshold tuning (e.g., adjusting the default 0.5 threshold to 0.6) to optimize precision-recall trade-offs, ensuring a better balance between false positives and false negatives.
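
The metrics above, and the effect of moving the threshold, can be sketched directly from the confusion-matrix counts. The helper `churn_metrics` and the toy probabilities are hypothetical and chosen only to make the trade-off visible.

```python
import numpy as np

def churn_metrics(y_true, y_prob, threshold):
    """Precision, recall and F1 for the positive (churn) class at a threshold."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_prob) > threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))  # churners caught
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))  # false alarms
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))  # churners missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy labels and predicted churn probabilities (made-up values)
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob_toy = [0.10, 0.20, 0.35, 0.55, 0.40, 0.05, 0.45, 0.58, 0.70, 0.90]

results = {t: churn_metrics(y_true_toy, y_prob_toy, t) for t in (0.5, 0.6)}
for threshold, (precision, recall, f1) in results.items():
    print(f"threshold {threshold}: precision={precision:.2f}, "
          f"recall={recall:.2f}, f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.6 removes the false positive but also misses one more churner. Precision goes up and recall goes down, which is the usual direction of the trade-off.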

In [None]:
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test)
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

In [None]:
from sklearn.metrics import classification_report

y_prob = model.predict(X_test_scaled)  # Predicted churn probabilities, shape (n_samples, 1)
y_pred = (y_prob > 0.6).astype(int).ravel()  # Convert probabilities to binary labels (0 or 1) at a 0.6 threshold

print(classification_report(y_test, y_pred))

<h2>Final Thoughts and Conclusion</h2>

After evaluating the model, we achieved a test accuracy of 87.41% and a recall of 63% for the churn class. In other words, the model captures a little under two thirds of actual churn cases, which makes it a useful starting point for identifying at-risk customers.

Key takeaways from this project:
<li>Addressed class imbalance using SMOTE and class weighting.</li>
<li>Applied feature scaling for consistent model performance.</li>
<li>Tuned class weights and the decision threshold to balance precision and recall.</li>
<li>Evaluated performance using precision, recall, F1-score, and accuracy.</li>
<br>
Possible improvements:
<li>Further hyperparameter tuning to optimize the precision-recall trade-off.</li>
<li>Exploring additional feature engineering to extract more insights.</li>
<li>Testing other architectures or models, such as tree-based algorithms or hybrid approaches.</li>
<br>
Overall, this project provides a solid deep learning-based approach to churn prediction, helping businesses take proactive steps to retain customers and reduce churn.

<hr>
Thanks for your time.<br>
I hope you found this notebook useful.<br>
Author: Mohammadmehdi Omidi