<a href="https://colab.research.google.com/github/ludwigenvall/churn-prediction-dl/blob/main/Churn_report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Churn Prediction with Simulated Customer Sequences using LSTM

## Abstract

Customer churn prediction is essential for subscription-based businesses. This project uses a Bayesian generative model to simulate customer behavior over 30 days, then trains Long Short-Term Memory (LSTM) neural networks to classify churn from the simulated sequences. The best model reaches 93% accuracy with an F1-score of 0.86 and an AUC of 0.98 on the churn class, suggesting that combining generative simulation and deep learning is effective for churn modeling.


## Introduction

Churn, or the loss of customers, is costly for companies. Traditional churn models often rely on static features or simple heuristics, ignoring temporal dynamics. To capture customer behavior more realistically, this project uses two steps:

1. Simulate customer sequences with a Bayesian model based on churn status  
2. Predict churn using sequential deep learning (LSTM)

Question: *Can a deep learning model trained on synthetic, behavior-based sequences accurately detect churn?*


## Methods

### Data Simulation

Using the Telco Customer Churn dataset as a base for labels, customer behavior over 30 days was simulated using a PyMC Bayesian model. Three sequences were generated per customer:

- `logins_seq`: daily number of logins (Poisson distribution)
- `support_seq`: daily binary indicator of customer support contact (Bernoulli, i.e. Binomial with n = 1)
- `data_seq`: daily data usage in GB (Gamma)

The model samples from posterior distributions of parameters (e.g., churners have lower login rates and higher support contact probability). Random noise is added at generation to introduce variance across samples.
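The generation step described above can be sketched without PyMC by plugging in fixed parameter values. This is a simplified stand-in, not the project's Bayesian model: `simulate_customer` is a hypothetical helper, every numeric value is a placeholder rather than a posterior estimate, and the lower data usage for churners is an assumption the text does not state.

```python
import numpy as np

def simulate_customer(churned, n_days=30, rng=None):
    """Draw one customer's daily behaviour given a churn label.

    All numeric parameters are hypothetical placeholders, not the
    posterior values from the project's PyMC model.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Direction of effects taken from the text: churners log in less
    # and contact support more often.
    login_rate = 1.5 if churned else 3.0        # Poisson mean (logins/day)
    support_prob = 0.15 if churned else 0.05    # daily contact probability
    data_shape = 2.0                            # Gamma shape
    data_scale = 0.4 if churned else 0.6        # Gamma scale (assumed effect)

    # Per-customer multiplicative noise so customers within a class differ.
    login_rate *= rng.lognormal(mean=0.0, sigma=0.2)

    logins_seq = rng.poisson(login_rate, size=n_days)
    support_seq = rng.binomial(1, support_prob, size=n_days)
    data_seq = rng.gamma(data_shape, data_scale, size=n_days)
    return logins_seq, support_seq, data_seq

rng = np.random.default_rng(0)
logins_seq, support_seq, data_seq = simulate_customer(churned=True, rng=rng)
```

In the full project the rates and probabilities would come from posterior draws rather than constants, so each simulated customer also reflects parameter uncertainty.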

A total of 7,035 customers were simulated. The churn label from the Telco dataset (Yes/No) served as the target variable, and the three synthetic 30-day sequences served as inputs. The data were split into train and test sets and shaped into `(n_samples, 30, 3)` tensors (samples, days, features).
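A minimal sketch of the shaping step, using random placeholder arrays in place of the simulated sequences. The 80/20 train/test ratio and the variable names are assumptions; the report does not state the split it used.

```python
import numpy as np

rng = np.random.default_rng(42)
n_customers, n_days = 1000, 30  # small stand-in for the full customer set

# Placeholder arrays: one row per customer, one column per day.
logins = rng.poisson(2.0, size=(n_customers, n_days))
support = rng.binomial(1, 0.1, size=(n_customers, n_days))
data_gb = rng.gamma(2.0, 0.5, size=(n_customers, n_days))
y = rng.binomial(1, 0.27, size=n_customers)  # placeholder churn labels

# Stack the three features on a new last axis -> (n_samples, 30, 3).
X = np.stack([logins, support, data_gb], axis=-1).astype("float32")

# Shuffle the customer indices once and hold out 20% as a test set.
idx = rng.permutation(n_customers)
n_test = int(0.2 * n_customers)
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

Splitting by customer index keeps all 30 days of a customer in the same set, so no sequence is shared between train and test.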



### Long Short-Term Memory (LSTM)

LSTM is a type of recurrent neural network (RNN) designed to model sequential data and overcome the vanishing gradient problem associated with traditional RNNs. LSTMs achieve this using memory cells and gates (input, forget, and output) that control the flow of information across time steps.
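To make the gating mechanism concrete, here is a NumPy sketch of one forward step of a standard LSTM cell. The weight layout, gate ordering, and the name `lstm_step` are illustrative choices and do not mirror the internals of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: (n_features,); h_prev, c_prev: (n_units,)
    W: (4 * n_units, n_features), U: (4 * n_units, n_units), b: (4 * n_units,)
    Gate order in the stacked weights: input, forget, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * n:1 * n])   # input gate: how much new information to write
    f = sigmoid(z[1 * n:2 * n])   # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])   # candidate values for the memory cell
    o = sigmoid(z[3 * n:4 * n])   # output gate: how much memory to expose
    c_t = f * c_prev + i * g      # additive cell-state update
    h_t = o * np.tanh(c_t)        # hidden state passed to the next step
    return h_t, c_t

# Run the cell over one 30-day, 3-feature sequence with random weights.
rng = np.random.default_rng(0)
n_features, n_units = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_units, n_features))
U = rng.normal(scale=0.1, size=(4 * n_units, n_units))
b = np.zeros(4 * n_units)
h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in rng.normal(size=(30, n_features)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The additive update `c_t = f * c_prev + i * g` is what lets gradients flow across many time steps without repeatedly passing through a squashing nonlinearity.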

In this project, the LSTM architecture lets the model capture temporal patterns in customer behavior over a 30-day window. In the TimeDistributed variant, the first LSTM layer returns a feature vector for every time step, and the TimeDistributed dense layers that follow apply the same learned transformation to each step. A second LSTM layer then compresses the sequence into a fixed-size representation for the final churn prediction.

LSTM is suitable for this problem due to its ability to remember past inputs over long periods and identify trends in customer actions that may indicate churn risk.

### LSTM Architecture

Multiple LSTM models were implemented and evaluated:

- Baseline model: 1 LSTM layer -> Dropout -> Dense -> Output

- Deeper LSTM model: 2 stacked LSTM layers -> Dropout -> Dense layers

- Bidirectional LSTM: Bidirectional LSTM -> Dropout -> LSTM -> Dense

- Conv1D + LSTM: Conv1D + MaxPooling -> LSTM -> Dropout -> Dense

- TimeDistributed LSTM: LSTM -> TimeDistributed Dense -> LSTM -> Dropout -> Dense

- TimeDistributed with L2 regularization: Same as above + L2 kernel regularizer
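As an illustration of the TimeDistributed layouts listed above, the following Keras sketch builds one such model. The layer widths, dropout rate, and the choice to put the L2 penalty on the LSTM kernels are assumptions; `build_timedistributed_lstm` is a hypothetical helper, not necessarily what `lstm_model.py` contains.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_timedistributed_lstm(n_steps=30, n_features=3, l2=None):
    """LSTM -> TimeDistributed Dense -> LSTM -> Dropout -> Dense -> sigmoid.

    Layer widths and the dropout rate are illustrative assumptions.
    Pass l2 (e.g. 1e-4) to add an L2 kernel regularizer to the LSTM layers.
    """
    reg = keras.regularizers.l2(l2) if l2 is not None else None
    return keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        # return_sequences=True keeps one output vector per day.
        layers.LSTM(64, return_sequences=True, kernel_regularizer=reg),
        # The same Dense layer is applied independently at every time step.
        layers.TimeDistributed(layers.Dense(32, activation="relu")),
        # The second LSTM returns only its final state: a fixed-size summary.
        layers.LSTM(32, kernel_regularizer=reg),
        layers.Dropout(0.3),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # churn probability
    ])

model = build_timedistributed_lstm()
model_l2 = build_timedistributed_lstm(l2=1e-4)
```

The regularized variant has the same parameter count as the plain one; the L2 term only adds a penalty to the training loss.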

All models were trained with `binary_crossentropy` loss, the Adam optimizer, early stopping (`patience=5`), a batch size of 64, and a 20% validation split.
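The training setup can be sketched as follows, using the baseline layout and random placeholder data. Layer sizes, the monitored quantity (`val_loss`), `restore_best_weights=True`, and the epoch cap are assumptions that the report does not specify.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random placeholder data with the report's tensor shape (n_samples, 30, 3).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(256, 30, 3)).astype("float32")
y_train = rng.integers(0, 2, size=256).astype("float32")

# Baseline layout: LSTM -> Dropout -> Dense -> Output (sizes are assumptions).
model = keras.Sequential([
    keras.Input(shape=(30, 3)),
    layers.LSTM(32),
    layers.Dropout(0.3),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,  # assumption: not stated in the report
)
history = model.fit(
    X_train, y_train,
    validation_split=0.2,   # last 20% of the training data is held out
    batch_size=64,
    epochs=3,               # kept tiny for the sketch; real cap is not stated
    callbacks=[early_stop],
    verbose=0,
)
```

Note that Keras takes the validation split from the end of the arrays before shuffling, so the training data should already be in random order.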

### Model Evaluation

Each model was evaluated using:

- Accuracy, Precision, Recall, F1-score

- ROC Curve and AUC

- Precision-Recall curve

- Confusion Matrix
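For reference, the metrics above can be computed from predicted churn probabilities with a few lines of NumPy. This is a dependency-free sketch, not the project's evaluation code: `binary_metrics` is a hypothetical helper, the 0.5 threshold is an assumption, and the pairwise AUC assumes both classes are present.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Confusion counts, accuracy, precision, recall, F1 and ROC AUC."""
    y_true = np.asarray(y_true).astype(int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(y_true)

    # ROC AUC as the probability that a random churner outscores a random
    # non-churner, with ties counted as one half.
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1, "auc": auc}

m = binary_metrics([0, 0, 1, 1, 1, 0], [0.1, 0.6, 0.8, 0.4, 0.9, 0.2])
```

The pairwise AUC uses memory proportional to the number of churner/non-churner pairs, which is fine for a test set of this size but not for very large ones.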

## Results

Among all tested architectures, the model with stacked TimeDistributed layers and LSTM achieved the best performance:

- Accuracy: 93%

- Precision (Churn): 0.91

- Recall (Churn): 0.82

- F1-score (Churn): 0.86

- AUC (ROC): 0.98
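As a quick consistency check, the reported F1-score follows from the reported precision $P$ and recall $R$ for the churn class:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.91 \times 0.82}{0.91 + 0.82} = \frac{1.4924}{1.73} \approx 0.86
$$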

The baseline and simpler LSTM models performed slightly worse in recall and F1. Bidirectional and Conv1D-enhanced models yielded similar but marginally lower AUC scores. The use of L2 regularization slightly improved generalization.

The ROC and precision-recall curves show strong class separation for all models.

Loss curves showed stable convergence in all models. Early stopping was triggered between epochs 10–18, with validation loss flattening earlier than training loss. This pattern supports the use of early stopping to reduce overfitting risk.

## Discussion

The combination of simulation and LSTM modeling produces strong results on the simulated data. The best-performing model generalized well to held-out synthetic sequences; its performance on real behavioral data remains untested.

Key observations include:

- TimeDistributed layers helped capture finer patterns across time

- Adding convolutional layers or bidirectionality gave only minor gains

- L2 regularization improved stability slightly

However, limitations include:

- lack of real behavioral data

- dependence on simulation assumptions

- tradeoff between precision and recall (especially for churn class)

Future work could test attention-based models or adapt simulations to better reflect real-world customer heterogeneity.

## Code and Reproducibility

All code is available in the GitHub repo: [github.com/ludwigenvall/churn-prediction-dl](https://github.com/ludwigenvall/churn-prediction-dl)

Key scripts:

- `generative_model.py`: Bayesian simulation

- `lstm_model.py`: Model utilities

Notebooks:

- `01_explore_telco_data.ipynb`: Exploring and cleaning the Telco dataset

- `02_generate_behavior.ipynb`: Generating the simulated behavior and visualizing results

- `03_train_lstm_model.ipynb`: Full pipeline and evaluation

## References

PyMC documentation: https://www.pymc.io/

TensorFlow/Keras API: https://www.tensorflow.org/api_docs

Telco Churn dataset: https://www.kaggle.com/blastchar/telco-customer-churn