This project uses advanced machine learning classification techniques to predict which customers are most likely to cancel their service (churn). Early identification of high-risk customers allows a business to deploy targeted retention campaigns, thereby maximizing customer lifetime value (CLV).
Key Result: The final [Model Name, e.g., Gradient Boosting Classifier] achieved a [Insert Metric, e.g., 88% Recall] on the churn class. We focused on Recall because [Explain why: e.g., minimizing False Negatives—missing a churner—is more costly than a False Positive].
- Data Preprocessing: Handled significant class imbalance (high ratio of non-churners to churners) using [Technique used, e.g., SMOTE, undersampling].
- Feature Engineering: Engineered features based on customer usage and activity (e.g., tenure, average monthly usage, total spend).
- Model Selection: Compared several classification models (Logistic Regression, k-NN, Random Forest) and selected the [Final Model] for its robust performance on the imbalanced dataset.
- Actionable Insights: Used Feature Importance from the final model to identify the top factors driving churn (e.g., high monthly fees, short tenure).
notebooks/01_Churn_Modeling.ipynb: Complete analysis covering EDA, data preprocessing, model training, and performance evaluation.requirements.txt: List of all necessary Python packages.
- Language: Python
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib/Seaborn, [Specify Classifier, e.g., XGBoost, Imbalanced-learn]
Follow these steps to set up the necessary environment and run the analysis notebook on your local machine.
git clone [https://github.com/pandakitty/Customer-Churn-Analysis.git](https://github.com/pandakitty/Customer-Churn-Analysis.git)
cd Customer-Churn-Analysis