An end-to-end Customer Churn Prediction and Retention Intelligence web application built with Streamlit, Scikit-Learn, and LLM integration (OpenAI / Gemini).
| Module | Description |
|---|---|
| 📥 Data Ingestion | Fetches real-time customer data from the IBM Telco public dataset or any custom API endpoint |
| 🔧 ETL Pipeline | Automated data cleaning, encoding, feature engineering (LTV, engagement score, NPS segments, etc.) and Min-Max scaling |
| 🤖 ML Models | Trains and evaluates Logistic Regression and Random Forest classifiers with 5-fold cross-validation |
| 📊 Dashboard | Interactive KPI cards, churn probability distribution, risk segmentation pie chart, and revenue analysis |
| 💡 Retention Strategies | AI-generated personalised retention playbooks per customer using OpenAI GPT or Google Gemini (with a rich mock fallback) |
| 📤 Power BI Export | One-click CSV export with predictions and risk segments ready for Power BI dashboards |
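
As a rough illustration of the Min-Max scaling step named in the ETL row, the sketch below rescales a numeric column to the range [0, 1]. The function name `min_max_scale` and the sample values are hypothetical; the project's `etl_pipeline.py` may well use scikit-learn's `MinMaxScaler` instead.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical tenure values in months.
tenure_months = [1, 12, 24, 72]
scaled = min_max_scale(tenure_months)
```

Scaling matters mostly for Logistic Regression, whose coefficients are sensitive to feature magnitude; tree-based models such as Random Forest are largely unaffected by it.
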

    churn_retention_system/
    │
    ├── app.py                  # Main Streamlit UI (5-tab interface)
    ├── requirements.txt        # Python dependencies
    ├── .gitignore
    ├── data/
    │   └── power_bi_export.csv # Output file for Power BI
    │
    └── modules/
        ├── __init__.py         # Package exports
        ├── scraper.py          # Real-time data fetcher (IBM Telco + custom API)
        ├── etl_pipeline.py     # Data cleaning & feature engineering
        ├── ml_models.py        # Random Forest & Logistic Regression
        └── llm_strategy.py     # OpenAI / Gemini / Mock strategy generator

Clone the repository and install the dependencies:

    git clone https://github.com/jaideep005/churn_retention_system.git
    cd churn_retention_system
    pip install -r requirements.txt

Create a `.env` file in the project root for LLM API keys:

    GEMINI_API_KEY=your_gemini_api_key_here
    OPENAI_API_KEY=your_openai_api_key_here

Note: If no API key is provided, the app automatically uses the built-in mock strategy generator.
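
The fallback described in the note could be sketched roughly as follows. This is a minimal illustration, not the app's actual logic: the function name `choose_llm_mode` is hypothetical, and the priority order (Gemini before OpenAI) is an assumption.

```python
import os


def choose_llm_mode(env=None):
    """Pick a strategy mode from whichever API key is set, else fall back to mock."""
    env = os.environ if env is None else env
    # Assumed priority: Gemini first, then OpenAI. The real app may differ.
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No key present (or only empty values): use the built-in mock generator.
    return "mock"
```

With `python-dotenv` installed, calling `load_dotenv()` before `choose_llm_mode()` would copy the keys from `.env` into the process environment.
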

    streamlit run app.py

The app will open at http://localhost:8501 in your browser.

1. Data Ingestion → Fetch real IBM Telco data (or upload CSV)
2. ETL Pipeline → Clean, encode, and engineer features
3. ML Models → Train Logistic Regression + Random Forest
4. Dashboard → View KPIs, risk segments, revenue analysis
5. Strategies → Generate AI-powered retention plans per customer
Two classifiers are trained and compared:
| Model | Key Hyperparameters |
|---|---|
| Logistic Regression | C=1.0, class_weight=balanced, max_iter=1000 |
| Random Forest | n_estimators=300, max_depth=12, class_weight=balanced |
Both models are evaluated on:
- Accuracy, Precision, Recall, F1 Score, ROC-AUC
- 5-Fold Stratified Cross-Validation
The best model (by ROC-AUC) is automatically selected to generate full-dataset predictions.
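
The comparison above can be approximated with the sketch below. It uses the hyperparameters from the table, but the synthetic data from `make_classification`, the 75/25 split, and the variable names are assumptions standing in for the processed churn features; the real `ml_models.py` may be organised differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for the processed churn features (an assumption).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.75, 0.25], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters taken from the table above.
models = {
    "Logistic Regression": LogisticRegression(
        C=1.0, class_weight="balanced", max_iter=1000
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=300, max_depth=12, class_weight="balanced", random_state=42
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    # Mean ROC-AUC over 5 stratified folds of the training split.
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc").mean()
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = {"cv_auc": cv_auc, "test_auc": test_auc}

# Select the model with the highest hold-out ROC-AUC and score every row with it.
best_name = max(results, key=lambda n: results[n]["test_auc"])
churn_probability = models[best_name].predict_proba(X)[:, 1]
```

ROC-AUC is a reasonable selection metric here because it is insensitive to the classification threshold, which helps when churners are a minority class.
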
By default, the scraper pulls from the IBM Telco Customer Churn public dataset:
https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv
To use a custom API endpoint, pass your own URL and column mapping:

    from modules.scraper import CompanyDataScraper

    scraper = CompanyDataScraper(
        source_url="https://your-api.com/customers",
        column_map={
            "TimeWithCompany": "tenure_months",
            "MonthlySpend": "monthly_revenue",
            "DidTheyLeave": "churn_raw",
        },
        request_headers={"Authorization": "Bearer YOUR_TOKEN"},
    )
    df = scraper.fetch()

The strategy generator supports three modes:

| Mode | Description |
|---|---|
| `mock` | Rich rule-based template (default, no API key needed) |
| `gemini` | Google Gemini 1.5 Flash |
| `openai` | OpenAI GPT-3.5-Turbo |
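
To give a feel for what a rule-based `mock` mode might look like, here is a small sketch. Everything in it is hypothetical: the function name, the `contract` field, the thresholds, and the wording of the actions are illustrative assumptions, not the templates shipped in `llm_strategy.py`.

```python
def mock_retention_strategy(customer):
    """Return rule-based retention actions derived from a few customer fields."""
    actions = []
    # Hypothetical thresholds; tune them against real churn data.
    if customer.get("churn_probability", 0.0) >= 0.7:
        actions.append("Schedule a proactive call from the retention team")
    if customer.get("tenure_months", 0) < 12:
        actions.append("Offer an onboarding check-in and a first-year loyalty discount")
    if customer.get("contract") == "Month-to-month":
        actions.append("Propose a discounted annual contract")
    if not actions:
        actions.append("Keep on the standard engagement programme")
    return actions


plan = mock_retention_strategy(
    {"churn_probability": 0.82, "tenure_months": 5, "contract": "Month-to-month"}
)
```

A deterministic generator like this keeps the app usable offline and makes the Strategies tab testable without spending API credits.
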
After training, click "Export CSV for Power BI" in the Dashboard tab. The exported file includes:
- All original customer columns
- `churn_probability` (float 0–1)
- `predicted_churn` (0 or 1)
- `risk_segment` (Low / Medium / High Risk)
- `ltv_24mo` (estimated 24-month lifetime value)
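
The prediction columns listed above could be derived and written out along these lines. This is only a sketch: the 0.3 / 0.7 segment cut-offs, the 0.5 decision threshold, the exact label strings, and the `customer_id` column are assumptions, and `ltv_24mo` is left out because its formula is not stated here.

```python
import csv
import io


def risk_segment(probability, low=0.3, high=0.7):
    """Map a churn probability to a risk label (cut-offs are assumed)."""
    if probability < low:
        return "Low Risk"
    if probability < high:
        return "Medium Risk"
    return "High Risk"


def export_rows(customers, threshold=0.5):
    """Build CSV text with prediction columns from (customer_id, probability) pairs."""
    buffer = io.StringIO()
    fieldnames = ["customer_id", "churn_probability", "predicted_churn", "risk_segment"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for customer_id, probability in customers:
        writer.writerow(
            {
                "customer_id": customer_id,
                "churn_probability": probability,
                # Binary prediction from the assumed decision threshold.
                "predicted_churn": int(probability >= threshold),
                "risk_segment": risk_segment(probability),
            }
        )
    return buffer.getvalue()


csv_text = export_rows([("C001", 0.12), ("C002", 0.55), ("C003", 0.91)])
```

Writing `csv_text` to `data/power_bi_export.csv` would produce a file that Power BI can load through its Text/CSV connector.
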

Pinned dependencies in `requirements.txt`:

    streamlit==1.32.0
    pandas==2.2.1
    numpy==1.26.4
    scikit-learn==1.4.1
    plotly==5.20.0
    openai==1.14.3
    google-generativeai==0.4.1
    python-dotenv==1.0.1
    faker==24.2.0
    requests==2.31.0
    joblib==1.3.2

Jaideep — @jaideep005
This project is licensed under the MIT License.