jaideep005/churn_retention_system

🔮 ChurnGuard AI — Customer Churn Prediction & Retention Strategy System

An end-to-end Customer Churn Prediction and Retention Intelligence web application built with Streamlit, Scikit-Learn, and LLM integration (OpenAI / Gemini).


📌 Features

| Module | Description |
| --- | --- |
| 📥 Data Ingestion | Fetches real-time customer data from the IBM Telco public dataset or any custom API endpoint |
| 🔧 ETL Pipeline | Automated data cleaning, encoding, feature engineering (LTV, engagement score, NPS segments, etc.) and Min-Max scaling |
| 🤖 ML Models | Trains and evaluates Logistic Regression and Random Forest classifiers with 5-fold cross-validation |
| 📊 Dashboard | Interactive KPI cards, churn probability distribution, risk segmentation pie chart, and revenue analysis |
| 💡 Retention Strategies | AI-generated personalised retention playbooks per customer using OpenAI GPT or Google Gemini (with a rich mock fallback) |
| 📤 Power BI Export | One-click CSV export with predictions and risk segments ready for Power BI dashboards |

🏗️ Project Structure

churn_retention_system/
│
├── app.py                        # Main Streamlit UI (5-tab interface)
├── requirements.txt              # Python dependencies
├── .gitignore
├── data/
│   └── power_bi_export.csv       # Output file for Power BI
│
└── modules/
    ├── __init__.py               # Package exports
    ├── scraper.py                # Real-time data fetcher (IBM Telco + custom API)
    ├── etl_pipeline.py           # Data cleaning & feature engineering
    ├── ml_models.py              # Random Forest & Logistic Regression
    └── llm_strategy.py           # OpenAI / Gemini / Mock strategy generator

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/jaideep005/churn_retention_system.git
cd churn_retention_system

2. Install Dependencies

pip install -r requirements.txt

3. Set Up Environment Variables (Optional)

Create a .env file in the project root for LLM API keys:

GEMINI_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here

Note: If no API key is provided, the app automatically uses the built-in mock strategy generator.
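
The fallback described in the note can be pictured with a small sketch. This is not the code in llm_strategy.py: the function name `select_llm_mode` is hypothetical, and preferring Gemini when both keys are set is an assumption made only for illustration.

```python
import os


def select_llm_mode(env=None):
    """Return 'gemini', 'openai', or 'mock' depending on which API keys are set.

    Hypothetical helper: the real app may resolve the mode differently.
    """
    # In the app, python-dotenv would typically load .env into os.environ first.
    env = os.environ if env is None else env
    # Assumption: Gemini wins when both keys are present.
    if env.get("GEMINI_API_KEY", "").strip():
        return "gemini"
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"
    # No usable key found: fall back to the built-in mock generator.
    return "mock"


if __name__ == "__main__":
    print(select_llm_mode())
```

Passing a plain dictionary instead of the real environment makes the selection logic easy to check in isolation.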

4. Run the App

streamlit run app.py

The app will open at http://localhost:8501 in your browser.


🔄 App Workflow

1. Data Ingestion  →  Fetch real IBM Telco data (or upload CSV)
2. ETL Pipeline    →  Clean, encode, and engineer features
3. ML Models       →  Train Logistic Regression + Random Forest
4. Dashboard       →  View KPIs, risk segments, revenue analysis
5. Strategies      →  Generate AI-powered retention plans per customer
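
To make the scaling part of the ETL step concrete, here is a minimal pure-Python sketch of Min-Max scaling. The function name `min_max_scale` and the sample tenure values are illustrative assumptions; the actual pipeline in etl_pipeline.py may rely on scikit-learn's `MinMaxScaler`, which applies the same formula per column.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative tenure values in months (made up for this sketch).
tenure_months = [1, 12, 24, 72]
scaled = min_max_scale(tenure_months)
```

The smallest value maps to 0.0 and the largest to 1.0, so features with very different units end up on a comparable scale before model training.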

🧠 Machine Learning

Two classifiers are trained and compared:

| Model | Key Hyperparameters |
| --- | --- |
| Logistic Regression | C=1.0, class_weight=balanced, max_iter=1000 |
| Random Forest | n_estimators=300, max_depth=12, class_weight=balanced |

Both models are evaluated on:

  • Accuracy, Precision, Recall, F1 Score, ROC-AUC
  • 5-Fold Stratified Cross-Validation

The best model (by ROC-AUC) is automatically selected to generate full-dataset predictions.
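
The comparison above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the contents of ml_models.py: the data is synthetic (`make_classification`), the `random_state` values are arbitrary, and ranking the models by mean cross-validated ROC-AUC is one plausible reading of "best model (by ROC-AUC)".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the engineered churn features (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.75, 0.25], random_state=42
)

# Hyperparameters taken from the table above.
models = {
    "Logistic Regression": LogisticRegression(
        C=1.0, class_weight="balanced", max_iter=1000
    ),
    "Random Forest": RandomForestClassifier(
        n_estimators=300, max_depth=12, class_weight="balanced", random_state=42
    ),
}

# 5-fold stratified cross-validation, scored by ROC-AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Pick the model with the highest mean ROC-AUC and refit it on all rows.
best_name = max(scores, key=scores.get)
best_model = models[best_name].fit(X, y)

# Probability of the positive (churn) class for every row.
churn_probability = best_model.predict_proba(X)[:, 1]
```

ROC-AUC is a sensible selection metric here because churn data is usually imbalanced, and accuracy alone can look good for a model that rarely predicts churn.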


🌐 Real-Time Data Source

By default, the scraper pulls from the IBM Telco Customer Churn public dataset:

https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv

To use a custom API endpoint, pass your own URL and column mapping:

from modules.scraper import CompanyDataScraper

scraper = CompanyDataScraper(
    source_url="https://your-api.com/customers",
    column_map={
        "TimeWithCompany": "tenure_months",
        "MonthlySpend":    "monthly_revenue",
        "DidTheyLeave":    "churn_raw",
    },
    request_headers={"Authorization": "Bearer YOUR_TOKEN"},
)
df = scraper.fetch()
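
One plausible reading of `column_map` is that it renames source columns (the keys) to the internal names the pipeline expects (the values). The sketch below illustrates that renaming on plain dictionaries; `apply_column_map` and the sample record are hypothetical and do not reproduce what `CompanyDataScraper` does internally.

```python
def apply_column_map(records, column_map):
    """Rename keys in each record; keys absent from column_map stay unchanged."""
    return [
        {column_map.get(key, key): value for key, value in record.items()}
        for record in records
    ]


# Made-up API response row, for illustration only.
raw = [
    {"TimeWithCompany": 14, "MonthlySpend": 70.5, "DidTheyLeave": "Yes", "customerID": "A-1"}
]
column_map = {
    "TimeWithCompany": "tenure_months",
    "MonthlySpend": "monthly_revenue",
    "DidTheyLeave": "churn_raw",
}
renamed = apply_column_map(raw, column_map)
```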

💡 LLM Retention Strategies

Supports three modes:

| Mode | Description |
| --- | --- |
| mock | Rich rule-based template (default, no API key needed) |
| gemini | Google Gemini 1.5 Flash |
| openai | OpenAI GPT-3.5-Turbo |

📤 Power BI Integration

After training, click "Export CSV for Power BI" in the Dashboard tab. The exported file includes:

  • All original customer columns
  • churn_probability (float 0–1)
  • predicted_churn (0 or 1)
  • risk_segment (Low / Medium / High Risk)
  • ltv_24mo (estimated 24-month lifetime value)
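
As a rough sketch of how the prediction columns could be derived and written out, the example below uses only the standard library. The cut-offs (0.3 and 0.7 for segments, 0.5 for `predicted_churn`), the segment label strings, and the helper names are assumptions; the app's real thresholds are not stated here. `ltv_24mo` is left out because its formula is not given.

```python
import csv
import io


def risk_segment(p, low=0.3, high=0.7):
    """Bucket a churn probability into a segment (thresholds are assumed)."""
    if p >= high:
        return "High Risk"
    if p >= low:
        return "Medium Risk"
    return "Low Risk"


def build_export_rows(customers, probabilities, threshold=0.5):
    """Append prediction columns to each original customer record."""
    rows = []
    for customer, p in zip(customers, probabilities):
        row = dict(customer)  # keep all original columns
        row["churn_probability"] = round(p, 4)
        row["predicted_churn"] = int(p >= threshold)
        row["risk_segment"] = risk_segment(p)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up customers and probabilities for illustration.
customers = [
    {"customerID": "A-1", "tenure_months": 2},
    {"customerID": "B-2", "tenure_months": 48},
]
rows = build_export_rows(customers, [0.82, 0.12])
csv_text = to_csv(rows)
```

Writing `csv_text` to data/power_bi_export.csv would give Power BI a flat file with one row per customer.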

📦 Dependencies

streamlit==1.32.0
pandas==2.2.1
numpy==1.26.4
scikit-learn==1.4.1
plotly==5.20.0
openai==1.14.3
google-generativeai==0.4.1
python-dotenv==1.0.1
faker==24.2.0
requests==2.31.0
joblib==1.3.2

👨‍💻 Author

Jaideep (@jaideep005)


📄 License

This project is licensed under the MIT License.
