# **Problem Statement**  
## **3. Classify customer churn using logistic regression and explain business use.**

### Problem Statement

Customer Churn Prediction Using Logistic Regression

The goal is to predict whether a customer is likely to churn (leave a service) based on historical customer behavior and usage patterns.

This helps businesses:
- Reduce revenue loss
- Target high-risk customers
- Design retention strategies

Output:
- 1 → Churn
- 0 → Not Churn

### Constraints & Example Inputs/Outputs

### Constraints
- Binary classification problem
- Dataset size: small (demo / educational)
- Model: Logistic Regression (interpretable)
- No deep learning

### Features Used:
| Feature         | Description                             |
| --------------- | --------------------------------------- |
| tenure          | Months with company                     |
| monthly_charges | Monthly bill                            |
| total_charges   | Lifetime spend                          |
| support_calls   | Number of customer support interactions |

### Example Input:
```python
tenure = 2
monthly_charges = 90
total_charges = 180
support_calls = 5
```

### Expected Output:
Churn = 1

### Solution Approach

### Step 1: Understand Churn Business Logic
Customers are more likely to churn if:
- They have short tenure
- They make many support calls
- They pay high monthly charges
- Their lifetime spend (total_charges) is low

### Step 2: Why Logistic Regression?
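- Outputs a churn probability, not just a label, so the business can rank customers and choose its own threshold
- Interpretable coefficients: each one shows how a feature pushes the log-odds of churn up or down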
- Widely used in telecom, SaaS, banking
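To make the two bullets above concrete, here is a minimal sketch, assuming the same five-row toy dataset used later in the notebook. It checks that the probability really is the sigmoid of a linear score, p = 1 / (1 + exp(-(b0 + w·x))), and prints the coefficients. A positive coefficient raises the log-odds of churn, and `exp(coef)` is the multiplicative change in the odds per one-unit increase in that feature. Because the features are unscaled, coefficient sizes are not directly comparable across features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same toy dataset as the notebook (5 customers)
data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
features = ["tenure", "monthly_charges", "total_charges", "support_calls"]
X, y = data[features], data["churn"]

model = LogisticRegression().fit(X, y)

# Linear score (log-odds) z = b0 + w . x, as computed by scikit-learn
z = model.decision_function(X)
# The sigmoid maps log-odds to a probability in (0, 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Matches the positive-class (churn = 1) column of predict_proba
assert np.allclose(manual_proba, model.predict_proba(X)[:, 1])

# Each coefficient: change in log-odds of churn per one-unit increase
coef_table = pd.Series(model.coef_[0], index=features)
print(coef_table)
print("odds ratio per unit:", np.exp(coef_table).round(3).to_dict())
```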

### Step 3: Workflow
1. Load & create customer data
2. Train logistic regression model
3. Predict churn probability
4. Convert probability → churn decision
5. Validate using test cases
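The five workflow steps can be sketched as two small functions. This is one possible structure, not the notebook's own code: `train_churn_model` and `predict_churn` are hypothetical names, and the `StandardScaler` step is an added assumption (the notebook fits on raw features), so the probabilities will not match the notebook's output exactly.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["tenure", "monthly_charges", "total_charges", "support_calls"]

def train_churn_model(df: pd.DataFrame):
    """Step 2 (hypothetical helper): fit scaler + logistic regression."""
    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    return pipe.fit(df[FEATURES], df["churn"])

def predict_churn(model, df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Steps 3-4: churn probability, then a 0/1 decision at `threshold`."""
    out = df[FEATURES].copy()
    out["churn_probability"] = model.predict_proba(df[FEATURES])[:, 1]
    out["churn_prediction"] = (out["churn_probability"] >= threshold).astype(int)
    return out

# Step 1: the same toy customer data used in the notebook
data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
model = train_churn_model(data)

# Step 5: validate on the notebook's two hand-made test customers
tests = pd.DataFrame({
    "tenure": [2, 48],
    "monthly_charges": [90, 45],
    "total_charges": [180, 2200],
    "support_calls": [5, 0],
})
result = predict_churn(model, tests)
print(result)
```

Keeping the threshold as a parameter lets the business trade missed churners against wasted retention offers without retraining.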

### Solution Code

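```python
# Approach 1: Brute Force (Rule-Based Churn Detection)
import pandas as pd

# Toy customer dataset: 5 customers, 4 features, and the true churn label
data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0]
})

def rule_based_churn(row):
    # Rule 1: new customer (< 6 months) with 4+ support calls
    if row["tenure"] < 6 and row["support_calls"] >= 4:
        return 1
    # Rule 2: expensive plan (> 80/month) within the first year
    if row["monthly_charges"] > 80 and row["tenure"] < 12:
        return 1
    return 0

# axis=1 passes each row (one customer) to the rule function
data["rule_prediction"] = data.apply(rule_based_churn, axis=1)
data
```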


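|   | tenure | monthly_charges | total_charges | support_calls | churn | rule_prediction |
|---|--------|-----------------|---------------|---------------|-------|-----------------|
| 0 | 1      | 95              | 95            | 4             | 1     | 1               |
| 1 | 12     | 50              | 600           | 1             | 0     | 0               |
| 2 | 24     | 60              | 1400          | 0             | 0     | 0               |
| 3 | 3      | 85              | 255           | 5             | 1     | 1               |
| 4 | 36     | 40              | 1500          | 0             | 0     | 0               |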
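Scoring the rule-based approach with standard metrics makes the comparison with the model explicit. A minimal sketch using scikit-learn's `accuracy_score` and `confusion_matrix`; note that these are the same five rows the rules are checked against, so perfect agreement here says nothing about how the rules behave on new customers.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})

def rule_based_churn(row):
    # Same two rules as Approach 1
    if row["tenure"] < 6 and row["support_calls"] >= 4:
        return 1
    if row["monthly_charges"] > 80 and row["tenure"] < 12:
        return 1
    return 0

data["rule_prediction"] = data.apply(rule_based_churn, axis=1)

acc = accuracy_score(data["churn"], data["rule_prediction"])
# Rows = true label (0, 1), columns = predicted label (0, 1)
cm = confusion_matrix(data["churn"], data["rule_prediction"], labels=[0, 1])
print(f"accuracy = {acc:.2f}")
print(cm)
```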


### Alternative Solution

```python
# Approach 2: Optimized (Logistic Regression – ML Approach)
from sklearn.linear_model import LogisticRegression

# Features (X) and target label (y)
X = data[["tenure", "monthly_charges", "total_charges", "support_calls"]]
y = data["churn"]

# Fit on all 5 rows: with this little data there is no held-out test set
model = LogisticRegression()
model.fit(X, y)

# Column 1 of predict_proba is P(churn = 1); threshold at 0.5 for a 0/1 decision
data["churn_probability"] = model.predict_proba(X)[:, 1]
data["ml_prediction"] = (data["churn_probability"] >= 0.5).astype(int)

data
```


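|   | tenure | monthly_charges | total_charges | support_calls | churn | rule_prediction | churn_probability | ml_prediction |
|---|--------|-----------------|---------------|---------------|-------|-----------------|-------------------|---------------|
| 0 | 1      | 95              | 95            | 4             | 1     | 1               | 1.0               | 1             |
| 1 | 12     | 50              | 600           | 1             | 0     | 0               | 0.0001465853      | 0             |
| 2 | 24     | 60              | 1400          | 0             | 0     | 0               | 3.986145e-22      | 0             |
| 3 | 3      | 85              | 255           | 5             | 1     | 1               | 0.9998528         | 1             |
| 4 | 36     | 40              | 1500          | 0             | 0     | 0               | 2.248468e-24      | 0             |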
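The probabilities above come from scoring the same five rows the model was trained on, so they measure fit rather than generalization. One way to get an out-of-sample estimate on a dataset this small is leave-one-out cross-validation, sketched below; this validation scheme is an addition, not part of the original notebook. Each customer is predicted by a model trained only on the other four.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
features = ["tenure", "monthly_charges", "total_charges", "support_calls"]
X, y = data[features], data["churn"]

# Each row is predicted by a model fitted on the remaining four rows
oof_pred = cross_val_predict(LogisticRegression(), X, y, cv=LeaveOneOut())
loo_accuracy = (oof_pred == y.to_numpy()).mean()

print("out-of-fold predictions:", oof_pred.tolist())
print("true labels:            ", y.tolist())
print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

With only two churners, each fold that holds one out trains on a single positive example, so this estimate is very noisy; it mainly illustrates the procedure for a realistically sized dataset.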


### Alternative Approaches

### Alternative 1: Rule-Based Only
- Simple
- High bias
- Misses subtle patterns

### Alternative 2: Tree-Based Models
- Random Forest
- XGBoost
- Higher accuracy, less interpretability
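For comparison with the logistic regression, here is a minimal random forest sketch on the same toy data. The hyperparameters (`n_estimators=200`, `max_depth=3`, `random_state=0`) are arbitrary illustrative choices, and XGBoost is omitted to avoid an extra dependency. Unlike logistic regression coefficients, impurity-based `feature_importances_` show how much each feature is used for splitting but not whether it raises or lowers churn risk, which is part of the interpretability trade-off.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
features = ["tenure", "monthly_charges", "total_charges", "support_calls"]

forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(data[features], data["churn"])

# Importances sum to 1 and carry no sign (+/-), unlike regression coefficients
importances = pd.Series(forest.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances)
```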

### Alternative 3: Hybrid (Widely Used in Industry)

✅ Logistic Regression + Business Rules
- Interpretable
- Easier to audit and justify for regulatory compliance
- Easy to explain to stakeholders
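One way to combine the two approaches is to let the model produce the risk score and let an explicit business rule force a flag for cases the business always wants reviewed. The policy below (model first, then one of the notebook's rules as an override) and the helper `hybrid_churn_flag` are hypothetical; the returned reason string is what makes each decision easy to explain.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["tenure", "monthly_charges", "total_charges", "support_calls"]
data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
model = LogisticRegression().fit(data[FEATURES], data["churn"])

def hybrid_churn_flag(row, proba, threshold=0.5):
    """Hypothetical policy: model score first, then a hard business rule."""
    if proba >= threshold:
        return 1, "model score"
    if row["tenure"] < 6 and row["support_calls"] >= 4:
        return 1, "rule: new customer with many support calls"
    return 0, "no trigger"

# The notebook's two test customers
customers = pd.DataFrame({
    "tenure": [2, 48],
    "monthly_charges": [90, 45],
    "total_charges": [180, 2200],
    "support_calls": [5, 0],
})
probas = model.predict_proba(customers[FEATURES])[:, 1]
decisions = [hybrid_churn_flag(row, p)
             for (_, row), p in zip(customers.iterrows(), probas)]
customers["flag"] = [flag for flag, _ in decisions]
customers["reason"] = [reason for _, reason in decisions]
print(customers)
```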

### Test Case

```python
# Test Case 1: High-Risk Customer (short tenure, many support calls)
test_customer = pd.DataFrame({
    "tenure": [2],
    "monthly_charges": [90],
    "total_charges": [180],
    "support_calls": [5]
})

# Score with the model trained above and apply the same 0.5 threshold
test_customer["churn_probability"] = model.predict_proba(test_customer)[:, 1]
test_customer["churn_prediction"] = (test_customer["churn_probability"] >= 0.5).astype(int)

test_customer
```


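|   | tenure | monthly_charges | total_charges | support_calls | churn_probability | churn_prediction |
|---|--------|-----------------|---------------|---------------|-------------------|------------------|
| 0 | 2      | 90              | 180           | 5             | 0.999997          | 1                |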


```python
# Test Case 2: Loyal Customer
test_customer1 = pd.DataFrame({
    "tenure": [48],
    "monthly_charges": [45],
    "total_charges": [2200],
    "support_calls": [0]
})

test_customer1["churn_probability"] = model.predict_proba(test_customer1)[:, 1]
test_customer1["churn_prediction"] = (test_customer1["churn_probability"] >= 0.5).astype(int)

test_customer1
```

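|   | tenure | monthly_charges | total_charges | support_calls | churn_probability | churn_prediction |
|---|--------|-----------------|---------------|---------------|-------------------|------------------|
| 0 | 48     | 45              | 2200          | 0             | 9.398392e-40      | 0                |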


### Expected Outputs
- Churn probability for each customer
- Binary churn prediction
- Business-friendly explanation (illustrative wording, not an output of the model above):
    - "Customer has 78% churn risk due to short tenure and frequent support calls."
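A sentence like the one above can be generated from the model itself. For a linear model, `coef × (x - mean)` is each feature's contribution to the log-odds relative to the average training customer, so the largest positive contributions are the main churn drivers. This is a minimal sketch assuming the training-data mean as the baseline; `explain_churn` is a hypothetical helper and its wording is illustrative.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["tenure", "monthly_charges", "total_charges", "support_calls"]
data = pd.DataFrame({
    "tenure": [1, 12, 24, 3, 36],
    "monthly_charges": [95, 50, 60, 85, 40],
    "total_charges": [95, 600, 1400, 255, 1500],
    "support_calls": [4, 1, 0, 5, 0],
    "churn": [1, 0, 0, 1, 0],
})
model = LogisticRegression().fit(data[FEATURES], data["churn"])
baseline = data[FEATURES].mean()  # the "average customer" used as reference

def explain_churn(customer: pd.Series, top_k: int = 2) -> str:
    """Hypothetical helper: turn log-odds contributions into a sentence."""
    x = customer[FEATURES].astype(float)
    proba = model.predict_proba(x.to_frame().T)[0, 1]
    # Per-feature contribution to log-odds, relative to the baseline customer
    contrib = pd.Series(model.coef_[0], index=FEATURES) * (x - baseline)
    drivers = contrib[contrib > 0].sort_values(ascending=False).head(top_k)
    reasons = ", ".join(drivers.index) if not drivers.empty else "no single dominant factor"
    return f"Customer has {proba:.0%} churn risk; main drivers: {reasons}."

# Test Case 1's customer
print(explain_churn(pd.Series({"tenure": 2, "monthly_charges": 90,
                               "total_charges": 180, "support_calls": 5})))
```

Because the features are unscaled, a feature with a large numeric range (such as total_charges) can dominate these contributions; standardizing first would make the ranking easier to compare.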

### Business Use Explanation

**How Companies Use This**
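- Identify customers at high churn risk
- Target them with discounts or other retention offers
- Keep revenue without paying acquisition cost (CAC) to replace lost customers, and increase customer lifetime value (LTV)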

**KPIs Impacted**
- Churn Rate ↓
- Customer Lifetime Value ↑
- Revenue Stability ↑
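Ranking customers by churn probability alone ignores what a retention offer costs and how much each customer is worth. A common refinement is to target a customer only when the expected saving exceeds the offer cost: `p × save_rate × remaining_value > offer_cost`. In the sketch below, `OFFER_COST`, `SAVE_RATE`, the `remaining_value` column, and the three customers are all hypothetical numbers chosen for illustration.

```python
import pandas as pd

# Hypothetical business inputs (not from the notebook)
OFFER_COST = 20.0   # cost of one retention discount
SAVE_RATE = 0.30    # share of would-be churners the offer retains

def retention_targets(scored: pd.DataFrame, value_col: str = "remaining_value") -> pd.DataFrame:
    """Rank customers by expected net gain of an offer; keep positive ones."""
    out = scored.copy()
    out["expected_gain"] = (out["churn_probability"] * SAVE_RATE * out[value_col]
                            - OFFER_COST)
    return out[out["expected_gain"] > 0].sort_values("expected_gain", ascending=False)

scored = pd.DataFrame({
    "customer_id": ["A", "B", "C"],
    "churn_probability": [0.90, 0.40, 0.05],
    "remaining_value": [600.0, 300.0, 1200.0],
})
result = retention_targets(scored)
print(result)
```

Customer C has the highest value but only a 5% churn probability, so the expected saving (0.05 × 0.30 × 1200 = 18) does not cover the offer cost of 20 and C is left out.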

## Complexity Analysis

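### Logistic Regression
- Training Time: O(k × n × d) for k optimizer iterations over n customers and d features
- Prediction Time: O(n × d)
- Space Complexity: O(d) for the learned coefficients (plus the O(n × d) training data)

### Rule-Based
- Time: O(n), a fixed number of comparisons per customer
- Space: O(1) extra per customer evaluated (the prediction column itself is O(n))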
