# Customer Churn – Business Context

This project aims to support a business decision, not just build a model.

## Business Decision

Each month, the company can only contact a limited number of customers.

**Decision to make:**  
Which customers should be contacted first to reduce churn?

## Business Assumptions

- Churn represents a loss of revenue
- Retention actions have a cost
- Not all churn can be prevented
- Prioritization is necessary

In [4]:
COST_PER_RETENTION_CONTACT = 50          # $ per contacted customer
AVERAGE_CUSTOMER_LIFETIME_VALUE = 2000   # $ lost if customer churns
RETENTION_SUCCESS_RATE = 0.30            # 30% of contacted would-be churners are retained

The values above are business assumptions used for simulation purposes.
They are not derived from the dataset; they are intended as plausible industry-level estimates.

In a real-world project, these values would be provided by finance, marketing,
or CRM teams and validated through historical analysis.

## Business Value of the Model

In [6]:
# Expected value of a correct churn prediction
expected_value_per_correct_contact = (
    AVERAGE_CUSTOMER_LIFETIME_VALUE * RETENTION_SUCCESS_RATE
    - COST_PER_RETENTION_CONTACT
)

expected_value_per_correct_contact

550.0

### Interpretation

If the model correctly identifies a customer who would churn:

- Expected retained value: $2,000 × 30% = $600
- Retention contact cost: $50
- Net expected value: **$550 per correctly identified churner**

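This suggests that even a moderately precise model can generate a positive ROI.
Under these assumptions, a contact wasted on a customer who would not have churned
costs $50, so a campaign breaks even when roughly 1 in 12 contacted customers
(about 8.3%) is a true churner.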
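
How much precision a positive ROI actually requires can be estimated with a short sketch. It restates the assumption values from the cells above and introduces a hypothetical helper, `expected_value_per_contact`, which is not part of the original notebook. It also assumes that contacting a customer who would not have churned brings no benefit while still incurring the contact cost.

```python
# Business assumptions restated from the cells above (simulation values).
COST_PER_RETENTION_CONTACT = 50
AVERAGE_CUSTOMER_LIFETIME_VALUE = 2000
RETENTION_SUCCESS_RATE = 0.30


def expected_value_per_contact(precision: float) -> float:
    """Expected net value of one contact when a fraction `precision`
    of the contacted customers are true churners.

    Assumption: contacting a non-churner yields no benefit but still
    costs COST_PER_RETENTION_CONTACT.
    """
    retained_value = AVERAGE_CUSTOMER_LIFETIME_VALUE * RETENTION_SUCCESS_RATE
    return precision * retained_value - COST_PER_RETENTION_CONTACT


# Precision at which a contact campaign neither gains nor loses money.
break_even_precision = COST_PER_RETENTION_CONTACT / (
    AVERAGE_CUSTOMER_LIFETIME_VALUE * RETENTION_SUCCESS_RATE
)

for p in (0.10, 0.40, 0.50):
    value = expected_value_per_contact(p)
    print(f"precision={p:.0%}  expected value per contact=${value:.2f}")
print(f"break-even precision: {break_even_precision:.1%}")
```

Under these assumptions, the 40% minimum acceptable precision set in the success metrics corresponds to an expected net value of about $190 per contact.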

## Definition of Success

The project is successful if it helps:
- Reduce churn among contacted customers
- Optimize retention costs
- Support human decision-making

## Success Metrics

The success of the project will be evaluated using both
machine learning metrics and business-oriented thresholds.

| Metric | Target | Minimum Acceptable | Rationale |
|------|--------|-------------------|-----------|
| Recall (Churn) | ≥ 80% | ≥ 70% | Missing a churner is costly |
| ROC-AUC | ≥ 0.85 | ≥ 0.80 | Ranking quality |
| Precision (Churn) | ≥ 50% | ≥ 40% | Control retention costs |

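Priority is given to Recall: under the business assumptions, a missed churner
forfeits a net expected value of $550, whereas an unnecessary contact costs only $50.
Because contact capacity is limited, ranking quality also matters, so that the
customers contacted first are the ones most likely to churn.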
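
One way to apply the table during evaluation is to grade each metric against its two thresholds. The sketch below is illustrative: the names `THRESHOLDS`, `grade_metric`, and `grade_model` are hypothetical, and the example metric values are placeholders, not results from this project.

```python
# Thresholds copied from the Success Metrics table: (target, minimum acceptable).
THRESHOLDS = {
    "recall_churn": (0.80, 0.70),
    "roc_auc": (0.85, 0.80),
    "precision": (0.50, 0.40),
}


def grade_metric(name: str, value: float) -> str:
    """Classify a metric value against its target and minimum threshold."""
    target, minimum = THRESHOLDS[name]
    if value >= target:
        return "target met"
    if value >= minimum:
        return "acceptable"
    return "below minimum"


def grade_model(metrics: dict) -> dict:
    """Grade every metric in `metrics` (a mapping of name to value)."""
    return {name: grade_metric(name, value) for name, value in metrics.items()}


# Placeholder values for illustration only.
example_metrics = {"recall_churn": 0.76, "roc_auc": 0.86, "precision": 0.38}
for name, status in grade_model(example_metrics).items():
    print(f"{name}: {status}")
```

A model would pass this check only if no metric is graded "below minimum"; whether a single "acceptable" grade is enough for sign-off is a business decision that the table does not settle.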

## Constraints

- Monthly customer contact capacity: 500 customers
- Model must be interpretable for compliance and business trust
- Predictions must be generated by the 1st of each month
- Solution must be deployable as a batch process (CSV output)

## Expected Output Format

The final output of the project is a ranked list of customers
ready for operational use.

| customerID | churn_probability | risk_level | recommended_action |
|-----------|------------------|-----------|-------------------|
| 7590-VHVEG | 0.82 | High | Urgent Retention |
| 3668-QPYBK | 0.61 | Medium | Targeted Offer |
| 5575-GNVDE | 0.18 | Low | Monitor |

This output will be delivered as a CSV file
and consumed by business teams.
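
A minimal sketch of how such a file could be produced from model scores, using only the standard library. The risk-band cut-offs (0.70 and 0.40) are assumptions chosen to be consistent with the example rows, since the document does not define them. Truncating the list at the monthly contact capacity is likewise an assumption, and the function names are hypothetical.

```python
import csv
import io

MONTHLY_CONTACT_CAPACITY = 500  # from the Constraints section

# Hypothetical cut-offs: the risk bands are not defined in the source.
HIGH_RISK_THRESHOLD = 0.70
MEDIUM_RISK_THRESHOLD = 0.40

ACTIONS = {"High": "Urgent Retention", "Medium": "Targeted Offer", "Low": "Monitor"}
FIELDNAMES = ["customerID", "churn_probability", "risk_level", "recommended_action"]


def risk_level(probability: float) -> str:
    """Map a churn probability to a risk band."""
    if probability >= HIGH_RISK_THRESHOLD:
        return "High"
    if probability >= MEDIUM_RISK_THRESHOLD:
        return "Medium"
    return "Low"


def build_ranking(scores: dict, capacity: int = MONTHLY_CONTACT_CAPACITY) -> list:
    """Return at most `capacity` rows, sorted by descending churn probability."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for customer_id, probability in ranked[:capacity]:
        level = risk_level(probability)
        rows.append({
            "customerID": customer_id,
            "churn_probability": probability,
            "risk_level": level,
            "recommended_action": ACTIONS[level],
        })
    return rows


def write_ranking_csv(rows: list, handle) -> None:
    """Write the ranked rows to an open text file handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# The three example customers from the table above, in arbitrary order.
scores = {"5575-GNVDE": 0.18, "7590-VHVEG": 0.82, "3668-QPYBK": 0.61}
buffer = io.StringIO()
write_ranking_csv(build_ranking(scores), buffer)
print(buffer.getvalue())
```

In a batch job, `buffer` would be replaced by a file opened with `open(path, "w", newline="")`, which is the mode the `csv` module documentation recommends for writing.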

## Project Deliverables

- [ ] Trained churn prediction model
- [ ] Saved preprocessing pipeline
- [ ] Feature importance and interpretation report
- [ ] Customer churn risk ranking (CSV)
- [ ] Reproducible notebooks for audit and review

## Modeling Implications

Based on the business context:
- The model should prioritize recall over precision
- Interpretable models are preferred
- Probabilistic outputs are required to rank customers by risk