## Business Insights and Recommendations

### Purpose

This notebook presents the business insights and recommendations derived from training and evaluating several machine learning models. The objective is to identify the customer features most strongly associated with churn and to translate the modeling results into actions the business can take.

### 1. Executive Summary

**Problem Statement**  
Customer churn is a major challenge for a telecommunications company. Using machine learning, this project aims to estimate the risk that a customer will churn based on selected customer features.

**Model Selection**  
After evaluating multiple models, the Random Forest model was selected as the best-performing approach. It combined high recall with the highest F1-score for the churn class. The other evaluated models were logistic regression (with and without class weighting) and a decision tree.

**Business Relevance**  
Being able to identify customers with a high probability of churn is highly valuable for the business. The results of this project allow the company to proactively target at-risk customers and implement retention strategies. By acting on these insights, the company can reduce customer attrition and minimize revenue loss.


### 2. Key Drivers of Customer Churn

The analysis across exploratory data analysis and modeling highlights several key factors associated with customer churn:

- **Customer Tenure:** Customers with shorter tenure are significantly more likely to churn.
- **Monthly Charges:** Higher monthly charges are associated with a higher likelihood of churn.
- **Contract Type:** Month-to-month contracts are associated with higher churn risk, while long-term contracts are associated with lower churn risk.
- **Technical Support:** Customers with access to technical support tend to be more stable and less likely to churn.

These drivers consistently appear across different models, increasing confidence in the findings.
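One simple way to check a driver like those listed above during exploratory analysis is to compare the churn rate across the values of a feature. The sketch below is illustrative only: the records are toy data rather than rows from the project dataset, the field names `Contract` and `Churn` are assumed, and `churn_rate_by` is a hypothetical helper.

```python
from collections import defaultdict

# Toy records for illustration only; these are NOT rows from the project dataset.
# The field names "Contract" and "Churn" are assumptions.
customers = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "One year", "Churn": "Yes"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]


def churn_rate_by(records, feature):
    """Return {feature value: share of customers with Churn == 'Yes'}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        totals[row[feature]] += 1
        if row["Churn"] == "Yes":
            churned[row[feature]] += 1
    return {value: churned[value] / totals[value] for value in totals}


rates = churn_rate_by(customers, "Contract")

# Print the groups from highest to lowest churn rate.
for contract, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{contract}: {rate:.0%}")
```

A difference in group churn rates shows an association, not a cause, so it is best read alongside the model-based evidence.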

### 3. Final Model Overview

The Random Forest model was selected as the final model for churn prediction.

This model was chosen because:
- It achieved the highest F1-score among the evaluated models.
- It maintained a high recall for churn customers, which is critical for churn prevention.
- It struck a better balance than the other models between identifying churners and limiting false positives.

The primary evaluation focus was on recall and F1-score for the churn class.
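For reference, the churn-class metrics mentioned above can be computed from true and predicted labels as in the minimal sketch below. The labels are toy values, not project results, and `churn_class_metrics` is a hypothetical helper. In practice, `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def churn_class_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual churners).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = churn, 0 = no churn).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = churn_class_metrics(y_true, y_pred)
print(metrics)
```

Recall measures the share of actual churners the model catches, while F1 is the harmonic mean of precision and recall, so it drops when either one is low.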

### 4. Identifying High-Risk Customers

The final Random Forest model can be used to assign a churn probability to each customer. This probability represents the estimated risk that a customer will churn.

Customers with higher predicted churn probabilities can be prioritized for retention actions. Rather than predicting churn with certainty, the model ranks customers based on relative risk, enabling efficient resource allocation.



In [1]:
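# Assumes `rf_model` (the fitted Random Forest), `X_encoded` (the encoded feature
# matrix), and `df` (the original customer data, row-aligned with `X_encoded`)
# were created in the modeling notebook and are available in this session.
import pandas as pd

# Predict churn probabilities for all customers
# (column 1 is the churn class, assuming it is the second entry in rf_model.classes_)
churn_probabilities = rf_model.predict_proba(X_encoded)[:, 1]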

# Create a dataframe with customer ID and churn risk
risk_df = pd.DataFrame({
    "customerID": df["customerID"],
    "churn_probability": churn_probabilities
})

# Sort customers by churn risk
risk_df = risk_df.sort_values(by="churn_probability", ascending=False)

risk_df.head(10)





### 5. Customer Risk Segmentation

Based on predicted churn probabilities, customers can be segmented into different risk groups:

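- **High Risk:** Customers with a very high churn probability (above 0.7 in the example segmentation below) who require immediate retention actions.
- **Medium Risk:** Customers with a moderate churn probability (above 0.4 and up to 0.7 in the example) who should be monitored and engaged proactively.
- **Low Risk:** Customers with a low churn probability (0.4 or below in the example) who do not require immediate intervention.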

This segmentation allows the business to focus retention efforts where they are most impactful.


In [2]:
# Example risk segmentation
risk_df["risk_segment"] = pd.cut(
    risk_df["churn_probability"],
    bins=[0, 0.4, 0.7, 1.0],
    labels=["Low Risk", "Medium Risk", "High Risk"],
    include_lowest=True,  # otherwise a probability of exactly 0 falls outside the first bin
)

risk_df.head(10)


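To judge where retention effort is likely to pay off, it can help to summarize each segment by its size and by the sum of its predicted probabilities. The sketch below uses toy probabilities and the same example cut-offs (0.4 and 0.7). `assign_segment` is a hypothetical helper, and reading the summed probabilities as "expected churners" assumes the model's probabilities are reasonably well calibrated.

```python
def assign_segment(probability, low_max=0.4, medium_max=0.7):
    """Map a churn probability to a segment using right-inclusive cut-offs."""
    if probability <= low_max:
        return "Low Risk"
    if probability <= medium_max:
        return "Medium Risk"
    return "High Risk"


# Toy probabilities for illustration only; not model output from this project.
toy_probabilities = [0.05, 0.0, 0.38, 0.4, 0.55, 0.7, 0.72, 0.91]

summary = {}
for p in toy_probabilities:
    segment = summary.setdefault(
        assign_segment(p), {"customers": 0, "expected_churners": 0.0}
    )
    segment["customers"] += 1
    # The sum of probabilities approximates the expected number of churners,
    # under the assumption that the probabilities are calibrated.
    segment["expected_churners"] += p

for name, stats in summary.items():
    print(name, stats["customers"], round(stats["expected_churners"], 2))
```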

### 6. Business Recommendations

Based on the analysis, the following actions are recommended:

- Prioritize retention campaigns for high-risk customers with short tenure.
- Encourage customers on month-to-month contracts to switch to longer-term plans.
- Review pricing strategies for customers with high monthly charges.
- Promote technical support services to improve customer satisfaction and retention.

These actions can help reduce churn and increase customer lifetime value.

### 7. Limitations

This project has several limitations:

- The dataset does not include customer behavior or usage metrics.
- External factors influencing churn are not captured.
- The model's predictions are based on historical patterns, which may change over time.

As a result, the model should be monitored and updated regularly in a production environment.
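One possible monitoring check, shown here as a sketch rather than something this project implemented, is the population stability index (PSI). It compares the distribution of predicted churn probabilities at training time with the distribution on recent customers. The scores, bin edges, and function names below are all illustrative assumptions.

```python
import math


def bin_shares(values, edges, floor=1e-6):
    """Share of values in each [edges[i], edges[i+1]) bin.

    The last bin also includes its right edge. Shares are floored so that
    empty bins do not cause log(0) or division by zero.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(reference, current, edges):
    """PSI = sum over bins of (current - reference) * ln(current / reference)."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy score samples for illustration only.
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
reference_scores = [0.05, 0.10, 0.15, 0.25, 0.30, 0.45, 0.55, 0.65, 0.85, 0.95]
current_scores = [0.35, 0.45, 0.50, 0.55, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95]

psi_same = population_stability_index(reference_scores, reference_scores, edges)
psi_shifted = population_stability_index(reference_scores, current_scores, edges)
print(psi_same, psi_shifted)
```

A PSI of zero means the two binned distributions are identical, and larger values indicate a larger shift. The level that should trigger retraining is a business decision.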

### 8. Next Steps

Potential next steps for this project include:

- Incorporating customer usage and interaction data.
- Performing threshold optimization based on business costs.
- Deploying the model in a production system for real-time scoring.
- Monitoring model performance and retraining periodically.
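As a rough illustration of cost-based threshold optimization, the sketch below picks the probability threshold that minimizes total cost on a labeled validation set. The labels, probabilities, and both cost figures are hypothetical placeholders, and the function names are assumptions rather than part of this project.

```python
def total_cost(y_true, probabilities, threshold,
               cost_missed_churner, cost_unneeded_offer):
    """Total cost of contacting every customer scored at or above the threshold."""
    cost = 0.0
    for actual, p in zip(y_true, probabilities):
        flagged = p >= threshold
        if actual == 1 and not flagged:
            cost += cost_missed_churner   # false negative: churner not contacted
        elif actual == 0 and flagged:
            cost += cost_unneeded_offer   # false positive: offer sent needlessly
    return cost


def best_threshold(y_true, probabilities,
                   cost_missed_churner, cost_unneeded_offer):
    """Return the candidate threshold (0.01 to 0.99) with the lowest total cost."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda th: total_cost(
            y_true, probabilities, th, cost_missed_churner, cost_unneeded_offer
        ),
    )


# Toy validation data and hypothetical costs, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.6, 0.35, 0.5, 0.3, 0.2, 0.1, 0.05]

threshold = best_threshold(
    y_true, probabilities, cost_missed_churner=500, cost_unneeded_offer=50
)
print(threshold, total_cost(y_true, probabilities, threshold, 500, 50))
```

When a missed churner costs much more than an unnecessary offer, the cost-minimizing threshold tends to sit below 0.5, trading extra false positives for higher recall.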

### 9. Conclusion

This project demonstrates how machine learning can be applied to identify customers at risk of churning and support data-driven retention strategies. The Random Forest model provides a reliable balance between recall and precision, allowing the business to prioritize high-risk customers effectively.

By acting on these insights, the company can reduce customer churn, improve customer satisfaction, and protect long-term revenue.
