The project focuses on analyzing customer churn, generating churn risk scores, identifying churn-flagged customers, and prioritizing them for targeted retention campaigns. Based on the performance of multiple machine learning algorithms, we selected the most effective algorithm for each of these four goals, in line with the business objectives.

# Model Selection:

From the above model results, the following models stand out:

1. Best Model for Accuracy: Artificial Neural Network (ANN), with the highest accuracy of 0.891.
2. Best Model for Recall: LightGBM, which makes it excellent for identifying customers likely to churn, though at the cost of many false positives; it needs tuning for precision.
3. Best Model for Balance Between Precision and Recall: Support Vector Machine (SVM), with an accuracy of 0.890, precision of 0.582, recall of 0.700, and ROC AUC of 0.898.
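One common way to put a single number on the precision/recall balance cited for the SVM is the F1 score, the harmonic mean of the two. The sketch below is an illustration only: F1 was not reported in the model results, and the helper name `f1_score_from` is hypothetical. It uses the precision (0.582) and recall (0.700) quoted in the list.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the SVM, as quoted in the model list
svm_f1 = f1_score_from(0.582, 0.700)
print(f"SVM F1 score: {svm_f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, the moderate precision keeps the score well below the recall figure.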

## GOAL 1: Understanding the Variables Influencing Customer Migration

### Chosen Algorithm: Logistic Regression

Logistic Regression was chosen over SVM for its interpretability and its ability to handle a mix of categorical and numerical variables, making it the best fit for understanding the key drivers of customer churn.

In [3]:
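import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance

# Calculate permutation importance on the test set.
# LR must already be defined (the fitted Logistic Regression model from the
# earlier model results); otherwise this cell raises the NameError shown below.
perm_importance = permutation_importance(LR, x_test_scaled, y_test, scoring='roc_auc')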

# Create a DataFrame for visualization
importance_df = pd.DataFrame({
    'Feature': x_test_scaled.columns,
    'Importance': perm_importance.importances_mean
}).sort_values(by='Importance', ascending=False)

# Display top features
print("Top Influencing Variables:")
print(importance_df.head(10))

# Plot the feature importance
plt.figure(figsize=(10, 6))
plt.barh(importance_df['Feature'].head(10), importance_df['Importance'].head(10))
plt.gca().invert_yaxis()
plt.title("Top Variables Influencing Churn")
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.show()

NameError: name 'LR' is not defined

## GOAL 2: Creating Churn Risk Scores

### Chosen Algorithm: Support Vector Machine (SVM)

SVM delivered the highest ROC AUC score (0.898), ensuring reliable separation of churners and non-churners.
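To make the "separation" reading of ROC AUC concrete: the score equals the probability that a randomly chosen churner receives a higher risk score than a randomly chosen non-churner, with ties counted as half. The sketch below computes that directly by comparing every churner/non-churner pair. The labels and scores are made-up toy values for illustration, not project data, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as the share of (churner, non-churner) pairs ranked correctly."""
    churners = [s for y, s in zip(labels, scores) if y == 1]
    non_churners = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in churners
        for n in non_churners
    )
    return wins / (len(churners) * len(non_churners))


# Toy data (assumption, for illustration only): 1 = churner, 0 = non-churner
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

toy_auc = pairwise_auc(toy_labels, toy_scores)
print(f"Toy ROC AUC: {toy_auc:.3f}")
```

Here 11 of the 12 pairs are ordered correctly; the one miss is the churner scored 0.4 falling below the non-churner scored 0.6.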

In [4]:
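# Predict churn probabilities.
# sv must already be defined (the fitted SVM from the earlier model results).
# If sv is a scikit-learn SVC, predict_proba is only available when the model
# was trained with probability=True.
SV_y_pred_prob = sv.predict_proba(x_test_scaled)[:, 1]  # Probability of churn (class 1)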

# Save churn risk scores
churn_risk_df = pd.DataFrame({
    'Churn_Risk_Score': SV_y_pred_prob
})
print("Churn Risk Scores Generated")
churn_risk_df

NameError: name 'sv' is not defined

## GOAL 3: Introducing the "CHURN-FLAG" Variable

### Chosen Algorithm: LightGBM

LightGBM achieved the highest recall (0.936), ensuring that most churners were correctly flagged. While there were some misclassified cases, LightGBM accurately identified 147 out of 157 churners, making it the most effective model for minimizing missed churners.
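The recall figure follows directly from the counts quoted above, since recall is the share of actual churners that the model flags. A small check of the arithmetic, using only the numbers stated in the text (the variable names are illustrative):

```python
true_positives = 147    # churners correctly flagged by LightGBM
actual_churners = 157   # all churners in the evaluation data

false_negatives = actual_churners - true_positives  # churners the model missed
recall = true_positives / (true_positives + false_negatives)

print(f"Missed churners: {false_negatives}, recall: {recall:.3f}")
```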

In [None]:
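# Create churn flag from the LightGBM churn probabilities.
# lgb_y_pred_prob must already hold the LightGBM class-1 probabilities for the test set.
churn_risk_df['Churn_Flag'] = (lgb_y_pred_prob > 0.5).astype(int)

# Preview results with churn flag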
print("Churn Risk Scores with Churn Flags:")
print(churn_risk_df.head())

# Save to CSV
churn_risk_df.to_csv('churn_risk_scores_with_flags.csv', index=False)

## GOAL 4: Identifying CHURN-FLAG YES Customers for Priority Attention

This goal helps identify likely CHURN-FLAG YES customers so that they receive more attention at customer touch points, including customer care support, request fulfilment, and automatic categorization of their tickets as high priority for quick resolution of any questions they may have.


### 1. Predicting Churn-Flagged Customers

- Chosen Algorithm: LightGBM

LightGBM was retained for this goal as its high recall aligns with the business objective of prioritizing churners. The model demonstrated strong performance by correctly flagging a significant number of churners while balancing acceptable misclassifications.
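The trade-off between catching churners and limiting misclassifications is governed by the probability threshold used for flagging. The sketch below shows, on made-up toy values (not project data), how raising the threshold from 0.5 trades recall for precision. The helper `precision_recall_at` and the toy arrays are hypothetical and only illustrate the mechanism.

```python
def precision_recall_at(labels, probs, threshold):
    """Precision and recall when flagging every probability above the threshold."""
    flags = [int(p > threshold) for p in probs]
    tp = sum(1 for y, f in zip(labels, flags) if y == 1 and f == 1)
    fp = sum(1 for y, f in zip(labels, flags) if y == 0 and f == 1)
    fn = sum(1 for y, f in zip(labels, flags) if y == 1 and f == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumption, for illustration only): 1 = churner, 0 = non-churner
demo_labels = [1, 1, 1, 0, 0, 0, 0, 0]
demo_probs = [0.92, 0.75, 0.55, 0.65, 0.52, 0.30, 0.20, 0.10]

for threshold in (0.5, 0.6, 0.7):
    precision, recall = precision_recall_at(demo_labels, demo_probs, threshold)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy set the 0.5 threshold flags every churner along with two non-churners, while 0.7 flags no non-churners but misses one churner.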

In [None]:
churn_risk_df['Churn_Flag'] = (lgb_y_pred_prob > 0.5).astype(int)

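# Filter churn-flagged customers for high priority actions.
# .copy() gives an independent DataFrame, so columns added to it later
# do not trigger pandas' SettingWithCopyWarning.
churn_yes_customers = churn_risk_df[churn_risk_df['Churn_Flag'] == 1].copy()

# Display flagged customers for further action
print("High Priority Churn-Flagged Customers:")
print(churn_yes_customers)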

# Save to CSV for client analysis
churn_yes_customers.to_csv('high_priority_customers.csv', index=False)

In [None]:
print(churn_yes_customers.columns)

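### 2. Customer Segmentation

Churn-flagged customers were segmented into:

- High Priority: Customers with multiple unresolved tickets or frequent complaints.
- Medium Priority: Customers with moderate engagement issues.
- Low Priority: Customers with minor concerns.

The code cell for this step approximates these segments using the churn risk score alone: High for scores of 0.8 or above, Medium for scores from 0.5 up to 0.8, and Low otherwise.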

In [None]:
# Simple segmentation based on churn risk score
churn_yes_customers['Priority_Level'] = churn_yes_customers['Churn_Risk_Score'].apply(lambda x: 
    'High' if x >= 0.8 else 
    'Medium' if 0.5 <= x < 0.8 else 
    'Low')

# View segmentation
print(churn_yes_customers[['Churn_Risk_Score', 'Priority_Level']].head())

# Save segmented data
churn_yes_customers.to_csv('segmented_churn_customers.csv', index=False)
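The threshold rule used in the cell above can also be written as a small standalone function, which makes the boundary behavior easy to check. This is a sketch of the same rule, not a replacement for the notebook code; the function name `priority_level` is hypothetical.

```python
def priority_level(score: float) -> str:
    """Map a churn risk score to a priority label (same cut-offs as the cell above)."""
    if score >= 0.8:
        return "High"
    if score >= 0.5:
        return "Medium"
    return "Low"


# Check the behavior exactly at and just below each cut-off
for score in (0.95, 0.80, 0.79, 0.50, 0.49):
    print(score, priority_level(score))
```

Since the risk score in this notebook comes from the SVM while the churn flag comes from LightGBM, a flagged customer can still land in "Low" when the SVM score is below 0.5.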

### 3. Automating Ticket Categorization

An automated system was designed to categorize customer tickets into high, normal, and low priorities based on segmentation. This ensures prompt responses to high-priority customers.

In [None]:
# Function to auto-categorize tickets
def categorize_ticket(row):
    if row['Priority_Level'] == 'High':
        return 'High Priority'
    elif row['Priority_Level'] == 'Medium':
        return 'Normal Priority'
    else:
        return 'Low Priority'

# Apply categorization
churn_yes_customers['Ticket_Category'] = churn_yes_customers.apply(categorize_ticket, axis=1)

# Save for customer support system integration
churn_yes_customers.to_csv('categorized_tickets.csv', index=False)

# Display output
print("Automated Ticket Categories:")
print(churn_yes_customers[['Priority_Level', 'Ticket_Category']].head())
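Because the ticket category depends only on the priority level, the same rule can be expressed as a lookup table instead of an if/elif chain. The sketch below is one possible alternative, with `PRIORITY_TO_TICKET` and `categorize_priority` as hypothetical names; any level other than High or Medium falls back to low priority, as in the function above.

```python
# Lookup table from segmentation level to ticket category
PRIORITY_TO_TICKET = {
    "High": "High Priority",
    "Medium": "Normal Priority",
}


def categorize_priority(priority_level: str) -> str:
    """Return the ticket category for a priority level, defaulting to low priority."""
    return PRIORITY_TO_TICKET.get(priority_level, "Low Priority")


for level in ("High", "Medium", "Low"):
    print(level, "->", categorize_priority(level))
```

With pandas, the same table could be applied column-wise, for example `churn_yes_customers['Priority_Level'].map(PRIORITY_TO_TICKET).fillna('Low Priority')`, which avoids a row-by-row `apply`.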

### 4. Actionable Insights

Insights and Dashboard Preparation

In [None]:
# Summary statistics for churn analysis
priority_summary = churn_yes_customers['Priority_Level'].value_counts()
print("Priority Distribution:")
print(priority_summary)

# Save summary to a CSV
priority_summary.to_csv('priority_summary.csv', index=True)

# Overall Project Conclusion:

This project successfully addressed the client’s objectives by leveraging machine learning to predict churn, assign risk scores, and identify at-risk customers. The use of models like Logistic Regression, SVM, and LightGBM provided a balance of interpretability, accuracy, and recall, ensuring actionable insights for churn mitigation. These results equip the client with a data-driven foundation to reduce churn and enhance customer retention effectively.