# Demo 3 - Analyze feature contributions in Ensemble Models
## **Scenario: Predicting Customer Churn in a Subscription Service**

A video streaming company wants to optimize ensemble machine learning models (like Random Forest and Gradient Boosting) for predicting customer churn. Beyond achieving high accuracy, the team wants to understand which hyperparameters most influence model performance, so that it can design efficient tuning strategies.



## **Objective:**
Analyze feature contributions in ensemble models to explain the underlying reasons for customer churn. By quantifying the importance and influence of different input features (e.g., number of support calls, payment issues, subscription type), the business aims to:

* Identify key drivers of churn.

* Take targeted actions (e.g., better support, loyalty rewards).

* Communicate model findings to non-technical stakeholders.

* Build trust in predictive models through interpretable results.

## Step 1: Importing Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

## Step 2: Loading Dataset

In [None]:
df = pd.read_csv("customer_churn_data.csv")

## Step 3: Encoding Categorical Variables
Encode 'ContractType' into numerical format. LabelEncoder assigns integer codes in sorted label order (e.g., Monthly = 0, Yearly = 1).

In [None]:
le = LabelEncoder()
df["ContractType"] = le.fit_transform(df["ContractType"])
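Because `LabelEncoder` orders its classes alphabetically, it is safer to inspect the mapping than to assume it. A minimal sketch, assuming the column holds only the values "Monthly" and "Yearly" (the toy frame below is illustrative and is not the real dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data standing in for the real ContractType column (assumed values)
toy = pd.DataFrame({"ContractType": ["Yearly", "Monthly", "Monthly", "Yearly"]})

le = LabelEncoder()
toy["ContractType"] = le.fit_transform(toy["ContractType"])

# classes_ is sorted, and each label's code is its position in classes_
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Monthly': 0, 'Yearly': 1}
```

In the notebook, the same dictionary comprehension can be run on the fitted `le` right after this cell to document the encoding for readers of later plots.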

## Step 4: Preparing Feature Matrix (X) and Target Vector (y)
 Drop irrelevant columns and set target variable

In [None]:
X = df.drop(columns=["CustomerID", "Churn"])
y = df["Churn"]

## Step 5: Splitting the Dataset
 Split into train and test sets (80% train, 20% test)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
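Churn labels are often imbalanced, and a purely random split can leave the test set with a different churn rate than the training set. One option, not used in the cell above, is to pass `stratify=y`. A sketch on synthetic labels, where the 20% churn rate and the `feature` column are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 customers, 20% churners (assumed rate)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"feature": rng.normal(size=1000)})
y_demo = pd.Series([1] * 200 + [0] * 800, name="Churn")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, both subsets keep roughly the overall churn rate
print(f"train churn rate: {y_tr.mean():.3f}")
print(f"test churn rate:  {y_te.mean():.3f}")
```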

## Step 6: Defining Hyperparameter Grid
 These are the hyperparameters we want to evaluate for their contribution

In [None]:
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5],
    'max_features': ['sqrt', 'log2'],
    'bootstrap': [True, False]
}
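Before launching the search it helps to know how many models the grid implies. The sketch below counts the combinations with `ParameterGrid`; the fold count of 5 is taken from the `cv=5` setting used in Step 7.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5],
    'max_features': ['sqrt', 'log2'],
    'bootstrap': [True, False]
}

# Number of distinct hyperparameter combinations: 2 * 3 * 2 * 2 * 2
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # matches cv=5 in Step 7

print(n_candidates)            # 48
print(n_candidates * n_folds)  # 240 cross-validation fits
```

Adding one more value to any list multiplies the cost, which is the practical reason to learn which hyperparameters matter before widening the grid.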

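## Step 7: Running Grid Search with Cross-Validation
This trains a model for each of the 48 hyperparameter combinations on each of the 5 folds (240 fits), scores them by accuracy, and then refits the best combination on the full training set.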

In [None]:
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,  # 5-fold cross-validation
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)
grid_search.fit(X_train, y_train)

## Step 8: Evaluating the Best Model
Retrieve the best-performing hyperparameter combination and evaluate it on the held-out test set.

In [None]:
print("Best Hyperparameters Found:")
print(grid_search.best_params_)

best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

print("\nClassification Report on Test Data:")
print(classification_report(y_test, y_pred))
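To tie the tuned model back to the objective of identifying churn drivers, a fitted random forest exposes impurity-based importances through `feature_importances_`. A minimal sketch on synthetic data: the column names `SupportCalls`, `PaymentIssues`, `ContractType`, and `MonthlyCharges` are hypothetical placeholders, since the actual columns of `customer_churn_data.csv` are not shown here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; in the notebook, use X_train.columns instead
feature_names = ["SupportCalls", "PaymentIssues", "ContractType", "MonthlyCharges"]
X_syn, y_syn = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_syn = pd.DataFrame(X_syn, columns=feature_names)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_syn, y_syn)

# Impurity-based importances are non-negative and sum to 1 across features
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

In the notebook, the same two lines would apply to `best_model` and `X_train.columns`. Impurity-based importances can overstate features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a common cross-check.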

## Step 9: Analyzing Hyperparameter Contribution via Grid Scores
Convert the grid search results to a DataFrame for analysis.

In [None]:
results_df = pd.DataFrame(grid_search.cv_results_)

# Select relevant columns for hyperparameters and mean test scores
param_cols = [col for col in results_df.columns if col.startswith('param_')]
score_col = 'mean_test_score'

# Sort and display top 10 configurations
sorted_results = results_df[param_cols + [score_col]].sort_values(by=score_col, ascending=False)
print("\nTop 10 Hyperparameter Combinations by Accuracy:")
print(sorted_results.head(10))
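A ranked table shows the best configurations but not which hyperparameter matters most. One simple approximation of each hyperparameter's contribution is to average `mean_test_score` over all other settings and compare the gap between its best and worst value. The sketch below does this on a small synthetic problem with a reduced grid; both are assumptions chosen to keep the example fast and self-contained.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a reduced grid (illustrative only)
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=42)
small_grid = {
    "n_estimators": [10, 30],
    "max_depth": [None, 3],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), small_grid, cv=3, scoring="accuracy"
)
search.fit(X_syn, y_syn)

results = pd.DataFrame(search.cv_results_)
param_cols = [c for c in results.columns if c.startswith("param_")]

spread = {}
for col in param_cols:
    # Cast to str so that None (unlimited depth) is kept as its own group
    marginal = results.groupby(results[col].astype(str))["mean_test_score"].mean()
    spread[col] = marginal.max() - marginal.min()

# Larger spread = accuracy is more sensitive to this hyperparameter
spread = pd.Series(spread).sort_values(ascending=False)
print(spread)
```

A larger spread suggests the hyperparameter deserves a finer search. This marginal view ignores interactions between hyperparameters, which is what the heatmap in Step 10 looks at.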

## Step 10: Visualizing Interaction Between Two Hyperparameters Using Heatmap
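Here we analyze how combinations of max_depth and min_samples_split affect model accuracy. Each cell shows mean_test_score averaged over the remaining hyperparameters (n_estimators, max_features, bootstrap), because pivot_table aggregates with the mean by default.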

In [None]:
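# Pivot the results into a 2D heatmap-friendly format.
# max_depth=None is cast to the string "None"; otherwise pivot_table drops that row as missing.
heatmap_data = results_df.assign(
    param_max_depth=results_df['param_max_depth'].astype(str)
).pivot_table(
    index='param_max_depth',
    columns='param_min_samples_split',
    values='mean_test_score'
)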

# Plot the heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(heatmap_data, annot=True, fmt=".3f", cmap="YlGnBu")
plt.title("Hyperparameter Interaction: max_depth vs min_samples_split")
plt.xlabel("min_samples_split")
plt.ylabel("max_depth")
plt.tight_layout()
plt.show()
