# Churn Prediction Model Analysis
This notebook analyzes and compares multiple models for predicting customer churn in a telecom dataset. It includes preprocessing, model training, evaluation, and feature importance analysis.

## Load and Preprocess Data

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv('churn-bigml-80.csv')

# Drop non-predictive column and encode binary values
df = df.drop(['Account length'], axis=1)
df = df.replace({'Yes': 1, 'No': 0})
df_encoded = pd.get_dummies(df, columns=['State', 'Area code'])
df_encoded.head()

We load the dataset, drop the 'Account length' column (treated here as non-informative), map 'Yes'/'No' values to 1/0, and one-hot encode the categorical 'State' and 'Area code' columns.
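To make the encoding step concrete, here is a minimal sketch on a toy frame. The rows are made-up values, not records from the dataset; only the column names are borrowed from it. Passing `dtype=int` is an assumption added here so the dummy columns show as 0/1 rather than booleans.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column names mirror the dataset
toy = pd.DataFrame({
    "State": ["KS", "OH", "KS"],
    "International plan": ["No", "Yes", "No"],
    "Total day minutes": [265.1, 161.6, 243.4],
})

# Map the binary Yes/No column to 1/0
toy["International plan"] = toy["International plan"].map({"Yes": 1, "No": 0})

# One-hot encode the nominal column: one new 0/1 column per distinct state
toy_encoded = pd.get_dummies(toy, columns=["State"], dtype=int)
print(toy_encoded)
```

Each distinct value of 'State' becomes its own indicator column, which is why the real dataset grows by one column per state and per area code.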

## Prepare Features and Labels

In [None]:
X = df_encoded.drop('Churn', axis=1)
y = df_encoded['Churn'].astype(int)

Here, we separate the feature matrix `X` and the target variable `y` for model training.

## Train Models

In [None]:
from sklearn.linear_model import LogisticRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Train models
logreg = LogisticRegression(max_iter=1000).fit(X, y)
ridge_best = Ridge(alpha=1.0, max_iter=10000).fit(X, y)
lasso_best = Lasso(alpha=0.1, max_iter=10000).fit(X, y)
dtree = DecisionTreeClassifier(random_state=42).fit(X, y)
rf = RandomForestClassifier(random_state=42, n_estimators=100).fit(X, y)
gbc = GradientBoostingClassifier(random_state=42, n_estimators=100).fit(X, y)

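We train six models on the preprocessed training data: Logistic Regression, Ridge, Lasso, Decision Tree, Random Forest, and Gradient Boosting. Ridge and Lasso are linear regressors fitted to the 0/1 label, so their continuous outputs are thresholded at 0.5 during evaluation to obtain class predictions.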

## Load and Prepare Testing Data

In [None]:
testingDF = pd.read_csv('churn-bigml-20.csv')
testingDF = testingDF.drop(['Account length'], axis=1)
testingDF = testingDF.replace({'Yes': 1, 'No': 0})
testingDF = pd.get_dummies(testingDF, columns=['State', 'Area code'])

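# Separate the label first: X.columns does not contain 'Churn',
# so selecting X.columns before this step would drop the target
y_test = testingDF['Churn'].astype(int)

# Align feature columns with the training set; dummy columns absent
# from the test data are added and filled with 0
X_test = testingDF.drop('Churn', axis=1).reindex(columns=X.columns, fill_value=0)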

We load and preprocess the test set in the same way as the training data, then align its columns to the training features so that both matrices have the same columns in the same order.
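Alignment matters because one-hot encoding only creates columns for the categories present in each file. The sketch below illustrates one way to align columns with `DataFrame.reindex`. The column list and rows are hypothetical stand-ins, not values from the dataset.

```python
import pandas as pd

# Hypothetical feature columns seen during training
train_cols = ["Total day minutes", "State_KS", "State_OH"]

# Hypothetical encoded test frame: it lacks State_OH and has an unseen State_TX
test = pd.DataFrame({
    "Total day minutes": [120.0, 95.5],
    "State_KS": [1, 0],
    "State_TX": [0, 1],
})

# reindex keeps only the training columns, in training order,
# and fills any column missing from the test frame with 0
aligned = test.reindex(columns=train_cols, fill_value=0)
print(aligned)
```

The unseen `State_TX` column is dropped because the trained models have no weight or split for it, while `State_OH` is added as all zeros.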

## Evaluate Model Performance

In [None]:
from sklearn.metrics import classification_report

print("Logistic Regression\n", classification_report(y_test, logreg.predict(X_test)))
print("Ridge Regression\n", classification_report(y_test, (ridge_best.predict(X_test) >= 0.5).astype(int)))
print("Lasso Regression\n", classification_report(y_test, (lasso_best.predict(X_test) >= 0.5).astype(int)))
print("Decision Tree\n", classification_report(y_test, dtree.predict(X_test)))
print("Random Forest\n", classification_report(y_test, rf.predict(X_test)))
print("Gradient Boosting\n", classification_report(y_test, gbc.predict(X_test)))

We evaluate each model using the `classification_report`, which includes precision, recall, f1-score, and support.
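For readers who want to see what those columns mean, the following sketch computes precision, recall, and F1 for the positive (churn) class by hand. The labels are made up for illustration, and `binary_scores` is a hypothetical helper, not part of scikit-learn.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Return (precision, recall, f1, support) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of true positive-class samples
    return precision, recall, f1, support


# Hypothetical labels: 1 = churned, 0 = stayed
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1, support = binary_scores(y_true, y_pred)
print(precision, recall, f1, support)
```

Here three of the four predicted churners really churned (precision 0.75) and three of the four actual churners were caught (recall 0.75). F1 is the harmonic mean of the two.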

## Feature Importance and Model Coefficients

In [None]:
importance_data = {
    'Logistic Regression': pd.Series(logreg.coef_[0], index=X.columns),
    'Ridge Regression': pd.Series(ridge_best.coef_, index=X.columns),
    'Lasso Regression': pd.Series(lasso_best.coef_, index=X.columns),
    'Decision Tree': pd.Series(dtree.feature_importances_, index=X.columns),
    'Random Forest': pd.Series(rf.feature_importances_, index=X.columns),
    'Gradient Boosting': pd.Series(gbc.feature_importances_, index=X.columns),
}

importance_df = pd.DataFrame(importance_data)
importance_df

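This table shows the feature coefficients for the linear models and the feature importances for the tree-based models. The two are not directly comparable: coefficients are signed and depend on each feature's scale, while tree importances are non-negative and sum to 1 within each model.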
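Because the features in this notebook are not standardized, coefficient magnitudes partly reflect feature scale. One common way to make them easier to compare, which the notebook itself does not use, is to standardize features before fitting. The sketch below assumes synthetic data. The column names are borrowed from the dataset, but the values and the label rule are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in features on very different scales
demo_X = pd.DataFrame({
    "Total day minutes": rng.normal(180, 50, 500),
    "Customer service calls": rng.poisson(1.5, 500),
})

# Invented label rule, used only so the model has something to learn
signal = (0.01 * (demo_X["Total day minutes"] - 180)
          + 0.8 * (demo_X["Customer service calls"] - 1.5))
demo_y = (signal + rng.normal(0, 1, 500) > 0).astype(int)

# Standardize, then fit; coefficients are now per standard deviation of each feature
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(demo_X, demo_y)

scaled_coefs = pd.Series(
    pipe.named_steps["logisticregression"].coef_[0], index=demo_X.columns
)
print(scaled_coefs.abs().sort_values(ascending=False))
```

With standardized inputs, a larger absolute coefficient indicates a larger change in log-odds per standard deviation of the feature, which gives a fairer ranking than raw-scale coefficients.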

## Summary and Discussion

### Best Performing Model
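Based on F1-score and the balance between precision and recall, **Gradient Boosting** often yields the best performance. It builds shallow trees sequentially, each one fitted to the errors of the ensemble so far, which lets it capture nonlinear patterns while the learning rate and tree depth limit overfitting.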
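The sequential construction can be observed directly with `staged_predict`, which yields the ensemble's predictions after each added tree. This is a minimal sketch on synthetic data from `make_classification`, not on the churn dataset, so the numbers it prints say nothing about the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

gb = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# F1 on held-out data after 1, 2, ..., 50 trees
staged_f1 = [f1_score(y_te, pred) for pred in gb.staged_predict(X_te)]

for n_trees in (1, 10, 50):
    print(n_trees, round(staged_f1[n_trees - 1], 3))
```

Plotting `staged_f1` against the number of trees is a simple way to check whether additional trees still help or the score has flattened.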

### Bias-Variance Trade-off
- **Logistic, Ridge, Lasso**: High bias, low variance. These linear models are simpler but may underfit complex patterns.
- **Tree-based models**: Decision Trees have high variance and overfit easily. Random Forest reduces variance by averaging many decorrelated trees, while Gradient Boosting reduces bias by combining many shallow trees sequentially.
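A quick way to see the variance side of this trade-off is to compare training and held-out accuracy. The sketch below uses synthetic, deliberately noisy data (`flip_y=0.1`) rather than the churn dataset, so it illustrates the pattern only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 10% of labels flipped at random
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Store (train accuracy, test accuracy) per model
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(name, results[name])
```

An unpruned tree memorizes the training set, noise included, so a large gap between its two scores is the signature of high variance.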

### Interpretability vs. Predictive Power
- **Regression models**: More interpretable, with coefficients directly linked to features.
- **Tree ensembles**: Less interpretable but usually stronger predictors because they capture nonlinear relationships and feature interactions.

### Real-world Implications
A telecom company can use these models to:
- **Identify customers likely to churn** and proactively offer incentives.
- **Understand key drivers** of churn using feature importances (e.g., 'International plan', 'Total day minutes').
- **Segment customers** for targeted retention strategies, reducing revenue loss and improving satisfaction.
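As an illustration of the first point, a fitted classifier's `predict_proba` output can be used to rank customers by estimated churn risk. The sketch below assumes synthetic data and hypothetical customer identifiers. In the notebook, the same idea would apply to `gbc` and `X_test`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data with roughly 15% positives, mimicking class imbalance
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Fit on the first 300 rows, score the remaining 100 as "current customers"
model = GradientBoostingClassifier(random_state=0).fit(X_demo[:300], y_demo[:300])

scores = pd.DataFrame({
    "customer_id": np.arange(300, 400),  # hypothetical identifiers
    "churn_probability": model.predict_proba(X_demo[300:])[:, 1],
})

# The ten customers with the highest estimated churn risk
top_10 = scores.sort_values("churn_probability", ascending=False).head(10)
print(top_10)
```

Ranking by probability rather than using hard 0/1 predictions lets a retention team size its outreach to the available budget.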

Ultimately, model selection depends on whether the priority is **predictive performance** (Gradient Boosting) or **interpretability** (Logistic Regression or Lasso).