# **TASK 2 - Customer Churn Prediction Pipeline**


**1. Problem Statement & Objective**


**Problem Statement:**

Telecom companies lose revenue when customers leave unexpectedly.

**Objective:**

Build a reusable ML pipeline to predict customer churn using automated preprocessing and model tuning.

**2. Dataset Loading & Preprocessing**


**Dataset:** IBM Telco Customer Churn

**Steps:**

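Loaded the CSV dataset

Dropped the customerID column

Converted TotalCharges to numeric and filled the resulting missing values with the column median

One-hot encoded the categorical features

Standardized the numerical features (tenure, MonthlyCharges, TotalCharges)

Encoding and scaling are automated with Pipeline and ColumnTransformer. The customerID drop and the TotalCharges imputation are done in pandas before the train/test split.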
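The notebook fills missing TotalCharges values in pandas before splitting the data. One possible variation, shown here only as a sketch, is to move the median fill into the numeric branch of the ColumnTransformer with `SimpleImputer`, so the median is learned from whatever data the pipeline is fitted on. The `toy` DataFrame and the name `sketch_preprocessor` are hypothetical and exist only for this illustration; the rows are made up, not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows that mimic a few Telco columns; the values are invented.
toy = pd.DataFrame({
    "tenure": [0, 12, 48, 5],
    "MonthlyCharges": [29.85, 56.95, 99.65, 70.70],
    "TotalCharges": [" ", "683.4", "4783.2", "353.5"],  # blank string = missing
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
})

# Type coercion still happens in pandas: blanks become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

numeric_features = ["tenure", "MonthlyCharges", "TotalCharges"]
categorical_features = ["Contract"]

# Impute first, then scale, so the scaler never sees NaN.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

sketch_preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

features = sketch_preprocessor.fit_transform(toy)
# ColumnTransformer may return a sparse matrix; convert for inspection.
dense = features.toarray() if hasattr(features, "toarray") else np.asarray(features)
print(dense.shape)
print(np.isnan(dense).any())
```

With this layout the imputation statistic is refitted inside each cross-validation fold, which keeps held-out rows from influencing it.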

**3. Model Development & Training**

**Models:**

Logistic Regression

Random Forest

Used GridSearchCV with 3-fold cross-validation to tune hyperparameters for both models in a single search.

The best model was selected by mean cross-validated accuracy on the training split.
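To see how the two model families compare inside one search, the cross-validation results can be grouped by classifier type. The sketch below assumes synthetic data from `make_classification` instead of the Telco dataset, and the names `demo_pipeline`, `demo_grid`, and `summary` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; not the Telco dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

demo_pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200)),
])

# Each dict swaps the "classifier" step and tunes that model's own parameters.
demo_grid = [
    {
        "classifier": [LogisticRegression(max_iter=200)],
        "classifier__C": [0.1, 1, 10],
    },
    {
        "classifier": [RandomForestClassifier(random_state=0)],
        "classifier__n_estimators": [50],
        "classifier__max_depth": [None, 10],
    },
]

search = GridSearchCV(demo_pipeline, demo_grid, cv=3, scoring="accuracy")
search.fit(X_demo, y_demo)

# One row per candidate: model family and its mean cross-validated accuracy.
results = pd.DataFrame({
    "model": [type(p["classifier"]).__name__ for p in search.cv_results_["params"]],
    "mean_test_score": search.cv_results_["mean_test_score"],
})

# Best cross-validated score reached by each model family.
summary = results.groupby("model")["mean_test_score"].max()
print(summary)
```

The highest entry in `summary` corresponds to `search.best_score_`, and `search.best_estimator_` is that candidate refitted on all the data passed to `fit`.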

**4. Evaluation with Metrics**

**Final Model:** Logistic Regression

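| Metric | Value |
|---|---|
| Accuracy | 81% |
| Precision (Churn) | 66% |
| Recall (Churn) | 56% |
| F1-Score (Churn) | 60% |

For the non-churn class, precision is 85%, recall is 89%, and F1-score is 87%.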
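Per-class precision, recall, and F1 depend on which label is treated as positive, which is why the churn and non-churn rows of a classification report differ. The sketch below works through the formulas on a small made-up label set; `per_class_metrics` is a hypothetical helper and the labels are not from the Telco test split.

```python
def per_class_metrics(y_true, y_pred, positive_label):
    """Return (precision, recall, f1) treating positive_label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Invented labels: 1 = churn, 0 = no churn.
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

churn_metrics = per_class_metrics(y_true_demo, y_pred_demo, positive_label=1)
no_churn_metrics = per_class_metrics(y_true_demo, y_pred_demo, positive_label=0)

print("churn:    precision=%.2f recall=%.2f f1=%.2f" % churn_metrics)
print("no churn: precision=%.2f recall=%.2f f1=%.2f" % no_churn_metrics)
```

On an imbalanced target the majority class usually scores higher, so the row for label 1 is the one to read when judging churn detection.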

**6. Final Summary / Insights**

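The saved pipeline bundles preprocessing and the tuned classifier, so it can be loaded with joblib and applied to raw customer records without re-fitting. Test accuracy is about 81%, but recall on the churn class is only 56%, so a large share of churners would still be missed.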
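Reusing a saved pipeline could look like the round trip below. This is a sketch under stated assumptions: it fits a tiny stand-in pipeline on invented rows and writes it to a temporary file so the example runs on its own; `stand_in`, `demo_pipeline.joblib`, and `new_customers` are hypothetical names, and the real artifact would be the file the notebook saves.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented training rows with two Telco-like columns.
train = pd.DataFrame({
    "tenure": [1, 2, 60, 72, 3, 48],
    "Contract": ["Month-to-month", "Month-to-month", "Two year",
                 "Two year", "Month-to-month", "One year"],
})
labels = [1, 1, 0, 0, 1, 0]  # 1 = churn, 0 = no churn

# Stand-in for the tuned pipeline: preprocessing and classifier in one object.
stand_in = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", StandardScaler(), ["tenure"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ])),
    ("classifier", LogisticRegression(max_iter=200)),
])
stand_in.fit(train, labels)

# Save and reload, as a separate scoring script would do.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_pipeline.joblib")
    joblib.dump(stand_in, path)
    restored = joblib.load(path)

# Raw, unencoded records go straight into the restored pipeline.
new_customers = pd.DataFrame({
    "tenure": [2, 70],
    "Contract": ["Month-to-month", "Two year"],
})
predictions = restored.predict(new_customers)
churn_probability = restored.predict_proba(new_customers)[:, 1]
print(predictions)
print(churn_probability)
```

The new records need the same column names the pipeline was fitted on. A joblib file is generally expected to be loaded with a scikit-learn version compatible with the one that saved it.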


In [None]:
# Task 2 - Telco Churn Pipeline

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import joblib


# Load Dataset


url = "https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv"
data = pd.read_csv(url)

data.head()


# Data Cleaning


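# TotalCharges is read as text because it contains blank strings;
# coerce those to NaN, then fill them with the column median.
data["TotalCharges"] = pd.to_numeric(data["TotalCharges"], errors="coerce")
data["TotalCharges"] = data["TotalCharges"].fillna(data["TotalCharges"].median())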

X = data.drop(columns=["customerID","Churn"])
y = data["Churn"].map({"Yes":1,"No":0})


# Split Data


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


# Column Types


numeric_features = ["tenure","MonthlyCharges","TotalCharges"]
categorical_features = [col for col in X.columns if col not in numeric_features]


# Preprocessing Pipelines


numeric_transformer = Pipeline(steps=[
    ("scaler", StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ("encoder", OneHotEncoder(handle_unknown="ignore"))
])

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)


# Define Models


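# Standalone pipelines for each model. They are kept for reference only:
# the grid search below builds its own pipeline and swaps the classifier step.
log_reg = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(max_iter=200))
])

rf_model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier())
])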


# Grid Search Parameters


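# Each dict replaces the "classifier" step with one model and lists
# the hyperparameters to try for that model only.
param_grid = [
    {
        "classifier": [LogisticRegression(max_iter=200)],
        "classifier__C": [0.1, 1, 10]
    },
    {
        "classifier": [RandomForestClassifier()],
        "classifier__n_estimators": [100, 200],
        "classifier__max_depth": [None, 10, 20]
    }
]

# The LogisticRegression here is only a placeholder; param_grid overrides it.
# Candidates are ranked by mean accuracy over 3 cross-validation folds.
grid = GridSearchCV(
    Pipeline([("preprocessor", preprocessor), ("classifier", LogisticRegression())]),
    param_grid,
    cv=3,
    scoring="accuracy",
    n_jobs=-1
)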


# Train Model


grid.fit(X_train, y_train)

print("Best Model:")
print(grid.best_params_)


# Evaluate


y_pred = grid.predict(X_test)

print("\nAccuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


# Save Final Pipeline


joblib.dump(grid.best_estimator_, "telco_churn_pipeline.joblib")

print("\nSaved Pipeline as telco_churn_pipeline.joblib")


Best Model:
{'classifier': LogisticRegression(max_iter=200), 'classifier__C': 10}

Accuracy: 0.8055358410220014

Classification Report:
               precision    recall  f1-score   support

           0       0.85      0.89      0.87      1035
           1       0.66      0.56      0.60       374

    accuracy                           0.81      1409
   macro avg       0.75      0.73      0.74      1409
weighted avg       0.80      0.81      0.80      1409


Saved Pipeline as telco_churn_pipeline.joblib
