<a href="https://colab.research.google.com/github/marcorrea1/AAI2026/blob/main/Coding_Exercise_ML_Basics_Part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# =========================================================
# Part 2: Customer Churn Prediction (Logistic Regression)
#
# Data Source (Cited):
# Source: Data generated with ChatGPT (SJSU) for this problem.
# Columns: age, monthly_usage_hours, purchase_amount, customer_service_calls, region, churn
# =========================================================

# -----------------------
# 1) Load the dataset from CSV
# -----------------------
csv_path = "customer_churn_data.csv"
df = pd.read_csv(csv_path)

# -----------------------
# 2) Split the data into X and y to prepare for machine learning
# X = input features the model learns from
# y = output label we want to predict
# -----------------------


X = df[["age", "monthly_usage_hours", "purchase_amount", "customer_service_calls", "region"]]
y = df["churn"]

num_features = ["age", "monthly_usage_hours", "purchase_amount", "customer_service_calls"]
cat_features = ["region"]

# -----------------------
# 3) Preprocessing (prepare data for Logistic Regression)
# StandardScaler rescales the numeric columns so each has
# mean 0 and standard deviation 1.
# OneHotEncoder converts region into 0/1 indicator columns.
# -----------------------

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_features),
        ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), cat_features),
    ]
)

# -----------------------
# 4) Build pipeline + train logistic regression
# -----------------------
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42))
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model.fit(X_train, y_train)

# -----------------------
# 5) Predict churn probability for a new customer
# Using an example customer with the same columns as our data
# -----------------------
new_customer = pd.DataFrame({
    "age": [35],
    "monthly_usage_hours": [20],
    "purchase_amount": [150],
    "customer_service_calls": [5],
    "region": ["West"],
})

churn_probability = model.predict_proba(new_customer)[0][1]

threshold = 0.5
churn_prediction = 1 if churn_probability > threshold else 0

print(f"Churn Probability for new customer: {churn_probability:.2f}")
print(f"Churn Prediction (1 = churn, 0 = no churn): {churn_prediction}")

# -----------------------
# 6) Model output and evaluation
# -----------------------

print(f"\nTest Accuracy: {model.score(X_test, y_test):.2f}")

ohe = model.named_steps["preprocessor"].named_transformers_["cat"]
feature_names = num_features + ohe.get_feature_names_out(cat_features).tolist()
coefficients = model.named_steps["classifier"].coef_[0]

print("\nModel Coefficients:")
for feature, coef in zip(feature_names, coefficients):
    print(f"{feature}: {coef:.3f}")


Churn Probability for new customer: 0.69
Churn Prediction (1 = churn, 0 = no churn): 1

Test Accuracy: 0.60

Model Coefficients:
age: 0.175
monthly_usage_hours: -0.374
purchase_amount: -0.153
customer_service_calls: 0.742
region_East: 0.968
region_North: -0.557
region_South: -0.273
region_West: -0.138


# Interpretation: Churn Probability and Business Use

Churn probability is the model's estimate of the probability that a customer will stop using the service. Values range from 0 to 1; the closer the value is to 1, the more likely the customer is to churn.
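
Logistic regression keeps its output between 0 and 1 by passing a linear score through the logistic (sigmoid) function. The sketch below illustrates that mapping only; the function name `sigmoid` and the sample scores are illustrative assumptions, not part of the notebook's pipeline.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a value strictly between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Illustrative scores (assumed values, not taken from the fitted model)
sample_scores = [-5.0, 0.0, 5.0]
sample_probabilities = [sigmoid(z) for z in sample_scores]

for z, p in zip(sample_scores, sample_probabilities):
    print(f"score {z:+.1f} -> probability {p:.3f}")
```

A score of 0 maps to a probability of exactly 0.5; larger scores move toward 1 and smaller scores toward 0.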

# Interpretation of Results

The model assigns the new customer a churn probability of 0.69, meaning it estimates a 69% chance that this customer will churn. Since this is greater than the 0.5 threshold, the model predicts that the customer will churn.
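
The step from probability to prediction is a threshold comparison. A minimal sketch, assuming a helper named `classify_churn` (hypothetical, not defined in the notebook) and a stricter threshold of 0.8 chosen purely for illustration:

```python
def classify_churn(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (churn) if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0


churn_probability = 0.69  # value reported for the new customer

# Default threshold of 0.5, as used in the notebook
default_label = classify_churn(churn_probability)

# Hypothetical stricter threshold: flag only high-confidence churners
strict_label = classify_churn(churn_probability, threshold=0.8)

print(f"threshold 0.5 -> {default_label}")
print(f"threshold 0.8 -> {strict_label}")
```

The same customer is labeled as churn at 0.5 but not at 0.8, so the threshold should reflect how costly a missed churner is compared with an unnecessary retention offer.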

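Looking at the coefficients, customer service calls has the largest positive effect among the numeric features (0.742), and age has a small positive effect (0.175). Monthly usage (-0.374) and purchase amount (-0.153) have negative coefficients, which means higher usage and spending reduce the likelihood of churn. Among the regions, East has the highest coefficient (0.968). These values are not percentages: because the numeric features were standardized, each numeric coefficient is the change in the log-odds of churn for a one standard deviation increase in that feature.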
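
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. A minimal sketch using the numeric-feature coefficients from the printed model output; the dictionary is typed in by hand here for illustration rather than read from the fitted pipeline.

```python
import math

# Coefficients copied from the printed model output. Because the numeric
# features were standardized, each one is the change in log-odds of churn
# per one standard deviation increase in the feature.
coefficients = {
    "age": 0.175,
    "monthly_usage_hours": -0.374,
    "purchase_amount": -0.153,
    "customer_service_calls": 0.742,
}

# exp(coefficient) is the multiplicative change in the odds of churn.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds of churn)")
```

An odds ratio above 1 means the feature pushes toward churn and a ratio below 1 means it pushes away from churn, holding the other features fixed.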

A business can use this information to reduce churn by identifying customers or regions with a high risk of churn. This helps the company plan targeted retention efforts and address the issues it is facing.
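
One way to act on these scores is to rank customers by churn probability and contact the riskiest first. The sketch below assumes customers have already been scored; the customer IDs, probabilities, and the 0.6 outreach cutoff are all made up for illustration.

```python
# Hypothetical scored customers (IDs and probabilities are invented).
scored_customers = [
    {"customer_id": "C001", "region": "East", "churn_probability": 0.82},
    {"customer_id": "C002", "region": "North", "churn_probability": 0.35},
    {"customer_id": "C003", "region": "West", "churn_probability": 0.69},
    {"customer_id": "C004", "region": "South", "churn_probability": 0.51},
    {"customer_id": "C005", "region": "East", "churn_probability": 0.74},
]

outreach_threshold = 0.6  # assumed cutoff for retention outreach

# Keep customers at or above the cutoff, highest risk first.
at_risk = sorted(
    (c for c in scored_customers if c["churn_probability"] >= outreach_threshold),
    key=lambda c: c["churn_probability"],
    reverse=True,
)

for customer in at_risk:
    print(
        f"{customer['customer_id']} ({customer['region']}): "
        f"{customer['churn_probability']:.2f}"
    )
```

In practice the probabilities would come from `model.predict_proba` on the current customer table, and the cutoff would depend on the retention budget.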