Supervised Learning – Regression

Train/Test Split and Regression with MAE & RMSE Comparison

In [2]:
# 📘 Customer Churn Prediction - Regression Model Evaluation

# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import math

# Load dataset
data = pd.read_csv("customer_churn_dataset.csv")

# Display first few rows
print("📂 First 5 rows of dataset:")
display(data.head())

# Check for missing values
print("\n🔍 Missing Values:")
print(data.isnull().sum())

# ------------------------------
# Data Preprocessing
# ------------------------------

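# Convert categorical columns to numeric with one-hot (dummy) encoding.
# drop_first=True drops one level per category to avoid redundant columns.
# Note: CustomerID is already numeric, so it stays in the feature set unchanged.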
data_encoded = pd.get_dummies(data, drop_first=True)

# Identify the target variable (change this if your target column has a different name)
target = 'Churn'   # the dataset must contain this column

if target not in data_encoded.columns:
    raise ValueError(f"⚠️ '{target}' column not found. Please check your dataset target variable name.")

# Define features (X) and target (y)
X = data_encoded.drop(target, axis=1)
y = data_encoded[target]

# Split data into train (80%) and test (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\n✅ Data split complete:")
print(f"Training samples: {len(X_train)}, Testing samples: {len(X_test)}")

# ------------------------------
# Apply Regression Model
# ------------------------------
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# ------------------------------
# Evaluate Model Performance
# ------------------------------
mae = mean_absolute_error(y_test, y_pred)
rmse = math.sqrt(mean_squared_error(y_test, y_pred))

print("\n📊 Regression Model Performance:")
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

# ------------------------------
# Show Sample Predictions
# ------------------------------
comparison_df = pd.DataFrame({
    'Actual': y_test.values[:10],
    'Predicted': y_pred[:10]
})
print("\n🔎 Sample Predictions:")
display(comparison_df)

📂 First 5 rows of dataset:


   CustomerID  Age  Gender  Tenure  Usage Frequency  Support Calls  Payment Delay  Subscription Type  Contract Length  Total Spend  Last Interaction  Churn
0           1   22  Female      25               14              4             27              Basic          Monthly          598                 9      1
1           2   41  Female      28               28              7             13           Standard          Monthly          584                20      0
2           3   47    Male      27               10              2             29            Premium           Annual          757                21      0
3           4   35    Male       9               12              5             17            Premium        Quarterly          232                18      0
4           5   53  Female      58               24              9              2           Standard           Annual          533                18      0



🔍 Missing Values:
CustomerID           0
Age                  0
Gender               0
Tenure               0
Usage Frequency      0
Support Calls        0
Payment Delay        0
Subscription Type    0
Contract Length      0
Total Spend          0
Last Interaction     0
Churn                0
dtype: int64

✅ Data split complete:
Training samples: 51499, Testing samples: 12875

📊 Regression Model Performance:
Mean Absolute Error (MAE): 0.2686
Root Mean Squared Error (RMSE): 0.3317
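
For reference, the two metrics reported above follow the standard definitions, where y_i is the actual Churn value, ŷ_i is the model's prediction, and n is the number of test samples (12875 here). The symbols are introduced for this note only and do not appear in the notebook code.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

RMSE is never smaller than MAE because squaring weights large errors more heavily; the reported values (0.3317 versus 0.2686) are consistent with that.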

🔎 Sample Predictions:


   Actual  Predicted
0       0  -0.118037
1       0  -0.085518
2       1   0.856862
3       0   0.100889
4       0   0.462535
5       0   0.438752
6       0   0.102092
7       0  -0.177136
8       1   0.358974
9       0  -0.195631
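
As a rough cross-check, the same two metrics can be recomputed by hand on just the ten sample rows shown above. This is a minimal sketch using only the standard library; the variable names (`actual`, `predicted`, `mae_sample`, `rmse_sample`, `out_of_range`) are illustrative and not part of the original notebook, and ten rows will not reproduce the full test-set figures.

```python
import math

# The ten (Actual, Predicted) pairs copied from the sample output above
actual = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [-0.118037, -0.085518, 0.856862, 0.100889, 0.462535,
             0.438752, 0.102092, -0.177136, 0.358974, -0.195631]

# Per-row errors (actual minus predicted)
errors = [a - p for a, p in zip(actual, predicted)]

# MAE: mean of the absolute errors
mae_sample = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean squared error
rmse_sample = math.sqrt(sum(e * e for e in errors) / len(errors))

# Linear regression output is unbounded, so predictions can fall outside [0, 1]
out_of_range = [p for p in predicted if p < 0 or p > 1]

print(f"MAE on 10 samples:  {mae_sample:.4f}")
print(f"RMSE on 10 samples: {rmse_sample:.4f}")
print(f"Predictions outside [0, 1]: {len(out_of_range)} of {len(predicted)}")
```

Four of the ten sample predictions are negative, which is expected when a plain linear regression is fit to a 0/1 target such as Churn: the model outputs an unbounded score rather than a probability.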
