# ML Model Evaluation with Cross-Validation

This notebook evaluates multiple regression models using cross-validation to predict wellness scores from the Sleep, Steps, and Mood features. We compare three models:

- **Random Forest Regression**
- **Linear Regression**
- **Ridge Regression**


## Import Required Libraries

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np


## Data Loading and Initial Exploration

Load the wellness dataset and examine its basic properties: its shape, the target's range, and the target's mean and standard deviation.

In [4]:
# Load your labeled dataset
df = pd.read_csv("/content/WellnessScores.csv")  # Replace with actual path if needed
print(f"Dataset loaded: {df.shape[0]} samples, {df.shape[1]} columns")

# Features and target
X = df[['Sleep', 'Steps', 'Mood']]
y = df['Wellness_Score']

print(f"Target variable range: {y.min():.2f} - {y.max():.2f}")
print(f"Target variable mean ± std: {y.mean():.2f} ± {y.std():.2f}")
print(f"Feature columns: {list(X.columns)}")

Dataset loaded: 2270 samples, 7 columns
Target variable range: 0.00 - 100.00
Target variable mean ± std: 70.84 ± 18.80
Feature columns: ['Sleep', 'Steps', 'Mood']
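Before modeling, it can help to confirm that the feature and target columns are numeric and free of missing values, since `LinearRegression` and `Ridge` raise an error on NaN inputs. A minimal sketch, assuming a small synthetic DataFrame with the same column names as a stand-in for `WellnessScores.csv`; the helper `check_columns` is hypothetical:

```python
import numpy as np
import pandas as pd


def check_columns(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Summarize dtype, missing-value count, and numeric-ness per column (hypothetical helper)."""
    absent = [c for c in columns if c not in df.columns]
    if absent:
        raise KeyError(f"Columns not found: {absent}")
    return pd.DataFrame({
        "dtype": df[columns].dtypes.astype(str),
        "n_missing": df[columns].isna().sum(),
        "is_numeric": [pd.api.types.is_numeric_dtype(df[c]) for c in columns],
    })


# Synthetic stand-in for the wellness data (illustrative values only)
demo = pd.DataFrame({
    "Sleep": [7.5, 6.0, np.nan, 8.0],
    "Steps": [8000, 4500, 12000, 3000],
    "Mood": [4, 3, 5, 2],
    "Wellness_Score": [80.0, 60.0, 90.0, 45.0],
})

summary = check_columns(demo, ["Sleep", "Steps", "Mood", "Wellness_Score"])
print(summary)
```

In the notebook itself this would be called as `check_columns(df, ['Sleep', 'Steps', 'Mood', 'Wellness_Score'])`.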


## Data Preparation and Model Definition

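Split the data into training (80%) and test (20%) sets, then define the three regression models as scikit-learn pipelines. The two linear models standardize their inputs with StandardScaler; the Random Forest skips scaling because tree splits are unaffected by rescaling a feature.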

In [5]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Train/Test split: {X_train.shape[0]} train, {X_test.shape[0]} test samples")

# Define models with pipelines (for consistent preprocessing)
models = {
    "Random Forest": Pipeline([
        ('rf', RandomForestRegressor(random_state=42, n_estimators=100))
    ]),
    "Linear Regression": Pipeline([
        ('scaler', StandardScaler()),
        ('lr', LinearRegression())
    ]),
    "Ridge Regression": Pipeline([
        ('scaler', StandardScaler()),
        ('ridge', Ridge(alpha=1.0))
    ])
}

# Cross-validation setup
cv_folds = 5
scoring_metrics = ['neg_mean_absolute_error', 'neg_root_mean_squared_error', 'r2']

print("Cross-Validation Setup:")
print(f"   • Folds: {cv_folds}")
print("   • Metrics: MAE, RMSE, R²")
print("   • Random state: 42 (train/test split and Random Forest; CV folds are unshuffled KFold)")

# Results storage
results = {}

Train/Test split: 1816 train, 454 test samples
Cross-Validation Setup:
   • Folds: 5
   • Metrics: MAE, RMSE, R²
   • Random state: 42 (train/test split and Random Forest; CV folds are unshuffled KFold)


## Random Forest Regression

Evaluate the Random Forest model with 5-fold cross-validation on the training set, then on the held-out test set. Random Forest is an ensemble of decision trees fitted on bootstrap samples of the data; it typically handles non-linear relationships well and is less prone to overfitting than a single decision tree.
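To illustrate how a Random Forest can capture non-linear structure that a linear model misses, the sketch below fits both to a synthetic target `y = x0**2 + x1**2`. This target is an assumption chosen for illustration and is unrelated to the wellness data; because it is symmetric around zero, a straight-line fit explains almost none of its variance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(1000, 2))
y = X[:, 0] ** 2 + X[:, 1] ** 2  # purely non-linear, symmetric target

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf_r2 = cross_val_score(rf, X, y, cv=5, scoring="r2").mean()
lin_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

print(f"Random Forest CV R²: {rf_r2:.3f}")
print(f"Linear CV R²:        {lin_r2:.3f}")
```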

In [6]:
# Random Forest Model Evaluation
model_name = "Random Forest"
model = models[model_name]

print(f"Evaluating: {model_name}")

# Cross-validation evaluation
print(f"Cross-Validation Results ({cv_folds}-fold):")
print("-" * 45)

cv_results = cross_validate(
    model, X_train, y_train,
    cv=cv_folds,
    scoring=scoring_metrics,
    return_train_score=False,
    n_jobs=-1
)

# Extract and convert scores
cv_mae_scores = -cv_results['test_neg_mean_absolute_error']
cv_rmse_scores = -cv_results['test_neg_root_mean_squared_error']
cv_r2_scores = cv_results['test_r2']

# Calculate statistics
mae_mean, mae_std = cv_mae_scores.mean(), cv_mae_scores.std()
rmse_mean, rmse_std = cv_rmse_scores.mean(), cv_rmse_scores.std()
r2_mean, r2_std = cv_r2_scores.mean(), cv_r2_scores.std()

print(f"MAE:  {mae_mean:.2f} ± {mae_std:.2f}")
print(f"RMSE: {rmse_mean:.2f} ± {rmse_std:.2f}")
print(f"R²:   {r2_mean:.3f} ± {r2_std:.3f}")

# Individual fold results
print(f"Individual Fold Results:")
print("Fold    MAE     RMSE    R²")
print("-" * 30)
for i in range(cv_folds):
    print(f"{i+1:>2}   {cv_mae_scores[i]:>6.2f}  {cv_rmse_scores[i]:>6.2f}  {cv_r2_scores[i]:>6.3f}")

# Holdout test set evaluation
print(f"Holdout Test Set Results:")
print("-" * 32)

model.fit(X_train, y_train)
test_preds = model.predict(X_test)

test_mae = mean_absolute_error(y_test, test_preds)
test_rmse = np.sqrt(mean_squared_error(y_test, test_preds))
test_r2 = r2_score(y_test, test_preds)

print(f"MAE:  {test_mae:.2f}")
print(f"RMSE: {test_rmse:.2f}")
print(f"R²:   {test_r2:.3f}")

# CV-vs-holdout consistency check: flag gaps larger than 2 CV standard deviations
print(f"Overfitting Analysis:")
print("-" * 25)

mae_diff = abs(test_mae - mae_mean)
rmse_diff = abs(test_rmse - rmse_mean)
r2_diff = abs(test_r2 - r2_mean)

print(f"MAE difference (CV vs Test):  {mae_diff:.2f}")
print(f"RMSE difference (CV vs Test): {rmse_diff:.2f}")
print(f"R² difference (CV vs Test):   {r2_diff:.3f}")

if mae_diff > mae_std * 2:
    print("Warning: Potential overfitting detected (MAE)")
if rmse_diff > rmse_std * 2:
    print("Warning: Potential overfitting detected (RMSE)")
if r2_diff > r2_std * 2:
    print("Warning: Potential overfitting detected (R²)")

# Store results
results[model_name] = {
    'cv_mae': (mae_mean, mae_std),
    'cv_rmse': (rmse_mean, rmse_std),
    'cv_r2': (r2_mean, r2_std),
    'test_mae': test_mae,
    'test_rmse': test_rmse,
    'test_r2': test_r2,
    'cv_scores': {
        'mae': cv_mae_scores,
        'rmse': cv_rmse_scores,
        'r2': cv_r2_scores
    }
}

Evaluating: Random Forest
Cross-Validation Results (5-fold):
---------------------------------------------
MAE:  0.86 ± 0.08
RMSE: 1.56 ± 0.30
R²:   0.993 ± 0.002
Individual Fold Results:
Fold    MAE     RMSE    R²
------------------------------
 1     0.91    1.64   0.993
 2     0.79    1.24   0.995
 3     0.93    1.73   0.992
 4     0.74    1.21   0.996
 5     0.94    2.00   0.990
Holdout Test Set Results:
--------------------------------
MAE:  0.76
RMSE: 1.51
R²:   0.992
Overfitting Analysis:
-------------------------
MAE difference (CV vs Test):  0.10
RMSE difference (CV vs Test): 0.05
R² difference (CV vs Test):   0.001


## Linear Regression

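Evaluate the Linear Regression model, a simple baseline that assumes the target is a linear function of the features. The pipeline standardizes the features with StandardScaler first. For ordinary least squares this does not change the predictions, but it puts the coefficients on a common scale and keeps the pipeline consistent with Ridge Regression, where scaling does matter.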
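Standardizing features does not change ordinary least squares predictions: it is an invertible affine transform of each column, which the fitted coefficients and intercept absorb. A sketch that checks this on synthetic features with very different scales (the columns loosely mimic hours, step counts, and a 1-5 rating, but all values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(4, 10, 500),       # hours-like scale
    rng.uniform(0, 20000, 500),    # step-count-like scale
    rng.integers(1, 6, 500),       # 1-5 rating
])
y = 5 * X[:, 0] + 0.002 * X[:, 1] + 3 * X[:, 2] + rng.normal(scale=2.0, size=500)

raw = LinearRegression().fit(X, y)
scaled = Pipeline([("scaler", StandardScaler()), ("lr", LinearRegression())]).fit(X, y)

same_preds = np.allclose(raw.predict(X), scaled.predict(X))
print("Predictions identical:", same_preds)
print("Raw coefficients:   ", raw.coef_.round(4))
print("Scaled coefficients:", scaled.named_steps["lr"].coef_.round(4))
```

The coefficients differ: after scaling, each one is the effect of a one-standard-deviation change in its feature, which makes them comparable across features.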

In [7]:
# Linear Regression Model Evaluation
model_name = "Linear Regression"
model = models[model_name]

print(f"{'='*60}")
print(f"Evaluating: {model_name}")
print(f"{'='*60}")

# Cross-validation evaluation
print(f"Cross-Validation Results ({cv_folds}-fold):")
print("-" * 45)

cv_results = cross_validate(
    model, X_train, y_train,
    cv=cv_folds,
    scoring=scoring_metrics,
    return_train_score=False,
    n_jobs=-1
)

# Extract and convert scores
cv_mae_scores = -cv_results['test_neg_mean_absolute_error']
cv_rmse_scores = -cv_results['test_neg_root_mean_squared_error']
cv_r2_scores = cv_results['test_r2']

# Calculate statistics
mae_mean, mae_std = cv_mae_scores.mean(), cv_mae_scores.std()
rmse_mean, rmse_std = cv_rmse_scores.mean(), cv_rmse_scores.std()
r2_mean, r2_std = cv_r2_scores.mean(), cv_r2_scores.std()

print(f"MAE:  {mae_mean:.2f} ± {mae_std:.2f}")
print(f"RMSE: {rmse_mean:.2f} ± {rmse_std:.2f}")
print(f"R²:   {r2_mean:.3f} ± {r2_std:.3f}")

# Individual fold results
print(f"Individual Fold Results:")
print("Fold    MAE     RMSE    R²")
print("-" * 30)
for i in range(cv_folds):
    print(f"{i+1:>2}   {cv_mae_scores[i]:>6.2f}  {cv_rmse_scores[i]:>6.2f}  {cv_r2_scores[i]:>6.3f}")

# Holdout test set evaluation
print(f"Holdout Test Set Results:")
print("-" * 32)

model.fit(X_train, y_train)
test_preds = model.predict(X_test)

test_mae = mean_absolute_error(y_test, test_preds)
test_rmse = np.sqrt(mean_squared_error(y_test, test_preds))
test_r2 = r2_score(y_test, test_preds)

print(f"MAE:  {test_mae:.2f}")
print(f"RMSE: {test_rmse:.2f}")
print(f"R²:   {test_r2:.3f}")

# CV-vs-holdout consistency check: flag gaps larger than 2 CV standard deviations
print(f"Overfitting Analysis:")
print("-" * 25)

mae_diff = abs(test_mae - mae_mean)
rmse_diff = abs(test_rmse - rmse_mean)
r2_diff = abs(test_r2 - r2_mean)

print(f"MAE difference (CV vs Test):  {mae_diff:.2f}")
print(f"RMSE difference (CV vs Test): {rmse_diff:.2f}")
print(f"R² difference (CV vs Test):   {r2_diff:.3f}")

if mae_diff > mae_std * 2:
    print("Warning: Potential overfitting detected (MAE)")
if rmse_diff > rmse_std * 2:
    print("Warning: Potential overfitting detected (RMSE)")
if r2_diff > r2_std * 2:
    print("Warning: Potential overfitting detected (R²)")

# Store results
results[model_name] = {
    'cv_mae': (mae_mean, mae_std),
    'cv_rmse': (rmse_mean, rmse_std),
    'cv_r2': (r2_mean, r2_std),
    'test_mae': test_mae,
    'test_rmse': test_rmse,
    'test_r2': test_r2,
    'cv_scores': {
        'mae': cv_mae_scores,
        'rmse': cv_rmse_scores,
        'r2': cv_r2_scores
    }
}

Evaluating: Linear Regression
Cross-Validation Results (5-fold):
---------------------------------------------
MAE:  8.74 ± 0.31
RMSE: 13.18 ± 0.94
R²:   0.523 ± 0.053
Individual Fold Results:
Fold    MAE     RMSE    R²
------------------------------
 1     8.99   13.78   0.492
 2     8.16   11.47   0.570
 3     8.91   14.16   0.433
 4     8.67   12.99   0.555
 5     8.98   13.51   0.565
Holdout Test Set Results:
--------------------------------
MAE:  9.26
RMSE: 14.16
R²:   0.326
Overfitting Analysis:
-------------------------
MAE difference (CV vs Test):  0.51
RMSE difference (CV vs Test): 0.98
R² difference (CV vs Test):   0.197


## Ridge Regression

Evaluate the Ridge Regression model. Ridge adds an L2 penalty (the sum of squared coefficients, weighted by alpha) to the least-squares loss, which shrinks large coefficients and can reduce overfitting. It is most useful when features are strongly correlated (multicollinearity) or training data is limited. Because the penalty depends on coefficient magnitudes, the features are standardized first.
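To see how `alpha` controls the L2 penalty, the sketch below fits `Ridge` on standardized synthetic data for several `alpha` values and prints the coefficient norm, which shrinks as `alpha` grows. With about 1,800 rows, `alpha=1.0` leaves the coefficients almost identical to ordinary least squares. The data and the alpha grid are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = StandardScaler().fit_transform(rng.normal(size=(1800, 3)))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=3.0, size=1800)

ols_norm = np.linalg.norm(LinearRegression().fit(X, y).coef_)
alphas = [0.1, 1.0, 100.0, 1000.0, 10000.0]
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]

print(f"OLS             ||coef|| = {ols_norm:.4f}")
for a, n in zip(alphas, norms):
    print(f"alpha={a:>8}: ||coef|| = {n:.4f}")
```

A penalty has to be comparable to the squared-error term summed over all rows before it visibly changes the fit, so the useful range of `alpha` grows with the dataset size. Scikit-learn's `RidgeCV` can select it by cross-validation.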

In [8]:
# Ridge Regression Model Evaluation
model_name = "Ridge Regression"
model = models[model_name]

print(f"{'='*60}")
print(f"Evaluating: {model_name}")
print(f"{'='*60}")

# Cross-validation evaluation
print(f"Cross-Validation Results ({cv_folds}-fold):")
print("-" * 45)

cv_results = cross_validate(
    model, X_train, y_train,
    cv=cv_folds,
    scoring=scoring_metrics,
    return_train_score=False,
    n_jobs=-1
)

# Extract and convert scores
cv_mae_scores = -cv_results['test_neg_mean_absolute_error']
cv_rmse_scores = -cv_results['test_neg_root_mean_squared_error']
cv_r2_scores = cv_results['test_r2']

# Calculate statistics
mae_mean, mae_std = cv_mae_scores.mean(), cv_mae_scores.std()
rmse_mean, rmse_std = cv_rmse_scores.mean(), cv_rmse_scores.std()
r2_mean, r2_std = cv_r2_scores.mean(), cv_r2_scores.std()

print(f"MAE:  {mae_mean:.2f} ± {mae_std:.2f}")
print(f"RMSE: {rmse_mean:.2f} ± {rmse_std:.2f}")
print(f"R²:   {r2_mean:.3f} ± {r2_std:.3f}")

# Individual fold results
print(f"Individual Fold Results:")
print("Fold    MAE     RMSE    R²")
print("-" * 30)
for i in range(cv_folds):
    print(f"{i+1:>2}   {cv_mae_scores[i]:>6.2f}  {cv_rmse_scores[i]:>6.2f}  {cv_r2_scores[i]:>6.3f}")

# Holdout test set evaluation
print(f"Holdout Test Set Results:")
print("-" * 32)

model.fit(X_train, y_train)
test_preds = model.predict(X_test)

test_mae = mean_absolute_error(y_test, test_preds)
test_rmse = np.sqrt(mean_squared_error(y_test, test_preds))
test_r2 = r2_score(y_test, test_preds)

print(f"MAE:  {test_mae:.2f}")
print(f"RMSE: {test_rmse:.2f}")
print(f"R²:   {test_r2:.3f}")

# CV-vs-holdout consistency check: flag gaps larger than 2 CV standard deviations
print(f"Overfitting Analysis:")
print("-" * 25)

mae_diff = abs(test_mae - mae_mean)
rmse_diff = abs(test_rmse - rmse_mean)
r2_diff = abs(test_r2 - r2_mean)

print(f"MAE difference (CV vs Test):  {mae_diff:.2f}")
print(f"RMSE difference (CV vs Test): {rmse_diff:.2f}")
print(f"R² difference (CV vs Test):   {r2_diff:.3f}")

if mae_diff > mae_std * 2:
    print("Warning: Potential overfitting detected (MAE)")
if rmse_diff > rmse_std * 2:
    print("Warning: Potential overfitting detected (RMSE)")
if r2_diff > r2_std * 2:
    print("Warning: Potential overfitting detected (R²)")

# Store results
results[model_name] = {
    'cv_mae': (mae_mean, mae_std),
    'cv_rmse': (rmse_mean, rmse_std),
    'cv_r2': (r2_mean, r2_std),
    'test_mae': test_mae,
    'test_rmse': test_rmse,
    'test_r2': test_r2,
    'cv_scores': {
        'mae': cv_mae_scores,
        'rmse': cv_rmse_scores,
        'r2': cv_r2_scores
    }
}

Evaluating: Ridge Regression
Cross-Validation Results (5-fold):
---------------------------------------------
MAE:  8.74 ± 0.31
RMSE: 13.18 ± 0.94
R²:   0.523 ± 0.053
Individual Fold Results:
Fold    MAE     RMSE    R²
------------------------------
 1     8.99   13.78   0.492
 2     8.16   11.47   0.570
 3     8.91   14.16   0.433
 4     8.67   12.99   0.555
 5     8.98   13.51   0.565
Holdout Test Set Results:
--------------------------------
MAE:  9.26
RMSE: 14.16
R²:   0.326
Overfitting Analysis:
-------------------------
MAE difference (CV vs Test):  0.51
RMSE difference (CV vs Test): 0.98
R² difference (CV vs Test):   0.197
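The three evaluation cells above are nearly identical. One way to remove the duplication is a single helper that cross-validates on the training split, fits on it, scores the holdout set, and returns the same dictionary layout stored in `results`. A sketch, where the function name `evaluate_model` is hypothetical and synthetic data stands in for the wellness features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_validate, train_test_split

SCORING = ["neg_mean_absolute_error", "neg_root_mean_squared_error", "r2"]


def evaluate_model(model, X_train, y_train, X_test, y_test, cv=5, n_jobs=None):
    """Cross-validate on the training split, then fit and score on the holdout (hypothetical helper)."""
    cv_res = cross_validate(model, X_train, y_train, cv=cv, scoring=SCORING, n_jobs=n_jobs)
    cv_scores = {
        "mae": -cv_res["test_neg_mean_absolute_error"],
        "rmse": -cv_res["test_neg_root_mean_squared_error"],
        "r2": cv_res["test_r2"],
    }
    model.fit(X_train, y_train)  # fits the passed-in pipeline in place, as the cells above do
    preds = model.predict(X_test)
    return {
        "cv_mae": (cv_scores["mae"].mean(), cv_scores["mae"].std()),
        "cv_rmse": (cv_scores["rmse"].mean(), cv_scores["rmse"].std()),
        "cv_r2": (cv_scores["r2"].mean(), cv_scores["r2"].std()),
        "test_mae": mean_absolute_error(y_test, preds),
        "test_rmse": np.sqrt(mean_squared_error(y_test, preds)),
        "test_r2": r2_score(y_test, preds),
        "cv_scores": cv_scores,
    }


# Synthetic stand-in data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

res = evaluate_model(LinearRegression(), X_tr, y_tr, X_te, y_te)
print(f"CV R²: {res['cv_r2'][0]:.3f} ± {res['cv_r2'][1]:.3f}, test R²: {res['test_r2']:.3f}")
```

With this helper, the three evaluation cells could reduce to `results = {name: evaluate_model(m, X_train, y_train, X_test, y_test, cv=cv_folds, n_jobs=-1) for name, m in models.items()}`, with printing handled in a separate loop.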


## Model Comparison and Analysis

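Compare all three models side by side to identify the best-performing model. Linear and Ridge Regression produce identical scores at the displayed precision: with three standardized features and roughly 1,450 training rows per fold, an L2 penalty of alpha=1.0 is tiny relative to the summed squared error, so Ridge's coefficients are almost the same as the unpenalized ones.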

In [9]:
# Comprehensive Model Comparison
print(f"{'='*60}")
print(f"COMPREHENSIVE MODEL COMPARISON")
print(f"{'='*60}")

print(f"Cross-Validation Performance Summary:")
print("-" * 50)
print("Model".ljust(18) + "MAE".ljust(12) + "RMSE".ljust(12) + "R²")
print("-" * 50)

for name, res in results.items():
    mae_mean, mae_std = res['cv_mae']
    rmse_mean, rmse_std = res['cv_rmse']
    r2_mean, r2_std = res['cv_r2']

    mae_str = f"{mae_mean:.2f}±{mae_std:.2f}"
    rmse_str = f"{rmse_mean:.2f}±{rmse_std:.2f}"
    r2_str = f"{r2_mean:.3f}±{r2_std:.3f}"

    print(f"{name:<18}{mae_str:<12}{rmse_str:<12}{r2_str}")

print(f"Test Set Performance Summary:")
print("-" * 40)
print("Model".ljust(18) + "MAE".ljust(8) + "RMSE".ljust(8) + "R²")
print("-" * 40)

for name, res in results.items():
    mae_str = f"{res['test_mae']:.2f}"
    rmse_str = f"{res['test_rmse']:.2f}"
    r2_str = f"{res['test_r2']:.3f}"

    print(f"{name:<18}{mae_str:<8}{rmse_str:<8}{r2_str}")

COMPREHENSIVE MODEL COMPARISON
Cross-Validation Performance Summary:
--------------------------------------------------
Model             MAE         RMSE        R²
--------------------------------------------------
Random Forest     0.86±0.08   1.56±0.30   0.993±0.002
Linear Regression 8.74±0.31   13.18±0.94  0.523±0.053
Ridge Regression  8.74±0.31   13.18±0.94  0.523±0.053
Test Set Performance Summary:
----------------------------------------
Model             MAE     RMSE    R²
----------------------------------------
Random Forest     0.76    1.51    0.992
Linear Regression 9.26    14.16   0.326
Ridge Regression  9.26    14.16   0.326
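The comparison tables above are assembled with manual string padding. A possible alternative is to flatten `results` into a pandas DataFrame, which also makes sorting and rounding straightforward. The sketch below uses a `results`-shaped dictionary rebuilt from the rounded values printed in this notebook; the helper name `results_to_frame` is hypothetical.

```python
import pandas as pd


def results_to_frame(results: dict) -> pd.DataFrame:
    """Flatten the notebook's results dict into one row per model (hypothetical helper)."""
    rows = {}
    for name, res in results.items():
        rows[name] = {
            "cv_mae": res["cv_mae"][0],
            "cv_rmse": res["cv_rmse"][0],
            "cv_r2": res["cv_r2"][0],
            "cv_r2_std": res["cv_r2"][1],
            "test_mae": res["test_mae"],
            "test_rmse": res["test_rmse"],
            "test_r2": res["test_r2"],
        }
    return pd.DataFrame.from_dict(rows, orient="index").sort_values("cv_r2", ascending=False)


# Rounded values copied from the printed outputs above
linear = {"cv_mae": (8.74, 0.31), "cv_rmse": (13.18, 0.94), "cv_r2": (0.523, 0.053),
          "test_mae": 9.26, "test_rmse": 14.16, "test_r2": 0.326}
reported = {
    "Random Forest": {"cv_mae": (0.86, 0.08), "cv_rmse": (1.56, 0.30), "cv_r2": (0.993, 0.002),
                      "test_mae": 0.76, "test_rmse": 1.51, "test_r2": 0.992},
    "Linear Regression": linear,
    "Ridge Regression": dict(linear),
}

table = results_to_frame(reported)
print(table.round(3))
```

In the notebook, `results_to_frame(results)` would work directly on the live `results` dictionary, since the helper ignores the extra `cv_scores` entry.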


## Final Recommendations

Select the best model by mean cross-validation R², then check whether its holdout R² falls within two CV standard deviations of that mean.
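Every model was cross-validated on the same `X_train` with the same unshuffled 5-fold split, so the per-fold scores are paired. A fold-by-fold comparison shows whether one model's advantage holds on every fold rather than only on average. A minimal sketch using the per-fold R² values printed above for Random Forest and Linear Regression:

```python
import numpy as np

# Per-fold R² values copied from the cross-validation outputs above
rf_r2 = np.array([0.993, 0.995, 0.992, 0.996, 0.990])
lr_r2 = np.array([0.492, 0.570, 0.433, 0.555, 0.565])

diff = rf_r2 - lr_r2
print("Per-fold R² gain:", diff.round(3))
print(f"Mean gain: {diff.mean():.3f}, smallest gain: {diff.min():.3f}")
print("Random Forest wins every fold:", bool((diff > 0).all()))
```

In the notebook, the same arrays are available as `results["Random Forest"]["cv_scores"]["r2"]` and `results["Linear Regression"]["cv_scores"]["r2"]`.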

In [11]:
# Model Recommendations
print(f"MODEL RECOMMENDATIONS")
print("-" * 25)

# Find best model based on CV R² score
best_model = max(results.keys(), key=lambda x: results[x]['cv_r2'][0])
best_r2 = results[best_model]['cv_r2'][0]

print(f"Best performing model: {best_model}")
print(f"   Cross-validation R²: {best_r2:.3f}")

# Check for overfitting in best model
best_cv_r2_std = results[best_model]['cv_r2'][1]
best_test_r2 = results[best_model]['test_r2']
r2_gap = abs(best_r2 - best_test_r2)

if r2_gap > best_cv_r2_std * 2:
    print(f"Warning: Potential overfitting detected in {best_model}")
    print(f"   CV R²: {best_r2:.3f}, Test R²: {best_test_r2:.3f}")
    print(f"   Consider regularization or feature engineering")
else:
    print(f"✓ {best_model} shows consistent performance")


MODEL RECOMMENDATIONS
-------------------------
Best performing model: Random Forest
   Cross-validation R²: 0.993
✓ Random Forest shows consistent performance


## Save the Best Performing Model

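Save the Random Forest pipeline to disk with joblib for future use. This is the pipeline fitted on the 1,816-row training split in the Random Forest cell, not a model refitted on the full dataset.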
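A quick way to confirm that a saved pipeline is usable is to reload it and check that it reproduces the in-memory predictions. The sketch below does this with a small Random Forest pipeline on synthetic data in a temporary directory; the file name and data are illustrative. Joblib files are pickles, so they should only be loaded from trusted sources, and scikit-learn warns when they are loaded with a different version than the one that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

pipe = Pipeline([("rf", RandomForestRegressor(n_estimators=50, random_state=42))]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_pipeline.pkl")
    joblib.dump(pipe, path)       # serialize the fitted pipeline
    reloaded = joblib.load(path)  # load it back into memory

same = np.array_equal(pipe.predict(X), reloaded.predict(X))
print("Reloaded predictions match:", same)
```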

In [15]:
import joblib

# The Random Forest pipeline was fitted on X_train in the Random Forest cell
rf_model = models["Random Forest"]

model_filename = "random_forest_wellness_model.pkl"

# Save the model
joblib.dump(rf_model, model_filename)

print(f"Random Forest model saved as: {model_filename}")
print(f"Model performance summary:")
print(f"  - Cross-validation R²: {results['Random Forest']['cv_r2'][0]:.3f} ± {results['Random Forest']['cv_r2'][1]:.3f}")
print(f"  - Test set R²: {results['Random Forest']['test_r2']:.3f}")
print(f"  - Test set MAE: {results['Random Forest']['test_mae']:.2f}")


# Display model parameters
print(f"\nModel configuration:")
rf_params = rf_model.named_steps['rf'].get_params()
key_params = ['n_estimators', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'random_state']
for param in key_params:
    if param in rf_params:
        print(f"  - {param}: {rf_params[param]}")

Random Forest model saved as: random_forest_wellness_model.pkl
Model performance summary:
  - Cross-validation R²: 0.993 ± 0.002
  - Test set R²: 0.992
  - Test set MAE: 0.76

Model configuration:
  - n_estimators: 100
  - max_depth: None
  - min_samples_split: 2
  - min_samples_leaf: 1
  - random_state: 42


In [14]:
# Colab only: download the saved model file to the local machine
from google.colab import files
files.download(model_filename)
