<a href="https://colab.research.google.com/github/fengfrankgthb/BUS-41204/blob/main/make_score_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example: make_scorer()

In [2]:
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

# 1. Define a custom scoring function
def custom_rmse(y_true, y_pred):
    """Return the negated Root Mean Squared Error (RMSE), so that higher is better."""
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    return -rmse  # Return negative RMSE so higher is better for make_scorer

# 2. Create a scorer object using make_scorer
rmse_scorer = make_scorer(custom_rmse, greater_is_better=True)

# 3. Prepare some sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# 4. Instantiate a simple model (cross_val_score fits a clone of it on each fold)
model = LinearRegression()

# 5. Use the custom scorer in cross_val_score
scores = cross_val_score(model, X, y, cv=3, scoring=rmse_scorer)

# 6. Examine the results

print("Raw scores returned by cross_val_score:", scores)
print("Mean cross-validation RMSE:", -np.mean(scores)) # Invert back to positive RMSE

print("\n--- Explanation ---")
print("The 'cross_val_score' function used our 'rmse_scorer' to evaluate the LinearRegression model across 3 cross-validation folds.")
print("For each fold, the model was trained on a subset of the data and evaluated on the remaining part.")
print("The 'rmse_scorer' called our 'custom_rmse' function, which calculated the RMSE and returned its negative.")
print("Therefore, the 'scores' array contains the negative RMSE values for each fold.")
print("To get the actual RMSE values, we negate the scores.")
print("A lower positive RMSE indicates better model performance.")

print("\n--- Another Example with a Different Metric ---")

from sklearn.metrics import r2_score

# 1. Define a custom R-squared scorer (greater_is_better is True by default)
r2_scorer = make_scorer(r2_score)

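# 2. Use the custom scorer in cross_val_score
# Note: with 5 samples and cv=3, the last test fold holds a single sample,
# for which R-squared is undefined, so that fold's score is nan.
r2_scores = cross_val_score(model, X, y, cv=3, scoring=r2_scorer)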

print("Raw R-squared scores:", r2_scores)
print("Mean cross-validation R-squared:", np.mean(r2_scores))
print("Here, higher R-squared values indicate better fit, and 'greater_is_better' was the default.")

print("\n--- Example with a Scoring Function Where Lower is Better ---")

def custom_mae(y_true, y_pred):
    """Custom Mean Absolute Error (MAE) function."""
    mae = np.mean(np.abs(y_true - y_pred))
    return mae  # Return positive MAE, but we'll tell make_scorer lower is better

# 1. Create a scorer object specifying greater_is_better=False
mae_scorer = make_scorer(custom_mae, greater_is_better=False)

# 2. Use the custom scorer in cross_val_score
mae_scores = cross_val_score(model, X, y, cv=3, scoring=mae_scorer)

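# The scorer negates custom_mae, so the raw scores below are negative MAE values
print("Raw MAE scores (lower is better):", mae_scores)
print("Mean cross-validation MAE:", -np.mean(mae_scores))  # Negate to recover the positive MAE
print("Here, 'greater_is_better=False' tells scikit-learn that lower scores from 'custom_mae' are better, so the scorer returns them negated.")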

Raw scores returned by cross_val_score: [-1.94365063 -0.87579212 -0.5       ]
Mean cross-validation RMSE: 1.106480916608513

--- Explanation ---
The 'cross_val_score' function used our 'rmse_scorer' to evaluate the LinearRegression model across 3 cross-validation folds.
For each fold, the model was trained on a subset of the data and evaluated on the remaining part.
The 'rmse_scorer' called our 'custom_rmse' function, which calculated the RMSE and returned its negative.
Therefore, the 'scores' array contains the negative RMSE values for each fold.
To get the actual RMSE values, we negate the scores.
A lower positive RMSE indicates better model performance.
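
The negation described above can also be left to `make_scorer` itself. The following is a minimal sketch, not the notebook's own code: `plain_rmse` and `neg_rmse_scorer` are hypothetical names introduced here, and the comparison against the built-in `"neg_root_mean_squared_error"` scoring string assumes a scikit-learn version that provides it (0.22 or later).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score


def plain_rmse(y_true, y_pred):
    """Hypothetical helper: positive RMSE, where lower is better."""
    return np.sqrt(mean_squared_error(y_true, y_pred))


# greater_is_better=False makes the scorer flip the sign of plain_rmse,
# so the metric function itself can stay positive.
neg_rmse_scorer = make_scorer(plain_rmse, greater_is_better=False)

# Same sample data as in the cell above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

scores_custom = cross_val_score(
    LinearRegression(), X, y, cv=3, scoring=neg_rmse_scorer
)
# Assumed available: scikit-learn's built-in negated-RMSE scoring string
scores_builtin = cross_val_score(
    LinearRegression(), X, y, cv=3, scoring="neg_root_mean_squared_error"
)

print("Custom scorer:  ", scores_custom)
print("Built-in scorer:", scores_builtin)
print("Mean RMSE:", -scores_custom.mean())
```

Under these assumptions both calls should return the same negated per-fold values as `rmse_scorer`, which suggests the manual `return -rmse` is a matter of style rather than a requirement.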

--- Another Example with a Different Metric ---
Raw R-squared scores: [-2.77777778 -2.06804734         nan]
Mean cross-validation R-squared: nan
Here, higher R-squared values indicate better fit, and 'greater_is_better' was the default.
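
The `nan` in the third fold is likely a consequence of the data size rather than of `make_scorer`. The sketch below illustrates why, under the assumption that an integer `cv` for a regressor behaves like an unshuffled `KFold`. Switching to `cv=2` is only an illustrative choice made here, not something the notebook does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import KFold, cross_val_score

# Same sample data as in the cell above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Five samples split into three folds cannot give every fold two samples
test_fold_sizes = [len(test_idx) for _, test_idx in KFold(n_splits=3).split(X)]
print("Test fold sizes with cv=3:", test_fold_sizes)

# R-squared = 1 - SS_res / SS_tot. With one test sample, the sample equals
# its own mean, so SS_tot is zero and the ratio is undefined.
single_fold_y = y[-1:]
ss_tot = np.sum((single_fold_y - single_fold_y.mean()) ** 2)
print("SS_tot of the single-sample fold:", ss_tot)

# Illustrative alternative (an assumption, not the notebook's setting):
# cv=2 leaves at least two samples in every test fold.
r2_scores_cv2 = cross_val_score(
    LinearRegression(), X, y, cv=2, scoring=make_scorer(r2_score)
)
print("R-squared scores with cv=2:", r2_scores_cv2)
```

With so few samples the resulting scores remain unreliable. The point is only that every test fold needs at least two samples, with differing targets, for R-squared to be defined.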

--- Example with a Scoring Function Where Lower is Better ---
Raw MAE scores (lower is better): 
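
To see the sign flip applied by `greater_is_better=False` in isolation, a scorer can be called directly as `scorer(estimator, X, y)`. This is a minimal sketch: `fitted_model`, `raw_mae`, and `scorer_value` are hypothetical names, and the model is fit and scored on the same five points purely for illustration, so the value is an in-sample error rather than a cross-validated one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def custom_mae(y_true, y_pred):
    """Mean Absolute Error, as defined in the cell above."""
    return np.mean(np.abs(y_true - y_pred))


mae_scorer = make_scorer(custom_mae, greater_is_better=False)

# Same sample data as in the cell above
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

fitted_model = LinearRegression().fit(X, y)

# Calling the metric directly gives the positive MAE ...
raw_mae = custom_mae(y, fitted_model.predict(X))
# ... while the scorer returns the same quantity with its sign flipped
scorer_value = mae_scorer(fitted_model, X, y)

print("custom_mae:", raw_mae)
print("mae_scorer:", scorer_value)
```

Because the scorer returns the negated value, averaging the output of `cross_val_score` and negating it once recovers the positive MAE.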

