## Model Training, Evaluation, and Saving

This section trains, evaluates, and selects the best-performing machine learning model for predicting NBA playoff game outcomes.  
It loads processed features and labels, compares multiple models, selects the best based on **F1 Score**, retrains it using all available training data, and saves both the final model and predictions.

---

#### Why the Best Model Was Chosen Using F1 Score
Playoff outcomes can be **imbalanced**: favorites (such as #1 seeds) win more often than underdogs. A model optimized only for accuracy could lean toward predicting favorites and still earn a decent score while missing many true upsets.

- **Precision** -> Of all the games predicted as wins, how many were correct?
- **Recall** -> Of all the actual wins, how many did the model catch?
- **F1 Score** -> The harmonic mean of precision and recall, balancing both.

By prioritizing **F1 Score**, I favored a model that performs well not just at predicting favorites but also at identifying underdog wins. This makes the model more reliable in playoff settings, where unexpected outcomes are common, and more useful to analysts who care about catching upsets as well as picking favorites.
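To make the three definitions concrete, here is a minimal sketch that computes them from raw labels. The helper `precision_recall_f1` and the eight example labels are hypothetical and are not taken from the playoff dataset; scikit-learn's `precision_score`, `recall_score`, and `f1_score` compute the same quantities.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for eight games (1 = win, 0 = loss).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"Accuracy : {accuracy:.4f}")   # 6 of 8 games correct
print(f"Precision: {precision:.4f}")  # 4 true wins out of 5 predicted wins
print(f"Recall   : {recall:.4f}")     # 4 of 5 actual wins caught
print(f"F1 Score : {f1:.4f}")
```

Here the model misses one actual win and predicts one win that did not happen, so precision and recall are both 4/5 and F1 equals 0.8, while accuracy is 0.75.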

---

##### Key Steps:
1. **Import Libraries**: Load Scikit-learn classifiers, metrics, and utilities for training and evaluation.
2. **Load Processed Data**: Import preprocessed train/validation/test splits created in Notebook 3.
3. **Train Candidate Models**: Fit multiple classifiers, including Logistic Regression, Random Forest, Gradient Boosting, and Support Vector Machines (SVM).
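4. **Evaluate Models**: Compute accuracy, precision, recall, and F1 score for each candidate. In the code below, candidates are fit on the combined training and validation data and scored on the 2025 test set rather than on the validation set.
5. **Select Best Model**: Choose the model with the highest F1 score, which balances false positives against false negatives.
6. **Test on 2025 Data**: Report the final model's performance on the 2025 playoffs to simulate real-world forecasting. Because the same games are also used for model selection, this estimate is likely optimistic.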
7. **Save Best Model**: Export the trained model with joblib for use in the simulator application.
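
One common way to structure candidate comparison is to fit on the training split, score on the validation split, and refit the winner on both, which keeps the test set untouched until the final evaluation. The sketch below illustrates that pattern under stated assumptions: the data is synthetic (from `make_classification`), the splits are random rather than season-based, and only two candidates are compared. It is not the notebook's actual pipeline.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed features and labels (assumption).
X, y = make_classification(n_samples=600, n_features=10, random_state=42)

# Split into train+val / test, then split train+val into train / val.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval
)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000, random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Score each candidate on the validation split only.
val_f1 = {}
for name, estimator in candidates.items():
    model = clone(estimator).fit(X_train, y_train)
    val_f1[name] = f1_score(y_val, model.predict(X_val))

best_name = max(val_f1, key=val_f1.get)

# Refit a fresh copy of the winner on train+val, then touch the test set once.
final_model = clone(candidates[best_name]).fit(X_trainval, y_trainval)
test_f1 = f1_score(y_test, final_model.predict(X_test))

print("Validation F1:", {k: round(v, 4) for k, v in val_f1.items()})
print(f"Selected: {best_name}, test F1: {test_f1:.4f}")
```

Using `clone` gives each fit a fresh, unfitted copy of the estimator, so the refit on train+val does not depend on state left over from the comparison loop.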

```python
# Import Libraries
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import joblib

# Load Processed Feature Sets and Labels
X_train = pd.read_csv("../data/processed/X_train_processed.csv")
X_val   = pd.read_csv("../data/processed/X_val_processed.csv")
X_test  = pd.read_csv("../data/processed/X_test_processed.csv")

y_train = pd.read_csv("../data/processed/y_train.csv").values.ravel()
y_val   = pd.read_csv("../data/processed/y_val.csv").values.ravel()
y_test  = pd.read_csv("../data/processed/y_test.csv").values.ravel()

# Combine Training + Validation
X_train_val = pd.concat([X_train, X_val], axis=0)
y_train_val = pd.concat([pd.Series(y_train), pd.Series(y_val)], axis=0).values

# Define Evaluation Function
def evaluate_model(name, model, X_eval, y_eval):
    """Print and return accuracy, precision, recall, and F1 for a fitted model."""
    y_pred = model.predict(X_eval)
    acc = accuracy_score(y_eval, y_pred)
    prec = precision_score(y_eval, y_pred)
    rec = recall_score(y_eval, y_pred)
    f1 = f1_score(y_eval, y_pred)

    print(f"{name}:")
    print(f"Accuracy : {acc:.4f}")
    print(f"Precision: {prec:.4f}")
    print(f"Recall   : {rec:.4f}")
    print(f"F1 Score : {f1:.4f}")
    print("-" * 40)

    return acc, prec, rec, f1

# Define & Evaluate Multiple Models
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000, random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42)
}

results = []

# Fit each candidate on train+val, then score it on the 2025 test set
for name, clf in models.items():
    pipe = Pipeline([("clf", clone(clf))])
    pipe.fit(X_train_val, y_train_val)
    acc, prec, rec, f1 = evaluate_model(name, pipe, X_test, y_test)
    results.append({
        "Model": name,
        "Accuracy": acc,
        "Precision": prec,
        "Recall": rec,
        "F1 Score": f1
    })

results_df = pd.DataFrame(results)
print("\nModel Comparison:\n", results_df)

# Select Best Model Based on F1 Score
best_model_name = results_df.sort_values(by="F1 Score", ascending=False).iloc[0]["Model"]
print(f"\nBest model based on F1 Score: {best_model_name}")

best_clf = models[best_model_name]

# Refit a Fresh Copy of the Best Model on All Train+Val Data
final_pipe = Pipeline([("clf", clone(best_clf))])
final_pipe.fit(X_train_val, y_train_val)

# Save model
joblib.dump(final_pipe, f"../model/final_{best_model_name.lower()}_model.pkl")

# Save feature names
joblib.dump(X_train.columns.tolist(), "../model/modeled_features.pkl")

# Make Predictions with Final Model (reloaded to confirm the saved file works)
final_model = joblib.load(f"../model/final_{best_model_name.lower()}_model.pkl")
y_pred = final_model.predict(X_test)

# Save predictions
predictions_df = pd.DataFrame({
    "true_label": y_test,
    "predicted_label": y_pred
})
predictions_df.to_csv(f"../model/{best_model_name.lower()}_predictions.csv", index=False)
```

```text
LogisticRegression:
Accuracy : 0.5119
Precision: 0.5660
Recall   : 0.6250
F1 Score : 0.5941
----------------------------------------
RandomForest:
Accuracy : 0.5595
Precision: 0.6000
Recall   : 0.6875
F1 Score : 0.6408
----------------------------------------
GradientBoosting:
Accuracy : 0.5714
Precision: 0.6034
Recall   : 0.7292
F1 Score : 0.6604
----------------------------------------
SVM:
Accuracy : 0.5952
Precision: 0.6129
Recall   : 0.7917
F1 Score : 0.6909
----------------------------------------

Model Comparison:
                 Model  Accuracy  Precision    Recall  F1 Score
0  LogisticRegression  0.511905   0.566038  0.625000  0.594059
1        RandomForest  0.559524   0.600000  0.687500  0.640777
2    GradientBoosting  0.571429   0.603448  0.729167  0.660377
3                 SVM  0.595238   0.612903  0.791667  0.690909

Best model based on F1 Score: SVM
```
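
As a sanity check on these numbers, the SVM metrics can be reproduced from a confusion matrix. The counts below are an inference, not values printed by the notebook: they are the smallest whole-number counts consistent with all four rows of the printed metrics (84 evaluated games, 48 of them positive). Under that assumption, the sketch also computes the scores of a trivial baseline that predicts the positive class for every game.

```python
# Confusion counts inferred from the SVM row (assumption, see lead-in).
tp, fp, fn, tn = 38, 24, 10, 12
n_games = tp + fp + fn + tn      # 84 games in total
n_positive = tp + fn             # 48 games with a positive label

accuracy = (tp + tn) / n_games
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Baseline: predict the positive class for every game.
# Then TP = n_positive, FP = n_games - n_positive, FN = 0.
base_accuracy = n_positive / n_games
base_f1 = 2 * n_positive / (2 * n_positive + (n_games - n_positive))

print(f"SVM      accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
print(f"Baseline accuracy={base_accuracy:.4f} f1={base_f1:.4f}")
```

If the inferred class balance is right, the always-positive baseline reaches an F1 of about 0.727, higher than any of the four candidates, and an accuracy of about 0.571. That suggests reading the F1 scores alongside a trivial baseline (or a metric such as macro-averaged F1) before concluding that a model has learned to catch upsets.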
