## Notebook Overview

This notebook evaluates the performance of the trained Gradient Boosting model on unseen data. It computes accuracy metrics, visualizes prediction quality, and analyzes model behavior.

It also explores edge cases to identify potential flaws and shortcomings in the model.

## Imports and Setup

In [71]:
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import sys
sys.path.append('../backend')

from model_loader import load_model
from preprocessing import preprocess_input

## Load Model & Evaluation Data

For evaluation, we use `models\V1_GradientBoosting_F1_Race_Predictor_model.joblib`. It was trained only on race data from 2015 to 2024, so the 2025 races were never seen during training and can serve as a hold-out set.

In [72]:
model = load_model('../models/V1_GradientBoosting_F1_Race_Predictor_model.joblib', True)

model

✅ Model loaded successfully from: ../models/V1_GradientBoosting_F1_Race_Predictor_model.joblib
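
Evaluating on 2025 with a model trained on 2015 to 2024 is a temporal hold-out. The following is a minimal sketch of that split on a small made-up frame. The `Season`, `Round` and `Position` column names match the dataset, while the rows and the `toy`, `train_df` and `holdout_df` names are purely illustrative and not part of the project code.

```python
import pandas as pd

# Toy data for illustration only; the real notebook reads 2015_to_2025_df.csv.
toy = pd.DataFrame({
    "Season": [2023, 2024, 2024, 2025, 2025],
    "Round": [1, 1, 2, 1, 2],
    "Position": [1.0, 2.0, 3.0, 1.0, 4.0],
})

holdout_season = 2025

# Every season before the hold-out season is available for training...
train_df = toy.loc[toy["Season"] < holdout_season]
# ...and the hold-out season is reserved for evaluation.
holdout_df = toy.loc[toy["Season"] == holdout_season]

# Sanity check: no season appears in both sets.
assert set(train_df["Season"]).isdisjoint(holdout_df["Season"])
print(len(train_df), len(holdout_df))
```

Splitting by season rather than at random keeps results from the evaluated races out of the training data.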


## Preprocess the Evaluation Data

The `models\V1_GradientBoosting_F1_Race_Predictor_model.joblib` model was trained on race and qualifying data from 2015 to 2024. To predict the results of a 2025 race, we feed it the corresponding 2025 qualifying data.

You can modify the `year` and `round` variables below as needed. Keep in mind that the model has only seen data from 2015 to 2024, so its predictions for 2025 may be skewed by anything that changed afterwards, for example drivers or team performance levels that are not represented in the training data.

In [95]:
year = 2025
round = 12  # note: this shadows Python's built-in round() for the rest of the notebook

# Relative path; assumes the notebook is run from the notebooks/ directory
df = pd.read_csv('2015_to_2025_df.csv')
eval_df = df.loc[(df['Season'] == year) & (df['Round'] == round)]

eval_df.head(5)

Unnamed: 0,Season,Round,EventName,Abbreviation,TeamName,GridPosition,Q1_s,Q2_s,Q3_s,Position
4438,2025,12,British Grand Prix,NOR,McLaren,3.0,86.123,85.231,85.01,1.0
4439,2025,12,British Grand Prix,PIA,McLaren,2.0,85.963,85.316,84.995,2.0
4440,2025,12,British Grand Prix,HUL,Kick Sauber,19.0,86.574,9999.0,9999.0,3.0
4441,2025,12,British Grand Prix,HAM,Ferrari,5.0,86.296,85.084,85.095,4.0
4442,2025,12,British Grand Prix,VER,Red Bull Racing,1.0,85.886,85.316,84.892,5.0
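
In the rows above, HUL has `9999.0` for `Q2_s` and `Q3_s`, which suggests that missing session times (a driver eliminated in Q1 sets no Q2 or Q3 time) were filled with a large sentinel value. The sketch below shows one way such a fill could be done. It is an assumption inferred from the data, not the project's actual cleaning code, and the `SENTINEL` and `filled` names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two drivers from the table above; NaN stands in for a session with no time set.
quali = pd.DataFrame({
    "Abbreviation": ["PIA", "HUL"],
    "Q1_s": [85.963, 86.574],
    "Q2_s": [85.316, np.nan],
    "Q3_s": [84.995, np.nan],
})

SENTINEL = 9999.0  # assumed placeholder value, inferred from the table above
time_cols = ["Q1_s", "Q2_s", "Q3_s"]

# Fill only the lap-time columns, leaving every other column untouched.
filled = quali.copy()
filled[time_cols] = filled[time_cols].fillna(SENTINEL)
print(filled)
```

A sentinel far outside the range of real lap times lets a tree-based model separate "no time set" from any genuine time with a single split.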


`notebooks\2015_to_2025_df.csv` has already been cleaned, with previously missing values replaced. We now call `preprocess_input()`, imported from `backend\preprocessing.py`, to prepare the data for the model.

In [96]:
test = preprocess_input(eval_df)

X = test.drop('Position', axis=1)
y = test['Position']

test.head(5) # For visualisation purposes

Unnamed: 0,Season,Round,GridPosition,Q1_s,Q2_s,Q3_s,Position,Abbreviation_ALB,Abbreviation_ALO,Abbreviation_ANT,...,EventName_Russian Grand Prix,EventName_Sakhir Grand Prix,EventName_Saudi Arabian Grand Prix,EventName_Singapore Grand Prix,EventName_Spanish Grand Prix,EventName_Styrian Grand Prix,EventName_São Paulo Grand Prix,EventName_Turkish Grand Prix,EventName_Tuscan Grand Prix,EventName_United States Grand Prix
4438,2025,12,3.0,86.123,85.231,85.01,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4439,2025,12,2.0,85.963,85.316,84.995,2.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4440,2025,12,19.0,86.574,9999.0,9999.0,3.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4441,2025,12,5.0,86.296,85.084,85.095,4.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4442,2025,12,1.0,85.886,85.316,84.892,5.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
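
The output above contains dummy columns for events that are not in this race's data (for example `EventName_Russian Grand Prix`), so the encoded frame appears to be aligned to the columns the model saw during training. The following sketch shows one common way to get that result with pandas. It assumes a stored list of training columns; `train_columns`, `race` and `aligned` are hypothetical names, and this is not necessarily how `preprocess_input()` is implemented.

```python
import pandas as pd

# Hypothetical list of feature columns recorded at training time.
train_columns = [
    "GridPosition",
    "EventName_British Grand Prix",
    "EventName_Russian Grand Prix",
]

# A single race only ever contains one event name.
race = pd.DataFrame({
    "GridPosition": [3.0, 2.0],
    "EventName": ["British Grand Prix", "British Grand Prix"],
})

# One-hot encode, then force the exact training column set and order.
encoded = pd.get_dummies(race, columns=["EventName"], dtype=int)
aligned = encoded.reindex(columns=train_columns, fill_value=0)
print(aligned)
```

Categories missing from the input become all-zero columns, and any column not in `train_columns` is dropped, so the model always receives the feature layout it was fitted on.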


## Make Predictions

In [97]:
predictions = model.predict(X)

predictions

array([ 3.8991749 ,  3.46459521, 13.81606482,  6.79421738,  3.6642619 ,
       10.57729497, 14.32563559, 13.79290551,  8.70507937,  6.10202725,
       12.91079615, 12.24141125, 13.25073362,  7.15759138, 10.88927994,
       14.36505515, 13.79435156, 15.10137476, 15.1488419 , 15.2667469 ])
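
The values above are continuous regression outputs, not finishing positions. As a minimal sketch, they can be converted into a predicted order with `Series.rank`, and a rough error figure computed for the five drivers whose actual positions are shown in the `eval_df.head(5)` output. The `scores`, `predicted_rank` and `mae_first_five` names are illustrative, and the five-row error is only a worked example, not an evaluation of the model.

```python
import numpy as np
import pandas as pd

# The 20 raw outputs printed above, in the same row order as eval_df.
scores = np.array([
    3.8991749, 3.46459521, 13.81606482, 6.79421738, 3.6642619,
    10.57729497, 14.32563559, 13.79290551, 8.70507937, 6.10202725,
    12.91079615, 12.24141125, 13.25073362, 7.15759138, 10.88927994,
    14.36505515, 13.79435156, 15.10137476, 15.1488419, 15.2667469,
])

# Lowest score becomes predicted P1; method="first" breaks ties by row order.
predicted_rank = pd.Series(scores).rank(method="first").astype(int)

# Actual finishing positions of the first five rows (NOR, PIA, HUL, HAM, VER).
actual_first_five = np.array([1, 2, 3, 4, 5])
abs_errors = np.abs(predicted_rank.to_numpy()[:5] - actual_first_five)
mae_first_five = abs_errors.mean()

print(predicted_rank.tolist()[:5])  # [3, 1, 15, 5, 2]
print(mae_first_five)
```

Ranking guarantees that each position from 1 to 20 is assigned exactly once, which rounding the raw scores would not.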

## Evaluate Performance

In [98]:
# Work on an explicit copy so the new column is not written to a slice of df
# (this avoids pandas' SettingWithCopyWarning)
eval_df = eval_df.copy()
eval_df['ExactPredictedPosition'] = predictions

# Rank drivers by raw model output: the lowest score becomes predicted P1
df_sorted = eval_df.sort_values('ExactPredictedPosition')
df_sorted['PredictedPosition'] = np.arange(1, len(df_sorted) + 1, dtype=float)
df_sorted = df_sorted.sort_values('Position')

df_sorted[['Season', 'Round', 'EventName', 'Abbreviation', 'TeamName', 'GridPosition', 'Position', 'PredictedPosition', 'ExactPredictedPosition']]


Unnamed: 0,Season,Round,EventName,Abbreviation,TeamName,GridPosition,Position,PredictedPosition,ExactPredictedPosition
4438,2025,12,British Grand Prix,NOR,McLaren,3.0,1.0,3.0,3.899175
4439,2025,12,British Grand Prix,PIA,McLaren,2.0,2.0,1.0,3.464595
4440,2025,12,British Grand Prix,HUL,Kick Sauber,19.0,3.0,15.0,13.816065
4441,2025,12,British Grand Prix,HAM,Ferrari,5.0,4.0,5.0,6.794217
4442,2025,12,British Grand Prix,VER,Red Bull Racing,1.0,5.0,2.0,3.664262
4443,2025,12,British Grand Prix,GAS,Alpine,8.0,6.0,8.0,10.577295
4444,2025,12,British Grand Prix,STR,Aston Martin,17.0,7.0,16.0,14.325636
4445,2025,12,British Grand Prix,ALB,Williams,13.0,8.0,13.0,13.792906
4446,2025,12,British Grand Prix,ALO,Aston Martin,7.0,9.0,7.0,8.705079
4447,2025,12,British Grand Prix,RUS,Mercedes,4.0,10.0,4.0,6.102027
