A machine learning project that predicts the 2026 Formula 1 Drivers' Championship using historical race data loaded via FastF1 and a linear regression model, with an interactive Streamlit dashboard for exploring predictions, model performance, and driver deep-dives.
- Project Overview
- Results
- Screenshots
- Project Structure
- Setup & Installation
- Usage
- Streamlit Dashboard
- Model Details
- Tech Stack
Goal: Predict which driver will win the 2026 F1 Drivers' Championship based on historical performance features derived from 2023–2025 season data.
Approach:
- Load historical F1 race results (2023–2025) via FastF1 API
- Engineer per-driver performance features (win rate, podium rate, DNF rate, average finish, rolling averages, etc.)
- Train a linear regression model on historical championship points
- Predict 2026 championship points and rank all drivers
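The loading step above might look roughly like the following. This is a minimal sketch, not the contents of `src/data/load_fastf1.py`: the helper name `load_season_results`, the cache path, and the column selection are assumptions made for illustration. It relies on FastF1's `get_event_schedule`, `get_session`, and `Session.results`, and needs network access the first time it runs.

```python
from pathlib import Path


def load_season_results(year, cache_dir="data/raw"):
    """Return one DataFrame of race results for every round of `year`.

    Hypothetical helper: the name, cache location and column choice are
    illustrative assumptions, not the project's actual loader.
    """
    # Imported lazily so this module can be imported without FastF1 installed.
    import fastf1
    import pandas as pd

    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    fastf1.Cache.enable_cache(cache_dir)  # reuse downloads between runs

    schedule = fastf1.get_event_schedule(year, include_testing=False)
    frames = []
    for round_number in schedule["RoundNumber"]:
        session = fastf1.get_session(year, int(round_number), "R")
        # Only the classification is needed, so skip the heavy payloads.
        session.load(laps=False, telemetry=False, weather=False, messages=False)
        results = session.results[
            ["Abbreviation", "TeamName", "GridPosition", "Position", "Points", "Status"]
        ].copy()
        results["Year"] = year
        results["Round"] = int(round_number)
        frames.append(results)
    return pd.concat(frames, ignore_index=True)


SEASONS = (2023, 2024, 2025)  # training seasons named in this README
```

Calling `load_season_results(year)` for each entry in `SEASONS` and concatenating the frames would give one row per driver per race, which is the shape the feature engineering step needs.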
| Pos | Driver | 2026 Team | Predicted Points | Win Probability |
|---|---|---|---|---|
| 1 | Max Verstappen | Red Bull Racing | 400 | 29.9% |
| 2 | Lando Norris | McLaren | 343 | 16.9% |
| 3 | Charles Leclerc | Ferrari | 328 | 14.5% |
| 4 | Carlos Sainz | Williams † | 277 | 8.7% |
| 5 | Oscar Piastri | McLaren | 262 | 7.5% |
| 6 | George Russell | Mercedes | 226 | 5.2% |
| 7 | Lewis Hamilton | Ferrari † | 206 | 4.3% |
| 8 | Sergio Perez | (Red Bull) ‡ | 140 | 2.2% |
| 9 | Fernando Alonso | Aston Martin | 67 | 1.1% |
| 10 | Oliver Bearman | Haas F1 Team | 50 | 0.9% |
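This README does not state how the Win Probability column is derived. The published values are consistent with a softmax over predicted points at a temperature of about 100 (for example, 29.9 / 16.9 ≈ e^0.57 for the 57-point gap between the top two), but that is an inference from the table, not a documented method. Under that assumption, a sketch could be:

```python
import math

# Predicted points for the ten drivers listed in the table above.
predicted_points = {
    "Max Verstappen": 400,
    "Lando Norris": 343,
    "Charles Leclerc": 328,
    "Carlos Sainz": 277,
    "Oscar Piastri": 262,
    "George Russell": 226,
    "Lewis Hamilton": 206,
    "Sergio Perez": 140,
    "Fernando Alonso": 67,
    "Oliver Bearman": 50,
}


def win_probabilities(points, temperature=100.0):
    """Softmax over predicted points; `temperature` is an assumed scale."""
    top = max(points.values())
    # Subtracting the maximum keeps exp() numerically stable.
    weights = {d: math.exp((p - top) / temperature) for d, p in points.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


probs = win_probabilities(predicted_points)
for driver, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{driver:<18}{p:6.1%}")
```

Because only the ten listed drivers are included here, each share comes out somewhat higher than in the table, which presumably normalises over the full grid.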
Notable 2026 driver moves (not reflected in training data):
| Driver | Previous Team | 2026 Team |
|---|---|---|
| Lewis Hamilton | Mercedes | Ferrari |
| Carlos Sainz | Ferrari | Williams |
| Kimi Antonelli | – (rookie) | Mercedes |
| Liam Lawson | RB | Red Bull Racing |
| Esteban Ocon | Alpine | Haas F1 Team |
| Nico Hulkenberg | Haas F1 Team | Kick Sauber / Audi |
| Jack Doohan | – (reserve) | Alpine |
| Franco Colapinto | Williams | Alpine |
† Team label corrected to 2026 reality; model predictions are based on driver performance history (2023–2025) at their previous constructors.
‡ Sergio Perez was replaced at Red Bull by Liam Lawson for 2026; his prediction reflects historical Red Bull-era performance.
Limitation: The model predicts based on a driver's historical performance regardless of constructor change. Actual results may differ significantly when a driver switches to a weaker or stronger team.
| Metric | Train | Test |
|---|---|---|
| R² | 0.9995 | 0.9750 |
| RMSE | 2.89 pts | 14.66 pts |
| MAE | – | 7.43 pts |
| Cross-val R² (mean ± std) | 0.9935 ± 0.003 | – |
The model reaches an R² of 0.975 on the test set, which indicates strong predictive accuracy on held-out data. The gap between train and test error (RMSE 2.89 vs 14.66 pts) points to some overfitting, but the test and cross-validated R² both remain high, so the model appears to generalise reasonably well.
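For reference, the three metrics in the table follow the standard definitions, which can be written out in a few lines. The numbers below are made up for illustration and are not the project's data; scikit-learn's `r2_score`, `mean_squared_error`, and `mean_absolute_error` compute the same quantities.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Made-up season totals, for illustration only.
actual = [400, 300, 200, 100]
predicted = [390, 310, 205, 95]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE is never smaller than MAE, and it grows faster when a few errors are large. A test RMSE roughly twice the test MAE, as in the table, suggests that a small number of drivers account for much of the error.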
| Feature | Coefficient | Direction |
|---|---|---|
| RollingPodiumRate | −70.49 | Negative |
| PodiumRate | +59.21 | Positive |
| AvgPoints | +23.94 | Positive |
| DNFRate | +16.71 | Positive |
| WinRate | −7.72 | Negative |
Note: Opposite-signed coefficients on closely related features (for example RollingPodiumRate vs PodiumRate) reflect multicollinearity in the linear model, so individual coefficients should not be read as isolated effects; the net effect of podium/win performance is strongly positive.
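One common way to check for this kind of multicollinearity is the variance inflation factor (VIF). The sketch below uses synthetic stand-ins for a season podium rate and a rolling podium rate; the data, the sample size, and the helper name `vif` are assumptions for illustration, not values from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 60  # hypothetical number of driver-seasons

# Synthetic stand-ins: a season podium rate and a rolling average that tracks it.
podium_rate = rng.uniform(0.0, 1.0, n_rows)
rolling_podium_rate = podium_rate + rng.normal(0.0, 0.05, n_rows)


def vif(target, others):
    """Variance inflation factor of `target` given the `others` column(s)."""
    # Regress the target feature on the other feature(s) plus an intercept.
    X = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    r2 = 1.0 - residuals.var() / target.var()
    return 1.0 / (1.0 - r2)


correlation = np.corrcoef(podium_rate, rolling_podium_rate)[0, 1]
vif_podium = vif(podium_rate, rolling_podium_rate)
print(f"correlation={correlation:.3f}  VIF={vif_podium:.1f}")
```

A VIF above about 10 is a conventional warning sign that a coefficient's sign and size are unstable. If that matters for interpretation, dropping one of each correlated pair or using a regularised model such as ridge regression are the usual options.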
F1/
├── app.py                              # Streamlit interactive dashboard
├── data/
│   ├── raw/                            # Raw FastF1 cached data
│   ├── processed/
│   │   └── driver_features.csv         # Engineered feature matrix
│   └── predictions/
│       ├── 2026_championship_predictions.csv
│       └── prediction_summary.txt
├── docs/
│   └── images/                         # README screenshots
├── models/
│   ├── feature_importance.csv          # Model coefficients per feature
│   └── model_metrics.csv               # Train/test performance metrics
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_model_training.ipynb
│   └── 04_predictions.ipynb
├── src/
│   ├── data/
│   │   └── load_fastf1.py              # FastF1 data loading utilities
│   ├── features/
│   │   └── build_features.py           # Feature engineering pipeline
│   ├── models/
│   │   ├── train_model.py              # Model training script
│   │   └── predict.py                  # Prediction script
│   └── visualization/
│       └── visualize.py                # Plotting utilities
├── requirements.txt
└── README.md
- Python 3.9+
- pip
git clone https://github.com/isthatpaul/F1.git
cd F1

# Windows
python -m venv .venv
.venv\Scripts\activate

# macOS / Linux
python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Run the four notebooks in order:
| Notebook | Purpose |
|---|---|
| notebooks/01_data_exploration.ipynb | Explore raw FastF1 data |
| notebooks/02_feature_engineering.ipynb | Build the driver feature matrix |
| notebooks/03_model_training.ipynb | Train & evaluate the linear regression model |
| notebooks/04_predictions.ipynb | Generate 2026 championship predictions |
jupyter lab

# 1. Load and process data
python src/data/load_fastf1.py
# 2. Build features
python src/features/build_features.py
# 3. Train model
python src/models/train_model.py
# 4. Generate predictions
python src/models/predict.py

Predictions are saved to data/predictions/2026_championship_predictions.csv.
An interactive web dashboard (app.py) provides three views:
| Page | Description |
|---|---|
| Championship Predictions | Bar chart of predicted points, win probability pie chart, and full standings table |
| Model Performance | Train vs test metrics (R², RMSE, MAE) and feature importance chart |
| Driver Deep Dive | Per-driver breakdown with radar/detail charts |
streamlit run app.py

When run locally, Streamlit serves the app at http://localhost:8501 by default. The deployed app is at https://pauls-f1-lab.streamlit.app/.
Algorithm: Linear Regression (scikit-learn LinearRegression)
Target variable: Season championship points
Input features:
| Feature | Description |
|---|---|
| AvgPoints | Average points per race in the season |
| AvgFinish | Average finishing position |
| AvgGrid | Average qualifying grid position |
| WinRate | Fraction of races won |
| PodiumRate | Fraction of races with a podium finish |
| DNFRate | Fraction of races not finished |
| AvgPositionChange | Average positions gained/lost vs grid |
| PrevYearPoints | Championship points from the previous season |
| PrevYearPodiums | Podium count from the previous season |
| RollingAvgPoints | Rolling 3-season average points |
| RollingPodiumRate | Rolling 3-season average podium rate |
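Several of the features in the table can be computed with a pandas `groupby`. The following is a sketch on made-up data, not the code in `src/features/build_features.py`: the input column names (`GridPosition`, `Position`, `Finished`) and the choice to include the current season in the rolling window are assumptions.

```python
import pandas as pd

# Toy race results: one row per driver per race (made-up numbers).
results = pd.DataFrame({
    "Year": [2024] * 6,
    "Driver": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "GridPosition": [1, 3, 2, 5, 4, 6],
    "Position": [1, 2, 20, 4, 3, 5],
    "Points": [25, 18, 0, 12, 15, 10],
    "Finished": [True, True, False, True, True, True],
})

results["Win"] = results["Position"] == 1
results["Podium"] = results["Position"] <= 3
# Positive values mean places gained relative to the grid slot.
results["PositionChange"] = results["GridPosition"] - results["Position"]

features = results.groupby(["Year", "Driver"]).agg(
    AvgPoints=("Points", "mean"),
    AvgFinish=("Position", "mean"),
    AvgGrid=("GridPosition", "mean"),
    WinRate=("Win", "mean"),
    PodiumRate=("Podium", "mean"),
    DNFRate=("Finished", lambda s: 1.0 - s.mean()),
    AvgPositionChange=("PositionChange", "mean"),
).reset_index()

# Season-level totals for one driver, used for the lagged and rolling features.
season = pd.DataFrame({
    "Driver": ["AAA"] * 3,
    "Year": [2023, 2024, 2025],
    "Points": [100.0, 200.0, 300.0],
}).sort_values(["Driver", "Year"])

by_driver = season.groupby("Driver")["Points"]
season["PrevYearPoints"] = by_driver.shift(1)  # NaN for a driver's first season
season["RollingAvgPoints"] = by_driver.transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)

print(features)
print(season)
```

A driver's first season has no previous-year value, so `PrevYearPoints` is missing there and would need a fill rule (zero, or a rookie baseline) before training.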
Training data: 2023–2025 F1 seasons
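A training and evaluation loop with scikit-learn's `LinearRegression` might look like the sketch below. It runs on synthetic data with only two of the features; the 80/20 split, the 5-fold cross-validation, and the random seed are assumptions, since this README does not state how the train/test split was made.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(42)
n_rows = 60  # hypothetical number of driver-seasons

# Synthetic stand-ins for two of the input features.
avg_points = rng.uniform(0, 20, n_rows)
podium_rate = np.clip(avg_points / 25 + rng.normal(0, 0.05, n_rows), 0, 1)
X = np.column_stack([avg_points, podium_rate])

# Season total is roughly average points times 22 races, plus noise.
y = 22 * avg_points + rng.normal(0, 10, n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))

# Cross-validation on the training portion only, so the test set stays unseen.
cv_r2 = cross_val_score(LinearRegression(), X_train, y_train, cv=5, scoring="r2")

print(f"test R2 = {test_r2:.3f}")
print(f"CV R2   = {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

Note that a feature like AvgPoints, multiplied by the number of races, is close to the season total itself, so a linear fit on such features can score a very high R² almost by construction. That may help explain the values reported above.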
| Library | Purpose |
|---|---|
| FastF1 | F1 race data API |
| scikit-learn | Machine learning |
| pandas | Data manipulation |
| numpy | Numerical computing |
| matplotlib | Static visualisation |
| seaborn | Statistical visualisation |
| plotly | Interactive charts |
| Streamlit | Interactive web dashboard |
Predictions are based on historical data and statistical modelling; they are not a guarantee of future results.



