# Evaluation Metrics

Formula 1 is the world's fastest and most technologically advanced motorsport, where drivers and teams compete in high-speed races across the globe. Each race is packed with excitement, strategy, and luckily for us, data.

## Problem Statement

### 🏎️ Predicting Formula 1 Driver Positions

In this activity, you will explore a real-world Formula 1 dataset from the [F1nalyze Formula 1 Datathon](https://www.kaggle.com/competitions/f1nalyze-datathon-ieeecsmuj/overview). This dataset contains detailed information about past F1 races, including:

- Race results (`position`): Where each driver finished in each race
- Driver and team details: Information about the drivers and teams (constructors) they race for
- Race conditions: Data like starting grid position, points scored, laps completed, etc.

Many factors (e.g., starting position, team strategy, weather) can influence the outcome of a race. By analyzing this data, we can try to predict where each driver might finish in future races.

## Our Focus: Evaluation Metrics

While building a model to predict race results is exciting, our main goal in this activity is to learn how to evaluate the performance of such models. In machine learning, evaluation metrics help us understand how well our predictions match the real outcomes. By experimenting with the F1 dataset, you’ll see how different metrics (like accuracy, mean absolute error, and more) can tell different stories about model performance.
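
As a minimal sketch of how two metrics can disagree, the example below scores two hypothetical models on invented finishing positions (the numbers are made up for illustration and do not come from the F1 dataset). Accuracy only counts exact matches, while mean absolute error measures how far off each prediction is.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error

# Invented finishing positions for five drivers (not taken from the dataset)
y_true = np.array([1, 2, 3, 4, 5])

# Hypothetical model A: off by one position for four of the five drivers
pred_a = np.array([2, 1, 4, 3, 5])
# Hypothetical model B: exactly right four times, but one very large miss
pred_b = np.array([1, 2, 3, 4, 20])

acc_a = accuracy_score(y_true, pred_a)
acc_b = accuracy_score(y_true, pred_b)
mae_a = mean_absolute_error(y_true, pred_a)
mae_b = mean_absolute_error(y_true, pred_b)

print(f"Model A: accuracy={acc_a:.2f}, MAE={mae_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}, MAE={mae_b:.2f}")
```

In this toy case accuracy favors model B, while mean absolute error favors model A, so the "better" model depends on which kind of mistake matters to you.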

## Data

1. Download the dataset [here](https://www.kaggle.com/competitions/f1nalyze-datathon-ieeecsmuj/data?select=validation.csv).
2. Move it into your [Google Drive](https://drive.google.com/drive/u/0/my-drive) in the "Colab Notebooks" folder.

The dataset contains the following columns:

- `resultId`: Unique identifier for the result.
- `racerId`: Identifier for the race.
- `driverId`: Identifier for the driver.
- `constructorId`: Identifier for the constructor (team).
- `number`: Car number of the driver.
- `grid`: Starting position on the grid.
- `position_x`: Position from which the driver started on the grid.
- `positionText_x`: Starting position in text format (e.g., 1st, 2nd).
- `positionOrder`: Order of starting position.
- `points`: Points scored in the race.
- `laps`: Total number of laps in the race.
- `time_x`: Time taken in the fastest lap.
- `timetaken_in_millisec`: Time taken in milliseconds.
- `fastestLap`: Lap number of the fastest lap.
- `rank`: Rank in the race.
- `fastestLapTime`: Time of the fastest lap.
- `max_speed`: Maximum speed achieved during the race.
- `statusId`: Identifier for the race status.
- `year`: Year of the race.
- `round`: Round number of the race.
- `circuitId`: Identifier for the circuit.
- `grand_prix`: Name of the Grand Prix.
- `date`: Date of the race.
- `time_y`: Time of the race.
- `url_x`: URL associated with the race.
- `fp1_date`: Date of the first practice session.
- `fp1_time`: Time of the first practice session.
- `fp2_date`: Date of the second practice session.
- `fp2_time`: Time of the second practice session.
- `fp3_date`: Date of the third practice session.
- `fp3_time`: Time of the third practice session.
- `quali_date`: Date of the qualifying session.
- `quali_time`: Time of the qualifying session.
- `sprint_date`: Date of the sprint race session.
- `sprint_time`: Time of the sprint race session.
- `driverRef`: Reference to the driver.
- `driver_num`: Unique identifier for the driver.
- `driver_code`: Code assigned to the driver.
- `forename`: First name of the driver.
- `surname`: Last name of the driver.
- `dob`: Date of birth of the driver.
- `nationality`: Nationality of the driver.
- `url_y`: URL associated with the driver.
- `driverStandingsId`: Identifier for driver standings.
- `raceId_y`: Identifier for the race associated with the driver.
- `points_y`: Points scored by the driver.
- `position`: Final position for a particular lap. (Target Variable)
- `positionText_y`: Position text description.
- `wins`: Number of wins by the driver.
- `constructorRef`: Reference to the constructor (team).
- `company`: Company associated with the constructor.
- `nationality_y`: Nationality of the constructor.
- `url`: URL associated with the constructor.
- `status`: Status related to the driver or constructor.
- `result_driver_standing`: The unique id for the row.


## Set Up

We first have to do a few things before we can evaluate our models (of course)!

1. Import libraries
2. Get our data
3. Clean our data
4. Preprocess our data
5. Create and train our models

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score

In [None]:
# Show every column when displaying a table, since the dataset has many columns
pd.set_option('display.max_columns', None)

# Set max number of datapoints to use
NUM_DATAPOINTS = 15_000

In [None]:
# Download the training and testing datasets
# (zipfile is part of Python's standard library, so only gdown needs installing)
%pip install gdown

import os
import gdown
import zipfile

# Check if data/f1nalyze exists
if not os.path.exists('data/f1nalyze'):
    os.makedirs('data', exist_ok=True)  # Ensure data/ exists

    # Download the zip file into data/
    zip_path = 'data/f1nalyze.zip'
    gdown.download(
        'https://drive.google.com/uc?export=download&id=1mRepXFuLfaPNBAlOonoe2dsCqR16uih4',
        zip_path,
        quiet=False
    )

    # Extract zip into data/f1nalyze/
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall('data/f1nalyze')

    # Remove the zip file now that it has been extracted
    os.remove(zip_path)
else:
    print("data/f1nalyze already exists, skipping download and extraction.")

In [None]:
f1_data_path = "data/f1nalyze/"

### Read Data

In [None]:
train = pd.read_csv(f1_data_path + "train.csv")

In [None]:
train.head()

In [None]:
test = pd.read_csv(f1_data_path + "validation.csv")

In [None]:
test.head()

In [None]:
print(train.shape)
print(test.shape)

In [None]:
# Take a subset of our data
# Since our training data has over 2M samples, training will take a long time!
# Our testing data has 353,762 samples
train = train.sample(n=NUM_DATAPOINTS, random_state=42)
test = test.sample(n=NUM_DATAPOINTS, random_state=42)

print(train.shape)
print(test.shape)

### Pre-processing

In [None]:
# Find columns in the training dataset but not in testing set
col_not_in_test = set(train.columns) - set(test.columns)

print("Columns not in test:")
for col in col_not_in_test:
    print(col)

In [None]:
# Replace "\N" with NaN
# so pandas recognizes them as missing values

train.replace("\\N", pd.NA, inplace=True)
test.replace("\\N", pd.NA, inplace=True)
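
As an alternative sketch, pandas can also be told at read time which strings mean "missing" through the `na_values` parameter of `pd.read_csv`. The tiny CSV below is invented for illustration. One side effect worth noticing is that a column can then be parsed as numeric right away instead of staying as text.

```python
import io
import pandas as pd

# Tiny made-up CSV that uses "\N" for missing values, like the F1 files
csv_text = "grid,fastestLap\n1,\\N\n2,45\n"

# Option 1: read first, then replace (the approach used in this notebook)
df_replace = pd.read_csv(io.StringIO(csv_text))
df_replace = df_replace.replace("\\N", pd.NA)

# Option 2: declare "\N" as a missing-value marker while reading
df_na_values = pd.read_csv(io.StringIO(csv_text), na_values=["\\N"])

# Both approaches report one missing value in fastestLap
print(df_replace.isnull().sum())
print(df_na_values.isnull().sum())

# With option 2 the column is parsed as a number instead of text
print(df_na_values["fastestLap"].dtype)
```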

In [None]:
train_empty = train.isnull().sum()
train_empty

In [None]:
test_empty = test.isnull().sum()
test_empty

In [None]:
# Only use relevant columns
columns_to_include = ["grid", "positionText_x", "points", "laps", "round", "nationality", "points_y", "position", "wins", "company", "status"]

train_clean = train[columns_to_include]

print("Train dataset with only relevant columns:")
train_clean

In [None]:
train_clean.info()

In [None]:
# See unique values for relevant columns
cols = ["positionText_x", "company", "status", "nationality"]

for col in cols:
    uniq = train_clean[col].unique()
    print(f"{col}:\n", uniq, "\n")

In [None]:
# Encode categorical variables
ordinal_cols = ["positionText_x"]
nominal_cols = ["nationality", "company", "status"]
numerical_cols = [col for col in train_clean.columns if col not in ordinal_cols and col not in nominal_cols and col != "position"]

# Define position order
position_order = [
    ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', 'R', 'N', 'W', 'D', 'F', 'E']
]

# Create ordinal encoder
ordinal_encoder = OrdinalEncoder(categories=position_order)

# Create column transformer
preprocessor = ColumnTransformer(
    transformers=[
        ("ord", ordinal_encoder, ordinal_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
        ("num", "passthrough", numerical_cols)
    ]
)
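
To see what the ordinal encoder does with a category list, here is a small sketch using a shortened, illustrative order. The `handle_unknown="use_encoded_value"` and `unknown_value=-1` settings are assumptions added for this example only; the encoder defined above keeps scikit-learn's default, which raises an error on values missing from the list.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Shortened, illustrative order (the notebook's list runs to '33' plus status codes)
toy_order = [["1", "2", "3", "R"]]

# Assumption for this sketch: map unseen values to -1 instead of raising an error
encoder = OrdinalEncoder(
    categories=toy_order,
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)

X_fit = np.array([["1"], ["2"], ["3"], ["R"]], dtype=object)
encoder.fit(X_fit)

# "X" does not appear in toy_order, so it is treated as unknown
X_new = np.array([["3"], ["R"], ["1"], ["X"]], dtype=object)
encoded = encoder.transform(X_new)
print(encoded.ravel())
```

Each value is replaced by its index in the category list, so `"3"` becomes 2 and `"R"` becomes 3, while the unlisted `"X"` becomes -1.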

### Training Our Model(s)

We are going to use evaluation metrics to compare a Decision Tree vs. Random Forest.

We are going to use a **pipeline**, which sequentially applies a list of transforms followed by a final estimator. In other words, our preprocessing and our model training happen together in a single `fit` call!

Read the scikit-learn documentation on [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier) and [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) to see which hyperparameters you can adjust!
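
The sketch below illustrates what a pipeline saves us from writing by hand. It uses a small invented table (the column names echo the dataset, but the values are made up) and a hypothetical helper, `make_preprocessor`, to compare a pipeline against the same steps done manually.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data; the values are invented for illustration
X_toy = pd.DataFrame({
    "grid": [1, 2, 3, 4, 5, 6],
    "company": ["Ferrari", "McLaren", "Ferrari", "Williams", "McLaren", "Williams"],
})
y_toy = [1, 2, 3, 4, 5, 6]


def make_preprocessor():
    """Hypothetical helper: build a fresh, unfitted preprocessor."""
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["company"]),
        ("num", "passthrough", ["grid"]),
    ])


# With a pipeline: one fit call runs the preprocessing and trains the model
pipe = Pipeline([
    ("preprocess", make_preprocessor()),
    ("model", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(X_toy, y_toy)
pipe_pred = pipe.predict(X_toy)

# Without a pipeline: the same steps written out one by one
prep = make_preprocessor()
X_transformed = prep.fit_transform(X_toy)
model = DecisionTreeClassifier(random_state=0)
model.fit(X_transformed, y_toy)
manual_pred = model.predict(prep.transform(X_toy))

# Both routes give the same predictions
print((pipe_pred == manual_pred).all())
```

The pipeline version also makes it harder to forget the transform step at prediction time, since `predict` applies the fitted preprocessing automatically.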

In [None]:
# Use a **pipeline** to combine preprocessing and modeling
pipeline_decision_tree = Pipeline([
    ('preprocess', preprocessor),
    ('model', DecisionTreeClassifier())
])

pipeline_forest = Pipeline([
    ('preprocess', preprocessor),
    ('model', RandomForestClassifier())
])

In [None]:
pipeline_decision_tree

In [None]:
pipeline_forest

In [None]:
# Separate our data
X_train = train_clean.drop(columns="position")
y_train = train_clean["position"]

X_test = test.drop(columns="position")
y_test = test["position"]

In [None]:
pipeline_decision_tree.fit(X_train, y_train)

In [None]:
pipeline_forest.fit(X_train, y_train)

### Test our Models

In [None]:
# Test our models using .predict()
y_pred_decision_tree = pipeline_decision_tree.predict(X_test)
y_pred_forest = pipeline_forest.predict(X_test)

In [None]:
# Print unique drivers predicted to be in position 1 for Decision Tree
print("Decision Tree - Drivers predicted for position 1:")
dt_pos1_mask = y_pred_decision_tree == 1
dt_pos1_indices = X_test.index[dt_pos1_mask]

dt_drivers = set()
for idx in dt_pos1_indices:
    forename = X_test.loc[idx, 'forename']
    surname = X_test.loc[idx, 'surname']
    dt_drivers.add(f"{forename} {surname}")

for driver in sorted(dt_drivers):
    print(driver)

print("\n" + "="*50 + "\n")

# Print unique drivers predicted to be in position 1 for Random Forest
print("Random Forest - Drivers predicted for position 1:")
rf_pos1_mask = y_pred_forest == 1
rf_pos1_indices = X_test.index[rf_pos1_mask]

rf_drivers = set()
for idx in rf_pos1_indices:
    forename = X_test.loc[idx, 'forename']
    surname = X_test.loc[idx, 'surname']
    rf_drivers.add(f"{forename} {surname}")

for driver in sorted(rf_drivers):
    print(driver)

### Visualize our Models

In [None]:
from sklearn.tree import plot_tree

# Extract the trained DecisionTreeClassifier from the pipeline
dt_model = pipeline_decision_tree.named_steps['model']

# Plot the tree
plt.figure(figsize=(40, 20))
plot_tree(
    dt_model,
    filled=True,
    feature_names=pipeline_decision_tree.named_steps['preprocess'].get_feature_names_out(),
    max_depth=5,  # Change this for deeper/shallower trees
    impurity=False,
    fontsize=10
)
plt.title("Decision Tree Visualization")
plt.show()

In [None]:
# Extract the trained RandomForestClassifier from the pipeline
rf_model = pipeline_forest.named_steps['model']

# Pick one tree from the forest (e.g., the first one)
estimator = rf_model.estimators_[0]

plt.figure(figsize=(20, 20))
plot_tree(
    estimator,
    filled=True,
    feature_names=pipeline_forest.named_steps['preprocess'].get_feature_names_out(),
    max_depth=3,
    impurity=False,
    fontsize=10
)
plt.title("Random Forest - Tree 0 Visualization (max_depth=3)")
plt.show()

### Accuracy Scores

In [None]:
accuracy_score(y_test, y_pred_decision_tree)

In [None]:
accuracy_score(y_test, y_pred_forest)

## Your Turn: Pick an Evaluation Metric

Judging from our accuracy scores, both models seem to be performing poorly!

Let's use some other evaluation metrics.

1. Visit scikit-learn's documentation on [different evaluation metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)
2. Pick 3-5 different evaluation metrics
3. Use them to see how the models' scores change!

### Question to Consider: Can we use any evaluation metric?

- Is `position` a categorical or numerical variable?

> Remember to import the evaluation metric and run the code to be able to use the functions from scikit-learn!

```python
# Example import
from sklearn.metrics import mean_squared_error
```
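
If you want to compare several metrics side by side, one possible approach is a small helper like the one sketched here. The function name `score_predictions` and the toy predictions are hypothetical, and the sketch assumes the position labels can be converted to numbers, which is what lets regression-style metrics such as mean absolute error be applied alongside classification metrics.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
)


def score_predictions(y_true, predictions):
    """Hypothetical helper: one row of scores per model.

    `predictions` maps a model name to its predicted positions.
    """
    y_true_num = np.asarray(y_true, dtype=float)
    rows = {}
    for name, y_pred in predictions.items():
        y_pred_num = np.asarray(y_pred, dtype=float)
        rows[name] = {
            # Classification view: only exact matches count
            "accuracy": accuracy_score(y_true, y_pred),
            "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
            # Regression view: how far off is each prediction?
            "mae": mean_absolute_error(y_true_num, y_pred_num),
            "rmse": np.sqrt(mean_squared_error(y_true_num, y_pred_num)),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Invented toy positions and predictions, not taken from the dataset
y_true_toy = [1, 2, 3, 4, 5, 6]
toy_predictions = {
    "model_a": [1, 2, 4, 3, 5, 6],
    "model_b": [1, 3, 4, 5, 6, 7],
}

scores = score_predictions(y_true_toy, toy_predictions)
print(scores)
```

With the notebook's own variables, a call might look like `score_predictions(y_test, {"tree": y_pred_decision_tree, "forest": y_pred_forest})`, provided `y_test` has no missing values.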

In [None]:
# TODO: Import any evaluation metric you want to use below



# TODO: Score your models



### Tune Your Models

Now, make your own pipelines (preprocessor and model) with tuned hyperparameters.

Test your models using new evaluation metrics.

To get you started, here is the code from above to create your own model.

```python
# Use a **pipeline** to combine preprocessing and modeling
pipeline_decision_tree = Pipeline([
    ('preprocess', preprocessor),
    ('model', DecisionTreeClassifier())
])

pipeline_forest = Pipeline([
    ('preprocess', preprocessor),
    ('model', RandomForestClassifier())
])
```

To tune your hyperparameters, pass arguments to the model constructors (for example, `max_depth` for a decision tree). You can also try other classification models.

See available hyperparameters for:

- [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
- [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
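
As a starting point, the sketch below shows two common ways to set hyperparameters on a pipeline, using invented toy data. The specific values (`max_depth=3`, `min_samples_leaf=2`, the grid, and `cv=2`) are arbitrary examples rather than recommendations, and `GridSearchCV` is just one option that this activity does not require.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data; the values are invented for illustration
X_toy = pd.DataFrame({
    "grid": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "company": ["Ferrari", "McLaren", "Williams", "Haas"] * 3,
})
y_toy = [1, 2, 3, 4, 1, 2, 3, 4, 2, 1, 4, 3]

toy_preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["company"]),
    ("num", "passthrough", ["grid"]),
])

# Option 1: pass hyperparameters when the model is created
tuned_tree = Pipeline([
    ("preprocess", toy_preprocessor),
    ("model", DecisionTreeClassifier(max_depth=3, min_samples_leaf=2, random_state=42)),
])
tuned_tree.fit(X_toy, y_toy)

# Option 2: let GridSearchCV try several values with cross-validation.
# The "model__" prefix targets the pipeline step named "model".
param_grid = {
    "model__max_depth": [1, 2, 3],
    "model__min_samples_leaf": [1, 2],
}
search = GridSearchCV(tuned_tree, param_grid, cv=2, scoring="accuracy")
search.fit(X_toy, y_toy)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The `scoring` argument is where a different evaluation metric can be plugged in, so the "best" hyperparameters can change depending on the metric you choose.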

In [None]:
# TODO: Create new pipelines/models with tuned hyperparameters


# TODO: Evaluate your new models with evaluation metrics