<a href="https://colab.research.google.com/github/protogia/jffa/blob/main/formula1-2024-melbourne.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Preconfiguration for Colab-Notebook to get prefered readability & layout
- Disabling IPython-warnings
- full-width layout for plots

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
from IPython.core.display import display, HTML

# Set notebook width to 100%
display(HTML(""))

## Install _fastf1_ and load data for first overview.

In [2]:
%%capture
!pip install fastf1;

In [4]:
import fastf1

fastf1.logger.LoggingManager.set_level(fastf1.logger.logging.ERROR);

sessions = {}

race        = fastf1.get_session(2023, "Melbourne", backend="ergast", identifier="Q")
qualifying  = fastf1.get_session(2023, "Melbourne", backend="ergast", identifier="R")



In [5]:
race.load()
qualifying.load()

INFO:fastf1.fastf1.core:Loading data for Australian Grand Prix - Qualifying [v3.6.1]
INFO:fastf1.fastf1.req:No cached data found for session_info. Loading data...
INFO:fastf1.api:Fetching session info data...
INFO:fastf1.fastf1.req:Data has been written to cache!
INFO:fastf1.fastf1.req:No cached data found for driver_info. Loading data...
INFO:fastf1.api:Fetching driver list...
INFO:fastf1.fastf1.req:Data has been written to cache!
DEBUG:fastf1.ergast:Failed to parse timestamp '' in Ergastresponse.
INFO:fastf1.fastf1.req:No cached data found for session_status_data. Loading data...
INFO:fastf1.api:Fetching session status data...
INFO:fastf1.fastf1.req:Data has been written to cache!
INFO:fastf1.fastf1.req:No cached data found for track_status_data. Loading data...
INFO:fastf1.api:Fetching track status data...
INFO:fastf1.fastf1.req:Data has been written to cache!
INFO:fastf1.fastf1.req:No cached data found for _extended_timing_data. Loading data...
INFO:fastf1.api:Fetching timing data.

### Save data as _.csv_

In [6]:
import os
import pandas as pd
from fastf1.core import Session

if os.path.exists('data') == False:
  os.mkdir('data')

qualifying.laps.to_csv('data/qualifying_laps.csv')
qualifying.race_control_messages.to_csv('data/qualifying_messages.csv')
qualifying.weather_data.to_csv('data/qualifying_weather.csv')
qualifying.track_status.to_csv('data/qualifying_track_status.csv')

race.race_control_messages.to_csv('data/race_messages.csv')
race.weather_data.to_csv('data/race_weather.csv')
race.laps.to_csv('data/race_laps.csv')
race.track_status.to_csv('data/race_track_status.csv')

In [7]:
# export drive specific data to csv

for driver in qualifying.drivers:
  if os.path.exists(f'data/{driver}') == False:
    os.mkdir(f'data/{driver}')
  qualifying.car_data[driver].to_csv(f'data/{driver}/qualifying_car_data.csv')
  qualifying.pos_data[driver].to_csv(f'data/{driver}/qualifying_pos_data.csv')

for driver in race.drivers:
  if os.path.exists(f'data/{driver}') == False:
    os.mkdir(f'data/{driver}')
  race.car_data[driver].to_csv(f'data/{driver}/race_car_data.csv')
  race.pos_data[driver].to_csv(f'data/{driver}/race_pos_data.csv')


## Analysing the Qualifying performence of Carlos Sainz Junior

In [105]:
choosen_drivers = ['SAI', 'VER']

filtered_qualifying = qualifying.laps[qualifying.laps['Driver'].isin(choosen_drivers)]

First we'll calculate the timedifference between each lap in order to analyse the performence of Carlos Sainz Junior over the whole qualifying.

In [106]:
# calculate lap time difference and time difference
filtered_qualifying['LapTimeDifference'] = filtered_qualifying['LapTime'].diff().dt.total_seconds()

As you can see in the following plot the laptime is decreasing over time if you ignore the outliers. The outliers are caused by external reasons but not by the driver himself, whereas the reasons are visualised via vertical dotted lines.

- blue line: Track is all clear.
- Yellow line: Yellow Flag.
- Red line: Red Flag.
- Grey/Black line: Safety Car/ Virtual Safety Car is deployed.
- Green line: Safety Car Ending.


In [107]:
from ctypes import alignment
import plotly.express as px
import pandas as pd
import plotly.graph_objects as go

track_status_colors = {
    "AllClear":"green",
    "Yellow":"yellow",
    "Red":"red",
    "SCDeployed":"purple",
    "VSCDeployed":"violet",
    "VSCEnding":"orange"
}

filtered_qualifying['LapTime_seconds'] = filtered_qualifying['LapTime'].dt.total_seconds() # needed for plotting

fig = px.line(filtered_qualifying, x='LapNumber', y='LapTime_seconds', color='Driver', template='none')
fig.update_layout(title='Laptime within Qualifying', xaxis_title='Lap Number', yaxis_title='Lap Time (seconds)')

# Add vertical lines for track status changes
for index, row in qualifying.track_status.iterrows():
    # Find the closest lap number in filtered_qualifying based on time
    closest_lap = filtered_qualifying.loc[filtered_qualifying['Time'].sub(row['Time']).abs().idxmin()]
    fig.add_vline(x=closest_lap['LapNumber'], line_width=1, line_dash="dash", line_color=track_status_colors[row['Message']])

    # Add hover information using a scatter trace with invisible markers
    fig.add_trace(go.Scatter(
        x=[closest_lap['LapNumber']],
        y=[0], # Place the marker at x-axis
        mode='markers',
        hoverinfo='text',
        text=f"Track Status: {row['Message']}",
        showlegend=False # Hide this trace from the legend
    ))


fig.show()

To compare the performence change between each lap we'll filter the outliers which are represented by values _<=25th-percentile and >=75th-percentile_ and calculate the percentage change between each row of the data.

For outlier detection we'll use IRQ (Interquartile Range) because it doesn't rely on mean and standard deviation of a dataset.  

In [108]:
# calculate thresholds for outliers
Q1 = filtered_qualifying['LapTime'].quantile(0.25)
Q3 = filtered_qualifying['LapTime'].quantile(0.75)
IQR = Q3 - Q1

upper_fence = Q3 + 1.5 * IQR
lower_fence = Q1 - 1.5 * IQR

# filter outliers
qualifying_without_outliers = filtered_qualifying[(filtered_qualifying['LapTime'] >= lower_fence) & (filtered_qualifying['LapTime'] <= upper_fence)]

# calculate percentage difference per lap
qualifying_without_outliers['LapTimePercentageDifference'] = qualifying_without_outliers['LapTime'].pct_change() * 100

In [109]:
import plotly.graph_objects as go

fig = px.bar(qualifying_without_outliers, x='LapNumber', y='LapTimePercentageDifference', color='Driver', template='none')
fig.update_layout(title='Percentage Difference in Lap Time between Laps', xaxis_title='Lap Number', yaxis_title='Percentage Difference')

# Add vertical lines for track status changes
for index, row in qualifying.track_status.iterrows():
    # Find the closest lap number in filtered_qualifying based on time
    closest_lap = filtered_qualifying.loc[filtered_qualifying['Time'].sub(row['Time']).abs().idxmin()]
    fig.add_vline(x=closest_lap['LapNumber'], line_width=1, line_dash="dash", line_color=track_status_colors[row['Message']])

    # Add hover information using a scatter trace with invisible markers
    fig.add_trace(go.Scatter(
        x=[closest_lap['LapNumber']],
        y=[0], # Place the marker at x-axis
        mode='markers',
        hoverinfo='text',
        text=f"Track Status: {row['Message']}",
        showlegend=False # Hide this trace from the legend
    ))

fig.show()

As you can see laps effected by events like SafetyCar periods are filtered out.

The remaining data shows that the driver performence differed always around 1% aside of lap 11 and 12. Moreover the difference between each lap decreased in the end of the qualifying with values around 0.3% decreasing of laptime. The driver performece became more and more stable to the end of the qualifying.

To evaluate the percentage performence per lap we could also calculate the average time per lap and calculate the percentage difference to the average.

In [110]:
# average lap time
av = qualifying_without_outliers['LapTime'].mean()

# Calculate percentage difference to average lap time
qualifying_without_outliers['LapTimePercentageDifferenceToAverage'] = ((qualifying_without_outliers['LapTime'].dt.total_seconds() - av.total_seconds()) / av.total_seconds()) * 100

In [111]:
import plotly.express as px
import plotly.graph_objects as go

fig
fig = px.bar(qualifying_without_outliers, x='LapNumber', y='LapTimePercentageDifferenceToAverage', template='none', color='Driver')
fig.update_layout(title='Percentage Difference to Average Lap Time per Lap (Qualifying)',
                  xaxis_title='Lap Number',
                  yaxis_title='Percentage Difference to Average')

# Add vertical lines for track status changes
for index, row in qualifying.track_status.iterrows():
    # Find the closest lap number in filtered_qualifying based on time
    closest_lap = filtered_qualifying.loc[filtered_qualifying['Time'].sub(row['Time']).abs().idxmin()]
    fig.add_vline(x=closest_lap['LapNumber'], line_width=1, line_dash="dash", line_color=track_status_colors[row['Message']])

    # Add hover information using a scatter trace with invisible markers
    fig.add_trace(go.Scatter(
        x=[closest_lap['LapNumber']],
        y=[0], # Place the marker at y=0 for bar charts
        mode='markers',
        hoverinfo='text',
        text=f"Track Status: {row['Message']}",
        showlegend=False # Hide this trace from the legend
    ))

fig.show()

The percentage difference to the average performece is evenly distributed. While in the beginning of the qualifying lap times were always higher then the average performece, after lap 31 the performence per lap was always better then the average.

Improved lap times were apparently the result of getting accustomed to the circuit with each passing lap. As a result the fastest Lap for Carlos Sainz Junior was Lap 53:

In [17]:
filtered_qualifying.pick_fastest().LapNumber

np.float64(53.0)

In [102]:
# Plotting tire compound over time
fig = px.scatter(qualifying_without_outliers, x='LapNumber', y='Compound', color='Compound', template='none')
fig.update_layout(title='Tire Compound during Race',
                        xaxis_title='Lap Number',
                        yaxis_title='Tire Compound')
fig.show()

In [104]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

weather_df = qualifying.weather_data.copy() # Use weather data from qualifying session

# Create subplots with multiple y-axes
fig_weather = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces for each weather parameter
fig_weather.add_trace(go.Scatter(x=weather_df['Time'], y=weather_df['AirTemp'], mode='lines', name='Air Temperature'), secondary_y=False)
fig_weather.add_trace(go.Scatter(x=weather_df['Time'], y=weather_df['Humidity'], mode='lines', name='Humidity'), secondary_y=True)
fig_weather.add_trace(go.Scatter(x=weather_df['Time'], y=weather_df['WindSpeed'], mode='lines', name='Wind Speed'), secondary_y=True)
fig_weather.add_trace(go.Scatter(x=weather_df['Time'], y=weather_df['TrackTemp'], mode='lines', name='Track Temperature'), secondary_y=False)


# Add shaded regions for rainfall
for i in range(len(weather_df) - 1):
    if weather_df['Rainfall'].iloc[i] == True:
        fig_weather.add_shape(type="rect",
            x0=weather_df['Time'].iloc[i], y0=0, x1=weather_df['Time'].iloc[i+1], y1=1,
            xref='x', yref='paper',
            fillcolor="lightblue",
            opacity=0.5,
            layer="below",
            line_width=0,
        )


# Update layout
fig_weather.update_layout(
    title_text="Weather Data during Qualifying",
    template='none'
)

# Set y-axes titles
fig_weather.update_yaxes(title_text="Temperature (°C)", secondary_y=False)
fig_weather.update_yaxes(title_text="Humidity (%) / Wind Speed (m/s)", secondary_y=True)


fig_weather.show()

## Analysing Race Performence of Carlos Sainz Junior

In [68]:
filtered_race = race.laps[race.laps['Driver'].isin(choosen_drivers)]
filtered_race['Driver'].unique()

array(['SAI'], dtype=object)

Because within the race happened much more then within the qualifying I will plot the track status (SCDeployed, CleanTrack, etc.) over time as well as the race control messages to have a first overview about what happened in the race.
  

In [21]:
race.race_control_messages

Unnamed: 0,Time,Category,Message,Status,Flag,Scope,Sector,RacingNumber,Lap
0,2023-04-01 04:45:51,Other,RISK OF RAIN FOR F1 QUALIFYING SESSION IS 90%,,,,,,
1,2023-04-01 04:43:52,Other,LIGHT BLUE HEAD PADDING MATERIAL MUST BE USED,,,,,,
2,2023-04-01 04:45:51,Other,RISK OF RAIN FOR F1 QUALIFYING SESSION IS 90%,,,,,,
3,2023-04-01 05:00:00,Flag,GREEN LIGHT - PIT EXIT OPEN,,GREEN,Track,,,
4,2023-04-01 05:04:31,Flag,YELLOW IN TRACK SECTOR 19,,YELLOW,Sector,19.0,,
...,...,...,...,...,...,...,...,...,...
60,2023-04-01 06:15:42,Flag,CLEAR IN TRACK SECTOR 7,,CLEAR,Sector,7.0,,
61,2023-04-01 06:15:49,Flag,CLEAR IN TRACK SECTOR 18,,CLEAR,Sector,18.0,,
62,2023-04-01 06:16:17,Flag,CLEAR IN TRACK SECTOR 6,,CLEAR,Sector,6.0,,
63,2023-04-01 06:16:19,Flag,YELLOW IN TRACK SECTOR 13,,YELLOW,Sector,13.0,,


In [95]:
import plotly.graph_objects as go

# Plotting race control messages over time
fig_messages1 = px.scatter(race.race_control_messages, x='Time', y='Category', template='none')
fig_messages1.update_layout(title='Race Control Messages during Race', xaxis_title='Time', template='none')
fig_messages1.show()

fig_messages2 = px.scatter(race.race_control_messages, x='Time', y='Sector', color='Category', template='none')
fig_messages2.update_layout(title='Race Control Messages during Race', xaxis_title='Time', template='none')
fig_messages2.show()


As you can see from the plots above, the race had several incidents and changes in track status, which significantly impacted the ranking and overall race dynamics.

Let's analyze the lap times during the race, similar to what we did for qualifying, taking into account the outliers caused by race events.

In [33]:
# calculate thresholds for outliers for race lap times
Q1_race = filtered_race['LapTime'].quantile(0.25)
Q3_race = filtered_race['LapTime'].quantile(0.75)
IQR_race = Q3_race - Q1_race

upper_fence_race = Q3_race + 1.5 * IQR_race
lower_fence_race = Q1_race - 1.5 * IQR_race

# filter outliers from race lap times
without_outliers_race = filtered_race[(filtered_race['LapTime'] >= lower_fence_race) & (filtered_race['LapTime'] <= upper_fence_race)].copy()

# calculate percentage difference per lap for race
without_outliers_race['LapTimePercentageDifference'] = without_outliers_race['LapTime'].pct_change() * 100

In [34]:
import plotly.graph_objects as go

fig_race_pct_diff = go.Figure()
fig_race_pct_diff.add_trace(go.Bar(x=without_outliers_race['LapNumber'], y=without_outliers_race['LapTimePercentageDifference']))
fig_race_pct_diff.update_layout(title='Percentage Difference in Race Lap Time per Lap (Outliers Filtered)', xaxis_title='Lap Number', yaxis_title='Percentage Difference', template='none')

# Add vertical lines for track status changes to the race plot
for index, row in race.track_status.iterrows():
    # Find the closest lap number in without_outliers_race based on time
    closest_lap_race = without_outliers_race.loc[without_outliers_race['Time'].sub(row['Time']).abs().idxmin()]
    fig_race_pct_diff.add_vline(x=closest_lap_race['LapNumber'], line_width=1, line_dash="dash", line_color=track_status_colors[row['Message']], name=row['Message'])

fig_race_pct_diff.show()

Similar to the qualifying analysis, the percentage difference in lap times during the race, after filtering outliers, shows the variability in performance throughout the race. The vertical lines again indicate track status changes that might have influenced lap times.

In [35]:
# calculate percentage difference to average lap time for race
av_race = without_outliers_race['LapTime'].mean()
without_outliers_race['LapTimePercentageDifferenceToAverage'] = (without_outliers_race['LapTime'] - av_race).dt.total_seconds() / av_race.total_seconds() * 100

In [36]:
import plotly.graph_objects as go

fig_race_pct_diff_avg = go.Figure()
fig_race_pct_diff_avg.add_trace(go.Bar(x=without_outliers_race['LapNumber'], y=without_outliers_race['LapTimePercentageDifferenceToAverage']))
fig_race_pct_diff_avg.update_layout(title='Percentage Difference to Average Race Lap Time per Lap (Outliers Filtered)', xaxis_title='Lap Number', yaxis_title='Percentage Difference to Average', template='none')

# Add vertical lines for track status changes to the race plot
for index, row in race.track_status.iterrows():
    # Find the closest lap number in without_outliers_race based on time
    closest_lap_race = without_outliers_race.loc[without_outliers_race['Time'].sub(row['Time']).abs().idxmin()]
    fig_race_pct_diff_avg.add_vline(x=closest_lap_race['LapNumber'], line_width=1, line_dash="dash", line_color=track_status_colors[row['Message']], name=row['Message'])

fig_race_pct_diff_avg.show()

This plot shows the percentage difference of each lap time compared to the average lap time during the race, again with outliers removed. It gives a clearer picture of how consistently Carlos Sainz Jr. was performing relative to his average pace throughout the race, considering the various race events.