# Background

What I love about Formula 1 is the attention every team pays to each ounce of performance it can gain, on and off the track. The sport's pursuit of continuous improvement is truly admirable. On that note, I was very interested in the pit stop data and how it might provide insight into parts of the sport I had not considered before.

Since the beginning of Formula 1, the pit stop has been an iconic part of the race format. In the early days of the sport, the time spent in the pit lane made up a significant portion of the race. Being the optimizers they are, teams saw that time spent in the pits was a key place to shave meaningful time off a race, which ultimately translated into success on the track. Technologies were developed and processes evaluated to minimize pit times, and rules changed over time to adapt to both performance and safety developments.

Technologies have gotten to a point where cars no longer need to be pitted to refuel or change tyres; however, the pit stop remains a mainstay of the sport and adds an element of uncertainty and strategy, making the sport all the more exciting for us spectators. This notebook will explore the pit stop data and shed some light on how important the pit stop is in a race and what it means to have a "good" pit stop.

# Objective

This notebook explores F1 pit stop data to better understand the role the pit stop plays in an F1 Grand Prix. Using the findings, I will attempt to answer the questions below, which will hopefully yield some new perspectives on the sport!

## Questions I want to answer:
* How did pit stop durations change over time?
* Is there a relationship between pit stop durations and constructor?
* Is there a relationship between pit stop durations and race circuit?
* What is the time spent in the pit lane as a percentage of the race?
* Who is the best constructor on pit stop performance?

Without any further ado, let's go!

<div style="width:100%; text-align:center"><img style="align:middle; width:100%" src="https://c.tenor.com/tg28WEO2hLoAAAAC/f1-pitstop.gif"></div>

# Setup

The usual boring but necessary stuff to get all set up: importing libraries, reading in data, and cleaning it up. You know the drill.

In [96]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

import os
fnames = []
fpaths = []
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        fnames.append(filename.split('.')[0])
        fpaths.append(os.path.join(dirname, filename))
        print(os.path.join(dirname, filename))

## Reading the data

In [97]:
# Constructor color mapping
constructor_color_map = {
    'Toro Rosso':'#0000FF',
    'Mercedes':'#6CD3BF',
    'Red Bull':'#1E5BC6',
    'Ferrari':'#ED1C24',
    'Williams':'#37BEDD',
    'Force India':'#FF80C7',
    'Virgin':'#c82e37',
    'Renault':'#FFD800',
    'McLaren':'#F58020',
    'Sauber':'#006EFF',
    'Lotus':'#FFB800',
    'HRT':'#b2945e',
    'Caterham':'#0b361f',
    'Lotus F1':'#FFB800',
    'Marussia':'#6E0000',
    'Manor Marussia':'#6E0000',
    'Haas F1 Team':'#B6BABD',
    'Racing Point':'#F596C8',
    'Aston Martin':'#2D826D',
    'Alfa Romeo':'#B12039',
    'AlphaTauri':'#4E7C9B',
    'Alpine F1 Team':'#2293D1'
}
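
The cells that follow refer to tables such as `pitStops`, `races`, `circuits`, `constructors`, `drivers`, and `results`, and the merges look them up with `right_index=True`. A minimal sketch of how such tables could be read is shown here. The helper name `load_tables`, the file name `races.csv`, the `raceId` index column, and the `\N` missing-value marker are all assumptions for illustration, not details confirmed by this notebook.

```python
import os
import tempfile

import pandas as pd


def load_tables(paths, index_cols=None):
    """Read each CSV into a DataFrame keyed by its file stem (hypothetical helper)."""
    index_cols = index_cols or {}
    tables = {}
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]
        # Treat a literal backslash-N as missing (assumption about the export format)
        tables[name] = pd.read_csv(path, na_values="\\N", index_col=index_cols.get(name))
    return tables


# Tiny stand-in file so the sketch runs without the Kaggle input directory
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "races.csv")
with open(demo_path, "w") as fh:
    fh.write("raceId,year,round,time\n841,2011,1,06:00:00\n842,2011,2,\\N\n")

tables = load_tables([demo_path], index_cols={"races": "raceId"})
races_demo = tables["races"]
print(races_demo)
```

Indexing each lookup table by its ID column is what would let a later `pd.merge(..., left_on='raceId', right_index=True)` find the matching row.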

# Pit Stop Data

<b>Note:</b> In the context of the data I'm using, pit stop durations include the total time in the pit lane and not only when the car is stationary.  

In [98]:
pitStops

Unnamed: 0,raceId,driverId,stop,lap,pitTime,duration,milliseconds,seconds
0,841,153,1,1,17:05:23,26.898,26898,26.898
1,841,30,1,1,17:05:52,25.021,25021,25.021
2,841,17,1,11,17:20:48,23.426,23426,23.426
3,841,4,1,12,17:22:34,23.251,23251,23.251
4,841,13,1,13,17:24:10,23.842,23842,23.842
...,...,...,...,...,...,...,...,...
10985,1132,807,2,39,16:06:28,30.265,30265,30.265
10986,1132,840,2,39,16:06:33,29.469,29469,29.469
10987,1132,839,4,38,16:06:52,29.086,29086,29.086
10988,1132,815,4,47,16:20:38,28.871,28871,28.871


In [99]:
pitStops.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
raceId,10990.0,975.731665,89.041843,841.0,893.0,967.0,1055.0,1132.0
driverId,10990.0,542.601274,385.555797,1.0,20.0,817.0,832.0,860.0
stop,10990.0,1.797179,1.540691,1.0,1.0,2.0,2.0,70.0
lap,10990.0,25.314741,14.896984,1.0,13.0,25.0,36.0,78.0
milliseconds,10990.0,85304.309554,311489.432628,12897.0,21951.25,23629.0,26503.5,3069017.0
seconds,10990.0,85.30431,311.489433,12.897,21.95125,23.629,26.5035,3069.017
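
The summary above shows a mean of about 85 s against a median of about 24 s, with a maximum above 3,000 s, which suggests a few very long stops pull the mean upward. The sketch below uses made-up durations to show how a 50-second cutoff, like the one the later cells apply, brings the mean back in line with the median.

```python
import pandas as pd

# Illustrative durations in seconds (made up): typical stops plus one very long one
demo = pd.Series([21.9, 23.6, 24.1, 26.5, 3069.0], name="seconds")

raw_mean = demo.mean()
raw_median = demo.median()

# Same 50-second cutoff the later cells apply before plotting
trimmed = demo[demo < 50]
trimmed_mean = trimmed.mean()

print(f"mean={raw_mean:.1f}s median={raw_median:.1f}s trimmed mean={trimmed_mean:.1f}s")
```

The median barely moves when the long stop is removed, which is why it is the more robust reference for a "typical" stop.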


In [100]:
newResults = pd.merge(results,races,left_on='raceId',right_index=True,how='left')
newResults = pd.merge(newResults,circuits,left_on='circuitId',right_index=True,how='left')
newResults = pd.merge(newResults,constructors,left_on='constructorId',right_index=True,how='left')
newResults = pd.merge(newResults,drivers,left_on='driverId',right_index=True,how='left')
newResults

resultId,raceId,driverId,constructorId,number_x,grid,position,positionText,positionOrder,points,laps,...,constructorUrl,driverRef,number_y,code,forename,surname,dob,driverNationality,driverUrl,driverName
1,18,1,1,22.0,1,1.0,1,1,10.0,58,...,http://en.wikipedia.org/wiki/McLaren,hamilton,44.0,HAM,Lewis,Hamilton,1985-01-07,British,http://en.wikipedia.org/wiki/Lewis_Hamilton,Lewis Hamilton
2,18,2,2,3.0,5,2.0,2,2,8.0,58,...,http://en.wikipedia.org/wiki/BMW_Sauber,heidfeld,,HEI,Nick,Heidfeld,1977-05-10,German,http://en.wikipedia.org/wiki/Nick_Heidfeld,Nick Heidfeld
3,18,3,3,7.0,7,3.0,3,3,6.0,58,...,http://en.wikipedia.org/wiki/Williams_Grand_Pr...,rosberg,6.0,ROS,Nico,Rosberg,1985-06-27,German,http://en.wikipedia.org/wiki/Nico_Rosberg,Nico Rosberg
4,18,4,4,5.0,11,4.0,4,4,5.0,58,...,http://en.wikipedia.org/wiki/Renault_in_Formul...,alonso,14.0,ALO,Fernando,Alonso,1981-07-29,Spanish,http://en.wikipedia.org/wiki/Fernando_Alonso,Fernando Alonso
5,18,5,1,23.0,3,5.0,5,5,4.0,58,...,http://en.wikipedia.org/wiki/McLaren,kovalainen,,KOV,Heikki,Kovalainen,1981-10-19,Finnish,http://en.wikipedia.org/wiki/Heikki_Kovalainen,Heikki Kovalainen
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26520,1132,839,214,31.0,18,16.0,16,16,0.0,50,...,http://en.wikipedia.org/wiki/Alpine_F1_Team,ocon,31.0,OCO,Esteban,Ocon,1996-09-17,French,http://en.wikipedia.org/wiki/Esteban_Ocon,Esteban Ocon
26521,1132,815,9,11.0,0,17.0,17,17,0.0,50,...,http://en.wikipedia.org/wiki/Red_Bull_Racing,perez,11.0,PER,Sergio,Pérez,1990-01-26,Mexican,http://en.wikipedia.org/wiki/Sergio_P%C3%A9rez,Sergio Pérez
26522,1132,855,15,24.0,14,18.0,18,18,0.0,50,...,http://en.wikipedia.org/wiki/Sauber_Motorsport,zhou,24.0,ZHO,Guanyu,Zhou,1999-05-30,Chinese,http://en.wikipedia.org/wiki/Zhou_Guanyu,Guanyu Zhou
26523,1132,847,131,63.0,1,,R,19,0.0,33,...,http://en.wikipedia.org/wiki/Mercedes-Benz_in_...,russell,63.0,RUS,George,Russell,1998-02-15,British,http://en.wikipedia.org/wiki/George_Russell_(r...,George Russell


In [101]:
newPitStops = pd.merge(pitStops,races,left_on='raceId',right_index=True,how='left')
newPitStops = pd.merge(newPitStops,circuits,left_on='circuitId',right_index=True,how='left')
newPitStops = pd.merge(newPitStops,newResults[['raceId','driverId','driverName','constructorId','constructorName']],left_on=['raceId','driverId'],right_on=['raceId','driverId'])
newPitStops

Unnamed: 0,raceId,driverId,stop,lap,pitTime,duration,milliseconds,seconds,year,round,...,circuitName,circuitLocation,circuitCountry,lat,lng,alt,circuitUrl,driverName,constructorId,constructorName
0,841,153,1,1,17:05:23,26.898,26898,26.898,2011,1,...,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.96800,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...,Jaime Alguersuari,5,Toro Rosso
1,841,30,1,1,17:05:52,25.021,25021,25.021,2011,1,...,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.96800,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...,Michael Schumacher,131,Mercedes
2,841,17,1,11,17:20:48,23.426,23426,23.426,2011,1,...,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.96800,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...,Mark Webber,9,Red Bull
3,841,4,1,12,17:22:34,23.251,23251,23.251,2011,1,...,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.96800,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...,Fernando Alonso,6,Ferrari
4,841,13,1,13,17:24:10,23.842,23842,23.842,2011,1,...,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.96800,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...,Felipe Massa,6,Ferrari
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10985,1132,807,2,39,16:06:28,30.265,30265,30.265,2024,12,...,Silverstone Circuit,Silverstone,UK,52.0786,-1.01694,153,http://en.wikipedia.org/wiki/Silverstone_Circuit,Nico Hülkenberg,210,Haas F1 Team
10986,1132,840,2,39,16:06:33,29.469,29469,29.469,2024,12,...,Silverstone Circuit,Silverstone,UK,52.0786,-1.01694,153,http://en.wikipedia.org/wiki/Silverstone_Circuit,Lance Stroll,117,Aston Martin
10987,1132,839,4,38,16:06:52,29.086,29086,29.086,2024,12,...,Silverstone Circuit,Silverstone,UK,52.0786,-1.01694,153,http://en.wikipedia.org/wiki/Silverstone_Circuit,Esteban Ocon,214,Alpine F1 Team
10988,1132,815,4,47,16:20:38,28.871,28871,28.871,2024,12,...,Silverstone Circuit,Silverstone,UK,52.0786,-1.01694,153,http://en.wikipedia.org/wiki/Silverstone_Circuit,Sergio Pérez,9,Red Bull


In [102]:
# Total pit lane time per driver per race (the constructor and driver names
# are carried along so they can serve as merge keys)
pitStopSummary = newPitStops.groupby(by=['raceId','driverId','constructorName','driverName'])['milliseconds'].sum().reset_index()

# Attach the pit totals to the race results; results without recorded stops keep NaN
raceResults = pd.merge(newResults,pitStopSummary,on=['raceId','driverId','constructorName','driverName'],how='left')

# After the merge, 'milliseconds_x' is the race time from newResults and
# 'milliseconds_y' is the summed pit time from pitStopSummary
raceResults['pitPercentage'] = raceResults['milliseconds_y']/raceResults['milliseconds_x']*100
raceResults

Unnamed: 0,raceId,driverId,constructorId,number_x,grid,position,positionText,positionOrder,points,laps,...,number_y,code,forename,surname,dob,driverNationality,driverUrl,driverName,milliseconds_y,pitPercentage
0,18,1,1,22.0,1,1.0,1,1,10.0,58,...,44.0,HAM,Lewis,Hamilton,1985-01-07,British,http://en.wikipedia.org/wiki/Lewis_Hamilton,Lewis Hamilton,,
1,18,2,2,3.0,5,2.0,2,2,8.0,58,...,,HEI,Nick,Heidfeld,1977-05-10,German,http://en.wikipedia.org/wiki/Nick_Heidfeld,Nick Heidfeld,,
2,18,3,3,7.0,7,3.0,3,3,6.0,58,...,6.0,ROS,Nico,Rosberg,1985-06-27,German,http://en.wikipedia.org/wiki/Nico_Rosberg,Nico Rosberg,,
3,18,4,4,5.0,11,4.0,4,4,5.0,58,...,14.0,ALO,Fernando,Alonso,1981-07-29,Spanish,http://en.wikipedia.org/wiki/Fernando_Alonso,Fernando Alonso,,
4,18,5,1,23.0,3,5.0,5,5,4.0,58,...,,KOV,Heikki,Kovalainen,1981-10-19,Finnish,http://en.wikipedia.org/wiki/Heikki_Kovalainen,Heikki Kovalainen,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26514,1132,839,214,31.0,18,16.0,16,16,0.0,50,...,31.0,OCO,Esteban,Ocon,1996-09-17,French,http://en.wikipedia.org/wiki/Esteban_Ocon,Esteban Ocon,115146.0,
26515,1132,815,9,11.0,0,17.0,17,17,0.0,50,...,11.0,PER,Sergio,Pérez,1990-01-26,Mexican,http://en.wikipedia.org/wiki/Sergio_P%C3%A9rez,Sergio Pérez,118875.0,
26516,1132,855,15,24.0,14,18.0,18,18,0.0,50,...,24.0,ZHO,Guanyu,Zhou,1999-05-30,Chinese,http://en.wikipedia.org/wiki/Zhou_Guanyu,Guanyu Zhou,121224.0,
26517,1132,847,131,63.0,1,,R,19,0.0,33,...,63.0,RUS,George,Russell,1998-02-15,British,http://en.wikipedia.org/wiki/George_Russell_(r...,George Russell,32045.0,
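
Several rows above have a pit total but an empty `pitPercentage`. One likely reason is that the race time is missing for those results, so the division yields NaN. The sketch below reproduces the calculation on made-up numbers; the `_race` and `_pit` suffixes are illustrative choices, whereas the notebook relies on the default `_x` and `_y`.

```python
import numpy as np
import pandas as pd

# Made-up race results: total race time per driver in milliseconds (NaN = no recorded time)
results_demo = pd.DataFrame({
    "raceId": [1, 1, 1],
    "driverId": [10, 11, 12],
    "milliseconds": [5_400_000.0, 5_460_000.0, np.nan],
})

# Made-up pit stops: driver 10 stops twice, drivers 11 and 12 once
pit_demo = pd.DataFrame({
    "raceId": [1, 1, 1, 1],
    "driverId": [10, 10, 11, 12],
    "milliseconds": [24_000, 22_000, 27_300, 25_000],
})

# Sum the stops per driver per race, then attach the totals to the results
pit_totals = pit_demo.groupby(["raceId", "driverId"], as_index=False)["milliseconds"].sum()
merged = results_demo.merge(
    pit_totals, on=["raceId", "driverId"], how="left", suffixes=("_race", "_pit")
)
merged["pitPercentage"] = merged["milliseconds_pit"] / merged["milliseconds_race"] * 100
print(merged)
```

Driver 12 keeps a pit total but gets a NaN percentage, mirroring the pattern in the output above.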


# Exploratory Data Analysis

## How did pit stop durations change over time?

Main observations:
* Average pit times had a meaningful increase from 2013 to 2014
* Average pit times have been fairly stable from 2014 onwards
* The majority of pit times are clustered around 20-35s
* Pit durations appear to have more variance in recent years
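
The observations about average level and variance can be checked numerically alongside the charts. A minimal sketch on made-up data follows; with the real data, `demo` would be replaced by `newPitStops`.

```python
import pandas as pd

# Made-up stops; the 80-second stop is removed by the 50-second cutoff
demo = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "seconds": [21.0, 22.0, 23.0, 24.0, 25.0, 80.0],
})

# Count, mean, and standard deviation of pit durations per season
yearly = (
    demo[demo["seconds"] < 50]
    .groupby("year")["seconds"]
    .agg(["count", "mean", "std"])
)
print(yearly)
```

Reading the `mean` column across years shows the level shift, and the `std` column shows whether the spread is growing.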

In [103]:
fig = px.line(
    newPitStops[newPitStops['seconds'] < 50]  # exclude stops of 50 s or longer as outliers
    .groupby(by=['year', 'constructorName'])['seconds'].mean()  # 'seconds' already holds the numeric pit duration
    .reset_index(),
    x='year',
    y='seconds',
    color='constructorName',
    color_discrete_map=constructor_color_map,
)
fig.update_layout(title_text='Average Pit Stop Durations by Constructor')
fig.show()

In [104]:
fig = px.scatter(newPitStops[newPitStops['seconds']<50],
                 x='date',
                 y='seconds',
                 color='constructorName',
                 color_discrete_map=constructor_color_map,
                )
fig.update_layout(
    title_text='Pit Stop Durations over Time by Constructor',
)
fig.show()

In [105]:
fig = px.box(newPitStops[newPitStops['seconds']<50],
                 x='date',
                 y='seconds',
                 color='constructorName',
                 color_discrete_map=constructor_color_map,
                )
fig.update_layout(
    title_text='Pit Stop Durations over Time by Constructor',
)
fig.show()

## Is there a relationship between pit stop durations and constructors?

Main observations:
* Constructors on average are fairly similar in pit durations
* No significant performance discrepancy, only minor variations

In [106]:
fig = px.box(newPitStops[newPitStops['seconds']<50]
             .groupby(by=['raceId','date','constructorName'])['seconds']
             .mean()  # average pit duration per constructor per race
             .reset_index()
             .sort_values(by='seconds',ascending=True),
             x='constructorName',
             y='seconds',
             color='constructorName',
             color_discrete_map=constructor_color_map,
            )
fig.update_layout(
    title_text='Pit Stop Durations by Constructor from 2011 to date',
)
fig.show()

In [107]:
year = 2024
# Average pit duration per constructor per race for the chosen season
fig = px.box(newPitStops[(newPitStops['seconds']<50)&(newPitStops['year']==year)]
             .groupby(by=['raceId','date','constructorName'])['seconds']
             .mean()
             .reset_index()
             .sort_values(by='seconds',ascending=True),
             x='constructorName',
             y='seconds',
             color='constructorName',
             color_discrete_map=constructor_color_map,
            )
fig.update_layout(
    title_text=f'Pit Stop Durations by Constructor for {year} Season',
)
fig.show()


## Is there a relationship between pit stop durations and race circuit?

Main Observations:
* Race circuits appear to have a more significant impact on overall pit duration than constructors do
* Race circuits appear to have an impact on total pit time over the course of the race
* Some circuits have larger variances, but on average the variance from track to track appears to be fairly consistent
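
One rough way to compare the influence of circuit and constructor is to look at how far the group means spread apart under each grouping. The sketch below uses made-up data and a hypothetical helper, `spread_of_group_means`; it is an illustration of the idea, not a formal statistical test.

```python
import pandas as pd

# Made-up stops: circuit B is slower for everyone, constructors differ only slightly
demo = pd.DataFrame({
    "circuitName": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "constructorName": ["X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "seconds": [21.0, 22.0, 29.0, 30.0, 22.0, 23.0, 30.0, 31.0],
})


def spread_of_group_means(df, key, value="seconds"):
    """Standard deviation of the per-group means: larger means the groups differ more."""
    return df.groupby(key)[value].mean().std()


circuit_spread = spread_of_group_means(demo, "circuitName")
constructor_spread = spread_of_group_means(demo, "constructorName")
print(f"circuit spread={circuit_spread:.2f}s constructor spread={constructor_spread:.2f}s")
```

A much larger spread across circuits than across constructors would be consistent with the observation that the circuit matters more.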

In [108]:
# Average pit duration per race, grouped by the circuit the race was held on
fig = px.box(newPitStops[newPitStops['seconds']<50].groupby(by=['raceId','circuitName'])['seconds'].mean().reset_index().sort_values(by='seconds',ascending=True),
             x='circuitName',
             y='seconds',
            )
fig.update_layout(
    title_text='Pit Stop Durations by Race Circuit',
)
fig.show()


In [109]:
fig = px.scatter(newPitStops[newPitStops['seconds']<50]
                 .groupby(by=['circuitName'])['seconds'].mean()  # mean pit duration per circuit
                 .reset_index()
                 .sort_values(by='seconds',ascending=True),
                 x='circuitName',
                 y='seconds',
                )
fig.update_layout(
    title_text='Average Race Pit Stop Durations by Circuit',
)
fig.show()

In [110]:
# Average pit duration per constructor per race, split by circuit
group_columns = ['raceId', 'circuitName', 'constructorName']
fig = px.box(newPitStops[newPitStops['seconds'] < 50].groupby(by=group_columns)['seconds'].mean().reset_index().sort_values(by='seconds', ascending=True),
             x='circuitName',
             y='seconds',
             color='constructorName',
             color_discrete_map=constructor_color_map,
             )
fig.update_layout(
    title_text='Average Race Pit Stop Durations by Race Circuit',
)
fig.show()


## Total Time in the Pit Lane

In [111]:
# Total pit lane time per driver per race, then the average of those totals per race.
# Only the duration columns are summed; summing every numeric column would also
# add up identifiers such as raceId and year.
driver_totals = newPitStops[newPitStops['seconds'] < 50].groupby(by=['raceId', 'circuitName', 'driverId'], as_index=False)[['milliseconds', 'seconds']].sum()
result = driver_totals.groupby(by=['raceId', 'circuitName'])[['milliseconds', 'seconds']].mean()
result


In [112]:
# Total pit lane time per driver per race; only 'seconds' is summed so that
# identifier columns are left untouched.
fig = px.box(newPitStops[newPitStops['seconds'] < 50].groupby(by=['raceId', 'circuitName', 'driverId'], as_index=False)['seconds'].sum().sort_values(by='seconds', ascending=True),
             x='circuitName',
             y='seconds',
            )
fig.update_layout(
    title_text='Total Time Spent in Pit Lane by Circuit',
)
fig.show()

## Percentage of race spent in the pit lane

Main Observations:
* Unsurprisingly, the pit percentage findings closely mirror the average pit time findings, including the correlation with circuit
* There doesn't appear to be much of a correlation between percentage of time in the pit and the race outcome
* Pit percentage does not appear to yield any interesting insights
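
The claim about pit percentage and race outcome can be backed with a rank correlation in addition to the scatter plot. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, on made-up values; with the real data the two columns would come from `raceResults`.

```python
import pandas as pd

# Made-up pit percentages and finishing positions for five results
demo = pd.DataFrame({
    "pitPercentage": [0.6, 0.9, 0.7, 1.1, 0.8],
    "positionOrder": [3, 4, 1, 2, 5],
})

# Spearman's rho is the Pearson correlation of the ranked values
rho = demo["pitPercentage"].rank().corr(demo["positionOrder"].rank())
print(f"Spearman rho: {rho:.2f}")
```

A value near zero, as in this toy example, would support the observation that pit percentage says little about finishing position.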

In [113]:
# Average pit percentage per constructor per race, split by circuit
fig = px.box(
    raceResults[raceResults['pitPercentage'] < 10]
    .groupby(by=['raceId', 'circuitName', 'constructorName'], as_index=False)['pitPercentage']
    .mean()
    .sort_values(by='pitPercentage', ascending=True),
    x='circuitName',
    y='pitPercentage',
    color='constructorName',
    color_discrete_map=constructor_color_map,
)
fig.update_layout(
    title_text='Average Race Percentage in the Pit Lane by Race Circuit',
)
# Dashed reference line at the overall average pit percentage
avg_pit_percentage = raceResults[raceResults['pitPercentage'] < 10]['pitPercentage'].mean()
fig.add_hline(
    y=avg_pit_percentage,
    line_dash='dash',
    annotation_text=f"Average pit percentage: {avg_pit_percentage:.2f}%"
)
fig.show()

In [114]:
# Average pit percentage and finishing position per constructor per race
fig = px.scatter(raceResults[raceResults['pitPercentage']<10].groupby(by=['raceId','circuitName','constructorName'], as_index=False)[['pitPercentage','positionOrder']].mean(),
                 x='pitPercentage',
                 y='positionOrder',
                 color='constructorName',
                 color_discrete_map=constructor_color_map,
                )
fig.show()

## What is a "good" pit stop?

After exploring some relationships that pit stops might have with other features, it is clear that the circuit has a meaningful effect on the measured pit times. To get a good reference for what a "good" pit stop is, we can take a look at the distribution of all the pit stops available.

Some minor normalization by circuit may be needed to make the comparison of a particular pit time more meaningful.
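
One simple form of circuit normalization is to express each stop relative to the median stop at the same circuit. This is a sketch on made-up data, and the column name `secondsVsCircuit` is a hypothetical choice.

```python
import pandas as pd

# Made-up stops at two circuits with different typical pit lane times
demo = pd.DataFrame({
    "circuitName": ["A", "A", "A", "B", "B", "B"],
    "seconds": [21.0, 22.0, 26.0, 28.0, 30.0, 35.0],
})

# transform() returns one value per row, so it lines up with the original frame
circuit_median = demo.groupby("circuitName")["seconds"].transform("median")
demo["secondsVsCircuit"] = demo["seconds"] - circuit_median
print(demo)
```

Negative values then mean "faster than typical for this circuit", which makes stops from different tracks comparable.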

In [115]:
# Compute the filtered durations once and reuse them for the benchmark lines
valid_seconds = newPitStops[newPitStops['seconds']<50]['seconds']
fig = px.histogram(newPitStops[newPitStops['seconds']<50],
                 x='seconds',
                )
fig.update_layout(
    title_text='Pit Stop Duration Distribution',
)
fig.add_vline(x=valid_seconds.mean(),annotation_text=f"Average: {valid_seconds.mean():.2f}s")
fig.add_vline(x=valid_seconds.quantile(0.1),line_dash='dash',annotation_text=f"Top Decile: {valid_seconds.quantile(0.1):.2f}s")
fig.add_vline(x=valid_seconds.quantile(0.9),line_dash='dash',annotation_text=f"Bottom Decile: {valid_seconds.quantile(0.9):.2f}s")
fig.update_traces(opacity=0.9)
fig.show()

### Circuit Specific Benchmarks

Double-click a circuit in the legend to see its circuit-specific pit time distribution.

In [116]:
# Compute the filtered durations once and reuse them for the benchmark lines
valid_seconds = newPitStops[newPitStops['seconds']<50]['seconds']
fig = px.histogram(newPitStops[newPitStops['seconds']<50],
                 x='seconds',
                 color='circuitName',
                )
fig.update_layout(
    title_text='Pit Stop Duration Distribution by Circuit',
    barmode='overlay',
)
fig.add_vline(x=valid_seconds.mean(),annotation_text=f"Average: {valid_seconds.mean():.2f}s",annotation_position='top')
fig.add_vline(x=valid_seconds.quantile(0.1),line_dash='dash',annotation_text=f"Top Decile: {valid_seconds.quantile(0.1):.2f}s",annotation_position='top left')
fig.add_vline(x=valid_seconds.quantile(0.9),line_dash='dash',annotation_text=f"Bottom Decile: {valid_seconds.quantile(0.9):.2f}s",annotation_position='bottom right')
fig.update_traces(opacity=0.9)
fig.show()

# So who does it best?

As far as performance is concerned, speed and consistency seem to be the two main factors that indicate a great team. Average pit time gives an indication of both, as an expected performance metric; obviously, the lower the average the better. Standard deviation is another aspect we can look at to evaluate how consistently a team performs.
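
Since speed and consistency are judged separately, one possible way to combine them is to rank constructors on each and average the two ranks. This is only a sketch on made-up data; the equal weighting and the `combinedRank` column are assumptions, not a method used elsewhere in this notebook.

```python
import pandas as pd

# Made-up stops for three hypothetical constructors
demo = pd.DataFrame({
    "constructorName": ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"],
    "seconds": [22.0, 23.0, 24.0, 21.0, 23.0, 28.0, 25.0, 25.5, 26.0],
})

# Mean captures speed, standard deviation captures consistency
summary = demo.groupby("constructorName")["seconds"].agg(["mean", "std"])
summary["meanRank"] = summary["mean"].rank()
summary["stdRank"] = summary["std"].rank()
summary["combinedRank"] = (summary["meanRank"] + summary["stdRank"]) / 2
summary = summary.sort_values("combinedRank")
print(summary)
```

Here the slowest constructor still places second overall because it is the most consistent, which shows how the two criteria can pull in different directions.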

### Ranking on Average Pit Time

In [117]:
year = 2021
newPitStops[(newPitStops['seconds']<50)&(newPitStops['year']==year)].groupby(by='constructorName')['seconds'].describe().sort_values(by='mean')

constructorName,count,mean,std,min,25%,50%,75%,max
Red Bull,78.0,23.978026,4.623024,15.277,21.392,22.744,25.38925,44.608
Mercedes,75.0,24.118133,4.659454,15.432,21.5555,22.68,25.5,40.266
Ferrari,63.0,24.301857,4.996534,15.092,21.522,23.064,26.379,42.786
McLaren,66.0,24.527,4.924235,14.994,21.4925,23.5255,26.5325,38.267
Aston Martin,66.0,24.7995,5.382271,14.945,21.4905,23.55,26.13175,43.124
Williams,67.0,24.93691,5.009406,18.153,21.9615,23.681,26.507,46.315
Alpine F1 Team,61.0,24.983246,5.237372,15.432,21.452,23.844,29.116,40.8
Alfa Romeo,68.0,25.003559,5.171887,14.881,21.7985,24.0195,28.4005,37.19
AlphaTauri,68.0,25.212441,5.380044,14.943,21.75,24.3225,28.79725,40.74
Haas F1 Team,69.0,25.39829,5.636702,15.054,22.113,24.293,27.186,49.729


### Ranking on Consistency

In [118]:
newPitStops[(newPitStops['seconds']<50)&(newPitStops['year']==year)].groupby(by='constructorName')['seconds'].describe().sort_values(by='std')

constructorName,count,mean,std,min,25%,50%,75%,max
Red Bull,78.0,23.978026,4.623024,15.277,21.392,22.744,25.38925,44.608
Mercedes,75.0,24.118133,4.659454,15.432,21.5555,22.68,25.5,40.266
McLaren,66.0,24.527,4.924235,14.994,21.4925,23.5255,26.5325,38.267
Ferrari,63.0,24.301857,4.996534,15.092,21.522,23.064,26.379,42.786
Williams,67.0,24.93691,5.009406,18.153,21.9615,23.681,26.507,46.315
Alfa Romeo,68.0,25.003559,5.171887,14.881,21.7985,24.0195,28.4005,37.19
Alpine F1 Team,61.0,24.983246,5.237372,15.432,21.452,23.844,29.116,40.8
AlphaTauri,68.0,25.212441,5.380044,14.943,21.75,24.3225,28.79725,40.74
Aston Martin,66.0,24.7995,5.382271,14.945,21.4905,23.55,26.13175,43.124
Haas F1 Team,69.0,25.39829,5.636702,15.054,22.113,24.293,27.186,49.729


### Constructor Specific Performance

In [119]:
# Benchmark lines use all seasons, while the histogram shows only the selected year
valid_seconds = newPitStops[newPitStops['seconds']<50]['seconds']
fig = px.histogram(newPitStops[(newPitStops['seconds']<50)&(newPitStops['year']==year)],
                 x='seconds',
                 color='constructorName',
                 color_discrete_map=constructor_color_map,
                )
fig.update_layout(
    title_text='Pit Stop Duration Distribution by Constructor',
    barmode='overlay',
)
fig.add_vline(x=valid_seconds.mean(),annotation_text=f"Average: {valid_seconds.mean():.2f}s",annotation_position='top')
fig.add_vline(x=valid_seconds.quantile(0.1),line_dash='dash',annotation_text=f"Top Decile: {valid_seconds.quantile(0.1):.2f}s",annotation_position='top left')
fig.add_vline(x=valid_seconds.quantile(0.9),line_dash='dash',annotation_text=f"Bottom Decile: {valid_seconds.quantile(0.9):.2f}s",annotation_position='bottom right')

fig.update_traces(opacity=0.5)
fig.show()

# Conclusions

Pit stops are cool and play an integral part in F1, but optimizing them is probably a waste of time (at least in the sport's current state). On average, pit time accounts for less than 1% of the race time (0.83%). Your efforts are likely better spent in other areas.

### How did pit stop durations change over time?
Average pit stop durations increased from 2013 to 2014 and have stayed fairly stable since then. Pit stop times have also shown more variance over the past couple of years.

### Is there a relationship between pit stop durations and constructor?
Not a meaningful one. There are some differences between the constructors; however, they don't appear to have a significant effect on the race outcome.

### Is there a relationship between pit stop durations and race circuit?
Yes. The circuits have an impact on the overall time spent in the pit lane, whether through the number of stops, the track layout, or the length of the pit lane.

### What is the time spent in the pit lane as a percentage of the race?
Average time spent in the pit lane is about 0.83% of the race time.

### Who is the best constructor on pit stop performance (for 2021)?
1. Red Bull
2. Mercedes
3. Ferrari

<img style="width:100%" src="https://www.wsupercars.com/thumbnails-wide/Formula-1/Red-Bull-Racing/2022-Formula1-Red-Bull-Racing-RB18-001.jpg">