# pit-perfect
An exploration of strategy choices during the 2024 Brazilian Grand Prix. This race was incredibly diverse in strategy, with many stories within it. Who made the most of the safety car restarts, and who might have failed to capitalise on optimal pit windows?

Why? F1 interests me in many ways. Look at Pierre Gasly's maiden win and, more recently, Hülkenberg's surprise podium late in his career. Although fortuitous circumstances aided both achievements, strategy generally comes down to pitting at the right moment while maintaining pace and keeping track position.

What I hope to learn: I want to become more comfortable working with Python and analysing somewhat disconnected and messy data sets from APIs and other sources. Watching this race, I was also curious about what on earth was going on strategy-wise and how the teams managed their races.

## 1. Preliminaries

### 1.1 Import Libraries

In [163]:
# Import libraries
import pandas as pd
import requests
from datetime import datetime
import json


### 1.2 Import Data

In [164]:
lap_times_df = pd.read_csv('data/lap_times.csv')
circuits_df = pd.read_csv('data/circuits.csv')
races_df = pd.read_csv('data/races.csv')
drivers_df = pd.read_csv('data/drivers.csv')
pit_stops_df = pd.read_csv('data/pit_stops.csv')  # assumes the pit stop data lives in data/pit_stops.csv

# circuits (circuitId) <- races (raceId) <- lap_times -> drivers (driverId)


### 1.3 Extract Lap Times
The weather API has no lap-by-lap data, so we need to map weather readings to each lap ourselves.
Likewise, no API provides a timestamp for the start of each lap, so we will derive these from the lap data.
Fortunately, this is quite simple. In a race, the leader determines when the next lap starts, so for each lap number we take the earliest start time across all drivers; that is when the lap began. We then match each lap start to the weather reading closest in time, which gives approximate lap-by-lap weather.
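The matching step described above can be sketched with `pd.merge_asof`, which joins each row to the nearest key in another frame. This is a minimal sketch on toy data rather than the notebook's actual join: the values below are hypothetical, the names `laps`, `weather` and `lap_weather` are made up for illustration, and only the column names (`lap_date_start`, `date`, `track_temperature`, `rainfall`) mirror the data used in this notebook.

```python
import pandas as pd

# Toy lap start times (hypothetical values, shaped like the leader lap table)
laps = pd.DataFrame({
    "lap_number": [1, 2, 3],
    "lap_date_start": pd.to_datetime([
        "2024-11-03T15:50:00+00:00",
        "2024-11-03T15:51:28+00:00",
        "2024-11-03T15:52:54+00:00",
    ], utc=True),
})

# Toy weather readings at roughly one-minute intervals (hypothetical values)
weather = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-11-03T15:49:25+00:00",
        "2024-11-03T15:50:25+00:00",
        "2024-11-03T15:51:25+00:00",
        "2024-11-03T15:52:25+00:00",
        "2024-11-03T15:53:25+00:00",
    ], utc=True),
    "track_temperature": [26.0, 26.1, 26.3, 26.2, 26.4],
    "rainfall": [1, 1, 1, 0, 0],
})

# merge_asof requires both frames to be sorted on their join keys.
# direction="nearest" picks the weather reading closest in time to each
# lap start, whether it was recorded just before or just after.
lap_weather = pd.merge_asof(
    laps.sort_values("lap_date_start"),
    weather.sort_values("date"),
    left_on="lap_date_start",
    right_on="date",
    direction="nearest",
)

print(lap_weather[["lap_number", "date", "track_temperature", "rainfall"]])
```

Because the weather readings arrive about once a minute and laps here take well over a minute, each lap start should sit within roughly 30 seconds of a reading. Passing a `tolerance` (a `pd.Timedelta`) to `merge_asof` is one way to leave a lap unmatched if no reading is close enough.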

In [165]:
# Get meeting and session details for the 2024 Brazilian Grand Prix
location = "Brazil"
year = 2024
url = f"https://api.openf1.org/v1/sessions?country_name={location}&session_name=Race&year={year}"
session_details_response = requests.get(url).json()
meeting = session_details_response[0]["meeting_key"]
session = session_details_response[0]["session_key"]
print(f"Meeting Key ({meeting}), Session Key ({session})")


Meeting Key (1249), Session Key (9636)


In [166]:
# Get raw lap data
url = f"https://api.openf1.org/v1/laps?session_key={session}"
lap_data_response = requests.get(url).json()
df_laps = pd.DataFrame(lap_data_response)
df_laps.sample(n=3)

Unnamed: 0,meeting_key,session_key,driver_number,lap_number,date_start,duration_sector_1,duration_sector_2,duration_sector_3,i1_speed,i2_speed,is_pit_out_lap,lap_duration,segments_sector_1,segments_sector_2,segments_sector_3,st_speed
730,1249,9636,50,42,2024-11-03T17:18:02.147000+00:00,26.293,58.854,34.787,188.0,110.0,False,119.934,"[None, 2048, 2048, 2048, 2048, 2048, 2048]","[2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]","[2048, 2048, 2048, 2048, 2048]",175.0
1125,1249,9636,16,69,2024-11-03T17:55:59.569000+00:00,21.186,43.692,18.116,303.0,228.0,False,82.994,"[None, 2048, 2048, 2048, 2048, 2048, 2048]","[2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048]","[2048, 2048, 2048, 2048, 2048]",293.0
1115,1249,9636,44,68,2024-11-03T17:54:59.074000+00:00,21.202,42.957,17.945,306.0,229.0,False,82.104,"[2048, 2049, 2048, 2048, 2048, 2048, 2048]","[2048, 2048, 2051, 2049, 2048, 2049, 2048, 2048]","[2048, 2048, 2048, 2048, 2048]",277.0


In [167]:
# Take the earliest date_start per lap number, which is effectively when the leader started that lap
leader_laps = df_laps.groupby('lap_number', as_index=False).agg({'date_start': 'min'})
leader_laps.rename(columns={"date_start": "lap_date_start"}, inplace=True)
leader_laps.loc[0, "lap_date_start"] = "2024-11-03T15:50:00.000000+00:00" # Race start time
leader_laps['lap_date_start'] = pd.to_datetime(leader_laps['lap_date_start'])
leader_laps.head(n=3)

Unnamed: 0,lap_number,lap_date_start
0,1,2024-11-03 15:50:00+00:00
1,2,2024-11-03 15:51:28.489000+00:00
2,3,2024-11-03 15:52:54.148000+00:00


### 1.4 Retrieve Weather Data

In [168]:
# Now we can query for the weather data during the event
url = f"https://api.openf1.org/v1/weather?meeting_key={meeting}&session_key={session}"
response = requests.get(url).json()
response

[{'date': '2024-11-03T14:38:25.648000+00:00',
  'session_key': 9636,
  'air_temperature': 22.2,
  'track_temperature': 27.3,
  'humidity': 86.0,
  'pressure': 927.4,
  'wind_direction': 0,
  'meeting_key': 1249,
  'wind_speed': 0.4,
  'rainfall': 1},
 {'date': '2024-11-03T14:39:25.648000+00:00',
  'session_key': 9636,
  'air_temperature': 22.3,
  'track_temperature': 26.2,
  'humidity': 87.0,
  'pressure': 927.3,
  'wind_direction': 191,
  'meeting_key': 1249,
  'wind_speed': 0.0,
  'rainfall': 1},
 {'date': '2024-11-03T14:40:25.648000+00:00',
  'session_key': 9636,
  'air_temperature': 22.1,
  'track_temperature': 26.4,
  'humidity': 86.0,
  'pressure': 927.4,
  'wind_direction': 182,
  'meeting_key': 1249,
  'wind_speed': 0.8,
  'rainfall': 1},
 {'date': '2024-11-03T14:41:25.651000+00:00',
  'session_key': 9636,
  'air_temperature': 22.1,
  'track_temperature': 26.4,
  'humidity': 87.0,
  'pressure': 927.4,
  'wind_direction': 210,
  'meeting_key': 1249,
  'wind_speed': 0.5,
  'rainf

## 2. Data Exploration
Now that we have relatively clean lap-by-lap data with the weather aligned to it, we can begin to explore the strategies the teams adopted and how they played out.