<a href="https://colab.research.google.com/github/kennethajensen/FormulaOne/blob/main/F1_Download_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analyzing Formula 1 Data (Part 1)**
## *How do I get some data, anyway?*

---

This notebook retrieves and processes Formula 1 qualifying session data from [OpenF1](https://openf1.org/)'s open-source API.

The goal is to use this data to create a profile of how each track is driven and to then analyze the results to find groups of tracks that are similar and identify any anomalies or changes over time.

The article [Analyzing Formula 1 Data | How do I get some data, anyway?](https://medium.com/@kenneth.agregaard.jensen/analyzing-formula-1-data-part-1-5b745527516a) describes the project and the data retrieval in more detail and also contains links to the further analysis and results.

This notebook collects the source data needed to perform the analyses that I currently have in mind. All data is saved locally as CSV files for persistent storage to avoid having to request the same data repeatedly from the API.

**Output:**\
A series of locally stored CSV files with the relevant data from the OpenF1 APIs.
- **Meetings:**\
  A record for each event on the Formula 1 calendar including pre-season testing events and regular race weekends.
- **Qualifying Sessions:** \
  A record for each qualifying session held during every meeting. These include both sprint qualifying and qualifying for the feature race.
- **Drivers:**\
  A record for each driver in every qualifying session held during every meeting. This contains the driver's number, name, team, nationality, and so on.
- **Qualifying Laps:** \
  One record for each lap during every qualifying session.
- **Fastest Qualifying Laps:** \
  The fastest qualifying lap from each driver in each qualifying session.
- **Car Data:** \
  The car's telemetry including speed, selected gear, throttle position, and brake usage during all fastest qualifying laps.\
  Averages around 4 records per second.
- **Location:** \
  The car's location in three dimensions during all fastest qualifying laps.\
  Averages around 4 records per second.


In [None]:
# Import all required libraries

# Used when querying the API endpoints
from urllib.request import urlopen
from urllib.error import HTTPError
# Used to convert the JSON response from the API to data tables
import json
# Allows access to Google Drive
from google.colab import drive
# Standard libraries
import pandas as pd
import numpy as np
from numpy import empty
import time
import os

# Set the base URL for the OpenF1 API
# Used by the 'get_data' function
base_url = 'https://api.openf1.org/v1/'

# Mount my Google Drive and set the path for the data storage
# Used when reading from and writing to Google Drive
drive.mount('/content/drive')
data_path = '/content/drive/MyDrive/Data Science/[02] Articles - Formula 1 [Work-in-progress]/Data/'

# Names for data files stored on Google Drive
meetings_file_name     = 'meetings.csv'
sessions_file_name     = 'qualifying_sessions.csv'
drivers_file_name      = 'drivers.csv'
laps_file_name         = 'qualifying_laps.csv'
fastest_laps_file_name = 'qualifying_fastest_laps.csv'
car_data_file_name     = 'car_data.csv'
location_file_name     = 'location.csv'
# Full file paths
meetings_path     = os.path.join(data_path, meetings_file_name)
sessions_path     = os.path.join(data_path, sessions_file_name)
drivers_path      = os.path.join(data_path, drivers_file_name)
laps_path         = os.path.join(data_path, laps_file_name)
fastest_laps_path = os.path.join(data_path, fastest_laps_file_name)
car_data_path     = os.path.join(data_path, car_data_file_name)
location_path     = os.path.join(data_path, location_file_name)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Function: `get_data`

The `get_data` function is a crucial utility in this notebook for safely retrieving data from the OpenF1 API.

**Purpose:**

It's designed to fetch data from a specified API endpoint, optionally applying a filter. It also includes robust error handling, especially for API rate limits.

**Parameters:**

- `endpoint` (str): The specific API resource you want to query (e.g., `meetings`, `laps`).
- `filter` (str): An optional string containing URL query parameters to refine your request (e.g., `session_type=Qualifying`).\
It defaults to an empty string if no filter is provided.
- `max_retries` (int): The maximum number of times the function will retry the API call after a rate limit (HTTP 429) or server error (HTTP 500 or 503). Defaults to 5.
- `initial_retry_delay` (int): The starting delay in seconds before the first retry. Defaults to 2.

**Functionality:**

1. **Retry Loop:** It uses a `while` loop to make one initial attempt followed by up to `max_retries` retries.
2. **API Request:** It constructs the full API URL using the `base_url`, `endpoint`, and `filter`, then makes the request using `urlopen()`.
3. **JSON Parsing:** The response is read, decoded from bytes to UTF-8, and then parsed as a JSON object into a Python list of dictionaries.
4. **DataFrame Conversion:** This JSON data is then converted into a Pandas DataFrame, making it easy to work with.
5. **Rate Limit Handling (HTTP 429):** If the API returns an HTTP 429 (Too Many Requests) error, the function prints a message, pauses for `retry_delay` seconds (which doubles with each subsequent retry, an exponential backoff strategy), and then retries the request. The code applies the same handling to HTTP 500 (Internal Server Error) and 503 (Service Unavailable) responses.
6. **Other Error Handling:** Any other HTTP errors or unexpected exceptions are immediately re-raised, as they indicate a different kind of issue.
7. **Success:** If the request is successful, the DataFrame is returned.
8. **Failure:** If all retries fail, an exception is raised, indicating that the data could not be retrieved.
9. **Delay:** After a successful API call, it pauses for 1 second (`time.sleep(1)`) to proactively avoid hitting rate limits on subsequent calls.
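
As a rough illustration of the doubling schedule described above, the sketch below lists the waits that an exponential backoff produces. The helper name `backoff_delays` is hypothetical and is not part of the notebook; it only models the doubling, not the exact loop bounds of `get_data`.

```python
def backoff_delays(initial_retry_delay=2, max_retries=5):
    """Return the wait (in seconds) before each retry, doubling every time."""
    delays = []
    delay = initial_retry_delay
    for _ in range(max_retries):
        delays.append(delay)
        delay *= 2  # Double the wait after every failed attempt
    return delays


schedule = backoff_delays()
print(schedule)       # Individual waits
print(sum(schedule))  # Total time spent waiting if every retry is needed
```

With the default arguments, the waits grow quickly, which is why a small `initial_retry_delay` is usually enough.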

In [8]:
def get_data(endpoint, filter='', max_retries=5, initial_retry_delay=2):
    """
    Retrieves data from an API endpoint.
    Optionally, applies a filter to select the records to include.

    Args:
        endpoint (str): The endpoint name to query.
        filter (str): The data filter to apply to the request, as URL query parameters.
        max_retries (int): The maximum number of retries to attempt when encountering a
                           429 (Too Many Requests),
                           500 (Internal Server Error), or
                           503 (Service Unavailable) error.
        initial_retry_delay (int): The initial delay between retries, in seconds.
                                   The delay doubles after each retry.

    Returns:
        pd.DataFrame: The records returned by the API.
    """

    retries = 0
    retry_delay = initial_retry_delay
    while retries <= max_retries:
      try:
        if filter:
          response = urlopen(base_url + endpoint + '?' + filter)
        else:
          response = urlopen(base_url + endpoint)
        data = json.loads(response.read().decode('utf-8'))
        df = pd.DataFrame(data)
        time.sleep(1)   # Pause for 1 second to avoid rate limiting
        return df
      except HTTPError as e:
        # 429: Too Many Requests | 500: Internal Server Error | 503: Service Unavailable
        if e.code in (429, 500, 503):
           print(f"HTTP {e.code} for {endpoint}?{filter}. Retrying in {retry_delay} seconds (Retry {retries+1}/{max_retries})...")
           time.sleep(retry_delay)
           retries += 1
           retry_delay *= 2   # Exponential backoff
        else:
          raise   # Re-raise other HTTP errors immediately
      except Exception as e:
        print(f"An unexpected error occurred: {e}")
        raise   # Re-raise other unexpected errors

    raise Exception(f"Failed to retrieve data for {endpoint}?{filter} after {max_retries} retries.")

# Function: `save_dataframe_to_csv`
The `save_dataframe_to_csv` function is a utility that saves a Pandas DataFrame to a CSV file, either creating the file or appending to it.

**Purpose:** It saves a DataFrame to a specified file path. Its key feature is that it can either create a new file (including headers) or append data to an existing file (without adding duplicate headers).

**Parameters:**

- `df` (pd.DataFrame): This is the Pandas DataFrame that you want to save.
- `path` (str): This is the full file path (including the file name and extension, e.g., `/content/drive/MyDrive/data.csv`) where the CSV should be saved.

**Functionality:**

1. **Check for Existing File:** It first checks if a file already exists at the given `path` using `os.path.exists(path)`.
2. **Append Data:** If the file does exist, it appends the DataFrame's data to the end of that file. It uses `mode='a'` (append mode) and `header=False` (to avoid writing headers again) in the `df.to_csv()` call.
3. **Create New File:** If the file does not exist, it creates a new CSV file at the specified path. It uses `mode='w'` (write mode, which creates a new file or overwrites an existing one) and `header=True` (to include the column headers in the first row of the new file).
4. **Confirmation Message:** After saving, it prints a message indicating whether rows were appended to an existing file or a new file was created, along with the number of rows processed.
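
The append-or-create behavior can be tried out on a throwaway file. The sketch below is a minimal illustration that assumes a temporary directory and a hypothetical file name `example.csv`; it repeats the same `mode` and `header` choice inline rather than calling the notebook's function.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # Hypothetical file name

    first = pd.DataFrame({"session_key": [1, 2], "driver_number": [44, 16]})
    second = pd.DataFrame({"session_key": [3], "driver_number": [1]})

    for batch in (first, second):
        file_exists = os.path.exists(path)
        # Write headers only when the file is created; append otherwise
        batch.to_csv(path,
                     mode="a" if file_exists else "w",
                     header=not file_exists,
                     index=False)

    combined = pd.read_csv(path)

print(len(combined))
print(list(combined.columns))
```

Reading the file back shows a single header row followed by the rows from both batches.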

In [None]:
def save_dataframe_to_csv(df, path):
    """
    Saves a DataFrame to a CSV file. If the file exists, it appends the data without headers.
    Otherwise, it creates a new file with headers.

    Args:
        df (pd.DataFrame): The DataFrame to save.
        path (str): The file path where the CSV should be saved.
    """
    if os.path.exists(path):
        df.to_csv(path, mode='a', header=False, index=False)
        print(f"Appended {len(df)} rows to existing file: {path}")
    else:
        df.to_csv(path, mode='w', header=True, index=False)
        print(f"Created new file and wrote {len(df)} rows: {path}")

print("Defined save_dataframe_to_csv function.")

Defined save_dataframe_to_csv function.


# Meetings
First get the complete data set with all **Meetings** where each meeting is either a testing or racing event covering multiple sessions and days.\
Save the data to a CSV file replacing any previous file.

In [None]:
meetings = get_data('meetings')
meetings.to_csv(meetings_path, index=False)

# Qualifying sessions
Second, get all of the qualifying sessions. This includes both the sprint qualifying and the qualifying for the feature race. This always gets every session and does not check the existing data storage first.\
Save the data to a CSV file replacing any previous file.
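
The cell below also discards sessions that have not finished yet by comparing `date_end` with the current UTC time. A minimal sketch of that comparison on made-up data (the `sessions_example` frame and its values are assumptions for illustration only):

```python
import pandas as pd

# Two made-up sessions: one long finished, one far in the future
sessions_example = pd.DataFrame({
    "session_key": [1, 2],
    "date_end": ["2020-01-01T15:00:00+00:00", "2100-01-01T15:00:00+00:00"],
})

# Parse the ISO strings into timezone-aware timestamps
sessions_example["date_end"] = pd.to_datetime(sessions_example["date_end"])

# Keep only sessions that ended at or before the current UTC time
now_utc = pd.Timestamp.now(tz="UTC")
completed = sessions_example[sessions_example["date_end"] <= now_utc]

print(completed["session_key"].tolist())
```

Both sides of the comparison must be timezone-aware; comparing an aware column with a naive timestamp raises a `TypeError` in pandas.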



In [None]:
selected_session_type = "Qualifying" # @param ["Practice","Qualifying","Race","All"]

if selected_session_type == "All":
    sessions_filter = ''
else:
    sessions_filter = f'session_type={selected_session_type}'

sessions_filter = sessions_filter.replace(' ', '')   # Remove whitespace

sessions = get_data(endpoint = 'sessions',
                    filter = sessions_filter)

# Convert the date columns to datetime objects for comparison
sessions['date_start'] = pd.to_datetime(sessions['date_start'])
sessions['date_end'] = pd.to_datetime(sessions['date_end'])
# Remove all sessions where the end time is after the current time
sessions = sessions[sessions['date_end'] <= pd.Timestamp.now(tz='UTC')]

sessions.to_csv(sessions_path, index=False)

# Create a list of all the unique sessions
# The session_key should be unique in the dataframe to begin with
all_session_keys = sessions['session_key'].unique()
print(f"Total number of sessions: {len(all_session_keys)}")

Total number of sessions: 89


# Drivers

Retrieve the driver information for each session and store it in a local file.
- First check for an existing CSV file and the sessions that are already stored in the file.
- Get the driver information from the API but only from any qualifying sessions that have not been retrieved earlier.
- Then append the newly retrieved data to the existing CSV file.

*This method is slower than getting the full data set with all drivers from all sessions in a single request.\
However, as the number of sessions increases, a single request might run up against the size limit for an individual API response.*
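
The incremental approach boils down to a set difference between the sessions known to the API and the sessions already on disk. A minimal sketch, using a hypothetical helper `find_unretrieved` and made-up session keys:

```python
def find_unretrieved(all_keys, retrieved_keys):
    """Return the keys in all_keys that are missing from retrieved_keys, sorted."""
    return sorted(set(all_keys) - set(retrieved_keys))


all_keys_example = [9001, 9002, 9003, 9004]   # Made-up session keys
retrieved_example = [9001, 9003]              # Keys already stored in the CSV

print(find_unretrieved(all_keys_example, retrieved_example))
```

Only the keys returned here would need a new API request.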




In [None]:
# If the file already exists, read it and
# identify the sessions that have not yet been retrieved

drivers_csv_file_exists = os.path.exists(drivers_path)

if not drivers_csv_file_exists:
    # No file yet: start with an empty DataFrame and no retrieved sessions
    drivers_from_csv = pd.DataFrame()
    retrieved_session_keys = empty(0)
else:
    # Load the data from the file
    drivers_from_csv = pd.read_csv(drivers_path)
    # Get a list of unique sessions
    retrieved_session_keys = drivers_from_csv['session_key'].unique()

# Identify the sessions that do not appear in the 'Drivers' data file
# The drivers from missing sessions will need to be retrieved
unretrieved_session_keys = list(set(all_session_keys) - set(retrieved_session_keys))

print(f"Number of sessions already retrieved: {len(retrieved_session_keys)}")
print(f"Number of sessions to be retrieved: {len(unretrieved_session_keys)}")

Number of sessions already retrieved: 359
Number of sessions to be retrieved: 1


In [None]:
drivers = []

# Get the drivers from every session not already retrieved
for session_key in unretrieved_session_keys:
    drivers_by_session = get_data('drivers', f'session_key={session_key}')
    drivers.append(drivers_by_session)

print(f"Fetched driver data for {len(drivers)} sessions.")

Fetched driver data for 1 sessions.


In [None]:
if drivers:
    # 'drivers' is a list of DataFrames, one per session
    # Concatenate them into a single DataFrame
    drivers = pd.concat(drivers, ignore_index=True)
    save_dataframe_to_csv(df=drivers, path=drivers_path)

else:
    print("No new driver data to append or save.")
    drivers = pd.DataFrame() # Ensure drivers is always a DataFrame, even if empty

Appended 0 rows to existing file: /content/drive/MyDrive/Data Science/[02] Articles - Formula 1 [Work-in-progress]/Data/drivers.csv


# Laps
Retrieve the Lap data and store it in a local file before identifying the fastest qualifying laps for each driver in each session.
- First check for an existing CSV file and the laps that are already stored in the file.
- Get the lap data from the API but only from any qualifying sessions that have not been retrieved earlier.
- Then append the newly retrieved data to the existing CSV file.



In [None]:
# If the file already exists, read it and
# identify the sessions that have not yet been retrieved

laps_csv_file_exists = os.path.exists(laps_path)

if not laps_csv_file_exists:
    # No file yet: start with an empty DataFrame so that the later
    # pd.concat([laps_from_csv, laps]) call works
    laps_from_csv = pd.DataFrame()
    retrieved_session_keys = empty(0)
else:
    # Load the data from the file
    laps_from_csv = pd.read_csv(laps_path)
    # Get a list of unique sessions
    retrieved_session_keys = laps_from_csv['session_key'].unique()

# Identify the sessions that do not appear in the 'Laps' data file
# The laps from missing sessions will need to be retrieved
unretrieved_session_keys = list(set(all_session_keys) - set(retrieved_session_keys))

print(f"Number of sessions already retrieved: {len(retrieved_session_keys)}")
print(f"Number of sessions to be retrieved: {len(unretrieved_session_keys)}")
# The lap data from the qualifying in Baku in 2025 is missing from OpenF1.org
# and the program will try to pick it up every time it is executed

Number of sessions already retrieved: 14
Number of sessions to be retrieved: 75


In [None]:
laps = []

# Get the laps from every session not already retrieved
for session_key in unretrieved_session_keys:
    laps_by_session = get_data('laps', f'session_key={session_key}')
    laps.append(laps_by_session)

print(f"Fetched lap data for {len(laps)} sessions.")

Fetched lap data for 75 sessions.


In [None]:
if laps:
    # 'laps' is a list of DataFrames, one per session
    # Concatenate them into a single DataFrame
    laps = pd.concat(laps, ignore_index=True)
    save_dataframe_to_csv(laps, laps_path)

else:
    print("No new lap data to append or save.")
    laps = pd.DataFrame() # Ensure laps is always a DataFrame, even if empty

Appended 22402 rows to existing file: /content/drive/MyDrive/Data Science/[02] Articles - Formula 1 [Work-in-progress]/Data/qualifying_laps.csv


# Identify the fastest lap from each driver in every session

To find the fastest lap for each driver in each session, we need to:
1. Ensure the `laps` DataFrame contains data.\
If `laps` is currently a list of DataFrames, we will concatenate it into a single DataFrame.
2. Group the DataFrame by `session_key` and `driver_number`.
3. For each group, find the row with the minimum `lap_duration`.
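
The grouping step can be illustrated on a toy table. The sketch below uses made-up lap times in a hypothetical `laps_example` frame; it shows the general `groupby` plus `idxmin` pattern rather than the notebook's full cleaning logic.

```python
import pandas as pd

# Made-up laps: two drivers in session 1, one driver in session 2
laps_example = pd.DataFrame({
    "session_key":   [1, 1, 1, 1, 2],
    "driver_number": [44, 44, 16, 16, 44],
    "lap_number":    [1, 2, 1, 2, 1],
    "lap_duration":  [91.2, 89.7, 90.1, 90.4, 75.3],
})

# idxmin returns the row label of the shortest lap within each group
fastest_idx = (
    laps_example
    .groupby(["session_key", "driver_number"])["lap_duration"]
    .idxmin()
)

# Select those rows to get one fastest lap per session and driver
fastest_example = laps_example.loc[fastest_idx].reset_index(drop=True)
print(fastest_example)
```

Selecting with `.loc` keeps every column of the winning row, so the lap number and start time travel along with the duration.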


In [None]:
if not laps.empty:
    # Create a copy to avoid SettingWithCopyWarning and reset the index
    fastest_laps = laps.copy().reset_index(drop=True)

    # Convert is_pit_out_lap to boolean and drop rows where is_pit_out_lap is True
    fastest_laps['is_pit_out_lap'] = fastest_laps['is_pit_out_lap'].astype(bool)
    fastest_laps = fastest_laps[~fastest_laps['is_pit_out_lap']].copy()
    # Ensure 'date_start' is datetime and 'lap_duration' is numeric
    fastest_laps['date_start'] = pd.to_datetime(fastest_laps['date_start'],
                                                format='ISO8601')
    fastest_laps['lap_duration'] = pd.to_numeric(fastest_laps['lap_duration'],
                                                 errors='coerce')
    # Drop rows where lap_duration is NaN (couldn't be converted)
    fastest_laps.dropna(subset=['lap_duration'], inplace=True)

    # Calculate 'date_end' by adding 'lap_duration' to 'date_start'
    fastest_laps['date_end'] = fastest_laps['date_start'] \
                               + pd.to_timedelta(fastest_laps['lap_duration'], unit='s')

    # Find the fastest lap for each driver in each session
    fastest_laps = fastest_laps.loc[fastest_laps.groupby(['session_key', 'driver_number'])['lap_duration'].idxmin()]

    print(f"Found {len(fastest_laps)} fastest laps.")

    save_dataframe_to_csv(fastest_laps, fastest_laps_path)

else:
    fastest_laps = pd.DataFrame() # Ensure fastest_laps is always a DataFrame, even if empty
    print("No lap data available to process.")

Found 1454 fastest laps.
Appended 1454 rows to existing file: /content/drive/MyDrive/Data Science/[02] Articles - Formula 1 [Work-in-progress]/Data/qualifying_fastest_laps.csv


# Get a complete list of all qualifying laps

The complete list of qualifying laps will be used to get the car and location data for just those laps.

1. Combine the data from the CSV file with any newly retrieved qualifying laps
2. Get every unique combination of the session and the driver
3. Merge in the start and end time for each of the fastest qualifying laps
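
A left merge keeps every session and driver combination, even when no fastest lap is available for it; such rows end up with `NaT` in the date columns. A small sketch with made-up data (the `pairs` and `times` frames are assumptions for illustration):

```python
import pandas as pd

# Two session/driver combinations, but a lap time for only one of them
pairs = pd.DataFrame({"session_key": [1, 1], "driver_number": [44, 16]})
times = pd.DataFrame({
    "session_key": [1],
    "driver_number": [44],
    "date_start": pd.to_datetime(["2024-03-01T16:00:00+00:00"]),
})

# how='left' keeps all rows of 'pairs'; unmatched rows get NaT
merged = pairs.merge(times, on=["session_key", "driver_number"], how="left")

print(merged["date_start"].isna().tolist())
```

Rows with missing dates can later be removed with `dropna(subset=[...])` before any API request is built from them.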



In [None]:
# Combine the laps from the CSV file with the newly retrieved laps
# to get a complete list
all_qualifying_laps = pd.concat([laps_from_csv, laps], ignore_index=True)

# Create a list of all session and driver combinations
unique_laps = all_qualifying_laps[['session_key','driver_number']].drop_duplicates()

if not unique_laps.empty and not fastest_laps.empty:
    # Select only the necessary columns from fastest_laps to merge
    fastest_laps_for_merge = fastest_laps[['session_key', 'driver_number', 'date_start', 'date_end']]

    # Merge unique_laps with these selected columns
    unique_laps = pd.merge(
        unique_laps,
        fastest_laps_for_merge,
        on=['session_key', 'driver_number'],
        how='left'
    )

    print("Added 'date_start' and 'date_end' to unique_laps.")
    display(unique_laps.head())
else:
    print("Either unique_laps or fastest_laps DataFrame is empty, cannot add dates.")

Added 'date_start' and 'date_end' to unique_laps.


Unnamed: 0,session_key,driver_number,date_start,date_end
0,9608,23,NaT,NaT
1,9608,24,NaT,NaT
2,9608,10,NaT,NaT
3,9608,31,NaT,NaT
4,9608,77,NaT,NaT


# Car data

First check for an existing CSV file and the car data that are already stored in the file.

Get the car data for each of the fastest qualifying laps from the API, but only for session and driver combinations that have not been retrieved earlier.

Then append the newly retrieved data to the existing CSV file.
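
Each request is limited to one driver's fastest lap by combining the session, the driver, and a time window in the query string. The sketch below shows one way to assemble such a string; the helper name `build_lap_filter` and the example values are hypothetical, and the `date>=` / `date<=` syntax mirrors what the notebook's own code sends.

```python
from datetime import datetime, timezone


def build_lap_filter(session_key, driver_number, date_start, date_end):
    """Build a query string that selects one driver's lap window."""
    time_format = "%Y-%m-%dT%H:%M:%SZ"
    # Convert both timestamps to UTC before formatting them
    start_text = date_start.astimezone(timezone.utc).strftime(time_format)
    end_text = date_end.astimezone(timezone.utc).strftime(time_format)
    parts = [
        f"session_key={session_key}",
        f"driver_number={driver_number}",
        f"date>={start_text}",
        f"date<={end_text}",
    ]
    return "&".join(parts)


# Made-up lap window for illustration
start = datetime(2024, 3, 1, 16, 0, 5, tzinfo=timezone.utc)
end = datetime(2024, 3, 1, 16, 1, 35, tzinfo=timezone.utc)
lap_filter = build_lap_filter(9001, 44, start, end)
print(lap_filter)
```

Note that this format drops fractional seconds, so the window is only accurate to the second.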

In [None]:
# Identify the qualifying laps for which the
# car telemetry has not yet been retrieved

car_data_csv_file_exists = os.path.exists(car_data_path)

if not car_data_csv_file_exists:
    # Create an empty dataframe with the expected columns
    car_data_from_csv = pd.DataFrame(columns=['session_key', 'driver_number'])
    retrieved_car_data_laps = pd.DataFrame(columns=['session_key', 'driver_number'])
else:
    # Load the data from the file
    car_data_from_csv = pd.read_csv(car_data_path)
    # Get a list of unique session_key and driver_number combinations that have car data
    retrieved_car_data_laps = car_data_from_csv[['session_key','driver_number']].drop_duplicates()

# Convert DataFrames to sets of tuples for efficient comparison
# This creates a unique identifier for each session-driver combination (session_key, driver_number)
set_unique_laps = set(tuple(row) for row in unique_laps[['session_key', 'driver_number']].values)
set_retrieved_car_data_laps = set(tuple(row) for row in retrieved_car_data_laps.values)

# Identify the unique session_key and driver_number combinations that have not been retrieved
unretrieved_combinations_set = set_unique_laps - set_retrieved_car_data_laps

# Convert the set of unretrieved combinations back into a DataFrame
unretrieved_car_data_laps = pd.DataFrame(list(unretrieved_combinations_set), columns=['session_key', 'driver_number'])


if not unretrieved_car_data_laps.empty and not fastest_laps.empty:
    # Now, merge with the unique_laps DataFrame to get the date_start and
    # date_end for these unretrieved combinations
    unretrieved_car_data_laps = pd.merge(
        unretrieved_car_data_laps,
        unique_laps[['session_key', 'driver_number', 'date_start', 'date_end']],
        on=['session_key', 'driver_number'],
        how='left'
    )
    # Remove records that do not have both a date_start and a date_end
    unretrieved_car_data_laps.dropna(subset=['date_start', 'date_end'], inplace=True)
else:
    unretrieved_car_data_laps = pd.DataFrame(columns=['session_key', 'driver_number', 'date_start', 'date_end'])

print(f"Number of laps already retrieved: {len(retrieved_car_data_laps)}")
print(f"Number of laps to be retrieved: {len(unretrieved_car_data_laps)}")

Number of laps already retrieved: 1268
Number of laps to be retrieved: 1454


In [None]:
# Request the car telemetry for each of the remaining laps

car_data = []

# Iterate through each of the missing laps
for index, row in unretrieved_car_data_laps.iterrows():

    # Get the session and driver number from the current row
    session_key = row['session_key']
    driver_number = row['driver_number']

    # date_start and date_end columns are already datetime objects from previous steps
    # We need to ensure they are UTC and then format them.
    date_start_utc = row['date_start'].tz_convert('UTC')
    date_end_utc = row['date_end'].tz_convert('UTC')

    formatted_date_start = date_start_utc.strftime('%Y-%m-%dT%H:%M:%SZ')
    formatted_date_end = date_end_utc.strftime('%Y-%m-%dT%H:%M:%SZ')

    # Build the query string for this driver's lap window
    data_filter = '&'.join([
        f"session_key={session_key}",
        f"driver_number={driver_number}",
        f"date>={formatted_date_start}",
        f"date<={formatted_date_end}",
    ])

    # Fetching 'car_data'
    car_data_by_session_driver = get_data('car_data', data_filter)
    if not car_data_by_session_driver.empty:
        car_data.append(car_data_by_session_driver)

In [None]:
# Save the newly retrieved car telemetry to a file

if car_data:
    # 'car_data' is a list of DataFrames, one per lap
    # Concatenate them into a single DataFrame
    car_data = pd.concat(car_data, ignore_index=True)
    save_dataframe_to_csv(df=car_data, path=car_data_path)

else:
    print("No new car data to append or save.")

Appended 453053 rows to existing file: /content/drive/MyDrive/Data Science/[02] Articles - Formula 1 [Work-in-progress]/Data/car_data.csv


# Location

First check for an existing CSV file and the location data that are already stored in the file.

Get the location data for each of the fastest qualifying laps from the API, but only for session and driver combinations that have not been retrieved earlier.

Then append the newly retrieved data to the existing CSV file.
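
Because a lap is identified by two values, the comparison with the stored data works on `(session_key, driver_number)` tuples rather than on single keys. A minimal sketch with made-up pairs and a hypothetical helper `missing_pairs`:

```python
def missing_pairs(wanted, retrieved):
    """Return the (session_key, driver_number) pairs not yet retrieved, sorted."""
    # Tuples are hashable, so each pair can be stored in a set
    wanted_set = {tuple(pair) for pair in wanted}
    retrieved_set = {tuple(pair) for pair in retrieved}
    return sorted(wanted_set - retrieved_set)


wanted_example = [(9001, 44), (9001, 16), (9002, 44)]   # Made-up pairs
already_example = [(9001, 44)]

print(missing_pairs(wanted_example, already_example))
```

The same pattern applies to rows taken from a DataFrame with `.values`, since each row converts cleanly to a tuple.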

In [None]:
# Identify the qualifying laps for which the
# car location has not yet been retrieved

location_csv_file_exists = os.path.exists(location_path)

if not location_csv_file_exists:
    # Create an empty dataframe with the expected columns
    location_from_csv = pd.DataFrame(columns=['session_key', 'driver_number'])
    retrieved_location_laps = pd.DataFrame(columns=['session_key', 'driver_number'])
else:
    # Load the data from the file
    location_from_csv = pd.read_csv(location_path)
    # Get a list of unique session_key and driver_number combinations that have location data
    retrieved_location_laps = location_from_csv[['session_key','driver_number']].drop_duplicates()

# Convert DataFrames to sets of tuples for efficient comparison
# This creates a unique identifier for each session-driver combination (session_key, driver_number)
set_unique_laps = set(tuple(row) for row in unique_laps[['session_key', 'driver_number']].values)
set_retrieved_location_laps = set(tuple(row) for row in retrieved_location_laps.values)

# Identify the unique session_key and driver_number combinations that have not been retrieved
unretrieved_combinations_set = set_unique_laps - set_retrieved_location_laps

# Convert the set of unretrieved combinations back into a DataFrame
unretrieved_location_laps = pd.DataFrame(list(unretrieved_combinations_set), columns=['session_key', 'driver_number'])

if not unretrieved_location_laps.empty and not fastest_laps.empty:
    # Now, merge with the unique_laps DataFrame to get the date_start and
    # date_end for these unretrieved combinations
    unretrieved_location_laps = pd.merge(
        unretrieved_location_laps,
        unique_laps[['session_key', 'driver_number', 'date_start', 'date_end']],
        on=['session_key', 'driver_number'],
        how='left'
    )
    # Remove records that do not have both a date_start and a date_end
    unretrieved_location_laps.dropna(subset=['date_start', 'date_end'], inplace=True)
else:
    unretrieved_location_laps = pd.DataFrame(columns=['session_key', 'driver_number', 'date_start', 'date_end'])

print(f"Number of laps already retrieved: {len(retrieved_location_laps)}")
print(f"Number of laps to be retrieved: {len(unretrieved_location_laps)}")

Number of laps already retrieved: 998
Number of laps to be retrieved: 1454


In [None]:
# Request the car location for each of the remaining laps

location = []

# Iterate through each of the missing laps
for index, row in unretrieved_location_laps.iterrows():
    # Get the session and driver number from the current row
    session_key = row['session_key']
    driver_number = row['driver_number']

    # date_start and date_end columns are already datetime objects from previous steps
    # We need to ensure they are UTC and then format them.
    date_start_utc = row['date_start'].tz_convert('UTC')
    date_end_utc = row['date_end'].tz_convert('UTC')

    formatted_date_start = date_start_utc.strftime('%Y-%m-%dT%H:%M:%SZ')
    formatted_date_end = date_end_utc.strftime('%Y-%m-%dT%H:%M:%SZ')

    # Build the query string for this driver's lap window
    data_filter = '&'.join([
        f"session_key={session_key}",
        f"driver_number={driver_number}",
        f"date>={formatted_date_start}",
        f"date<={formatted_date_end}",
    ])

    # Fetching 'location'
    location_by_session_driver = get_data('location', data_filter)
    if not location_by_session_driver.empty:
        location.append(location_by_session_driver)

In [None]:
# Save the newly retrieved car location data to a file

if location:
    # 'location' is a list of DataFrames, one per lap
    # Concatenate them into a single DataFrame
    location = pd.concat(location, ignore_index=True)
    save_dataframe_to_csv(df=location, path=location_path)

else:
    print("No new location data to append or save.")