# Formula One Insights with Python & SQL
Since its inception in the 1950s, Formula One has represented the pinnacle of global motorsport, pushing the boundaries of racing and automotive engineering. This analysis leverages Python and SQL to uncover insights into the achievements of drivers and constructors across F1's decades-long history.

*For a more detailed exploration, please refer to the accompanying PDF document. The comments within this Jupyter notebook are provided exclusively to explain the functionality of the code.*
*The values and data in this Jupyter Notebook were last updated on the 17th of January 2025.*

# Organizing the Notebook into Multiple Parts
Due to the extensive code in this notebook, it has been divided into five parts. The first notebook focuses on retrieving data from the formula1.com website. The second notebook handles data retrieval from the F1DB database. The third notebook is dedicated to creating statistics and visualizations. The fourth notebook explores the question of who is the Greatest Driver of All Time. The fifth notebook consolidates multiple CSV files into separate Excel worksheets.

# Data Retrieval from formula1.com

# Importing Python Libraries
This Jupyter notebook is designed to run on most modern Python installations. However, to ensure reproducibility, note that it was developed and tested with Python 3.12.3. The following libraries and their respective versions were used in this analysis:

- bs4 0.0.2
- pandas 2.2.2
- pycountry 24.6.1
- requests 2.32.2
- tqdm 4.66.4

In [1]:
# Import libraries
import os
import unicodedata

import pandas as pd
import pycountry
import requests

from bs4 import BeautifulSoup
from tqdm import tqdm

print("Libraries imported")

Libraries imported


# Creating the Working Directories

This step creates several directories to organize and manage the retrieved data efficiently.

In [2]:
# Create working directories
charts_dir = "charts"
os.makedirs(charts_dir, exist_ok=True)
csv_dir = "csv"
os.makedirs(csv_dir, exist_ok=True)
f1db_dir = "f1db"
os.makedirs(f1db_dir, exist_ok=True)
maps_dir = "maps"
os.makedirs(maps_dir, exist_ok=True)
results_dir = "results"
os.makedirs(results_dir, exist_ok=True)

print("Working directories created")

Working directories created


# Data Collection - Retrieving Race Winners Data from F1 Website

We use BeautifulSoup to scrape race winners data from the official Formula One website. The scraped data undergoes initial processing, including consolidating the winner's name into a single column for easier analysis.

The data is retrieved from the following URL: https://formula1.com/en/results/2024/races

In [3]:
# Initialize the URL template and the empty DataFrame
url_template = "https://formula1.com/en/results/{}/races"
columns = ["Season", "Round", "GrandPrix", "Date", "Winner", "Car", "Laps", "Time"]
race_winners = pd.DataFrame(columns=columns)

# Years to scrape
years = range(2024, 1949, -1)

# Fetching data
with tqdm(total=len(years), desc="Fetching F1 race winners data", bar_format="{l_bar}{bar}| {percentage:3.0f}%", colour="white") as pbar:
    for year in years:
        url = url_template.format(year)
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Locate the table
            race_table = soup.find("table", class_="f1-table f1-table-with-data w-full")
            
            if race_table:
            
                # Extract rows of the table from the <tbody> section
                rows = race_table.find("tbody").find_all("tr")
                
                # Initialize round for the current season
                round = 1
                
                # Iterate over rows and populate data
                for row in rows:
                    columns = row.find_all("td")
                    if len(columns) == 6:
                        
                        # Extract winner's first and last name
                        winner_first_name = columns[2].find("span", class_="max-desktop:hidden").text.strip()
                        winner_last_name = columns[2].find("span", class_="max-tablet:hidden").text.strip()
                        winner_full_name = f"{winner_first_name} {winner_last_name}"
    
                        # Extract race data
                        race_data = [
                            year,                     # Season
                            round,                    # Round
                            columns[0].text.strip(),  # GrandPrix
                            columns[1].text.strip(),  # Date
                            winner_full_name,         # Winner (First and Last Name)
                            columns[3].text.strip(),  # Car
                            columns[4].text.strip(),  # Laps
                            columns[5].text.strip()   # Time
                        ]
                        race_winners.loc[len(race_winners)] = race_data
                        
                        round += 1
            else:
                print(f"No table found for {year}.")
                
        except Exception as e:
            print(f"Error fetching data for {year}: {e}")
        finally:
            pbar.update(1)

# Convert Round and Laps columns to numeric
race_winners['Round'] = pd.to_numeric(race_winners['Round'], errors='coerce')
race_winners['Laps'] = pd.to_numeric(race_winners['Laps'], errors='coerce')

# Convert the Date column to a datetime format
race_winners['Date'] = pd.to_datetime(race_winners['Date'], format='%d %b %Y')

# Change the name of the Winner column
race_winners = race_winners.rename(columns={"Winner": "Driver"})

# Sort the dataset first by Season and then by Round
race_winners = race_winners.sort_values(by=['Season', 'Round']).reset_index(drop=True)

# Save the DataFrame to CSV
output_csv = os.path.join(csv_dir, "race_winners.csv")
race_winners.to_csv(output_csv, index=False, encoding='utf-8')
print(f"Data saved to {output_csv}", end="")

Fetching F1 race winners data: 100%|██████████| 100%

Data saved to csv/race_winners.csv
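
The row-parsing logic used in the cell above can be tried offline on a small piece of markup. The sketch below is only an illustration: the HTML string, the driver name, and every value in it are hypothetical placeholders that mimic the table structure the scraper expects, not data taken from formula1.com.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mirrors the structure the scraper looks for
sample_html = """
<table class="f1-table f1-table-with-data w-full">
  <tbody>
    <tr>
      <td>Example Grand Prix</td>
      <td>01 Jan 2000</td>
      <td><span class="max-desktop:hidden">Jane</span> <span class="max-tablet:hidden">Doe</span></td>
      <td>Example Car</td>
      <td>50</td>
      <td>1:30:00.000</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", class_="f1-table f1-table-with-data w-full")

parsed_rows = []
for row in table.find("tbody").find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 6:
        # The first and last name live in separate <span> elements
        first = cells[2].find("span", class_="max-desktop:hidden").text.strip()
        last = cells[2].find("span", class_="max-tablet:hidden").text.strip()
        parsed_rows.append([
            cells[0].text.strip(),  # GrandPrix
            cells[1].text.strip(),  # Date
            f"{first} {last}",      # Winner
            cells[3].text.strip(),  # Car
            cells[4].text.strip(),  # Laps
            cells[5].text.strip(),  # Time
        ])

print(parsed_rows)
```

Passing the full class string to `find` only matches when the attribute text is identical, including the order of the class names. A CSS selector such as `soup.select_one("table.f1-table.f1-table-with-data.w-full")` does not depend on that order.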




# Data Collection - Retrieving Driver Standings Data from F1 Website

We use BeautifulSoup to scrape driver standings data from the official Formula One website. The country names on the F1 page are provided as three-letter codes, which we convert to full country names using the *pycountry* library. A few codes are not automatically converted due to alternative shorthand usage, so we handle these cases with a manual mapping. Additionally, we consolidate the driver names into a single column for consistency and ease of analysis.

The data is retrieved from the following URL: https://formula1.com/en/results/2024/drivers

In [4]:
# Initialize the URL template and the empty DataFrame
url_template = "https://formula1.com/en/results/{}/drivers"
columns = ["Season", "Position", "Driver", "Nationality", "Car", "Points"]
driver_standings = pd.DataFrame(columns=columns)

# Convert 3-letter country codes to full country names
manual_country_mapping = {
    "CHI": "Chile",
    "DEN": "Denmark",
    "GER": "Germany",
    "INA": "Indonesia",
    "MAS": "Malaysia",
    "MON": "Monaco",
    "NED": "Netherlands",
    "POR": "Portugal",
    "RAF": "Russian Federation",
    "RHO": "Rhodesia",
    "RSA": "South Africa",
    "SUI": "Switzerland"    
}

def get_full_country_name(code):
    if code in manual_country_mapping:
        return manual_country_mapping[code]
    try:
        return pycountry.countries.get(alpha_3=code).name
    except AttributeError:
        return code

# Years to scrape
years = range(2024, 1949, -1)

# Fetching data
with tqdm(total=len(years), desc="Fetching F1 driver standings data", bar_format="{l_bar}{bar}| {percentage:3.0f}%", colour="white") as pbar:
    for year in years:
        url = url_template.format(year)
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Locate the table
            standings_table = soup.find("table", class_="f1-table f1-table-with-data w-full")
            
            if standings_table:
                
                # Extract rows of the table from the <tbody> section
                rows = standings_table.find("tbody").find_all("tr")
                
                # Iterate over rows and populate data
                for row in rows:
                    columns = row.find_all("td")
                    if len(columns) == 5:
                        
                        # Extract driver's first and last name
                        driver_first_name = columns[1].find("span", class_="max-desktop:hidden").text.strip()
                        driver_last_name = columns[1].find("span", class_="max-tablet:hidden").text.strip()
                        driver_full_name = f"{driver_first_name} {driver_last_name}"
                        
                        # Convert nationality to full country name
                        full_country_name = get_full_country_name(columns[2].text.strip())

                        # Extract driver data
                        driver_data = [
                            year,                     # Season
                            columns[0].text.strip(),  # Position
                            driver_full_name,         # Driver (First and Last Name)
                            full_country_name,        # Full Nationality
                            columns[3].text.strip(),  # Car
                            columns[4].text.strip(),  # Points
                        ]
                        driver_standings.loc[len(driver_standings)] = driver_data
                        
            else:
                print(f"No table found for {year}.")
                
        except Exception as e:
            print(f"Error fetching data for {year}: {e}")
        finally:
            pbar.update(1)

# Manually modify a few country entries
country_corrections = {
    "Russian Federation": "Russia",
    "Venezuela, Bolivarian Republic of": "Venezuela"
}
driver_standings['Nationality'] = driver_standings['Nationality'].replace(country_corrections)

# Convert the Position and Points columns to numeric
driver_standings['Position'] = pd.to_numeric(driver_standings['Position'], errors='coerce')
driver_standings['Points'] = pd.to_numeric(driver_standings['Points'], errors='coerce')

# Convert the Season column to datetime and extract only the year
driver_standings['Season'] = pd.to_datetime(driver_standings['Season'], format='%Y').dt.year

# Change the name of the Nationality column
driver_standings = driver_standings.rename(columns={"Nationality": "Country"})

# Sort the dataset first by Season and then by Position
driver_standings = driver_standings.sort_values(by=['Season', 'Position']).reset_index(drop=True)

# Save the DataFrame to CSV
output_csv = os.path.join(csv_dir, "driver_standings.csv")
driver_standings.to_csv(output_csv, index=False, encoding='utf-8')
print(f"Data saved to {output_csv}", end="")

Fetching F1 driver standings data: 100%|██████████| 100%

Data saved to csv/driver_standings.csv
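
Because unknown codes are returned unchanged, any code missing from both the manual mapping and the ISO lookup ends up in the output as a raw three-letter string. The sketch below shows one possible way to list such leftovers so the manual mapping can be extended. `ISO_ALPHA3` is a hypothetical stand-in for the *pycountry* lookup so the example runs without it, and `unresolved_codes` is an illustrative helper, not part of the notebook.

```python
# Small excerpt of the manual mapping used in the notebook
MANUAL_CODES = {"GER": "Germany", "NED": "Netherlands", "SUI": "Switzerland"}

# Hypothetical stand-in for pycountry's ISO 3166-1 alpha-3 lookup
ISO_ALPHA3 = {"GBR": "United Kingdom", "FRA": "France"}


def resolve_country(code):
    """Manual mapping first, then the ISO table, else return the code unchanged."""
    if code in MANUAL_CODES:
        return MANUAL_CODES[code]
    return ISO_ALPHA3.get(code, code)


def unresolved_codes(codes):
    """Return the distinct codes that no lookup could translate."""
    return sorted({code for code in codes if resolve_country(code) == code})


scraped = ["GBR", "GER", "XYZ", "FRA", "XYZ"]
resolved = [resolve_country(code) for code in scraped]
leftovers = unresolved_codes(scraped)
print(resolved)
print(leftovers)
```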




# Data Collection - Retrieving Constructor Standings Data from F1 Website

We use BeautifulSoup to scrape constructor standings data from the official Formula One website. 

The data is retrieved from the following URL: https://formula1.com/en/results/2024/team

In [5]:
# Initialize the URL template and the empty DataFrame
url_template = "https://formula1.com/en/results/{}/team"
columns = ["Season","Position", "Constructor", "Points"]
constructor_standings = pd.DataFrame(columns=columns)

# Years to scrape
years = range(2024, 1957, -1)  # The Constructors' Championship began in 1958

# Fetching data
with tqdm(total=len(years), desc="Fetching F1 constructor standings data", bar_format="{l_bar}{bar}| {percentage:3.0f}%", colour="white") as pbar:
    for year in years:
        url = url_template.format(year)
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Locate the table
            standings_table = soup.select_one("table.f1-table.f1-table-with-data.w-full")
            
            if standings_table:
                
                # Extract rows of the table from the <tbody> section
                rows = standings_table.find("tbody").find_all("tr")
                
                # Iterate over rows and populate data
                for row in rows:
                    columns = row.find_all("td")
                    if len(columns) == 3:

                        # Extract constructor data
                        constructor_data = [
                            year,                     # Season
                            columns[0].text.strip(),  # Position
                            columns[1].text.strip(),  # Constructor
                            columns[2].text.strip(),  # Points
                        ]
                        constructor_standings.loc[len(constructor_standings)] = constructor_data
                        
            else:
                print(f"No table found for {year}.")
                
        except Exception as e:
            print(f"Error fetching data for {year}: {e}")
        finally:
            pbar.update(1)

# Convert the Position and Points columns to numeric
constructor_standings['Position'] = pd.to_numeric(constructor_standings['Position'], errors='coerce')
constructor_standings['Points'] = pd.to_numeric(constructor_standings['Points'], errors='coerce')

# Convert the Season column to datetime and extract only the year
constructor_standings['Season'] = pd.to_datetime(constructor_standings['Season'], format='%Y').dt.year

# Sort the dataset first by Season and then by Position
constructor_standings = constructor_standings.sort_values(by=['Season', 'Position']).reset_index(drop=True)

# Save the DataFrame to CSV
output_csv = os.path.join(csv_dir, "constructor_standings.csv")
constructor_standings.to_csv(output_csv, index=False, encoding='utf-8')
print(f"Data saved to {output_csv}", end="")

Fetching F1 constructor standings data: 100%|██████████| 100%

Data saved to csv/constructor_standings.csv
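
Position values arrive from the page as strings, and `pd.to_numeric(..., errors='coerce')` turns any entry that is not a number into `NaN`. The sketch below illustrates that behavior on made-up rows. The team names are placeholders, and the `"EX"` marker is only an assumption about how an excluded constructor might appear in the scraped table.

```python
import pandas as pd

# Hypothetical rows; "EX" is an assumed marker for an excluded constructor
standings = pd.DataFrame({
    "Position": ["2", "EX", "1"],
    "Constructor": ["Team B", "Team C", "Team A"],
})

# Non-numeric positions become NaN instead of raising an error
standings["Position"] = pd.to_numeric(standings["Position"], errors="coerce")

# sort_values places NaN rows last by default
standings = standings.sort_values(by="Position").reset_index(drop=True)
print(standings)
```

Sorting after the conversion matters: sorting the raw strings would order `"10"` before `"2"`.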




# Data Collection - Retrieving Fastest Laps Data from F1 Website

We use BeautifulSoup to scrape fastest laps data from the official Formula One website.

The data is retrieved from the following URL: https://formula1.com/en/results/2024/fastest-laps

In [6]:
# Get F1 Fastest Laps Data

# Initialize the URL template and empty DataFrame
url_template = "https://formula1.com/en/results/{}/fastest-laps"
columns = ["Season", "Round", "GrandPrix", "Driver", "Car", "LapTime"]
fastest_laps = pd.DataFrame(columns=columns)

# Years to scrape
years = range(2024, 1949, -1)

# Fetching data
with tqdm(total=len(years), desc="Fetching Fastest Laps Data", bar_format="{l_bar}{bar}| {percentage:3.0f}%", colour="white") as pbar:
    for year in years:
        url = url_template.format(year)
        try:
            response = requests.get(url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            
            # Locate the table
            lap_table = soup.find("table", class_="f1-table f1-table-with-data w-full")
            
            if lap_table:
            
                # Extract rows of the table from the <tbody> section
                rows = lap_table.find("tbody").find_all("tr")
                
                # Initialize round for the current season
                round = 1
                
                # Iterate over rows and populate data
                for row in rows:
                    columns = row.find_all("td")
                    if len(columns) == 4:
                        
                        # Extract driver's first and last name
                        driver_first_name = columns[1].find("span", class_="max-desktop:hidden").text.strip()
                        driver_last_name = columns[1].find("span", class_="max-tablet:hidden").text.strip()
                        driver_full_name = f"{driver_first_name} {driver_last_name}"

                        # Extract fastest laps data
                        lap_data = [
                            year,                     # Season
                            round,                    # Round
                            columns[0].text.strip(),  # GrandPrix
                            driver_full_name,         # Driver
                            columns[2].text.strip(),  # Car
                            columns[3].text.strip()   # LapTime
                        ]
                        fastest_laps.loc[len(fastest_laps)] = lap_data
                        round += 1
                        
            else:
                print(f"No table found for {year}.")
                
        except Exception as e:
            print(f"Error fetching data for {year}: {e}")
        finally:
            pbar.update(1)

# Convert Round column to numeric
fastest_laps['Round'] = pd.to_numeric(fastest_laps['Round'], errors='coerce')

# Convert the Season column to datetime and extract only the year
fastest_laps['Season'] = pd.to_datetime(fastest_laps['Season'], format='%Y').dt.year

# Sort the dataset first by Season and then by Round
fastest_laps = fastest_laps.sort_values(by=['Season', 'Round']).reset_index(drop=True)

# Save the DataFrame to CSV
output_csv = os.path.join(csv_dir, "fastest_laps.csv")
fastest_laps.to_csv(output_csv, index=False, encoding='utf-8')
print(f"Data saved to {output_csv}", end="")

Fetching Fastest Laps Data: 100%|██████████| 100%

Data saved to csv/fastest_laps.csv




# Data Cleaning and Preprocessing - Normalizing Text

Some drivers' and constructors' names contain accented or other non-ASCII characters, which may cause issues in later stages of analysis. To address this, we apply Unicode NFKD normalization and then drop every character that cannot be represented in ASCII, so accented letters are reduced to their base form (for example, "é" becomes "e").

In [7]:
# List of CSV files to edit
csv_files = [
    "csv/race_winners.csv",
    "csv/driver_standings.csv",
    "csv/constructor_standings.csv",
    "csv/fastest_laps.csv"
]

# Function to normalize text
def normalize_text(text):
    if isinstance(text, str):
        return unicodedata.normalize('NFKD', text).encode('ASCII', 'ignore').decode('ASCII')
    return text

# Process each file
for file_path in csv_files:
    if os.path.exists(file_path):
        data = pd.read_csv(file_path)

        # Normalize all string columns
        for column in data.columns:
            if data[column].dtype == 'object':
                data[column] = data[column].apply(normalize_text)

        # Save the updated file back
        data.to_csv(file_path, index=False)
        print(f"Processed file: {file_path}")

Processed file: csv/race_winners.csv
Processed file: csv/driver_standings.csv
Processed file: csv/constructor_standings.csv
Processed file: csv/fastest_laps.csv
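
The effect of `normalize_text` is easiest to see on a few names. The short sketch below repeats the function from the cell above and applies it to sample strings chosen for illustration. Note that letters without a Unicode decomposition, such as "ø", are removed entirely rather than replaced by a base letter.

```python
import unicodedata


def normalize_text(text):
    """Decompose accented characters (NFKD) and drop anything outside ASCII."""
    if isinstance(text, str):
        return unicodedata.normalize("NFKD", text).encode("ASCII", "ignore").decode("ASCII")
    return text


samples = ["Sergio Pérez", "Kimi Räikkönen", "Nico Hülkenberg"]
cleaned = [normalize_text(name) for name in samples]
print(cleaned)

# "ø" has no decomposition into a base letter plus accent, so it is dropped
dropped = normalize_text("ø")
print(repr(dropped))

# Non-string values pass through untouched
unchanged = normalize_text(42)
```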


# Data Cleaning and Preprocessing - Manually Correcting the Retrieved Datasets

We apply manual corrections to the retrieved data. These adjustments include adding the missing hours component to a few race times so that every value follows the same H:MM:SS format, converting the position values of constructors excluded from the championship to numeric form, and adding the missing minutes component to one lap time.

In [8]:
# Correct Time values in the race_winners dataset
race_winners_df = pd.read_csv("csv/race_winners.csv")
corrections = {
    "42:53.700": "0:42:53.700",
    "57:56.690": "0:57:56.690",
    "24:34.899": "0:24:34.899",
    "55:30.622": "0:55:30.622",
    "3:27.071" : "0:03:27.071"
}
race_winners_df['Time'] = race_winners_df['Time'].replace(corrections)
race_winners_df.to_csv("csv/race_winners.csv", index=False)
print("Updated race_winners.csv with the corrections")

# Correct Position values for the excluded constructors in 2007 and 2018 in the constructor_standings
constructor_standings_df = pd.read_csv("csv/constructor_standings.csv")
constructor_standings_df['Position'] = pd.to_numeric(constructor_standings_df['Position'], errors='coerce')
constructor_standings_df = constructor_standings_df.sort_values(by=['Season', 'Position']).reset_index(drop=True)
constructor_standings_df.to_csv("csv/constructor_standings.csv", index=False)
print("Updated constructor_standings.csv with the corrections")

# Correct Lap Time values in the fastest_laps dataset
fastest_laps_df = pd.read_csv("csv/fastest_laps.csv")
fastest_laps_df['LapTime'] = fastest_laps_df['LapTime'].replace("55.404", "0:55.404")
fastest_laps_df.to_csv("csv/fastest_laps.csv", index=False)
print("Updated fastest_laps.csv with the corrections")

Updated race_winners.csv with the corrections
Updated constructor_standings.csv with the corrections
Updated fastest_laps.csv with the corrections
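
The race time corrections above all follow one pattern: a value without an hours field gets a leading `0:` and a two-digit minutes field. A rule-based alternative to the hard-coded dictionary could look like the sketch below. `pad_race_time` is a hypothetical helper, not part of the notebook, and it assumes every affected value has the form `M:SS.mmm` or `MM:SS.mmm`.

```python
def pad_race_time(value: str) -> str:
    """Return the time in H:MM:SS.mmm form, adding a zero hours field if missing."""
    parts = value.split(":")
    # Only pad values shaped like minutes:seconds; leave everything else as is
    if len(parts) == 2 and parts[0].isdigit():
        minutes, seconds = parts
        return f"0:{int(minutes):02d}:{seconds}"
    return value


padded = [pad_race_time(t) for t in ["42:53.700", "3:27.071", "1:31:44.742"]]
print(padded)
```

With pandas this could be applied as `race_winners_df['Time'].apply(pad_race_time)`. The explicit dictionary used in the notebook has the advantage that it only touches values that were reviewed by hand.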


# Data Cleaning and Preprocessing - Manually Editing the Column Names

We rename the "Car" column to "Constructor" across multiple datasets to ensure consistency and clarity in the data.

In [9]:
# List of CSV files to edit
csv_files = [
    "csv/race_winners.csv",
    "csv/driver_standings.csv",
    "csv/fastest_laps.csv"
]

# Iterate through each file and rename the column
for csv_file in csv_files:
    # Load the CSV file
    df = pd.read_csv(csv_file)
    
    # Check if "Car" exists in the columns
    if 'Car' in df.columns:
        # Rename the column "Car" to "Constructor"
        df.rename(columns={"Car": "Constructor"}, inplace=True)
        
        # Save the updated CSV file back to the same path
        df.to_csv(csv_file, index=False)
        
        print(f"Renamed 'Car' to 'Constructor' in {csv_file}")
    else:
        print(f"No column 'Car' found in {csv_file}")

Renamed 'Car' to 'Constructor' in csv/race_winners.csv
Renamed 'Car' to 'Constructor' in csv/driver_standings.csv
Renamed 'Car' to 'Constructor' in csv/fastest_laps.csv


# Data Cleaning and Preprocessing - Manually Editing Constructor Names in the Race Winners Dataset

In Formula One, team names often change over the years due to factors such as different engine manufacturers, sponsorship agreements, or other reasons. Despite these changes, many of these teams remain the same entities. To ensure consistency, we manually standardize the constructor names across the dataset.

While the specific changes are too numerous to detail here, this step relies on extensive domain knowledge of the sport's history, as each constructor has a unique legacy and identity.

In [10]:
# CSV file to edit
csv_file = "csv/race_winners.csv"

# Dictionary of original and replacement values
constructor_replacements = {
    "AlphaTauri Honda": "Toro Rosso",
    "Alpine Renault": "Renault",
    "Benetton BMW": "Benetton",
    "Benetton Ford": "Benetton",
    "Benetton Renault": "Benetton",
    "Brabham Alfa Romeo": "Brabham",
    "Brabham BMW": "Brabham",
    "Brabham Climax": "Brabham",
    "Brabham Ford": "Brabham",
    "Brabham Repco": "Brabham",
    "Brawn Mercedes": "Brawn",
    "Cooper Climax": "Cooper",
    "Cooper Maserati": "Cooper",
    "Eagle Weslake": "Eagle",
    "Epperly Offenhauser": "Epperly",
    "Hesketh Ford": "Hesketh",
    "Jordan Ford": "Jordan",
    "Jordan Mugen Honda": "Jordan",
    "Kurtis Kraft Offenhauser": "Kurtis Kraft",
    "Kuzma Offenhauser": "Kuzma",
    "Ligier Ford": "Ligier",
    "Ligier Matra": "Ligier",
    "Ligier Mugen Honda": "Ligier",
    "Lotus BRM": "Lotus",
    "Lotus Climax": "Lotus",
    "Lotus Ford": "Lotus",
    "Lotus Honda": "Lotus",
    "Lotus Renault": "Lotus",
    "March Ford": "March",
    "Matra Ford": "Matra",
    "McLaren Ford": "McLaren",
    "McLaren Honda": "McLaren",
    "McLaren Mercedes": "McLaren",
    "McLaren TAG": "McLaren",
    "Mercedes-Benz": "Mercedes",
    "Penske Ford": "Penske",
    "Racing Point BWT Mercedes": "Force India",
    "RBR Renault": "Red Bull",
    "Red Bull Racing Honda RBPT": "Red Bull",
    "Red Bull Racing Honda": "Red Bull",
    "Red Bull Racing RBPT": "Red Bull",
    "Red Bull Racing Renault": "Red Bull",
    "Red Bull Racing TAG Heuer": "Red Bull",
    "Sauber BMW": "Sauber",
    "Shadow Ford": "Shadow",
    "Stewart Ford": "Stewart",
    "STR Ferrari": "Toro Rosso",
    "Tyrrell Ford": "Tyrrell",
    "Watson Offenhauser": "Watson",
    "Williams BMW": "Williams",
    "Williams Ford": "Williams",
    "Williams Honda": "Williams",
    "Williams Renault": "Williams",
    "Wolf Ford": "Wolf",
}

# Load the CSV file
df = pd.read_csv(csv_file)

# Replace values in the "Constructor" column
df["Constructor"] = df["Constructor"].replace(constructor_replacements)

# Save the updated DataFrame back to the file
df.to_csv(csv_file, index=False)

print("Constructor names in race_winners dataset updated successfully")

Constructor names in race_winners dataset updated successfully
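
Since `replace` leaves unknown values untouched, a constructor name missing from the dictionary passes through silently. One possible way to catch this is to list every name that is neither a key nor a target value of the mapping. The sketch below uses a two-entry excerpt of the dictionary and made-up input names. `names_to_review` is a hypothetical helper introduced for illustration.

```python
def names_to_review(names, replacements):
    """Return distinct names that the mapping neither replaces nor produces."""
    canonical = set(replacements.values())
    return sorted({
        name for name in names
        if name not in replacements and name not in canonical
    })


# Small excerpt of the mapping, for illustration only
replacements = {
    "McLaren Mercedes": "McLaren",
    "Williams Renault": "Williams",
}

# Hypothetical values of a Constructor column
observed = ["McLaren Mercedes", "Ferrari", "Williams", "Benetton Playlife"]

review = names_to_review(observed, replacements)
print(review)
```

The result is a list for manual review rather than a list of errors: a name such as "Ferrari" needs no replacement, while "Benetton Playlife" would point to a missing dictionary entry.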


# Data Cleaning and Preprocessing - Manually Editing Constructor Names in the Driver Standings Dataset

Similar to the previous step, we standardize constructor names to ensure consistency. This collection includes more constructor names than the race winners dataset, as it accounts for all constructors participating throughout the season, not just those that secured victories.

In [11]:
# CSV file to edit
csv_file = "csv/driver_standings.csv"

# Dictionary of original and replacement values
constructor_replacements = {
    "AGS Ford": "AGS",
    "Alfa Romeo Ferrari": "Alfa Romeo",
    "Alfa Romeo Racing Ferrari": "Sauber",
    "AlphaTauri Honda RBPT": "Toro Rosso",
    "AlphaTauri Honda": "Toro Rosso",
    "AlphaTauri RBPT": "Toro Rosso",
    "Alpine Renault": "Renault",
    "Arrows Asiatech": "Arrows",
    "Arrows BMW": "Arrows",
    "Arrows Cosworth": "Arrows",
    "Arrows Ford": "Arrows",
    "Arrows Megatron": "Arrows",
    "Arrows Supertec": "Arrows",
    "Arrows Yamaha": "Arrows",
    "Aston Martin Aramco Mercedes": "Aston Martin",    
    "Aston Martin Mercedes": "Aston Martin",
    "ATS Ford": "ATS",
    "BAR Honda": "BAR",
    "Benetton BMW": "Benetton",
    "Benetton Ford": "Benetton",
    "Benetton Playlife": "Benetton",
    "Benetton Renault": "Benetton",
    "Brabham Alfa Romeo": "Brabham",
    "Brabham BMW": "Brabham",
    "Brabham BRM": "Brabham",
    "Brabham Climax": "Brabham",
    "Brabham Ford": "Brabham",
    "Brabham Judd": "Brabham",
    "Brabham Repco": "Brabham",
    "Brabham Yamaha": "Brabham",
    "Brawn Mercedes": "Brawn",
    "BRM Climax": "BRM",
    "BRP BRM": "BRP",
    "Caterham Renault": "Caterham",
    "Connaught Alta": "Connaught",
    "Connaught Lea Francis": "Connaught",
    "Cooper Bristol": "Cooper",
    "Cooper BRM": "Cooper",
    "Cooper Castellotti": "Cooper",
    "Cooper Climax": "Cooper",
    "Cooper Maserati": "Cooper",
    "Dallara Ferrari": "Dallara",
    "Dallara Ford": "Dallara",
    "Dallara Judd": "Dallara",
    "Deidt Offenhauser": "Deidt",
    "Eagle Climax": "Eagle",
    "Eagle Weslake": "Eagle",
    "Ensign Ford": "Ensign",
    "Epperly Offenhauser": "Epperly",
    "Fittipaldi Ford": "Fittipaldi",
    "Footwork Ford": "Arrows",
    "Footwork Hart": "Arrows",
    "Footwork Mugen Honda": "Arrows",
    "Force India Ferrari": "Force India",
    "Force India Mercedes": "Force India",
    "Frank Williams Racing Cars/Williams": "Williams",
    "Haas Ferrari": "Haas",
    "Hesketh Ford": "Hesketh",
    "Hill Ford": "Hill",
    "HRT Cosworth": "HRT",
    "Iso Marlboro Ford": "Williams",
    "Jaguar Cosworth": "Jaguar",
    "Jordan Ford": "Jordan",
    "Jordan Hart": "Jordan",
    "Jordan Honda": "Jordan",
    "Jordan Mugen Honda": "Jordan",
    "Jordan Peugeot": "Jordan",
    "Jordan Toyota": "Jordan",
    "Jordan Yamaha": "Jordan",
    "Kick Sauber Ferrari": "Sauber",
    "Kurtis Kraft Novi": "Kurtis Kraft",
    "Kurtis Kraft Offenhauser": "Kurtis Kraft",
    "Kuzma Offenhauser": "Kuzma",
    "Larrousse Ford": "Larrousse",
    "Larrousse Lamborghini": "Larrousse",
    "Lesovsky Offenhauser": "Lesovsky",
    "Leyton House Ilmor": "Leyton House",
    "Leyton House Judd": "Leyton House",
    "Ligier Ford": "Ligier",
    "Ligier Matra": "Ligier",
    "Ligier Megatron": "Ligier",
    "Ligier Mugen Honda": "Ligier",
    "Ligier Renault": "Ligier",
    "Lola Climax": "Lola",
    "Lola Ford": "Lola",
    "Lola Lamborghini": "Lola",
    "Lotus BRM": "Lotus",
    "Lotus Climax": "Lotus",
    "Lotus Cosworth": "Caterham",
    "Lotus Ford": "Lotus",
    "Lotus Honda": "Lotus",
    "Lotus Judd": "Lotus",
    "Lotus Lamborghini": "Lotus",
    "Lotus Mercedes": "Lotus",
    "Lotus Mugen Honda": "Lotus",
    "Lotus Renault": "Lotus",
    "March Ford": "March",
    "March Ilmor": "March",
    "March Judd": "March",
    "Marussia Cosworth": "Marussia",
    "Marussia Ferrari": "Marussia",
    "Matra Ford": "Matra",
    "McLaren BRM": "McLaren",
    "McLaren Ford": "McLaren",
    "McLaren Honda": "McLaren",
    "McLaren Mercedes": "McLaren",
    "McLaren Peugeot": "McLaren",
    "McLaren Renault": "McLaren",
    "McLaren TAG": "McLaren",
    "Mercedes-Benz": "Mercedes",
    "MF1 Toyota": "Midland",
    "Milano Speluzzi": "Milano",
    "Minardi Asiatech": "Minardi",
    "Minardi Cosworth": "Minardi",
    "Minardi Ferrari": "Minardi",
    "Minardi Fondmetal": "Minardi",
    "Minardi Ford": "Minardi",
    "Minardi Lamborghini": "Minardi",
    "MRT": "Marussia",
    "MRT Mercedes": "Marussia",
    "Onyx Ford": "Onyx",
    "Osella Alfa Romeo": "Osella",
    "Osella Ford": "Osella",
    "Parnelli Ford": "Parnelli",
    "Penske Ford": "Penske",
    "Phillips Offenhauser": "Phillips",
    "Prost Acer": "Prost",
    "Prost Mugen Honda": "Prost",
    "Prost Peugeot": "Prost",
    "Racing Point BWT Mercedes": "Force India",
    "RB Honda RBPT": "Toro Rosso",
    "RBR Cosworth": "Red Bull",
    "RBR Ferrari": "Red Bull",
    "RBR Renault": "Red Bull",
    "Red Bull Racing Honda RBPT": "Red Bull",
    "Red Bull Racing Honda": "Red Bull",
    "Red Bull Racing RBPT": "Red Bull",
    "Red Bull Racing Renault": "Red Bull",
    "Red Bull Racing TAG Heuer": "Red Bull",
    "Red Bull Renault": "Red Bull",
    "Rial Ford": "Rial",
    "Sauber BMW": "Sauber",
    "Sauber Ferrari": "Sauber",
    "Sauber Ford": "Sauber",
    "Sauber Mercedes": "Sauber",
    "Sauber Petronas": "Sauber",
    "Schroeder Offenhauser": "Schroeder",
    "Scuderia Toro Rosso Honda": "Toro Rosso",
    "Shadow Ford": "Shadow",
    "Sherman Offenhauser": "Sherman",
    "Simca-Gordini": "Simca-Gordini",
    "Spyker Ferrari": "Spyker",
    "Stewart Ford": "Stewart",
    "STR Cosworth": "Toro Rosso",
    "STR Ferrari": "Toro Rosso",
    "STR Renault": "Toro Rosso",
    "Super Aguri Honda": "Super Aguri",
    "Surtees Ford": "Surtees",
    "Theodore Ford": "Theodore",
    "Toleman Hart": "Toleman",
    "Toro Rosso Ferrari": "Toro Rosso",
    "Trevis Offenhauser": "Trevis",
    "Tyrrell Ford": "Tyrrell",
    "Tyrrell Honda": "Tyrrell",
    "Tyrrell Ilmor": "Tyrrell",
    "Tyrrell Renault": "Tyrrell",
    "Tyrrell Yamaha": "Tyrrell",
    "Venturi Lamborghini": "Venturi",
    "Virgin Cosworth": "Virgin",
    "Watson Offenhauser": "Watson",
    "Williams BMW": "Williams",
    "Williams Cosworth": "Williams",
    "Williams Ford": "Williams",
    "Williams Honda": "Williams",
    "Williams Judd": "Williams",
    "Williams Mecachrome": "Williams",
    "Williams Mercedes": "Williams",
    "Williams Renault": "Williams",
    "Williams Supertec": "Williams",
    "Williams Toyota": "Williams",
    "Wolf Ford": "Wolf",
    "Wolf-Williams": "Williams",
    "Zakspeed": "Zakspeed",
}

# Load the CSV file
df = pd.read_csv(csv_file)

# Replace values in the "Constructor" column
df["Constructor"] = df["Constructor"].replace(constructor_replacements)

# Save the updated DataFrame back to the file
df.to_csv(csv_file, index=False)

print("Constructor names in driver_standings dataset updated successfully")

Constructor names in driver_standings dataset updated successfully
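
Hand-maintained dictionaries of this size can pick up repeated keys, which Python accepts silently and resolves by keeping the last value. The sketch below shows one way to detect them by parsing the source text with the standard `ast` module. `duplicate_dict_keys` is a hypothetical helper, and the sample source is a shortened excerpt written for illustration.

```python
import ast
from collections import Counter


def duplicate_dict_keys(source: str) -> list:
    """Return constant keys that occur more than once in any dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Skip non-constant keys such as variables or ** unpacking
            keys = [key.value for key in node.keys if isinstance(key, ast.Constant)]
            duplicates.extend(key for key, count in Counter(keys).items() if count > 1)
    return duplicates


sample_source = '''
constructor_replacements = {
    "Jordan Ford": "Jordan",
    "Jordan Hart": "Jordan",
    "Jordan Hart": "Jordan",
}
'''

repeated = duplicate_dict_keys(sample_source)
print(repeated)
```

A repeated key is harmless when both entries map to the same value, but it would hide a conflict if the two values ever differed.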


# Data Cleaning and Preprocessing - Manually Editing Constructor Names in the Constructor Standings Dataset

In [12]:
# CSV file to edit
csv_file = "csv/constructor_standings.csv"

# Dictionary of original and replacement values
constructor_replacements = {
    "AGS Ford": "AGS",
    "Alfa Romeo Ferrari": "Alfa Romeo",
    "Alfa Romeo Racing Ferrari": "Sauber",
    "AlphaTauri Honda RBPT": "Toro Rosso",
    "AlphaTauri Honda": "Toro Rosso",
    "AlphaTauri RBPT": "Toro Rosso",
    "Alpine Renault": "Renault",
    "Arrows Asiatech": "Arrows",
    "Arrows BMW": "Arrows",
    "Arrows Cosworth": "Arrows",
    "Arrows Ford": "Arrows",
    "Arrows Megatron": "Arrows",
    "Arrows Supertec": "Arrows",
    "Arrows Yamaha": "Arrows",
    "Aston Martin Aramco Mercedes": "Aston Martin",
    "Aston Martin Mercedes": "Aston Martin",
    "ATS Ford": "ATS",
    "BAR Honda": "BAR",
    "Benetton BMW": "Benetton",
    "Benetton Ford": "Benetton",
    "Benetton Playlife": "Benetton",
    "Benetton Renault": "Benetton",
    "Brabham Alfa Romeo": "Brabham",
    "Brabham BMW": "Brabham",
    "Brabham BRM": "Brabham",
    "Brabham Climax": "Brabham",
    "Brabham Ford": "Brabham",
    "Brabham Judd": "Brabham",
    "Brabham Repco": "Brabham",
    "Brabham Yamaha": "Brabham",
    "Brawn Mercedes": "Brawn",
    "BRM Climax": "BRM",
    "BRP BRM": "BRP",
    "Caterham Renault": "Caterham",
    "Connaught Alta": "Connaught",
    "Connaught Lea Francis": "Connaught",
    "Cooper Bristol": "Cooper",
    "Cooper BRM": "Cooper",
    "Cooper Castellotti": "Cooper",
    "Cooper Climax": "Cooper",
    "Cooper Maserati": "Cooper",
    "Dallara Ferrari": "Dallara",
    "Dallara Ford": "Dallara",
    "Dallara Judd": "Dallara",
    "Deidt Offenhauser": "Deidt",
    "Eagle Climax": "Eagle",
    "Eagle Weslake": "Eagle",
    "Ensign Ford": "Ensign",
    "Epperly Offenhauser": "Epperly",
    "Fittipaldi Ford": "Fittipaldi",
    "Footwork Ford": "Arrows",
    "Footwork Hart": "Arrows",
    "Footwork Mugen Honda": "Arrows",
    "Force India Ferrari": "Force India",
    "Force India Mercedes": "Force India",
    "Force India Sahara": "Force India",
    "Frank Williams Racing Cars/Williams": "Williams",
    "Haas Ferrari": "Haas",
    "Hesketh Ford": "Hesketh",
    "Hill Ford": "Hill",
    "HRT Cosworth": "HRT",
    "Iso Marlboro Ford": "Williams",
    "Jaguar Cosworth": "Jaguar",
    "Jordan Ford": "Jordan",
    "Jordan Hart": "Jordan",
    "Jordan Honda": "Jordan",
    "Jordan Mugen Honda": "Jordan",
    "Jordan Peugeot": "Jordan",
    "Jordan Toyota": "Jordan",
    "Jordan Yamaha": "Jordan",
    "Kick Sauber Ferrari": "Sauber",
    "Kurtis Kraft Novi": "Kurtis Kraft",
    "Kurtis Kraft Offenhauser": "Kurtis Kraft",
    "Kuzma Offenhauser": "Kuzma",
    "Larrousse Ford": "Larrousse",
    "Larrousse Lamborghini": "Larrousse",
    "Lesovsky Offenhauser": "Lesovsky",
    "Leyton House Ilmor": "Leyton House",
    "Leyton House Judd": "Leyton House",
    "Ligier Ford": "Ligier",
    "Ligier Matra": "Ligier",
    "Ligier Megatron": "Ligier",
    "Ligier Mugen Honda": "Ligier",
    "Ligier Renault": "Ligier",
    "Lola Climax": "Lola",
    "Lola Ford": "Lola",
    "Lola Lamborghini": "Lola",
    "Lotus BRM": "Lotus",
    "Lotus Climax": "Lotus",
    "Lotus Cosworth": "Caterham",
    "Lotus Ford": "Lotus",
    "Lotus Honda": "Lotus",
    "Lotus Judd": "Lotus",
    "Lotus Lamborghini": "Lotus",
    "Lotus Mercedes": "Lotus",
    "Lotus Mugen Honda": "Lotus",
    "Lotus Renault": "Lotus",
    "March Ford": "March",
    "March Ilmor": "March",
    "March Judd": "March",
    "Marussia Cosworth": "Marussia",
    "Marussia Ferrari": "Marussia",
    "Matra Ford": "Matra",
    "Mclaren BRM": "McLaren",
    "McLaren BRM": "McLaren",
    "McLaren Ford": "McLaren",
    "McLaren Honda": "McLaren",
    "McLaren Mercedes": "McLaren",
    "McLaren Peugeot": "McLaren",
    "McLaren Renault": "McLaren",
    "McLaren Serenissima": "McLaren",
    "McLaren TAG": "McLaren",
    "Mercedes-Benz": "Mercedes",
    "MF1 Toyota": "Midland",
    "Milano Speluzzi": "Milano",
    "Minardi Asiatech": "Minardi",
    "Minardi Cosworth": "Minardi",
    "Minardi Ferrari": "Minardi",
    "Minardi Fondmetal": "Minardi",
    "Minardi Ford": "Minardi",
    "Minardi Lamborghini": "Minardi",
    "MRT": "Marussia",
    "MRT Mercedes": "Marussia",
    "Onyx Ford": "Onyx",
    "Osella Alfa Romeo": "Osella",
    "Osella Ford": "Osella",
    "Parnelli Ford": "Parnelli",
    "Penske Ford": "Penske",
    "Phillips Offenhauser": "Phillips",
    "Prost Acer": "Prost",
    "Prost Mugen Honda": "Prost",
    "Prost Peugeot": "Prost",
    "Racing Point BWT Mercedes": "Force India",
    "RB Honda RBPT": "Toro Rosso",
    "RBR Cosworth": "Red Bull",
    "RBR Ferrari": "Red Bull",
    "RBR Renault": "Red Bull",
    "Red Bull Racing Honda RBPT": "Red Bull",
    "Red Bull Racing Honda": "Red Bull",
    "Red Bull Racing RBPT": "Red Bull",
    "Red Bull Racing Renault": "Red Bull",
    "Red Bull Racing TAG Heuer": "Red Bull",
    "Red Bull Renault": "Red Bull",
    "Rial Ford": "Rial",
    "Sauber BMW": "Sauber",
    "Sauber Ferrari": "Sauber",
    "Sauber Ford": "Sauber",
    "Sauber Mercedes": "Sauber",
    "Sauber Petronas": "Sauber",
    "Schroeder Offenhauser": "Schroeder",
    "Scuderia Toro Rosso Honda": "Toro Rosso",
    "Shadow Ford": "Shadow",
    "Sherman Offenhauser": "Sherman",
    "Simca-Gordini": "Simca-Gordini",
    "Spyker Ferrari": "Spyker",
    "Stewart Ford": "Stewart",
    "STR Cosworth": "Toro Rosso",
    "STR Ferrari": "Toro Rosso",
    "STR Renault": "Toro Rosso",
    "Super Aguri Honda": "Super Aguri",
    "Surtees Ford": "Surtees",
    "Theodore Ford": "Theodore",
    "Toleman Hart": "Toleman",
    "Toro Rosso Ferrari": "Toro Rosso",
    "Trevis Offenhauser": "Trevis",
    "Tyrrell Ford": "Tyrrell",
    "Tyrrell Honda": "Tyrrell",
    "Tyrrell Ilmor": "Tyrrell",
    "Tyrrell Renault": "Tyrrell",
    "Tyrrell Yamaha": "Tyrrell",
    "Venturi Lamborghini": "Venturi",
    "Virgin Cosworth": "Virgin",
    "Watson Offenhauser": "Watson",
    "Williams BMW": "Williams",
    "Williams Cosworth": "Williams",
    "Williams Ford": "Williams",
    "Williams Honda": "Williams",
    "Williams Judd": "Williams",
    "Williams Mecachrome": "Williams",
    "Williams Mercedes": "Williams",
    "Williams Renault": "Williams",
    "Williams Supertec": "Williams",
    "Williams Toyota": "Williams",
    "Wolf Ford": "Wolf",
    "Wolf-Williams": "Williams",
    "Zakspeed": "Zakspeed",
}

# Load the CSV file
df = pd.read_csv(csv_file)

# Replace values in the "Constructor" column
df["Constructor"] = df["Constructor"].replace(constructor_replacements)

# Save the updated DataFrame back to the file
df.to_csv(csv_file, index=False)

print("Constructor names in constructor_standings dataset updated successfully")

Constructor names in constructor_standings dataset updated successfully
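
Because the cell overwrites the CSV file in place, it can be useful to see which replacements actually took effect before saving. The following is a minimal sketch under that assumption; `summarize_replacements` is a hypothetical helper and the sample values are illustrative, not taken from the real dataset.

```python
import pandas as pd


def summarize_replacements(before, after):
    """Count the rows changed for each (original, replacement) pair."""
    # Missing values are ignored so that NaN != NaN is not reported as a change.
    changed = before.ne(after) & before.notna()
    pairs = pd.DataFrame(
        {"original": before[changed], "replacement": after[changed]}
    )
    return pairs.value_counts().sort_index()


# Illustrative values standing in for the "Constructor" column.
before = pd.Series(["Sauber BMW", "Sauber BMW", "Ferrari", "Lotus Cosworth"])
mapping = {"Sauber BMW": "Sauber", "Lotus Cosworth": "Caterham"}
after = before.replace(mapping)

summary = summarize_replacements(before, after)
print(summary)
```

Names that appear in the column but not in the summary were left unchanged, which makes it easier to spot a constructor that is missing from the dictionary.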


# Data Cleaning and Preprocessing - Manually Editing Constructor Names in the Fastest Laps Dataset

In [13]:
# CSV file to edit
csv_file = "csv/fastest_laps.csv"

# Dictionary of original and replacement values
constructor_replacements = {
    "AGS Ford": "AGS",
    "Alfa Romeo Ferrari": "Alfa Romeo",
    "Alfa Romeo Racing Ferrari": "Sauber",
    "AlphaTauri Honda RBPT": "Toro Rosso",
    "AlphaTauri Honda": "Toro Rosso",
    "AlphaTauri RBPT": "Toro Rosso",
    "Alpine Renault": "Renault",
    "Arrows Asiatech": "Arrows",
    "Arrows BMW": "Arrows",
    "Arrows Cosworth": "Arrows",
    "Arrows Ford": "Arrows",
    "Arrows Megatron": "Arrows",
    "Arrows Supertec": "Arrows",
    "Arrows Yamaha": "Arrows",
    "Aston Martin Aramco Mercedes": "Aston Martin",
    "Aston Martin Mercedes": "Aston Martin",
    "ATS Ford": "ATS",
    "BAR Honda": "BAR",
    "Benetton BMW": "Benetton",
    "Benetton Ford": "Benetton",
    "Benetton Playlife": "Benetton",
    "Benetton Renault": "Benetton",
    "Brabham Alfa Romeo": "Brabham",
    "Brabham BMW": "Brabham",
    "Brabham BRM": "Brabham",
    "Brabham Climax": "Brabham",
    "Brabham Ford": "Brabham",
    "Brabham Judd": "Brabham",
    "Brabham Repco": "Brabham",
    "Brabham Yamaha": "Brabham",
    "Brawn Mercedes": "Brawn",
    "BRM Climax": "BRM",
    "BRP BRM": "BRP",
    "Caterham Renault": "Caterham",
    "Connaught Alta": "Connaught",
    "Connaught Lea Francis": "Connaught",
    "Cooper Bristol": "Cooper",
    "Cooper BRM": "Cooper",
    "Cooper Castellotti": "Cooper",
    "Cooper Climax": "Cooper",
    "Cooper Maserati": "Cooper",
    "Dallara Ferrari": "Dallara",
    "Dallara Ford": "Dallara",
    "Dallara Judd": "Dallara",
    "Deidt Offenhauser": "Deidt",
    "Eagle Climax": "Eagle",
    "Eagle Weslake": "Eagle",
    "Ensign Ford": "Ensign",
    "Epperly Offenhauser": "Epperly",
    "Fittipaldi Ford": "Fittipaldi",
    "Footwork Ford": "Arrows",
    "Footwork Hart": "Arrows",
    "Footwork Mugen Honda": "Arrows",
    "Force India Ferrari": "Force India",
    "Force India Mercedes": "Force India",
    "Force India Sahara": "Force India",
    "Frank Williams Racing Cars/Williams": "Williams",
    "Haas Ferrari": "Haas",
    "Hesketh Ford": "Hesketh",
    "Hill Ford": "Hill",
    "HRT Cosworth": "HRT",
    "Iso Marlboro Ford": "Williams",
    "Jaguar Cosworth": "Jaguar",
    "Jordan Ford": "Jordan",
    "Jordan Hart": "Jordan",
    "Jordan Honda": "Jordan",
    "Jordan Mugen Honda": "Jordan",
    "Jordan Peugeot": "Jordan",
    "Jordan Toyota": "Jordan",
    "Jordan Yamaha": "Jordan",
    "Kick Sauber Ferrari": "Sauber",
    "Kurtis Kraft Novi": "Kurtis Kraft",
    "Kurtis Kraft Offenhauser": "Kurtis Kraft",
    "Kuzma Offenhauser": "Kuzma",
    "Larrousse Ford": "Larrousse",
    "Larrousse Lamborghini": "Larrousse",
    "Lesovsky Offenhauser": "Lesovsky",
    "Leyton House Ilmor": "Leyton House",
    "Leyton House Judd": "Leyton House",
    "Ligier Ford": "Ligier",
    "Ligier Matra": "Ligier",
    "Ligier Megatron": "Ligier",
    "Ligier Mugen Honda": "Ligier",
    "Ligier Renault": "Ligier",
    "Lola Climax": "Lola",
    "Lola Ford": "Lola",
    "Lola Lamborghini": "Lola",
    "Lotus BRM": "Lotus",
    "Lotus Climax": "Lotus",
    "Lotus Cosworth": "Caterham",
    "Lotus Ford": "Lotus",
    "Lotus Honda": "Lotus",
    "Lotus Judd": "Lotus",
    "Lotus Lamborghini": "Lotus",
    "Lotus Mercedes": "Lotus",
    "Lotus Mugen Honda": "Lotus",
    "Lotus Renault": "Lotus",
    "March Ford": "March",
    "March Ilmor": "March",
    "March Judd": "March",
    "Marussia Cosworth": "Marussia",
    "Marussia Ferrari": "Marussia",
    "Matra Ford": "Matra",
    "Mclaren BRM": "McLaren",
    "McLaren BRM": "McLaren",
    "McLaren Ford": "McLaren",
    "McLaren Honda": "McLaren",
    "McLaren Mercedes": "McLaren",
    "McLaren Peugeot": "McLaren",
    "McLaren Renault": "McLaren",
    "McLaren Serenissima": "McLaren",
    "McLaren TAG": "McLaren",
    "Mercedes-Benz": "Mercedes",
    "MF1 Toyota": "Midland",
    "Milano Speluzzi": "Milano",
    "Minardi Asiatech": "Minardi",
    "Minardi Cosworth": "Minardi",
    "Minardi Ferrari": "Minardi",
    "Minardi Fondmetal": "Minardi",
    "Minardi Ford": "Minardi",
    "Minardi Lamborghini": "Minardi",
    "MRT": "Marussia",
    "MRT Mercedes": "Marussia",
    "Onyx Ford": "Onyx",
    "Osella Alfa Romeo": "Osella",
    "Osella Ford": "Osella",
    "Parnelli Ford": "Parnelli",
    "Penske Ford": "Penske",
    "Phillips Offenhauser": "Phillips",
    "Prost Acer": "Prost",
    "Prost Mugen Honda": "Prost",
    "Prost Peugeot": "Prost",
    "Racing Point BWT Mercedes": "Force India",
    "RB Honda RBPT": "Toro Rosso",
    "RBR Cosworth": "Red Bull",
    "RBR Ferrari": "Red Bull",
    "RBR Renault": "Red Bull",
    "Red Bull Racing Honda RBPT": "Red Bull",
    "Red Bull Racing Honda": "Red Bull",
    "Red Bull Racing RBPT": "Red Bull",
    "Red Bull Racing Renault": "Red Bull",
    "Red Bull Racing TAG Heuer": "Red Bull",
    "Red Bull Renault": "Red Bull",
    "Rial Ford": "Rial",
    "Sauber BMW": "Sauber",
    "Sauber Ferrari": "Sauber",
    "Sauber Ford": "Sauber",
    "Sauber Mercedes": "Sauber",
    "Sauber Petronas": "Sauber",
    "Schroeder Offenhauser": "Schroeder",
    "Scuderia Toro Rosso Honda": "Toro Rosso",
    "Shadow Ford": "Shadow",
    "Shadow Matra": "Shadow",
    "Sherman Offenhauser": "Sherman",
    "Simca-Gordini": "Simca-Gordini",
    "Spyker Ferrari": "Spyker",
    "Stewart Ford": "Stewart",
    "STR Cosworth": "Toro Rosso",
    "STR Ferrari": "Toro Rosso",
    "STR Renault": "Toro Rosso",
    "Super Aguri Honda": "Super Aguri",
    "Surtees Ford": "Surtees",
    "Theodore Ford": "Theodore",
    "Toleman Hart": "Toleman",
    "Toro Rosso Ferrari": "Toro Rosso",
    "Trevis Offenhauser": "Trevis",
    "Tyrrell Ford": "Tyrrell",
    "Tyrrell Honda": "Tyrrell",
    "Tyrrell Ilmor": "Tyrrell",
    "Tyrrell Renault": "Tyrrell",
    "Tyrrell Yamaha": "Tyrrell",
    "Venturi Lamborghini": "Venturi",
    "Virgin Cosworth": "Virgin",
    "Watson Offenhauser": "Watson",
    "Williams BMW": "Williams",
    "Williams Cosworth": "Williams",
    "Williams Ford": "Williams",
    "Williams Honda": "Williams",
    "Williams Judd": "Williams",
    "Williams Mecachrome": "Williams",
    "Williams Mercedes": "Williams",
    "Williams Renault": "Williams",
    "Williams Supertec": "Williams",
    "Williams Toyota": "Williams",
    "Wolf Ford": "Wolf",
    "Wolf-Williams": "Williams",
    "Zakspeed": "Zakspeed",
}

# Load the CSV file
df = pd.read_csv(csv_file)

# Replace values in the "Constructor" column
df["Constructor"] = df["Constructor"].replace(constructor_replacements)

# Save the updated DataFrame back to the file
df.to_csv(csv_file, index=False)

print("Constructor names in fastest_laps dataset updated successfully")

Constructor names in fastest_laps dataset updated successfully


# Data Cleaning and Preprocessing - Manually Changing "United States" to "United States of America"

This step renames "United States" to "United States of America" in every retrieved CSV file so that country names stay consistent between the datasets from the F1 website and the datasets used later in the analysis.

In [14]:
# List of CSV files to update
csv_files = [
    "csv/race_winners.csv",
    "csv/driver_standings.csv",
    "csv/constructor_standings.csv",
    "csv/fastest_laps.csv"
]

# Function to replace "United States" with "United States of America"
def update_us_name(file_path):
    """Rename "United States" to "United States of America" in one CSV file."""
    # Read the CSV file
    df = pd.read_csv(file_path)

    # Replace cells whose entire value is "United States" (in any column);
    # cells that merely contain the phrase are left unchanged
    df = df.replace(to_replace="United States", value="United States of America")

    # Save the updated DataFrame back to the same CSV file
    df.to_csv(file_path, index=False)
    print(f"Updated file: {file_path}")

# Apply the function to each file
for file in csv_files:
    update_us_name(file)

Updated file: csv/race_winners.csv
Updated file: csv/driver_standings.csv
Updated file: csv/constructor_standings.csv
Updated file: csv/fastest_laps.csv
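
With a plain string, `DataFrame.replace` matches whole cell values rather than substrings, so running the step twice does not produce "United States of America of America", and longer values that contain the phrase are not altered. A minimal sketch of that behavior, using hypothetical column names and sample values:

```python
import pandas as pd

# Hypothetical columns and values, chosen only to illustrate the matching rule.
df = pd.DataFrame(
    {
        "Country": ["United States", "United States of America", "Italy"],
        "Grand Prix": ["United States Grand Prix West", "Miami", "Monza"],
    }
)

# Only cells equal to "United States" are replaced (regex matching is off by default).
updated = df.replace(to_replace="United States", value="United States of America")

print(updated)
```

If substring replacement were ever needed, it would have to be requested explicitly (for example with `regex=True`), which the cell above does not do.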
