# Pitstops and their Impact on Race Outcome
We explore pit stop data from the 2018-2023 F1 seasons and examine how pit stops relate to race outcomes.

## STEP 1 - Loading the Data & Libraries

In [None]:
# Let us start by importing the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sklearn.model_selection as model_selection

# Load the datasets
races = pd.read_csv('data/races.csv')
results = pd.read_csv('data/results.csv')
pit_stops = pd.read_csv('data/pit_stops.csv')
drivers = pd.read_csv('data/drivers.csv')

# Display the first few rows of each dataset
races.head(), results.head(), pit_stops.head(), drivers.head()

## STEP 2 - Filtering the Data
We are only interested in the 2018-2023 seasons, so we filter the races by year and then use their race IDs to filter the pit stop and results data.

In [None]:
# Filtering the dataset for years 2018 to 2023
races_2018_2023 = races[(races['year'] >= 2018) & (races['year'] <= 2023)]

# Getting the raceId for the years 2018 to 2023
raceIds_2018_2023 = races_2018_2023['raceId'].unique()

# Filtering pit_stops & results dataset with the raceIds from 2018 to 2023
pit_stops_2018_2023 = pit_stops[pit_stops['raceId'].isin(raceIds_2018_2023)]
results_2018_2023 = results[results['raceId'].isin(raceIds_2018_2023)]

# Display the shape of the filtered datasets
races_2018_2023.shape, pit_stops_2018_2023.shape, results_2018_2023.shape
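
As a cross-check on the filter above, here is a minimal sketch of the same two-step idea on made-up data: select races by year, then keep only rows whose `raceId` belongs to those races. The `toy_*` frames and their values are hypothetical and stand in for the real CSVs. `Series.between` is inclusive on both ends by default, so it is equivalent to the `>=` / `<=` pair used in the cell above.

```python
import pandas as pd

# Toy stand-ins for races.csv and pit_stops.csv (values are made up for illustration)
toy_races = pd.DataFrame({
    "raceId": [1, 2, 3, 4],
    "year": [2017, 2018, 2023, 2024],
})
toy_pit_stops = pd.DataFrame({
    "raceId": [1, 2, 2, 3, 4],
    "stop": [1, 1, 2, 1, 1],
})

# between() includes both endpoints by default, matching >= 2018 and <= 2023
toy_races_window = toy_races[toy_races["year"].between(2018, 2023)]

# Keep only the pit stops whose raceId belongs to a race inside the window
toy_pit_window = toy_pit_stops[
    toy_pit_stops["raceId"].isin(toy_races_window["raceId"])
]

print(toy_races_window["raceId"].tolist())
print(len(toy_pit_window))
```

Comparing the row counts before and after filtering, as the cell above does with `.shape`, is a quick way to confirm that the boundary years were kept.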

## STEP 3 - Cleaning the Datasets
Now that we have the filtered data, we can clean it by checking for missing values and outliers.

In [None]:
# Check for missing values in the filtered datasets
missing_values_races = races_2018_2023.isnull().sum()
missing_values_results = results_2018_2023.isnull().sum()
missing_values_pit_stops = pit_stops_2018_2023.isnull().sum()

missing_values_races, missing_values_results, missing_values_pit_stops
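
One caveat: `isnull()` counts only true `NaN`/`None` cells. If these CSVs store missing entries as a placeholder string such as `\N` (an assumption about the export format; the notebook does not say), those cells would not be counted. The sketch below uses a hypothetical helper, `count_placeholders`, and a made-up `toy_results` frame to show how such placeholders could be counted per column.

```python
import pandas as pd

def count_placeholders(df, placeholders=(r"\N", "")):
    """Count, per column, the cells equal to one of the placeholder strings."""
    # DataFrame.isin compares every cell against the list; numeric cells never match
    return df.isin(list(placeholders)).sum()

# Made-up rows: the placeholder strings are assumptions, not taken from the real files
toy_results = pd.DataFrame({
    "raceId": [1, 1, 2],
    "position": ["1", r"\N", "3"],
    "time": [r"\N", r"\N", "1:30:00.000"],
})

# isnull() sees no missing values here, because the placeholders are ordinary strings
print(toy_results.isnull().sum().sum())

# The placeholder count tells a different story
print(count_placeholders(toy_results))
```

If the real data does contain such placeholders, passing `na_values` to `pd.read_csv` at load time would turn them into `NaN` so that `isnull()` reports them.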

The check reports no missing values. Next, we look for outliers by inspecting the descriptive statistics of each dataset.

In [None]:
# Descriptive statistics of the filtered datasets
desc_stats_races = races_2018_2023.describe()
desc_stats_results = results_2018_2023.describe()
desc_stats_pit_stops = pit_stops_2018_2023.describe()

desc_stats_races, desc_stats_results, desc_stats_pit_stops
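
Reading `describe()` output by eye works for a first pass, but a rule makes the outlier check repeatable. The sketch below applies the common 1.5 × IQR rule to made-up pit stop durations. The helper name `iqr_outlier_mask`, the threshold `k=1.5`, and the toy values are illustrative choices, and it assumes the real pit stop data has a numeric duration column (for example one in milliseconds), which should be confirmed against the actual file.

```python
import pandas as pd

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up durations in milliseconds; the last value mimics an unusually long stop
toy_durations = pd.Series([22000, 23000, 24000, 25000, 26000, 1200000])

mask = iqr_outlier_mask(toy_durations)
print(toy_durations[mask].tolist())
```

Flagged rows are candidates for inspection rather than automatic removal; whether a very long stop is an error or a real event depends on the race.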