# Lap Time Decomposer

## 00 - Packages

In [None]:
import fastf1 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
import fastf1.plotting 

## 01 - Data Collection

Enabling the cache is highly encouraged, since it speeds up the runtime when re-running scripts.

In [None]:
fastf1.Cache.enable_cache("../cache") # Enable caching to speed up data retrieval

The `get_session()` function is essential for data collection. It retrieves a session for a given season, grand prix, and session type (practice, qualifying, race).

In [None]:
session = fastf1.get_session(2024, "Monaco", "Q") # Load Monaco 2024 Qualifying session
session.load()

## 02 - Cleaning and Processing Data

The dataset used here is the Monaco 2024 Qualifying data, which contains many potential outliers from cooldown laps, in-laps, and out-laps. These laps may have other uses, but they should be removed from the dataset for this task. `pick_quicklaps()` is a method from FastF1's API that returns all laps faster than a certain threshold (by default, 107% of the best lap), which helps drop those non-flying laps.

In [None]:
# Drop laps that have been deleted (e.g. track limits) and laps that are not accurate (e.g. out-lap, in-lap)
quick_laps = session.laps.pick_quicklaps().copy()
# .copy() makes the filtered result independent, so later column assignments do not raise SettingWithCopyWarning
laps = quick_laps[(quick_laps["IsAccurate"] == True) & (quick_laps["Deleted"] == False)].copy()
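
The 107% threshold described above can be illustrated on a toy table without downloading any session data. This is a minimal sketch with made-up lap times; `toy_laps`, `threshold`, and `cutoff` are hypothetical names, and the code only mimics the idea of the filter rather than FastF1's internal implementation.

```python
import pandas as pd

# Made-up lap times in seconds: two flying laps, one cooldown lap, one out-lap.
toy_laps = pd.DataFrame({
    "Driver": ["AAA", "BBB", "AAA", "BBB"],
    "LapTime": [70.3, 70.9, 95.2, 110.4],
})

threshold = 1.07  # assumed to mirror the 107% default described above
cutoff = toy_laps["LapTime"].min() * threshold  # fastest lap scaled by the threshold

# Keep only laps faster than the cutoff.
toy_quick = toy_laps[toy_laps["LapTime"] < cutoff]
print(toy_quick)
```

With these numbers, only the two flying laps survive; the cooldown lap and the out-lap sit well above the cutoff.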

In [None]:
# Convert timedelta columns to float seconds for easier calculations
laps["LapTime"] = laps["LapTime"].dt.total_seconds()
laps["Sector1Time"] = laps["Sector1Time"].dt.total_seconds()
laps["Sector2Time"] = laps["Sector2Time"].dt.total_seconds()
laps["Sector3Time"] = laps["Sector3Time"].dt.total_seconds()
laps["TyreLife"] = laps["TyreLife"].astype(int)
laps.drop(columns=["PitInTime", "PitOutTime", "Position", "DeletedReason"], inplace=True) # Drop columns that are not needed

In [None]:
laps.rename(columns={"LapTime": "LapTime (s)", "Sector1Time": "Sector1Time (s)",
                   "Sector2Time": "Sector2Time (s)", "Sector3Time": "Sector3Time (s)"}, inplace=True)

`describe()` is a useful function for summarizing the dataset and checking that no significant outliers are skewing the data.

In [None]:
print(laps[["Sector1Time (s)", "Sector2Time (s)", "Sector3Time (s)", "LapTime (s)", "TyreLife"]].describe()) # Print statistics of the lap times and sectors

## 03 - Exploratory Data Analysis

### 3.1 Sector Time Distribution

In [None]:
plt.style.use("dark_background") # Enable dark background for plt

fig, ax = plt.subplots(1,3, figsize=(18, 5))
for ax_i in ax:
    ax_i.grid(color="gray", linestyle="--", linewidth=0.5) # Add grid to each subplot
    
sns.histplot(data = laps, x="Sector1Time (s)", bins=20, kde=True, color="red", ax=ax[0]) # kde = Kernel Density Estimate (Density Curve)
sns.histplot(data = laps, x="Sector2Time (s)", bins=20, kde=True, color="royalblue", ax=ax[1])
sns.histplot(data = laps, x="Sector3Time (s)", bins=20, kde=True, color="gold", ax=ax[2])

for i in range(3):
    ax[i].set_title(f"Sector {i + 1} Times")
    ax[i].set_xlabel(f"Sector {i + 1} Time (seconds)")
    ax[i].set_ylabel("Frequency")

plt.suptitle("Sector Times Distribution - Monaco 2024 Qualifying", fontsize=16)
plt.tight_layout() 
plt.show()


**Observation:**  
- The histograms show that Sector 1 and Sector 2 are slightly right-skewed, whereas Sector 3 is much closer to a symmetric (normal) distribution.
- The right-skewed distributions in Sectors 1 and 2 suggest that some outliers may still remain in our dataset.
- It would be worth filtering out the remaining outliers and re-plotting the distributions to verify before working with the dataset.
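
One way to follow up on the points above is to quantify the skew and flag values that fall outside the Tukey fences (1.5 times the interquartile range). The sketch below uses synthetic sector times; the `iqr_mask` helper is hypothetical, and the IQR rule is just one common choice, not necessarily the filter this notebook will end up using.

```python
import pandas as pd

def iqr_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - k * iqr, q3 + k * iqr)

# Made-up Sector 1 times (s) with one slow straggler.
sector1 = pd.Series([18.9, 19.0, 19.1, 19.1, 19.2, 19.3, 19.4, 22.5])

print(f"skew before: {sector1.skew():.2f}")
filtered = sector1[iqr_mask(sector1)]
print(f"skew after:  {filtered.skew():.2f}")
```

On the real data, the same mask could be applied per sector column before re-plotting the histograms.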

### 3.2 Sector Times vs. Lap Time

In [None]:
fig, ax = plt.subplots(1, 3, figsize = (18, 5))

tyre_life__cmap = LinearSegmentedColormap.from_list("TyreLife", ["limegreen","yellow","orange", "red"]) # Create a colormap for tyre life

for i in range(3):
    sns.scatterplot(data = laps, x=f"Sector{i+1}Time (s)", y="LapTime (s)", hue="TyreLife", palette=tyre_life__cmap, ax=ax[i])
    ax[i].set_title(f"Lap Time vs. Sector {i + 1} Time ")
    ax[i].set_xlabel(f"Sector {i + 1} Time (seconds)")
    ax[i].set_ylabel("Lap Time (seconds)")

fig.suptitle("LapTime vs. Sector Time - Monaco 2024 Qualifying", fontsize=16)

plt.tight_layout()
plt.show()

**Observation:**  
- All three sectors show a positive relationship where a slower sector time corresponds to a slower lap time (this is expected).
- There are still a handful of outliers in all sectors, with a significant outlier in Sector 3.
- The Sector 3 relationship is much tighter and steeper than in the other sectors.
- Interestingly, a longer tyre life does not necessarily mean a slower lap time.
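
"Tighter" and "steeper" can be put into numbers by fitting a straight line per sector and reading off the slope and Pearson's r. The sketch below runs on synthetic data generated with a fixed seed, so the printed values say nothing about Monaco; `toy` and `fit` are hypothetical names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the cleaned laps table (not real Monaco data).
n = 50
toy = pd.DataFrame({
    "Sector1Time (s)": 19.0 + rng.normal(0, 0.3, n),
    "Sector2Time (s)": 34.0 + rng.normal(0, 0.4, n),
    "Sector3Time (s)": 19.5 + rng.normal(0, 0.3, n),
})
sector_cols = list(toy.columns)
toy["LapTime (s)"] = toy[sector_cols].sum(axis=1)

fit = {}
for col in sector_cols:
    # Least-squares line: LapTime = slope * SectorTime + intercept
    slope, intercept = np.polyfit(toy[col], toy["LapTime (s)"], deg=1)
    r = toy[col].corr(toy["LapTime (s)"])  # Pearson's r
    fit[col] = (slope, r)
    print(f"{col}: slope={slope:.2f}, r={r:.2f}")
```

A larger slope corresponds to a steeper scatter, and an r closer to 1 corresponds to a tighter one.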

### 3.3 Tyre Life vs. Sector Times

In [None]:
fig, ax = plt.subplots(1, 3, figsize=(18, 5))

for i in range(3):
    sns.scatterplot(data = laps, x = "TyreLife", y = f"Sector{i + 1}Time (s)", hue="TyreLife", palette=tyre_life__cmap, ax=ax[i])
    ax[i].set_title(f"Sector {i + 1} Time vs. Tyre Life ")
    ax[i].set_xlabel("Tyre Life (laps)")
    ax[i].set_ylabel(f"Sector {i + 1} Time (seconds)")
    
fig.suptitle("Sector Times vs. Tyre Life - Monaco 2024 Qualifying", fontsize=16)
plt.tight_layout()
plt.show()

**Observation:**
- We can again see that the widest spread in sector times occurs at low tyre life. Sector times get tighter as tyre life increases.
- Could this be explained by the optimal tyre performance window? Track temperatures? Track rubber?
- Could it also reflect the teams' car performance in the first two sectors, which have heavy braking corners?

### 3.4 Tyre Life vs. Lap Time

In [None]:
plt.figure(figsize=(18, 8))

sns.boxplot(data = laps, x="TyreLife", y="LapTime (s)",color="red", linecolor="white",
            boxprops =dict(facecolor= "red", alpha= 0.6, edgecolor="white"), 
            whiskerprops=dict(color="white"),
            medianprops=dict(color="grey"),
            capprops=dict(color="white")
            )

plt.suptitle("Lap Time vs. Tyre Life - Monaco 2024 Qualifying", fontsize=14)
plt.ylabel("Lap Time (seconds)")
plt.xlabel("Tyre Life (laps)")
plt.tight_layout()
plt.show()

**Observation:**
- We can see from the boxplot that from tyre life $2 \rightarrow 3$ the median lap time actually decreases.
- From tyre life 3 onward, the median lap time jumps around.
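
The medians read off the boxplot can also be tabulated directly, together with the number of laps behind each box, since a median based on only a few laps will naturally jump around. This is a minimal sketch on made-up values; `toy` and `summary` are hypothetical names.

```python
import pandas as pd

# Made-up lap times (s) at three tyre ages.
toy = pd.DataFrame({
    "TyreLife": [2, 2, 2, 3, 3, 3, 4, 4],
    "LapTime (s)": [71.9, 72.1, 72.4, 71.5, 71.7, 72.0, 71.6, 72.2],
})

# Median lap time and sample size per tyre life.
summary = toy.groupby("TyreLife")["LapTime (s)"].agg(["median", "count"])

# Change in the median relative to the previous tyre life (negative = faster).
summary["median_change"] = summary["median"].diff()
print(summary)
```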

### 3.5 Team Pace Comparison

In [None]:
# Get the team order based on median lap times
team_order = (laps[["Team", "LapTime (s)"]]
              .groupby("Team")
              .median()["LapTime (s)"]
              .sort_values()
              .index
) 

# Create a color palette for the teams
team_palette = {team: fastf1.plotting.get_team_color(team, session=session)
                for team in team_order}

fig, ax = plt.subplots(figsize=(18, 8))
sns.boxplot(
    data=laps,
    x="Team",
    y="LapTime (s)",
    hue="Team",
    order=team_order,
    palette=team_palette,
    whiskerprops=dict(color="white"),
    boxprops=dict(edgecolor="white"),
    medianprops=dict(color="grey"),
    capprops=dict(color="white"),
)

plt.title("Team Pace Comparison - 2024 Monaco Qualifying ", fontsize=16)
plt.grid(visible=False)

ax.set(xlabel=None)
plt.tight_layout()
plt.show()

**Observation:**
- This plot compares lap time performance across the teams.
- A tighter box shows closer lap times between the drivers of a team, whereas a larger box indicates a bigger difference.
- Kick Sauber stands out as an outlier, with its fastest lap time at around 72 seconds.
- We can use this to possibly create a team offset feature for our model.
- This could also explain the outliers in the previous plots.
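
As a rough idea of what a team offset feature might look like, the sketch below defines it as each team's median lap time minus the session-wide median. This definition and the `TeamOffset (s)` column name are assumptions for illustration, and the team names and lap times are made up.

```python
import pandas as pd

# Made-up lap times (s) for three fictional teams.
toy = pd.DataFrame({
    "Team": ["Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"],
    "LapTime (s)": [70.4, 70.6, 71.0, 71.4, 72.0, 72.2],
})

session_median = toy["LapTime (s)"].median()

# transform("median") broadcasts each team's median back onto its rows.
team_median = toy.groupby("Team")["LapTime (s)"].transform("median")

# Negative offset = team is faster than the session median.
toy["TeamOffset (s)"] = team_median - session_median
print(toy)
```

Using `transform` keeps the result aligned with the original rows, so the offset can be used directly as a model input column.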

In [None]:
# Map each driver number in the session to its three-letter abbreviation
drivers = [session.get_driver(driver)["Abbreviation"] for driver in session.drivers]

In [None]:
sector_corr = laps[["Sector1Time (s)","Sector2Time (s)","Sector3Time (s)", "TyreLife","LapTime (s)"]].corr()

plt.figure(figsize=(18,7))
sns.heatmap(sector_corr, annot=True, cmap="vlag", center=0)
plt.title("Feature Correlation Matrix", fontsize=16)
plt.tight_layout()
plt.show()

**Observation:**
- From the correlation plot you can see that TyreLife has a low Pearson's r with LapTime and with each sector time, meaning that there is little to no linear relationship between tyre life and these features.
- The sector times, however, have Pearson's r values of 0.9, 0.91, and 0.8 with lap time, meaning that sector times are strongly related to lap time, as expected:
  $$\text{LapTime} = \text{Sector1} + \text{Sector2} + \text{Sector3}$$
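
The additive relationship can be checked row by row by comparing each lap time with the sum of its sectors. The sketch below uses made-up numbers, and the 1 ms tolerance is an assumption chosen to absorb rounding rather than a documented property of the timing data.

```python
import numpy as np
import pandas as pd

# Made-up sector and lap times (s).
toy = pd.DataFrame({
    "Sector1Time (s)": [18.9, 19.1, 19.4],
    "Sector2Time (s)": [33.8, 34.2, 34.9],
    "Sector3Time (s)": [19.5, 19.6, 19.9],
    "LapTime (s)": [72.2, 72.9, 74.2],
})

sector_cols = ["Sector1Time (s)", "Sector2Time (s)", "Sector3Time (s)"]

# Difference between the reported lap time and the sum of its sectors.
residual = toy["LapTime (s)"] - toy[sector_cols].sum(axis=1)

# True where the lap time matches the sector sum within 1 ms.
matches = np.isclose(residual, 0.0, atol=1e-3)
print(matches)
```

Rows where `matches` is False on real data would be worth inspecting before modeling.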

In [None]:
speed_corr = laps[["SpeedI1", "SpeedI2", "SpeedFL", "SpeedST", "LapTime (s)", "Sector1Time (s)", "Sector2Time (s)", "Sector3Time (s)"]].corr()

plt.figure(figsize=(18,8))
sns.heatmap(speed_corr, annot=True, cmap="vlag", center=0)
plt.title("Speed Feature Correlation Matrix", fontsize=16)
plt.tight_layout()
plt.show()

**Observation:**
- This correlation matrix adds the speed traps from Sector 1, Sector 2, the finish line, and the longest straight to see if there are any correlations with the other chosen features.
- There are no notable correlations between the speed traps and the lap times or sector times.
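
Pearson's r only measures linear association, so a weak value does not rule out a monotonic but curved relationship. One possible cross-check is a rank-based (Spearman) correlation via `DataFrame.corr(method="spearman")`. The sketch below uses a synthetic, deliberately non-linear relationship; the `speed` and `lap_time` columns are hypothetical and not taken from the session.

```python
import numpy as np
import pandas as pd

# Synthetic data: lap time falls monotonically, but not linearly, with speed.
x = np.arange(1, 21, dtype=float)
toy = pd.DataFrame({"speed": x, "lap_time": 70 + 5 * np.exp(-x)})

pearson = toy.corr(method="pearson").loc["speed", "lap_time"]
spearman = toy.corr(method="spearman").loc["speed", "lap_time"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

Here the rank correlation is perfectly negative while Pearson's r is noticeably weaker, which is the kind of gap this check is meant to reveal.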

## 04 - Feature Engineering

## 05 - Modeling

## 06 - Model Validation and Diagnostics