In [1]:
%run Functions.ipynb

# Fill in missing values
Last time, we saw that comparing lap time differences is meaningful. We also fixed the problem with missing laps. Instead of filling in those laps every time we need them, we will do it once and save the results to CSV files.

We shouldn't need to run this notebook again, since the files have already been generated.

In [3]:
# Get the 2018 races' ids
df_races = pd.read_csv("f1db_csv/races.csv")
df_results = pd.read_csv("f1db_csv/results.csv")

races18 = (df_races >> mask(X.year==2018)).raceId.values
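For readers who don't use dfply, the same selection can be written with plain pandas boolean indexing. This is a minimal sketch on a tiny made-up table (the rows are illustrative, not taken from f1db) that assumes only the `raceId` and `year` columns:

```python
import pandas as pd

# Toy stand-in for races.csv with only the columns used here (values are made up)
df_races = pd.DataFrame({"raceId": [980, 989, 990, 1009],
                         "year":   [2017, 2018, 2018, 2018]})

# Plain-pandas equivalent of (df_races >> mask(X.year == 2018)).raceId.values
races18 = df_races.loc[df_races["year"] == 2018, "raceId"].to_numpy()
print(races18)
```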

We now go to each race and determine which drivers did not finish the race.

The status.csv file contains codes that describe whether the driver finished the race and, if not, why. The most important codes for us are 1 (finished) and 11 through 19 (+1 Lap, +2 Laps, ..., +9 Laps). We will not touch drivers with code 1. We will use fill_laps_behind for codes 11 to 19, and fill_laps for all other codes, which penalizes those drivers.
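The rule above boils down to a three-way dispatch on the status code. As a sketch, here is a hypothetical helper (`action_for_status` is our own name, not something defined in Functions.ipynb) that returns which action a given `statusId` would get:

```python
def action_for_status(status_id: int) -> str:
    """Hypothetical helper: map an f1db statusId to the action described above."""
    if status_id == 1:              # finished on the lead lap: leave untouched
        return "keep"
    if 11 <= status_id <= 19:       # +1 Lap ... +9 Laps: fill in the laps behind
        return "fill_laps_behind"
    return "fill_laps"              # any other status: penalize


print([action_for_status(s) for s in (1, 11, 19, 20)])
# prints ['keep', 'fill_laps_behind', 'fill_laps_behind', 'fill_laps']
```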

In [4]:
def fill_missing_laps(method=slowest_time, finishers_only=False):
    FilledLapTimes = lap_times.copy()
    for race in races18:
        # Keep the drivers whose status is not 1: lapped finishers and non-finishers
        race_results = (df_results >>
                        mask(X.raceId==race, X.statusId != 1) >>
                        select(X.driverId, X.statusId))

        # If there are no entries, every driver finished on the lead lap
        if race_results.empty:
            continue

        # Otherwise, populate the lap times of each remaining driver
        for driver, s in zip(race_results.driverId.values,
                             race_results.statusId.values):
            s = int(s)

            # Status 11-19 (+1 Lap ... +9 Laps): fill in the laps the driver was behind
            if 11 <= s <= 19:
                FilledLapTimes = fill_laps_behind(FilledLapTimes, race, driver)
            # Any other status: penalize, unless we only want to fill in for finishers
            elif not finishers_only:
                FilledLapTimes = fill_laps(FilledLapTimes, race, driver, method=method)
    
    # Sort the data frame as we need laps in increasing order (and nice looking df's are always good)
    FilledLapTimes = (FilledLapTimes >>
                      arrange(X.raceId, X.driverId, X.lap))
    
    return FilledLapTimes
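fill_laps_behind and fill_laps live in Functions.ipynb and are not shown here. To illustrate what "missing laps" means for a lapped driver, here is a hypothetical helper (`missing_laps` is our own name, not a function from Functions.ipynb) that lists the lap numbers a driver has no record for. It assumes the race length is the highest lap number anyone recorded in that race:

```python
from typing import List

import pandas as pd


def missing_laps(lap_times: pd.DataFrame, race_id: int, driver_id: int) -> List[int]:
    """Hypothetical helper: lap numbers a driver has no record for, up to the
    highest lap number recorded by anyone in that race."""
    race = lap_times[lap_times["raceId"] == race_id]
    total_laps = int(race["lap"].max())
    driven = set(race.loc[race["driverId"] == driver_id, "lap"].tolist())
    return [lap for lap in range(1, total_laps + 1) if lap not in driven]


# Toy data: driver 1 completes 5 laps, driver 2 is two laps down after 3
toy = pd.DataFrame({"raceId": [1] * 8,
                    "driverId": [1, 1, 1, 1, 1, 2, 2, 2],
                    "lap": [1, 2, 3, 4, 5, 1, 2, 3]})
print(missing_laps(toy, race_id=1, driver_id=2))  # prints [4, 5]
```

Filling the laps behind then amounts to adding one row per lap in this list; how those times are chosen is up to the functions in Functions.ipynb.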

## Fill in lap times, including DNFs

In [25]:
SlowestLapTimes = fill_missing_laps(method=slowest_time)
print(SlowestLapTimes.shape)

(25280, 6)


In [4]:
# This kept crashing; the same_position function still needs to be fixed.
# PositionLapTimes= fill_missing_laps(method=same_position)
# print(PositionLapTimes.shape)

In [27]:
AverageLapTimes = fill_missing_laps(method=average_time)
print(AverageLapTimes.shape)

(25280, 6)
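Both methods give the same shape, which is what we would expect if they fill the same set of laps and differ only in the times they assign. A quick sanity check before saving is that no driver still has a gap in their lap sequence. The sketch below is our own check (`laps_are_contiguous` is a hypothetical name), shown on toy frames; it could be called as `laps_are_contiguous(SlowestLapTimes)`:

```python
import pandas as pd


def laps_are_contiguous(df: pd.DataFrame) -> bool:
    """True if every (raceId, driverId) group has laps 1..n with no gaps or duplicates."""
    def ok(laps: pd.Series) -> bool:
        return sorted(laps.tolist()) == list(range(1, len(laps) + 1))
    return bool(df.groupby(["raceId", "driverId"])["lap"].apply(ok).all())


good = pd.DataFrame({"raceId": [1, 1, 1, 1], "driverId": [7, 7, 8, 8], "lap": [1, 2, 1, 2]})
gappy = pd.DataFrame({"raceId": [1, 1], "driverId": [7, 7], "lap": [1, 3]})
print(laps_are_contiguous(good), laps_are_contiguous(gappy))  # prints True False
```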


Write the new data frames to CSV files.

In [28]:
SlowestLapTimes.to_csv("f1db_csv/SlowestLapTimes.csv")
AverageLapTimes.to_csv("f1db_csv/AverageLapTimes.csv")
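Because `to_csv` is called with its default `index=True`, each file starts with an unnamed column holding the row index. A small round trip on a toy frame (columns modeled on lap_times, values made up) shows what that means when the files are read back later. Either read them with `index_col=0`, or write them with `index=False` instead:

```python
import io

import pandas as pd

# Toy frame with lap_times-style columns (values are made up)
df = pd.DataFrame({"raceId": [989, 989], "driverId": [1, 1],
                   "lap": [1, 2], "milliseconds": [95000, 93000]})

buf = io.StringIO()
df.to_csv(buf)                             # default index=True: row index becomes an unnamed first column

buf.seek(0)
naive = pd.read_csv(buf)                   # that column comes back as "Unnamed: 0"

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)   # treat the first column as the index again

print(list(naive.columns))
print(restored.equals(df))                 # prints True
```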

## Fill in lap times only for the drivers that finished
This includes drivers who finished one or more laps behind the leader.

In [5]:
FinishersLapTimes = fill_missing_laps(finishers_only=True)
FinishersLapTimes.to_csv("f1db_csv/FinishersLapTimes.csv")