# Looking for the best Formula 1 season

For my master's project, I'm making a piece about answering the question: **What championship winning team had the best Formula 1 season?**

To answer this question, I'll be checking three definitions of best:

1. most wins in a season
1. most podiums in a season
1. how close was the performance to perfect

To do this I was working with data provided by the [Ergast Developer API](https://ergast.com/mrd/). I noticed an error in the driver-constructor pairing for the 1950 season and wanted to verify things without moving forward. I was originally going to create a table of the driver-constructor pairs for each race, and then compare it with the data I had.

Instead I went straight to the source for F1 information, [formula1.com](https://formula1.com), and scraped race information for each race from 1950 to 2018. There were some holes with how disqualifications and withdrawal were recorded (or not, in this case) as we went back in time to earlier seasons.

Now I've gone and gotten data from [statsf1.com](https://www.statsf1.com/) which is tabulated in an easy to understand manner and is more complete than the formula1.com data.

In [1]:
import pandas as pd
import numpy as np

In [13]:
race_results = pd.read_csv("../data/from_scripts/statsf1_race_results.csv")
# race_results = pd.read_csv("../data/other/statsf1_race_results_v2.csv")

In [14]:
race_results.head(30)

Unnamed: 0,race_id,year,round,race_name,position,p0,driver,team,constructor_long,extra
0,1,1950,1,Britain,1,1.0,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h )
1,1,1950,1,Britain,2,2.0,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s )
2,1,1950,1,Britain,3,3.0,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s )
3,1,1950,1,Britain,4,4.0,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,
4,1,1950,1,Britain,5,5.0,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,
5,1,1950,1,Britain,6,6.0,Bob GERARD,ERA,ERA ERA,
6,1,1950,1,Britain,7,7.0,Cuth HARRISON,ERA,ERA ERA,
7,1,1950,1,Britain,8,8.0,Philippe ETANCELIN,Talbot Lago,Talbot Lago Talbot,
8,1,1950,1,Britain,9,9.0,David HAMPSHIRE,Maserati,Maserati Maserati,
9,1,1950,1,Britain,10,10.0,Joe FRY,Maserati,Maserati Maserati,


Let's verify that we have the right number of races. Between 1950 and the end of the 2018 season there were 997 races.

In [15]:
race_results.race_id.max()

997

Before we get to analysis, there is some processing that needs to be done. First I want to fill in the teams.

In [16]:
def update_teams(row):
    df = race_results
    if row.position == "&":
        return df.team.shift(-1)[0]
    else:
        return row.team
    
def update_constructor_long(row):
    df = race_results
    if row.position == "&":
        return df.constructor_long.shift(-1)[0]
    else:
        return row.constructor_long

In [17]:
race_results["team"] = race_results.apply(update_teams, axis=1)
race_results["constructor_long"] = race_results.apply(update_teams, axis = 1)

In [18]:
race_results.head(30)

Unnamed: 0,race_id,year,round,race_name,position,p0,driver,team,constructor_long,extra
0,1,1950,1,Britain,1,1.0,Giuseppe FARINA,Alfa Romeo,Alfa Romeo,2h 13m 23.6s ( 146.378 km/h )
1,1,1950,1,Britain,2,2.0,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo,2h 13m 26.2s ( +02.6s )
2,1,1950,1,Britain,3,3.0,Reg PARNELL,Alfa Romeo,Alfa Romeo,2h 14m 15.6s ( +52.0s )
3,1,1950,1,Britain,4,4.0,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago,
4,1,1950,1,Britain,5,5.0,Louis ROSIER,Talbot Lago,Talbot Lago,
5,1,1950,1,Britain,6,6.0,Bob GERARD,ERA,ERA,
6,1,1950,1,Britain,7,7.0,Cuth HARRISON,ERA,ERA,
7,1,1950,1,Britain,8,8.0,Philippe ETANCELIN,Talbot Lago,Talbot Lago,
8,1,1950,1,Britain,9,9.0,David HAMPSHIRE,Maserati,Maserati,
9,1,1950,1,Britain,10,10.0,Joe FRY,Maserati,Maserati,


Now we can look at processing the finishing order.  In scraping I had created a rough version of the final order, but now I want to refine it more.

The position column gives us information about how the driver fared in the race. There are several options:

* If the position is a number (in string form or otherwise) then that is the finishing position of the driver.
* If the position is `&` then that driver record is for a shared drive and the finishing position of that driver is the same as the record directly above it.
* If the position is `ab` then the driver retired. We will try two different interpretations: *leave the order as is* and, *change all the retired orders to the retired order for the race.*
* IF the position is `nc` the driver did not classify for the final positions, so we can make that the average of the retired drivers as well.
* If the position is `f` then the driver withdrew from a race. They will ranked as the last possible spot.
* If the position is `np` then the driver did not star the race, but was on the grid. They will be ranked as the last possible spot.
* If the position is `dsq`, the driver was disqualified and their finishing position will be the the last possible spot.
* If the position is `npq`, `nq`, or `exc` the driver's order will be ignored. 
* If the position is `tf` do nothing.

We'll do it in two parts, first updating everything but the shared drives.

In [8]:
def p1(row):
    race = race_results[race_results.race_id == row.race_id]
    last_place = race.p0.max()
    avg_retire = np.round(race[race.position.isin(["ab", "nc"])].p0.mean())
    
    if (row.position == "dsq") or (row.position == "f") or (row.position == "np"):
        return last_place
    elif (row.position == "ab") or (row.position == "nc"):
        return avg_retire        
    else:
        return row.p0

def p2(row):
    race = race_results[race_results.race_id == row.race_id]
    last_place = race.p0.max()
    avg_retire = np.round(race[race.position.isin(["ab", "nc"])].p0.mean())
    
    if (row.position == "dsq") or (row.position == "f") or (row.position == "np"):
        return last_place
    elif (row.position == "nc"):
        return avg_retire        
    else:
        return row.p0

And then updating the shared drives:

In [9]:
shared_drives = race_results.index[race_results.position == "&"].tolist()

def update_p1(row):
    prev = race_results.iloc[row.name -1]
    if row.name in shared_drives:
        return prev.p1
    else:
        return row.p1

In [10]:
race_results["p1"] = race_results.apply(p1, axis =1)
race_results["p1"] = race_results.apply(update_p1, axis=1)

In [11]:
race_results.head(30)

Unnamed: 0,result_id,race_id,year,round,race_name,position,p0,driver,team,constructor_long,extra,p1,p1_v2
0,1,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1.0,1.0
1,2,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo,2h 13m 26.2s ( +02.6s ),2.0,2.0
2,3,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo,2h 14m 15.6s ( +52.0s ),3.0,3.0
3,4,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago,,4.0,4.0
4,5,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago,,5.0,5.0
5,6,1,1950,1,Britain,6,6,Bob GERARD,ERA,ERA,,6.0,6.0
6,7,1,1950,1,Britain,7,7,Cuth HARRISON,ERA,ERA,,7.0,7.0
7,8,1,1950,1,Britain,8,8,Philippe ETANCELIN,Talbot Lago,Talbot Lago,,8.0,8.0
8,9,1,1950,1,Britain,9,9,David HAMPSHIRE,Maserati,Maserati,,9.0,9.0
9,10,1,1950,1,Britain,10,10,Joe FRY,Maserati,Maserati,,10.0,10.0


In [12]:
race_results[race_results.race_id == 273]

Unnamed: 0,result_id,race_id,year,round,race_name,position,p0,driver,team,constructor_long,extra,p1,p1_v2
7174,7175,273,1976,9,Britain,dsq,-1,James HUNT,McLaren,McLaren,Started unofficially 1h 43m 27.61s,28.0,28.0
7175,7176,273,1976,9,Britain,1,1,Niki LAUDA,Ferrari,Ferrari,1h 44m 19.66s ( 183.881 km/h ),1.0,1.0
7176,7177,273,1976,9,Britain,2,2,Jody SCHECKTER,Tyrrell,Tyrrell,1h 44m 35.84s ( +16.18s ),2.0,2.0
7177,7178,273,1976,9,Britain,3,3,John WATSON,Penske,Penske,,3.0,3.0
7178,7179,273,1976,9,Britain,4,4,Tom PRYCE,Shadow,Shadow,,4.0,4.0
7179,7180,273,1976,9,Britain,5,5,Alan JONES,Surtees,Surtees,,5.0,5.0
7180,7181,273,1976,9,Britain,6,6,Emerson FITTIPALDI,Copersucar,Copersucar,,6.0,6.0
7181,7182,273,1976,9,Britain,7,7,Harald ERTL,Hesketh,Hesketh,,7.0,7.0
7182,7183,273,1976,9,Britain,8,8,Carlos PACE,Brabham,Brabham,,8.0,8.0
7183,7184,273,1976,9,Britain,9,9,Jean-Pierre JARIER,Shadow,Shadow,,9.0,9.0
