# Looking for the best Formula 1 season

For my master's project, I'm working on a piece that tries to answer the question: **Which championship-winning team had the best Formula 1 season?**

To do this, I was working with data provided by the [Ergast Developer API](https://ergast.com/mrd/). I noticed an error in the driver-constructor pairing for the 1950 season and wanted to verify the data before moving forward. I was originally going to create a table of the driver-constructor pairs for each race and then compare it with the data I had.

Instead, I went straight to the source for F1 information, [formula1.com](https://formula1.com), and scraped the results for each race from 1950 to 2018. There were some holes in how disqualifications and withdrawals were recorded (or not, in this case) in the earlier seasons.

Now I've gotten data from [statsf1.com](https://www.statsf1.com/), which is tabulated in an easy-to-understand manner, is more complete than the formula1.com data, and doesn't have the issues of the Ergast data.

In terms of defining "best", I'll try three different things:

1. Counting the number of wins each team got in their season
1. Counting the number of podiums each team earned during their season
1. Averaging finishing positions to see how close each team got to a perfect season (the case where they have the lowest possible average for each race)

In [1]:
import pandas as pd
import numpy as np

In [2]:
race_results = pd.read_csv("../data/from_scripts/statsf1/race_results.csv")

In [3]:
race_results.head()

Unnamed: 0,race_id,year,round,race_name,position,order,driver,constructor,team,extra
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h )
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s )
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s )
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,


Quickly check the number of races in this period (should be 997):

In [4]:
race_results.race_id.max()

997
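The maximum `race_id` only equals the number of races if the ids run from 1 upward without gaps. A slightly stricter check, sketched here on made-up data (the `toy_ids` frame is hypothetical), counts the distinct ids and lists any that are missing:

```python
import pandas as pd

# Hypothetical toy data with a gap: race 4 is missing
toy_ids = pd.DataFrame({"race_id": [1, 1, 2, 3, 3, 5]})

ids = toy_ids["race_id"].tolist()
max_id = int(toy_ids["race_id"].max())      # highest id present
n_races = toy_ids["race_id"].nunique()      # number of distinct races

# Ids expected between 1 and max_id that never appear
missing = sorted(set(range(1, max_id + 1)) - set(ids))
print(max_id, n_races, missing)
```

If `max_id` and `n_races` agree and `missing` is empty, the simple `max()` check is enough.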

I will work with a slice of the `race_results` DataFrame that only includes each team in its championship-winning season. Let's make that slice now:

In [5]:
winning_teams = pd.read_csv("../data/other/winning_teams_statsf1.csv")

In [6]:
winning_teams.head()

Unnamed: 0,year,team
0,1950,Alfa Romeo Alfa Romeo
1,1951,Alfa Romeo Alfa Romeo
2,1952,Ferrari Ferrari
3,1953,Ferrari Ferrari
4,1954,Mercedes Mercedes


Now we combine this DataFrame with `race_results`:

In [7]:
combine = pd.merge(race_results, winning_teams, how="left", on=["year", "team"], indicator="keep")
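To illustrate what the `indicator` argument does in a left merge, here is a small sketch on made-up frames (`left` and `right` are hypothetical stand-ins for `race_results` and `winning_teams`). Rows whose `year` and `team` appear in both frames are tagged `both`; rows found only in the left frame are tagged `left_only`.

```python
import pandas as pd

# Hypothetical stand-in for race_results: one row per result
left = pd.DataFrame({
    "year": [1950, 1950, 1951],
    "team": ["Alfa Romeo Alfa Romeo", "ERA ERA", "Ferrari Ferrari"],
})

# Hypothetical stand-in for winning_teams: one row per season
right = pd.DataFrame({
    "year": [1950, 1951],
    "team": ["Alfa Romeo Alfa Romeo", "Alfa Romeo Alfa Romeo"],
})

# Passing a string to indicator names the marker column
merged = pd.merge(left, right, how="left", on=["year", "team"], indicator="keep")
print(merged)
```

Only the 1950 Alfa Romeo row matches a champion for that year, so it is the only row marked `both`.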

In [12]:
combine[combine.race_id == 1]

Unnamed: 0,race_id,year,round,race_name,position,order,driver,constructor,team,extra,keep
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),both
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),both
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),both
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,,left_only
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,,left_only
5,1,1950,1,Britain,6,6,Bob GERARD,ERA,ERA ERA,,left_only
6,1,1950,1,Britain,7,7,Cuth HARRISON,ERA,ERA ERA,,left_only
7,1,1950,1,Britain,8,8,Philippe ETANCELIN,Talbot Lago,Talbot Lago Talbot,,left_only
8,1,1950,1,Britain,9,9,David HAMPSHIRE,Maserati,Maserati Maserati,,left_only
9,1,1950,1,Britain,10,10,Joe FRY,Maserati,Maserati Maserati,,left_only


In [14]:
results = combine[combine.keep == "both"]

In [20]:
results[results.year == 1950]

Unnamed: 0,race_id,year,round,race_name,position,order,driver,constructor,team,extra,keep
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),both
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),both
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),both
12,1,1950,1,Britain,ab,12,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,Oil line,both
25,2,1950,2,Monaco,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,3h 13m 18.7s ( 98.701 km/h ),both
35,2,1950,2,Monaco,ab,11,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,Pile-up,both
36,2,1950,2,Monaco,ab,12,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,Pile-up,both
77,3,1950,3,Indianapolis,&,25,Fred AGABASHIAN,Alfa Romeo,Alfa Romeo Alfa Romeo,Oil line,both
94,3,1950,3,Indianapolis,nq,41,Johnny MAURO,Alfa Romeo,Alfa Romeo Alfa Romeo,,both
127,4,1950,4,Switzerland,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 02m 53.7s ( 149.279 km/h ),both


And those are all the result records for Alfa Romeo in 1950.

Now we can drop the `keep` column, save a copy of this data, and start doing the three analyses.

In [21]:
results = results.drop(columns=["keep"])

In [22]:
results.to_csv("../data/output/race_results_winners.csv", index=False)

---

## Method 01: Wins

Let's compare championship seasons by how many wins each team got in their season.


In [23]:
wins = results[results.position == "1"]

Just noticed that **the statsf1 data has the same Alberto Ascari error as the Ergast data**...

**I know that Ergast fixed theirs, so I'll go back to that.**