# Looking for the best Formula 1 season

For my master's project, I'm making a piece that answers the question: **Which championship-winning team had the best Formula 1 season?**

To do this, I was working with data provided by the [Ergast Developer API](https://ergast.com/mrd/). I noticed an error in the driver-constructor pairing for the 1950 season and wanted to verify the data before moving forward. My original plan was to build a table of the driver-constructor pairs for each race and compare it with the data I had.

Instead, I've chosen to go straight to the source for F1 information ([formula1.com](https://formula1.com)) and scrape the results of each race. I did the scraping on 2019-06-21 and 2019-06-22, and I'll now be working with that data for my analysis.

Because the data comes from a primary source, I have somewhat more confidence in it.

In [1]:
import pandas as pd
import numpy as np

The first thing to do is to import the data:

In [2]:
race_results = pd.read_csv("../formula1-data/results_all.csv")

In [3]:
race_results.head()

| | raceId | year | raceRound | date | prix | driverFirstName | driverLastName | driverCode | constructor | finishingPosition | positionOrder | laps | time | points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1950 | 1 | 13 May 1950 | Great Britain | Nino | Farina | FAR | Alfa Romeo | 1 | 1 | 70.0 | 2:13:23.600 | 9.0 |
| 1 | 1 | 1950 | 1 | 13 May 1950 | Great Britain | Luigi | Fagioli | FAG | Alfa Romeo | 2 | 2 | 70.0 | +2.600s | 6.0 |
| 2 | 1 | 1950 | 1 | 13 May 1950 | Great Britain | Reg | Parnell | PAR | Alfa Romeo | 3 | 3 | 70.0 | +52.000s | 4.0 |
| 3 | 1 | 1950 | 1 | 13 May 1950 | Great Britain | Yves Giraud | Cabantous | CAB | Talbot-Lago | 4 | 4 | 68.0 | +2 laps | 3.0 |
| 4 | 1 | 1950 | 1 | 13 May 1950 | Great Britain | Louis | Rosier | ROS | Talbot-Lago | 5 | 5 | 68.0 | +2 laps | 2.0 |


In [4]:
race_results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22598 entries, 0 to 22597
Data columns (total 14 columns):
raceId               22598 non-null int64
year                 22598 non-null int64
raceRound            22598 non-null int64
date                 22598 non-null object
prix                 22598 non-null object
driverFirstName      22598 non-null object
driverLastName       22598 non-null object
driverCode           22598 non-null object
constructor          22572 non-null object
finishingPosition    22598 non-null object
positionOrder        22598 non-null int64
laps                 22365 non-null float64
time                 22590 non-null object
points               22598 non-null float64
dtypes: float64(2), int64(4), object(8)
memory usage: 2.4+ MB
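
The `info()` output above shows that `constructor`, `laps` and `time` have fewer non-null entries than the 22598 rows, and that `finishingPosition` is stored as `object` rather than as an integer. A quick way to look at both is sketched below on a small made-up frame (the `toy` DataFrame and the `"NC"` label are assumptions for illustration, not values taken from the scraped file):

```python
import pandas as pd

# Toy stand-in for race_results; these rows are invented for illustration.
toy = pd.DataFrame({
    "raceId": [1, 1, 2, 2],
    "constructor": ["Alfa Romeo", None, "Ferrari", "Maserati"],
    "finishingPosition": ["1", "2", "NC", "1"],  # "NC" is a hypothetical label
    "laps": [70.0, 70.0, None, 64.0],
})

# Count missing values per column and keep only the columns that have any.
missing = toy.isna().sum()
missing = missing[missing > 0]
print(missing)

# List the finishing positions that are not plain digits. Values like these
# would keep the column from being parsed as an integer.
is_number = toy["finishingPosition"].str.isdigit()
non_numeric = toy.loc[~is_number, "finishingPosition"]
print(non_numeric.unique())
```

Running the same two checks on `race_results` should show which rows lack a constructor and which labels appear in `finishingPosition`.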


Let's also check how many races we have:

In [6]:
race_results.raceId.max()

1007
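
Note that `raceId.max()` only equals the number of races if the ids run from 1 upward without gaps. A minimal sketch of how that could be verified, using a made-up `toy` frame in which one id is deliberately missing:

```python
import pandas as pd

# Toy stand-in for race_results; raceId 4 is deliberately absent.
toy = pd.DataFrame({"raceId": [1, 1, 2, 3, 3, 5]})

max_id = int(toy["raceId"].max())      # largest id seen
n_races = toy["raceId"].nunique()      # number of distinct races

# Ids in the range 1..max_id that never appear in the data.
missing_ids = sorted(set(range(1, max_id + 1)) - set(toy["raceId"].tolist()))
print(max_id, n_races, missing_ids)
```

If `max_id` and `n_races` agree on the real data, the list of missing ids is empty and 1007 really is the race count.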

Things seem to be in good order. For most of the work, I won't need a few of the columns, namely:

* laps
* points
* driverCode
* date

So, let's drop those:

In [20]:
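# Drop the columns that are not needed; drop() already returns a new DataFrame
results = race_results.drop(columns=["date", "laps", "points", "driverCode"])
# Exclude 2019: the scrape was done in June 2019, partway through that season
results = results[results.year < 2019]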

In [21]:
results.head()

| | raceId | year | raceRound | prix | driverFirstName | driverLastName | constructor | finishingPosition | positionOrder | time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1950 | 1 | Great Britain | Nino | Farina | Alfa Romeo | 1 | 1 | 2:13:23.600 |
| 1 | 1 | 1950 | 1 | Great Britain | Luigi | Fagioli | Alfa Romeo | 2 | 2 | +2.600s |
| 2 | 1 | 1950 | 1 | Great Britain | Reg | Parnell | Alfa Romeo | 3 | 3 | +52.000s |
| 3 | 1 | 1950 | 1 | Great Britain | Yves Giraud | Cabantous | Talbot-Lago | 4 | 4 | +2 laps |
| 4 | 1 | 1950 | 1 | Great Britain | Louis | Rosier | Talbot-Lago | 5 | 5 | +2 laps |


For most of my analysis, I'm looking only at the teams that won championships, so let's slice the results table and keep only each champion's results from its title-winning season.

In [24]:
teams = pd.read_csv("../formula1-data/championship_teams.csv")

In [26]:
teams.head()

| | year | constructor |
|---|---|---|
| 0 | 1950 | Alfa Romeo |
| 1 | 1951 | Alfa Romeo |
| 2 | 1952 | Ferrari |
| 3 | 1953 | Ferrari |
| 4 | 1954 | Mercedes |
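
One possible way to do this slicing is an inner merge on both `year` and `constructor`, so that a team's rows are kept only for the season in which it won. The sketch below is an assumption about how this could be done, not necessarily the approach used later. The frames `results_toy` and `teams_toy` are made up, the driver names are placeholders, and the `"Mercedes-Benz"` spelling is a hypothetical mismatch added to show why the constructor names need checking:

```python
import pandas as pd

# Toy stand-ins for results and teams; rows are invented for illustration.
results_toy = pd.DataFrame({
    "year": [1950, 1950, 1951, 1951, 1952, 1954],
    "constructor": ["Alfa Romeo", "Talbot-Lago", "Alfa Romeo",
                    "Ferrari", "Ferrari", "Mercedes-Benz"],  # last one: hypothetical spelling
    "driverLastName": ["Driver A", "Driver B", "Driver C",
                       "Driver D", "Driver E", "Driver F"],
})
teams_toy = pd.DataFrame({
    "year": [1950, 1951, 1952, 1954],
    "constructor": ["Alfa Romeo", "Alfa Romeo", "Ferrari", "Mercedes"],
})

# Keep only rows whose (year, constructor) pair is a championship run.
# Ferrari's 1951 row is dropped because Ferrari's title here is 1952.
champion_results = results_toy.merge(teams_toy, on=["year", "constructor"], how="inner")
print(champion_results)

# Sanity check: championship runs with no matching result rows, which
# usually points to a constructor name spelled differently in the two files.
pairs = results_toy[["year", "constructor"]].drop_duplicates()
check = teams_toy.merge(pairs, on=["year", "constructor"], how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["year", "constructor"]])
```

In this toy data the 1954 run comes back unmatched, which is the kind of problem that inspecting the unique constructor names helps to catch.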


In [28]:
results.constructor.unique()

array(['Alfa Romeo', 'Talbot-Lago', 'ERA', 'Maserati', 'Alta', 'Ferrari',
       'Simca-Gordini', 'Cooper JAP', 'Kurtis Kraft Offenhauser',
       'Deidt Offenhauser', 'Moore Offenhauser', 'Lesovsky Offenhauser',
       'Nichels Offenhauser', 'Marchese Offenhauser',
       'Stevens Offenhauser', 'Langley Offenhauser', 'Ewing Offenhauser',
       'Maserati Offenhauser', 'Rae Offenhauser', 'Olson Offenhauser',
       'Wetteroth Offerhauser', 'Snowberger Offenhauser',
       'Adams Offenhauser', 'Kurtis Kraft Cummins', 'Watson Offenhauser',
       'Maserati Milano', 'Ferrari Jaguar', 'HWM Alta', 'Veritas',
       'Sherman Offenhauser', 'Schroeder Offenhauser',
       'Kurtis Kraft Novi', 'Kuzma Offenhauser', 'Pawl Offenhauser',
       'Hall Offenhauser', 'Bromme Offenhauser', 'Trevis Offenhauser',
       'Maserati-Offenhauser', 'Thin Wall Ferrari', 'BRM', 'OSCA',
       'Maserati OSCA', 'Gordini', 'Frazer Nash', 'Cooper Bristol',
       'Maserati Plate', 'AFM Kuchen', 'Aston Butterworth',