# Analyzing F1 results

The question I'm trying to answer with my visualization project is: **"Who had the best championship season?"**
To narrow things, I'm looking at constructors in the year they won a championship. Once I have an idea of that, I'll look at a number of years before and after they won the championship to gauge their championship performance.

To answer this, I'm looking at:

1. Wins in the season
1. Overall Podiums in the season
1. One-Two finishes

Of the three, the One-Two finishes give the best idea of performance because they take into account the performance of the cars, the drivers, and the team at a race.

---

In [1]:
import pandas as pd
import numpy as np

## IDEA 1: Wins in the season

I think this is the roughest way to look at the results. In this one I'm looking at the cases where `positionOrder == 1`.

In [2]:
results = pd.read_csv("../data/working/master_results.csv")

In [3]:
results.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished
3,833,1950,1,British Grand Prix,Talbot-Lago,lago,Yves,Cabantous,6,4.0,4,4,3.0,+2 Laps
4,833,1950,1,British Grand Prix,Talbot-Lago,lago,Louis,Rosier,9,5.0,5,5,2.0,+2 Laps


First let's check that we have the right number of races.

In [4]:
races1 = results.groupby(["year","round"])

In [5]:
len(races1)

1004

In [6]:
races2 = results.groupby("raceId")

In [7]:
len(races2)

1004

Counting them two different ways, we end up with the same total both times: the number of races that have happened in F1.
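
The two counts agree, but equal counts alone do not prove that each `raceId` maps to exactly one `(year, round)` pair. A minimal sketch of a stricter check, using a small made-up frame (`toy` is an illustrative stand-in for `results`, not the real data):

```python
import pandas as pd

# Toy stand-in for `results`: two races, several result rows each.
toy = pd.DataFrame({
    "raceId": [833, 833, 833, 834, 834],
    "year":   [1950, 1950, 1950, 1950, 1950],
    "round":  [1, 1, 1, 2, 2],
})

# One row per distinct (raceId, year, round) combination.
keys = toy[["raceId", "year", "round"]].drop_duplicates()

# The mapping is one-to-one if neither side repeats among the distinct keys.
race_ids_unique = keys["raceId"].is_unique
year_rounds_unique = not keys.duplicated(subset=["year", "round"]).any()

print(len(keys), race_ids_unique, year_rounds_unique)
```

If either flag came back `False` on the real data, the two `groupby` counts could match by coincidence while still describing different sets of races.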

In [8]:
wins = results[(results.positionOrder == 1)]

In [9]:
wins.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
23,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished
44,835,1950,3,Indianapolis 500,Kurtis Kraft,kurtis_kraft,Johnnie,Parsons,5,1.0,1,1,9.0,Finished
79,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,Nino,Farina,2,1.0,1,1,9.0,Finished
97,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,Juan,Fangio,2,1.0,1,1,8.0,Finished


In [10]:
wins.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1007 entries, 0 to 24320
Data columns (total 14 columns):
raceId            1007 non-null int64
year              1007 non-null int64
round             1007 non-null int64
prixName          1007 non-null object
constructor       1007 non-null object
constructorRef    1007 non-null object
forename          1007 non-null object
surname           1007 non-null object
grid              1007 non-null int64
position          1007 non-null float64
positionText      1007 non-null object
positionOrder     1007 non-null int64
points            1007 non-null float64
status            1007 non-null object
dtypes: float64(2), int64(5), object(7)
memory usage: 118.0+ KB


There is a discrepancy where we have 1007 winners and only 1004 races.

---

## Finding: Some early races had two winners

To explore the discrepancy between the number of winners and the number of races, let's build some "full prix names" by combining the `year` and `prixName` columns and see which ones are listed more than once.

In [11]:
fullPrix = wins["year"].map(str) + " " + wins["prixName"]

In [12]:
fullPrix.value_counts() > 1 

1951 French Grand Prix            True
1957 British Grand Prix           True
1956 Argentine Grand Prix         True
1997 Italian Grand Prix          False
1971 French Grand Prix           False
1975 British Grand Prix          False
1975 Brazilian Grand Prix        False
2019 Monaco Grand Prix           False
1987 Australian Grand Prix       False
1984 Monaco Grand Prix           False
1956 Indianapolis 500            False
1970 Monaco Grand Prix           False
2017 Canadian Grand Prix         False
1969 Mexican Grand Prix          False
2005 Bahrain Grand Prix          False
1977 Belgian Grand Prix          False
2015 Spanish Grand Prix          False
2016 Russian Grand Prix          False
2005 Belgian Grand Prix          False
1960 Portuguese Grand Prix       False
2004 Malaysian Grand Prix        False
1979 Spanish Grand Prix          False
2013 Canadian Grand Prix         False
1964 French Grand Prix           False
2008 Japanese Grand Prix         False
1968 British Grand Prix  

Let's look more closely at these three races:

* 1951 French Grand Prix
* 1956 Argentine Grand Prix
* 1957 British Grand Prix
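
The boolean Series printed above still lists every race, so the repeated ones have to be picked out by eye. A minimal sketch of filtering the counts so only the repeated entries remain, using a hand-made `toyPrix` Series as a hypothetical stand-in for `fullPrix`:

```python
import pandas as pd

# Hypothetical stand-in for `fullPrix`: one entry per winner row.
toyPrix = pd.Series([
    "1950 British Grand Prix",
    "1951 French Grand Prix",
    "1951 French Grand Prix",
    "1956 Argentine Grand Prix",
    "1956 Argentine Grand Prix",
    "1957 British Grand Prix",
    "1957 British Grand Prix",
])

counts = toyPrix.value_counts()

# Keep only the races that appear more than once.
shared = counts[counts > 1]
print(sorted(shared.index))
```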

In [13]:
frenchGP51 = wins[(wins.year == 1951) & (wins.prixName == "French Grand Prix")]

In [14]:
frenchGP51.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
228,828,1951,4,French Grand Prix,Alfa Romeo,alfa,Juan,Fangio,7,1.0,1,1,5.0,Finished
229,828,1951,4,French Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,7,1.0,1,1,4.0,Finished


Looking at the [Wikipedia page for this race](https://en.wikipedia.org/wiki/1951_French_Grand_Prix), Luigi Fagioli finished the race (40 laps) in the car that Juan Fangio had started in.

In [15]:
argentineGP56 = wins[(wins.year == 1956) & (wins.prixName == "Argentine Grand Prix")]
britishGP57 = wins[(wins.year == 1957) & (wins.prixName == "British Grand Prix")]

In [16]:
argentineGP56.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
1210,784,1956,1,Argentine Grand Prix,Ferrari,ferrari,Luigi,Musso,3,1.0,1,1,5.0,Finished
1211,784,1956,1,Argentine Grand Prix,Ferrari,ferrari,Juan,Fangio,3,1.0,1,1,5.0,Finished


In [17]:
britishGP57.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
1489,780,1957,5,British Grand Prix,Vanwall,vanwall,Stirling,Moss,3,1.0,1,1,5.0,Finished
1490,780,1957,5,British Grand Prix,Vanwall,vanwall,Tony,Brooks,3,1.0,1,1,4.0,Finished


In Argentina, Musso and Fangio shared a car and shared the first place points. Moss and Brooks also shared a car in the ’57 British Grand Prix. In each of these cases, the drivers split the points for first place. These are the only three races where this happened.

---

To handle these wins, let's create a slice of the wins that doesn't include driver info.

In [18]:
constructorWins = wins[["year","round","prixName", "constructor", "position"]]

In [19]:
constructorWins.head()

Unnamed: 0,year,round,prixName,constructor,position
0,1950,1,British Grand Prix,Alfa Romeo,1.0
23,1950,2,Monaco Grand Prix,Alfa Romeo,1.0
44,1950,3,Indianapolis 500,Kurtis Kraft,1.0
79,1950,4,Swiss Grand Prix,Alfa Romeo,1.0
97,1950,5,Belgian Grand Prix,Alfa Romeo,1.0


Now we can drop the duplicate rows without worry.

In [20]:
constructorWins = constructorWins.drop_duplicates()

In [21]:
constructorWins.duplicated().value_counts()

False    1004
dtype: int64

This corresponds to the number of races we had at the beginning. Now we can start grouping and summing to see who had the most wins.
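
Summing `position` gives a win count only because every remaining row has `position == 1.0`. A minimal sketch of an equivalent count that does not depend on the column's value, using `groupby(...).size()` on a few toy rows (illustrative, not the full 1951 season):

```python
import pandas as pd

# Toy winner rows; the shared 1951 French Grand Prix win appears twice.
toy_wins = pd.DataFrame({
    "year":        [1951, 1951, 1951, 1951],
    "round":       [1, 4, 4, 5],
    "prixName":    ["Swiss Grand Prix", "French Grand Prix",
                    "French Grand Prix", "British Grand Prix"],
    "constructor": ["Alfa Romeo", "Alfa Romeo", "Alfa Romeo", "Ferrari"],
    "position":    [1.0, 1.0, 1.0, 1.0],
})

# Collapse shared drives to one row per race and constructor.
per_race = toy_wins.drop_duplicates()

# Count rows per (year, constructor) instead of summing `position`.
win_counts = (
    per_race.groupby(["year", "constructor"])
    .size()
    .rename("wins")
    .reset_index()
)
print(win_counts)
```

One side effect of `size()` is that the counts come back as integers rather than floats.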

In [22]:
groupedWins = constructorWins.groupby(["year","constructor"]).position.sum()

In [23]:
groupedWins = groupedWins.rename("wins").reset_index().sort_values("year")

In [24]:
groupedWins.head(10)

Unnamed: 0,year,constructor,wins
0,1950,Alfa Romeo,6.0
1,1950,Kurtis Kraft,1.0
2,1951,Alfa Romeo,4.0
3,1951,Ferrari,3.0
4,1951,Kurtis Kraft,1.0
5,1952,Ferrari,7.0
6,1952,Kuzma,1.0
7,1953,Ferrari,7.0
8,1953,Kurtis Kraft,1.0
9,1953,Maserati,1.0


This `groupedWins` DataFrame holds the number of wins each constructor had in a given season, for constructors that won at least one race. For my analysis, I now need to filter this to only include the teams whose drivers won championships. I previously compiled `championship_teams.csv` and put it in the `data/working` folder.

In [25]:
championTeams = pd.read_csv("../data/working/championship_teams.csv")

In [26]:
championTeams.head()

Unnamed: 0,year,constructor
0,1950,Alfa Romeo
1,1951,Alfa Romeo
2,1952,Ferrari
3,1953,Ferrari
4,1954,Mercedes


In [27]:
championTeams.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 2 columns):
year           69 non-null int64
constructor    69 non-null object
dtypes: int64(1), object(1)
memory usage: 1.2+ KB


Now let's check where the matches are:

In [28]:
comparison = pd.merge(groupedWins, championTeams, on=["year","constructor"], how="left", indicator="Winner")

In [29]:
comparison.head()

Unnamed: 0,year,constructor,wins,Winner
0,1950,Alfa Romeo,6.0,both
1,1950,Kurtis Kraft,1.0,left_only
2,1951,Alfa Romeo,4.0,both
3,1951,Ferrari,3.0,left_only
4,1951,Kurtis Kraft,1.0,left_only


In [30]:
championWins = comparison[comparison.Winner == "both"]

In [31]:
championWins.drop(columns=["Winner"]).head()

Unnamed: 0,year,constructor,wins
0,1950,Alfa Romeo,6.0
2,1951,Alfa Romeo,4.0
5,1952,Ferrari,7.0
7,1953,Ferrari,7.0
11,1954,Mercedes,4.0


In [32]:
championWins = championWins.drop(columns=["Winner"])

In [33]:
championWins.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 69 entries, 0 to 254
Data columns (total 3 columns):
year           69 non-null int64
constructor    69 non-null object
wins           69 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 2.2+ KB


We have the same length in `championTeams` and `championWins`, so things seem to be working out. I will save this as a CSV for plotting and further comparison.
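
Had the two lengths differed, the next step would be to find which champion rows failed to match. A minimal sketch of that lookup, merging from the champions' side so unmatched champions are kept; the toy frames and the `Alfa-Romeo` spelling are made up for illustration:

```python
import pandas as pd

toy_wins = pd.DataFrame({
    "year": [1950, 1950, 1951],
    "constructor": ["Alfa Romeo", "Kurtis Kraft", "Alfa Romeo"],
    "wins": [6.0, 1.0, 4.0],
})
# Made-up mismatch: the 1951 champion is spelled differently here.
toy_champions = pd.DataFrame({
    "year": [1950, 1951],
    "constructor": ["Alfa Romeo", "Alfa-Romeo"],
})

# Merge from the champions' side so unmatched champions survive the join.
check = pd.merge(
    toy_champions, toy_wins,
    on=["year", "constructor"],
    how="left",
    indicator=True,
    validate="one_to_one",  # raises if either side repeats a key
)

# Champions with no matching row in the wins table.
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["year", "constructor"]])
```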

In [34]:
championWins.to_csv("../data/processed/season_wins.csv", index=False, mode="w+")

### Normalizing Wins in season

Because the number of races in each season changes, we should normalize the number of wins by the number of races in each season.

In terms of implementing this, a function should take a row from `championWins`, take the `year`, and then find the max of rounds from the `results` DataFrame.

In [35]:
normWins = championWins.copy()  # copy so the new column is not also added to championWins

In [36]:
def normalize_wins(row):
    season = int(row.year)
    races = results[results.year == season]["round"].max()
    return (row.wins / float(races))

In [37]:
normWins["normalizedWins"] = normWins.apply(normalize_wins, axis="columns")

In [38]:
normWins.head()

Unnamed: 0,year,constructor,wins,normalizedWins
0,1950,Alfa Romeo,6.0,0.857143
2,1951,Alfa Romeo,4.0,0.5
5,1952,Ferrari,7.0,0.875
7,1953,Ferrari,7.0,0.777778
11,1954,Mercedes,4.0,0.444444


In [39]:
normWins.to_csv("../data/processed/normalized_wins.csv", index=False, mode="w+")
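
The row-wise `apply` above filters `results` once per champion. A vectorized sketch of the same calculation computes the season lengths once and then divides. The toy frames are illustrative stand-ins, and like the original it assumes the highest `round` number equals the number of races in the season:

```python
import pandas as pd

# Toy stand-in for `results`: 7 rounds in 1950, 8 rounds in 1951.
toy_results = pd.DataFrame({
    "year":  [1950] * 7 + [1951] * 8,
    "round": list(range(1, 8)) + list(range(1, 9)),
})
toy_champion_wins = pd.DataFrame({
    "year": [1950, 1951],
    "constructor": ["Alfa Romeo", "Alfa Romeo"],
    "wins": [6.0, 4.0],
})

# Number of races per season, indexed by year.
races_per_season = toy_results.groupby("year")["round"].max()

# Look up each row's season length by year and divide.
normalized = toy_champion_wins.assign(
    normalizedWins=toy_champion_wins["wins"]
    / toy_champion_wins["year"].map(races_per_season)
)
print(normalized)
```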

---

## IDEA 2: Podiums in a season

Expanding from just the wins, we can look at the number of podiums a team got in a season. The podium is the top three places in a race, and a team that consistently gets both drivers on the podium in each race is strong. This corresponds to `positionOrder < 4`. Let's work with the slice of the master results I made for the championship-winning teams.

In [40]:
results_champions = pd.read_csv("../data/working/master_champions.csv")

In [41]:
results_champions.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished
3,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Juan,Fangio,3,,R,12,0.0,Oil leak
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished


In [42]:
results_champions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2371 entries, 0 to 2370
Data columns (total 14 columns):
raceId            2371 non-null int64
year              2371 non-null int64
round             2371 non-null int64
prixName          2371 non-null object
constructor       2371 non-null object
constructorRef    2371 non-null object
forename          2371 non-null object
surname           2371 non-null object
grid              2371 non-null int64
position          1723 non-null float64
positionText      2371 non-null object
positionOrder     2371 non-null int64
points            2371 non-null float64
status            2371 non-null object
dtypes: float64(2), int64(5), object(7)
memory usage: 259.4+ KB


In [43]:
podiums = results_champions[results_champions.positionOrder < 4]

In [44]:
podiums.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,Nino,Farina,2,1.0,1,1,9.0,Finished


In [45]:
podiums.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1104 entries, 0 to 2369
Data columns (total 14 columns):
raceId            1104 non-null int64
year              1104 non-null int64
round             1104 non-null int64
prixName          1104 non-null object
constructor       1104 non-null object
constructorRef    1104 non-null object
forename          1104 non-null object
surname           1104 non-null object
grid              1104 non-null int64
position          1104 non-null float64
positionText      1104 non-null object
positionOrder     1104 non-null int64
points            1104 non-null float64
status            1104 non-null object
dtypes: float64(2), int64(5), object(7)
memory usage: 129.4+ KB


In [46]:
podiums[podiums.year==1951]

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
23,825,1951,1,Swiss Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished
24,825,1951,1,Swiss Grand Prix,Alfa Romeo,alfa,Nino,Farina,2,3.0,3,3,4.0,Finished
27,827,1951,3,Belgian Grand Prix,Alfa Romeo,alfa,Nino,Farina,2,1.0,1,1,8.0,Finished
30,828,1951,4,French Grand Prix,Alfa Romeo,alfa,Juan,Fangio,7,1.0,1,1,5.0,Finished
31,828,1951,4,French Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,7,1.0,1,1,4.0,Finished
36,829,1951,5,British Grand Prix,Alfa Romeo,alfa,Juan,Fangio,2,2.0,2,2,6.0,Finished
40,830,1951,6,German Grand Prix,Alfa Romeo,alfa,Juan,Fangio,3,2.0,2,2,7.0,Finished
44,831,1951,7,Italian Grand Prix,Alfa Romeo,alfa,Nino,Farina,7,3.0,3,3,3.0,+1 Lap
45,831,1951,7,Italian Grand Prix,Alfa Romeo,alfa,Felice,Bonetto,7,3.0,3,3,2.0,+1 Lap
49,832,1951,8,Spanish Grand Prix,Alfa Romeo,alfa,Juan,Fangio,2,1.0,1,1,9.0,Finished


These are indeed the podiums! Let's get rid of the driver information so we can remove the duplicates.

In [47]:
podiums_noDriver = podiums.copy().drop(columns=["forename", "surname", "grid","points", "status"])

In [48]:
podiums_noDriver.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,position,positionText,positionOrder
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,1.0,1,1
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,2.0,2,2
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,3.0,3,3
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,1.0,1,1
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,1.0,1,1


In [49]:
podiums_noDriver.duplicated().value_counts()

False    1097
True        7
dtype: int64

In [50]:
mirror = podiums_noDriver.copy()

In [51]:
mirror["duplicate"] = mirror.duplicated().map({True:'Yes', False:'No'})

In [52]:
mirror[mirror.duplicate == "Yes"]

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,position,positionText,positionOrder,duplicate
31,828,1951,4,French Grand Prix,Alfa Romeo,alfa,1.0,1,1,Yes
45,831,1951,7,Italian Grand Prix,Alfa Romeo,alfa,3.0,3,3,Yes
193,784,1956,1,Argentine Grand Prix,Ferrari,ferrari,1.0,1,1,Yes
199,785,1956,2,Monaco Grand Prix,Ferrari,ferrari,2.0,2,2,Yes
217,789,1956,6,British Grand Prix,Ferrari,ferrari,2.0,2,2,Yes
228,791,1956,8,Italian Grand Prix,Ferrari,ferrari,2.0,2,2,Yes
381,746,1960,1,Argentine Grand Prix,Cooper-Climax,cooper-climax,3.0,3,3,Yes
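
By default `duplicated()` marks only the second and later copies of a row, which is why the table above shows one row per shared result rather than both. A minimal sketch on made-up rows showing how `keep=False` would surface every copy instead:

```python
import pandas as pd

# Toy podium rows with driver columns already dropped;
# the first two rows are a shared first place.
toy_podiums = pd.DataFrame({
    "raceId":        [828, 828, 829],
    "constructor":   ["Alfa Romeo", "Alfa Romeo", "Alfa Romeo"],
    "positionOrder": [1, 1, 2],
})

# Default: the first occurrence is not flagged.
later_copies = toy_podiums[toy_podiums.duplicated()]

# keep=False: every member of a duplicated group is flagged.
all_copies = toy_podiums[toy_podiums.duplicated(keep=False)]

print(len(later_copies), len(all_copies))
```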


These are all the duplicates in the podium dataset (of the championship runs). Now we get rid of them:

In [53]:
podiums_noDriver_noDuplicate = mirror.copy()

In [54]:
podiums_noDriver_noDuplicate = podiums_noDriver_noDuplicate.drop_duplicates().drop(columns=["duplicate"])

In [55]:
podiums_noDriver_noDuplicate.head(20)

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,position,positionText,positionOrder
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,1.0,1,1
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,2.0,2,2
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,3.0,3,3
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,1.0,1,1
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,1.0,1,1
8,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,2.0,2,2
10,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,1.0,1,1
11,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,2.0,2,2
13,838,1950,6,French Grand Prix,Alfa Romeo,alfa,1.0,1,1
14,838,1950,6,French Grand Prix,Alfa Romeo,alfa,2.0,2,2


I will save this spreadsheet for later use, so I know what the podiums were for the races.

In [56]:
podiums_noDriver_noDuplicate.to_csv("../data/working/podiums_champions_clean.csv", index=False)

Now I'll count how many podiums each team got in their winning season.

In [57]:
countPodiums = podiums_noDriver_noDuplicate.copy()

In [58]:
countPodiums.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,position,positionText,positionOrder
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,1.0,1,1
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,2.0,2,2
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,3.0,3,3
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,1.0,1,1
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,1.0,1,1


In [59]:
groupedPodiums = countPodiums.copy().groupby(["year","constructor"])

In [60]:
groupedPodiums.constructor.count()

year  constructor  
1950  Alfa Romeo       13
1951  Alfa Romeo       11
1952  Ferrari          17
1953  Ferrari          16
1954  Mercedes          7
1955  Mercedes         10
1956  Ferrari          14
1957  Maserati         10
1958  Ferrari          14
1959  Cooper-Climax    13
1960  Cooper-Climax    15
1961  Ferrari          14
1962  BRM               8
1963  Lotus-Climax      9
1964  Ferrari          10
1965  Lotus-Climax      7
1966  Brabham-Repco     9
1967  Brabham-Repco    14
1968  Lotus-Ford        9
1969  Matra-Ford       10
1970  Team Lotus        7
1971  Tyrrell          11
1972  Team Lotus        8
1973  Tyrrell          15
1974  McLaren          10
1975  Ferrari          11
1976  McLaren          10
1977  Ferrari          16
1978  Team Lotus       14
1979  Ferrari          13
                       ..
1989  McLaren          18
1990  McLaren          18
1991  McLaren          18
1992  Williams         21
1993  Williams         22
1994  Benetton         12
1995  Benetton    

Let's check up on this by looking at Alfa Romeo in 1950:

In [61]:
alfa1950 = countPodiums[countPodiums.year == 1950]

In [62]:
alfa1950

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,position,positionText,positionOrder
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,1.0,1,1
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,2.0,2,2
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,3.0,3,3
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,1.0,1,1
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,1.0,1,1
8,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,2.0,2,2
10,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,1.0,1,1
11,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,2.0,2,2
13,838,1950,6,French Grand Prix,Alfa Romeo,alfa,1.0,1,1
14,838,1950,6,French Grand Prix,Alfa Romeo,alfa,2.0,2,2


In [63]:
alfa1950.shape

(13, 9)

This checks out with the `groupedPodiums` calculation. Let's look at the `podiums` DataFrame as well.

In [64]:
alfa = podiums[podiums.year == 1950]

In [65]:
alfa

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished
4,834,1950,2,Monaco Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished
7,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,Nino,Farina,2,1.0,1,1,9.0,Finished
8,836,1950,4,Swiss Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,3,2.0,2,2,6.0,Finished
10,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,Juan,Fangio,2,1.0,1,1,8.0,Finished
11,837,1950,5,Belgian Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,3,2.0,2,2,6.0,Finished
13,838,1950,6,French Grand Prix,Alfa Romeo,alfa,Juan,Fangio,1,1.0,1,1,9.0,Finished
14,838,1950,6,French Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,3,2.0,2,2,6.0,Finished


In [66]:
alfa.shape

(13, 14)

Things work out numerically. Let me go and check the 1950 season against the Wikipedia pages for each of the Grands Prix:

* [British GP](https://en.wikipedia.org/wiki/1950_British_Grand_Prix)
* [Monaco GP](https://en.wikipedia.org/wiki/1950_Monaco_Grand_Prix)
* [Indy 500](https://en.wikipedia.org/wiki/1950_Indianapolis_500)
* [Swiss GP](https://en.wikipedia.org/wiki/1950_Swiss_Grand_Prix)
* [Belgian GP](https://en.wikipedia.org/wiki/1950_Belgian_Grand_Prix)
* [French GP](https://en.wikipedia.org/wiki/1950_French_Grand_Prix)
* [Italian GP](https://en.wikipedia.org/wiki/1950_Italian_Grand_Prix)

And I could also look at the [page on the Alfa Romeo 158](https://en.wikipedia.org/wiki/Alfa_Romeo_158/159_Alfetta) which was the car used by all Alfa Romeo drivers in 1950.  

Alfa Romeo's podium positions in each race, according to those pages:

British: 1;2;3  
Monaco: 1  
Indy 500: --  
Swiss: 1;2  
Belgian: 1;2  
French: 1;2  
Italian: 1;3  
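
A minimal sketch of making this comparison mechanical. The `hand_tally` Series restates the list above as counts. The `dataset_counts` Series is typed in from the tables earlier in this section, and its Italian figure is inferred from the 13-row total rather than read from a printed row, so treat it as an assumption:

```python
import pandas as pd

# Podium counts per race, tallied by hand from the Wikipedia pages.
hand_tally = pd.Series({
    "British": 3, "Monaco": 1, "Indy 500": 0, "Swiss": 2,
    "Belgian": 2, "French": 2, "Italian": 2,
})

# Podium counts per race as they appear in the dataset
# (Italian inferred from the 13-row total; an assumption).
dataset_counts = pd.Series({
    "British": 3, "Monaco": 1, "Swiss": 2,
    "Belgian": 2, "French": 2, "Italian": 3,
})

# Align on the hand tally's races, treating missing races as zero podiums.
diff = dataset_counts.reindex(hand_tally.index, fill_value=0) - hand_tally
mismatches = diff[diff != 0]
print(mismatches)
```

With the real data, `dataset_counts` could come from something like `alfa1950.groupby("prixName").size()`, with the index relabelled to match the short race names.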

Well, look at that, there's an issue in the data. The tally above gives 12 podiums, one fewer than the 13 in the data. Ascari shows up for Alfa Romeo in 1950, when he was racing for Ferrari that year.
I can go in and delete the row, but how do I check that all the drivers are properly attributed in the results? I think I'll need to do a comparison of `(year, constructor, forename, surname)` between the results table and my own generated list.
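
The comparison described above could be sketched as an anti-join. Here `toy_results` and `trusted` are made-up stand-ins (the `trusted` list is hypothetical and does not exist in this project yet), set up so that the only unmatched row is the Alfa Romeo entry for Ascari:

```python
import pandas as pd

keys = ["year", "constructor", "forename", "surname"]

# Toy stand-in for the results table.
toy_results = pd.DataFrame({
    "year": [1950, 1950, 1950],
    "constructor": ["Alfa Romeo", "Alfa Romeo", "Ferrari"],
    "forename": ["Nino", "Alberto", "Alberto"],
    "surname": ["Farina", "Ascari", "Ascari"],
})
# Hypothetical hand-built list of who drove for whom.
trusted = pd.DataFrame({
    "year": [1950, 1950],
    "constructor": ["Alfa Romeo", "Ferrari"],
    "forename": ["Nino", "Alberto"],
    "surname": ["Farina", "Ascari"],
})

# Left merge with an indicator, then keep rows found only in the results.
merged = (
    toy_results[keys]
    .drop_duplicates()
    .merge(trusted, on=keys, how="left", indicator=True)
)
suspect = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(suspect)
```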

But first let me see if Ascari is listed as an Alfa driver in any other 1950 race:

In [67]:
ascari = results[(results.year == 1950) & (results.surname == "Ascari")]

In [68]:
ascari

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
24,834,1950,2,Monaco Grand Prix,Ferrari,ferrari,Alberto,Ascari,7,2.0,2,2,6.0,+1 Lap
95,836,1950,4,Swiss Grand Prix,Ferrari,ferrari,Alberto,Ascari,5,,R,17,0.0,Oil pump
101,837,1950,5,Belgian Grand Prix,Ferrari,ferrari,Alberto,Ascari,7,5.0,5,5,2.0,+1 Lap
133,839,1950,7,Italian Grand Prix,Alfa Romeo,alfa,Alberto,Ascari,6,2.0,2,2,3.0,Finished
149,839,1950,7,Italian Grand Prix,Ferrari,ferrari,Alberto,Ascari,2,,R,17,0.0,Engine


Let's see if this is widespread. Let's make a new column that has the full driver name, then group the data by year and driver name and count the instances of each constructor.

In [69]:
results2 = results.copy()

In [70]:
results2.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished
3,833,1950,1,British Grand Prix,Talbot-Lago,lago,Yves,Cabantous,6,4.0,4,4,3.0,+2 Laps
4,833,1950,1,British Grand Prix,Talbot-Lago,lago,Louis,Rosier,9,5.0,5,5,2.0,+2 Laps


In [71]:
results2["driver"] = results2["forename"] + " " + results2["surname"]

In [72]:
results2.head()

Unnamed: 0,raceId,year,round,prixName,constructor,constructorRef,forename,surname,grid,position,positionText,positionOrder,points,status,driver
0,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Nino,Farina,1,1.0,1,1,9.0,Finished,Nino Farina
1,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Luigi,Fagioli,2,2.0,2,2,6.0,Finished,Luigi Fagioli
2,833,1950,1,British Grand Prix,Alfa Romeo,alfa,Reg,Parnell,4,3.0,3,3,4.0,Finished,Reg Parnell
3,833,1950,1,British Grand Prix,Talbot-Lago,lago,Yves,Cabantous,6,4.0,4,4,3.0,+2 Laps,Yves Cabantous
4,833,1950,1,British Grand Prix,Talbot-Lago,lago,Louis,Rosier,9,5.0,5,5,2.0,+2 Laps,Louis Rosier


In [73]:
driverConstructorCounts = results2.groupby(["year", "driver","constructor"]).constructor.count()

In [74]:
driverConstructorCounts

year  driver              constructor 
1950  Alberto Ascari      Alfa Romeo       1
                          Ferrari          4
      Alfredo Pián        Maserati         1
      Bayliss Levrett     Adams            1
      Bill Cantrell       Adams            1
      Bill Holland        Deidt            1
      Bill Schindler      Snowberger       1
      Bob Gerard          ERA              2
      Brian Shawe Taylor  Maserati         1
      Cecil Green         Kurtis Kraft     1
      Charles Pozzi       Talbot-Lago      1
      Clemente Biondetti  Ferrari          1
      Consalvo Sanesi     Alfa Romeo       1
      Cuth Harrison       ERA              3
      David Hampshire     Maserati         2
      David Murray        Maserati         2
      Dick Rathmann       Watson           1
      Dorino Serafini     Ferrari          1
      Duane Carter        Stevens          1
      Duke Dinsmore       Kurtis Kraft     1
      Eugène Chaboud      Talbot-Lago      2
      Eugène Mar

In [75]:
driverCounts = results2.groupby(["year","constructor","driver"]).constructor.count()

In [76]:
driverCounts 

year  constructor   driver            
1950  Adams         Bayliss Levrett        1
                    Bill Cantrell          1
      Alfa Romeo    Alberto Ascari         1
                    Consalvo Sanesi        1
                    Juan Fangio            7
                    Luigi Fagioli          6
                    Nino Farina            6
                    Piero Taruffi          1
                    Reg Parnell            1
      Alta          Geoff Crossley         2
                    Joe Kelly              1
      Cooper        Harry Schell           1
      Deidt         Bill Holland           1
                    Mauri Rose             1
                    Tony Bettenhausen      1
      ERA           Bob Gerard             2
                    Cuth Harrison          3
                    Leslie Johnson         1
                    Peter Walker           1
                    Tony Rolt              1
      Ewing         Jimmy Davies           1
      Ferrari   

Of the two, the second one (grouped by constructor, then driver) seems to make more sense, so let's save that to talk about tomorrow:

In [77]:
driverCounts = driverCounts.to_frame(name="counts").reset_index()

In [78]:
driverCounts.head()

Unnamed: 0,year,constructor,driver,counts
0,1950,Adams,Bayliss Levrett,1
1,1950,Adams,Bill Cantrell,1
2,1950,Alfa Romeo,Alberto Ascari,1
3,1950,Alfa Romeo,Consalvo Sanesi,1
4,1950,Alfa Romeo,Juan Fangio,7


In [79]:
driverCounts.to_csv("../data/working/constructor-driver_count.csv", index=False)
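
To narrow these counts down to likely attribution errors, one option is to look for drivers credited to more than one constructor within a single race, as Ascari is in the 1950 Italian Grand Prix rows above. A minimal sketch on toy rows (illustrative only). Drivers did sometimes change teams mid-season, so a per-race check is a stricter signal than a per-season one:

```python
import pandas as pd

# Toy rows with the combined `driver` column already built.
toy = pd.DataFrame({
    "raceId": [837, 839, 839, 839],
    "driver": ["Alberto Ascari", "Alberto Ascari",
               "Alberto Ascari", "Nino Farina"],
    "constructor": ["Ferrari", "Alfa Romeo", "Ferrari", "Alfa Romeo"],
})

# Number of distinct constructors per driver within each race.
per_race = toy.groupby(["raceId", "driver"])["constructor"].nunique()

# Keep only driver/race pairs with more than one constructor.
suspects = per_race[per_race > 1].reset_index(name="constructors")
print(suspects)
```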