# Looking for 1-2 Finishes

Let's look at the F1 race data and try to find the 1-2 Finishes in the sport. I have two ideas for doing this:

1. Looking for first and second place finishes for a team at each race. 
1. Look at the average race finish for each team at each race. a 1-2 finish gives an average of 1.5. Or, in other words, the smaller the average, the better the team did in that race.


## 1. Looking for all the first and second place finishes

In [1]:
import pandas as pd
import numpy as np

In [2]:
races = pd.read_csv("../data/f1db_results.csv")

In [3]:
races.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
2,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Reg Parnell,4,3,3,4.0,Finished
3,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Yves Cabantous,6,4,4,3.0,+2 Laps
4,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Louis Rosier,9,5,5,2.0,+2 Laps


In [4]:
races.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24277 entries, 0 to 24276
Data columns (total 12 columns):
raceId2            24277 non-null int64
prixName           24277 non-null object
year               24277 non-null int64
round              24277 non-null int64
prixDate           24277 non-null object
constructorName    24277 non-null object
driverName         24277 non-null object
grid               24277 non-null int64
positionText       24277 non-null object
positionOrder      24277 non-null int64
points             24277 non-null float64
status             24277 non-null object
dtypes: float64(1), int64(5), object(6)
memory usage: 2.2+ MB


In [5]:
one_two = races[races.positionOrder <= 2]

In [6]:
one_two.head(10000)

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
23,2,Monaco Grand Prix,1950,2,1950-05-21,Alfa Romeo,Juan Fangio,1,1,1,9.0,Finished
24,2,Monaco Grand Prix,1950,2,1950-05-21,Ferrari,Alberto Ascari,7,2,2,6.0,+1 Lap
44,3,Indianapolis 500,1950,3,1950-05-30,Kurtis Kraft,Johnnie Parsons,5,1,1,9.0,Finished
45,3,Indianapolis 500,1950,3,1950-05-30,Deidt,Bill Holland,10,2,2,6.0,+1 Lap
79,4,Swiss Grand Prix,1950,4,1950-06-04,Alfa Romeo,Nino Farina,2,1,1,9.0,Finished
80,4,Swiss Grand Prix,1950,4,1950-06-04,Alfa Romeo,Luigi Fagioli,3,2,2,6.0,Finished
97,5,Belgian Grand Prix,1950,5,1950-06-18,Alfa Romeo,Juan Fangio,2,1,1,8.0,Finished
98,5,Belgian Grand Prix,1950,5,1950-06-18,Alfa Romeo,Luigi Fagioli,3,2,2,6.0,Finished


In [7]:
groups = one_two.groupby("raceId2")

In [8]:
range(1002)[-1]

1001

In [9]:
groups

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x1145daf28>

In [10]:
group = groups.get_group(1).reset_index()
group

Unnamed: 0,index,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished


In [11]:
group.columns

Index(['index', 'raceId2', 'prixName', 'year', 'round', 'prixDate',
       'constructorName', 'driverName', 'grid', 'positionText',
       'positionOrder', 'points', 'status'],
      dtype='object')

In [12]:
len(group)

2

In [13]:
group.loc[0].constructorName

'Alfa Romeo'

In [14]:
first = group.loc[0]
second = group.loc[1]

first.constructorName == second.constructorName

True

Let's turn all of this into a function.or at least a more cohesive script

In [15]:
temp_df = pd.DataFrame()
groups = one_two.groupby("raceId2")
for i in range(1,1002):
    group = groups.get_group(i).reset_index()
    first = group.loc[0]
    second = group.loc[1]
    if (first.constructorName == second.constructorName):
        print(i, first.constructorName == second.constructorName)
        temp_df = temp_df.append(first)

1 True
4 True
5 True
6 True
7 True
9 True
11 True
14 True
16 True
18 True
19 True
20 True
21 True
22 True
24 True
25 True
26 True
27 True
31 True
36 True
37 True
44 True
45 True
46 True
47 True
48 True
49 True
52 True
53 True
54 True
57 True
59 True
61 True
68 True
71 True
77 True
79 True
81 True
82 True
84 True
87 True
89 True
90 True
92 True
93 True
94 True
96 True
97 True
99 True
109 True
112 True
119 True
122 True
139 True
145 True
148 True
155 True
157 True
158 True
160 True
162 True
171 True
178 True
182 True
193 True
195 True
197 True
202 True
204 True
216 True
220 True
225 True
230 True
231 True
233 True
239 True
242 True
243 True
250 True
267 True
269 True
271 True
303 True
304 True
306 True
310 True
311 True
315 True
316 True
317 True
323 True
326 True
341 True
342 True
343 True
344 True
361 True
365 True
368 True
375 True
385 True
390 True
399 True
401 True
404 True
409 True
415 True
419 True
424 True
429 True
433 True
439 True
442 True
443 True
446 True
450 True
452 True
45

In [16]:
temp_df

Unnamed: 0,constructorName,driverName,grid,index,points,positionOrder,positionText,prixDate,prixName,raceId2,round,status,year
0,Alfa Romeo,Nino Farina,1.0,0.0,9.00,1.0,1,1950-05-13,British Grand Prix,1.0,1.0,Finished,1950.0
0,Alfa Romeo,Nino Farina,2.0,79.0,9.00,1.0,1,1950-06-04,Swiss Grand Prix,4.0,4.0,Finished,1950.0
0,Alfa Romeo,Juan Fangio,2.0,97.0,8.00,1.0,1,1950-06-18,Belgian Grand Prix,5.0,5.0,Finished,1950.0
0,Alfa Romeo,Juan Fangio,1.0,111.0,9.00,1.0,1,1950-07-02,French Grand Prix,6.0,6.0,Finished,1950.0
0,Alfa Romeo,Nino Farina,3.0,131.0,8.00,1.0,1,1950-09-03,Italian Grand Prix,7.0,7.0,Finished,1950.0
0,Kurtis Kraft,Lee Wallard,2.0,181.0,9.00,1.0,1,1951-05-30,Indianapolis 500,9.0,2.0,Finished,1951.0
0,Alfa Romeo,Luigi Fagioli,7.0,228.0,4.00,1.0,1,1951-07-01,French Grand Prix,11.0,4.0,Finished,1951.0
0,Ferrari,Alberto Ascari,3.0,296.0,8.00,1.0,1,1951-09-16,Italian Grand Prix,14.0,7.0,Finished,1951.0
0,Ferrari,Piero Taruffi,2.0,339.0,9.00,1.0,1,1952-05-18,Swiss Grand Prix,16.0,1.0,Finished,1952.0
0,Ferrari,Alberto Ascari,1.0,394.0,9.00,1.0,1,1952-06-22,Belgian Grand Prix,18.0,3.0,Finished,1952.0


In [17]:
temp_df.head()

Unnamed: 0,constructorName,driverName,grid,index,points,positionOrder,positionText,prixDate,prixName,raceId2,round,status,year
0,Alfa Romeo,Nino Farina,1.0,0.0,9.0,1.0,1,1950-05-13,British Grand Prix,1.0,1.0,Finished,1950.0
0,Alfa Romeo,Nino Farina,2.0,79.0,9.0,1.0,1,1950-06-04,Swiss Grand Prix,4.0,4.0,Finished,1950.0
0,Alfa Romeo,Juan Fangio,2.0,97.0,8.0,1.0,1,1950-06-18,Belgian Grand Prix,5.0,5.0,Finished,1950.0
0,Alfa Romeo,Juan Fangio,1.0,111.0,9.0,1.0,1,1950-07-02,French Grand Prix,6.0,6.0,Finished,1950.0
0,Alfa Romeo,Nino Farina,3.0,131.0,8.0,1.0,1,1950-09-03,Italian Grand Prix,7.0,7.0,Finished,1950.0


In [18]:
onetwos = temp_df[["raceId2", "year", "prixName", "prixDate", "round", "constructorName", "driverName", "grid", "positionOrder", "positionText", "points", "status"]]
onetwos.head()

Unnamed: 0,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.0,Finished
0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.0,Finished
0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.0,Finished
0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.0,Finished
0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.0,Finished


In [19]:
a = onetwos

In [20]:
a

Unnamed: 0,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.00,Finished
0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.00,Finished
0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.00,Finished
0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.00,Finished
0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.00,Finished
0,9.0,1951.0,Indianapolis 500,1951-05-30,2.0,Kurtis Kraft,Lee Wallard,2.0,1.0,1,9.00,Finished
0,11.0,1951.0,French Grand Prix,1951-07-01,4.0,Alfa Romeo,Luigi Fagioli,7.0,1.0,1,4.00,Finished
0,14.0,1951.0,Italian Grand Prix,1951-09-16,7.0,Ferrari,Alberto Ascari,3.0,1.0,1,8.00,Finished
0,16.0,1952.0,Swiss Grand Prix,1952-05-18,1.0,Ferrari,Piero Taruffi,2.0,1.0,1,9.00,Finished
0,18.0,1952.0,Belgian Grand Prix,1952-06-22,3.0,Ferrari,Alberto Ascari,1.0,1.0,1,9.00,Finished


In [21]:
a["raceId2"].astype(int).head()

0    1
0    4
0    5
0    6
0    7
Name: raceId2, dtype: int64

In [22]:
a.head(20)

Unnamed: 0,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.0,Finished
0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.0,Finished
0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.0,Finished
0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.0,Finished
0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.0,Finished
0,9.0,1951.0,Indianapolis 500,1951-05-30,2.0,Kurtis Kraft,Lee Wallard,2.0,1.0,1,9.0,Finished
0,11.0,1951.0,French Grand Prix,1951-07-01,4.0,Alfa Romeo,Luigi Fagioli,7.0,1.0,1,4.0,Finished
0,14.0,1951.0,Italian Grand Prix,1951-09-16,7.0,Ferrari,Alberto Ascari,3.0,1.0,1,8.0,Finished
0,16.0,1952.0,Swiss Grand Prix,1952-05-18,1.0,Ferrari,Piero Taruffi,2.0,1.0,1,9.0,Finished
0,18.0,1952.0,Belgian Grand Prix,1952-06-22,3.0,Ferrari,Alberto Ascari,1.0,1.0,1,9.0,Finished


In [23]:
onetwos[["raceId2", "year", "round", "grid", "positionOrder", "points"]].astype(int).head()

Unnamed: 0,raceId2,year,round,grid,positionOrder,points
0,1,1950,1,1,1,9
0,4,1950,4,2,1,9
0,5,1950,5,2,1,8
0,6,1950,6,1,1,9
0,7,1950,7,3,1,8


In [24]:
onetwos.head(30)

Unnamed: 0,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.0,Finished
0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.0,Finished
0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.0,Finished
0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.0,Finished
0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.0,Finished
0,9.0,1951.0,Indianapolis 500,1951-05-30,2.0,Kurtis Kraft,Lee Wallard,2.0,1.0,1,9.0,Finished
0,11.0,1951.0,French Grand Prix,1951-07-01,4.0,Alfa Romeo,Luigi Fagioli,7.0,1.0,1,4.0,Finished
0,14.0,1951.0,Italian Grand Prix,1951-09-16,7.0,Ferrari,Alberto Ascari,3.0,1.0,1,8.0,Finished
0,16.0,1952.0,Swiss Grand Prix,1952-05-18,1.0,Ferrari,Piero Taruffi,2.0,1.0,1,9.0,Finished
0,18.0,1952.0,Belgian Grand Prix,1952-06-22,3.0,Ferrari,Alberto Ascari,1.0,1.0,1,9.0,Finished


Let's go ahead and get rid of that old index

In [25]:
onetwos.reset_index()

Unnamed: 0,index,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.00,Finished
1,0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.00,Finished
2,0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.00,Finished
3,0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.00,Finished
4,0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.00,Finished
5,0,9.0,1951.0,Indianapolis 500,1951-05-30,2.0,Kurtis Kraft,Lee Wallard,2.0,1.0,1,9.00,Finished
6,0,11.0,1951.0,French Grand Prix,1951-07-01,4.0,Alfa Romeo,Luigi Fagioli,7.0,1.0,1,4.00,Finished
7,0,14.0,1951.0,Italian Grand Prix,1951-09-16,7.0,Ferrari,Alberto Ascari,3.0,1.0,1,8.00,Finished
8,0,16.0,1952.0,Swiss Grand Prix,1952-05-18,1.0,Ferrari,Piero Taruffi,2.0,1.0,1,9.00,Finished
9,0,18.0,1952.0,Belgian Grand Prix,1952-06-22,3.0,Ferrari,Alberto Ascari,1.0,1.0,1,9.00,Finished


In [26]:
onetwoslist = onetwos.reset_index()

In [27]:
onetwoslist.head()

Unnamed: 0,index,raceId2,year,prixName,prixDate,round,constructorName,driverName,grid,positionOrder,positionText,points,status
0,0,1.0,1950.0,British Grand Prix,1950-05-13,1.0,Alfa Romeo,Nino Farina,1.0,1.0,1,9.0,Finished
1,0,4.0,1950.0,Swiss Grand Prix,1950-06-04,4.0,Alfa Romeo,Nino Farina,2.0,1.0,1,9.0,Finished
2,0,5.0,1950.0,Belgian Grand Prix,1950-06-18,5.0,Alfa Romeo,Juan Fangio,2.0,1.0,1,8.0,Finished
3,0,6.0,1950.0,French Grand Prix,1950-07-02,6.0,Alfa Romeo,Juan Fangio,1.0,1.0,1,9.0,Finished
4,0,7.0,1950.0,Italian Grand Prix,1950-09-03,7.0,Alfa Romeo,Nino Farina,3.0,1.0,1,8.0,Finished


In [28]:
onetwoslist.columns

Index(['index', 'raceId2', 'year', 'prixName', 'prixDate', 'round',
       'constructorName', 'driverName', 'grid', 'positionOrder',
       'positionText', 'points', 'status'],
      dtype='object')

In [29]:
onetwoslist[["raceId2", "year", "prixName", "prixDate", "round", "constructorName", "driverName"]].to_csv("../data/onetwofinishes.csv", index=False)

We'll use this list to verify things later. 

Now we'll move to the race averages. 

## 2. Average finishing position for each race.

The other way I was thinking of calculating 1-2 finishes and comparing constructors was to look at the average race finish. It's a bit messier before each constructor only had 2 driver, but the idea is to average the finishing positions of each team's driver to compare performance. A one-two finish would correlate to an average of 1.5 which is the dream. Other good finishes:

* 1-3, avg: 2
* 2-3, avg: 2.5

I will start by understanding `positionText` and how that translates to `positionOrder`. I will try to find my race averages with the positionOrder and then try two or three ideas depending on how easy they are to implement: Find the overall average retirement position and use that as the Retirement order across the board; find the average retirement position per year and apply it as the retiremente order for that year

In [30]:
races.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
2,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Reg Parnell,4,3,3,4.0,Finished
3,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Yves Cabantous,6,4,4,3.0,+2 Laps
4,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Louis Rosier,9,5,5,2.0,+2 Laps


In [31]:
races.groupby(["year","constructorName"]).positionText.value_counts()

year  constructorName  positionText
1950  Adams            R               2
      Alfa Romeo       R               8
                       1               6
                       2               5
                       3               2
                       4               1
                       7               1
      Alta             9               1
                       N               1
                       R               1
      Cooper           R               1
      Deidt            2               1
                       3               1
                       R               1
      ERA              R               5
                       6               2
                       7               1
      Ewing            17              1
      Ferrari          R               6
                       2               2
                       3               1
                       4               1
                       5               1
                     

So this is what each team has earned each year... useful but not what I wanted.

In [32]:
races.positionText.unique()

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', 'R', 'N',
       'W', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21',
       '22', '23', '24', 'D', 'F', '25', '26', '27', '28', '29', '30',
       '31', '32', '33', 'E'], dtype=object)

**This** is what I wanted! From this I see that we have numbers and a few letters I need to understand:

* R
* N
* W
* D
* F
* E

To understand these, I'll find rows that contain them and look at the status and the race to see what they mean. 


### Understanding 'N'

In [33]:
races.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
2,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Reg Parnell,4,3,3,4.0,Finished
3,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Yves Cabantous,6,4,4,3.0,+2 Laps
4,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Louis Rosier,9,5,5,2.0,+2 Laps


In [34]:
races[races.positionText == "N"]

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
13,1,British Grand Prix,1950,1,1950-05-13,Alta,Joe Kelly,19,N,13,0.0,Not classified
268,12,British Grand Prix,1951,5,1951-07-14,Alta,Joe Kelly,18,N,15,0.0,Not classified
512,22,Dutch Grand Prix,1952,7,1952-08-17,HWM,Dries van der Lof,14,N,12,0.0,Not classified
537,23,Italian Grand Prix,1952,8,1952-09-07,Cooper,Mike Hawthorn,12,N,19,0.0,Not classified
631,26,Dutch Grand Prix,1953,3,1953-06-07,Connaught,Johnny Claes,17,N,13,0.0,Not classified
790,32,Italian Grand Prix,1953,9,1953-09-13,Connaught,Jack Fairman,22,N,21,0.0,Not classified
791,32,Italian Grand Prix,1953,9,1953-09-13,Cooper,Ken Wharton,19,N,22,0.0,Not classified
792,32,Italian Grand Prix,1953,9,1953-09-13,Connaught,Kenneth McAlpine,18,N,23,0.0,Not classified
1355,55,German Grand Prix,1956,7,1956-08-05,Maserati,Ottorino Volonterio,19,N,7,0.0,+6 Laps
2375,100,German Grand Prix,1961,6,1961-08-06,Cooper-Climax,Bernard Collomb,26,N,19,0.0,Not classified


Checking the 2011 italian Grand Prix ([wikipedia](https://en.wikipedia.org/wiki/2011_Italian_Grand_Prix#endnote_1)), I can confirm that 'N' means 'Not Classified'. Not exactly sure how I'll deal with this because the "Order" puts them after everyone who placed but didn't retire (so far)

### Understanding 'W'

In [35]:
races[races.positionText == "W"]

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
42,2,Monaco Grand Prix,1950,2,1950-05-21,Ferrari,Peter Whitehead,21,W,20,0.0,Engine
43,2,Monaco Grand Prix,1950,2,1950-05-21,Maserati,Alfredo Pián,18,W,21,0.0,Accident
159,7,Italian Grand Prix,1950,7,1950-09-03,Milano,Felice Bonetto,23,W,27,0.0,Withdrew
317,14,Italian Grand Prix,1951,7,1951-09-16,BRM,Reg Parnell,8,W,21,0.0,Withdrew
318,14,Italian Grand Prix,1951,7,1951-09-16,BRM,Ken Richardson,10,W,22,0.0,Withdrew
338,15,Spanish Grand Prix,1951,8,1951-10-28,Maserati,Juan Jover,18,W,20,0.0,Engine
351,16,Swiss Grand Prix,1952,1,1952-05-18,HWM,Stirling Moss,9,W,12,0.0,Withdrew
352,16,Swiss Grand Prix,1952,1,1952-05-18,HWM,Lance Macklin,12,W,13,0.0,Withdrew
469,20,British Grand Prix,1952,5,1952-07-19,Aston Butterworth,Bill Aston,0,W,32,0.0,Withdrew
816,33,Argentine Grand Prix,1954,1,1954-01-17,Maserati,Luigi Musso,7,W,17,0.0,Withdrew


'W' seems to largely correspond to withdrawals, but sometimes a more specific status is mentioned. Let me check Grosjean and Alonso in 2016 and 2017. Grosjean crashed in Brazil pit lane on the way to start and in Singapore had a new gearbox that penalized him; he didn't change it so he didn't race. In 2017, Alonso had power unit issues so he withdrew from the race... so W= "Withdrew".



### Understanding 'D'

In [36]:
races[races.positionText == "D"]

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
494,21,German Grand Prix,1952,6,1952-08-03,Maserati,Felice Bonetto,10,D,25,0.0,Disqualified
809,33,Argentine Grand Prix,1954,1,1954-01-17,Gordini,Jean Behra,17,D,10,0.0,Disqualified
810,33,Argentine Grand Prix,1954,1,1954-01-17,Ferrari,Mike Hawthorn,4,D,11,0.0,Disqualified
1354,55,German Grand Prix,1956,7,1956-08-05,Maserati,Bruce Halford,11,D,6,0.0,Disqualified
1646,68,Indianapolis 500,1958,4,1958-05-30,Kurtis Kraft,Mike Magill,31,D,17,0.0,Disqualified
1895,79,French Grand Prix,1959,4,1959-07-05,BRM,Stirling Moss,4,D,12,1.0,Disqualified
2040,86,Monaco Grand Prix,1960,2,1960-05-29,Cooper-Climax,Jack Brabham,2,D,11,0.0,Disqualified
2182,92,Portuguese Grand Prix,1960,8,1960-08-14,Team Lotus,Stirling Moss,4,D,8,0.0,Disqualified
2343,99,British Grand Prix,1961,5,1961-07-15,Ferguson,Stirling Moss,20,D,19,0.0,Disqualified
2344,99,British Grand Prix,1961,5,1961-07-15,Ferguson,Jack Fairman,20,D,19,0.0,Disqualified


D is outright for Disqualified.

### Understanding 'F'

In [37]:
races[races.positionText == "F"]

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
543,23,Italian Grand Prix,1952,8,1952-09-07,Ferrari,Charles de Tornaco,0,F,25,0.0,Did not qualify
544,23,Italian Grand Prix,1952,8,1952-09-07,Maserati,Alberto Crespo,0,F,26,0.0,Did not qualify
545,23,Italian Grand Prix,1952,8,1952-09-07,Maserati,Toulo de Graffenried,0,F,27,0.0,Did not qualify
546,23,Italian Grand Prix,1952,8,1952-09-07,HWM,Peter Collins,0,F,28,0.0,Did not qualify
547,23,Italian Grand Prix,1952,8,1952-09-07,Ferrari,Peter Whitehead,0,F,29,0.0,Did not qualify
548,23,Italian Grand Prix,1952,8,1952-09-07,HWM,Tony Gaze,0,F,30,0.0,Did not qualify
549,23,Italian Grand Prix,1952,8,1952-09-07,Aston Butterworth,Bill Aston,0,F,31,0.0,Did not qualify
550,23,Italian Grand Prix,1952,8,1952-09-07,HWM,Lance Macklin,0,F,32,0.0,Did not qualify
551,23,Italian Grand Prix,1952,8,1952-09-07,Ferrari,Hans von Stuck,0,F,33,0.0,Did not qualify
552,23,Italian Grand Prix,1952,8,1952-09-07,Cisitalia,Piero Dusio,0,F,34,0.0,Did not qualify


'F' is for 'Failed to Qualify"

### Understanding 'E'

In [38]:
races[races.positionText == "E"]

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
10994,440,Monaco Grand Prix,1987,4,1987-05-31,Zakspeed,Christian Danner,0,E,26,0.0,Excluded
11072,443,British Grand Prix,1987,7,1987-07-12,Ligier,Piercarlo Ghinzani,19,E,26,0.0,Excluded
11369,454,San Marino Grand Prix,1988,2,1988-05-01,Osella,Nicola Larini,0,E,27,0.0,Excluded
11403,455,Monaco Grand Prix,1988,3,1988-05-15,Euro Brun,Stefano Modena,0,E,31,0.0,Excluded
11434,456,Mexican Grand Prix,1988,4,1988-05-29,Euro Brun,Stefano Modena,0,E,31,0.0,Excluded
11527,459,French Grand Prix,1988,7,1988-07-03,Zakspeed,Piercarlo Ghinzani,0,E,31,0.0,Excluded
13818,526,German Grand Prix,1992,10,1992-07-26,Andrea Moda,Perry McCarthy,0,E,32,0.0,Excluded
15909,611,Austrian Grand Prix,1997,14,1997-09-21,Minardi,Tarso Marques,0,E,22,0.0,Underweight
22894,934,Brazilian Grand Prix,2015,18,2015-11-15,Williams,Felipe Massa,8,E,20,0.0,Excluded


E is for exclusions! something happened that they were excluded in the final results. 

---
Putting it all together:

* R: Retirement
* N: Not classified
* W: Withdrew
* D: Disqualified
* F: Did not Qualify
* E: Excluded

I have an idea for dealing with retirements, but for the rest, I'm not sure how to classify in terms of placing for the averages. For now, let's move on to getting race averages based on the existing positionOrder

In [39]:
races.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
2,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Reg Parnell,4,3,3,4.0,Finished
3,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Yves Cabantous,6,4,4,3.0,+2 Laps
4,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Louis Rosier,9,5,5,2.0,+2 Laps


In [40]:
races.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24277 entries, 0 to 24276
Data columns (total 12 columns):
raceId2            24277 non-null int64
prixName           24277 non-null object
year               24277 non-null int64
round              24277 non-null int64
prixDate           24277 non-null object
constructorName    24277 non-null object
driverName         24277 non-null object
grid               24277 non-null int64
positionText       24277 non-null object
positionOrder      24277 non-null int64
points             24277 non-null float64
status             24277 non-null object
dtypes: float64(1), int64(5), object(6)
memory usage: 2.2+ MB


In [41]:
races.groupby(["year","constructorName","round"]).head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
0,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Nino Farina,1,1,1,9.0,Finished
1,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Luigi Fagioli,2,2,2,6.0,Finished
2,1,British Grand Prix,1950,1,1950-05-13,Alfa Romeo,Reg Parnell,4,3,3,4.0,Finished
3,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Yves Cabantous,6,4,4,3.0,+2 Laps
4,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Louis Rosier,9,5,5,2.0,+2 Laps
5,1,British Grand Prix,1950,1,1950-05-13,ERA,Bob Gerard,13,6,6,0.0,+3 Laps
6,1,British Grand Prix,1950,1,1950-05-13,ERA,Cuth Harrison,15,7,7,0.0,+3 Laps
7,1,British Grand Prix,1950,1,1950-05-13,Talbot-Lago,Philippe Étancelin,14,8,8,0.0,+5 Laps
8,1,British Grand Prix,1950,1,1950-05-13,Maserati,David Hampshire,16,9,9,0.0,+6 Laps
9,1,British Grand Prix,1950,1,1950-05-13,Maserati,Joe Fry,20,10,10,0.0,+6 Laps


In [42]:
races.groupby(["year","constructorName","round"]).positionOrder.mean()

year  constructorName  round
1950  Adams            3        27.000000
      Alfa Romeo       1         4.500000
                       2         8.000000
                       4         5.000000
                       5         2.333333
                       6         3.333333
                       7        10.000000
      Alta             1        14.500000
                       5         9.000000
      Cooper           2        19.000000
      Deidt            3        12.000000
      ERA              1        14.800000
                       2        11.500000
                       7         9.000000
      Ewing            3        17.000000
      Ferrari          2         8.500000
                       4        16.000000
                       5         5.500000
                       6         3.000000
                       7        11.000000
      Kurtis Kraft     3        15.800000
      Langley          3        16.000000
      Lesovsky         3        11.500000
     

Things seem to work, but looking at 2019, I don't see Haas scores for rounds 1 and 2, let's check this year more closely.

In [43]:
races2019 = races[races.year == 2019]

In [44]:
races2019.groupby(["year","constructorName","round"]).positionOrder.mean()

year  constructorName  round
2019  Alfa Romeo       1        11.5
                       2         9.0
                       3        12.0
                       4        11.0
      Ferrari          1         4.5
                       2         4.0
                       3         4.0
                       4         4.0
      Haas F1 Team     1        12.0
                       2        16.5
                       3        12.0
                       4        15.5
      McLaren          1        16.0
                       2        12.5
                       3        16.0
                       4         7.5
      Mercedes         1         1.5
                       2         1.5
                       3         1.5
                       4         1.5
      Racing Point     1        11.0
                       2        12.0
                       3        10.0
                       4         7.5
      Red Bull         1         7.0
                       2         6.0
         

In [45]:
races2019[races["constructorName"] == "Haas F1 Team"]

  """Entry point for launching an IPython kernel.


Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
24202,998,Australian Grand Prix,2019,1,2019-03-17,Haas F1 Team,Kevin Magnussen,7,6,6,8.0,Finished
24214,998,Australian Grand Prix,2019,1,2019-03-17,Haas F1 Team,Romain Grosjean,6,R,18,0.0,Wheel
24229,999,Bahrain Grand Prix,2019,2,2019-03-31,Haas F1 Team,Kevin Magnussen,6,13,13,0.0,+1 Lap
24236,999,Bahrain Grand Prix,2019,2,2019-03-31,Haas F1 Team,Romain Grosjean,11,R,20,0.0,Retired
24247,1000,Chinese Grand Prix,2019,3,2019-04-14,Haas F1 Team,Romain Grosjean,10,11,11,0.0,+1 Lap
24249,1000,Chinese Grand Prix,2019,3,2019-04-14,Haas F1 Team,Kevin Magnussen,9,13,13,0.0,+1 Lap
24269,1001,Azerbaijan Grand Prix,2019,4,2019-04-28,Haas F1 Team,Kevin Magnussen,12,13,13,0.0,+1 Lap
24274,1001,Azerbaijan Grand Prix,2019,4,2019-04-28,Haas F1 Team,Romain Grosjean,14,R,18,0.0,Brakes


It's all there. Let's try to save the 2019 data and graph it to see what that looks like.

In [46]:
test2019 = races2019.groupby(["year","constructorName","round"]).positionOrder.mean()

In [47]:
test2019.head()

year  constructorName  round
2019  Alfa Romeo       1        11.5
                       2         9.0
                       3        12.0
                       4        11.0
      Ferrari          1         4.5
Name: positionOrder, dtype: float64

In [48]:
test2019.rename("averagePositionOrder").reset_index()

Unnamed: 0,year,constructorName,round,averagePositionOrder
0,2019,Alfa Romeo,1,11.5
1,2019,Alfa Romeo,2,9.0
2,2019,Alfa Romeo,3,12.0
3,2019,Alfa Romeo,4,11.0
4,2019,Ferrari,1,4.5
5,2019,Ferrari,2,4.0
6,2019,Ferrari,3,4.0
7,2019,Ferrari,4,4.0
8,2019,Haas F1 Team,1,12.0
9,2019,Haas F1 Team,2,16.5


In [49]:
type(test2019)

pandas.core.series.Series

In [50]:
df_test2019 = test2019.to_frame()

In [51]:
type(df_test2019)

pandas.core.frame.DataFrame

In [52]:
df_test2019.reset_index()

Unnamed: 0,year,constructorName,round,positionOrder
0,2019,Alfa Romeo,1,11.5
1,2019,Alfa Romeo,2,9.0
2,2019,Alfa Romeo,3,12.0
3,2019,Alfa Romeo,4,11.0
4,2019,Ferrari,1,4.5
5,2019,Ferrari,2,4.0
6,2019,Ferrari,3,4.0
7,2019,Ferrari,4,4.0
8,2019,Haas F1 Team,1,12.0
9,2019,Haas F1 Team,2,16.5


In [53]:
df_test2019.reset_index().to_csv("../data/2019test.csv", index=False)

Let's now get a longer year cut (1988 first and then 1982)

In [54]:
races1982 = races[(races.year == 1982)]
races1982.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
8756,358,South African Grand Prix,1982,1,1982-01-23,Renault,Alain Prost,5,1,1,9.0,Finished
8757,358,South African Grand Prix,1982,1,1982-01-23,Williams,Carlos Reutemann,8,2,2,6.0,Finished
8758,358,South African Grand Prix,1982,1,1982-01-23,Renault,René Arnoux,1,3,3,4.0,Finished
8759,358,South African Grand Prix,1982,1,1982-01-23,McLaren,Niki Lauda,13,4,4,3.0,Finished
8760,358,South African Grand Prix,1982,1,1982-01-23,Williams,Keke Rosberg,7,5,5,2.0,Finished


In [55]:
races_set1 = "empty"
races_set1 = races1982.groupby(["year","constructorName","round"]).positionOrder.mean()
races_set1.rename("averagePositionOrder")

year  constructorName  round
1982  ATS              1         9.5
                       2         9.0
                       3        25.5
                       4         5.5
                       5        25.0
                       6        16.5
                       7        20.5
                       8        22.5
                       9        12.5
                       10       28.0
                       11       17.5
                       12       21.5
                       13       23.5
                       14       16.0
                       15       18.5
                       16       21.5
      Alfa Romeo       1        12.0
                       2        23.0
                       3        18.5
                       4        10.5
                       5        21.0
                       6        11.5
                       7        18.5
                       8        14.0
                       9        15.0
                       10        9.0
         

In [56]:
type(races_set1)

pandas.core.series.Series

In [57]:
races_set1.to_frame()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,positionOrder
year,constructorName,round,Unnamed: 3_level_1
1982,ATS,1,9.5
1982,ATS,2,9.0
1982,ATS,3,25.5
1982,ATS,4,5.5
1982,ATS,5,25.0
1982,ATS,6,16.5
1982,ATS,7,20.5
1982,ATS,8,22.5
1982,ATS,9,12.5
1982,ATS,10,28.0


In [58]:
races1 = races_set1.rename("averagePosition").reset_index()
races1.head(16)

Unnamed: 0,year,constructorName,round,averagePosition
0,1982,ATS,1,9.5
1,1982,ATS,2,9.0
2,1982,ATS,3,25.5
3,1982,ATS,4,5.5
4,1982,ATS,5,25.0
5,1982,ATS,6,16.5
6,1982,ATS,7,20.5
7,1982,ATS,8,22.5
8,1982,ATS,9,12.5
9,1982,ATS,10,28.0


In [59]:
races1.to_csv("../data/1982races.csv")

In [60]:
races1.averagePosition.describe()

count    259.000000
mean      15.833333
std        7.635878
min        1.500000
25%        9.500000
50%       15.000000
75%       22.000000
max       30.000000
Name: averagePosition, dtype: float64

In [61]:
races1988 = races[(races.year == 1988)]

In [62]:
races1988.head()

Unnamed: 0,raceId2,prixName,year,round,prixDate,constructorName,driverName,grid,positionText,positionOrder,points,status
11312,453,Brazilian Grand Prix,1988,1,1988-04-03,McLaren,Alain Prost,3,1,1,9.0,Finished
11313,453,Brazilian Grand Prix,1988,1,1988-04-03,Ferrari,Gerhard Berger,4,2,2,6.0,Finished
11314,453,Brazilian Grand Prix,1988,1,1988-04-03,Team Lotus,Nelson Piquet,5,3,3,4.0,Finished
11315,453,Brazilian Grand Prix,1988,1,1988-04-03,Arrows,Derek Warwick,11,4,4,3.0,Finished
11316,453,Brazilian Grand Prix,1988,1,1988-04-03,Ferrari,Michele Alboreto,6,5,5,2.0,Finished


In [63]:
mclaren88 = races[(races.year == 1988) & (races.constructorName == "McLaren")]

In [64]:
mclaren88.groupby("round").positionOrder.mean()

round
1      9.0
2      1.5
3      6.0
4      1.5
5      1.5
6      1.5
7      1.5
8     11.5
9      1.5
10     1.5
11     1.5
12    12.0
13     3.5
14     2.5
15     1.5
16     1.5
Name: positionOrder, dtype: float64

In [65]:
mclaren88.groupby("round").positionOrder.describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
round,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,2.0,9.0,11.313708,1.0,5.0,9.0,13.0,17.0
2,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
3,2.0,6.0,7.071068,1.0,3.5,6.0,8.5,11.0
4,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
5,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
6,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
7,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
8,2.0,11.5,14.849242,1.0,6.25,11.5,16.75,22.0
9,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
10,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0


In [66]:
mercedes16 = races[(races.year==2016) & (races.constructorName == "Mercedes")]

In [67]:
mercedes16.groupby("round").positionOrder.describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
round,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
2,2.0,2.0,1.414214,1.0,1.5,2.0,2.5,3.0
3,2.0,4.0,4.242641,1.0,2.5,4.0,5.5,7.0
4,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
5,2.0,21.5,0.707107,21.0,21.25,21.5,21.75,22.0
6,2.0,4.0,4.242641,1.0,2.5,4.0,5.5,7.0
7,2.0,3.0,2.828427,1.0,2.0,3.0,4.0,5.0
8,2.0,3.0,2.828427,1.0,2.0,3.0,4.0,5.0
9,2.0,2.5,2.12132,1.0,1.75,2.5,3.25,4.0
10,2.0,2.0,1.414214,1.0,1.5,2.0,2.5,3.0


In [68]:
ferrari02 = races[(races.year == 2002) & (races.constructorName == "Ferrari")]

In [69]:
ferrari02.groupby("round").positionOrder.describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
round,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,2.0,8.0,9.899495,1.0,4.5,8.0,11.5,15.0
2,2.0,8.5,7.778175,3.0,5.75,8.5,11.25,14.0
3,2.0,11.0,14.142136,1.0,6.0,11.0,16.0,21.0
4,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
5,2.0,10.5,13.435029,1.0,5.75,10.5,15.25,20.0
6,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
7,2.0,4.5,3.535534,2.0,3.25,4.5,5.75,7.0
8,2.0,2.0,1.414214,1.0,1.5,2.0,2.5,3.0
9,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0
10,2.0,1.5,0.707107,1.0,1.25,1.5,1.75,2.0


In [70]:
meanPosition = races.groupby(["year","constructorName","round"]).positionOrder.mean()

In [71]:
positions = meanPosition.rename("averagePosition").to_frame()

In [72]:
positions = positions.reset_index()

In [73]:
positions.to_csv("../data/averagePositions-all.csv", index=False)