# Does qualifying position really matter in Formula One?

This is a quick check across more than 1,000 individual F1 races spanning 69 years. It compares the top four starting grid positions with the final race results to see whether starting on pole position really gives you better odds of winning the race.

The dataset used can be found here: 
https://www.kaggle.com/rohanrao/formula-1-world-championship-1950-2020#results.csv

In [146]:
import pandas as pd
import numpy as np

In [147]:
df = pd.read_csv("results.csv")
# Replace non-numeric markers ("R" for retired, "\N" for missing) with 0
df = df.replace(to_replace=r'R', value=0, regex=True)
df = df.replace(to_replace=r'\\N', value=0, regex=True)
df.head()

Unnamed: 0,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,1:34:50.616,5690616,39,2,1:27.452,218.3,1
1,2,18,2,2,3,5,2,2,2,8.0,58,+5.478,5696094,41,3,1:27.739,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,+8.163,5698779,41,5,1:28.090,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,+17.181,5707797,58,7,1:28.603,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,+18.014,5708630,43,1,1:27.418,218.385,1
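As an alternative to replacing markers with 0, here is a minimal sketch that keeps them as missing values, so a non-finish is never confused with a real position. It assumes `\N` and `R` are the only non-numeric markers; the three-row inline CSV is made up for illustration and is not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample shaped like a slice of results.csv
sample_csv = io.StringIO(
    "raceId,grid,position\n"
    "18,1,1\n"
    "18,5,\\N\n"
    "19,2,R\n"
)

# Treat the literal \N marker as missing while reading
sample = pd.read_csv(sample_csv, na_values=["\\N"])

# Coerce any remaining non-numeric text (here "R") to NaN as well
sample["position"] = pd.to_numeric(sample["position"], errors="coerce")

print(sample)
```

With missing values left as `NaN`, comparisons such as `position == 1` simply evaluate to `False` for those rows.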


In [148]:
# 1: raceId
# 5: grid (starting position)
# 6: position (final position)
needed_columns = [1,5,6]
df = df[df.columns[needed_columns]]
df.head()

Unnamed: 0,raceId,grid,position
0,18,1,1
1,18,5,2
2,18,7,3
3,18,11,4
4,18,3,5
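Picking columns by position works only while the file's column order stays the same. A small sketch of selecting by name instead, using a made-up miniature frame with the same column names:

```python
import pandas as pd

# Hypothetical miniature frame reusing column names from results.csv
sample = pd.DataFrame({
    "resultId": [1, 2],
    "raceId": [18, 18],
    "driverId": [1, 2],
    "grid": [1, 5],
    "position": [1, 2],
})

# Selecting by label does not depend on where the columns sit in the file
subset = sample[["raceId", "grid", "position"]]
print(subset.columns.tolist())
```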


In [149]:
# Convert column values from strings to numeric
df["grid"] = pd.to_numeric(df["grid"])
df["position"] = pd.to_numeric(df["position"])
df["raceId"] = pd.to_numeric(df["raceId"])
df.head()

Unnamed: 0,raceId,grid,position
0,18,1,1
1,18,5,2
2,18,7,3
3,18,11,4
4,18,3,5


In [150]:
# Approximate the race count from the raceId range; the first row is raceId 18, not 1
total_races = df.raceId.iloc[-1] - df.raceId.iloc[0]

# Did the driver start on pole and, if so, win the race?
df['p1_win'] = np.where((df['grid'] == 1) & (df['position'] == 1),
                        'yes', 'no')
p1_winners = df['p1_win'].value_counts()

# Same check for grid slots 2 to 4
df['p2_win'] = np.where((df['grid'] == 2) & (df['position'] == 1),
                        'yes', 'no')
p2_winners = df['p2_win'].value_counts()

df['p3_win'] = np.where((df['grid'] == 3) & (df['position'] == 1),
                        'yes', 'no')
p3_winners = df['p3_win'].value_counts()

df['p4_win'] = np.where((df['grid'] == 4) & (df['position'] == 1),
                        'yes', 'no')
p4_winners = df['p4_win'].value_counts()

print("p1 wins: ", p1_winners.yes, "\np2 wins: ", p2_winners.yes,
      "\np3 wins: ", p3_winners.yes, "\np4 wins: ", p4_winners.yes)

p1 wins:  425 
p2 wins:  246 
p3 wins:  127 
p4 wins:  63
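The four near-identical `np.where` blocks can probably be collapsed into one count. The sketch below filters to race winners and tallies their starting grid slot; the eight-row frame is invented for illustration, so its counts say nothing about the real data.

```python
import pandas as pd

# Hypothetical results: four races, two finishers each
sample = pd.DataFrame({
    "raceId":   [1, 1, 2, 2, 3, 3, 4, 4],
    "grid":     [1, 2, 1, 2, 2, 1, 3, 1],
    "position": [1, 2, 1, 2, 1, 2, 1, 2],
})

# Keep only the winners, then count how often each grid slot produced one
winners = sample[sample["position"] == 1]
wins_by_grid = winners["grid"].value_counts().sort_index()
print(wins_by_grid)
```

This approach also extends to every grid slot without adding a new column per position.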


In [151]:
# Use the winning values against the total races to calc percentage
p1_win_perc = ((p1_winners.yes / (total_races)) * 100)

p2_win_perc = ((p2_winners.yes / (total_races)) * 100)

p3_win_perc = ((p3_winners.yes / (total_races)) * 100)

p4_win_perc = ((p4_winners.yes / (total_races)) * 100)

print("p1 win percentage:\t", "%0.2f%%" % p1_win_perc,
      "\np2 win percentage:\t", "%0.2f%%" % p2_win_perc,
      "\np3 win percentage:\t", "%0.2f%%" % p3_win_perc,
      "\np4 win percentage:\t", "%0.2f%%" % p4_win_perc)

p1 win percentage:	 42.00% 
p2 win percentage:	 24.31% 
p3 win percentage:	 12.55% 
p4 win percentage:	 6.23%
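The denominator above comes from subtracting the first `raceId` from the last, which assumes the ids are contiguous and in file order, and which leaves out one endpoint even then. A hedged sketch of counting distinct ids instead, on made-up ids with a gap; whether this changes the percentages for the real file is not checked here.

```python
import pandas as pd

# Hypothetical ids with a gap: races 18, 19 and 25
sample = pd.DataFrame({"raceId": [18, 18, 19, 19, 25, 25]})

# Range-based estimate, as used in the cell above
by_difference = sample["raceId"].iloc[-1] - sample["raceId"].iloc[0]

# Direct count of distinct races
by_distinct = sample["raceId"].nunique()

print(by_difference, by_distinct)
```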


## Conclusion

From these results we see that starting on pole position gives you a clear advantage over starting 2nd (42.00% of wins versus 24.31%), and that across the top four grid slots each position gained during qualifying roughly doubles* your chances of winning the race.
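To see how close to doubling each step is, the ratios between neighbouring grid slots can be worked out from the percentages printed above. This is plain arithmetic on the reported figures, not a rerun of the analysis.

```python
# Win percentages reported above for grid slots 1 to 4
win_perc = [42.00, 24.31, 12.55, 6.23]

# Ratio of each slot's win rate to the slot directly behind it
ratios = [ahead / behind for ahead, behind in zip(win_perc, win_perc[1:])]

print([round(r, 2) for r in ratios])  # [1.73, 1.94, 2.01]
```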

---
\* Not *exactly* double, but close enough: the ratio between neighbouring grid slots is about 1.7, 1.9 and 2.0.