On the [Formula 1](https://github.com/gdv/foundationsCS/tree/master/students/ex-data/f1-db) dataset, answer the following questions.

# Questions

1. For each decade, find the driver born in that decade who scored the most points in his career.
2. For each circuit, find the fastest lap and output it with: (1) the date it was performed, (2) the name of the driver, and (3) the lap time.
3. Find the driver who spent the most time performing pit stops.
4. For each nationality, find the driver who scored the most points in his/her career.
5. Find the nations that have at least one driver with at least 1000 points.
6. Find the nations that have at least two drivers with at least 1000 points.

# Answers

In [36]:
import pandas as pd
import numpy as np

## Question #1

For each decade, find the driver born in that decade who scored the most points in his career.

#### Hints
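1.  Remember that `apply` applies a function to each value of a `Series`
2.  `idxmax` returns the index label of the row attaining the maximum; with the default integer index, the label coincides with the row position
3.  `loc` (by label) and `iloc` (by position) both accept a list as argument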
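
The three hints can be tried out on a small example first. This is only a sketch on made-up data: the `scores` series and its labels are hypothetical and are not part of the F1 dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the F1 dataset
scores = pd.Series([10, 40, 25], index=['a', 'b', 'c'])

# apply calls the function once for each value of the Series
doubled = scores.apply(lambda value: value * 2)

# idxmax returns the index label of the largest value
best_label = scores.idxmax()

# loc selects by label, iloc by integer position; both accept a list
by_label = scores.loc[['a', 'c']]
by_position = scores.iloc[[0, 2]]

print(doubled.tolist(), best_label, by_label.tolist(), by_position.tolist())
```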

There are at least two possible ways to compute the decade.
The one used here takes the `dob` column, converts it into a date, extracts the year, and integer-divides it by 10, so that 1985 becomes 198.

In [37]:
drivers_data = pd.read_csv('ex-data/f1-db/drivers.csv')
# dob is stored as DD/MM/YYYY, so parse it day-first
drivers_data['decade'] = pd.to_datetime(drivers_data['dob'], dayfirst=True).dt.year // 10
drivers_data.head()

Unnamed: 0,driverId,driverRef,number,code,forename,surname,dob,nationality,url,decade
0,1,hamilton,44.0,HAM,Lewis,Hamilton,07/01/1985,British,http://en.wikipedia.org/wiki/Lewis_Hamilton,198
1,2,heidfeld,,HEI,Nick,Heidfeld,10/05/1977,German,http://en.wikipedia.org/wiki/Nick_Heidfeld,197
2,3,rosberg,6.0,ROS,Nico,Rosberg,27/06/1985,German,http://en.wikipedia.org/wiki/Nico_Rosberg,198
3,4,alonso,14.0,ALO,Fernando,Alonso,29/07/1981,Spanish,http://en.wikipedia.org/wiki/Fernando_Alonso,198
4,5,kovalainen,,KOV,Heikki,Kovalainen,19/10/1981,Finnish,http://en.wikipedia.org/wiki/Heikki_Kovalainen,198
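
The text mentions a second way to compute the decade without spelling it out. One plausible alternative, shown here only as a sketch, is to skip date parsing and read the year directly from the last four characters of the string. It assumes every `dob` value follows the DD/MM/YYYY layout seen in the rows above; the `sample` frame is a hypothetical stand-in for `drivers_data`.

```python
import pandas as pd

# Hypothetical sample in the same DD/MM/YYYY layout as the dob column
sample = pd.DataFrame({'dob': ['07/01/1985', '10/05/1977', '27/06/1985']})

# Option 1: parse the strings as dates, stating the format explicitly
decade_from_dates = pd.to_datetime(sample['dob'], format='%d/%m/%Y').dt.year // 10

# Option 2: treat the last four characters as the year (assumes a fixed layout)
decade_from_text = sample['dob'].str[-4:].astype(int) // 10

print(decade_from_dates.tolist())
print(decade_from_text.tolist())
```

Both options give the same decades on this sample; the string version breaks silently if a date is stored in another layout, whereas the explicit `format` raises an error.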


In [38]:
# Load standings data
standings_data = pd.read_csv('ex-data/f1-db/driverStandings.csv')

# Compute points for each driver
driver_points = standings_data.groupby('driverId')['points'].sum()
driver_points.head()

driverId
1    26468.0
2     2830.0
3    16910.0
4    18196.0
5      953.0
Name: points, dtype: float64

In [39]:
#driver_points.rename('points')
#driver_points.head()

Now we can find the best driver of each decade: join the career points onto the drivers table, then use `idxmax` within each decade.

In [40]:
drivers_data_with_points = drivers_data.join(driver_points, on='driverId')
best_of_each_decade = drivers_data_with_points.groupby('decade')['points'].idxmax()
best_of_each_decade

decade
189    786
190    642
191    579
192    288
193    327
194    181
195    116
196     29
197      7
198      0
199    814
Name: points, dtype: int64

Since `idxmax` returns the index labels of the rows holding the maximum values, we can pass them to `loc` to extract the drivers.

In [41]:
drivers_data_with_points.loc[best_of_each_decade][['forename', 'surname', 'decade', 'points']]

Unnamed: 0,forename,surname,decade,points
786,Luigi,Fagioli,189,116.0
642,Nino,Farina,190,528.31
579,Juan,Fangio,191,1131.28
288,Graham,Hill,192,1691.0
327,Jackie,Stewart,193,2574.0
181,Niki,Lauda,194,3768.5
116,Alain,Prost,195,6829.5
29,Michael,Schumacher,196,14514.0
7,Kimi,Räikkönen,197,15772.0
0,Lewis,Hamilton,198,26468.0
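
The difference between labels and positions is invisible with the default index, where the two coincide. The following sketch uses a made-up frame with a non-default index to show why the labels returned by `idxmax` belong with `loc`. All names and numbers in `toy` are hypothetical.

```python
import pandas as pd

# Hypothetical drivers with a non-default index (labels 10, 20, 30, 40)
toy = pd.DataFrame(
    {'surname': ['A', 'B', 'C', 'D'],
     'decade': [192, 193, 194, 194],
     'points': [289.0, 274.0, 420.5, 179.0]},
    index=[10, 20, 30, 40],
)

# One index label per decade: the row with the most points in that group
best = toy.groupby('decade')['points'].idxmax()

# loc looks the labels up directly; iloc would treat 10, 20, 30 as positions
# and fail on a frame with only four rows
winners = toy.loc[best, ['surname', 'decade', 'points']]
print(winners)
```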


## Question #2

For each circuit, find the fastest lap and output it with: (1) the date it was performed, (2) the name of the driver, and (3) the lap time.

In [42]:
races_data = pd.read_csv('ex-data/f1-db/races.csv')
laps_data = pd.read_csv('ex-data/f1-db/lapTimes.csv')
circuits_data = pd.read_csv('ex-data/f1-db/circuits.csv')

First we are going to add the circuit ID to each row of the laps dataset.

In [43]:
# Add circuit ID to each lap

laps_data_new = pd.merge(laps_data, races_data[['raceId', 'circuitId']])
assert len(laps_data_new) == len(laps_data), 'Lap without a matching circuit'
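
The length check above can also be delegated to `merge` itself. As a sketch on hypothetical toy frames (`laps` and `races` stand in for the real tables), `validate='many_to_one'` makes pandas raise an error if a race appears more than once on the right side, and `indicator=True` marks laps that found no matching race.

```python
import pandas as pd

# Hypothetical stand-ins for the laps and races tables
laps = pd.DataFrame({'raceId': [1, 1, 2], 'milliseconds': [90000, 91000, 85000]})
races = pd.DataFrame({'raceId': [1, 2], 'circuitId': [7, 9]})

# A left merge keeps every lap; validate checks that raceId is unique in races
merged = pd.merge(laps, races, on='raceId', how='left',
                  validate='many_to_one', indicator=True)

# Rows flagged 'left_only' are laps without a matching race
unmatched = merged[merged['_merge'] == 'left_only']
assert unmatched.empty, 'Lap without a matching circuit'

merged = merged.drop(columns='_merge')
print(merged)
```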

Compute the fastest lap for each circuit: `idxmin` returns, for every circuit, the index of the row with the smallest lap time.

In [44]:
best_lap_for_circuit = laps_data_new.groupby('circuitId')['milliseconds'].idxmin()
best_lap_for_circuit.head()

circuitId
1    246165
2    420521
3    248125
4    268741
5    278534
Name: milliseconds, dtype: int64

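Retrieve the rows of the fastest laps, then join them with the races, drivers, and circuits data. The indices were computed on `laps_data_new` and are used positionally on `laps_data`, which relies on the merge having kept the rows in their original order.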

In [45]:
laps_data.iloc[best_lap_for_circuit].head()

Unnamed: 0,raceId,driverId,lap,position,time,milliseconds
246165,90,30,29,1,1:24.125,84125
420521,983,20,41,4,1:34.080,94080
248125,92,30,7,1,1:30.252,90252
268741,75,21,66,5,1:15.641,75641
278534,84,31,39,2,1:24.770,84770
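
A route that avoids index lookups altogether is to sort the laps by time and keep the first row of each circuit. This is a sketch of that alternative on hypothetical data, not the approach used in this notebook; `laps_with_circuit` stands in for `laps_data_new`.

```python
import pandas as pd

# Hypothetical laps that already carry their circuit ID
laps_with_circuit = pd.DataFrame({
    'circuitId': [1, 1, 2, 2],
    'driverId': [30, 20, 30, 21],
    'milliseconds': [84125, 85000, 91000, 90252],
})

# Sort fastest first, keep the first lap seen for each circuit,
# then restore the circuit order
fastest = (laps_with_circuit
           .sort_values('milliseconds')
           .drop_duplicates('circuitId')
           .sort_values('circuitId'))
print(fastest)
```

Because the result is an ordinary subset of the rows, no assumption about matching indices between two frames is needed.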


In [46]:
drivers_best_laps = pd.merge(laps_data.iloc[best_lap_for_circuit],
                             races_data[['raceId', 'date', 'circuitId']],
                             on='raceId')[['driverId', 'circuitId', 'date', 'time']]
drivers_best_laps.head()

Unnamed: 0,driverId,circuitId,date,time
0,30,1,2004-03-07,1:24.125
1,20,2,2017-10-01,1:34.080
2,30,3,2004-04-04,1:30.252
3,21,4,2005-05-08,1:15.641
4,31,5,2005-08-21,1:24.770


In [47]:
best_laps_data = drivers_best_laps.merge(drivers_data[['driverId', 'forename', 'surname']],
                                         on='driverId').merge(circuits_data[['circuitId', 'name']],
                                                           on='circuitId')

# Present only the data we need
best_laps_data[['name', 'forename', 'surname', 'date', 'time']]

Unnamed: 0,name,forename,surname,date,time
0,Albert Park Grand Prix Circuit,Michael,Schumacher,2004-03-07,1:24.125
1,Bahrain International Circuit,Michael,Schumacher,2004-04-04,1:30.252
2,Circuit de Monaco,Michael,Schumacher,2004-05-23,1:14.439
3,Silverstone Circuit,Michael,Schumacher,2004-07-11,1:18.739
4,Hungaroring,Michael,Schumacher,2002-08-18,1:16.207
5,Shanghai International Circuit,Michael,Schumacher,2004-09-26,1:32.238
6,Autodromo Enzo e Dino Ferrari,Michael,Schumacher,2004-04-25,1:20.411
7,A1-Ring,Michael,Schumacher,2003-05-18,1:08.337
8,Sepang International Circuit,Sebastian,Vettel,2017-10-01,1:34.080
9,Yas Marina Circuit,Sebastian,Vettel,2009-11-01,1:40.279


## Question #3

Find the driver that has spent the most time performing pit stops

In [48]:
pit_stops_data = pd.read_csv('ex-data/f1-db/pitStops.csv')
driver_id = pit_stops_data.groupby('driverId')['milliseconds'].sum().idxmax()
print(driver_id)
drivers_data[drivers_data['driverId'] == driver_id]

817


Unnamed: 0,driverId,driverRef,number,code,forename,surname,dob,nationality,url,decade
816,817,ricciardo,3.0,RIC,Daniel,Ricciardo,01/07/1989,Australian,http://en.wikipedia.org/wiki/Daniel_Ricciardo,198
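
The same pattern, reduced to a toy example with hypothetical drivers and pit-stop durations, also shows how to report the total time in seconds next to the name:

```python
import pandas as pd

# Hypothetical pit stops and drivers, not taken from the F1 dataset
pit_stops = pd.DataFrame({'driverId': [1, 2, 1, 3],
                          'milliseconds': [23000, 25000, 24000, 30000]})
drivers = pd.DataFrame({'driverId': [1, 2, 3],
                        'surname': ['Alpha', 'Beta', 'Gamma']})

# Total pit-stop time per driver, then the driver with the largest total
total = pit_stops.groupby('driverId')['milliseconds'].sum()
slowest_id = total.idxmax()
total_seconds = total.loc[slowest_id] / 1000

# Look up the surname of that driver
name = drivers.loc[drivers['driverId'] == slowest_id, 'surname'].iloc[0]
print(name, total_seconds)
```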


## Question #4

For each nationality, find the driver who scored the most points in his/her career. Nationalities whose best driver has no points are left out of the result.

In [49]:
drivers_idxs = drivers_data_with_points.groupby('nationality')['points'].idxmax()
# idxmax returns index labels, so look them up with loc
best_of_each_nat = drivers_data_with_points.loc[drivers_idxs][['forename', 'surname', 'nationality', 'points']]
best_of_each_nat[best_of_each_nat['points'] > 0]

Unnamed: 0,forename,surname,nationality,points
206,Mario,Andretti,American,1594.0
198,Carlos,Reutemann,Argentine,2602.0
16,Mark,Webber,Australian,10608.0
181,Niki,Lauda,Austrian,3768.5
234,Jacky,Ickx,Belgian,1155.0
12,Felipe,Massa,Brazilian,11149.0
0,Lewis,Hamilton,British,26468.0
34,Jacques,Villeneuve,Canadian,2083.0
193,Eliseo,Salazar,Chilean,30.0
30,Juan,Pablo Montoya,Colombian,2760.0


## Question #5

Find the nations that have at least one driver with at least 1000 points.

In [50]:
drivers_data_with_points[drivers_data_with_points['points'] >= 1000]['nationality'].unique()

array(['British', 'German', 'Spanish', 'Finnish', 'Polish', 'Brazilian',
       'Italian', 'Australian', 'Colombian', 'Canadian', 'French',
       'Austrian', 'Belgian', 'Japanese', 'Argentine', 'American',
       'South African', 'Swiss', 'Swedish', 'New Zealander', 'Mexican',
       'Russian', 'Dutch'], dtype=object)

## Question #6

Find the nations that have at least two drivers with at least 1000 points.

In [51]:
nations = drivers_data_with_points[drivers_data_with_points['points'] >= 1000].groupby('nationality').size()
nations[nations >= 2]

nationality
Argentine         2
Australian        4
Austrian          2
Belgian           2
Brazilian         5
British          12
Finnish           4
French            6
German            8
Italian           5
New Zealander     2
Spanish           2
dtype: int64
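
Questions 5 and 6 share the same filter, so both answers can be derived from one intermediate frame. The sketch below uses `value_counts` in place of `groupby(...).size()`, which counts rows per nationality in the same way. The `toy` frame and its nationalities are hypothetical.

```python
import pandas as pd

# Hypothetical drivers with career points
toy = pd.DataFrame({
    'nationality': ['X', 'X', 'Y', 'Z', 'Z'],
    'points': [1200.0, 1500.0, 1100.0, 900.0, 2000.0],
})

# Keep only the drivers with at least 1000 points
high = toy[toy['points'] >= 1000]

# Question 5: nationalities that appear at least once
at_least_one = sorted(high['nationality'].unique())

# Question 6: nationalities that appear at least twice
counts = high['nationality'].value_counts()
at_least_two = sorted(counts[counts >= 2].index)

print(at_least_one, at_least_two)
```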