# Questions

1. For each decade, find the driver born in that decade who scored the most points in his career
2. For each circuit, find the fastest lap and output it with: (1) the date it was performed, (2) the name of the driver, and (3) the lap time
3. Find the driver who spent the most time performing pit stops
4. Using unittest, check that the driver found in the previous question is Daniel Ricciardo (whose driverId is 817)
5. For each nationality, find the driver who scored the most points in his/her career

# Answers

In [22]:
import pandas as pd
import numpy as np

## Question #1

For each decade, find the driver born in that decade who scored the most points in his career.

#### Hints
1.  Remember that `apply` applies a function to each value of a `Series`
2.  `idxmax` returns the index label of the row attaining the maximum (with the default integer index, the label equals the row position)
3.  `iloc` accepts a list of positions as its argument

There are at least two possible ways to compute the decade.
The first is to take the `dob` column, convert it to a date, extract the year, and divide it by 10 with integer division, so that 1985 becomes 198.

In [23]:
drivers_data = pd.read_csv('ex-data/f1-db/drivers.csv')
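# Dates are stored as dd/mm/yyyy, so parse them day-first; the decade is the year without its last digit
drivers_data['decade'] = pd.to_datetime(drivers_data['dob'], dayfirst=True).dt.year // 10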
drivers_data.head()

Unnamed: 0,driverId,driverRef,number,code,forename,surname,dob,nationality,url,decade
0,1,hamilton,44.0,HAM,Lewis,Hamilton,07/01/1985,British,http://en.wikipedia.org/wiki/Lewis_Hamilton,198
1,2,heidfeld,,HEI,Nick,Heidfeld,10/05/1977,German,http://en.wikipedia.org/wiki/Nick_Heidfeld,197
2,3,rosberg,6.0,ROS,Nico,Rosberg,27/06/1985,German,http://en.wikipedia.org/wiki/Nico_Rosberg,198
3,4,alonso,14.0,ALO,Fernando,Alonso,29/07/1981,Spanish,http://en.wikipedia.org/wiki/Fernando_Alonso,198
4,5,kovalainen,,KOV,Heikki,Kovalainen,19/10/1981,Finnish,http://en.wikipedia.org/wiki/Heikki_Kovalainen,198
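
The second way is not shown in this notebook. One possibility, sketched here as an assumption, is to skip date parsing and slice the year out of the `dob` string directly. The toy frame below is a hypothetical stand-in for `drivers.csv`, and the approach only works if every date follows the dd/mm/yyyy layout seen above.

```python
import pandas as pd

# Hypothetical stand-in for drivers.csv; dates use the dd/mm/yyyy layout shown above.
toy_drivers = pd.DataFrame({
    'surname': ['Hamilton', 'Heidfeld', 'Alonso'],
    'dob': ['07/01/1985', '10/05/1977', '29/07/1981'],
})

# The last four characters are the year; integer division by 10 drops its last digit.
toy_drivers['decade'] = toy_drivers['dob'].str[-4:].astype(int) // 10
print(toy_drivers)
```

String slicing avoids any ambiguity between day-first and month-first parsing, at the cost of breaking silently if a row uses a different date layout.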


In [24]:
# Load standings data
standings_data = pd.read_csv('ex-data/f1-db/driverStandings.csv')

# Compute points for each driver
driver_points = standings_data.groupby('driverId')['points'].sum()
driver_points.head()

driverId
1    26468.0
2     2830.0
3    16910.0
4    18196.0
5      953.0
Name: points, dtype: float64

In [25]:
#driver_points.rename('points')
#driver_points.head()

Now we can compute the best driver for each decade using the `idxmax` function.

In [26]:
drivers_data_with_points = drivers_data.join(driver_points, on='driverId')
best_of_each_decade = drivers_data_with_points.groupby('decade', as_index=False)['points'].idxmax()
best_of_each_decade

0     786
1     642
2     579
3     288
4     327
5     181
6     116
7      29
8       7
9       0
10    814
dtype: int64

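`idxmax` returns the index label of the row holding the maximum in each group. Because `drivers_data_with_points` keeps the default integer index, these labels coincide with row positions, so we can pass them to `iloc` to extract the drivers.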

In [27]:
drivers_data_with_points.iloc[best_of_each_decade][['forename', 'surname', 'decade', 'points']]

Unnamed: 0,forename,surname,decade,points
786,Luigi,Fagioli,189,116.0
642,Nino,Farina,190,528.31
579,Juan,Fangio,191,1131.28
288,Graham,Hill,192,1691.0
327,Jackie,Stewart,193,2574.0
181,Niki,Lauda,194,3768.5
116,Alain,Prost,195,6829.5
29,Michael,Schumacher,196,14514.0
7,Kimi,Räikkönen,197,15772.0
0,Lewis,Hamilton,198,26468.0
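
The `idxmax` plus `iloc` combination relies on the default integer index. As a minimal sketch on made-up data (the frame `toy` and its values are assumptions, not part of the F1 dataset), the example below uses a non-default index to show that `idxmax` yields labels, which are safer to pass to `loc`.

```python
import pandas as pd

# Made-up data with a deliberately non-numeric index.
toy = pd.DataFrame(
    {'decade': [197, 197, 198, 198],
     'points': [10.0, 50.0, 30.0, 20.0]},
    index=['a', 'b', 'c', 'd'],
)

# idxmax returns one index label per group, not a row position.
best_labels = toy.groupby('decade')['points'].idxmax()
print(best_labels.tolist())  # ['b', 'c']

# loc selects by label, so it works whatever the index looks like.
best_rows = toy.loc[best_labels]
print(best_rows)
```

With the default `RangeIndex`, `loc` and `iloc` return the same rows, which is why the cell above works as written.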


## Question #2

For each circuit, find the fastest lap and output it with: (1) the date it was performed, (2) the name of the driver, and (3) the lap time.

In [28]:
races_data = pd.read_csv('ex-data/f1-db/races.csv')
laps_data = pd.read_csv('ex-data/f1-db/lapTimes.csv')
circuits_data = pd.read_csv('ex-data/f1-db/circuits.csv')

First we are going to add the circuit ID to each row of the laps dataset.

In [29]:
# Add circuit ID to each lap

laps_data_new = pd.merge(laps_data, races_data[['raceId', 'circuitId']])
assert len(laps_data_new) == len(laps_data), 'Lap without a matching circuit'
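
The length check above guards against laps being dropped or duplicated by the merge. A possible alternative, sketched on hypothetical toy frames, is to let `pd.merge` enforce the relationship itself through its `validate` parameter and to use a left join so that unmatched laps show up as missing values instead of disappearing.

```python
import pandas as pd

# Hypothetical miniature versions of lapTimes.csv and races.csv.
toy_laps = pd.DataFrame({'raceId': [1, 1, 2],
                         'milliseconds': [90000, 88000, 95000]})
toy_races = pd.DataFrame({'raceId': [1, 2],
                          'circuitId': [10, 20]})

# validate='many_to_one' raises MergeError if a raceId appears twice in toy_races.
# how='left' keeps every lap, so a lap without a race gets a NaN circuitId.
toy_merged = pd.merge(toy_laps, toy_races, on='raceId',
                      how='left', validate='many_to_one')

missing = toy_merged['circuitId'].isna().sum()
assert missing == 0, 'Lap without a matching circuit'
print(toy_merged)
```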

In [30]:
# Compute the index label of the best lap for each circuit
best_lap_for_circuit = laps_data_new.groupby('circuitId')['milliseconds'].idxmin()

# Select the best laps from the merged frame (the one that contains circuitId) and attach the race date
drivers_best_laps = laps_data_new.loc[best_lap_for_circuit].merge(races_data[['raceId', 'date']],
                                                                  on='raceId')[['driverId', 'circuitId', 'date', 'time']]
# Join drivers and circuits data
best_laps_data = drivers_best_laps.merge(drivers_data[['driverId', 'forename', 'surname']],
                                         on='driverId').merge(circuits_data[['circuitId', 'name']],
                                                           on='circuitId')
# Present only the data we need
best_laps_data[['name', 'forename', 'surname', 'date', 'time']]

## Question #3

Find the driver who has spent the most time performing pit stops.

In [None]:
pit_stops_data = pd.read_csv('ex-data/f1-db/pitStops.csv')
driver_id = pit_stops_data.groupby('driverId')['milliseconds'].sum().idxmax()
print(driver_id)
drivers_data[drivers_data['driverId'] == driver_id]
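
The question list also asks to check this result with `unittest`, but the notebook contains no cell for it. The following is a minimal sketch of how such a check could look. The helper `driver_with_most_pit_time` and the toy data are hypothetical; in the real notebook the test would pass `pit_stops_data` to the helper instead.

```python
import unittest
import pandas as pd

def driver_with_most_pit_time(pit_stops):
    """Return the driverId with the largest total pit stop time."""
    return pit_stops.groupby('driverId')['milliseconds'].sum().idxmax()

# Toy data (an assumption): driver 817 totals 49000 ms, driver 1 totals 45000 ms.
toy_pit_stops = pd.DataFrame({
    'driverId': [817, 1, 817, 1],
    'milliseconds': [25000, 23000, 24000, 22000],
})

class TestPitStops(unittest.TestCase):
    def test_driver_is_ricciardo(self):
        # 817 is the driverId given for Daniel Ricciardo in the question.
        self.assertEqual(driver_with_most_pit_time(toy_pit_stops), 817)

# Build and run the suite explicitly, which also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPitStops)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Running the suite through `TextTestRunner` avoids `unittest.main()`, which by default parses command-line arguments and exits the interpreter when it finishes.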

## Question #4

For each nationality, find the driver who scored the most points in his/her career.

In [None]:
drivers_idxs = drivers_data_with_points.groupby('nationality')['points'].idxmax()
best_of_each_nat = drivers_data_with_points.iloc[drivers_idxs][['forename', 'surname', 'nationality', 'points']]
best_of_each_nat[best_of_each_nat['points'] > 0]

## Question #5

Find the nations that have at least one driver with at least 1000 points

In [None]:
drivers_data_with_points[drivers_data_with_points['points'] >= 1000]['nationality'].unique()

## Question #6

Find the nations that have at least two drivers with at least 1000 points

In [None]:
nations = drivers_data_with_points[drivers_data_with_points['points'] >= 1000].groupby('nationality').size()
nations[nations >= 2]
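
The same count can also be obtained with `value_counts`, which may read more directly than `groupby(...).size()`. This is a sketch on made-up data; the frame `toy_points` and its values are assumptions, not results from the F1 dataset.

```python
import pandas as pd

# Made-up career totals, one row per driver.
toy_points = pd.DataFrame({
    'nationality': ['British', 'British', 'German', 'Finnish', 'British'],
    'points': [2500.0, 1200.0, 1500.0, 400.0, 50.0],
})

# Keep drivers with at least 1000 points, then count them per nationality.
high_scorers = toy_points[toy_points['points'] >= 1000]
counts = high_scorers['nationality'].value_counts()

# Nations with at least two such drivers.
at_least_two = counts[counts >= 2]
print(at_least_two)
```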