| # | Question | Core tables | Why it’s tricky |
|---|-----------|-------------|-----------------|
| **1** | **Who gained the most grid positions on average in 2024?** Compute *(grid-pos − finish-pos)* across all 2024 races and return the top-3 gainers. | `qualifying`, `races`, `driver_standings`, `drivers` | Requires merging qualifying grid, race results, and filtering by season; handle DNS/DNF cases. |
| **2** | **Most consistent pit-stop team:** For 2024, calculate each constructor’s standard deviation of pit-stop **durations** (consider only stops < 5 s) and name the lowest-variance team. | `pit_stops`, `constructors`, `races` | Convert duration strings to floats; aggregate with conditional filtering. |
| **3** | **Circuit with the highest overtake rate in 2024:** Define an *overtake* as any position change between consecutive laps for the top-10 finishers; divide total overtakes by total race laps. | `lap_times`, `races`, `driver_standings` | Very large lap-times table; need window functions per driver and lap. |
| **4** | **Biggest single-lap improvement:** Across all 2024 laps, find the driver/lap with the largest lap-time drop *(prev-lap − current-lap)*. | `lap_times`, `drivers`, `races` | Lag calculation per driver; exclude pit-in/out laps. |
| **5** | **Largest points swing between back-to-back races:** Identify the driver whose championship tally changed the most between any two consecutive rounds in 2024. | `driver_standings`, `races` | Order standings by round and compute per-driver diffs. |
| **6** | **Constructor efficiency metric:** Rank constructors by **points earned per total pit-stop seconds** in 2024. | `constructor_standings`, `pit_stops`, `constructors` | Combine season-long points with summed pit-stop durations. |
| **7** | **Predictive model:** Build a linear regression predicting pit-stop **duration** using lap number, stop sequence, and constructor; report R² and top-3 coefficients. | `pit_stops`, `races`, `constructors` | Requires feature engineering and train/test split. |
| **8** | **Fast-lap specialist (2020-2024):** Which driver recorded the highest *fastest-lap rate* (fastest-lap awards ÷ races entered) over the last five seasons? | `races`, `lap_times`, `drivers` | Multi-season filter and per-race fastest-lap extraction. |
| **9** | **Team-mate qualifying duel:** For every constructor in 2024, compute each driver’s average grid-position advantage over their team-mate; identify the most dominant pairing. | `qualifying`, `constructors`, `drivers` | Pairwise comparison within constructor & race; handle missing sessions. |
| **10** | **Season-on-season improvement (2023 → 2024):** Which constructor improved its **average points per race** the most YoY? Show absolute and percentage change. | `constructor_standings`, `races`, `constructors` | Aggregate by season, normalize by races contested, then diff and rank. |

# Question 1

### “Who gained the most grid positions on average in 2024?”

In [1]:
import analysis as f1   # the helper module we just created

# Top 10 drivers by average positions gained between the grid and the flag
q1 = f1.gain_positions(season=2024, top_n=10)
q1

|  | driverId | gain | code | forename | surname |
|---|---|---|---|---|---|
| 0 | 815 | 3.75 | PER | Sergio | Pérez |
| 1 | 844 | 2.458333 | LEC | Charles | Leclerc |
| 2 | 840 | 1.958333 | STR | Lance | Stroll |
| 3 | 830 | 1.916667 | VER | Max | Verstappen |
| 4 | 1 | 1.125 | HAM | Lewis | Hamilton |
| 5 | 832 | 1.0 | SAI | Carlos | Sainz |
| 6 | 4 | 0.916667 | ALO | Fernando | Alonso |
| 7 | 846 | 0.458333 | NOR | Lando | Norris |
| 8 | 857 | 0.416667 | PIA | Oscar | Piastri |
| 9 | 807 | -0.041667 | HUL | Nico | Hülkenberg |


In [2]:
# The same calculation in plain pandas
import pandas as pd
from pathlib import Path

DATA = Path("Data")              # adjust if your CSV path differs
races  = pd.read_csv(DATA/"races.csv")[["raceId","year"]]
quali  = pd.read_csv(DATA/"qualifying.csv", usecols=["raceId","driverId","position"])
# Caveat: in Ergast-style dumps, driver_standings.position is usually the
# championship position after each race, not the finishing position, so
# results.csv may be the better source for "finish".
finish = pd.read_csv(DATA/"driver_standings.csv",
                     usecols=["raceId","driverId","position"])

race_ids = races.loc[races.year == 2024, "raceId"]
# Grid slot = best qualifying position per driver and race
grid  = (quali[quali.raceId.isin(race_ids)]
         .groupby(["raceId","driverId"], as_index=False)["position"]
         .min().rename(columns={"position":"grid"}))
flag  = finish[finish.raceId.isin(race_ids)].rename(columns={"position":"finish"})

# Inner merge keeps only drivers with both a grid and a finish entry
df = grid.merge(flag, on=["raceId","driverId"]).dropna()
df["gain"] = df.grid - df.finish   # positive = places gained
(df.groupby("driverId")["gain"]
   .mean()
   .sort_values(ascending=False)
   .head(10))

driverId
815    3.750000
844    2.458333
840    1.958333
830    1.916667
1      1.125000
832    1.000000
4      0.916667
846    0.458333
857    0.416667
807   -0.041667
Name: gain, dtype: float64

# Question 2  ▸ Most consistent pit-stop team in 2024

### For each constructor, compute the standard deviation of pit-stop durations (consider only stops < 5 s). The team with the lowest std-dev is the most consistent.

In [16]:
import importlib, analysis as f1
importlib.reload(f1)

q2 = f1.consistent_pit_stop_team(2024)
q2.head(10)

|  | constructorId | std_sec | name |
|---|---|---|---|
| 0 | 1 | 3.565256 | McLaren |
| 6 | 131 | 3.847778 | Mercedes |
| 8 | 214 | 3.887328 | Alpine F1 Team |
| 3 | 9 | 3.998399 | Red Bull |
| 5 | 117 | 4.220559 | Aston Martin |
| 9 | 215 | 4.702186 | RB F1 Team |
| 2 | 6 | 4.704309 | Ferrari |
| 1 | 3 | 4.724565 | Williams |
| 4 | 15 | 5.265065 | Sauber |
| 7 | 210 | 5.753934 | Haas F1 Team |
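
The helper's internals are not shown in this notebook, so here is a minimal sketch of the calculation described in the heading, run on made-up data. The function name `pit_stop_std`, the `entries` frame that maps each driver to a constructor per race, and the toy values are all assumptions for illustration; only the column names follow the CSVs used above.

```python
import pandas as pd

def pit_stop_std(pit: pd.DataFrame, entries: pd.DataFrame, max_sec: float = 5.0) -> pd.Series:
    """Std-dev of pit-stop duration per constructor, using only stops shorter than max_sec."""
    pit = pit.copy()
    # Durations may arrive as strings; unparseable values such as "1:02.345" become NaN
    pit["duration"] = pd.to_numeric(pit["duration"], errors="coerce")
    # NaN < max_sec is False, so unparseable rows are dropped by the same filter
    pit = pit[pit["duration"] < max_sec]
    # Attach the constructor each driver raced for (hypothetical mapping frame)
    merged = pit.merge(entries, on=["raceId", "driverId"])
    return merged.groupby("constructorId")["duration"].std().sort_values()

# Toy data (hypothetical values, not taken from the real dataset)
pit = pd.DataFrame({
    "raceId":   [1, 1, 1, 1, 1],
    "driverId": [10, 10, 20, 20, 20],
    "duration": ["2.0", "4.0", "3.0", "3.0", "1:02.345"],
})
entries = pd.DataFrame({"raceId": [1, 1], "driverId": [10, 20], "constructorId": [100, 200]})

spread = pit_stop_std(pit, entries)
print(spread)
```

The first row of `spread` is the most consistent team under this definition. Note that pandas uses the sample standard deviation (`ddof=1`) by default.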


# Question 3
### “Circuit with the highest overtake-rate in 2024.”

In [17]:
import importlib, analysis as f1
importlib.reload(f1)

q3 = f1.circuit_highest_overtake_rate(2024)
q3.head()      # top circuits by overtakes per lap


|  | name | circuit | rate |
|---|---|---|---|
| 21 | Las Vegas Grand Prix | Las Vegas Strip Street Circuit | 2.54 |
| 13 | Belgian Grand Prix | Circuit de Spa-Francorchamps | 2.409091 |
| 4 | Chinese Grand Prix | Shanghai International Circuit | 1.839286 |
| 15 | Italian Grand Prix | Autodromo Nazionale di Monza | 1.811321 |
| 9 | Spanish Grand Prix | Circuit de Barcelona-Catalunya | 1.80303 |
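
To make the overtake definition concrete, the sketch below counts position changes between consecutive laps for a single race using a per-driver `shift`. It is an illustration under assumptions, not the helper's actual code: the function name `overtake_rate` and the toy laps are invented, and the top-10-finishers filter from the question is left out for brevity.

```python
import pandas as pd

def overtake_rate(laps: pd.DataFrame) -> float:
    """Position changes between consecutive laps, divided by the number of race laps."""
    laps = laps.sort_values(["driverId", "lap"])
    # Previous-lap position for the same driver (NaN on each driver's first lap)
    prev_pos = laps.groupby("driverId")["position"].shift(1)
    changed = prev_pos.notna() & (laps["position"] != prev_pos)
    return float(changed.sum() / laps["lap"].max())

# Toy single-race data: the two drivers swap places on lap 2
laps = pd.DataFrame({
    "driverId": [1, 1, 1, 2, 2, 2],
    "lap":      [1, 2, 3, 1, 2, 3],
    "position": [1, 2, 2, 2, 1, 1],
})

rate = overtake_rate(laps)
print(rate)
```

Under this definition one on-track swap registers twice, once for each driver involved, and positions gained through pit stops or retirements count as well.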


# Question 4
### “Biggest single-lap improvement”

In [28]:
import importlib, analysis as f1
importlib.reload(f1)

q4 = f1.biggest_single_lap_improvement(2024)
q4

driver               Daniel Ricciardo
grand_prix        Canadian Grand Prix
lap                                12
improvement_ms                   2990
dtype: object
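
The lag calculation can be sketched as follows. This is a hedged illustration on invented data: the function name `biggest_improvement` is hypothetical, and treating the lap of each stop as the in-lap and the following lap as the out-lap is an assumption about how pit laps should be excluded.

```python
import pandas as pd

def biggest_improvement(laps: pd.DataFrame, pits: pd.DataFrame) -> pd.Series:
    """Row with the largest (previous lap - current lap) time, ignoring pit-affected laps."""
    keys = ["raceId", "driverId", "lap"]
    # A pit stop taints its in-lap and the following out-lap
    tainted = set()
    for race, driver, lap in pits[keys].itertuples(index=False):
        tainted.update({(race, driver, lap), (race, driver, lap + 1)})

    laps = laps.sort_values(keys).copy()
    prev_ms = laps.groupby(["raceId", "driverId"])["milliseconds"].shift(1)
    laps["improvement_ms"] = prev_ms - laps["milliseconds"]

    # Keep a comparison only if neither the current nor the previous lap is tainted
    clean = [
        (r, d, l) not in tainted and (r, d, l - 1) not in tainted
        for r, d, l in laps[keys].itertuples(index=False)
    ]
    laps = laps[clean]
    return laps.loc[laps["improvement_ms"].idxmax()]

# Toy data: one driver, pit stop on lap 3 (hypothetical lap times)
laps = pd.DataFrame({
    "raceId": [1] * 5,
    "driverId": [7] * 5,
    "lap": [1, 2, 3, 4, 5],
    "milliseconds": [90000, 89500, 110000, 95000, 89000],
})
pits = pd.DataFrame({"raceId": [1], "driverId": [7], "lap": [3]})

best = biggest_improvement(laps, pits)
print(best)
```

Without the exclusion, the out-lap and the lap after it would dominate the result, because both follow an unusually slow lap.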

# Question 5

### Largest points swing between consecutive races

In [33]:
import importlib, analysis as f1
importlib.reload(f1)

q5 = f1.largest_points_swing(2024)
q5

driver               Max Verstappen
from_gp         Japanese Grand Prix
to_gp            Chinese Grand Prix
points_swing                     33
dtype: object
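
A per-driver difference over rounds is enough to find the swing. The sketch below assumes the standings have already been joined to `races` so that a `round` column is available; the function name `largest_swing` and the toy points are made up.

```python
import pandas as pd

def largest_swing(standings: pd.DataFrame) -> pd.Series:
    """Row with the largest change in cumulative points between consecutive rounds."""
    s = standings.sort_values(["driverId", "round"]).copy()
    # diff() within each driver gives points scored since the previous round
    s["swing"] = s.groupby("driverId")["points"].diff()
    return s.loc[s["swing"].abs().idxmax()]

# Toy cumulative standings (hypothetical values)
standings = pd.DataFrame({
    "driverId": [1, 1, 1, 2, 2, 2],
    "round":    [1, 2, 3, 1, 2, 3],
    "points":   [25, 43, 51, 18, 18, 44],
})

top = largest_swing(standings)
print(top)
```

Because `points` in a standings table is cumulative, the swing is simply the points scored over one race weekend.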

# Question 6
### Constructor efficiency (points per pit-stop second)

In [34]:
q6 = f1.constructor_efficiency(2024)
q6.head()            # ranked by highest pts/sec

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pit = pit.dropna(subset=["duration"])


|  | constructorId | points | duration | pts_per_sec | name |
|---|---|---|---|---|---|
| 2 | 6 | 652.0 | 79.488 | 8.202496 | Ferrari |
| 4 | 214 | 65.0 | 22.08 | 2.943841 | Alpine F1 Team |
| 0 | 1 | 666.0 | 3788.489 | 0.175796 | McLaren |
| 1 | 3 | 17.0 | 4877.295 | 0.003486 | Williams |
| 3 | 15 | 4.0 | 1992.245 | 0.002008 | Sauber |
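
For reference, a small sketch of the points-per-pit-second ratio on invented data. It assumes a numeric `milliseconds` column is available for each stop, which avoids parsing duration strings, and a hypothetical `entries` frame mapping drivers to constructors per race; the function name `points_per_pit_second` is also made up.

```python
import pandas as pd

def points_per_pit_second(points: pd.Series, pit: pd.DataFrame, entries: pd.DataFrame) -> pd.DataFrame:
    """Season points divided by total pit-stop seconds, per constructor."""
    pit = pit.merge(entries, on=["raceId", "driverId"])
    secs = pit.groupby("constructorId")["milliseconds"].sum() / 1000.0
    # Both Series are indexed by constructorId, so the frame aligns them
    out = pd.DataFrame({"points": points, "pit_seconds": secs}).dropna()
    out["pts_per_sec"] = out["points"] / out["pit_seconds"]
    return out.sort_values("pts_per_sec", ascending=False)

# Toy data (hypothetical values)
points = pd.Series({100: 45.0, 200: 50.0})
pit = pd.DataFrame({
    "raceId": [1, 1, 1],
    "driverId": [10, 10, 20],
    "milliseconds": [22000, 23000, 25000],
})
entries = pd.DataFrame({"raceId": [1, 1], "driverId": [10, 20], "constructorId": [100, 200]})

eff = points_per_pit_second(points, pit, entries)
print(eff)
```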


# Question 7
### Pit-stop duration regression

In [35]:
r2, model = f1.pit_stop_duration_model(2024)
r2   # R-squared


0.019174454561327714
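
How the model is fitted is not shown here. As one possible sketch, an ordinary least-squares fit with a one-hot encoded constructor can be written with `numpy.linalg.lstsq`. The function name `fit_linear` and the synthetic data are assumptions, and the train/test split mentioned in the question table is omitted to keep the example short.

```python
import numpy as np
import pandas as pd

def fit_linear(df: pd.DataFrame, target: str = "duration"):
    """Least-squares fit of target on lap, stop and one-hot constructor; returns (R^2, coefficients)."""
    X = pd.get_dummies(df.drop(columns=[target]), columns=["constructorId"],
                       drop_first=True, dtype=float)
    X.insert(0, "intercept", 1.0)
    A = X.to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return r2, pd.Series(coef, index=X.columns)

# Synthetic data generated from a known linear rule, so the fit should be near-perfect
df = pd.DataFrame({
    "lap":           [10, 30, 12, 35, 20, 40],
    "stop":          [1, 2, 1, 2, 1, 3],
    "constructorId": [100, 100, 200, 200, 100, 200],
})
df["duration"] = 20 + 0.1 * df["lap"] + 0.5 * df["stop"] + 2.0 * (df["constructorId"] == 200)

r2_demo, coefs = fit_linear(df)
top3 = coefs.drop("intercept").abs().nlargest(3)
print(r2_demo, top3, sep="\n")
```

On real data the coefficients are only comparable in magnitude if the features are on similar scales, so ranking a "top 3" may call for standardizing `lap` and `stop` first.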

# Question 8 
### “Fast-lap specialist” (2020 – 2024)

#### This metric finds the driver who most often set the single fastest lap of a Grand Prix over the last five seasons, relative to the number of races they actually entered.

In [37]:
import importlib, analysis as f1
importlib.reload(f1)

q8 = f1.fast_lap_specialist(2020, 2024, top_n=10)   # last five seasons
q8

| driverId | code | forename | surname | fastlap_rate |
|---|---|---|---|---|
| 830 | VER | Max | Verstappen | 0.269231 |
| 1 | HAM | Lewis | Hamilton | 0.192308 |
| 846 | NOR | Lando | Norris | 0.11215 |
| 815 | PER | Sergio | Pérez | 0.078431 |
| 847 | RUS | George | Russell | 0.075472 |
| 857 | PIA | Oscar | Piastri | 0.066667 |
| 844 | LEC | Charles | Leclerc | 0.058824 |
| 817 | RIC | Daniel | Ricciardo | 0.047619 |
| 822 | BOT | Valtteri | Bottas | 0.04717 |
| 832 | SAI | Carlos | Sainz | 0.039604 |
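
The rate can be sketched from lap times alone: pick the quickest lap in each race, count awards per driver, and divide by races entered. In this illustration a driver counts as having entered a race if they appear in its lap data, which is an assumption; the function name `fastlap_rate` and the toy laps are invented.

```python
import pandas as pd

def fastlap_rate(laps: pd.DataFrame) -> pd.Series:
    """Fastest-lap awards divided by races entered, per driver."""
    # Driver who set the minimum lap time in each race
    fastest = laps.loc[laps.groupby("raceId")["milliseconds"].idxmin(), "driverId"]
    awards = fastest.value_counts()
    # Races entered, approximated by presence in the lap data
    entered = laps.groupby("driverId")["raceId"].nunique()
    rate = awards.reindex(entered.index, fill_value=0) / entered
    return rate.sort_values(ascending=False)

# Toy data: driver 1 enters three races, driver 2 enters two
laps = pd.DataFrame({
    "raceId":       [1, 1, 1, 1, 2, 2, 3],
    "driverId":     [1, 1, 2, 2, 1, 2, 1],
    "milliseconds": [90000, 89000, 91000, 90500, 88000, 87000, 86000],
})

rates = fastlap_rate(laps)
print(rates)
```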


# Question 9

### Team-mate qualifying duel – average grid-position advantage within each constructor.

In [45]:
import importlib, analysis as f1
importlib.reload(f1)

q9 = f1.teammate_qualifying_duel(2024)
q9.head(10)


|  | constructorId | abs_diff | name |
|---|---|---|---|
| 3 | 9 | 6.583333 | Red Bull |
| 6 | 131 | 4.916667 | Mercedes |
| 7 | 210 | 4.791667 | Haas F1 Team |
| 5 | 117 | 4.458333 | Aston Martin |
| 1 | 3 | 4.347826 | Williams |
| 8 | 214 | 4.041667 | Alpine F1 Team |
| 9 | 215 | 3.875 | RB F1 Team |
| 4 | 15 | 3.625 | Sauber |
| 0 | 1 | 3.375 | McLaren |
| 2 | 6 | 3.083333 | Ferrari |
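
One way to compare team-mates is to take the gap between the two cars of a constructor in each session and average it over the season. The sketch below computes the mean absolute per-race gap, which may or may not be how `abs_diff` is defined in the helper; the function name `teammate_gap` and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def teammate_gap(quali: pd.DataFrame) -> pd.Series:
    """Mean absolute qualifying-position gap between team-mates, per constructor."""
    q = quali.dropna(subset=["position"])
    pairs = q.groupby(["raceId", "constructorId"])["position"].agg(["count", "max", "min"])
    # Skip sessions where a constructor does not have exactly two classified cars
    pairs = pairs[pairs["count"] == 2]
    gap = (pairs["max"] - pairs["min"]).groupby(level="constructorId").mean()
    return gap.sort_values(ascending=False)

# Toy data: constructor 200 has a missing position in race 2
quali = pd.DataFrame({
    "raceId":        [1, 1, 1, 1, 2, 2, 2, 2],
    "constructorId": [100, 100, 200, 200, 100, 100, 200, 200],
    "driverId":      [1, 2, 3, 4, 1, 2, 3, 4],
    "position":      [1, 4, 2, 3, 2, 7, 5, np.nan],
})

gaps = teammate_gap(quali)
print(gaps)
```

A signed version (driver A minus driver B) would show who is ahead, but it needs extra care when a seat changes hands mid-season.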


# Question 10
### Year-on-year constructor improvement (2023 → 2024)

In [46]:
q10 = f1.constructor_season_improvement(2023, 2024).head(10)
q10

|  | constructorId | prev | curr | abs_change | pct_change | name |
|---|---|---|---|---|---|---|
| 0 | 1 | 13.727273 | 27.75 | 14.022727 | 1.021523 | McLaren |
| 2 | 6 | 18.454545 | 27.166667 | 8.712121 | 0.472085 | Ferrari |
| 11 | 215 | 0.0 | 1.916667 | 1.916667 | NaN | RB F1 Team |
| 8 | 210 | 0.545455 | 2.416667 | 1.871212 | 3.430556 | Haas F1 Team |
| 7 | 131 | 18.590909 | 19.5 | 0.909091 | 0.0489 | Mercedes |
| 4 | 15 | 0.0 | 0.166667 | 0.166667 | NaN | Sauber |
| 1 | 3 | 1.272727 | 0.708333 | -0.564394 | -0.443452 | Williams |
| 5 | 51 | 0.727273 | 0.0 | -0.727273 | -1.0 | Alfa Romeo |
| 9 | 213 | 1.136364 | 0.0 | -1.136364 | -1.0 | AlphaTauri |
| 10 | 214 | 5.454545 | 2.708333 | -2.746212 | -0.503472 | Alpine F1 Team |
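
The diff-and-rank step can be sketched as below, reusing three rows of the table above as input. The function name `yoy_change` is hypothetical, and leaving the percentage undefined (NaN) for a zero baseline is an assumption that happens to match the blank values shown for teams with no 2023 points.

```python
import numpy as np
import pandas as pd

def yoy_change(prev: pd.Series, curr: pd.Series) -> pd.DataFrame:
    """Absolute and relative change in average points per race between two seasons."""
    # Teams missing in one season are treated as scoring 0 points per race
    df = pd.concat([prev.rename("prev"), curr.rename("curr")], axis=1).fillna(0.0)
    df["abs_change"] = df["curr"] - df["prev"]
    # A zero baseline has no meaningful relative change, so it stays NaN
    df["pct_change"] = df["abs_change"] / df["prev"].replace(0.0, np.nan)
    return df.sort_values("abs_change", ascending=False)

# Average points per race, copied from the output above
prev = pd.Series({"McLaren": 13.727273, "Alfa Romeo": 0.727273})
curr = pd.Series({"McLaren": 27.75, "RB F1 Team": 1.916667})

change = yoy_change(prev, curr)
print(change)
```

`pct_change` here is a fraction, so McLaren's 1.02 corresponds to roughly +102 %.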
