# 🏁 F1 Dataset Overview

This notebook provides an initial exploration of the Formula 1 World Championship dataset (1950–2020) from Kaggle.  
The dataset includes information on various aspects of the championship, spread across several CSV files.  
We will inspect the structure, content, and relationships of these files to guide further analysis and modeling.

In [3]:
# 📦 Imports
import pandas as pd
from pathlib import Path

# 📁 Dataset location
DATA_DIR = Path("../../data/raw/rohanrao_formula-1-world-championship-1950-2020")

# Load datasets
drivers = pd.read_csv(DATA_DIR / "drivers.csv", parse_dates=["dob"])
races = pd.read_csv(DATA_DIR / "races.csv")
results = pd.read_csv(DATA_DIR / "results.csv")

In [4]:
# 📂 List all CSV files
csv_files = sorted(DATA_DIR.glob("*.csv"))

print(f"Found {len(csv_files)} CSV files:")
for f in csv_files:
    print("•", f.name)

Found 14 CSV files:
• circuits.csv
• constructor_results.csv
• constructor_standings.csv
• constructors.csv
• driver_standings.csv
• drivers.csv
• lap_times.csv
• pit_stops.csv
• qualifying.csv
• races.csv
• results.csv
• seasons.csv
• sprint_results.csv
• status.csv


In [5]:
# 🧾 Load all CSVs into memory
dfs = {f.stem: pd.read_csv(f) for f in csv_files}

In [6]:
# 🔍 Display basic info on each table
for name, df in dfs.items():
    print(f"\n📄 {name.upper()} — shape: {df.shape}")
    display(df.head(3))


📄 CIRCUITS — shape: (77, 9)


,circuitId,circuitRef,name,location,country,lat,lng,alt,url
0,1,albert_park,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.968,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...
1,2,sepang,Sepang International Circuit,Kuala Lumpur,Malaysia,2.76083,101.738,18,http://en.wikipedia.org/wiki/Sepang_Internatio...
2,3,bahrain,Bahrain International Circuit,Sakhir,Bahrain,26.0325,50.5106,7,http://en.wikipedia.org/wiki/Bahrain_Internati...



📄 CONSTRUCTOR_RESULTS — shape: (12625, 5)


,constructorResultsId,raceId,constructorId,points,status
0,1,18,1,14.0,\N
1,2,18,2,8.0,\N
2,3,18,3,9.0,\N



📄 CONSTRUCTOR_STANDINGS — shape: (13391, 7)


,constructorStandingsId,raceId,constructorId,points,position,positionText,wins
0,1,18,1,14.0,1,1,1
1,2,18,2,8.0,3,3,0
2,3,18,3,9.0,2,2,0



📄 CONSTRUCTORS — shape: (212, 5)


,constructorId,constructorRef,name,nationality,url
0,1,mclaren,McLaren,British,http://en.wikipedia.org/wiki/McLaren
1,2,bmw_sauber,BMW Sauber,German,http://en.wikipedia.org/wiki/BMW_Sauber
2,3,williams,Williams,British,http://en.wikipedia.org/wiki/Williams_Grand_Pr...



📄 DRIVER_STANDINGS — shape: (34863, 7)


,driverStandingsId,raceId,driverId,points,position,positionText,wins
0,1,18,1,10.0,1,1,1
1,2,18,2,8.0,2,2,0
2,3,18,3,6.0,3,3,0



📄 DRIVERS — shape: (861, 9)


,driverId,driverRef,number,code,forename,surname,dob,nationality,url
0,1,hamilton,44,HAM,Lewis,Hamilton,1985-01-07,British,http://en.wikipedia.org/wiki/Lewis_Hamilton
1,2,heidfeld,\N,HEI,Nick,Heidfeld,1977-05-10,German,http://en.wikipedia.org/wiki/Nick_Heidfeld
2,3,rosberg,6,ROS,Nico,Rosberg,1985-06-27,German,http://en.wikipedia.org/wiki/Nico_Rosberg



📄 LAP_TIMES — shape: (589081, 6)


,raceId,driverId,lap,position,time,milliseconds
0,841,20,1,1,1:38.109,98109
1,841,20,2,1,1:33.006,93006
2,841,20,3,1,1:32.713,92713



📄 PIT_STOPS — shape: (11371, 7)


,raceId,driverId,stop,lap,time,duration,milliseconds
0,841,153,1,1,17:05:23,26.898,26898
1,841,30,1,1,17:05:52,25.021,25021
2,841,17,1,11,17:20:48,23.426,23426



📄 QUALIFYING — shape: (10494, 9)


,qualifyId,raceId,driverId,constructorId,number,position,q1,q2,q3
0,1,18,1,1,22,1,1:26.572,1:25.187,1:26.714
1,2,18,9,2,4,2,1:26.103,1:25.315,1:26.869
2,3,18,5,1,23,3,1:25.664,1:25.452,1:27.079



📄 RACES — shape: (1125, 18)


,raceId,year,round,circuitId,name,date,time,url,fp1_date,fp1_time,fp2_date,fp2_time,fp3_date,fp3_time,quali_date,quali_time,sprint_date,sprint_time
0,1,2009,1,1,Australian Grand Prix,2009-03-29,06:00:00,http://en.wikipedia.org/wiki/2009_Australian_G...,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N
1,2,2009,2,2,Malaysian Grand Prix,2009-04-05,09:00:00,http://en.wikipedia.org/wiki/2009_Malaysian_Gr...,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N
2,3,2009,3,17,Chinese Grand Prix,2009-04-19,07:00:00,http://en.wikipedia.org/wiki/2009_Chinese_Gran...,\N,\N,\N,\N,\N,\N,\N,\N,\N,\N



📄 RESULTS — shape: (26759, 18)


,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,1:34:50.616,5690616,39,2,1:27.452,218.3,1
1,2,18,2,2,3,5,2,2,2,8.0,58,+5.478,5696094,41,3,1:27.739,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,+8.163,5698779,41,5,1:28.090,216.719,1



📄 SEASONS — shape: (75, 2)


,year,url
0,2009,http://en.wikipedia.org/wiki/2009_Formula_One_...
1,2008,http://en.wikipedia.org/wiki/2008_Formula_One_...
2,2007,http://en.wikipedia.org/wiki/2007_Formula_One_...



📄 SPRINT_RESULTS — shape: (360, 16)


,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,fastestLapTime,statusId
0,1,1061,830,9,33,2,1,1,1,3,17,25:38.426,1538426,14,1:30.013,1
1,2,1061,1,131,44,1,2,2,2,2,17,+1.430,1539856,17,1:29.937,1
2,3,1061,822,131,77,3,3,3,3,1,17,+7.502,1545928,17,1:29.958,1



📄 STATUS — shape: (139, 2)


,statusId,status
0,1,Finished
1,2,Disqualified
2,3,Accident


In [7]:
# 📊 Summary of table shapes
for name, df in dfs.items():
    print(f"{name.upper()}: {df.shape}")

CIRCUITS: (77, 9)
CONSTRUCTOR_RESULTS: (12625, 5)
CONSTRUCTOR_STANDINGS: (13391, 7)
CONSTRUCTORS: (212, 5)
DRIVER_STANDINGS: (34863, 7)
DRIVERS: (861, 9)
LAP_TIMES: (589081, 6)
PIT_STOPS: (11371, 7)
QUALIFYING: (10494, 9)
RACES: (1125, 18)
RESULTS: (26759, 18)
SEASONS: (75, 2)
SPRINT_RESULTS: (360, 16)
STATUS: (139, 2)
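
The same shape information can also be collected into a single DataFrame, which makes it easier to sort or filter tables by size. This is a minimal sketch using a toy `dfs` dictionary with two tiny frames; the column names `table`, `rows`, and `columns` are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the `dfs` dictionary built earlier in the notebook.
dfs = {
    "seasons": pd.DataFrame({"year": [2009, 2008], "url": ["a", "b"]}),
    "status": pd.DataFrame(
        {"statusId": [1, 2, 3], "status": ["Finished", "Disqualified", "Accident"]}
    ),
}

# One row per table, sorted from largest to smallest by row count.
summary = (
    pd.DataFrame(
        [
            {"table": name, "rows": df.shape[0], "columns": df.shape[1]}
            for name, df in dfs.items()
        ]
    )
    .sort_values("rows", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```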


In [8]:
# 🧼 Null values per table
for name, df in dfs.items():
    nulls = df.isnull().sum()
    if nulls.sum() > 0:
        print(f"\n⚠️ {name.upper()} — Nulls:")
        print(nulls[nulls > 0])


⚠️ QUALIFYING — Nulls:
q2    22
q3    46
dtype: int64
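
Only QUALIFYING is reported here, yet the previews above show the literal string `\N` in several columns (for example `number` in DRIVERS, `status` in CONSTRUCTOR_RESULTS, and the `fp1_date` to `sprint_time` columns in RACES). This suggests the dataset uses `\N` as a missing-value marker that `pd.read_csv` does not recognize by default. A possible fix, sketched below on two rows copied from the DRIVERS preview (with some columns omitted), is to pass `na_values` when loading.

```python
import io

import pandas as pd

# Two rows taken from the drivers.csv preview above, trimmed to four columns.
sample_csv = io.StringIO(
    "driverId,driverRef,number,code\n"
    "1,hamilton,44,HAM\n"
    "2,heidfeld,\\N,HEI\n"
)

# Default parsing keeps "\N" as a plain string, so isnull() does not flag it.
raw = pd.read_csv(sample_csv)
print(raw["number"].isnull().sum())  # 0

# Declaring "\N" as a missing-value marker converts it to NaN on load.
sample_csv.seek(0)
clean = pd.read_csv(sample_csv, na_values=["\\N"])
print(clean["number"].isnull().sum())  # 1
```

Applied to the loading cell, this would look like `pd.read_csv(f, na_values=["\\N"])`. Columns that contain `\N` would then typically become numeric (with `NaN`) instead of `object`, and the null summary would likely list many more tables.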
