# Formula 1 Driver Greatness Index (DGI) Notebook
This notebook calculates a composite metric, **Driver Greatness Index (DGI)**, to identify the all-time best Formula 1 drivers. The DGI combines multiple performance metrics such as podium finishes, pole positions, wins from non-pole positions, and dominance over teammates into a single score.

We aim to go beyond simple statistics like total wins or championships to capture a more nuanced picture of driver greatness.

Step 1: Define the Scoring Components

Here’s a breakdown of the key factors (their weights are set in Step 3):

1. Teammate Dominance
	•	Metric: Percentage of races finished ahead of their teammate.
	•	Why: F1 teammates drive the same car, making this a direct performance comparison.
	•	Formula: (Number of times driver finishes ahead of teammate) / (Races with teammate) * Weight

2. Podium Percentage
	•	Metric: Percentage of races with podium finishes.
	•	Why: Consistency and top-tier performance.
	•	Formula: (Number of podiums) / (Total races) * Weight

3. Pole Positions
	•	Metric: Total number of pole positions.
	•	Why: Reflects raw speed and qualifying mastery.
	•	Formula: (Pole positions) * Weight

4. Wins from Non-Pole Positions
	•	Metric: Weighted wins based on starting position.
	•	Why: Highlights drivers who overcome grid disadvantages.
	•	Formula: Sum of points based on starting position: each win from grid position P > 1 scores P points (e.g., a win from P2 = +2, from P5 = +5).

5. Championship Wins
	•	Metric: Total number of championships.
	•	Why: Ultimate measure of success in F1.
	•	Formula: (Championship wins) * Weight

6. Longevity and Versatility
	•	Metric: Career length and number of constructors driven for.
	•	Why: Shows adaptability and sustained excellence.
	•	Formula: (Number of seasons) * Weight + (Number of constructors driven for) * Weight, with this category's weight split evenly between the two terms
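A minimal sketch of the raw component formulas for a single driver. The record below is entirely made up, and its dictionary keys are hypothetical names, not columns from the database; weights are applied later, after normalization.

```python
# Hypothetical career record; every number here is invented for illustration.
toy_driver = {
    "races_with_teammate": 100,
    "ahead_of_teammate": 62,
    "total_races": 110,
    "podiums": 45,
    "poles": 20,
    "win_grid_slots": [1, 1, 2, 5, 1, 3],  # starting grid slot of each race win
    "championships": 2,
    "seasons": 9,
    "constructors": 3,
}


def raw_components(d):
    """Return the raw (unnormalized, unweighted) DGI components for one driver."""
    return {
        "teammate_dominance": d["ahead_of_teammate"] / d["races_with_teammate"] * 100,
        "podium_percentage": d["podiums"] / d["total_races"] * 100,
        "pole_positions": d["poles"],
        # A win from grid slot g > 1 is worth g points; wins from pole add nothing.
        "non_pole_points": sum(g for g in d["win_grid_slots"] if g > 1),
        "championship_wins": d["championships"],
        "career_length": d["seasons"],
        "num_constructors": d["constructors"],
    }


components = raw_components(toy_driver)
print(components["non_pole_points"])  # 10
```

Here the three wins from P2, P5 and P3 contribute 2 + 5 + 3 = 10 points, while the three wins from pole contribute nothing to this component (they still count toward podiums).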

Step 2: Data Preparation

Prepare the dataset to calculate each component. Here’s how:
	1.	Teammate Comparison:
	•	Use the results table, grouped by race and constructor, to compare teammates race by race.
	2.	Podium Percentage:
	•	Count podium finishes (P1–P3) from the results table.
	3.	Poles and Non-Pole Wins:
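	•	Use qualifying results (P1 for poles), or grid position 1 in the results table as a proxy; the two can differ when a grid penalty is applied.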
	•	Match grid position and final position to compute wins from non-pole.
	4.	Championship Wins:
	•	Derive from the driverStandings table, using each season's final standings (the table stores the standings after every round).
	5.	Longevity:
	•	Use the first and last race year per driver from the races table.
	•	Count unique constructors per driver.
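The championship count is the step most easily gotten wrong, because a standings table with one row per driver per round rewards leading the table, not winning it. A sketch on made-up data, assuming (as in the Ergast-style schema used here) that `races` carries `year` and `round` columns; the `toy_` tables and `count_titles` helper are hypothetical:

```python
import pandas as pd

# Made-up stand-ins for the races and driverStandings tables (two seasons, two rounds each).
toy_races = pd.DataFrame({
    "raceId": [1, 2, 3, 4],
    "year": [2000, 2000, 2001, 2001],
    "round": [1, 2, 1, 2],
})
toy_standings = pd.DataFrame({
    "raceId": [1, 1, 2, 2, 3, 3, 4, 4],
    "driverId": [10, 20, 10, 20, 10, 20, 10, 20],
    "position": [1, 2, 2, 1, 1, 2, 1, 2],  # championship position after that round
})


def count_titles(standings, races):
    """Count titles from the standings after each season's last round with standings data."""
    s = standings.merge(races[["raceId", "year", "round"]], on="raceId")
    last_round = s.groupby("year")["round"].transform("max")
    final = s[s["round"] == last_round]
    return final[final["position"] == 1].groupby("driverId").size()


titles = count_titles(toy_standings, toy_races)
# Counting every position == 1 row instead would credit driver 10 with 3 "titles".
naive = toy_standings[toy_standings["position"] == 1].groupby("driverId").size()
print(titles)
print(naive)
```

Each toy driver wins one title, whereas the naive count gives driver 10 three because he led after three of the four rounds.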

Step 3: Scoring System

Assign weights to each metric. For example:

| Metric | Weight |
|---|---|
| Teammate Dominance | 25% |
| Podium Percentage | 20% |
| Wins from Non-Pole Positions | 20% |
| Pole Positions | 15% |
| Championship Wins | 10% |
| Longevity and Versatility | 10% |
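As a sketch, the table maps onto a weights dictionary like the one the notebook builds later, with the 10% for longevity and versatility split evenly across its two inputs. The name `dgi_weights` is hypothetical and chosen so it does not overwrite the notebook's own `weights`.

```python
import math

# Weights from the table; longevity and versatility's 10% is split 5% / 5%.
dgi_weights = {
    "teammate_dominance": 0.25,
    "podium_percentage": 0.20,
    "non_pole_points": 0.20,
    "pole_positions": 0.15,
    "championship_wins": 0.10,
    "career_length": 0.05,
    "num_constructors": 0.05,
}

# With normalized metrics in [0, 1], weights summing to 1 keep the DGI in [0, 1] too.
assert math.isclose(sum(dgi_weights.values()), 1.0)
```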

Step 4: Calculate Metrics

Calculate the individual components of the Driver Greatness Index (DGI):
1. Teammate Dominance
	•	Compare each driver's finishing position with their teammate's in every race they share.
2. Podium Percentage
	•	Calculate the percentage of races where a driver finishes in the top 3.
3. Wins from Non-Pole Positions
	•	Count wins starting from grid positions other than P1 and weight them by starting position.
4. Pole Positions
	•	Count the number of times a driver starts from P1 on the grid.
5. Championship Wins
	•	Count the number of championships each driver has won.
6. Longevity and Versatility
	•	Calculate the number of seasons and constructors for each driver.

Step 5: Combine Metrics into DGI

Min-max normalize each metric to the range [0, 1], multiply it by its weight, and sum the weighted values.
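Written out, the combination is a weighted sum of min-max normalized metrics, which is what the Step 5 code cell computes. Here $x_{d,k}$ denotes the raw value of metric $k$ for driver $d$, the minimum and maximum run over all drivers $j$, and $w_k$ is the weight of metric $k$:

$$
\tilde{x}_{d,k} = \frac{x_{d,k} - \min_{j} x_{j,k}}{\max_{j} x_{j,k} - \min_{j} x_{j,k}},
\qquad
\mathrm{DGI}_d = \sum_{k} w_k \, \tilde{x}_{d,k},
\qquad
\sum_{k} w_k = 1
$$

Because every $\tilde{x}_{d,k}$ lies in $[0, 1]$ and the weights sum to 1, $\mathrm{DGI}_d$ also lies in $[0, 1]$.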

## Step 1: Set Up and Connect to the Database
In this step, we connect to the F1DB SQLite database, which contains historical Formula 1 data, and list the available tables. This dataset includes drivers, race results, constructors, and other essential details.

In [None]:
from sqlalchemy import create_engine, inspect
import pandas as pd

# Connect to the SQLite database
db_path = 'sqlite/f1db.sqlite'  # Update to your local path
engine = create_engine(f'sqlite:///{db_path}')

# List available tables (Engine.table_names() was removed in SQLAlchemy 2.0)
tables = inspect(engine).get_table_names()
print("Available tables:", tables)
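If `db_path` is wrong, SQLite creates a new, empty database file rather than failing, so the table list above can come back empty without an error. A small guard, sketched here with the standard-library `sqlite3` module instead of SQLAlchemy; `REQUIRED_TABLES` and `missing_tables` are hypothetical names, and the demo uses an in-memory database rather than the real file:

```python
import sqlite3

# Tables this notebook reads in Step 2.
REQUIRED_TABLES = {"drivers", "results", "races", "constructors", "driverStandings"}


def missing_tables(conn, required=REQUIRED_TABLES):
    """Return the required table names that are absent from an open SQLite connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    present = {name for (name,) in rows}
    return required - present


# Self-contained demo: an in-memory database that only has two of the tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE drivers (driverId INTEGER)")
demo.execute("CREATE TABLE races (raceId INTEGER)")
result = missing_tables(demo)
print(sorted(result))  # ['constructors', 'driverStandings', 'results']
demo.close()
```

Against the real file, `missing_tables(sqlite3.connect(db_path))` should return an empty set.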

## Step 2: Load and Explore Relevant Tables
We load the key tables needed for our analysis, such as drivers, race results, constructors, and championship standings. These tables will form the foundation for our calculations.

In [None]:
# Load tables
drivers = pd.read_sql('SELECT * FROM drivers', engine)
results = pd.read_sql('SELECT * FROM results', engine)
races = pd.read_sql('SELECT * FROM races', engine)
constructors = pd.read_sql('SELECT * FROM constructors', engine)
driver_standings = pd.read_sql('SELECT * FROM driverStandings', engine)

# Preview key data
print(drivers.head())
print(results.head())
print(races.head())
print(constructors.head())
print(driver_standings.head())

## Step 3: Merge and Prepare Data
To calculate the metrics, we merge the results table with the supporting tables:

- **Results with Races:** Adds race metadata (e.g., year, round, circuit) to each result.
- **Add Driver and Constructor Details:** Enhances the dataset with driver and team information.

In [None]:
# Merge results with races
results_races = results.merge(races, on='raceId', suffixes=('', '_race'))

# Add driver and constructor details
results_races_drivers = results_races.merge(drivers, on='driverId', suffixes=('', '_driver'))
results_full = results_races_drivers.merge(constructors, on='constructorId', suffixes=('', '_constructor'))
print(results_full.head())
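Each merge above should be many-to-one: every result row matches exactly one race, driver, and constructor. pandas can enforce this through the `validate` argument of `DataFrame.merge`, which raises `MergeError` instead of silently duplicating rows when a key repeats. A sketch on made-up tables that reuse the notebook's key names:

```python
import pandas as pd

# Tiny made-up tables that reuse the notebook's key column names.
toy_results = pd.DataFrame({"raceId": [1, 1], "driverId": [10, 20], "constructorId": [5, 5]})
toy_constructors = pd.DataFrame({"constructorId": [5], "name": ["Toy Racing"]})

# Each result row should match exactly one constructor row.
toy_merged = toy_results.merge(
    toy_constructors, on="constructorId", suffixes=("", "_constructor"), validate="many_to_one"
)

# A duplicated constructorId on the right-hand side is caught instead of silently
# doubling the result rows.
try:
    toy_results.merge(pd.concat([toy_constructors, toy_constructors]),
                      on="constructorId", validate="many_to_one")
    duplicate_caught = False
except pd.errors.MergeError:
    duplicate_caught = True

print(len(toy_merged), duplicate_caught)  # 2 True
```

In the notebook, the same check would be `results.merge(races, on='raceId', suffixes=('', '_race'), validate='many_to_one')`, and likewise for the driver and constructor merges.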

## Step 4: Calculate Metrics
This step calculates the individual components of the Driver Greatness Index (DGI):

- **Teammate Dominance:** Measures how often a driver outperforms their teammate in the same car.
- **Podium Percentage:** Reflects consistency in finishing in the top 3.
- **Wins from Non-Pole Positions:** Highlights a driver's ability to win from challenging starting positions.
- **Pole Positions:** Captures qualifying performance.
- **Championship Wins:** Counts the number of championship titles.
- **Longevity and Versatility:** Accounts for the number of seasons and teams driven for.

In [None]:
# Teammate dominance: share of races (with at least one teammate) in which the
# driver was the best-placed finisher for their constructor
team_keys = ['raceId', 'constructorId']
team_size = results_full.groupby(team_keys)['driverId'].transform('count')
teammate_performance = results_full[team_size >= 2].copy()  # skip single-car entries
teammate_performance['is_teammate_beaten'] = (
    teammate_performance.groupby(team_keys)['positionOrder']
    .rank(method='min', ascending=True) == 1
)
teammate_dominance = teammate_performance.groupby('driverId')['is_teammate_beaten'].mean() * 100

# Podium percentage: share of starts finishing P1-P3 (positionOrder <= 3)
podium_percentage = (results_full['positionOrder'] <= 3).groupby(results_full['driverId']).mean() * 100

# Wins from non-pole: each win from grid slot g > 1 is worth g points
# (the grid > 1 filter also excludes rows with grid == 0)
non_pole_wins = results_full[(results_full['positionOrder'] == 1) & (results_full['grid'] > 1)]
non_pole_points = non_pole_wins.groupby('driverId')['grid'].sum()

# Pole positions, approximated by starting from grid slot 1
pole_positions = results_full[results_full['grid'] == 1].groupby('driverId').size()

# Championship wins: driverStandings has one row per driver per round, so keep only
# the standings after the last round of each season before counting P1 positions
standings = driver_standings.merge(races[['raceId', 'year', 'round']], on='raceId')
last_round = standings.groupby('year')['round'].transform('max')
final_standings = standings[standings['round'] == last_round]
# Note: the current leader of a season still in progress is counted as well
championship_wins = final_standings[final_standings['position'] == 1].groupby('driverId').size()

# Longevity and versatility
career_span = results_full.groupby('driverId')['year'].agg(['min', 'max'])
career_length = career_span['max'] - career_span['min'] + 1
num_constructors = results_full.groupby('driverId')['constructorId'].nunique()

print(teammate_dominance.head())
print(podium_percentage.head())
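A quick way to check the teammate rule is to run it on a made-up results frame. This sketch groups by race and constructor as the cell above does, and keeps only entries that actually have a teammate, matching the "(Races with teammate)" denominator defined in Step 1. The frame, its IDs, and the `teammate_dominance_pct` helper are hypothetical.

```python
import pandas as pd

# Made-up results: in race 1, constructor 5 runs two cars and constructor 6 runs one.
toy_entries = pd.DataFrame({
    "raceId": [1, 1, 1, 2, 2],
    "constructorId": [5, 5, 6, 5, 5],
    "driverId": [10, 20, 30, 10, 20],
    "positionOrder": [2, 1, 3, 1, 4],
})


def teammate_dominance_pct(df):
    """Share (%) of races with a teammate in which the driver was the team's best finisher."""
    keys = ["raceId", "constructorId"]
    team_size = df.groupby(keys)["driverId"].transform("count")
    paired = df[team_size >= 2].copy()  # drop entries without a teammate
    paired["beat_teammate"] = paired.groupby(keys)["positionOrder"].rank(method="min") == 1
    return paired.groupby("driverId")["beat_teammate"].mean() * 100


toy_dominance = teammate_dominance_pct(toy_entries)
print(toy_dominance)
```

Drivers 10 and 20 each beat the other once, so both score 50%; driver 30 has no teammate and is left out rather than credited with an automatic 100%.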

## Step 5: Combine Metrics into the Driver Greatness Index (DGI)
In this step, we normalize all metrics and assign weights to create a single composite score, the **Driver Greatness Index (DGI)**. The weights reflect the relative importance of each metric.

In [None]:
# Combine metrics into a DataFrame
metrics = pd.DataFrame({
    'teammate_dominance': teammate_dominance,
    'podium_percentage': podium_percentage,
    'non_pole_points': non_pole_points,
    'pole_positions': pole_positions,
    'championship_wins': championship_wins,
    'career_length': career_length,
    'num_constructors': num_constructors,
}).fillna(0)

# Normalize metrics
metrics_normalized = (metrics - metrics.min()) / (metrics.max() - metrics.min())

# Assign weights and calculate DGI
weights = {
    'teammate_dominance': 0.25,
    'podium_percentage': 0.20,
    'non_pole_points': 0.20,
    'pole_positions': 0.15,
    'championship_wins': 0.10,
    'career_length': 0.05,
    'num_constructors': 0.05,
}
metrics_normalized['DGI'] = sum(metrics_normalized[col] * weight for col, weight in weights.items())

# Sort by DGI
metrics_normalized = metrics_normalized.sort_values('DGI', ascending=False)
print(metrics_normalized.head(10))
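The min-max step above divides by `max - min` for each column. If a metric were ever constant across all drivers (for example, on a small filtered subset), that column would become 0 / 0 = NaN and the NaN would propagate into every DGI. A defensive variant, sketched under the assumption that a constant metric should simply contribute 0; `min_max` and the toy frame are hypothetical:

```python
import pandas as pd


def min_max(df):
    """Scale each column to [0, 1]; a constant column maps to 0 instead of NaN."""
    col_min = df.min()
    spread = (df.max() - col_min).replace(0, 1)  # avoid 0 / 0 for constant columns
    return (df - col_min) / spread


toy_metrics = pd.DataFrame({"pole_positions": [0, 10, 20], "career_length": [5, 5, 5]})
scaled = min_max(toy_metrics)
print(scaled)
```

In the notebook this would replace the normalization line as `metrics_normalized = min_max(metrics)`.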

## Step 6: Visualize the Top 30 Drivers by DGI
Finally, we visualize the top 30 drivers by their Driver Greatness Index using a bar chart.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Prepare data for visualization
top_30_drivers = metrics_normalized.nlargest(30, 'DGI').reset_index()
top_30_drivers = top_30_drivers.merge(drivers[['driverId', 'forename', 'surname']], on='driverId')
top_30_drivers['full_name'] = top_30_drivers['forename'] + " " + top_30_drivers['surname']

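# Sort for visualization: seaborn draws the first row at the top, so sort descending
top_30_drivers = top_30_drivers.sort_values('DGI', ascending=False)

# Plot (assigning hue avoids seaborn >= 0.13's warning about palette without hue)
plt.figure(figsize=(12, 10))
sns.barplot(
    x='DGI',
    y='full_name',
    hue='full_name',
    data=top_30_drivers,
    palette='viridis',
    legend=False,
)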
plt.title('Top 30 Formula 1 Drivers by Driver Greatness Index (DGI)', fontsize=16)
plt.xlabel('Driver Greatness Index (DGI)', fontsize=12)
plt.ylabel('Driver', fontsize=12)
plt.tight_layout()
plt.show()

Next Steps
	1.	Visualize Results: Create a bar chart for the top 10 drivers by DGI.
	2.	Add Interactivity: Allow users to adjust weights dynamically using ipywidgets.
	3.	Notebook Narrative: Add Markdown to explain each metric and its significance.
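For the interactivity step, the core piece is a function that recomputes the DGI from the already-normalized metrics for any set of weights. A slider UI could then call it, for example via `ipywidgets.interact` with one `FloatSlider` per metric (that wiring is not shown). The sketch below rescales the weights to sum to 1; `reweight_dgi` and the toy data are hypothetical.

```python
import pandas as pd


def reweight_dgi(normalized, weights):
    """Recompute DGI from normalized metric columns and a {column: weight} dict.

    Weights are rescaled to sum to 1, so slider values need not add up exactly.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dgi = sum(normalized[col] * (w / total) for col, w in weights.items())
    return dgi.sort_values(ascending=False)


# Made-up normalized metrics for three hypothetical drivers.
toy_norm = pd.DataFrame(
    {"teammate_dominance": [1.0, 0.0, 0.5], "pole_positions": [0.0, 1.0, 0.5]},
    index=pd.Index(["A", "B", "C"], name="driverId"),
)
toy_ranking = reweight_dgi(toy_norm, {"teammate_dominance": 3, "pole_positions": 1})
print(toy_ranking)
```

With a 3:1 weighting, the rescaled weights are 0.75 and 0.25, giving A = 0.75, C = 0.5, B = 0.25.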