# Formula 1 drivers analysis

## Project description

F1 Analytics specializes in analyzing the performance of drivers and teams in the world of Formula 1. The goal of this project is to analyze the results of the 2008 Formula 1 World Championship season using the data contained in the formula1_data.csv file.

This file contains detailed information about drivers, manufacturers, cities, and countries of Grand Prix races, as well as the finishing order of each driver. Based on this data, various functions will be implemented to provide an in-depth analysis of points, wins, and podium finishes at both the individual and manufacturer levels.


## Dataset

The dataset formula1_data.csv (downloadable from here: https://proai-datasets.s3.eu-west-3.amazonaws.com/formula1_data.csv) contains the following columns:
1. **Driver**: Name of the driver.
2. **Team**: Name of the manufacturer for which the driver competes.
3. **Race**: City where the Grand Prix took place.
4. **Country**: Country where the Grand Prix took place.
5. **Position**: Number between 0 and 8 indicating the driver's finishing order (0 means that the driver did not finish in the top 8 and did not score any points).

## Scoring system

At the end of each Grand Prix, points are awarded to drivers based on their finishing order as follows: 
- 1st place: 10 points 
- 2nd place: 8 points 
- 3rd place: 6 points 
- 4th place: 5 points 
- 5th place: 4 points 
- 6th place: 3 points 
- 7th place: 2 points 
- 8th place: 1 point 
- 9th place or lower: 0 points
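
The scoring table above can be written as a simple lookup. This is a minimal sketch, assuming positions are encoded as in the dataset (0 for any finish outside the top 8); the names `POINTS_BY_POSITION` and `points_for` are illustrative only.

```python
# Points per finishing position, following the scoring table above.
# Position 0 encodes "outside the top 8" in the dataset, so it scores nothing.
POINTS_BY_POSITION = {0: 0, 1: 10, 2: 8, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}


def points_for(positions):
    """Total points for an iterable of finishing positions (hypothetical helper)."""
    return sum(POINTS_BY_POSITION[p] for p in positions)


# Worked example: a win, a non-scoring race and a third place.
example_total = points_for([1, 0, 3])
print(example_total)  # 10 + 0 + 6 = 16
```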

## Project objectives

The project involves the implementation of the following features:

1. Function for analyzing **individual driver performance**

    The first function receives a driver's name as input and returns a list containing three key pieces of information:
    - The total points accumulated by the driver during the championship.
    - The number of wins, i.e., how many times the driver finished first in a Grand Prix. 
    - The number of podium finishes, i.e., how many times the driver finished in the top three.

    This function is useful for analyzing individual driver performance and gives a clear overview of each driver's results throughout the season.

2. Function for creating the **final driver standings**

    The second function generates a dictionary containing the names of the drivers as keys and their total scores as values. The dictionary is then used to build the overall driver standings.
    Finally, the standings are saved in a text file (Drivers_Standings_2008.txt) with the following format:

    Drivers Standings 2008 Formula 1
    DriverName1: TotalPoints
    DriverName2: TotalPoints

3. Function for the **constructors' standings**

    The third function creates a dictionary with the names of the teams/constructors as keys and their total scores as values. Each team's score is the sum of the points obtained by the drivers who competed for that constructor.
    This function uses the data previously generated for the drivers and calculates the constructors' standings. This information is also essential for gaining a clear picture of the teams' performance throughout the year.
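
The aggregation described in point 3 (team score = sum of its drivers' points) can also be sketched with a pandas `groupby`. This is an illustrative alternative on a small invented table, not the implementation used later in this notebook; the rows in `toy` are made up for the example.

```python
import pandas as pd

# Toy results table reusing the dataset's column names (values invented).
toy = pd.DataFrame({
    "Driver": ["A", "B", "C", "A", "B", "C"],
    "Team":   ["X", "X", "Y", "X", "X", "Y"],
    "Points": [10, 8, 6, 0, 5, 10],
})

# Sum the points per team, then sort from the highest to the lowest total.
team_totals = toy.groupby("Team")["Points"].sum().sort_values(ascending=False)
constructors = {team: int(pts) for team, pts in team_totals.items()}
print(constructors)  # {'X': 23, 'Y': 16}
```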


# Analysis

## Import

In [92]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Load dataset

In [93]:
path = "https://proai-datasets.s3.eu-west-3.amazonaws.com/formula1_data.csv"
df = pd.read_csv(path)

In [94]:
df.head()

|   | Driver | Team | Race | Country | Position |
|---|---|---|---|---|---|
| 0 | Hamilton | McLaren | Melbourne | Australia | 1 |
| 1 | Massa | Ferrari | Melbourne | Australia | 0 |
| 2 | Raikkonen | Ferrari | Melbourne | Australia | 8 |
| 3 | Kubica | BMW | Melbourne | Australia | 0 |
| 4 | Alonso | Renault | Melbourne | Australia | 4 |


## Overview

In [95]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Driver    180 non-null    object
 1   Team      180 non-null    object
 2   Race      180 non-null    object
 3   Country   180 non-null    object
 4   Position  180 non-null    int64 
dtypes: int64(1), object(4)
memory usage: 7.2+ KB


There are no NaN values: every column has 180 non-null entries.
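
The same check can be made explicit with `isna`. A small sketch on a two-row frame copied from the `df.head()` output (`sample` is an illustrative name, not the full dataset):

```python
import pandas as pd

# Two rows taken from the df.head() output shown earlier.
sample = pd.DataFrame({
    "Driver": ["Hamilton", "Massa"],
    "Team": ["McLaren", "Ferrari"],
    "Race": ["Melbourne", "Melbourne"],
    "Country": ["Australia", "Australia"],
    "Position": [1, 0],
})

# Count missing values per column; all zeros means there is no NaN anywhere.
missing_per_column = sample.isna().sum()
has_missing = bool(missing_per_column.any())
print(missing_per_column.to_dict(), has_missing)
```

On the real data the same two lines would be applied to `df` instead of `sample`.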

In [96]:
df['Driver'].value_counts()

Driver
Hamilton      18
Massa         18
Raikkonen     18
Kubica        18
Alonso        18
Heidfeld      18
Kovalainen    18
Vettel        18
Trulli        18
Glock         18
Name: count, dtype: int64

In [97]:
df['Race'].value_counts()

Race
Melbourne      10
Sepang         10
Sakhir         10
Barcellona     10
Istanbul       10
Montecarlo     10
Montreal       10
Magny-Cours    10
Silverstone    10
Hockenheim     10
Hungaroring    10
Valencia       10
Spa            10
Monza          10
Singapore      10
Fuji           10
Shangai        10
Interlagos     10
Name: count, dtype: int64

In [98]:
df['Team'].unique()

array(['McLaren', 'Ferrari', 'BMW', 'Renault', 'Toro Rosso', 'Toyota'],
      dtype=object)

# Analysis with only functions

For this analysis I'll use a copy of the original df

In [99]:
df1=df.copy()

In [100]:
position_to_points = {0:0, 1:10, 2:8, 3:6, 4:5, 5:4, 6:3, 7:2, 8:1}
df1['Points'] = df1['Position'].map(position_to_points)

In [101]:
def competitor_info(competitor:str, name:str)->pd.DataFrame:
    '''
    Return the rows of df1 related to the chosen competitor
    :param competitor: 'Driver' or 'Team'
    :param name: name of the single competitor
    '''
    if competitor not in ['Driver', 'Team']:
        print("Input error: insert 'Driver' or 'Team'")
        return None
    else: 
        df_comp = df1[df1[competitor]==name]
        return df_comp

In [102]:
def get_wins(data_series):
    '''
    Return the number of wins in a given dataset of results
    '''
    df_first_place = data_series[data_series['Position']==1]
    wins = df_first_place['Position'].count()
    
    return wins

In [103]:
def get_podiums(data_series):
    '''
    Return the number of podiums in a given dataset of results
    '''
    df_podiums = data_series[(data_series['Position']>0) & (data_series['Position']<4)]
    podiums = df_podiums['Position'].count()
    
    return podiums

In [104]:
def get_performance(competitor:str, name:str, print_info=True):
    '''
    Return a list summarizing the competitor's performance over the championship: 
    total points, number of wins, number of podiums
    '''
    if competitor not in ['Driver', 'Team']:
        print("Input error: insert 'Driver' or 'Team'")
        return None
    else: 
        df_comp = competitor_info(competitor,name)
        
        points_champ = sum(df_comp['Points'])
        wins = get_wins(df_comp)
        podiums = get_podiums(df_comp)
        if print_info:
            print(f"The {competitor} {name} made {points_champ} points in the 2008 championship with {wins} wins and {podiums} podiums")
        
        return [points_champ, wins, podiums]

In [105]:
def get_competitor_list(competitor:str, data:pd.DataFrame)->list:
    '''
    Return a list with the names of the competitors in the input dataset
    :param competitor: 'Driver' or 'Team'
    '''
    if competitor not in ['Driver', 'Team']:
        print("Input error: insert 'Driver' or 'Team'")
        return None
    else: 
        names = data[competitor].unique().tolist()
        return names

In [106]:
def final_standings(competitor:str)->dict:
    '''
    Return the dict {competitor_name: points} sorted by points in descending order
    and write the standings to a txt file
    :param competitor: 'Driver' or 'Team'
    '''
    if competitor not in ['Driver', 'Team']:
        print("Input error: insert 'Driver' or 'Team'")
        return None
    else: 
        standings_dict={}
        comp_names = get_competitor_list(competitor, df1)
        for name in comp_names:
            points = get_performance(competitor, name, print_info=False)[0]
            standings_dict[name]=points
        
        # sort the dict by points, highest first
        standings_dict = dict(sorted(standings_dict.items(), key=lambda item: item[1], reverse=True))

        # write the standings to a txt file; the plural form ("Drivers", "Teams")
        # matches the file name and title required in the project description
        with open(f"{competitor}s_Standings_2008.txt", "w") as f:
            f.write(f"{competitor}s Standings 2008 Formula 1\n")
            for d, p in standings_dict.items():
                f.write(f"{d}: {p}\n")
        
        return standings_dict

## 1. Individual driver performance

In [110]:
driver = 'Hamilton'
hamilton_info = get_performance('Driver', driver)
hamilton_info

The Driver Hamilton made 98 points in the 2008 championship with 5 wins and 10 podiums


[98, np.int64(5), np.int64(10)]
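
The `np.int64(...)` entries in the returned list appear to come from `Series.count()`, which yields a NumPy integer rather than a plain `int`. If plain integers are preferred, the counts can be wrapped in `int()`. A minimal sketch on invented positions:

```python
import pandas as pd

# Invented finishing positions for a single driver.
positions = pd.Series([1, 3, 0, 1])

# count() on the filtered Series typically returns a NumPy integer.
wins_raw = positions[positions == 1].count()

# Converting explicitly gives a plain Python int.
wins = int(wins_raw)
print(type(wins).__name__, wins)  # int 2
```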

## 2. Final standings

In [108]:
d = final_standings('Driver')
d

{'Hamilton': 98,
 'Massa': 97,
 'Raikkonen': 75,
 'Kubica': 75,
 'Alonso': 61,
 'Heidfeld': 60,
 'Kovalainen': 53,
 'Vettel': 35,
 'Trulli': 31,
 'Glock': 25}

## 3. Team standings

In [109]:
t = final_standings('Team')
t

{'Ferrari': 172,
 'McLaren': 151,
 'BMW': 135,
 'Renault': 61,
 'Toyota': 56,
 'Toro Rosso': 35}
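
As a sanity check, each team total should equal the sum of its drivers' totals. The sketch below re-enters by hand the two standings printed in sections 2 and 3. The driver-to-team pairing in `driver_team` is an assumption based on the 2008 line-ups: the notebook output shown so far only confirms the teams of Hamilton, Massa, Raikkonen, Kubica and Alonso.

```python
from collections import defaultdict

# Driver totals as printed in section 2.
driver_points = {"Hamilton": 98, "Massa": 97, "Raikkonen": 75, "Kubica": 75,
                 "Alonso": 61, "Heidfeld": 60, "Kovalainen": 53, "Vettel": 35,
                 "Trulli": 31, "Glock": 25}

# Team totals as printed in section 3.
team_points = {"Ferrari": 172, "McLaren": 151, "BMW": 135,
               "Renault": 61, "Toyota": 56, "Toro Rosso": 35}

# Assumed driver -> team pairing (2008 line-ups).
driver_team = {"Hamilton": "McLaren", "Kovalainen": "McLaren",
               "Massa": "Ferrari", "Raikkonen": "Ferrari",
               "Kubica": "BMW", "Heidfeld": "BMW",
               "Alonso": "Renault", "Vettel": "Toro Rosso",
               "Trulli": "Toyota", "Glock": "Toyota"}

# Re-aggregate the driver totals by team.
recomputed = defaultdict(int)
for drv, pts in driver_points.items():
    recomputed[driver_team[drv]] += pts

consistent = dict(recomputed) == team_points
print(consistent)  # True
```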

# Analysis with Classes

In [22]:
class F1_champ:
    '''
    Class that models the F1 championship
    '''
    def __init__(self, year:int=None):
        self.year = year      
        self.drivers = {}         # dict {name: Driver} of the drivers in that championship
        self.teams = {}           # dict {name: Team} of the teams in that championship
        self.races = {}           # dict {name: Race} of the races in that championship
        self.competitors = {'Drivers':self.drivers, 'Teams':self.teams}
        
    def __repr__(self) -> str:
        return f"{self.year} Formula 1 championship"
    
    def get_standings_dict(self, stand_obj:str)->dict:
        '''
        Return the sorted dictionary (standings) of the objects in the chosen collection
        Sorting is based on total points, highest first
        :param stand_obj: 'Drivers' or 'Teams', i.e. one of the keys of self.competitors
        '''
        obj_dict = self.competitors[stand_obj]
        dict_with_points = {obj.name: obj.total_points() for obj in obj_dict.values()}
        standings = dict(sorted(dict_with_points.items(), 
                key=lambda item: item[1], reverse=True))

        return standings
    
    def final_standings(self, competitor:str = 'Drivers', save:bool=True)->dict:
        '''
        Return the standings by drivers or teams
        Validate the input value
        :param competitor: 'Drivers' or 'Teams', the category you want to rank
        :param save: if True (default), save the standings in a txt file
        '''
        if competitor in ['Drivers', 'Teams']:
            standings = self.get_standings_dict(competitor)
        else:
            print("Input error: insert 'Drivers' or 'Teams'")
            return None
        
        if save:
            title = f"{competitor} Standings {self.year} Formula 1"
            create_file_txt(standings, title)
            
        return standings

In [23]:
class Competitor:
    '''
    Model of the F1 championship competitors (drivers and teams) 
    with the same attributes (name, results) and methods 
    '''
    
    def __init__(self, name:str):
        self.name = name
        self.results = []
    
    def get_podiums(self) -> int:
        '''
        Return the number of podiums (final position between 1 and 3) over the championship
        '''
        podiums = len([r for r in self.results if 1 <= r.position <= 3])
        return podiums

    def get_wins(self) -> int:
        '''
        Return the number of wins (final position 1) over the championship
        '''
        wins = len([r for r in self.results if r.position == 1])
        return wins

    def total_points(self) -> int:
        '''
        Return the total points of a competitor over the championship
        '''
        points = sum(r.points for r in self.results)
        return points
        
    def get_performance(self) -> list:
        '''
        Return a list of the performance of a competitor: 
        total points, number of wins, number of podiums
        '''
        points = self.total_points()
        wins = self.get_wins()
        podiums = self.get_podiums()

        return [points, wins, podiums]

In [24]:
class Team(Competitor):
    '''
    Model an F1 team
    attributes: name, set of driver names in the team, list of results 
    '''
    def __init__(self, name:str):
        super().__init__(name)      # sets self.name and self.results
        self.drivers = set()
        
    def add_driver(self, driver:str) -> None:
        self.drivers.add(driver)
        
    def __repr__(self) -> str:
        return self.name
            

In [25]:
class Driver(Competitor):
    '''
    Model an F1 driver
    attributes: name, team the driver belongs to, list of results 
    '''
    
    def __init__(self, name:str, team:Team):
        super().__init__(name)
        self.team = team
        
    def __repr__(self) -> str:
        return self.name
        
    

In [26]:
class Race():
    '''
    Model an F1 race with attributes name, country, results
    '''
    
    def __init__(self, name:str, country:str):
         
        self.name = name
        self.country = country
        self.results = []
        
    def __repr__(self) -> str:
        return self.name
    
    def get_standings(self, save:bool=False)->dict:
        '''
        Return the dictionary {driver: position} for a race, ordered by points (highest first)
        :param save: default False; if True, save the standings of the race in a txt file
        '''
        standings_dict = {result.driver:result.position 
                          for result in sorted(self.results, reverse=True)}
        if save:
            # the title is also used as file name, so it must not contain ':' or a newline
            title = f"{self.name} Standings"
            create_file_txt(standings_dict,title)
            
        return standings_dict
        

In [27]:
class Result:
    '''
    Model the single result 
    attributes: driver, team, race, position
    '''
    
    def __init__(self, driver:Driver, team:Team, race:Race, position:int):
        
        points_dict = {0:0, 1:10, 2:8, 3:6, 4:5, 5:4, 6:3, 7:2, 8:1}
        
        self.driver = driver
        self.race = race
        self.position = position
        self.points = points_dict[position]   # map the attribute points from the position
        
        driver.results.append(self)
        team.results.append(self)
        race.results.append(self)
        
    def __repr__(self) -> str:
        return f"Result -> Race: {self.race}, Driver: {self.driver}, Pos: {self.position}, Points: {self.points}"
    
    
    def __lt__(self, other):
        '''
        Allow results to be sorted by points
        '''
        return self.points < other.points
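
Defining `__lt__` is enough for `sorted` to order `Result` objects, which is what `Race.get_standings` relies on when it calls `sorted(self.results, reverse=True)`. A minimal stand-alone sketch of the same idea; the class `Scored` is a hypothetical stand-in for `Result`:

```python
class Scored:
    """Illustrative stand-in for Result: only the points matter for ordering."""

    def __init__(self, label: str, points: int):
        self.label = label
        self.points = points

    def __lt__(self, other: "Scored") -> bool:
        # sorted() only needs "less than" to compare two objects
        return self.points < other.points


items = [Scored("a", 5), Scored("b", 10), Scored("c", 0)]

# reverse=True puts the highest score first
ordered_labels = [s.label for s in sorted(items, reverse=True)]
print(ordered_labels)  # ['b', 'a', 'c']
```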
    
    

In [28]:
def load_data(dataset:pd.DataFrame, year:int)->F1_champ:
    '''
    Create a championship from the input data
    Return an F1_champ object
    :param dataset: DataFrame with exactly the columns Driver, Team, Race, Country, Position (in this order)
    :param year: year of the championship
    '''

    champ=F1_champ(year)

    for i, row in dataset.iterrows():
        
        # positional unpacking: each row must hold exactly these five values
        driver_name, team_name, race_name, country, position = [str(el).strip() for el in row]
        position = int(position)
        if team_name not in champ.teams:
            champ.teams[team_name] = Team(team_name)
        team_obj = champ.teams[team_name]
        
        if driver_name not in champ.drivers:
            champ.drivers[driver_name] = Driver(driver_name, team_obj)
            team_obj.add_driver(driver_name)
        driver_obj = champ.drivers[driver_name]
        
        
        if race_name not in champ.races:
            champ.races[race_name] = Race(race_name, country)
        race_obj = champ.races[race_name]
        
        result = Result(
            driver=driver_obj, 
            team=team_obj, 
            race=race_obj,  
            position=position)

        
    return champ


In [29]:
def create_file_txt(data:dict, title:str)->None:
    '''
    Create a txt file named after the title
    :param data: dictionary to save to the file
    :param title: string used as file name and as first row of the file
    '''
    with open(f"{title}.txt", "w") as f:
        f.write(f"{title}\n")
        for k,v in data.items():
            f.write(f"{k}: {v}\n")

### Test of the classes and functions

In [30]:
# main


champ = load_data(df, 2008)
print(f"Loaded {champ} data")


# info on the championship
print(f"Register of drivers in {champ}: \n {[d for _,d in champ.drivers.items()]}\n")
print(f"Register of teams in {champ}: \n {[t for _,t in champ.teams.items()]}\n")
print(f"Register of races in {champ}: \n {[r for _,r in champ.races.items()]}")

# final standings by driver and team
print(f"\n")
print(f"Final Standings for drivers of {champ}:")
for n,p in champ.final_standings('Drivers').items():
    print(f"\t{n}:{p}")
print(f"Final Standings for teams of {champ}:")
for n,p in champ.final_standings('Teams').items():
    print(f"\t{n}:{p}")


# info about driver
print(f"\n")
driver = champ.drivers['Alonso']
perf = driver.get_performance()
print(f"The driver {driver.name} runs for {driver.team}")
print(f"His results are:\n")
for r in driver.results:
    print(f"\t{r}")
print(f"For a total of:\n {perf[0]} points\n {perf[1]} wins\n {perf[2]} podiums")

# info about team
print(f"\n")
team = champ.teams['BMW']
perf = team.get_performance()
print(f"The team {team.name} has as drivers: {[d for d in team.drivers]} ")
print(f"Its results are:\n")
for r in team.results:
    print(f"\t{r}")
print(f"For a total of:\n {perf[0]} points\n {perf[1]} wins\n {perf[2]} podiums")

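ValueError: too many values to unpack (expected 5)

The call to `load_data` fails because `df` holds six columns in this session (the `Points` column is visible in the `df.head()` output below), while the function unpacks exactly five values per row.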

In [31]:
df.head()

|   | Driver | Team | Race | Country | Position | Points |
|---|---|---|---|---|---|---|
| 0 | Hamilton | McLaren | Melbourne | Australia | 1 | 10 |
| 1 | Massa | Ferrari | Melbourne | Australia | 0 | 0 |
| 2 | Raikkonen | Ferrari | Melbourne | Australia | 8 | 1 |
| 3 | Kubica | BMW | Melbourne | Australia | 0 | 0 |
| 4 | Alonso | Renault | Melbourne | Australia | 4 | 5 |
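
Since `df` carries a sixth column (`Points`) here, one possible way to keep a five-value unpacking working is to select the expected columns by name before iterating. This is a hedged sketch, not the notebook's code: `EXPECTED_COLUMNS` and `iter_rows` are hypothetical names, and `frame` reuses two rows from the output above.

```python
import pandas as pd

# Columns that the row unpacking relies on, in the expected order.
EXPECTED_COLUMNS = ["Driver", "Team", "Race", "Country", "Position"]


def iter_rows(dataset: pd.DataFrame):
    """Yield (driver, team, race, country, position) tuples, ignoring extra columns."""
    subset = dataset[EXPECTED_COLUMNS]
    for driver, team, race, country, position in subset.itertuples(index=False, name=None):
        yield (str(driver).strip(), str(team).strip(), str(race).strip(),
               str(country).strip(), int(position))


# Two rows from the output above, including the extra Points column.
frame = pd.DataFrame({
    "Driver": ["Hamilton", "Massa"],
    "Team": ["McLaren", "Ferrari"],
    "Race": ["Melbourne", "Melbourne"],
    "Country": ["Australia", "Australia"],
    "Position": [1, 0],
    "Points": [10, 0],
})

rows = list(iter_rows(frame))
print(rows[0])  # ('Hamilton', 'McLaren', 'Melbourne', 'Australia', 1)
```

Selecting by name also makes the loop independent of the column order in the CSV.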


# Conclusions

The first approach works well for a quick, notebook-based analysis whose only goal is to extract the information requested in the project description. Since only the 2008 championship data is available, the dataset is not passed to the functions as an input but is used directly inside them as a global variable.
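
If the function-based approach had to be reused on other data, the dependency on a global DataFrame could be removed by passing the data in explicitly. A minimal sketch, assuming a frame with `Driver`, `Position` and `Points` columns; `driver_summary` is a hypothetical name and the rows are invented:

```python
import pandas as pd


def driver_summary(data: pd.DataFrame, driver: str) -> list:
    """Return [total points, wins, podiums] for one driver from the given frame."""
    rows = data[data["Driver"] == driver]
    points = int(rows["Points"].sum())
    wins = int((rows["Position"] == 1).sum())
    # between() is inclusive on both ends by default: positions 1, 2 and 3
    podiums = int(rows["Position"].between(1, 3).sum())
    return [points, wins, podiums]


# Invented results for two drivers.
results = pd.DataFrame({
    "Driver": ["A", "A", "A", "B"],
    "Position": [1, 3, 0, 2],
    "Points": [10, 6, 0, 8],
})

summary_a = driver_summary(results, "A")
print(summary_a)  # [16, 1, 2]
```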

More generally, the second approach models the whole F1 championship with classes and related functions, and can handle different championships from different years. With this approach the code can be moved into .py files and run on externally loaded data.