# Formula One Insights with Python & SQL
Since its inception in the 1950s, Formula One has represented the pinnacle of global motorsport, pushing the boundaries of racing and automotive engineering. This analysis leverages Python and SQL to uncover insights into the achievements of drivers and constructors across F1's decades-long history.

*For a more detailed exploration, please refer to the accompanying PDF document. The comments within this Jupyter notebook are provided exclusively to explain the functionality of the code.*
*The values and data in this Jupyter Notebook were last updated on the 17th of January 2025.*

# Organizing the Notebook into Multiple Parts
Because of the amount of code involved, the analysis is split across five notebooks:

- Notebook 1 retrieves data from the formula1.com website.
- Notebook 2 retrieves data from the F1DB database.
- Notebook 3 creates statistics and visualizations.
- Notebook 4 explores the question of who is the Greatest Driver of All Time.
- Notebook 5 consolidates multiple CSV files into separate Excel worksheets.

# Combining Multiple CSV Files into Separate Excel Worksheets

We consolidate the CSV files generated during the analysis into two Excel workbooks, with one worksheet per CSV file, to simplify data handling for the Tableau analysis in the next project. The *InitialDataSets* workbook holds all the originally retrieved and cleaned files, while the *ResultsDataSets* workbook holds all the files produced by our various analyses.

# Importing Python Libraries
This Jupyter notebook is designed to run on most modern Python installations. However, to ensure reproducibility, note that it was developed and tested with Python 3.12.3. The following libraries and their respective versions were used in this analysis:

- pandas 2.2.2
- openpyxl 3.1.5

In [1]:
# Import libraries
import os
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font

print("Libraries imported")

Libraries imported
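
To compare the local environment against the versions listed above, a check along the following lines may help. This is a minimal sketch using only the standard library; the `EXPECTED` dictionary and the `installed_version` helper are illustrative names, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above for this notebook
EXPECTED = {"pandas": "2.2.2", "openpyxl": "3.1.5"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report each package without stopping on a mismatch
for package, expected in EXPECTED.items():
    found = installed_version(package)
    status = "ok" if found == expected else "differs"
    print(f"{package}: expected {expected}, found {found} ({status})")
```

A differing version does not necessarily mean the notebook will fail; the check only flags where results might not be exactly reproducible.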


# Combining the *CSV* Folder

This step consolidates the originally retrieved datasets into a single Excel workbook, with one worksheet per CSV file. Each worksheet is then formatted for readability: Arial font, fitted column widths, and no borders on the header row.

In [2]:
# Create Excel directory
os.makedirs("excel", exist_ok=True)

# Folder containing the CSV files
csv_folder = "csv"

# Output Excel file
output_excel = "excel/InitialDataSets.xlsx"

# Create an Excel writer using openpyxl
with pd.ExcelWriter(output_excel, engine='openpyxl') as writer:
    # Iterate over all CSV files in the folder (sorted for a stable sheet order)
    for file in sorted(os.listdir(csv_folder)):
        if file.endswith(".csv"):
            # Read the CSV file
            file_path = os.path.join(csv_folder, file)
            df = pd.read_csv(file_path)
            
            # Convert the file name to PascalCase for the sheet name
            sheet_name = file.replace(".csv", "").replace("_", " ").title().replace(" ", "")
            
            # Write the DataFrame to the Excel sheet
            df.to_excel(writer, sheet_name=sheet_name, index=False)

# Load the workbook to apply formatting
wb = load_workbook(output_excel)

# Iterate through all sheets to apply formatting
for sheet in wb.sheetnames:
    ws = wb[sheet]
    
    # Set font to Arial for all cells
    arial_font = Font(name="Arial", size=12)
    for row in ws.iter_rows():
        for cell in row:
            cell.font = arial_font
    
    # Adjust column widths
    for col in ws.columns:
        max_length = 0
        col_letter = get_column_letter(col[0].column)  # Get the column letter
        for cell in col:
            # Measure every non-empty cell, including zeros
            if cell.value is not None:
                max_length = max(max_length, len(str(cell.value)))
        ws.column_dimensions[col_letter].width = max_length + 2  # Add extra space

    # Remove borders for the header row
    for cell in ws[1]:
        cell.border = None

# Save the workbook
wb.save(output_excel)

print(f"Formatted Excel file has been saved as {output_excel}.")

Formatted Excel file has been saved as excel/InitialDataSets.xlsx.
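
The file-name-to-sheet-name conversion used above can be sketched as a standalone helper. Excel limits worksheet names to 31 characters, so this version also truncates long names and, as an added assumption, appends a numeric suffix when a truncated name would collide with one already in use. The helper name `to_sheet_name`, the `taken` parameter, and the example file names are hypothetical.

```python
import os

MAX_SHEET_NAME = 31  # Excel's limit on worksheet name length


def to_sheet_name(file_name, taken=()):
    """Turn 'driver_standings.csv' into 'DriverStandings', capped at 31 chars.

    `taken` is a hypothetical collection of names already used in the workbook;
    a numeric suffix is appended if the capped name would collide.
    """
    stem = os.path.splitext(file_name)[0]
    # Same chain as in the notebook: underscores -> spaces -> Title Case -> joined
    name = stem.replace("_", " ").title().replace(" ", "")[:MAX_SHEET_NAME]
    candidate, n = name, 1
    while candidate in taken:
        suffix = str(n)
        candidate = name[:MAX_SHEET_NAME - len(suffix)] + suffix
        n += 1
    return candidate


print(to_sheet_name("driver_standings.csv"))  # DriverStandings
print(to_sheet_name("f1db_races.csv"))        # F1DbRaces
```

Note that `str.title()` starts a new word after every non-letter, which is why `f1db` becomes `F1Db` rather than `F1db` or `F1DB`.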


# Combining the *Results* Folder

Next, we consolidate all the datasets generated during our various analyses into a single Excel workbook, again with one worksheet per CSV file.

In [3]:
# Folder containing the CSV files
csv_folder = "results"

# Output Excel file
output_excel = "excel/ResultsDataSets.xlsx"

# Create an Excel writer using openpyxl
with pd.ExcelWriter(output_excel, engine='openpyxl') as writer:
    # Iterate over all CSV files in the folder (sorted for a stable sheet order)
    for file in sorted(os.listdir(csv_folder)):
        if file.endswith(".csv"):
            # Read the CSV file
            file_path = os.path.join(csv_folder, file)
            df = pd.read_csv(file_path)
            
            # Convert the file name to PascalCase for the sheet name
            sheet_name = file.replace(".csv", "").replace("_", " ").title().replace(" ", "")
            
            # Write the DataFrame to the Excel sheet
            df.to_excel(writer, sheet_name=sheet_name, index=False)

# Load the workbook to apply formatting
wb = load_workbook(output_excel)

# Iterate through all sheets to apply formatting
for sheet in wb.sheetnames:
    ws = wb[sheet]
    
    # Set font to Arial for all cells
    arial_font = Font(name="Arial", size=12)
    for row in ws.iter_rows():
        for cell in row:
            cell.font = arial_font
    
    # Adjust column widths
    for col in ws.columns:
        max_length = 0
        col_letter = get_column_letter(col[0].column)  # Get the column letter
        for cell in col:
            # Measure every non-empty cell, including zeros
            if cell.value is not None:
                max_length = max(max_length, len(str(cell.value)))
        ws.column_dimensions[col_letter].width = max_length + 2  # Add extra space

    # Remove borders for the header row
    for cell in ws[1]:
        cell.border = None

# Save the workbook
wb.save(output_excel)

print(f"Formatted Excel file has been saved as '{output_excel}'.")

Formatted Excel file has been saved as 'excel/ResultsDataSets.xlsx'.
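
Both formatting loops size each column to its longest value plus two characters of padding. That rule can be sketched as a small pure function, which makes it easier to test in isolation. The name `fitted_width` and the `max_width` cap are assumptions added for illustration; the notebook itself applies no upper limit.

```python
def fitted_width(values, padding=2, max_width=60):
    """Return a column width that fits the longest value, plus padding.

    `padding` mirrors the "+ 2" used in the notebook; `max_width` is a
    hypothetical cap so one very long cell does not stretch a column.
    """
    # Empty cells (None) are skipped; zeros and other falsy values still count
    longest = max((len(str(v)) for v in values if v is not None), default=0)
    return min(longest + padding, max_width)


# Example: a header plus three data cells, one of them empty
print(fitted_width(["Driver", "Lewis Hamilton", None, 0]))  # 16
```

Inside the worksheet loop, this could be used as `ws.column_dimensions[col_letter].width = fitted_width(cell.value for cell in col)`.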
