# Toyota GR Cup Data Exploration

This notebook walks through an initial exploration of Toyota GR Cup racing data (lap times and weather from the Barber track) for the hackathon project.

In [90]:
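# Force a fresh import of the package so that edits to its source are picked up
import sys

# Drop every cached toyota_gr_cup_analytics module; the next import re-executes the code.
# Names imported earlier still point at the old objects and must be re-imported.
modules_to_reload = [mod for mod in list(sys.modules) if 'toyota_gr_cup_analytics' in mod]
for mod in modules_to_reload:
    del sys.modules[mod]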

print(f"Removed {len(modules_to_reload)} modules from cache")
print("Modules will be reloaded with latest changes")

Removed 18 modules from cache
Modules will be reloaded with latest changes
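
Deleting entries from `sys.modules` works because the next `import` of an evicted name re-executes the module's source. The sketch below wraps the same idea in a small helper, `evict_modules` (a hypothetical name, not part of `toyota_gr_cup_analytics`), that matches on the package prefix rather than on any substring. It is demonstrated on the standard-library `colorsys` module so that it runs anywhere.

```python
import importlib
import sys


def evict_modules(package: str) -> int:
    """Remove `package` and its submodules from the import cache.

    Returns the number of cache entries removed. This is a hypothetical
    helper for illustration only.
    """
    doomed = [
        name for name in list(sys.modules)
        if name == package or name.startswith(package + ".")
    ]
    for name in doomed:
        del sys.modules[name]
    return len(doomed)


# Demonstration on a small standard-library module.
import colorsys

old_module = colorsys
removed = evict_modules("colorsys")

# The next import re-executes the module and yields a new module object.
new_module = importlib.import_module("colorsys")

print(f"Removed {removed} cache entries")
print(f"Fresh module object: {new_module is not old_module}")
```

Names bound before the eviction (for example a function pulled in with `from ... import ...`) still refer to the old code, so they need to be imported again after the cache is cleared.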


## Load Race Data

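Let's start with the sample race data from the Barber track. The cells below assume `lap_data` and `weather_data` are already loaded as pandas DataFrames; they clean the column names and then check what was actually read.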
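
As a rough sketch of the loading step, the helper below reads a delimited text file with `pandas.read_csv` and strips whitespace from the header in one place. The helper name `load_race_csv`, the `;` default separator, and the inline sample rows are assumptions for illustration, not the project's actual loader or file layout.

```python
import io

import pandas as pd


def load_race_csv(source, sep=";"):
    """Read a delimited race file and strip whitespace from the column names.

    `source` may be a path or a file-like object. The default separator is
    an assumption; pass the one the real files use.
    """
    df = pd.read_csv(source, sep=sep)
    df.columns = df.columns.str.strip()
    return df


# Made-up sample with padded header names, standing in for a real file.
sample = io.StringIO(
    "NUMBER; LAP_NUMBER; LAP_TIME\n"
    "2;1;1:54.168\n"
    "2;2;2:13.691\n"
)
lap_data_demo = load_race_csv(sample)

print(list(lap_data_demo.columns))
print(lap_data_demo.shape)
```

Stripping the header at load time means a lookup such as `'LAP_TIME' in df.columns` does not fail on a padded name like `' LAP_TIME'`.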

In [91]:
# PERMANENT FIX: Clean column names in both dataframes
print("=== CLEANING COLUMN NAMES ===")

# Clean lap data columns
if not lap_data.empty:
    print("Before cleaning lap data:")
    print(f"  Sample columns: {list(lap_data.columns[:3])}")
    
    lap_data.columns = lap_data.columns.str.strip()
    
    print("After cleaning lap data:")
    print(f"  Sample columns: {list(lap_data.columns[:3])}")
    print(f"  Has LAP_TIME: {'LAP_TIME' in lap_data.columns}")

# Clean weather data columns  
if not weather_data.empty:
    print("\nCleaning weather data columns...")
    weather_data.columns = weather_data.columns.str.strip()
    print(f"  Weather columns: {list(weather_data.columns[:3])}")

print("\n✅ Column names cleaned! Now the analysis should work.")

=== CLEANING COLUMN NAMES ===
Before cleaning lap data:
  Sample columns: ['NUMBER', 'DRIVER_NUMBER', 'LAP_NUMBER']
After cleaning lap data:
  Sample columns: ['NUMBER', 'DRIVER_NUMBER', 'LAP_NUMBER']
  Has LAP_TIME: True

Cleaning weather data columns...
  Weather columns: ['TIME_UTC_SECONDS', 'TIME_UTC_STR', 'AIR_TEMP']

✅ Column names cleaned! Now the analysis should work.


In [92]:
# Debug what file is actually being loaded
print("=== DEBUG FILE LOADING ===")
print(f"Lap data empty: {lap_data.empty}")
print(f"Lap data shape: {lap_data.shape}")

if not lap_data.empty:
    print(f"Actual columns in lap_data: {list(lap_data.columns)}")
    print(f"Has LAP_TIME column: {'LAP_TIME' in lap_data.columns}")
    print(f"Column names containing 'LAP': {[col for col in lap_data.columns if 'LAP' in col.upper()]}")
    print(f"Column names containing 'TIME': {[col for col in lap_data.columns if 'TIME' in col.upper()]}")
else:
    print("Lap data is empty - file loading failed!")

# Also check weather data
print(f"\nWeather data empty: {weather_data.empty}")
print(f"Weather data shape: {weather_data.shape}")

if not weather_data.empty:
    print(f"Weather columns: {list(weather_data.columns)}")
else:
    print("Weather data is empty - file loading failed!")

=== DEBUG FILE LOADING ===
Lap data empty: False
Lap data shape: (579, 39)
Actual columns in lap_data: ['NUMBER', 'DRIVER_NUMBER', 'LAP_NUMBER', 'LAP_TIME', 'LAP_IMPROVEMENT', 'CROSSING_FINISH_LINE_IN_PIT', 'S1', 'S1_IMPROVEMENT', 'S2', 'S2_IMPROVEMENT', 'S3', 'S3_IMPROVEMENT', 'KPH', 'ELAPSED', 'HOUR', 'S1_LARGE', 'S2_LARGE', 'S3_LARGE', 'TOP_SPEED', 'PIT_TIME', 'CLASS', 'GROUP', 'MANUFACTURER', 'FLAG_AT_FL', 'S1_SECONDS', 'S2_SECONDS', 'S3_SECONDS', 'IM1a_time', 'IM1a_elapsed', 'IM1_time', 'IM1_elapsed', 'IM2a_time', 'IM2a_elapsed', 'IM2_time', 'IM2_elapsed', 'IM3a_time', 'IM3a_elapsed', 'FL_time', 'FL_elapsed']
Has LAP_TIME column: True
Column names containing 'LAP': ['LAP_NUMBER', 'LAP_TIME', 'LAP_IMPROVEMENT', 'ELAPSED', 'IM1a_elapsed', 'IM1_elapsed', 'IM2a_elapsed', 'IM2_elapsed', 'IM3a_elapsed', 'FL_elapsed']
Column names containing 'TIME': ['LAP_TIME', 'PIT_TIME', 'IM1a_time', 'IM1_time', 'IM2a_time', 'IM2_time', 'IM3a_time', 'FL_time']

Weather data empty: False
Weather data s

## Analyze Lap Progression

Let's analyze the lap progression to understand tire degradation and performance trends.

In [93]:
# Analyze lap progression
lap_analysis = analyze_lap_progression(lap_data)
print("Lap Progression Analysis:")
for key, value in lap_analysis.items():
    print(f"  {key}: {value}")

2025-11-09 09:30:27.989 | INFO     | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:18 - Analyzing lap progression
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:32 - Processing 579 lap times
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:38 - Processing lap time 0: '1:54.168'
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:47 - Converted '1:54.168' to 114.168 seconds
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:38 - Processing lap time 1: '2:13.691'
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:47 - Converted '2:13.691' to 133.691 seconds
2025-11-09 09:30:27.992 | DEBUG    | toyota_gr_cup_analytics.analysis.lap_analysis:analyze_lap_progression:38 - Processing lap ti
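
The log shows strings such as `'1:54.168'` being converted to `114.168` seconds. A minimal sketch of that conversion, together with a simple linear trend as one possible way to look at degradation, might look like the following. Both function names are hypothetical, and this is not the implementation inside `analyze_lap_progression`.

```python
def lap_time_to_seconds(lap_time: str) -> float:
    """Convert 'M:SS.mmm' (or 'H:MM:SS.mmm') to seconds.

    Raises ValueError for empty or non-numeric input.
    """
    seconds = 0.0
    for part in lap_time.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def pace_trend(lap_seconds):
    """Least-squares slope of lap time against lap index, in seconds per lap.

    A positive slope means the laps are getting slower.
    """
    n = len(lap_seconds)
    if n < 2:
        raise ValueError("need at least two laps to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(lap_seconds) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(lap_seconds))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


print(round(lap_time_to_seconds("1:54.168"), 3))  # 114.168, as in the log
print(round(lap_time_to_seconds("2:13.691"), 3))  # 133.691, as in the log

# Made-up stint that slows by half a second per lap.
stint = [lap_time_to_seconds(t) for t in ["1:40.0", "1:40.5", "1:41.0"]]
print(round(pace_trend(stint), 3))  # 0.5
```

A straight-line fit is only a first look: out-laps, pit laps, and laps under caution would need to be filtered out before the slope says anything about tires.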

In [94]:
# Force a fresh import of the visualization module to pick up the latest changes
import sys

# Remove visualization modules from cache
viz_modules = [mod for mod in sys.modules.keys() if 'toyota_gr_cup_analytics.visualization' in mod]
for mod in viz_modules:
    del sys.modules[mod]

# Re-import the visualization function
from toyota_gr_cup_analytics.visualization import create_lap_time_chart

print(f"Reloaded {len(viz_modules)} visualization modules")
print("Visualization functions updated with latest changes")

Reloaded 0 visualization modules
Visualization functions updated with latest changes


In [95]:
# Debug the actual data to understand the driver distribution
print("=== DATA ANALYSIS FOR VISUALIZATION ===")

if not lap_data.empty and 'DRIVER_NUMBER' in lap_data.columns:
    print(f"Total lap records: {len(lap_data)}")
    print(f"Unique drivers: {sorted(lap_data['DRIVER_NUMBER'].unique())}")
    print(f"Driver counts:")
    driver_counts = lap_data['DRIVER_NUMBER'].value_counts().sort_index()
    for driver, count in driver_counts.items():
        print(f"  Driver {driver}: {count} laps")
    
    # Show sample data for first few drivers
    print(f"\nSample data structure:")
    sample_data = lap_data[['DRIVER_NUMBER', 'LAP_NUMBER', 'LAP_TIME']].head(10)
    print(sample_data)
    
    # Check if NUMBER column might be the car number
    if 'NUMBER' in lap_data.columns:
        print(f"\nUnique car numbers (NUMBER column): {sorted(lap_data['NUMBER'].unique())}")
else:
    print("No driver data available for analysis")

=== DATA ANALYSIS FOR VISUALIZATION ===
Total lap records: 579
Unique drivers: [np.int64(1)]
Driver counts:
  Driver 1: 579 laps

Sample data structure:
   DRIVER_NUMBER  LAP_NUMBER  LAP_TIME
0              1           1  1:54.168
1              1           2  2:13.691
2              1           3  1:58.021
3              1           4  1:40.861
4              1           5  1:39.725
5              1           6  1:40.323
6              1           7  1:39.387
7              1           8  1:39.870
8              1           9  1:39.167
9              1          10  1:39.284

Unique car numbers (NUMBER column): [np.int64(2), np.int64(3), np.int64(5), np.int64(7), np.int64(13), np.int64(18), np.int64(21), np.int64(22), np.int64(31), np.int64(41), np.int64(46), np.int64(47), np.int64(51), np.int64(55), np.int64(58), np.int64(72), np.int64(80), np.int64(88), np.int64(93), np.int64(98), np.int64(111), np.int64(113)]
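
The output above shows a single value in `DRIVER_NUMBER` but 22 distinct values in `NUMBER`, which suggests that per-car traces should be grouped on `NUMBER`. A sketch of that grouping with pandas follows; the small DataFrame is made-up data, and `per_car_summary` and `to_seconds` are hypothetical helpers rather than functions from the package.

```python
import pandas as pd


def to_seconds(lap_time: str) -> float:
    """Convert 'M:SS.mmm' to seconds."""
    seconds = 0.0
    for part in lap_time.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def per_car_summary(laps: pd.DataFrame) -> pd.DataFrame:
    """Count laps and find the best lap per car, keyed on the NUMBER column."""
    seconds = laps["LAP_TIME"].map(to_seconds)
    return (
        laps.assign(LAP_SECONDS=seconds)
        .groupby("NUMBER")["LAP_SECONDS"]
        .agg(laps="count", best="min")
        .sort_values("best")
    )


# Made-up rows that mimic the structure seen above: one DRIVER_NUMBER, several cars.
demo = pd.DataFrame({
    "NUMBER": [2, 2, 3, 3],
    "DRIVER_NUMBER": [1, 1, 1, 1],
    "LAP_NUMBER": [1, 2, 1, 2],
    "LAP_TIME": ["1:40.861", "1:39.725", "1:41.000", "1:39.500"],
})
summary = per_car_summary(demo)
print(summary)
```

Grouping on `DRIVER_NUMBER` here would collapse every car into a single trace, which is worth checking if the lap time chart shows only one line.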


In [96]:
# Create interactive lap time chart
lap_chart = create_lap_time_chart(lap_data, "Barber Race 1 - Lap Times")
lap_chart.show()

# The chart now displays real lap times for each driver with:
# - Individual driver traces in different colors
# - Interactive hover information
# - Fastest lap annotation
# - Lap time progression analysis

In [97]:
# Create lap time chart
lap_chart = create_lap_time_chart(lap_data, "Barber Race 1 - Lap Times")
lap_chart.show()

# Note: if no lap data has been loaded, this shows a placeholder chart instead

2025-11-09 09:31:55.363 | INFO     | toyota_gr_cup_analytics.visualization.charts:create_lap_time_chart:21 - Creating lap time chart: Barber Race 1 - Lap Times


## Next Steps

1. **Real Data Integration**: Load actual CSV files from the dataset
2. **Model Development**: Build tire degradation and lap time prediction models
3. **Strategy Optimization**: Implement pit stop optimization algorithms
4. **Dashboard Creation**: Build interactive Dash/Streamlit dashboard

This notebook provides the foundation for exploring Toyota GR Cup racing data and building the analytics tools listed above.