## Project Proposal — Sam Sieloff

### Research Objective
For my **DATA 720 Final Project**, I will analyze **NASCAR pit stop performance** to determine how pit crew efficiency impacts overall race results.  

---


### Background

**NASCAR (National Association for Stock Car Auto Racing)** is a professional motorsport series where about 40 drivers race most weeks across a variety of track types, from short ovals to superspeedways and road courses.

Every driver has a designated **pit box**, where the pit crew performs services to the car during the race. These **pit stops** allow teams to:
- Change tires  
- Refuel the car  
- Make handling or aerodynamic adjustments  

Because NASCAR races are often decided by fractions of a second, pit stop efficiency is extremely important. A single slow stop can cost a driver several positions, while a fast stop can gain valuable track position.

Teams typically make multiple pit stops throughout each race, depending on fuel strategy, tire wear, and caution periods. As a result, pit crew performance plays a major role in the overall race outcome and championship standings.

This project will focus on analyzing pit stop data from the 2025 NASCAR season to better understand how pit crew performance affects race results. By examining patterns in stop times, positions gained or lost, and crew consistency, I aim to identify trends and relationships that can provide strategic insights into what makes a pit crew truly elite.

---


### Data Source and Quality
The dataset comes from the publicly available [**NASCAR API**](https://github.com/ooohfascinating/NascarApi/tree/main), which contains detailed race and pit stop information in JSON format.  

The primary dataset includes **26 columns** with information about:
- The race and driver IDs  
- Pit stop duration  
- Tires changed (left side, right side, or four-tire stop)  
- Positions gained or lost during the stop  

Each row represents one **pit stop** for a given driver during a specific race. Because drivers make multiple stops per race, they appear multiple times per event.  

To perform a complete analysis, I will combine the pit stop data with race results (using the `race_id` key, plus a driver or car identifier for per-driver fields) to include finishing position, laps completed, and track characteristics.  
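The join described above can be sketched with pandas. This is a minimal illustration on toy data, not the project's actual pipeline: the `race_results` table and its `finishing_position` and `track_type` columns are hypothetical names, and the durations are made up. Because finishing position is a per-driver value, the sketch assumes the join needs `driver_name` in addition to `race_id`; joining on `race_id` alone would pair every stop with every driver's result in that race.

```python
import pandas as pd

# Toy pit stop rows (one row per stop). Column names follow the dataset
# preview; the duration values are made up for illustration.
pit_stops = pd.DataFrame({
    "race_id": [5546, 5546, 5546, 5547],
    "driver_name": ["William Byron", "William Byron", "Joey Logano", "Joey Logano"],
    "total_duration": [12.4, 11.9, 13.1, 12.0],
})

# Hypothetical race results table (one row per driver per race).
# `finishing_position` and `track_type` are assumed column names.
race_results = pd.DataFrame({
    "race_id": [5546, 5546, 5547],
    "driver_name": ["William Byron", "Joey Logano", "Joey Logano"],
    "finishing_position": [1, 5, 3],
    "track_type": ["superspeedway", "superspeedway", "short oval"],
})

# Left join keeps every pit stop. validate="many_to_one" raises if the
# results table has duplicate (race_id, driver_name) rows, and
# indicator=True adds a `_merge` column showing which stops found a match.
merged = pit_stops.merge(
    race_results,
    on=["race_id", "driver_name"],
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Count stops that found no matching result row.
unmatched = int((merged["_merge"] == "left_only").sum())
print(merged)
print("Stops without a matching result:", unmatched)
```

Using a left join with `indicator=True` makes it easy to spot stops that fail to match a result row before any averages are computed.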

**Data quality:**  
The NASCAR API data is regularly updated and standardized across races. Initial exploration shows minimal missing fields and consistent keys.
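A quick check along these lines could back up the data quality claim. The sketch below uses a small toy frame rather than the real data, and it assumes that a stop is uniquely identified by `race_id`, `vehicle_number`, and `lap_count`; that key choice is an assumption, not something confirmed by the API.

```python
import pandas as pd

# Toy sample with deliberately injected problems: a missing race_id,
# two missing durations, and one repeated stop.
sample = pd.DataFrame({
    "race_id": [5546, 5546, 5546, None],
    "vehicle_number": [24, 2, 2, 22],
    "lap_count": [10, 10, 10, 10],
    "total_duration": [11.8, None, None, 12.6],
})

# Share of missing values per column, worst first.
missing_share = sample.isna().mean().sort_values(ascending=False)

# Assumed natural key for a single pit stop (hypothetical choice).
key_cols = ["race_id", "vehicle_number", "lap_count"]

# Rows that repeat an earlier row's key, and rows missing any key field.
duplicate_stops = int(sample.duplicated(subset=key_cols).sum())
rows_missing_key = int(sample[key_cols].isna().any(axis=1).sum())

print(missing_share)
print("Duplicate stops:", duplicate_stops)
print("Rows missing a key field:", rows_missing_key)
```

Running the same three checks on the full 2025 pit stop table would turn "minimal missing fields and consistent keys" into concrete numbers.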



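```python
import pandas as pd
from src.fetch_pit_data import fetch_pit_data

# Fetch the 2025 pit stop data (arguments as in the original notebook)
# and cache it locally as a CSV file.
pit_data_2025 = fetch_pit_data(2025,1)
pit_data_2025.to_csv('pit_data_2025.csv', index=False)

# Reload from the cached CSV and preview the first five rows.
pit_data = pd.read_csv('pit_data_2025.csv')
pit_data.head()
```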

| Unnamed: 0 | vehicle_number | driver_name | vehicle_manufacturer | leader_lap | lap_count | pit_in_flag_status | pit_out_flag_status | pit_in_race_time | pit_out_race_time | total_duration | ... | left_front_tire_changed | left_rear_tire_changed | right_front_tire_changed | right_rear_tire_changed | previous_lap_time | next_lap_time | pit_in_rank | pit_out_rank | positions_gained_lost | race_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24 | William Byron | Chv | 10 | 10 | 2 | 2 | 666.638 | 882.668 | 11614.947 | ... | False | False | False | False | 0 | 0 | 1 | 1 | 0 | 5546 |
| 1 | 2 | Austin Cindric | Frd | 10 | 10 | 2 | 2 | 668.834 | 885.746 | 11615.829 | ... | False | False | True | True | 0 | 0 | 2 | 2 | 0 | 5546 |
| 2 | 10 | Ty Dillon | Chv | 10 | 10 | 2 | 2 | 669.462 | 887.356 | 11616.811 | ... | False | False | False | False | 0 | 0 | 3 | 3 | 0 | 5546 |
| 3 | 19 | Chase Briscoe | Tyt | 10 | 10 | 2 | 2 | 670.433 | 889.714 | 11618.198 | ... | False | False | False | False | 0 | 0 | 4 | 4 | 0 | 5546 |
| 4 | 22 | Joey Logano | Frd | 10 | 10 | 2 | 2 | 672.424 | 903.406 | 11629.899 | ... | True | True | True | True | 0 | 0 | 6 | 5 | 1 | 5546 |


### Potential Questions to Be Answered
1. Which pit crews were the most efficient over the course of the year?  
   - Metrics: average stop time, positions gained or lost, and fastest four-tire stop.  

2. How much impact does the pit crew have on the result of a race?  
   - Is there a relationship between average pit stop time and finishing position once the pit data is joined with race results?  

3. Can pit stop data be used to predict finishing position?

4. For teams that changed pit crews mid-season, what is the impact on performance?  
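
As a rough starting point for questions 1 and 2, the metrics could be computed as below. This is a sketch on toy data under several assumptions: `vehicle_number` stands in for the pit crew (the preview shows no crew identifier), `total_duration` is treated as stop time in seconds with made-up values, a four-tire stop is inferred from the four per-tire flags, and the `results` table with `finishing_position` is hypothetical.

```python
import pandas as pd

# Toy data: one row per pit stop, two cars, two races.
stops = pd.DataFrame({
    "race_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "vehicle_number": [24, 24, 22, 22, 24, 24, 22, 22],
    "total_duration": [10.0, 12.0, 13.0, 15.0, 11.0, 11.0, 12.0, 14.0],
    "positions_gained_lost": [1, 0, -1, -2, 2, 0, 0, -1],
})

# Per-tire flags: right-side tires always changed, left-side only on some stops.
left_changed = [True, False, True, True, True, True, False, True]
stops["right_front_tire_changed"] = True
stops["right_rear_tire_changed"] = True
stops["left_front_tire_changed"] = left_changed
stops["left_rear_tire_changed"] = left_changed

tire_cols = [
    "left_front_tire_changed", "left_rear_tire_changed",
    "right_front_tire_changed", "right_rear_tire_changed",
]
# A stop counts as four-tire only if all four flags are True.
stops["is_four_tire"] = stops[tire_cols].all(axis=1)

# Question 1: per-car summary (car number used as a proxy for the crew).
summary = stops.groupby("vehicle_number").agg(
    n_stops=("total_duration", "size"),
    mean_duration=("total_duration", "mean"),
    std_duration=("total_duration", "std"),   # consistency
    net_positions=("positions_gained_lost", "sum"),
)
summary["fastest_four_tire"] = (
    stops[stops["is_four_tire"]]
    .groupby("vehicle_number")["total_duration"]
    .min()
)

# Question 2: average stop time per car per race vs. finishing position.
per_race = stops.groupby(
    ["race_id", "vehicle_number"], as_index=False
)["total_duration"].mean()

# Hypothetical results table; `finishing_position` is an assumed name.
results = pd.DataFrame({
    "race_id": [1, 1, 2, 2],
    "vehicle_number": [24, 22, 24, 22],
    "finishing_position": [2, 9, 1, 4],
})
joined = per_race.merge(
    results, on=["race_id", "vehicle_number"], validate="one_to_one"
)

# Spearman rank correlation, computed as Pearson correlation on ranks.
rho = joined["total_duration"].rank().corr(joined["finishing_position"].rank())

print(summary)
print("Rank correlation (avg stop time vs. finish):", round(rho, 3))
```

A rank correlation is used here because finishing position is ordinal. A positive value would mean slower average stops go with worse finishes, though on real data caution periods and penalties would need to be filtered out before reading much into it.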
