A data analysis project examining 6,000 vehicle safety complaints from the National Highway Traffic Safety Administration (NHTSA) to identify field failure patterns, high-risk components, and manufacturer safety trends.
This project simulates the analysis a field reliability data team performs: identifying which vehicle components fail most frequently, at what mileage, and under what conditions. The dataset mirrors the structure of the real NHTSA complaint data available at nhtsa.gov.
Key questions answered:
- Which vehicle components generate the most field complaints?
- Which manufacturers have the highest crash-related failure rates?
- How have complaint volumes trended year-over-year (2018–2023)?
- What is the average mileage at first failure per component?
- Which model + component combinations carry the highest crash risk?
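As a rough illustration of how questions like these can be expressed in SQL, the sketch below runs three queries against an in-memory SQLite table using Python's `sqlite3` module. The table name `complaints`, its columns (`year`, `manufacturer`, `component`, `crash`), the toy rows, and the minimum-volume threshold in the `HAVING` clause are all assumptions made for the example, not the project's actual schema or data. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema and toy rows; the real column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints "
    "(year INTEGER, manufacturer TEXT, component TEXT, crash INTEGER)"
)
rows = [
    (2021, "MakerA", "Electrical System", 0),
    (2021, "MakerA", "Brakes", 1),
    (2021, "MakerB", "Electrical System", 0),
    (2022, "MakerA", "Electrical System", 1),
    (2022, "MakerB", "Airbag", 1),
    (2022, "MakerB", "Electrical System", 0),
    (2022, "MakerB", "Brakes", 0),
]
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?, ?)", rows)

# Components ranked by complaint count (CTE + RANK window function).
component_rank = conn.execute("""
    WITH counts AS (
        SELECT component, COUNT(*) AS n
        FROM complaints
        GROUP BY component
    )
    SELECT component, n, RANK() OVER (ORDER BY n DESC) AS rnk
    FROM counts
    ORDER BY rnk, component
""").fetchall()

# Year-over-year change in complaint volume (LAG window function).
yearly_trend = conn.execute("""
    WITH yearly AS (
        SELECT year, COUNT(*) AS n
        FROM complaints
        GROUP BY year
    )
    SELECT year, n, n - LAG(n) OVER (ORDER BY year) AS change
    FROM yearly
    ORDER BY year
""").fetchall()

# Crash rate per manufacturer, keeping only manufacturers with
# at least 3 complaints (an arbitrary threshold for this toy data).
crash_rates = conn.execute("""
    SELECT manufacturer, ROUND(AVG(crash), 3) AS crash_rate, COUNT(*) AS n
    FROM complaints
    GROUP BY manufacturer
    HAVING COUNT(*) >= 3
    ORDER BY crash_rate DESC
""").fetchall()

print(component_rank)
print(yearly_trend)
print(crash_rates)
```

The first year in `yearly_trend` has no prior year to compare against, so `LAG` yields `NULL` (Python `None`) for its change value.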
| Finding | Detail |
|---|---|
| Top failure component | Electrical System (18% of all complaints) |
| Highest crash-rate components | Airbag and Brakes |
| Earliest average failure | Battery system (~48k miles on average) |
| Peak complaint period | 2021–2022 |
| Highest-volume states | CA, TX, FL (top 3 combined ≈ 30% of all complaints) |
vehicle-recall-analysis/
├── README.md
├── data/
│ ├── nhtsa_complaints_raw.csv ← original dataset (6,000 records)
│ ├── nhtsa_complaints_clean.csv ← cleaned & validated dataset
│ ├── q1_components.csv ← component complaint counts
│ ├── q2_manufacturers.csv ← manufacturer crash rates
│ ├── q3_yearly_trend.csv ← year-over-year trends
│ ├── q4_mileage.csv ← mileage at failure by component
│ ├── q5_high_risk.csv ← high-risk model/component combos
│ └── q6_states.csv ← state-level complaint volume
├── notebooks/
│ ├── 01_data_cleaning.ipynb ← data loading, QA, validation
│ ├── 02_sql_analysis.ipynb ← SQL queries (CTEs, window functions)
│ └── 03_visualizations.ipynb ← Matplotlib charts + findings
└── charts/
  ├── chart1_top_components.png
  ├── chart2_yearly_trend.png
  ├── chart3_mileage_at_failure.png
  ├── chart4_manufacturer_crash_rate.png
  └── chart5_top_states.png
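The `q1`–`q6` files in the tree above are summary tables derived from the cleaned dataset. As a minimal sketch of how a per-component count table might be produced, the example below counts components from a small in-memory stand-in for the cleaned CSV and writes the result as CSV text. The column names `complaint_id`, `component`, and `complaint_count` are assumptions, and the project's notebooks may well do this with pandas or SQL instead.

```python
import csv
import io
from collections import Counter

# Stand-in for data/nhtsa_complaints_clean.csv; column names are assumed.
clean_csv = io.StringIO(
    "complaint_id,component\n"
    "1,Electrical System\n"
    "2,Brakes\n"
    "3,Electrical System\n"
    "4,Airbag\n"
)

# Count complaints per component.
counts = Counter(row["component"] for row in csv.DictReader(clean_csv))

# Write a summary table, most frequent component first.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["component", "complaint_count"])
writer.writerows(counts.most_common())
summary = out.getvalue()

print(summary)
```

Swapping the `io.StringIO` objects for files opened with `open(..., newline="")` would read from and write to disk instead.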
| Tool | Usage |
|---|---|
| Python (Pandas) | Data loading, cleaning, transformation, validation |
| SQL (SQLite) | CTEs, window functions (RANK, LAG), GROUP BY, HAVING |
| Matplotlib | Bar charts, dual-axis plots, horizontal bar charts |
| Data QA | Null checks, outlier flagging, logic validation |
| Jupyter Notebooks | Reproducible, documented analysis workflow |
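The three Data QA checks named in the table (null checks, outlier flagging, logic validation) could look roughly like the sketch below. It uses plain Python so it runs without extra packages; the field names, the sample records, the 500,000-mile plausibility threshold, and the one-year allowance for early model-year sales are all illustrative assumptions rather than the project's actual rules.

```python
# Hypothetical records and field names; the real dataset's columns may differ.
records = [
    {"id": 1, "component": "Brakes", "mileage": 42000,
     "model_year": 2019, "complaint_year": 2021},
    {"id": 2, "component": None, "mileage": 51000,
     "model_year": 2020, "complaint_year": 2022},
    {"id": 3, "component": "Airbag", "mileage": 950000,
     "model_year": 2018, "complaint_year": 2020},
    {"id": 4, "component": "Electrical System", "mileage": 30000,
     "model_year": 2023, "complaint_year": 2019},
]

MAX_PLAUSIBLE_MILEAGE = 500_000  # assumed threshold, for illustration only


def qa_flags(record):
    """Return a list of QA issue labels for one complaint record."""
    flags = []

    # Null check: every field must be present.
    if any(value is None for value in record.values()):
        flags.append("missing_value")

    # Outlier flag: negative or implausibly high mileage.
    mileage = record["mileage"]
    if mileage is not None and not (0 <= mileage <= MAX_PLAUSIBLE_MILEAGE):
        flags.append("mileage_outlier")

    # Logic validation: a complaint should not predate the vehicle.
    # One year of slack is allowed because model years can go on sale early.
    model_year = record["model_year"]
    complaint_year = record["complaint_year"]
    if model_year is not None and complaint_year is not None:
        if complaint_year < model_year - 1:
            flags.append("complaint_before_model_year")

    return flags


report = {record["id"]: qa_flags(record) for record in records}
print(report)
```

Returning labels instead of dropping rows keeps the decision about what to do with a flagged record (fix, exclude, or keep) separate from detecting it.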
- Clone this repository
- Open any notebook in Jupyter Lab or Google Colab
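- Run cells top to bottom. The notebooks rely on pandas and Matplotlib, which are not part of the Python standard library: both come preinstalled on Google Colab, but a local Jupyter environment needs them installed. SQLite support ships with Python through the `sqlite3` module.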
git clone https://github.com/YOUR_USERNAME/vehicle-recall-analysis
cd vehicle-recall-analysis
jupyter notebook notebooks/01_data_cleaning.ipynb

Dataset structure mirrors NHTSA vehicle safety complaints:
https://www.nhtsa.gov/vehicle-safety/complaints
Built as part of a data analytics portfolio focused on field reliability and vehicle quality analysis.



