# NHTSA API SQL Analysis

This notebook contains descriptive and diagnostic analytics queries against the `nhtsa_recalls_summary` and `nhtsa_recalls_detail` tables.

## Setup
Load credentials from environment variables and establish a database connection.

In [1]:
import os
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

PG_user = os.getenv('PG_USER')
PG_password = os.getenv('PG_PASSWORD')
PG_host = os.getenv('PG_HOST')
PG_DB = os.getenv('PG_DB')

# Create database engine
engine = create_engine(f"postgresql://{PG_user}:{PG_password}@{PG_host}/{PG_DB}")

# Display all rows in pandas DataFrame
pd.set_option('display.max_rows', None)
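
One fragile spot in the setup above: `os.getenv` returns `None` for an unset variable, and a password containing characters such as `@` or `/` would break the connection URL. The sketch below shows one way to guard against both. The helper name `build_pg_url` and the sample credentials are hypothetical, not part of the original notebook.

```python
import os
from urllib.parse import quote_plus

REQUIRED_VARS = ("PG_USER", "PG_PASSWORD", "PG_HOST", "PG_DB")

def build_pg_url(env=os.environ):
    """Build a PostgreSQL URL, failing early if any variable is missing (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Percent-encode user and password so special characters cannot break the URL
    user = quote_plus(env["PG_USER"])
    password = quote_plus(env["PG_PASSWORD"])
    return f"postgresql://{user}:{password}@{env['PG_HOST']}/{env['PG_DB']}"

# Made-up credentials, for illustration only
example_url = build_pg_url({
    "PG_USER": "analyst",
    "PG_PASSWORD": "p@ss/word",
    "PG_HOST": "localhost",
    "PG_DB": "nhtsa",
})
```

The returned string could be passed to `create_engine` in place of the f-string used above.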


## Query 1: Descriptive Analytics Query
**Business Question:** Which makes and models have the highest number of recalls?

In [2]:
sql_query = '''
-- Descriptive Analytics Query for NHTSA Recalls (with proper quoting)
SELECT
    "Make"            AS make,
    "Model"           AS model,
    "RecallCount_2000_2025" AS recall_count
FROM sql_project."nhtsa_recalls_summary"
ORDER BY recall_count DESC;
'''
df1 = pd.read_sql(sql_query, engine)
df1

,make,model,recall_count
0,Toyota,Tundra,218
1,Toyota,Tacoma,151
2,Toyota,Sienna,148
3,Toyota,Corolla,145
4,Toyota,RAV4,130
5,Toyota,Sequoia,128
6,Toyota,4Runner,120
7,Toyota,Highlander,113
8,Toyota,Camry,103
9,Toyota,Prius,84


### Insight

Five Toyota nameplates (Tundra 218, Tacoma 151, Sienna 148, Corolla 145, RAV4 130) account for roughly 41% of all Toyota recalls.
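
The share quoted here depends on the denominator. As a rough check, the sketch below computes the top-five share over only the ten rows displayed in the output. That comes to about 59%, so the 41% figure presumably uses the full summary table, which is not shown here. The function name `top_n_share` is an assumption for illustration.

```python
# Recall counts copied from the ten rows displayed in the Query 1 output
displayed_counts = {
    "Tundra": 218, "Tacoma": 151, "Sienna": 148, "Corolla": 145, "RAV4": 130,
    "Sequoia": 128, "4Runner": 120, "Highlander": 113, "Camry": 103, "Prius": 84,
}

def top_n_share(counts, n, total=None):
    """Fraction of recalls contributed by the n largest entries (hypothetical helper).

    If `total` is omitted, the denominator is the sum of the supplied counts only.
    """
    ranked = sorted(counts.values(), reverse=True)
    denominator = total if total is not None else sum(ranked)
    return sum(ranked[:n]) / denominator

top_five_total = sum(sorted(displayed_counts.values(), reverse=True)[:5])
share_of_displayed = top_n_share(displayed_counts, 5)
print(f"Top five: {top_five_total} recalls, {share_of_displayed:.1%} of displayed rows")
```

Passing the full-table sum as `total` (for example `df1['recall_count'].sum()`) would reproduce the share over all models.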

### Recommendation

Place these models at the top of the recall-reduction backlog: build a quick dashboard that tracks their open campaigns and highlights any spike > 5 recalls/quarter.
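
The spike rule could be prototyped before building any dashboard. The sketch below is a minimal version, assuming each recall is available as a `(model, report_date)` pair. The detail table's actual date column is not shown in this notebook, so the input shape and the name `quarterly_spikes` are hypothetical.

```python
from collections import Counter
from datetime import date

SPIKE_THRESHOLD = 5  # flag quarters with more than 5 recalls, per the recommendation

def quarterly_spikes(recalls, threshold=SPIKE_THRESHOLD):
    """Return {(model, year, quarter): count} for quarters whose count exceeds the threshold."""
    counts = Counter(
        (model, d.year, (d.month - 1) // 3 + 1)  # months 1-3 -> Q1, 4-6 -> Q2, ...
        for model, d in recalls
    )
    return {key: n for key, n in counts.items() if n > threshold}

# Made-up sample: six Tundra recalls in Q1 2024, five Tacoma recalls in Q2 2024
sample = [("Tundra", date(2024, 1, day)) for day in range(1, 7)]
sample += [("Tacoma", date(2024, 5, 10))] * 5
spikes = quarterly_spikes(sample)
print(spikes)
```

Only the Tundra quarter is flagged, because the rule is strictly "greater than 5".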

### Prediction

Because these five programs account for roughly 41% of the recall count, even modest per-model reductions on them could noticeably lower Toyota’s annual recall volume.

## Query 2: Diagnostic Analytics Query
**Business Question:** For the top 5 recalled models, which components are most frequently recalled and what percentage do they represent?

In [None]:
sql_query = '''
WITH top_models AS (
  SELECT
    "Make",
    "Model",
    "RecallCount_2000_2025" AS recall_count
  FROM sql_project."nhtsa_recalls_summary"
  ORDER BY recall_count DESC
  LIMIT 5
),
component_counts AS (
  SELECT
    "Manufacturer",
    "Model",
    "Component",
    COUNT(*) AS component_count
  FROM sql_project."nhtsa_recalls_detail"
  GROUP BY "Manufacturer", "Model", "Component"
)
SELECT
  tm."Make"  AS make,
  tm."Model" AS model,
  cc."Component"          AS component,
  cc.component_count      AS count,
  cc.component_count * 100.0
        / SUM(cc.component_count) OVER (PARTITION BY tm."Make", tm."Model")
        AS pct_of_model_recalls
FROM top_models tm
JOIN component_counts cc
  ON LOWER(cc."Model") = LOWER(tm."Model")
 AND cc."Manufacturer" ILIKE '%%' || tm."Make" || '%%'   -- ILIKE is already case-insensitive; %% escapes the % wildcard for the driver
ORDER BY tm."Model", count DESC;
'''
df2 = pd.read_sql(sql_query, engine)
df2



,make,model,component,count,pct_of_model_recalls
0,Toyota,Corolla,AIR BAGS:FRONTAL:PASSENGER SIDE:INFLATOR MODULE,34,29.310345
1,Toyota,Corolla,AIR BAGS,12,10.344828
2,Toyota,Corolla,AIR BAGS: AIR BAG/RESTRAINT CONTROL MODULE,9,7.758621
3,Toyota,Corolla,EQUIPMENT:OTHER:LABELS,5,4.310345
4,Toyota,Corolla,EQUIPMENT:OTHER:LABELS,5,4.310345
5,Toyota,Corolla,VEHICLE SPEED CONTROL:ACCELERATOR PEDAL,4,3.448276
6,Toyota,Corolla,ENGINE AND ENGINE COOLING,4,3.448276
7,Toyota,Corolla,SEATS:FRONT ASSEMBLY:SEAT HEATER/COOLER,4,3.448276
8,Toyota,Corolla,VISIBILITY:POWER WINDOW DEVICES AND CONTROLS,4,3.448276
9,Toyota,Corolla,AIR BAGS:FRONTAL:SENSOR/CONTROL MODULE-INACTIVE,4,3.448276
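
Rows 3 and 4 above list the same component twice. A plausible cause (an assumption, since the detail table is not shown) is that `component_counts` groups by `"Manufacturer"` and more than one manufacturer string matches the same make. The sketch below shows one way to collapse such duplicates in pandas on a small made-up frame. It is not the original analysis.

```python
import pandas as pd

# Small made-up frame mirroring the duplicated component seen in the output
sample = pd.DataFrame({
    "make": ["Toyota"] * 3,
    "model": ["Corolla"] * 3,
    "component": ["AIR BAGS", "EQUIPMENT:OTHER:LABELS", "EQUIPMENT:OTHER:LABELS"],
    "count": [12, 5, 5],
})

# Sum counts for identical (make, model, component) combinations
merged = sample.groupby(["make", "model", "component"], as_index=False)["count"].sum()

# Recompute each component's share within its model
model_totals = merged.groupby(["make", "model"])["count"].transform("sum")
merged["pct_of_model_recalls"] = 100.0 * merged["count"] / model_totals

merged = merged.sort_values(["model", "count"], ascending=[True, False]).reset_index(drop=True)
print(merged)
```

Applied to `df2`, the same steps would merge the two `EQUIPMENT:OTHER:LABELS` rows into one row with a combined count.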


In [None]:
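# Top component per model; selecting the columns before apply() keeps the
# grouping column out of the lambda and avoids the pandas deprecation warning
df2.groupby(['model'])[['component', 'pct_of_model_recalls']].apply(
    lambda g: g.nlargest(1, 'pct_of_model_recalls')
)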



model,,component,pct_of_model_recalls
Corolla,0,AIR BAGS:FRONTAL:PASSENGER SIDE:INFLATOR MODULE,29.310345
RAV4,28,SUSPENSION:REAR,13.846154
Sienna,64,SEATS:FRONT ASSEMBLY:SEAT HEATER/COOLER,10.769231
Tacoma,99,SEATS:FRONT ASSEMBLY:SEAT HEATER/COOLER,8.633094
Tundra,139,STEERING:HYDRAULIC POWER ASSIST SYSTEM,15.306122


### Insight

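Each high-recall model has a clear leading component, though its share varies widely: passenger air-bag inflators account for 29% of Corolla recalls, hydraulic power-assist steering for 15% of Tundra, and rear suspension for 14% of RAV4, while front seat heater/cooler assemblies lead for Sienna (11%) and Tacoma (9%).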

### Recommendation

Run root-cause analyses starting with the top component for each model; prioritize supplier audits, design reviews, or process-control checks on those items first.

### Prediction

Eliminating the leading component issue for a model would cut that vehicle line's recalls by at most the component's share of them, which bounds the potential savings.
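
To put rough numbers on this, the sketch below multiplies each model's leading-component share (from the Query 2 output) by its recall count from Query 1. This mixes the summary and detail tables, which may count recalls differently, so the results are an illustrative upper bound under that assumption rather than a forecast.

```python
# Leading-component share per model, copied from the Query 2 output (percent)
top_component_share = {
    "Corolla": 29.310345,
    "RAV4": 13.846154,
    "Sienna": 10.769231,
    "Tacoma": 8.633094,
    "Tundra": 15.306122,
}

# Recall counts per model, copied from the Query 1 output
summary_counts = {"Corolla": 145, "RAV4": 130, "Sienna": 148, "Tacoma": 151, "Tundra": 218}

# Upper bound on recalls avoided if the leading component issue disappeared entirely
max_avoided = {
    model: summary_counts[model] * share / 100.0
    for model, share in top_component_share.items()
}

for model, avoided in sorted(max_avoided.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: up to {avoided:.1f} recalls ({top_component_share[model]:.1f}%)")
```

Under these assumptions, Corolla and Tundra offer the largest potential reduction, and Tacoma the smallest.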