# NJSP Fatal Crash Plots
This notebook is run as part of the daily update GitHub Action:
```bash
njsp -cc update_plots
```
It updates plots based on the latest NJSP fatal crash data (in this Git repo).

It also computes an estimate for the number of traffic deaths in the remainder of the current year (which helps make sense of otherwise-incomplete data about the current year).

In [1]:
from utz import *
import json
from utz import plots
import plotly.graph_objects as go
import plotly.express as px
from nj_crashes.paths import PLOTS_DIR, ROOT_DIR
from njsp.crashes import load
from njsp.paths import PROJECTED_CSV, PROJECTED_TOTALS_PATH, RUNDATE_PATH
from njsp.ytd import Ytd, normalized_ytd_days

[Papermill](https://papermill.readthedocs.io/) parameters:

In [2]:
show = None
ytc_fmts = 'csv'  # comma-delimited subset of {csv, pqt, db}

In [3]:
# Parameters
show = "png"


Common settings for plots created later:

In [4]:
save = partial(
    plots.save,
    bg='white',
    xgrid='#ccc',
    ygrid='#ccc',
    hoverx=True,
    show=show,
    dir=PLOTS_DIR,
    bottom_legend='all',
    title_suffix='_titled',
)
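
`partial` pre-binds the keyword arguments shared by every plot, so later `save(...)` calls pass only what differs. A minimal sketch of that pattern, using a hypothetical stand-in (`fake_save` is an assumption for illustration, not `utz.plots.save` or its signature):

```python
from functools import partial

def fake_save(fig, name, title=None, *, bg=None, show=None, png=None):
    """Hypothetical stand-in for a plot-saving helper; returns the settings it received."""
    return dict(fig=fig, name=name, title=title, bg=bg, show=show, png=png)

# Bind the settings shared by every plot once...
save_demo = partial(fake_save, bg='white', show='png')

# ...then pass only per-plot arguments; bound keywords can still be overridden per call.
first = save_demo('fig-1', 'ytd-deaths', png=(850, 800))
second = save_demo('fig-2', 'per-month', show=None)
print(first)
print(second)
```

Keywords bound by `partial` act as defaults, which is why a single call can still override one of them.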

## Load most recent NJSP fatal crash data
This table is produced by the `njsp -cc update_pqts` step that precedes this in [the daily GitHub Action](.github/workflows/daily.yml):

In [5]:
crashes = load()
crashes

| id | cc | mc | dt | tk | ti | dk | ok | pk | bk | location | street | highway |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1703 | 1 | 2 | 2008-01-01 00:35:00-05:00 | 1 | 1 | | | | | State/Interstate Authority 446 S MP 1 | | 446 |
| 1681 | 9 | 10 | 2008-01-01 04:11:00-05:00 | 1 | | | | | | Bergenline Ave S MP 0 at 6th St | Bergenline Ave | |
| 1659 | 4 | 15 | 2008-01-01 06:46:00-05:00 | 1 | 1 | | | | | State Highway 42 N MP 8.2 | | 42 |
| 1661 | 20 | 4 | 2008-01-01 12:29:00-05:00 | 1 | 1 | | | | | County 624 W MP 2.2 at Ikea Dr | | 624 |
| 1811 | 7 | 16 | 2008-01-01 18:53:00-05:00 | 1 | | | | | | County 648 E MP .87 at Franklin Ave | | 648 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13898 | 7 | 10 | 2025-05-08 00:59:00-04:00 | 1 | | 0 | 0 | 1 | 0 | County 635 | | 635 |
| 13899 | 17 | 6 | 2025-05-09 01:08:00-04:00 | 1 | | 1 | 0 | 0 | 0 | County 613 | | 613 |
| 13900 | 15 | 8 | 2025-05-09 22:24:00-04:00 | 1 | | 0 | 0 | 0 | 1 | State Highway 70 | | 70 |
| 13902 | 11 | 6 | 2025-05-12 19:08:00-04:00 | 1 | | 1 | 0 | 0 | 0 | State Highway 31 | | 31 |


Load info about when the NJSP data was most recently updated:

In [6]:
from njsp import Rundate
rundate = Rundate()
print(f'Most recent NJSP run date: {rundate}')
print(f'Most recent month end: {rundate.cur_month_dt}')
print(f'Current year start: {rundate.cur_year_dt}')
print(f'Next year start: {rundate.nxt_year_dt}')

Most recent NJSP run date: 2025-05-13 10:00:01-04:00
Most recent month end: 2025-05-01 00:00:00-04:00
Current year start: 2025-01-01 00:00:00-05:00
Next year start: 2026-01-01 00:00:00-05:00


## YTD Calculations
Create series that cumulatively sum year-to-date deaths (as of each day in the dataset history, going back to January 1, 2008).
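
As a rough illustration of a year-to-date running sum, the sketch below uses toy data and plain pandas. It is not the `Ytd` class's implementation (which may, for example, also fill in days without crashes); the frame `toy` and its values are made up, with `dt` and `tk` mirroring the column names above.

```python
import pandas as pd

# Toy data: one row per fatal crash, with a timestamp (`dt`) and a death count (`tk`).
toy = pd.DataFrame({
    'dt': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-03-02', '2024-01-03', '2024-02-10']),
    'tk': [1, 2, 1, 1, 3],
})

# Total deaths per calendar day, then a running sum that restarts each year.
daily = toy.groupby(toy.dt.dt.normalize()).tk.sum()
ytd_demo = daily.groupby(daily.index.year).cumsum()
print(ytd_demo)
```

Grouping by `daily.index.year` before `cumsum()` is what resets the count on each January 1.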

### Plot YTD counts, for each year ≥2008

In [7]:
years = crashes.dt.dt.year.unique()

#### Color utilities

In [8]:
from utz.colors import RGB, color_interp, colors_lengthen, swatches
from nj_crashes.colors import get_colors, gridcolor, px_colors

colors = get_colors(len(years))
black, red, year_colors = colors.black, colors.red, colors.year_colors
colors

{'black': '#000004', 'red': '#ae3159', 'year_colors': ['#fcffa4', '#f9e66d', '#f7cd39', '#f9b11c', '#f99509', '#f17a1a', '#e7622a', '#d74e3c', '#c53e4c', '#ae3159', '#972763', '#7f1e6a', '#67166c', '#4f0d6b', '#360c59', '#1d0c43', '#0e0624', '#000004']}

In [9]:
cur_year = rundate.year
month_starts = [
    to_dt(f'{cur_year}-{m}').strftime('%b 1')
    for m in range(1, 13)
]
month_starts

['Jan 1',
 'Feb 1',
 'Mar 1',
 'Apr 1',
 'May 1',
 'Jun 1',
 'Jul 1',
 'Aug 1',
 'Sep 1',
 'Oct 1',
 'Nov 1',
 'Dec 1']

In [10]:
ytd = Ytd()

In [11]:
save(
    px.line(
        ytd.ytds,
        x='Text', y='YTD Deaths', color='Year',
        color_discrete_sequence=year_colors,
    ),
    'ytd-deaths',
    'NJ Traffic Deaths – YTD',
    x=dict(
        title='',
        dtick=50,
        tickmode='array',
        tickvals=month_starts,
        ticktext=month_starts,
    ),
    y='',
    legend='reversed',
    png=(850, 800),
);

Wrote plot JSON to www/public/plots/ytd-deaths.json
Wrote plot image to www/public/plots/ytd-deaths.png
Wrote plot image to www/public/plots/ytd-deaths_titled.png


![](../www/public/plots/ytd-deaths_titled.png)

## Plot deaths by {year, victim type}

### Group by year

In [12]:
dt = crashes.dt.dt
fatalities_per_year = crashes.tk.groupby(dt.year).sum().astype(int).rename('NJSP records')

### Group by month

In [13]:
ym = crashes.dt.apply(lambda d: d.strftime('%Y-%m')).rename('ym')
ym

id
1703     2008-01
1681     2008-01
1659     2008-01
1661     2008-01
1811     2008-01
          ...   
13898    2025-05
13899    2025-05
13900    2025-05
13902    2025-05
13903    2025-05
Name: ym, Length: 9790, dtype: object

In [14]:
cur_month = rundate.cur_month_dt
TZ = cur_month.tz
fatalities_per_month = crashes[crashes.dt < cur_month].tk.groupby(ym).sum()
fatalities_per_month

ym
2008-01    59
2008-02    40
2008-03    33
2008-04    50
2008-05    46
           ..
2024-12    52
2025-01    35
2025-02    42
2025-03    39
2025-04    52
Name: tk, Length: 208, dtype: Int8

### Rolling avg

In [15]:
rolling = fatalities_per_month.rolling(12).mean()
rolling

ym
2008-01          NaN
2008-02          NaN
2008-03          NaN
2008-04          NaN
2008-05          NaN
             ...    
2024-12    57.083333
2025-01    56.000000
2025-02    55.250000
2025-03    54.333333
2025-04    54.000000
Name: tk, Length: 208, dtype: float64
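
The leading `NaN`s above come from `rolling(12)` requiring a full 12-month window before it emits a value. A small sketch of that behavior on toy data, with a 3-month window for brevity (the `min_periods=1` variant is shown only for contrast; the notebook does not use it):

```python
import pandas as pd

monthly = pd.Series(
    [59, 40, 33, 50],
    index=['2008-01', '2008-02', '2008-03', '2008-04'],
)

# Default: the first (window - 1) entries are NaN, since the window is incomplete.
strict = monthly.rolling(3).mean()
# `min_periods=1` averages over however many months are available so far.
lenient = monthly.rolling(3, min_periods=1).mean()
print(pd.DataFrame({'strict': strict, 'min_periods=1': lenient}))
```

Keeping the default avoids plotting averages over partial windows, at the cost of an empty first 11 months in the 12-month line.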

In [16]:
mos = (
    sxs(
        dt.year.rename('year'),
        dt.month.rename('month'),
        crashes.tk,
    )
    .groupby(['year', 'month']).sum()
)
mos

| year | month | tk |
|---|---|---|
| 2008 | 1 | 59 |
| 2008 | 2 | 40 |
| 2008 | 3 | 33 |
| 2008 | 4 | 50 |
| 2008 | 5 | 46 |
| ... | ... | ... |
| 2025 | 1 | 35 |
| 2025 | 2 | 42 |
| 2025 | 3 | 39 |
| 2025 | 4 | 52 |


In [17]:
pivoted = mos.reset_index().sort_values(['month', 'year'])
pivoted = pivoted[
    pivoted.apply(
        lambda r: to_dt('%d-%02d' % (r.year, r.month)).tz_localize(cur_month.tz) < cur_month,
        axis=1
    )
]
pivoted

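| | year | month | tk |
|---|---|---|---|
| 0 | 2008 | 1 | 59 |
| 12 | 2009 | 1 | 57 |
| 24 | 2010 | 1 | 37 |
| 36 | 2011 | 1 | 36 |
| 48 | 2012 | 1 | 52 |
| ... | ... | ... | ... |
| 155 | 2020 | 12 | 47 |
| 167 | 2021 | 12 | 61 |
| 179 | 2022 | 12 | 50 |
| 191 | 2023 | 12 | 58 |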


In [18]:
by_month = crashes.tk.groupby([dt.year, dt.month]).sum()
by_month

dt    dt
2008  1     59
      2     40
      3     33
      4     50
      5     46
            ..
2025  1     35
      2     42
      3     39
      4     52
      5     14
Name: tk, Length: 209, dtype: Int8

### Break out victim "types"

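Check victim "type" subtotals vs. total. A nonzero `diff` means deaths in that year are not attributed to any victim type; in the output below, that covers every death before 2020: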

In [19]:
fatal_totals = sxs(*[crashes[f'{t}k'].fillna(0) for t in 'dopb']).sum(axis=1)
sxs(crashes.dt, (crashes.tk - fatal_totals).rename('diff')).groupby(dt.year)['diff'].sum()

dt
2008    590
2009    584
2010    556
2011    627
2012    589
2013    542
2014    556
2015    562
2016    602
2017    624
2018    563
2019    558
2020      0
2021      0
2022      0
2023      0
2024      0
2025      0
Name: diff, dtype: Int64
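
The same check can be sketched on toy data, assuming (as in the records above) that per-type columns are missing for older crashes and populated for newer ones. All values here are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    'year': [2019, 2019, 2020, 2020],
    'tk': [1, 2, 1, 2],        # total deaths per crash
    'dk': [None, None, 1, 1],  # drivers
    'ok': [None, None, 0, 1],  # passengers
    'pk': [None, None, 0, 0],  # pedestrians
    'bk': [None, None, 0, 0],  # cyclists
})

# Treat missing per-type counts as 0, then compare their sum to the total.
type_sum = toy[['dk', 'ok', 'pk', 'bk']].fillna(0).sum(axis=1)
unattributed = (toy.tk - type_sum).groupby(toy.year).sum()
print(unattributed)
```

A year whose `unattributed` value equals its total has no usable per-type data, which is why those years are patched from the annual summaries.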

Cross-reference with annual totals, populate "unknown" subtotal:

In [20]:
base_type_cols_map = {
    'dk': 'driver',
    'pk': 'pedestrian',
    'ok': 'passenger',
    'bk': 'cyclist',
}
base_type_cols = list(base_type_cols_map.values())

In [21]:
from njsp.paths import ANNUAL_SUMMARIES_YT_CSV

year_stats = read_csv(ANNUAL_SUMMARIES_YT_CSV).astype(int).set_index('year')
year_stats

| year | driver | passenger | cyclist | pedestrian | crashes |
|---|---|---|---|---|---|
| 2008 | 320 | 112 | 20 | 138 | 555 |
| 2009 | 315 | 98 | 14 | 157 | 550 |
| 2010 | 303 | 99 | 13 | 141 | 530 |
| 2011 | 362 | 105 | 17 | 143 | 586 |
| 2012 | 309 | 103 | 14 | 163 | 553 |
| 2013 | 304 | 92 | 14 | 132 | 508 |
| 2014 | 295 | 80 | 11 | 170 | 523 |
| 2015 | 276 | 96 | 17 | 173 | 522 |
| 2016 | 330 | 89 | 17 | 166 | 570 |
| 2017 | 339 | 85 | 17 | 183 | 591 |


In [22]:
projected_total = read_csv(PROJECTED_CSV, index_col='county').drop(columns='crashes').sum().sum()
print(f'{projected_total} projected deaths for {cur_year}')

649 projected deaths for 2025


In [23]:
year_types = (
    sxs(
        crashes.dt,
        crashes.rename(columns=base_type_cols_map)[base_type_cols].fillna(0)
    )
    .groupby(dt.year.rename('year'))
    .sum(numeric_only=True)
    .astype(int)
)
# Patch in year-types.csv values for [2008, 2020), i.e. 2008 through 2019
# (the crash records carry no per-type counts for those years)
year_types.loc[range(2008, 2020)] = year_stats.loc[range(2008, 2020), base_type_cols]

year_types['projected_total'] = fatalities_per_year
year_types.loc[cur_year, 'projected_total'] = projected_total
year_types['projected'] = year_types.projected_total - fatalities_per_year
year_types

| year | driver | pedestrian | passenger | cyclist | projected_total | projected |
|---|---|---|---|---|---|---|
| 2008 | 320 | 138 | 112 | 20 | 590 | 0 |
| 2009 | 315 | 157 | 98 | 14 | 584 | 0 |
| 2010 | 303 | 141 | 99 | 13 | 556 | 0 |
| 2011 | 362 | 143 | 105 | 17 | 627 | 0 |
| 2012 | 309 | 163 | 103 | 14 | 589 | 0 |
| 2013 | 304 | 132 | 92 | 14 | 542 | 0 |
| 2014 | 295 | 170 | 80 | 11 | 556 | 0 |
| 2015 | 276 | 173 | 96 | 17 | 562 | 0 |
| 2016 | 330 | 166 | 89 | 17 | 602 | 0 |
| 2017 | 339 | 183 | 85 | 17 | 624 | 0 |
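
The `projected` column is the gap between a year's projected total and what has been recorded so far, so it is 0 for every completed year. A toy sketch of that arithmetic (all numbers are made up, and `recorded`, `cur_year_demo`, and `projected_total_demo` are illustrative names only):

```python
import pandas as pd

# Deaths recorded so far, per year (toy values); 2025 is the incomplete year.
recorded = pd.Series({2023: 100, 2024: 120, 2025: 40}, name='recorded')
cur_year_demo = 2025

# Start from recorded totals, then overwrite only the current year with its projection.
projected_total_demo = recorded.copy()
projected_total_demo.loc[cur_year_demo] = 110

# Remaining (not-yet-recorded) deaths: nonzero only for the current year.
remainder = projected_total_demo - recorded
print(remainder)
```

Stacking `remainder` on top of the per-type bars is what lets the current year's bar reach its projected height.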


## Update {year,type,county} stats

In [24]:
from njsp.paths import ANNUAL_SUMMARIES_YTC_CSV

Load {year,type,county} subtotals from annual summary PDFs (see [NJSP summary PDFs.ipynb](data/njsp/annual-summaries/NJSP%20summary%20PDFs.ipynb)):

In [25]:
ytc0 = read_csv(ANNUAL_SUMMARIES_YTC_CSV).set_index(['year', 'county']).astype(int)
ytc0

| year | county | driver | passenger | cyclist | pedestrian | crashes |
|---|---|---|---|---|---|---|
| 2008 | Atlantic | 17 | 8 | 0 | 6 | 30 |
| 2008 | Bergen | 10 | 5 | 1 | 7 | 22 |
| 2008 | Burlington | 23 | 6 | 4 | 12 | 45 |
| 2008 | Camden | 25 | 4 | 0 | 15 | 42 |
| 2008 | Cape May | 8 | 3 | 0 | 0 | 11 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023 | Salem | 8 | 2 | 0 | 2 | 11 |
| 2023 | Somerset | 13 | 4 | 0 | 6 | 21 |
| 2023 | Sussex | 6 | 2 | 0 | 1 | 6 |
| 2023 | Union | 14 | 6 | 2 | 15 | 35 |


Generate a similar dataframe from crash records:

In [26]:
from njdot.data import cc2cn

In [27]:
ytc1 = (
    crashes
    .assign(year=dt.year, crashes=1, county=crashes.cc.map(cc2cn))
    [dt.year >= 2020]
    .rename(columns=dict(
        **base_type_cols_map
    ))
    [['year', 'county'] + ytc0.columns.tolist()]
    .groupby(['year', 'county'])
    .sum(numeric_only=True)
    .astype(int)
)
ytc1

| year | county | driver | passenger | cyclist | pedestrian | crashes |
|---|---|---|---|---|---|---|
| 2020 | Atlantic | 26 | 5 | 0 | 9 | 38 |
| 2020 | Bergen | 14 | 9 | 0 | 20 | 38 |
| 2020 | Burlington | 26 | 4 | 3 | 9 | 40 |
| 2020 | Camden | 19 | 5 | 1 | 13 | 36 |
| 2020 | Cape May | 5 | 0 | 1 | 3 | 8 |
| ... | ... | ... | ... | ... | ... | ... |
| 2025 | Salem | 5 | 0 | 0 | 2 | 7 |
| 2025 | Somerset | 4 | 2 | 0 | 0 | 6 |
| 2025 | Sussex | 0 | 1 | 0 | 0 | 1 |
| 2025 | Union | 2 | 2 | 0 | 3 | 7 |


Verify they match (for years ≥2020, where they overlap):

In [28]:
m = ytc0.merge(ytc1, left_index=True, right_index=True)
m.columns = pd.MultiIndex.from_tuples([ (c[-1], c[:-2]) for c in m.columns ])
diffs = m['x'] != m['y']
has_diffs = diffs.any().any()
if has_diffs:
    xd = m['x'].loc[diffs.any(axis=1), diffs.any()]
    xd.columns = pd.MultiIndex.from_tuples([ ('x', c) for c in xd.columns ])
    yd = m['y'].loc[diffs.any(axis=1), diffs.any()]
    yd.columns = pd.MultiIndex.from_tuples([ ('y', c) for c in yd.columns ])
    diffs = sxs(xd, yd)
else:
    diffs = None
diffs

In [29]:
assert not has_diffs, diffs
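
The comparison above relies on `merge` appending its default `_x`/`_y` suffixes to overlapping column names, which the `(c[-1], c[:-2])` tuples then split into a two-level column index. A toy sketch of the same idea (the frames `from_pdfs` and `from_records` and their values are made up, with one deliberate mismatch):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2020, 'Atlantic'), (2020, 'Bergen')], names=['year', 'county'],
)
from_pdfs = pd.DataFrame({'driver': [3, 4], 'crashes': [5, 6]}, index=idx)
from_records = pd.DataFrame({'driver': [3, 5], 'crashes': [5, 6]}, index=idx)

# Overlapping column names get pandas' default `_x` / `_y` suffixes.
merged = from_pdfs.merge(from_records, left_index=True, right_index=True)
# Split e.g. 'driver_x' into ('x', 'driver'), so each side can be selected as a block.
merged.columns = pd.MultiIndex.from_tuples([(c[-1], c[:-2]) for c in merged.columns])

mismatch = merged['x'] != merged['y']
print(merged.loc[mismatch.any(axis=1)])
```

Note that the default inner join only compares index entries present on both sides; rows missing from one frame would not be flagged by this check.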

Combine:

In [30]:
y0 = ytc0.index.levels[0]
y1 = ytc1.index.levels[0]
ytc = pd.concat([
    ytc0.drop(index=y0[y0.isin(y1)], level=0),
    ytc1,
])
ytc

| year | county | driver | passenger | cyclist | pedestrian | crashes |
|---|---|---|---|---|---|---|
| 2008 | Atlantic | 17 | 8 | 0 | 6 | 30 |
| 2008 | Bergen | 10 | 5 | 1 | 7 | 22 |
| 2008 | Burlington | 23 | 6 | 4 | 12 | 45 |
| 2008 | Camden | 25 | 4 | 0 | 15 | 42 |
| 2008 | Cape May | 8 | 3 | 0 | 0 | 11 |
| ... | ... | ... | ... | ... | ... | ... |
| 2025 | Salem | 5 | 0 | 0 | 2 | 7 |
| 2025 | Somerset | 4 | 2 | 0 | 0 | 6 |
| 2025 | Sussex | 0 | 1 | 0 | 0 | 1 |
| 2025 | Union | 2 | 2 | 0 | 3 | 7 |
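
The combine step drops any year from the older frame that also appears in the newer one, then concatenates, so overlapping years come from the crash records only. A toy sketch (the helper `frame` and all values are hypothetical):

```python
import pandas as pd

def frame(rows):
    """Build a toy (year, county)-indexed frame from (year, county, crashes) tuples."""
    idx = pd.MultiIndex.from_tuples([(y, c) for y, c, _ in rows], names=['year', 'county'])
    return pd.DataFrame({'crashes': [n for _, _, n in rows]}, index=idx)

older = frame([(2019, 'A', 1), (2020, 'A', 2)])
newer = frame([(2020, 'A', 2), (2021, 'A', 3)])

# Years present in both frames are removed from the older one before concatenating.
older_years = older.index.levels[0]
newer_years = newer.index.levels[0]
overlap = older_years[older_years.isin(newer_years)]
combined = pd.concat([older.drop(index=overlap, level=0), newer])
print(combined)
```

Dropping before concatenating keeps the `(year, county)` index unique, which a plain `pd.concat` of both frames would not.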


Export:

In [31]:
from njsp.paths import YTC_CSV, YTC_PQT, YTC_DB, YTC_DB_URI

In [32]:
for ytc_fmt in ytc_fmts.split(','):
    if ytc_fmt == 'csv':
        err(f'Writing {relpath(YTC_CSV)}')
        ytc.to_csv(YTC_CSV)
    elif ytc_fmt == 'pqt':
        err(f'Writing {relpath(YTC_PQT)}')
        ytc.to_parquet(YTC_PQT)
    elif ytc_fmt == 'db':
        err(f'Writing {relpath(YTC_DB)}')
        ytc.to_sql('ytc', YTC_DB_URI, if_exists='replace')
    else:
        raise ValueError(f'Unrecognized ytc_fmt {ytc_fmt}')

Writing njsp/data/year-type-county.csv
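
Since `ytc_fmts` is a free-form comma-delimited string, one option is to validate it before writing anything. The sketch below is a hypothetical helper (`parse_fmts` and `KNOWN_FMTS` are not part of this repo) that also tolerates stray whitespace:

```python
KNOWN_FMTS = {'csv', 'pqt', 'db'}

def parse_fmts(spec: str) -> list[str]:
    """Split a comma-delimited format string, rejecting unknown entries."""
    fmts = [f.strip() for f in spec.split(',') if f.strip()]
    unknown = sorted(set(fmts) - KNOWN_FMTS)
    if unknown:
        raise ValueError(f'Unrecognized ytc_fmt(s): {unknown}')
    return fmts

print(parse_fmts('csv, pqt'))
```

Validating up front means a typo in the last entry fails before any earlier format has been written, rather than midway through the loop.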


## Fatalities per year (by type)

In [33]:
# NB: this rebinds `ytc` (until now the {year,type,county} DataFrame) to a list of colors
ytc = colors_lengthen(px_colors, 7)
print(' '.join(ytc))
swatches(ytc)

#000004 #320c56 #781c6d #ba3853 #ed6925 #f9b621 #fcffa4


<span style="font-family: monospace">#000004 <span style="color: #000004">██████</span></span> <span style="font-family: monospace">#320c56 <span style="color: #320c56">██████</span></span> <span style="font-family: monospace">#781c6d <span style="color: #781c6d">██████</span></span> <span style="font-family: monospace">#ba3853 <span style="color: #ba3853">██████</span></span> <span style="font-family: monospace">#ed6925 <span style="color: #ed6925">██████</span></span> <span style="font-family: monospace">#f9b621 <span style="color: #f9b621">██████</span></span> <span style="font-family: monospace">#fcffa4 <span style="color: #fcffa4">██████</span></span>

In [34]:
type_cols = [
    'cyclist',
    'driver',
    'pedestrian',
    'passenger',
    'projected',
]
type_cols_map = {
    c: f'{c[0].upper()}{c[1:]}{"s" if c != "projected" else ""}'
    for c in type_cols
}
type_cols_map

{'cyclist': 'Cyclists',
 'driver': 'Drivers',
 'pedestrian': 'Pedestrians',
 'passenger': 'Passengers',
 'projected': 'Projected'}

In [35]:
fig = px.bar(
    year_types[type_cols].rename(columns=type_cols_map).replace(0, nan),
    barmode='stack',
    color_discrete_sequence=ytc[1:],
    text_auto='d',
)
for year, projected_total in year_types.projected_total.to_dict().items():
    fig.add_annotation(
        x=year, y=projected_total,
        text=projected_total,
        showarrow=False,
        yshift=10,
    )
save(
    fig,
    'fatalities_per_year_by_type',
    'NJ Traffic Deaths per Year (by victim type)',
    x=dict(dtick='y', title=None), xgrid=None,
    y=dict(dtick=50, title=None),
    png=(1200, 600),
);

Wrote plot JSON to www/public/plots/fatalities_per_year_by_type.json
Wrote plot image to www/public/plots/fatalities_per_year_by_type.png
Wrote plot image to www/public/plots/fatalities_per_year_by_type_titled.png


![](../www/public/plots/fatalities_per_year_by_type_titled.png)

## Fatalities per month (by victim type)

In [36]:
month_types = (
    sxs(
        crashes.dt,
        crashes.rename(columns=base_type_cols_map)[base_type_cols].fillna(0)
    )
    [ dt.year >= 2020 ]
    .groupby([
        dt.year.rename('year'),
        dt.month.rename('month'),
    ])
    [base_type_cols]
    .sum()
    .astype(int)
)

month_types = month_types.reset_index()
month_types['dt'] = (
    month_types
    [['year', 'month']]
    .apply(lambda r: '%04d-%02d' % (r['year'], r['month']), axis=1)
)
month_types = month_types.set_index('dt').drop(columns=['year', 'month'])
month_types

| dt | driver | pedestrian | passenger | cyclist |
|---|---|---|---|---|
| 2020-01 | 21 | 18 | 8 | 2 |
| 2020-02 | 15 | 17 | 2 | 0 |
| 2020-03 | 11 | 16 | 8 | 1 |
| 2020-04 | 17 | 8 | 2 | 1 |
| 2020-05 | 28 | 13 | 9 | 2 |
| 2020-06 | 30 | 8 | 9 | 0 |
| 2020-07 | 30 | 19 | 8 | 5 |
| 2020-08 | 31 | 8 | 14 | 1 |
| 2020-09 | 31 | 21 | 5 | 0 |
| 2020-10 | 33 | 17 | 5 | 3 |


In [37]:
type_colors = colors_lengthen(px_colors, 7)

fig = px.line(
    month_types.rename(columns=type_cols_map).loc[to_dt(month_types.index).tz_localize(TZ) < cur_month],
    labels={'variable': '',},
    color_discrete_sequence=type_colors,
)
fig.update_traces(line=dict(width=3))
save(
    fig,
    'fatalities_per_month_by_type',
    title='NJ Traffic Deaths per Month (by victim type)',
    x=dict(
        title='',
        tickformat="%b '%y",
    ),
    y='',
    png=800,
);

Wrote plot JSON to www/public/plots/fatalities_per_month_by_type.json
Wrote plot image to www/public/plots/fatalities_per_month_by_type.png
Wrote plot image to www/public/plots/fatalities_per_month_by_type_titled.png


![](../www/public/plots/fatalities_per_month_by_type_titled.png)

## Fatalities per month

In [38]:
to_dt(fatalities_per_month.index).to_series().dt.date

ym
2008-01-01    2008-01-01
2008-02-01    2008-02-01
2008-03-01    2008-03-01
2008-04-01    2008-04-01
2008-05-01    2008-05-01
                 ...    
2024-12-01    2024-12-01
2025-01-01    2025-01-01
2025-02-01    2025-02-01
2025-03-01    2025-03-01
2025-04-01    2025-04-01
Name: ym, Length: 208, dtype: object

In [39]:
to_dt(rolling.index).to_series().dt.date

ym
2008-01-01    2008-01-01
2008-02-01    2008-02-01
2008-03-01    2008-03-01
2008-04-01    2008-04-01
2008-05-01    2008-05-01
                 ...    
2024-12-01    2024-12-01
2025-01-01    2025-01-01
2025-02-01    2025-02-01
2025-03-01    2025-03-01
2025-04-01    2025-04-01
Name: ym, Length: 208, dtype: object

In [40]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=to_dt(fatalities_per_month.index).to_series(),
    y=fatalities_per_month.values,
    name='Fatalities',
    marker_color=red,
))
fig.add_trace(go.Scatter(
    x=to_dt(rolling.index).to_series(),
    y=rolling.apply(partial(round, ndigits=1)),
    name='12mo avg',
    line={'width': 4, 'color': black, }
))
save(
    fig,
    'fatalities_per_month',
    'NJ Traffic Deaths per Month',
    x=dict(dtick='M12'),
    png=(1200, 600),
);

Wrote plot JSON to www/public/plots/fatalities_per_month.json
Wrote plot image to www/public/plots/fatalities_per_month.png
Wrote plot image to www/public/plots/fatalities_per_month_titled.png


![](../www/public/plots/fatalities_per_month_titled.png)

In [41]:
month_names = [ to_dt('2022-%02d' % i).strftime('%b') for i in range(1, 13) ]
print(' '.join(month_names))

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec


In [42]:
fig = px.bar(
    x=pivoted.month,
    y=pivoted.tk,
    color=pivoted.year.astype(str),
    color_discrete_sequence=year_colors,
    labels=dict(color='', x='', y='',),
    barmode='group',
)
save(
    fig,
    'fatalities_by_month_bars',
    'NJ Traffic Deaths, grouped by month',
    legend='reversed',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=month_names,
    ),
    png=(1200, 700),
);

Wrote plot JSON to www/public/plots/fatalities_by_month_bars.json
Wrote plot image to www/public/plots/fatalities_by_month_bars.png
Wrote plot image to www/public/plots/fatalities_by_month_bars_titled.png


![](../www/public/plots/fatalities_by_month_bars_titled.png)

In [43]:
fig = px.line(
    x = pivoted.month,
    y = pivoted.tk,
    color = pivoted.year,
    color_discrete_sequence=year_colors,
    labels={ 'color': '', 'x': '', 'y': '' },
).update_yaxes(
    gridcolor=gridcolor,
)
save(
    fig,
    title='NJ Traffic Deaths by Month',
    name='fatalities_by_month_lines',
    xaxis=dict(
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=month_names,
    ),
    legend='reversed',
    png=(1200, 700),
);

Wrote plot JSON to www/public/plots/fatalities_by_month_lines.json
Wrote plot image to www/public/plots/fatalities_by_month_lines.png
Wrote plot image to www/public/plots/fatalities_by_month_lines_titled.png


![](../www/public/plots/fatalities_by_month_lines_titled.png)