# {County, Muni, Year, Month, Condition, Type} aggregation

Compute aggregate sums to answer questions like:

> How many {drivers, passengers, pedestrians, cyclists} were {killed, seriously injured, moderately injured, …} in {county, muni, year, month}?

Those sums are stored in a table called `cmymc` (County, Muni, Year, Month, Condition). It has one column per victim "type" (`drivers`, `passengers`, `pedestrians`, `cyclists`), plus a `num_crashes` column that counts the crashes in which that row's "condition" was the worst suffered by anyone involved (i.e. fatal crashes, serious-injury crashes, …).
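The per-crash "worst condition" can be sketched in plain Python. The sketch uses made-up `(crash_id, condition)` pairs in place of the `occupants` and `pedestrians` tables. It mirrors the pandas logic in `In [9]`: take the minimum condition across both tables, and default to 5 when a crash has no recorded victims. It is an illustration, not the notebook's code.

```python
from collections import Counter

# Hypothetical victim records as (crash_id, condition) pairs. As in the notebook,
# codes run 1..5 and the minimum is taken, so 1 (fatal) is the most severe.
occupant_conditions = [(100, 4), (100, 2), (101, 5), (102, 3)]
pedestrian_conditions = [(100, 1), (103, 4)]
crash_ids = [100, 101, 102, 103, 104]  # crash 104 has no recorded victims


def worst_conditions(crash_ids, *victim_lists, default=5):
    """Map each crash to the worst (minimum) condition among its victims."""
    worst = {}
    for victims in victim_lists:
        for crash_id, cond in victims:
            worst[crash_id] = min(cond, worst.get(crash_id, cond))
    # Crashes with no victim rows fall back to the least-severe code.
    return {cid: worst.get(cid, default) for cid in crash_ids}


sev = worst_conditions(crash_ids, occupant_conditions, pedestrian_conditions)
num_crashes = Counter(sev.values())  # crashes per worst condition
print(sev)  # {100: 1, 101: 5, 102: 3, 103: 4, 104: 5}
print(sorted(num_crashes.items()))  # [(1, 1), (3, 1), (4, 1), (5, 2)]
```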

Other aggregate tables are also saved, each with one or more dimensions summed out. For example, `cmyc` has "Month" summed out (annual totals) and `cymc` has "Muni" summed out (county-level monthly totals). This continues down to `yc` (Year x Condition: how many of each {victim type} x {condition} there were each year, state-wide).
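Summing out a dimension is a group-by over the remaining key columns. The stdlib sketch below uses made-up keys and counts. It collapses month from `(cc, mc, y, m, condition)` keys to produce `cmyc`-style annual rows. The toy names are hypothetical and stand in for the pandas objects.

```python
from collections import Counter

KEYS = ("cc", "mc", "y", "m", "condition")

# Hypothetical cmymc-style rows: (cc, mc, y, m, condition) -> pedestrian count.
toy_cmymc = {
    (1, 1, 2001, 1, 4): 2,
    (1, 1, 2001, 2, 4): 3,
    (1, 1, 2001, 2, 1): 1,
    (1, 2, 2001, 1, 4): 5,
}


def sum_out(rows, keys, col):
    """Drop key column `col` and add up the values of rows that then collide."""
    i = keys.index(col)
    out = Counter()
    for key, value in rows.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out), keys[:i] + keys[i + 1:]


toy_cmyc, toy_cmyc_keys = sum_out(toy_cmymc, KEYS, "m")  # annual totals
print(toy_cmyc_keys)  # ('cc', 'mc', 'y', 'condition')
print(toy_cmyc)  # {(1, 1, 2001, 4): 5, (1, 1, 2001, 1): 1, (1, 2, 2001, 4): 5}
# Summing out a dimension never changes the grand total.
assert sum(toy_cmyc.values()) == sum(toy_cmymc.values())
```

Repeating the step on the result walks further down the hierarchy. The chained `sum_idx_col` calls below do the same thing with pandas `groupby(...).sum()`.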

Vehicles have a separate set of "conditions": "hit & run" (left the scene), "disabled", and "towed" (a vehicle can be both disabled and towed). Counts of each are stored as columns (`hit_run`, `towed`, `disabled`) on tables like `cmymv`, `cmyv`, etc.
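The vehicle conditions are boolean flags rather than a single category per vehicle. The per-group count for each flag is therefore just a sum of booleans, and one vehicle can add to several columns. The stdlib sketch below uses made-up vehicles; in the notebook, the flags are derived from the `hit_run`, `departure`, and `damage` fields in `In [22]`.

```python
from collections import defaultdict

FLAGS = ("hit_run", "towed", "disabled")

# Hypothetical vehicles: a (cc, mc, y, m) key plus one boolean per flag.
vehicles = [
    ((1, 1, 2022, 5), {"hit_run": False, "towed": True, "disabled": True}),
    ((1, 1, 2022, 5), {"hit_run": True, "towed": False, "disabled": False}),
    ((1, 1, 2022, 6), {"hit_run": False, "towed": False, "disabled": True}),
]

# Sum each flag per key; True counts as 1, so a vehicle with two flags adds to both.
counts = defaultdict(lambda: dict.fromkeys(FLAGS, 0))
for key, flags in vehicles:
    for name in FLAGS:
        counts[key][name] += flags[name]

print(dict(counts))
```

In May, two vehicles produce three flag counts. This is why the flag columns should not be added together as if they were vehicle totals.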

This notebook saves a SQLite file with all the tables above (`www/public/njdot/cmymc.db`, used by the webapp), and also uploads it to S3 (`s3://nj-crashes/njdot/data/cmymc.db`). `cmyc`, `cyc`, and `yc` tables are also saved as Parquet.
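To answer a question like the one above from the SQLite output, sum a victim-type column over the rows that match the requested dimensions. The sketch builds a tiny in-memory stand-in for `cmyc` using the stdlib `sqlite3` module. The column names match the DataFrame columns shown in this notebook, but it is an assumption that `sql.write` stores them under exactly these names and types. The rows are made up. Against the real file, you would connect to `www/public/njdot/cmymc.db` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cmyc (cc INT, mc INT, y INT, condition INT, "
    "drivers INT, passengers INT, pedestrians INT, cyclists INT, num_crashes INT)"
)
# Made-up rows: (cc, mc, y, condition, drivers, passengers, pedestrians, cyclists, num_crashes)
conn.executemany(
    "INSERT INTO cmyc VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (9, 1, 2021, 1, 0, 0, 2, 0, 2),
        (9, 2, 2021, 1, 1, 0, 1, 0, 2),
        (9, 2, 2021, 4, 10, 3, 0, 1, 11),
    ],
)

# "How many pedestrians were killed in county 9 in 2021?" (condition 1 = fatal)
(killed,) = conn.execute(
    "SELECT SUM(pedestrians) FROM cmyc WHERE cc = ? AND y = ? AND condition = 1",
    (9, 2021),
).fetchone()
print(killed)  # 3
```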

In [1]:
from utz import *
from njdot import crashes, occupants, pedestrians, vehicles
from njdot.paths import CMYMC_DB, CMYC_PQT, CYC_PQT, YC_PQT
from nj_crashes.utils import sql

In [2]:
%%time
c = crashes.load(cols=['dt', 'year', 'cc', 'mc', 'severity', 'tk', 'ti', 'pk', 'pi', ])
y = c.year.rename('y')
m = c.dt.dt.month.rename('m')
c = sxs(y, m, c[[k for k in c if k not in ['y', 'm', 'year', 'dt']]])
c

Reading njdot/data/crashes.parquet


CPU times: user 438 ms, sys: 148 ms, total: 586 ms
Wall time: 415 ms


id,y,m,cc,mc,severity,tk,ti,pk,pi
0,2001,12,1,1,p,0,0,0,0
1,2001,1,1,1,p,0,0,0,0
2,2001,4,1,1,i,0,4,0,0
3,2001,4,1,1,i,0,1,0,0
4,2001,4,1,1,p,0,0,0,0
...,...,...,...,...,...,...,...,...,...
6319789,2022,12,21,23,p,0,0,0,0
6319790,2022,12,21,23,p,0,0,0,0
6319791,2022,12,21,23,p,0,0,0,0
6319792,2022,12,21,23,p,0,0,0,0


In [3]:
cmym_cols = [ 'cc', 'mc', 'y', 'm', ]
cmymtc_cols = cmym_cols + [ 'condition', 'type', ]

In [4]:
p = pedestrians.load()
p

Reading njdot/data/pedestrians.parquet


id,crash_id,pn,condition,city,state,zip,dob,age,sex,alc_test_given,alc_test_type,alc_test_results,charge1,summons1,traffic_controls,cir1,cir2,dir,act,inj_loc,inj_type,med_refused,safety_used,hospital,status1,cyclist,other,charge2,summons2,charge3,summons3,charge4,summons4,status2
0,7,1,3,ABSECON,NJ,08201,07/28/1990,10,M,,,,,,,,,,41,1,5,,,,1,False,False,,,,,,,
1,48,31,3,ABSECON,NJ,08201,01/16/1967,34,M,,,,,,,,,,,1,4,,,,,True,False,,,,,,,
2,76,1,2,EGG HARBOR TWP.,NJ,08234,11/16/1952,48,F,N,,,,,,,,,46,12,3,,,,1,False,False,,,,,,,
3,114,1,4,GALLOWAY TWP.,NJ,08205,03/31/1959,42,F,N,,,SUBPOENA,,,,,,49,7,,,,,1,False,False,,,,,,,
4,236,1,4,ABSECON,NJ,08201,04/18/1953,48,M,N,,,,,,,,,43,1,4,,,,1,False,False,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
185444,6319317,1,4,PHILLIPSBURG,NJ,08865,,31,M,N,,,,,11,75,72,1,42,0,0,2,,7102,7,False,False,,,,,,,6
185445,6319399,1,5,BELVIDERE,NJ,07823,,16,M,N,,,,,,78,,,42,8,8,1,,,1,False,False,,,,,,,
185446,6319407,31,2,WASHINGTON,NJ,07882,,12,M,N,,,,,11,25,,3,1,11,7,2,1,7102,1,True,False,,,,,,,
185447,6319412,1,1,WASHINGTON,NJ,07882,,55,M,N,,,,,4,78,,1,46,1,3,2,1,6404,7,False,False,,,,,,,


In [5]:
pm = p.merge(c, left_on='crash_id', right_index=True, how='left', validate='m:1')
pm = pm[(pm.condition >= 1) & (pm.condition <= 5)]  # keep condition codes 1-5
pm.loc[ pm.cyclist, 'type'] = 'b'  # cyclist
pm.loc[~pm.cyclist, 'type'] = 't'  # pedestrian
pg = pm.groupby(cmymtc_cols).size().rename('num')
pg

cc  mc  y     m   condition  type
1   1   2001  1   4          t       1
              6   3          t       1
                  4          t       1
              7   4          b       1
              9   3          b       1
                                    ..
21  23  2012  7   4          b       1
        2016  4   1          b       1
        2017  8   4          b       1
        2019  7   2          b       1
        2020  12  1          t       1
Name: num, Length: 84477, dtype: int64

In [6]:
o = occupants.load()
o

Reading njdot/data/occupants.parquet


id,crash_id,vehicle_id,on,condition,pos,eject,age,sex,inj_loc,inj_type,med_refused,safety_avail,safety_used,airbag,hospital
0,0,0.0,1,,1,1,38,M,,,,4,4,,
1,0,1.0,2,,1,1,63,F,,,,4,4,,
2,1,2.0,1,,,,,,,,,,,,
3,2,4.0,1,3,1,1,29,F,6,8,,4,4,,
4,2,4.0,2,3,3,1,7,M,8,5,,4,4,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14300194,6319792,,2,5,3,1,48,F,,,,11,4,,
14300195,6319792,,3,5,3,1,48,F,,,,11,4,,
14300196,6319792,,4,5,1,1,24,F,,,,11,4,,
14300197,6319793,,1,5,1,1,77,M,,,,11,4,,


In [7]:
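om = o.merge(c, left_on='crash_id', right_index=True, how='left', validate='m:1')
om = om[(om.condition >= 1) & (om.condition <= 5)]
# Object-dtype column, so assigning strings below doesn't upcast a float (NaN) column
om['type'] = None
om.loc[om.pos == 1, 'type'] = 'd'  # driver's seat
om.loc[om.pos > 1, 'type'] = 'o'   # other seats (passengers)
om = om[~om.type.isna()]  # drop occupants with no usable position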
og = om.groupby(cmymtc_cols).size().rename('num')
og

cc  mc  y     m   condition  type
1   1   2001  1   3          d        3
                  4          d        6
                             o        4
              2   2          d        1
                  4          d        6
                                     ..
21  23  2022  12  3          d        1
                  4          d        3
                             o        1
                  5          d       21
                             o        5
Name: num, Length: 383262, dtype: int64

In [8]:
g = pd.concat([ pg, og ]).sort_index()
g

cc  mc  y     m   condition  type
1   1   2001  1   3          d        3
                  4          d        6
                             o        4
                             t        1
              2   2          d        1
                                     ..
21  23  2022  12  3          d        1
                  4          d        3
                             o        1
                  5          d       21
                             o        5
Name: num, Length: 467739, dtype: int64
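The single-letter `type` codes combined above come from two rules. Occupants are split by seat position: `pos == 1` is the driver, larger values are passengers, and rows with no usable position are dropped. Rows from the pedestrians table are split by their `cyclist` flag. The plain-Python restatement below is for reference only; the helper names are hypothetical.

```python
import math

# Wide column name for each victim-type code (as in In [11]).
TYPE_NAMES = {"d": "drivers", "o": "passengers", "t": "pedestrians", "b": "cyclists"}


def occupant_type(pos):
    """'d' for the driver's seat (pos 1), 'o' for other seats, None if missing/invalid."""
    if pos is None or math.isnan(pos):
        return None  # such rows are dropped before counting
    if pos == 1:
        return "d"
    if pos > 1:
        return "o"
    return None


def pedestrian_type(cyclist):
    """Rows from the pedestrians table are cyclists ('b') or pedestrians ('t')."""
    return "b" if cyclist else "t"


print([occupant_type(p) for p in (1, 3, float("nan"))])  # ['d', 'o', None]
print(TYPE_NAMES[pedestrian_type(True)])  # cyclists
```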

In [9]:
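# Worst (lowest-numbered) condition of any occupant or pedestrian in each crash
occ_severity = om.groupby('crash_id')['condition'].min().rename('occ_severity')
ped_severity = pm.groupby('crash_id')['condition'].min().rename('ped_severity')
sev = sxs(occ_severity, ped_severity).min(axis=1).rename('condition')
cs = sxs(c.drop(columns='severity'), sev)
# Crashes with no occupant or pedestrian condition default to the least-severe code, 5
cs['condition'] = cs.condition.fillna(5)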
cs

,y,m,cc,mc,tk,ti,pk,pi,condition
0,2001,12,1,1,0,0,0,0,5
1,2001,1,1,1,0,0,0,0,5
2,2001,4,1,1,0,4,0,0,3
3,2001,4,1,1,0,1,0,0,4
4,2001,4,1,1,0,0,0,0,5
...,...,...,...,...,...,...,...,...,...
6319789,2022,12,21,23,0,0,0,0,5
6319790,2022,12,21,23,0,0,0,0,5
6319791,2022,12,21,23,0,0,0,0,5
6319792,2022,12,21,23,0,0,0,0,5


In [10]:
cxs = cs.groupby(cmym_cols + ['condition']).size().rename('num_crashes')
cxs

cc  mc  y     m   condition
1   1   2001  1   3             2
                  4             8
                  5            10
              2   2             1
                  4             5
                               ..
21  23  2022  11  5             9
              12  2             1
                  3             1
                  4             2
                  5            13
Name: num_crashes, Length: 374555, dtype: int64

In [11]:
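# Long -> wide: one column per victim type
cmymc = g.reset_index(level=5).pivot(columns='type', values='num')[['d', 'o', 't', 'b']].rename(columns={
    'd': 'drivers',
    'o': 'passengers',
    't': 'pedestrians',
    'b': 'cyclists',
})
# Add per-condition crash counts, aligned on the index; missing combinations become 0
cmymc = sxs(cmymc, cxs).fillna(0).astype(int)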
cmymc

cc,mc,y,m,condition,drivers,passengers,pedestrians,cyclists,num_crashes
1,1,2001,1,3,3,0,0,0,2
1,1,2001,1,4,6,4,1,0,8
1,1,2001,2,2,1,0,0,0,1
1,1,2001,2,4,6,2,0,0,5
1,1,2001,3,3,2,1,0,0,2
...,...,...,...,...,...,...,...,...,...
21,23,2018,8,5,0,0,0,0,8
21,23,2018,9,5,0,0,0,0,10
21,23,2018,10,5,0,0,0,0,16
21,23,2018,11,5,0,0,0,0,21
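`In [11]` turns the long `(…, type) → num` series into one column per victim type. It then joins the per-condition crash counts and fills gaps with 0, for example a key that has crashes but no victim rows. The stdlib sketch of that reshape below uses made-up counts and is not the notebook's pandas code.

```python
# Victim-type codes and the wide column each one becomes (as in In [11]).
TYPES = {"d": "drivers", "o": "passengers", "t": "pedestrians", "b": "cyclists"}

# Made-up long-format counts: ((cc, mc, y, m, condition), type code) -> count.
long_counts = {
    ((1, 1, 2001, 1, 4), "d"): 3,
    ((1, 1, 2001, 1, 4), "o"): 1,
    ((1, 1, 2001, 1, 4), "t"): 2,
}
# Made-up crash counts; the second key has crashes but no victim rows at all.
num_crashes = {(1, 1, 2001, 1, 4): 4, (1, 1, 2001, 1, 5): 7}

# Union of keys from both sides, so neither input drops rows.
keys = sorted({key for key, _ in long_counts} | set(num_crashes))
wide = {
    key: {
        **{name: long_counts.get((key, code), 0) for code, name in TYPES.items()},
        "num_crashes": num_crashes.get(key, 0),  # missing counts become 0
    }
    for key in keys
}
for key, row in wide.items():
    print(key, row)
```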


In [12]:
# Spot-check: fatal (condition 1) rows for county 9 (Hudson)
hc = cmymc.loc[9].reset_index()
hcf = hc[hc.condition == 1]
hcf

,mc,y,m,condition,drivers,passengers,pedestrians,cyclists,num_crashes
6,1,2001,3,1,2,0,0,0,2
16,1,2001,7,1,2,0,0,0,2
32,1,2001,12,1,1,0,0,0,1
74,1,2003,5,1,0,0,1,0,1
130,1,2005,1,1,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...
7324,12,2019,5,1,0,0,1,0,1
7329,12,2019,6,1,0,1,0,0,1
7403,12,2021,5,1,0,0,1,0,1
7422,12,2021,10,1,0,0,1,0,1


In [13]:
hcf.sum()

mc               2662
y              820673
m                2673
condition         408
drivers           199
passengers         79
pedestrians       208
cyclists           20
num_crashes       480
dtype: Int64

In [14]:
%%time
sql.write(
    cmymc, 'cmymc', CMYMC_DB,
    idxs=[('cc', 'mc', 'y', 'm', 'condition')],
    rm=True,
    page_size=2**16,
)

Removing www/public/njdot/cmymc.db
Writing 378481 rows to www/public/njdot/cmymc.db (cmymc)
Wrote DB: 30752768 bytes
After indices: 37879808 bytes


CPU times: user 2.49 s, sys: 278 ms, total: 2.77 s
Wall time: 2.9 s


After setting page_size=65536 and vacuum: 36241408 bytes
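The last log line above reports a page-size change followed by `VACUUM`. `nj_crashes.utils.sql.write` is not shown here, so the sketch below is only an assumption about what that step amounts to, written against the stdlib `sqlite3` module. SQLite applies a new `page_size` to an existing database file only when `VACUUM` rebuilds it (and not in WAL mode). The scratch file name is hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")  # hypothetical scratch database
    # Autocommit mode, with explicit transactions: VACUUM cannot run inside one.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v INTEGER)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i * i) for i in range(1000)])
    conn.execute("COMMIT")
    before = conn.execute("PRAGMA page_size").fetchone()[0]

    # The new page_size only takes effect when VACUUM rebuilds the file.
    conn.execute("PRAGMA page_size = 65536")
    conn.execute("VACUUM")
    after = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()
    size = os.path.getsize(path)

print(f"page_size {before} -> {after}; file is {size} bytes")
```

After the rebuild, the file size is a whole number of 64 KiB pages.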


In [15]:
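def sum_idx_col(df0, col, tbl_suffix='', page_size=None):
    """Sum out index level `col`, append the result to CMYMC_DB, and return it.

    The table name is the first letter of each remaining index level, plus `tbl_suffix`
    (e.g. summing 'm' out of (cc, mc, y, m, condition) gives "cmyc").
    """
    idx_cols0 = df0.index.names
    idx_cols1 = [ k for k in idx_cols0 if k != col ]
    assert len(idx_cols1) + 1 == len(idx_cols0)
    df1 = df0.reset_index().drop(columns=col).groupby(idx_cols1).sum()
    tbl = ''.join([ k[0] for k in idx_cols1 ]) + tbl_suffix
    sql.write(
        df1, tbl, CMYMC_DB,
        idxs=[tuple(idx_cols1)],
        replace=False,
        page_size=page_size,
    )
    return df1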
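`sum_idx_col` names each output table after the first letter of every remaining index column, plus an optional suffix. The standalone copy of that naming rule below (`table_name` is a hypothetical helper) shows how the names produced by the calls below line up. Note that `c` and `m` are each overloaded (`cc`/`condition`, `mc`/`m`), so a name is only unambiguous given the fixed column order.

```python
def table_name(idx_cols, summed_out, suffix=""):
    """Mirror sum_idx_col's naming: initials of the remaining index columns + suffix."""
    remaining = [k for k in idx_cols if k != summed_out]
    assert len(remaining) + 1 == len(idx_cols), f"{summed_out!r} is not an index column"
    return "".join(k[0] for k in remaining) + suffix


CMYMC = ["cc", "mc", "y", "m", "condition"]
CMYM = ["cc", "mc", "y", "m"]

print(table_name(CMYMC, "m"))  # cmyc
print(table_name(CMYMC, "mc"))  # cymc
print(table_name(CMYM, "m", suffix="v"))  # cmyv
print(table_name(["y", "m"], "m", suffix="v"))  # yv
```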

Sum over months:

In [16]:
cmyc = sum_idx_col(cmymc, 'm')
cmyc.sort_index().reset_index().to_parquet(CMYC_PQT, index=False)
cmyc

Writing 49508 rows to www/public/njdot/cmymc.db (cmyc)
Wrote DB: 39911424 bytes
After indices: 40828928 bytes


cc,mc,y,condition,drivers,passengers,pedestrians,cyclists,num_crashes
1,1,2001,1,1,1,0,0,2
1,1,2001,2,4,4,1,0,5
1,1,2001,3,26,8,1,1,27
1,1,2001,4,90,53,3,1,94
1,1,2001,5,0,0,0,0,184
...,...,...,...,...,...,...,...,...
21,23,2022,1,1,0,0,0,1
21,23,2022,2,3,0,0,0,3
21,23,2022,3,12,1,0,0,12
21,23,2022,4,13,2,0,0,11


Sum over municipalities:

In [17]:
cymc = sum_idx_col(cmymc, 'mc')
cymc

Writing 26623 rows to www/public/njdot/cmymc.db (cymc)
Wrote DB: 42991616 bytes
After indices: 43515904 bytes


cc,y,m,condition,drivers,passengers,pedestrians,cyclists,num_crashes
1,2001,1,1,0,0,2,0,2
1,2001,1,2,10,5,3,0,12
1,2001,1,3,44,17,4,1,55
1,2001,1,4,155,86,9,5,164
1,2001,1,5,0,0,0,0,496
...,...,...,...,...,...,...,...,...
21,2022,12,1,1,0,0,0,1
21,2022,12,2,2,0,0,0,2
21,2022,12,3,16,4,0,0,16
21,2022,12,4,21,7,0,0,21


In [18]:
ymc = sum_idx_col(cymc, 'cc')
ymc

Writing 1320 rows to www/public/njdot/cmymc.db (ymc)
Wrote DB: 43778048 bytes
After indices: 43843584 bytes


y,m,condition,drivers,passengers,pedestrians,cyclists,num_crashes
2001,1,1,24,9,8,1,41
2001,1,2,144,50,34,2,182
2001,1,3,1048,390,150,20,1262
2001,1,4,5121,2129,336,37,5187
2001,1,5,0,0,0,0,22225
...,...,...,...,...,...,...,...
2022,12,1,29,5,21,1,53
2022,12,2,171,48,48,5,239
2022,12,3,1546,462,194,38,1718
2022,12,4,2635,926,180,38,2673


In [19]:
cyc = sum_idx_col(cymc, 'm')
cyc.sort_index().reset_index().to_parquet(CYC_PQT, index=False)
cyc

Writing 2310 rows to www/public/njdot/cmymc.db (cyc)
Wrote DB: 44105728 bytes
After indices: 44171264 bytes


cc,y,condition,drivers,passengers,pedestrians,cyclists,num_crashes
1,2001,1,38,14,14,5,65
1,2001,2,119,48,28,8,158
1,2001,3,617,274,72,47,756
1,2001,4,1982,1157,93,70,2121
1,2001,5,0,0,0,0,6639
...,...,...,...,...,...,...,...
21,2022,1,11,4,1,0,16
21,2022,2,33,7,6,1,35
21,2022,3,216,42,1,0,205
21,2022,4,203,76,3,2,205


In [20]:
yc = sum_idx_col(cyc, 'cc')
yc.sort_index().reset_index().to_parquet(YC_PQT, index=False)
yc

Writing 110 rows to www/public/njdot/cmymc.db (yc)
Wrote DB: 44367872 bytes
After indices: 44433408 bytes


y,condition,drivers,passengers,pedestrians,cyclists,num_crashes
2001,1,418,158,118,23,662
2001,2,1899,661,427,124,2595
2001,3,13972,5609,1931,958,17718
2001,4,58388,27542,3514,1384,60004
2001,5,0,0,0,0,231717
...,...,...,...,...,...,...
2022,1,424,91,196,16,689
2022,2,2195,659,534,196,3055
2022,3,18769,5928,1501,922,20677
2022,4,28652,10363,1457,613,28626


In [21]:
v = vehicles.load()
v

Reading njdot/data/vehicles.parquet


id,crash_id,vn,ins_co,owner_state,make,model,color,vy,state,rm_by,impact_loc,damage_loc,type,use,cargo_type,cir1,cir2,dir,act,ev1,ev2,ev3,ev4,oversize,hit_run,departure,damage,ev
0,0,1,426,NJ,NISSAN MAXIMA,,BUR,1991,NJ,,8,7,1,,,25,,1,3,26,,,,,False,1,,
1,0,2,989,NJ,LINCOLN TOWNCAR,,BK,1996,NJ,2,12,12,6,,0,4,,2,3,26,,,,0,False,1,,
2,1,1,962,NJ,TOYOTA 4DR,,GRN,1997,NJ,1,11,,1,,,25,,3,10,28,,,,,False,1,,
3,1,2,,,,,,0,,,0,0,5,,0,2,,1,1,26,,,,0,False,0,,
4,2,1,85,NJ,CHEVY CORSICA,,PUR,1996,NJ,3,8,15,1,,,25,,3,1,26,1,,,,False,6,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11907735,6319791,1,413,NJ,MAZDA,TRI,GY,2006,NJ,2,12,12,4,01,,52,,4,1,60,,,,,False,0,4,60
11907736,6319792,1,102,NJ,TOYOTA,TUNDRA,GN,2022,NJ,2,6,6,5,01,,29,,2,12,26,,,,,False,0,2,26
11907737,6319792,2,100,NJ,HONDA,HRV,GY,2020,NJ,1,12,12,4,01,,29,,2,12,26,,,,,False,0,2,26
11907738,6319793,1,426,NJ,CHEVROLET,TRA,RD,2012,NJ,3,11,11,4,01,,4,,3,6,26,,,,,False,0,4,26


In [22]:
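# Per-vehicle flags; they can overlap, and only `hit_run`, `towed`, and `disabled` are aggregated below
vm = v.merge(c, left_on='crash_id', right_index=True, how='left', validate='m:1')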
vm['towed'] = vm.departure >= 3
vm['impounded'] = (vm.departure == 4) | (vm.departure == 5)
vm['driven'] = vm.departure == 1
vm['disabled'] = (vm.departure == 3) | (vm.departure == 5) | (vm.damage == 4)
vm['condition'] = vm.damage.fillna(0).astype(int)
vm['left'] = vm.departure == 2
vm.loc[vm.disabled, 'condition'] = 4
vm

id,crash_id,vn,ins_co,owner_state,make,model,color,vy,state,rm_by,impact_loc,damage_loc,type,use,cargo_type,cir1,cir2,dir,act,ev1,ev2,ev3,ev4,oversize,hit_run,departure,damage,ev,y,m,cc,mc,severity,tk,ti,pk,pi,towed,impounded,driven,disabled,condition,left
0,0,1,426,NJ,NISSAN MAXIMA,,BUR,1991,NJ,,8,7,1,,,25,,1,3,26,,,,,False,1,,,2001,12,1,1,p,0,0,0,0,False,False,True,,0,False
1,0,2,989,NJ,LINCOLN TOWNCAR,,BK,1996,NJ,2,12,12,6,,0,4,,2,3,26,,,,0,False,1,,,2001,12,1,1,p,0,0,0,0,False,False,True,,0,False
2,1,1,962,NJ,TOYOTA 4DR,,GRN,1997,NJ,1,11,,1,,,25,,3,10,28,,,,,False,1,,,2001,1,1,1,p,0,0,0,0,False,False,True,,0,False
3,1,2,,,,,,0,,,0,0,5,,0,2,,1,1,26,,,,0,False,0,,,2001,1,1,1,p,0,0,0,0,False,False,False,,0,False
4,2,1,85,NJ,CHEVY CORSICA,,PUR,1996,NJ,3,8,15,1,,,25,,3,1,26,1,,,,False,6,,,2001,4,1,1,i,0,4,0,0,True,False,False,,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11907735,6319791,1,413,NJ,MAZDA,TRI,GY,2006,NJ,2,12,12,4,01,,52,,4,1,60,,,,,False,0,4,60,2022,12,21,23,p,0,0,0,0,False,False,False,True,4,False
11907736,6319792,1,102,NJ,TOYOTA,TUNDRA,GN,2022,NJ,2,6,6,5,01,,29,,2,12,26,,,,,False,0,2,26,2022,12,21,23,p,0,0,0,0,False,False,False,False,2,False
11907737,6319792,2,100,NJ,HONDA,HRV,GY,2020,NJ,1,12,12,4,01,,29,,2,12,26,,,,,False,0,2,26,2022,12,21,23,p,0,0,0,0,False,False,False,False,2,False
11907738,6319793,1,426,NJ,CHEVROLET,TRA,RD,2012,NJ,3,11,11,4,01,,4,,3,6,26,,,,,False,0,4,26,2022,12,21,23,i,0,1,0,0,False,False,False,True,4,False


In [23]:
cmymv = vm.groupby(cmym_cols)[['hit_run', 'towed', 'disabled']].sum()
cmymv

cc,mc,y,m,hit_run,towed,disabled
1,1,2001,1,0,14,0
1,1,2001,2,0,10,0
1,1,2001,3,0,22,0
1,1,2001,4,0,27,0
1,1,2001,5,0,10,0
...,...,...,...,...,...,...
21,23,2022,8,0,0,3
21,23,2022,9,0,0,3
21,23,2022,10,1,1,9
21,23,2022,11,0,1,4


In [24]:
sql.write(
    cmymv, 'cmymv', CMYMC_DB,
    idxs=[cmym_cols],
    replace=False,
    # page_size=2**16,
)

Writing 138626 rows to www/public/njdot/cmymc.db (cmymv)
Wrote DB: 53936128 bytes
After indices: 56360960 bytes


In [25]:
cmyv = sum_idx_col(cmymv, 'm', tbl_suffix='v')
cmyv

Writing 12275 rows to www/public/njdot/cmymc.db (cmyv)
Wrote DB: 57278464 bytes
After indices: 57540608 bytes


cc,mc,y,hit_run,towed,disabled
1,1,2001,0,193,0
1,1,2002,0,191,0
1,1,2003,0,238,0
1,1,2004,0,191,0
1,1,2005,0,181,0
...,...,...,...,...,...
21,23,2018,4,99,99
21,23,2019,6,85,88
21,23,2020,7,65,66
21,23,2021,3,31,60


In [26]:
cymv = sum_idx_col(cmymv, 'mc', tbl_suffix='v')
cymv

Writing 5544 rows to www/public/njdot/cmymc.db (cymv)
Wrote DB: 57933824 bytes
After indices: 58130432 bytes


cc,y,m,hit_run,towed,disabled
1,2001,1,0,343,0
1,2001,2,0,419,0
1,2001,3,0,314,0
1,2001,4,0,332,0
1,2001,5,0,350,0
...,...,...,...,...,...
21,2022,8,7,37,76
21,2022,9,7,42,79
21,2022,10,7,32,101
21,2022,11,6,31,93


In [27]:
cyv = sum_idx_col(cymv, 'm', tbl_suffix='v')
cyv

Writing 462 rows to www/public/njdot/cmymc.db (cyv)
Wrote DB: 58327040 bytes
After indices: 58392576 bytes


cc,y,hit_run,towed,disabled
1,2001,0,4442,0
1,2002,0,4795,0
1,2003,0,5125,0
1,2004,0,5476,0
1,2005,0,5272,0
...,...,...,...,...
21,2018,248,1957,2011
21,2019,270,1880,1936
21,2020,215,1422,1467
21,2021,202,1265,1615


In [28]:
ymv = sum_idx_col(cymv, 'cc', tbl_suffix='v')
ymv

Writing 264 rows to www/public/njdot/cmymc.db (ymv)
Wrote DB: 58589184 bytes
After indices: 58654720 bytes


y,m,hit_run,towed,disabled
2001,1,0,11579,0
2001,2,0,9624,0
2001,3,0,9770,0
2001,4,0,9166,0
2001,5,0,10659,0
...,...,...,...,...
2022,8,2454,6982,8734
2022,9,2458,7409,9300
2022,10,2728,8973,11653
2022,11,2450,7911,10567


In [29]:
yv = sum_idx_col(ymv, 'm', tbl_suffix='v', page_size=2**16)
yv

Writing 22 rows to www/public/njdot/cmymc.db (yv)
Wrote DB: 58785792 bytes
After indices: 58851328 bytes
After setting page_size=65536 and vacuum: 57933824 bytes


y,hit_run,towed,disabled
2001,0,129096,0
2002,0,136971,0
2003,0,139858,0
2004,0,141870,0
2005,0,138166,0
2006,39032,126134,0
2007,41291,128322,0
2008,40903,123938,0
2009,39749,122630,0
2010,39654,120327,0


In [30]:
import boto3
s3 = boto3.client('s3')

In [31]:
s3.upload_file(CMYMC_DB, Bucket='nj-crashes', Key=f'njdot/data/{basename(CMYMC_DB)}')