# Data Trends in Allegheny County's Fatal Accidental Overdoses, 2007-Present

The code below reads in a CSV file of fatal accidental drug overdoses that have occurred in Allegheny County since 2007, then tallies the fatalities by age, race, sex, location (zip code), and incident year.
<br>

The data is available through the [Western PA Regional Data Center](https://data.wprdc.org/dataset/allegheny-county-fatal-accidental-overdoses). 

**Research questions:**<br>
1) Which year in the dataset had the highest number of fatal accidental overdoses? <br>
2) Between which two consecutive years did the number of fatalities change the most? 

In [19]:
# I used the same structure from Eric's Tri Water demo during our Week 3 class for the function below

from csv import DictReader

def sort_data_dict():
    """creates a dictionary to store pertinent data from the original dataset, reads in the csv of the original dataset,
    and loops through the data to organize it into the new dictionary"""
    # Create data tracking container prior to iterating over csv
    # empty dicts as values for keys will be added to as we iterate
    od_summary = {'incident_count':0, 'year':{}, 'sex':{}, 'race':{}, 'age':{}, 'location':{}}
    # use with open to read in file object
    with open('WPRDC_fatal_overdose.csv') as overdose_file:
        # pass file object to DictReader, creating an iterable reader object
        d_reader = DictReader(overdose_file)
        # loop over each row and tally its values in od_summary
        for record in d_reader:
            # tabulate records as we go
            od_summary['incident_count'] = od_summary['incident_count'] + 1
            #print('Running record count: ', od_summary['incident_count'])
            # check location dict for presence of our current records' zip code
            if record['incident_zip'] not in od_summary['location']:
                # we need to add the zip to our dict with count 1
                od_summary['location'][record['incident_zip']] = 1
            else:
                # when triggered, our zip is already in our location dict
                od_summary['location'][record['incident_zip']] += 1
            if record['case_year'] not in od_summary['year']:
                od_summary['year'][record['case_year']] = 1
            else:
                od_summary['year'][record['case_year']] += 1
            if record['age'] not in od_summary['age']:
                od_summary['age'][record['age']] = 1
            else:
                od_summary['age'][record['age']] += 1
            if record['sex'] not in od_summary['sex']:
                od_summary['sex'][record['sex']] = 1
            else: 
                od_summary['sex'][record['sex']] += 1
            if record['race'] not in od_summary['race']:
                od_summary['race'][record['race']] = 1
            else:
                od_summary['race'][record['race']] += 1
        return od_summary
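
The five if/else blocks above all follow the same "add the key or bump its count" pattern. As a possible alternative (a sketch, not the approach used in this notebook), `collections.Counter` can do that bookkeeping. The `SAMPLE_CSV` rows, the `FIELDS` mapping, and the `summarize` name are made up for illustration; only the column names come from the code above.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row sample using the same column names as the code above;
# the values are invented for illustration only.
SAMPLE_CSV = """case_year,age,sex,race,incident_zip
2016,34,M,W,15210
2016,51,F,B,15212
2017,34,M,W,15210
"""

# Maps each summary category to the CSV column it is tallied from.
FIELDS = {'year': 'case_year', 'age': 'age', 'sex': 'sex',
          'race': 'race', 'location': 'incident_zip'}

def summarize(file_obj):
    """Tally records per category value, using one Counter per category."""
    summary = {name: Counter() for name in FIELDS}
    summary['incident_count'] = 0
    for record in csv.DictReader(file_obj):
        summary['incident_count'] += 1
        for name, column in FIELDS.items():
            # A Counter returns 0 for unseen keys, so no "not in" check is needed
            summary[name][record[column]] += 1
    return summary

sample_summary = summarize(io.StringIO(SAMPLE_CSV))
print(sample_summary['year'])
```

Because `summarize` takes any file object, the same function should also work with `open('WPRDC_fatal_overdose.csv')` in place of the in-memory sample.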


In [20]:
# The function below displays the count of each value that occurs under any dictionary key in od_summary

def view_agg_data(dict_key):
    """displays total number of records for each value in a given dictionary key"""
    od_summary = sort_data_dict()
    for value in sorted(od_summary[dict_key]):
        print(value, ': ', od_summary[dict_key][value])

view_agg_data('year')

2007 :  27353
2008 :  28276
2009 :  26793
2010 :  27451
2011 :  31689
2012 :  34969
2013 :  32875
2014 :  37419
2015 :  51581
2016 :  78983
2017 :  90271
2018 :  52623
2019 :  60355
2020 :  49986
2021 :  25


In [21]:
# calling the view_agg_data function again to view breakdown by race...
view_agg_data('race')

 :  559
A :  1205
B :  95903
H :  2807
I :  243
M :  490
O :  1396
U :  244
W :  527802


In [22]:
# and again to view data by sex...
view_agg_data('sex')

 :  244
F :  195783
M :  434622


In [23]:
# and again to view data by age...
view_agg_data('age')

 :  244
0 :  122
1 :  611
12 :  122
15 :  247
16 :  733
17 :  1099
18 :  1897
19 :  4280
20 :  5603
21 :  6342
22 :  6740
23 :  9122
24 :  10659
25 :  12751
26 :  13642
27 :  15136
28 :  17041
29 :  18042
30 :  18192
31 :  16346
32 :  16980
33 :  14960
34 :  18187
35 :  14838
36 :  17005
37 :  17089
38 :  15985
39 :  17103
40 :  13922
41 :  14252
42 :  12827
43 :  13125
44 :  13850
45 :  15655
46 :  16797
47 :  16110
48 :  13899
49 :  16692
50 :  15067
51 :  19752
52 :  18305
53 :  14181
54 :  14800
55 :  13080
56 :  13675
57 :  12141
58 :  13575
59 :  11075
60 :  9110
61 :  6798
62 :  5267
63 :  6352
64 :  3792
65 :  4152
66 :  2448
67 :  1584
68 :  1955
69 :  1103
70 :  488
71 :  461
72 :  122
73 :  735
74 :  551
75 :  489
76 :  245
78 :  123
79 :  122
83 :  122
84 :  244
85 :  122
88 :  244
91 :  122


In [13]:
# And once more to view data by zip code. There are a LOT of null and invalid entries in the zip code column.
view_agg_data('location')

 :  61855
10002 :  610
1201 :  605
1220 :  615
12501 :  240
13219 :  610
14218 :  610
14607 :  605
14769 :  610
15-71 :  615
15001 :  3670
15003 :  6110
15005 :  2440
15007 :  1230
15009 :  605
15010 :  2450
15012 :  1230
15014 :  8590
15015 :  1830
15016 :  610
15017 :  22020
15018 :  1230
15020 :  1820
15021 :  615
15022 :  1840
15024 :  14725
15025 :  40060
15026 :  2440
15027 :  610
15030 :  2435
15031 :  1830
15033 :  1835
15034 :  6860
15035 :  6140
15037 :  20900
15038 :  1220
15041 :  610
15042 :  1830
15044 :  26840
15045 :  21780
15046 :  5520
15047 :  610
15049 :  2450
15051 :  2445
15052 :  615
15056 :  1215
15057 :  11605
15058 :  610
15061 :  1520
15063 :  1830
15064 :  2440
15065 :  26165
15066 :  2440
15068 :  25790
15071 :  11485
15074 :  3650
15076 :  3060
15081 :  610
15083 :  605
15084 :  30925
15085 :  5490
15086 :  615
15088 :  1725
15089 :  1225
15090 :  21850
15101 :  22630
15102 :  31150
15104 :  33375
15106 :  59765
15107 :  615
15108 :  59055
15110 :  20520
1
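
Since the zip code column has so many blank and odd-looking entries, one way to size up the problem is to sort each key into a rough quality bucket before counting. This is only a sketch: the `classify_zip` helper and its bucket names are hypothetical, and it assumes (without checking the dataset documentation) that in-county zip codes are five digits starting with 15.

```python
import re

FIVE_DIGITS = re.compile(r'\d{5}')

def classify_zip(value):
    """Return a rough quality label for one incident_zip value."""
    value = value.strip()
    if not value:
        return 'missing'
    if not FIVE_DIGITS.fullmatch(value):
        return 'malformed'
    # Assumption: zip codes in and around Allegheny County start with 15
    if not value.startswith('15'):
        return 'outside_15xxx'
    return 'ok'

# A few keys taken from the view_agg_data('location') output above
sample_zips = ['', '10002', '1201', '15-71', '15001', '15108']
for z in sample_zips:
    print(repr(z), '->', classify_zip(z))
```

Summing the record counts per label would show how much of the dataset can actually be mapped to a location.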

### Fatality rates: year-to-year comparison

In [32]:
def calc_percent_change():
    """loops through data dictionary to sum up fatalities by year, calculate % change from one year to the next, 
    and then print the output"""
    od_summary = sort_data_dict()
    prevkey = 0
    for i in sorted(od_summary['year']):
        print(i, ':',od_summary['year'][i], sep=' ', end='')
        # fractional change = (current - previous) / previous
        if prevkey!= 0:
            percent_change = (od_summary['year'][i] - od_summary['year'][prevkey]) / od_summary['year'][prevkey]
            print(', change: ', percent_change)
        else:
            # the first year has no prior year to compare against, so just end the line
            print()
        # move our cursor to remember the last key
        prevkey = i

calc_percent_change()

2007 : 27353
2008 : 28276, change:  0.03374401345373451
2009 : 26793, change:  -0.0524473051350969
2010 : 27451, change:  0.02455865337961408
2011 : 31689, change:  0.15438417543987468
2012 : 34969, change:  0.10350594843636593
2013 : 32875, change:  -0.05988160942549115
2014 : 37419, change:  0.13822053231939163
2015 : 51581, change:  0.3784708303268393
2016 : 78983, change:  0.5312421240379209
2017 : 90271, change:  0.14291683020396795
2018 : 52623, change:  -0.41705531122952
2019 : 60355, change:  0.14693194990783498
2020 : 49986, change:  -0.1718001822549913
2021 : 25, change:  -0.999499859960789
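
Both research questions can also be answered in code rather than by reading down the printed list. The sketch below copies the yearly totals from the output above into a dict; the `yearly_changes` helper is a hypothetical name, and dropping 2021 from the comparison is my own assumption because that year is clearly incomplete.

```python
# Yearly totals copied from the view_agg_data('year') output above
year_counts = {
    2007: 27353, 2008: 28276, 2009: 26793, 2010: 27451, 2011: 31689,
    2012: 34969, 2013: 32875, 2014: 37419, 2015: 51581, 2016: 78983,
    2017: 90271, 2018: 52623, 2019: 60355, 2020: 49986, 2021: 25,
}

def yearly_changes(counts):
    """Return {year: fractional change from the previous year in counts}."""
    years = sorted(counts)
    return {curr: (counts[curr] - counts[prev]) / counts[prev]
            for prev, curr in zip(years, years[1:])}

# Question 1: the year with the highest count
peak_year = max(year_counts, key=year_counts.get)

# Question 2: the largest swing in either direction.
# Assumption: 2021 is treated as incomplete and left out of the comparison.
complete = {year: n for year, n in year_counts.items() if year != 2021}
changes = yearly_changes(complete)
biggest_swing_year = max(changes, key=lambda year: abs(changes[year]))

print(peak_year, year_counts[peak_year])
print(biggest_swing_year, f"{changes[biggest_swing_year]:.1%}")
```

Comparing on `abs(...)` means a large drop would win over a smaller rise, which matches the "greatest fluctuation" wording of the question.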


Of all the years in the dataset, 2017 had the highest number of fatal accidental drug overdoses in Allegheny County (90,271). 

Setting aside the incomplete 2021 data, the largest year-to-year change happened from 2015 to 2016, when total overdose fatalities in Allegheny County increased by 53%. 

Notably, there was a steep decline in fatalities (-42%) just two years later, from 2017 to 2018. 

The dataset currently shows a 17% decline in fatalities in 2020, but that figure may still change. Since we are not even two months past the end of 2020, additional 2020 data may be added to this dataset in the coming weeks (the low number of fatalities recorded in 2021 so far suggests that much of the data is entered on a delay). 

For further research, it would be interesting to break down the time-of-death data column by month to compare monthly changes in fatality rates. 
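
A rough sketch of how that monthly breakdown could work is below. I have not checked the name or format of the time-of-death column, so the sample timestamps, the `TIMESTAMP_FORMAT` string, and the `count_by_month` helper are all assumptions that would need adjusting to the real data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps; the real column's format may differ
sample_deaths = ['2016-01-14 03:20:00', '2016-01-30 22:05:00',
                 '2016-02-02 11:45:00']
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'

def count_by_month(timestamps, fmt=TIMESTAMP_FORMAT):
    """Count records per (year, month), skipping values that fail to parse."""
    counts = Counter()
    for raw in timestamps:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            # blank or malformed timestamps are left out of the tally
            continue
        counts[(parsed.year, parsed.month)] += 1
    return counts

monthly = count_by_month(sample_deaths)
for (year, month), n in sorted(monthly.items()):
    print(f'{year}-{month:02d}: {n}')
```

Keying on `(year, month)` tuples keeps the months in calendar order when sorted, so the same percent-change logic used for years could be applied from one month to the next.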