# EXERCISE SOLUTIONS: NOT FOR POSTING
# Merging Data to Understand the Relationship between Drug Legalization and Violent Crime



In recent years, many US states have decided to legalize the use of marijuana. 

When these ideas were first proposed, there were many theories about the relationship between crime and the "War on Drugs" (the term given to US efforts to arrest drug users and dealers over the past several decades). 

In this exercise, we're going to test a few of those theories using drug arrest data from the state of California. 

Though California has passed a number of laws lessening penalties for marijuana possession over the years, arguably the biggest changes came in 2010, when the state changed the penalty for possessing a small amount of marijuana from a criminal offense to a "civil" penalty (meaning those found guilty only had to pay a fine, not go to jail), though possessing, selling, or producing larger quantities remained illegal. Then in 2016, the state fully legalized marijuana for recreational use, not only making possession of small amounts legal, but also creating a regulatory system for producing marijuana for sale.

Proponents of drug legalization have long argued that the war on drugs contributes to violent crime by creating an opportunity for drug dealers and organized crime to sell and distribute drugs, a business which tends to generate violence when gangs battle over territory. According to this theory, we should see violent crime decrease after legalization in places where drug arrests had previously been common. In this exercise, we will examine this argument by exploring the relationship between drug legalization and violent crime.

**To be clear,** drug legalization is a complex issue and far more study than what we will do here is required to understand its complexities! This exercise is meant to help you think through how to address data science questions programmatically.

## EXERCISE 1: Loading our data pre-legalization

**(1)** We will begin by examining [county-level data on arrests from California in 2009](https://github.com/nickeubank/practicaldatascience/tree/master/Example_Data/ca), which is derived from data provided by the Office of the California State Attorney General [here](https://openjustice.doj.ca.gov/data). Load the file `ca_arrests_2009.csv`. 

In [60]:
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 20)

file2009 = "data/ca_arrests_2009.csv"
df_arrests_2009 = pd.read_csv(file2009)
df_arrests_2009

Unnamed: 0.1,Unnamed: 0,COUNTY,VIOLENT,PROPERTY,F_DRUGOFF,F_SEXOFF,F_ALLOTHER,F_TOTAL,M_TOTAL,S_TOTAL
0,1682,Alameda County,4318,4640,5749,260,3502,18469,37247,431
1,1683,Alpine County,8,4,2,1,1,16,83,0
2,1684,Amador County,100,59,101,5,199,464,801,2
3,1685,Butte County,641,602,542,34,429,2248,9026,1
4,1686,Calaveras County,211,83,123,14,70,501,968,3
...,...,...,...,...,...,...,...,...,...,...
53,1735,Tulare County,2183,2080,1681,170,1767,7881,15368,301
54,1736,Tuolumne County,160,252,209,14,261,896,2062,83
55,1737,Ventura County,2275,2425,2040,120,2083,8943,25762,1734
56,1738,Yolo County,585,634,614,39,662,2534,5426,73


**(2)** Use your data exploration skills to get a feel for this data. If you need to, you can find the [original codebook here](https://data-openjustice.doj.ca.gov/sites/default/files/dataset/2019-07/Arrests%20Context_062119.pdf). (These data are similar, but have been collapsed to one observation per county.)
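A first pass at this kind of exploration might look like the sketch below. It rebuilds three rows from the 2009 arrests table shown above as a stand-in for the full file, so the name `sample` and the choice of columns are illustrative only.

```python
import pandas as pd

# Three rows copied from the 2009 arrests output above (a stand-in for the full file)
sample = pd.DataFrame(
    {
        "COUNTY": ["Alameda County", "Alpine County", "Amador County"],
        "VIOLENT": [4318, 8, 100],
        "F_DRUGOFF": [5749, 2, 101],
    }
)

sample.info()                        # column dtypes and non-null counts
print(sample.describe())             # summary statistics for the numeric columns
print(sample["COUNTY"].is_unique)    # is there exactly one row per county?
print(sample.isna().sum())           # missing values per column
```

Checking that `COUNTY` is unique is worth doing early, since that column later serves as the merge key.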

**(3)** Figuring out what county has the most violent arrests isn't very meaningful if we don't normalize for size. A county with 10 people and 10 arrests for violent crimes is obviously worse than a county with 1,000,000 people and 11 arrests for violent crime.
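The two hypothetical counties from that example can be put into a small table to show how the ranking flips once arrests are divided by population. The county names `Tiny` and `Huge` are made up for illustration.

```python
import pandas as pd

# Hypothetical counties matching the example in the text
toy = pd.DataFrame(
    {
        "county": ["Tiny", "Huge"],
        "violent_arrests": [10, 11],
        "population": [10, 1_000_000],
    }
)
toy["rate"] = toy["violent_arrests"] / toy["population"]

# Which county ranks first by raw count, and which by rate?
top_by_count = toy.sort_values("violent_arrests", ascending=False)["county"].iloc[0]
top_by_rate = toy.sort_values("rate", ascending=False)["county"].iloc[0]
print(top_by_count, top_by_rate)
```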

To address this, also import `nhgis_county_populations.csv`.

In [61]:
pop_file = "data/nhgis_county_populations.csv"
df_pop = pd.read_csv(pop_file)
df_pop

Unnamed: 0.1,Unnamed: 0,YEAR,STATE,COUNTY,total_population
0,0,2005-2009,Alabama,Autauga County,49584
1,1,2005-2009,Alabama,Baldwin County,171997
2,2,2005-2009,Alabama,Barbour County,29663
3,3,2005-2009,Alabama,Bibb County,21464
4,4,2005-2009,Alabama,Blount County,56804
...,...,...,...,...,...
6436,3215,2013-2017,Puerto Rico,Vega Baja Municipio,54754
6437,3216,2013-2017,Puerto Rico,Vieques Municipio,8931
6438,3217,2013-2017,Puerto Rico,Villalba Municipio,23659
6439,3218,2013-2017,Puerto Rico,Yabucoa Municipio,35025


**(4)** Use your data exploration skills to get used to these data and figure out how they relate to your 2009 arrest data. Determine the meaning of the various columns and check the data for completeness.
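One way to see how the population table relates to the arrest table is to look at which years and states it covers, and whether a county name alone identifies a row. The sketch below uses four rows copied from the outputs in this notebook; `pop_sample` is a hypothetical stand-in for the full `df_pop`.

```python
import pandas as pd

# Rows copied from the population outputs shown in this notebook
pop_sample = pd.DataFrame(
    {
        "YEAR": ["2005-2009", "2005-2009", "2013-2017", "2013-2017"],
        "STATE": ["Alabama", "California", "California", "Puerto Rico"],
        "COUNTY": ["Autauga County", "Alameda County", "Alameda County", "Vega Baja Municipio"],
        "total_population": [49584, 1457095, 1629615, 54754],
    }
)

print(pop_sample["YEAR"].value_counts())   # which periods are covered?
print(pop_sample["STATE"].unique())        # which states are covered?

# COUNTY alone repeats across periods, but (YEAR, STATE, COUNTY) is unique
dupes_by_county = int(pop_sample.duplicated(subset=["COUNTY"]).sum())
dupes_by_full_key = int(pop_sample.duplicated(subset=["YEAR", "STATE", "COUNTY"]).sum())
print(dupes_by_county, dupes_by_full_key)
```

Because county names repeat across periods (and can repeat across states), the population data needs to be filtered to one period and one state before a one-to-one merge on `COUNTY` can work.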

## EXERCISE 2: Merging our data


**(5)** Once you feel like you have a good sense of the relation between our arrest and population data, merge the two datasets. You may need to filter the data first. Do both datasets cover all states, just some, or just one? Which years do you care about in this case?

In [62]:
# Filter the data so we're only using data from the correct year and the correct State
df_pop_filtered_2009 = df_pop.loc[df_pop["YEAR"]=="2005-2009"]
df_pop_filtered_2009 = df_pop_filtered_2009.loc[df_pop_filtered_2009["STATE"]=="California"]
df_pop_filtered_2009.loc[df_pop_filtered_2009["COUNTY"]=="Alameda County"]

Unnamed: 0.1,Unnamed: 0,YEAR,STATE,COUNTY,total_population
186,186,2005-2009,California,Alameda County,1457095



**(6)** Now repeat your previous merge using *both* the `validate` keyword *and* the `indicator` keyword with `how='outer'` as discussed in the last reading to help you debug your merge. 
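To illustrate what the two keywords do, here is a minimal sketch on toy data built from the Alameda County figures shown above. With `validate="1:1"`, pandas raises `pd.errors.MergeError` when the merge key is duplicated on either side, which is what would happen if the population data were not first filtered to a single period. The DataFrame names are hypothetical.

```python
import pandas as pd

arrests = pd.DataFrame({"COUNTY": ["Alameda County"], "VIOLENT": [4318]})

# Unfiltered population data: the same county appears once per period
pop_two_periods = pd.DataFrame(
    {
        "COUNTY": ["Alameda County", "Alameda County"],
        "YEAR": ["2005-2009", "2013-2017"],
        "total_population": [1457095, 1629615],
    }
)

try:
    pd.merge(arrests, pop_two_periods, on="COUNTY", how="outer",
             validate="1:1", indicator=True)
    merge_failed = False
except pd.errors.MergeError as err:
    merge_failed = True
    print(f"Merge rejected: {err}")

# After filtering to one period, the same merge passes validation
one_period = pop_two_periods[pop_two_periods["YEAR"] == "2005-2009"]
merged = pd.merge(arrests, one_period, on="COUNTY", how="outer",
                  validate="1:1", indicator=True)
print(merged["_merge"].value_counts())
```

The `_merge` column added by `indicator=True` reports, for every row, whether the key was found in both tables or in only one, which is what makes unmatched counties easy to spot.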

In [63]:
df_2009 = pd.merge(df_arrests_2009, df_pop_filtered_2009, how='left', on="COUNTY", validate="1:1")
df_2009

Unnamed: 0,Unnamed: 0_x,COUNTY,VIOLENT,PROPERTY,F_DRUGOFF,F_SEXOFF,F_ALLOTHER,F_TOTAL,M_TOTAL,S_TOTAL,Unnamed: 0_y,YEAR,STATE,total_population
0,1682,Alameda County,4318,4640,5749,260,3502,18469,37247,431,186.0,2005-2009,California,1457095.0
1,1683,Alpine County,8,4,2,1,1,16,83,0,187.0,2005-2009,California,1153.0
2,1684,Amador County,100,59,101,5,199,464,801,2,188.0,2005-2009,California,38039.0
3,1685,Butte County,641,602,542,34,429,2248,9026,1,189.0,2005-2009,California,217917.0
4,1686,Calaveras County,211,83,123,14,70,501,968,3,190.0,2005-2009,California,46548.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53,1735,Tulare County,2183,2080,1681,170,1767,7881,15368,301,239.0,2005-2009,California,416299.0
54,1736,Tuolumne County,160,252,209,14,261,896,2062,83,240.0,2005-2009,California,55761.0
55,1737,Ventura County,2275,2425,2040,120,2083,8943,25762,1734,241.0,2005-2009,California,792313.0
56,1738,Yolo County,585,634,614,39,662,2534,5426,73,242.0,2005-2009,California,192974.0


In [71]:
df_2009 = pd.merge(df_arrests_2009, df_pop_filtered_2009, how='outer', on="COUNTY", validate="1:1", indicator=True)
df_2009._merge.value_counts()

_merge
both          56
left_only      2
right_only     2
Name: count, dtype: int64

In [72]:
df_2009[df_2009._merge != "both"]

Unnamed: 0,Unnamed: 0_x,COUNTY,VIOLENT,PROPERTY,F_DRUGOFF,F_SEXOFF,F_ALLOTHER,F_TOTAL,M_TOTAL,S_TOTAL,Unnamed: 0_y,YEAR,STATE,total_population,_merge
7,1689.0,Del Norte County,144.0,104.0,79.0,13.0,97.0,437.0,1268.0,5.0,,,,,left_only
13,1695.0,Inyo County,81.0,44.0,39.0,3.0,38.0,205.0,851.0,1.0,,,,,left_only
58,,DelNorte County,,,,,,,,,193.0,2005-2009,California,28729.0,right_only
59,,Injo County,,,,,,,,,199.0,2005-2009,California,17438.0,right_only


**(7)** You *should* be able to get to the point that all counties in our arrest data merge with population data. Can you figure out why that did not happen? Using the tools we just discussed, look for any inconsistencies across the two datasets to see if anything did not match when it should have. Can you fix the data so that they all merge with population data? *Hint: what are the DataFrames being merged on? Does it match across both DataFrames?*
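One way to hunt for mismatched keys is to compare the sets of county names directly. The sketch below uses the county names that appear in the unmatched rows above, plus one matching county. The DataFrame names and the `fixes` dictionary are illustrative.

```python
import pandas as pd

arrests = pd.DataFrame({"COUNTY": ["Alameda County", "Del Norte County", "Inyo County"]})
pop = pd.DataFrame({"COUNTY": ["Alameda County", "DelNorte County", "Injo County"]})

# Names present in one table but not the other
only_in_arrests = sorted(set(arrests["COUNTY"]) - set(pop["COUNTY"]))
only_in_pop = sorted(set(pop["COUNTY"]) - set(arrests["COUNTY"]))
print(only_in_arrests)
print(only_in_pop)

# Correct the misspellings in the key column only, then re-check
fixes = {"DelNorte County": "Del Norte County", "Injo County": "Inyo County"}
pop_fixed = pop.assign(COUNTY=pop["COUNTY"].replace(fixes))
remaining = set(arrests["COUNTY"]) ^ set(pop_fixed["COUNTY"])
print(remaining)
```

Restricting the replacement to the `COUNTY` column is a design choice here: it avoids accidentally changing matching strings in other columns.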

In [73]:
df_pop_fixed_2009 = df_pop_filtered_2009.replace(to_replace=["DelNorte County","Injo County"], value=["Del Norte County", "Inyo County"])
df_pop_fixed_2009

Unnamed: 0.1,Unnamed: 0,YEAR,STATE,COUNTY,total_population
186,186,2005-2009,California,Alameda County,1457095
187,187,2005-2009,California,Alpine County,1153
188,188,2005-2009,California,Amador County,38039
189,189,2005-2009,California,Butte County,217917
190,190,2005-2009,California,Calaveras County,46548
...,...,...,...,...,...
239,239,2005-2009,California,Tulare County,416299
240,240,2005-2009,California,Tuolumne County,55761
241,241,2005-2009,California,Ventura County,792313
242,242,2005-2009,California,Yolo County,192974


In [74]:
df_2009 = pd.merge(df_arrests_2009, df_pop_fixed_2009, how='outer', on="COUNTY", validate="1:1", indicator=True)
df_2009._merge.value_counts()

_merge
both          58
left_only      0
right_only     0
Name: count, dtype: int64

In [75]:
df_2009

Unnamed: 0,Unnamed: 0_x,COUNTY,VIOLENT,PROPERTY,F_DRUGOFF,F_SEXOFF,F_ALLOTHER,F_TOTAL,M_TOTAL,S_TOTAL,Unnamed: 0_y,YEAR,STATE,total_population,_merge
0,1682,Alameda County,4318,4640,5749,260,3502,18469,37247,431,186,2005-2009,California,1457095,both
1,1683,Alpine County,8,4,2,1,1,16,83,0,187,2005-2009,California,1153,both
2,1684,Amador County,100,59,101,5,199,464,801,2,188,2005-2009,California,38039,both
3,1685,Butte County,641,602,542,34,429,2248,9026,1,189,2005-2009,California,217917,both
4,1686,Calaveras County,211,83,123,14,70,501,968,3,190,2005-2009,California,46548,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53,1735,Tulare County,2183,2080,1681,170,1767,7881,15368,301,239,2005-2009,California,416299,both
54,1736,Tuolumne County,160,252,209,14,261,896,2062,83,240,2005-2009,California,55761,both
55,1737,Ventura County,2275,2425,2040,120,2083,8943,25762,1734,241,2005-2009,California,792313,both
56,1738,Yolo County,585,634,614,39,662,2534,5426,73,242,2005-2009,California,192974,both


## EXERCISE 3: Calculating arrest rates and gathering 2018 data

**(8)** Now that we have arrest counts and population data, we can calculate arrest *rates*. For each county, create a new variable called `violent_arrest_rate_2009` that is the number of violent arrests for 2009 divided by the population of the county from 2005-2009, and an analogous variable for drug offenses (`F_DRUGOFF`) called `f_drugoff_arrest_rate_2009`. 

In [76]:
df_2009['violent_arrest_rate_2009'] = df_2009['VIOLENT'] / df_2009['total_population']
df_2009['f_drugoff_arrest_rate_2009'] = df_2009['F_DRUGOFF'] / df_2009['total_population']
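As a spot check, the rates for a single county can be recomputed by hand. The sketch below uses the Alameda County figures from the merged table above. The per-100,000 conversion is a common reporting convention added here for readability, not something the exercise asks for.

```python
# Alameda County figures copied from the merged 2009 table above
violent_arrests = 4318
drug_arrests = 5749
population = 1457095

violent_rate = violent_arrests / population
drug_rate = drug_arrests / population
print(round(violent_rate, 6), round(drug_rate, 6))

# Same rate expressed per 100,000 residents (assumed convention, easier to read)
violent_per_100k = violent_rate * 100_000
print(round(violent_per_100k, 1))
```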

**(9)** Just as we created violent arrest rates and drug arrest rates for 2009, now we want to do it for 2018, so we can work towards comparing the two. Using the data on 2018 arrests (`ca_arrests_2018.csv`) and the same dataset of population data (you'll use population from 2013-2017 this time), create a dataset of arrest rates.

As before, *be careful with your merges!!!*

In [77]:
file = "data/ca_arrests_2018.csv"
df_arrests_2018 = pd.read_csv(file)
df_arrests_2018

Unnamed: 0.1,Unnamed: 0,COUNTY,VIOLENT,PROPERTY,F_DRUGOFF,F_SEXOFF,F_ALLOTHER,F_TOTAL,M_TOTAL,S_TOTAL
0,2204,Alameda County,4132,3051,1062,173,2619,11037,28305,82
1,2205,Alpine County,5,2,1,0,3,11,41,0
2,2206,Amador County,72,40,31,3,142,288,701,1
3,2207,Butte County,785,437,229,47,741,2239,8853,1
4,2208,Calaveras County,147,42,29,6,96,320,897,0
...,...,...,...,...,...,...,...,...,...,...
53,2257,Tulare County,2322,1495,931,107,1741,6596,15163,360
54,2258,Tuolumne County,203,122,87,9,311,732,2019,12
55,2259,Ventura County,2271,1535,732,80,2193,6811,26612,630
56,2260,Yolo County,575,351,120,22,518,1586,4602,6


In [78]:
df_pop_filtered_2018 = df_pop.loc[df_pop["YEAR"]=="2013-2017"]
df_pop_filtered_2018 = df_pop_filtered_2018.loc[df_pop_filtered_2018["STATE"]=="California"]
df_pop_filtered_2018.loc[df_pop_filtered_2018["COUNTY"]=="Alameda County"]

Unnamed: 0.1,Unnamed: 0,YEAR,STATE,COUNTY,total_population
3407,186,2013-2017,California,Alameda County,1629615


In [79]:
df_pop_fixed_2018 = df_pop_filtered_2018.replace(to_replace=["DelNorte County","Injo County"], value=["Del Norte County", "Inyo County"])
df_pop_fixed_2018

Unnamed: 0.1,Unnamed: 0,YEAR,STATE,COUNTY,total_population
3407,186,2013-2017,California,Alameda County,1629615
3408,187,2013-2017,California,Alpine County,1203
3409,188,2013-2017,California,Amador County,37306
3410,189,2013-2017,California,Butte County,225207
3411,190,2013-2017,California,Calaveras County,45057
...,...,...,...,...,...
3460,239,2013-2017,California,Tulare County,458809
3461,240,2013-2017,California,Tuolumne County,53899
3462,241,2013-2017,California,Ventura County,847834
3463,242,2013-2017,California,Yolo County,212605


In [80]:
df_2018 = pd.merge(df_arrests_2018, df_pop_fixed_2018, how='outer', on="COUNTY", validate="1:1", indicator=True)
df_2018._merge.value_counts()

_merge
both          58
left_only      0
right_only     0
Name: count, dtype: int64

**(10)** Go ahead and calculate the arrest rates for the 2018 dataset as well. For each county, create a new variable called `violent_arrest_rate_2018` that is the number of violent arrests for 2018 divided by the population of the county from 2013-2017, and an analogous variable for drug offenses (`F_DRUGOFF`) called `f_drugoff_arrest_rate_2018`. 

In [81]:
df_2018['violent_arrest_rate_2018'] = df_2018['VIOLENT'] / df_2018['total_population']
df_2018['f_drugoff_arrest_rate_2018'] = df_2018['F_DRUGOFF'] / df_2018['total_population']

## EXERCISE 4: Comparing 2009 with 2018 Arrests - Repeating your merge from the 2009 data

If we plotted our rate data for 2009 (violent crime arrest rate vs felony drug arrest rate) it would show that drug arrests and violent crime arrests tend to be positively correlated, but that does not tell us much about whether they are *causally* related. It *could* be the case that people dealing drugs *causes* more violent crime, but it could also be that certain communities, for some other reason, tend to have *both* more drug sales *and* more violent crime. 

So to test for this, we want to see if the same communities that had violent crime in 2009 *also* have violent crime in 2018 (after marijuana legalization). If these communities have just as much crime in 2018, that would suggest that violent crime is being driven by a third factor, and not by marijuana sales.

**(11)** Merge the two county-level datasets so you have one row for each county, and variables for violent arrest rates in 2018, violent arrest rates in 2009, felony drug arrest rates in 2018, and felony drug arrest rates in 2009. You will need at least 5 columns from this going forward (you're welcome to drop the rest for the remainder of the analysis):
1. COUNTY
2. violent_arrest_rate_2009
3. violent_arrest_rate_2018
4. f_drugoff_arrest_rate_2009
5. f_drugoff_arrest_rate_2018

*Hints and notes*: 

- If you used `indicator = True`, you may need to drop the `_merge` columns before merging from each dataset
- Note that since you'll be merging two DataFrames that share column names, pandas will keep both versions of each shared column (other than the merge key): the version from the first DataFrame you list in the merge will have '_x' appended to its column name, and the version from the second DataFrame will have '_y' appended
- At any time you can use the `rename` method in pandas to adjust column names if it makes them easier for you to understand
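As an alternative to renaming afterwards, `pd.merge` accepts a `suffixes` argument that replaces the default `_x` and `_y`. The sketch below uses two toy DataFrames with populations copied from the outputs above. The DataFrame names and the suffix choices are illustrative.

```python
import pandas as pd

left_2009 = pd.DataFrame(
    {"COUNTY": ["Alameda County", "Alpine County"], "total_population": [1457095, 1153]}
)
right_2018 = pd.DataFrame(
    {"COUNTY": ["Alameda County", "Alpine County"], "total_population": [1629615, 1203]}
)

# Overlapping non-key columns get the given suffixes instead of _x / _y
merged = pd.merge(
    left_2009, right_2018, on="COUNTY", how="outer",
    validate="1:1", suffixes=("_2009", "_2018"),
)
print(list(merged.columns))
```

Suffixes are only added to column names that appear in both DataFrames, so columns that already carry a year in their name (such as the arrest rate columns) are left unchanged.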

In [82]:
df_2009 = df_2009.drop(columns='_merge')
df_2018 = df_2018.drop(columns='_merge')
df_all = pd.merge(df_2009, df_2018, how='outer', on="COUNTY", validate="1:1", indicator=True)
df_all._merge.value_counts()

_merge
both          58
left_only      0
right_only     0
Name: count, dtype: int64

In [83]:
df_all

Unnamed: 0,Unnamed: 0_x_x,COUNTY,VIOLENT_x,PROPERTY_x,F_DRUGOFF_x,F_SEXOFF_x,F_ALLOTHER_x,F_TOTAL_x,M_TOTAL_x,S_TOTAL_x,...,F_TOTAL_y,M_TOTAL_y,S_TOTAL_y,Unnamed: 0_y_y,YEAR_y,STATE_y,total_population_y,violent_arrest_rate_2018,f_drugoff_arrest_rate_2018,_merge
0,1682,Alameda County,4318,4640,5749,260,3502,18469,37247,431,...,11037,28305,82,186,2013-2017,California,1629615,0.002536,0.000652,both
1,1683,Alpine County,8,4,2,1,1,16,83,0,...,11,41,0,187,2013-2017,California,1203,0.004156,0.000831,both
2,1684,Amador County,100,59,101,5,199,464,801,2,...,288,701,1,188,2013-2017,California,37306,0.001930,0.000831,both
3,1685,Butte County,641,602,542,34,429,2248,9026,1,...,2239,8853,1,189,2013-2017,California,225207,0.003486,0.001017,both
4,1686,Calaveras County,211,83,123,14,70,501,968,3,...,320,897,0,190,2013-2017,California,45057,0.003263,0.000644,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53,1735,Tulare County,2183,2080,1681,170,1767,7881,15368,301,...,6596,15163,360,239,2013-2017,California,458809,0.005061,0.002029,both
54,1736,Tuolumne County,160,252,209,14,261,896,2062,83,...,732,2019,12,240,2013-2017,California,53899,0.003766,0.001614,both
55,1737,Ventura County,2275,2425,2040,120,2083,8943,25762,1734,...,6811,26612,630,241,2013-2017,California,847834,0.002679,0.000863,both
56,1738,Yolo County,585,634,614,39,662,2534,5426,73,...,1586,4602,6,242,2013-2017,California,212605,0.002705,0.000564,both


In [84]:
df = df_all[['COUNTY','violent_arrest_rate_2009','violent_arrest_rate_2018','f_drugoff_arrest_rate_2009','f_drugoff_arrest_rate_2018']]
df

Unnamed: 0,COUNTY,violent_arrest_rate_2009,violent_arrest_rate_2018,f_drugoff_arrest_rate_2009,f_drugoff_arrest_rate_2018
0,Alameda County,0.002963,0.002536,0.003946,0.000652
1,Alpine County,0.006938,0.004156,0.001735,0.000831
2,Amador County,0.002629,0.001930,0.002655,0.000831
3,Butte County,0.002941,0.003486,0.002487,0.001017
4,Calaveras County,0.004533,0.003263,0.002642,0.000644
...,...,...,...,...,...
53,Tulare County,0.005244,0.005061,0.004038,0.002029
54,Tuolumne County,0.002869,0.003766,0.003748,0.001614
55,Ventura County,0.002871,0.002679,0.002575,0.000863
56,Yolo County,0.003031,0.002705,0.003182,0.000564


**(12)** Did drug arrests go down from 2009 to 2018? (They sure better have! This is what's called a "sanity check" of your data and analysis. If you find drug arrests went *up*, you know something went wrong with your code or your understanding of the situation.) To verify this, compute the difference between the 2018 drug arrest rate and the 2009 rate, and review those values sorted from smallest to largest. How many of the values were less than zero (meaning the rate decreased)? For how many counties did the rate increase? Calculate the average percentage change in felony drug arrest rates across all counties.

As a reminder, percentage change can be calculated as follows, where $x_{2018}$ and $x_{2009}$ are the rates for 2018 and 2009, respectively:
$$\frac{x_{2018} - x_{2009}}{x_{2009}} \times 100$$
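A small worked instance of this formula, using the felony drug arrest rates for the first five counties as printed in the table above (so the inputs are rounded to six decimals and the results are approximate). It also shows one way to count how many rates fell and how many rose.

```python
import pandas as pd

# Rounded rates copied from the first five rows of the table above
rates = pd.DataFrame(
    {
        "COUNTY": ["Alameda County", "Alpine County", "Amador County",
                   "Butte County", "Calaveras County"],
        "f_drugoff_arrest_rate_2009": [0.003946, 0.001735, 0.002655, 0.002487, 0.002642],
        "f_drugoff_arrest_rate_2018": [0.000652, 0.000831, 0.000831, 0.001017, 0.000644],
    }
)

diff = rates["f_drugoff_arrest_rate_2018"] - rates["f_drugoff_arrest_rate_2009"]
pct = diff / rates["f_drugoff_arrest_rate_2009"] * 100

n_decreased = int((diff < 0).sum())   # counties where the rate fell
n_increased = int((diff > 0).sum())   # counties where the rate rose
print(pct.round(2).tolist())
print(n_decreased, n_increased)
```

Comparing a boolean Series with `.sum()` works because `True` counts as 1, which makes it a convenient way to answer "how many counties" questions.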

In [85]:
percent_change = (df['f_drugoff_arrest_rate_2018'] - df['f_drugoff_arrest_rate_2009']) / df['f_drugoff_arrest_rate_2009'] * 100
percent_change.sort_values()


51   -94.758402
37   -89.374008
40   -85.903547
23   -85.745652
9    -85.495774
        ...    
16   -31.775924
57   -26.167177
24     1.608074
52    27.129014
17    28.153616
Length: 58, dtype: float64

In [86]:
percent_change.values.mean()

-66.31166718090417

**Note this average percentage change in the felony drug arrest rate value - you will need to submit this at the end of this week for the final Quiz.**

**(13)** Now we want to look at whether violent crime decreased following drug legalization. Did the average violent arrest rate decrease? By how much? (Note: We're assuming that arrest rates are proportionate to crime rates. If policing increased so that there were more arrests per crime committed, that would impact our interpretation of these results. But this is just an exercise, so we'll keep it simple.)

In [87]:
percent_change = (df['violent_arrest_rate_2018'] - df['violent_arrest_rate_2009']) / df['violent_arrest_rate_2009'] * 100
percent_change.sort_values()

21   -49.413863
37   -47.787617
12   -40.466548
1    -40.097672
14   -29.317117
        ...    
17    24.563705
13    26.602818
54    31.258036
24    42.683678
15    51.025472
Length: 58, dtype: float64

In [88]:
percent_change.values.mean()

-6.770989120747971

**Note this average percentage change in the violent crime arrest rate value - you will need to submit this at the end of this week for the final Quiz.**

## EXERCISE 5: Diving deeper into the post-legalization changes

**(14)** So we've determined that both drug arrests and violent crime arrests were decreasing over this period. But maybe *all* crime was just falling, and this isn't about drug legalization. 

This is the problem with a "pre-to-post" analysis: yes, our results are *consistent* with the idea that drug legalization reduced violent crime, but lots of things happened between 2009 and 2018, not just drug legalization, so we don't know that drug legalization *caused* the decline in violent crime. 

So let's do a kind of difference-in-differences analysis. We know that drug legalization should have had a bigger effect on counties that had higher drug arrest rates prior to drug legalization. After all, in a county that had no drug arrests, legalization wouldn't do anything, would it?

So let's split our sample into two groups: high drug arrests in 2009, and low drug arrests in 2009 (split the counties at the average drug arrest rate in 2009). 

In [89]:
average = df['f_drugoff_arrest_rate_2009'].mean()
high_drug_arrests_2009 = df.loc[df['f_drugoff_arrest_rate_2009']>average]
low_drug_arrests_2009 = df.loc[df['f_drugoff_arrest_rate_2009']<=average]


**(15)** Now we can ask: did violent crime fall *more* from 2009 to 2018 in the counties that had lots of drug arrests in 2009 (where legalization likely had more of an effect) than in counties with fewer drug arrests in 2009 (where legalization likely mattered less)? Calculate this using what we call a difference-in-differences, which can be computed as follows:

(the change in violent crime rate for counties with lots of drug arrests in 2009) - (the change in violent crime rate for counties with few drug arrests in 2009)

In [93]:
change_high = high_drug_arrests_2009['violent_arrest_rate_2018'] - high_drug_arrests_2009['violent_arrest_rate_2009']
change_low = low_drug_arrests_2009['violent_arrest_rate_2018'] - low_drug_arrests_2009['violent_arrest_rate_2009']
print(change_high.mean())
print(change_low.mean())

-0.00043197415958967417
-0.00018458037097611614


**(16)** Hmmm... we showed that there was a greater *absolute* decline in violent arrest rates in counties more impacted by drug legalization. But was there also a greater *proportionate* decline?

Repeat the above calculation but for percentage change:

(the percentage change in violent crime rate for counties with lots of drug arrests in 2009) - (the percentage change in violent crime rate for counties with few drug arrests in 2009)

In [94]:
def percent_change(x_new,x_base):
    return (x_new - x_base) / x_base * 100 

In [95]:
pchange_high = percent_change(high_drug_arrests_2009['violent_arrest_rate_2018'],high_drug_arrests_2009['violent_arrest_rate_2009'])
pchange_low = percent_change(low_drug_arrests_2009['violent_arrest_rate_2018'],low_drug_arrests_2009['violent_arrest_rate_2009'])
print(pchange_high.mean())
print(pchange_low.mean())

-9.736209750365193
-4.822415564142368
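The difference-in-differences itself is the high-group change minus the low-group change. The sketch below performs that subtraction on the group means printed in this notebook, for both the absolute change in rates and the percentage change. The variable names are illustrative.

```python
# Group means copied from the outputs printed in this notebook
abs_change_high = -0.00043197415958967417
abs_change_low = -0.00018458037097611614
pct_change_high = -9.736209750365193
pct_change_low = -4.822415564142368

# Difference-in-differences: (change in high group) - (change in low group)
did_absolute = abs_change_high - abs_change_low
did_percent = pct_change_high - pct_change_low

print(did_absolute)   # change in arrests per resident
print(did_percent)    # percentage points
```

A negative value in either case means violent arrest rates fell more in the counties with high 2009 drug arrest rates than in the counties with low ones.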


**Note your output here: the average percentage change in the violent arrest rate for both the high and the low 2009 drug arrest rate groups. You will need to submit this at the end of this week for the final Quiz.**

**(17)** What are your conclusions about the relationship between violent crime and drug legalization, given your analysis above?