### Merging datasets: adding MSA codes and states classifications to FBI crime data.
#### Author: Marla
#### Last modified: Dec 3, 11:50pm
This notebook adds MSA codes to the FBI crime data. MSA codes are taken from census data file. It also merges in data on firearms using zip codes.

Since MSA names codes can be different in different years as statistical areas change we seperate out both the crime data and the census data by year and merge each individuall then append the datasets back together again. This works reasonably well but there are ~40 MSAs in the FBI data whose names do not match directly on the string names to the string names in the census data. These will have to be manually modified, but we will do that at a later date for now just using this rough match to move forward with EDA. 

Note that CBSA codes comprise of Metropolitan and Micropolitan Statistical areas. 

**Inputs:**
    - murder_data.csv
    - MSA_codes_from_census.csv
    - state-firearms
**Output:** 
    - df_crime_gunlaw_msacodes.csv

In [1]:
#set up
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

### Add Census MSA Codes to FBI MSA Names

In [2]:
#prep different data frames
df_crime=pd.read_csv('raw data/murder_data.csv')
df_msa_census_code=pd.read_csv('raw data/MSA_codes_from_census.csv')


In [3]:
#general clean up of msa names
df_crime = df_crime.rename(columns={'MSA': 'MSA_name'})
df_msa_census_code=df_msa_census_code.replace({ " Micro Area" : "" }, regex=True)
df_msa_census_code=df_msa_census_code.replace({ " Metro Area" : "" }, regex=True)
df_crime['MSA_original_name']=df_crime['MSA_name']
df_crime['MSA_name'].replace('M.S.A','',regex=True,inplace=True)
df_crime['MSA_name'].replace('M.D.','',regex=True,inplace=True)
df_crime['MSA_name'].replace(' 1','',regex=True,inplace=True)
df_crime['MSA_name'].replace(' 4','',regex=True,inplace=True)
df_crime['MSA_name'].replace(' 2','',regex=True,inplace=True)
df_crime['MSA_name'].replace(' 3','',regex=True,inplace=True)
df_crime['MSA_name'].replace(', 5','',regex=True,inplace=True)
df_crime['MSA_name'] = df_crime['MSA_name'].str.strip()

In [4]:
#set both datasets so that MSA_name is the index in the code dataframe
df_msa_census_code=df_msa_census_code.set_index('MSA_name')


In [5]:
#seperate out years for crime data and census data codes, make seperate dataframes fore each. 
#this makes merging easier since both the FBI data and census data MSA names vary between years.
crime_dic={}
code_dic={}
for y in range(2006, 2017): 
    df_new_crime = df_crime.loc[df_crime['Year'] == y]
    crime_dic[y]= df_new_crime
    df_new_code = df_msa_census_code.loc[df_msa_census_code['year']==y]
    code_dic[y]=df_new_code




In [6]:
print(df_crime.shape)
print(df_msa_census_code.shape)

(4114, 7)
(5694, 2)


#### Clean MSA Names
The MSA names in the crime and census list do not always match perfectly. The mismatches vary slightly between years as MSA names vary between years in both the FBI and the census data. We go through year by year and correct as many of these as possible. Two important notes: 
- There are still a handful of MSAs in the FBI data for each year where no matching MSA can be found in the census data. These are listed below the following code by year. They are dropped when the FBI and census codes are joined. 
- There are some cases were two FBI MSAs match to one census MSA.  

In [7]:
#clean up names of msa


#2006
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Aguadilla-Isabela-San Sebastian, Puerto Rico','Aguadilla-Isabela-San Sebastián, Puerto Rico')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Detroit-Livonia-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Miami Beach, FL')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Miami Beach, FL')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Port St. Lucie-Fort Pierce, FL','Port St. Lucie, FL')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Greenville-Mauldin-Easley, SC','Greenville, SC')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Little Rock-North Little Rock-Conway, AR','Little Rock-North Little Rock, AR')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Santa Barbara-Santa Maria-Goleta, CA','Santa Barbara-Santa Maria, CA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Tacoma, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')

#need to find a way to replace this one with regular expressions
#crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Nashville-Davidsonî ºMurfreesboro-Franklin, TN','Nashville-Davidson--Murfreesboro, TN')
#crime_dic[2006].MSA_name = crime_dic[2006].MSA_name.replace('Scrantonî ºWilkes-Barre, PA','Scranton--Wilkes-Barre, PA')

#2007
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Bradenton-Sarasota-Venice, FL','Sarasota-Bradenton-Venice, FL')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Lakeland-Winter Haven, FL', 'Lakeland, FL')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Louisville/Jefferson County, KY-IN','Louisville-Jefferson County, KY-IN')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Boston-Quincy, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Detroit-Livonia-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Livonia, MI')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Atlantic City-Hammonton, NJ','Atlantic City, NJ')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('ScrantonÃÂ¢Ã¥ÃÃ¥ÃWilkes-Barre, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Charleston-North Charleston-Summerville, SC','Charleston-North Charleston, SC')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Myrtle Beach-North Myrtle Beach-Conway, SC','Myrtle Beach-Conway-North Myrtle Beach, SC')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Nashville-DavidsonÃÂ¢Ã¥ÃÃ¥ÃMurfreesboroÃÂ¢Ã¥ÃÃ¥ÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Kennewick-Pasco-Richland, WA','Kennewick-Richland-Pasco, WA')
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')

#2008
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Santa Ana-Anaheim-Irvine, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Madera-Chowchilla, CA','Madera, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('SacramentoÃÂ¢Ã¥ÃÃ¥ÃArden-ArcadeÃÂ¢Ã¥ÃÃ¥ÃRoseville, CA','Sacramento--Arden-Arcade--Roseville, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Denver-Aurora-Broomfield, CO','Denver-Aurora, CO')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Panama City-Lynn Haven-Panama City Beach, FL','Panama City-Lynn Haven, FL')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Louisville/Jefferson County, KY-IN','Louisville-Jefferson County, KY-IN')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Boston-Quincy, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Durham-Chapel Hill, NC','Durham, NC')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('ScrantonÃÂ¢Ã¥ÃÃ¥ÃWilkes-Barre, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Nashville-DavidsonÃÂ¢Ã¥ÃÃ¥Ã MurfreesboroÃÂ¢Ã¥ÃÃ¥ÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Wenatchee-East Wenatchee, WA','Wenatchee, WA')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Aguadilla-Isabela-San Sebastian, Puerto Rico','Aguadilla-Isabela-San Sebastián, Puerto Rico')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('San German-Cabo Rojo, Puerto Rico','San Germán-Cabo Rojo, Puerto Rico')
crime_dic[2008].MSA_name = crime_dic[2008].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')

#2009
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Phoenix-Mesa-Glendale, AZ','Phoenix-Mesa-Scottsdale, AZ')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('SacramentoÃÂ¢Ã¥ÃÃ¥ÃArden-ArcadeÃÂ¢Ã¥ÃÃ¥Ã Roseville, CA','Sacramento--Arden-Arcade--Roseville, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Bakersfield-Delano, CA','Bakersfield, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Santa Ana-Anaheim-Irvine, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Crestview-Fort Walton Beach-Destin, FL','Fort Walton Beach-Crestview-Destin, FL')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('North Port-Bradenton-Sarasota, FL','Bradenton-Sarasota-Venice, FL')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Orlando-Kissimmee-Sanford, FL','Orlando-Kissimmee, FL')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Louisville/Jefferson County, KY-IN','Louisville-Jefferson County, KY-IN')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Boston-Quincy, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Cambridge-Newton-Framingham, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Detroit-Livonia-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Livonia, MI')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('ScrantonÃÂ¢Ã¥ÃÃ¥ÃWilkes-Barre, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Charlotte-Gastonia-Rock Hill, NC-SC','Charlotte-Gastonia-Concord, NC-SC')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Nashville-DavidsonÃÂ¢Ã¥ÃÃ¥ÃMurfreesboroÃÂ¢Ã¥ÃÃ¥Ã Franklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Austin-Round Rock-San Marcos, TX','Austin-Round Rock, TX')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('San Antonio-New Braunfels, TX','San Antonio, TX')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Portland-Vancouver-Hillsboro, OR-WA','Portland-Vancouver-Beaverton, OR-WA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Tacoma, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Aguadilla-Isabela-San Sebastian, Puerto Rico','Aguadilla-Isabela-San Sebastián, Puerto Rico')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2009].MSA_name = crime_dic[2009].MSA_name.replace('San German-Cabo Rojo, Puerto Rico','San Germán-Cabo Rojo, Puerto Rico')


#2010
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Sacramentoâ°ÃÃArden-Arcadeâ°ÃÃRoseville, CA','Sacramento--Arden-Arcade--Roseville, CA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Santa Ana-Anaheim-Irvine, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Boston-Quincy, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Detroit-Livonia-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Livonia, MI')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Scrantonâ°ÃÃWilkes-Barre, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Nashville-Davidsonâ°ÃÃMurfreesboroâ°ÃÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2010].MSA_name = crime_dic[2010].MSA_name.replace('Tacoma, WA','Seattle-Tacoma-Bellevue, WA')

#2011
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Oakland-Fremont-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Sacramento-Arden-Arcade-Roseville, CA','Sacramento--Arden-Arcade--Roseville, CA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('San Francisco-San Mateo-Redwood City, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Santa Ana-Anaheim-Irvine, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Chicago-Joilet-Naperville, IL','Chicago-Joliet-Naperville, IL-IN-WI')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Houma-Bayou Caneâ°ÃÃThibodaux, LA','Houma-Bayou Cane-Thibodaux, LA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Boston-Quincy, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Detroit-Livonia-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Livonia, MI')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('New York-White Plains-Wayne, NY-NJ','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Scranton-Wilkes-Barre, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Nashville-Davidson-Murfreesboro-Franklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Seattleâ°ÃÃBellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Aguadilla-Isabela-San Sebastian, Puerto Rico','Aguadilla-Isabela-San Sebastián, Puerto Rico')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2011].MSA_name = crime_dic[2011].MSA_name.replace('San German-Cabo Rojo, Puerto Rico','San Germán-Cabo Rojo, Puerto Rico')

#2012
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Anniston-Oxford-Jacksonville, AL','Anniston-Oxford, AL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Texarkana, TX-AR','Texarkana, TX-Texarkana, AR')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Phoenix-Mesa-Scottsdale, AZ','Phoenix-Mesa-Glendale, AZ')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Anaheim-Santa Ana-Irvine, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Bakersfield, CA','Bakersfield-Delano, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Madera, CA','Madera-Chowchilla, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Oakland-Hayward-Berkeley, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Santa Ana, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Sacramento--Roseville--Arden-Arcade, CA','Sacramento--Arden-Arcade--Roseville, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San Diego-Carlsbad, CA','San Diego-Carlsbad-San Marcos, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San Francisco-Oakland-Hayward, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San Francisco-Redwood City-South San Francisco, CA','San Francisco-Oakland-Fremont, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San Luis Obispo-Paso Robles-Arroyo Grande, CA','San Luis Obispo-Paso Robles, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Santa Maria-Santa Barbara, CA','Santa Barbara-Santa Maria-Goleta, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Santa Rosa, CA','Santa Rosa-Petaluma, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Stockton-Lodi, CA','Stockton, CA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Denver-Aurora-Lakewood, CO','Denver-Aurora-Broomfield, CO')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Fort Collins-Loveland, CO','Fort Collins, CO')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Worcester, MA-CT','Worcester, MA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Boston-Cambridge-Newton, MA-NH','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-Pompano Beach, FL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Naples-Immokalee-Marco Island, FL','Naples-Marco Island, FL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Panama City, FL','Panama City-Lynn Haven-Panama City Beach, FL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Atlanta-Sandy Springs-Roswell, GA','Atlanta-Sandy Springs-Marietta, GA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Hinesville, GA','Hinesville-Fort Stewart, GA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Kahului-Wailuku-Lahaina, HI','Kahului-Wailuku, HI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Boise City, ID','Boise City-Nampa, ID')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Bloomington, IL','Bloomington-Normal, IL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Cape Girardeau, MO-IL','Cape Girardeau-Jackson, MO-IL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Chicago-Naperville-Arlington Heights, IL','Chicago-Joliet-Naperville, IL-IN-WI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Elgin, IL','Chicago-Joliet-Naperville, IL-IN-WI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Kankakee, IL','Kankakee-Bradley, IL')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Cincinnati, OH-KY-IN','Cincinnati-Middletown, OH-KY-IN')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Indianapolis-Carmel-Anderson, IN','Indianapolis-Carmel-Anderson, IN')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Lafayette-West Lafayette, IN','Lafayette, IN')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Chicago-Naperville-Elgin, IL-IN-WI','Chicago-Joliet-Naperville, IL-IN-WI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('New York-Jersey City-White Plains, NY-NJ,','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Cincinnati, OH-KY-IN','Cincinnati-Middletown, OH-KY-IN')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Elizabethtown-Fort Knox, KY','Elizabethtown, KY')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Houma-Thibodaux, LA','Houma-Bayou Cane-Thibodaux, LA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('New Orleans-Metairie, LA','New Orleans-Metairie-Kenner, LA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Worcester, MA-CT','Worcester, MA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Boston, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Cambridge-Newton-Framingham, MA','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Boston-Cambridge-Newton, MA-NH','Boston-Cambridge-Quincy, MA-NH')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Providence-Warwick, RI-MA','Providence-New Bedford-Fall River, RI-MA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Salisbury, MD-DE','Salisbury, MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Baltimore-Columbia-Towson, MD','Baltimore-Towson, MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('California-Lexington Park, MD','Lexington Park, MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Portland-South Portland, ME','Portland-South Portland-Biddeford, ME')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Detroit-Dearborn-Livonia, MI','Detroit-Warren-Livonia, MI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Detroit-Warren-Dearborn, MI','Detroit-Warren-Livonia, MI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Livonia, MI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('La Crosse-Onalaska, WI-MN','La Crosse, WI-MN')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Raleigh, NC','Raleigh-Cary, NC')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Trenton, NJ','Trenton-Ewing, NJ')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Vineland-Bridgeton, NJ','Vineland-Millville-Bridgeton, NJ')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('New York-Newark-Jersey City, NY-NJ-PA','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Newark, NJ-PA','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Las Vegas-Henderson-Paradise, NV','Las Vegas-Paradise, NV')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Reno, NV','Reno-Sparks, NV')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Buffalo-Cheektowaga-Niagara Falls, NY','Buffalo-Niagara Falls, NY')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('New York-Newark-Jersey City, NY-NJ-PA','New York-Northern New Jersey-Long Island, NY-NJ-PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Albany, OR','Albany-Lebanon, OR')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Eugene, OR','Eugene-Springfield, OR')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Philadelphia, PA, 6','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Chambersburg-Waynesboro, PA','Chambersburg, PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Scranton--Wilkes-Barre--Hazleton, PA','Scranton--Wilkes-Barre, PA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Charleston-North Charleston, SC','Charleston-North Charleston-Summerville, SC')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Greenville-Anderson-Mauldin, SC','Greenville-Mauldin-Easley, SC')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Hilton Head Island-Bluffton-Beaufort, SC','Hilton Head Island-Beaufort, SC')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Austin-Round Rock, TX','Austin-Round Rock-San Marcos, TX')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Houston-The Woodlands-Sugar Land, TX','Houston-Sugar Land-Baytown, TX')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Killeen-Temple, TX','Killeen-Temple-Fort Hood, TX')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Kennewick-Richland, WA','Kennewick-Pasco-Richland, WA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Olympia-Tumwater, WA','Olympia, WA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Spokane-Spokane Valley, WA','Spokane, WA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Tacoma-Lakewood, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Janesville-Beloit, WI','Janesville, WI')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Parkersburg-Vienna, WV','Parkersburg-Marietta-Vienna, WV-OH')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Aguadilla-Isabela, Puerto Rico','Aguadilla-Isabela-San Sebastián, Puerto Rico')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San German, Puerto Rico','San Germán-Cabo Rojo, Puerto Rico')
crime_dic[2012].MSA_name = crime_dic[2012].MSA_name.replace('San Juan-Carolina-Caguas, Puerto Rico','San Juan-Caguas-Guaynabo, Puerto Rico')

#2013
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Anaheim-Santa Ana-Irvine, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Oakland-Hayward-Berkeley, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Sacramentoâ°ÃÃRosevilleâ°ÃÃArden-Arcade, CA','Sacramento--Roseville--Arden-Arcade, CA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('San Francisco-Redwood City-South San Francisco, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Elgin, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Boston, MA','Boston-Cambridge-Newton, MA-NH')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Cambridge-Newton-Framingham, MA','Boston-Cambridge-Newton, MA-NH')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Detroit-Dearborn-Livonia, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('New York-Jersey City-White Plains, NY-NJ','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Newark, NJ-PA','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Scrantonâ°ÃÃWilkes-Barreâ°ÃÃHazleton, PA','Scranton--Wilkes-Barre--Hazleton, PA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Nashville-Davidsonâ°ÃÃMurfreesboroâ°ÃÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Fort Worth-Arlington, TX','Fort Worth-Arlington, TX')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Tacoma-Lakewood, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2013].MSA_name = crime_dic[2013].MSA_name.replace('San German, Puerto Rico','San Germán, Puerto Rico')

#2014
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Anaheim-Santa Ana-Irvine, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Oakland-Hayward-Berkeley, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Sacramentoâ°ÃÃRosevilleâ°ÃÃArden-Arcade, CA','Sacramento--Roseville--Arden-Arcade, CA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('San Francisco-Redwood City-South San Francisco, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Chicago-Naperville-Arlington Heights, IL','Chicago-Naperville-Arlington Heights, IL')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Elgin, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Detroit-Dearborn-Livonia, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('New York-Jersey City-White Plains, NY-NJ','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Newark, NJ-PA','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('New York-Jersey City-White Plains, NY-NJ','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Scrantonâ°ÃÃWilkes-Barreâ°ÃÃHazleton, PA','Scranton--Wilkes-Barre--Hazleton, PA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Nashville-Davidsonâ°ÃÃMurfreesboroâ°ÃÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Tacoma-Lakewood, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2014].MSA_name = crime_dic[2014].MSA_name.replace('San German, Puerto Rico','San Germán, Puerto Rico')

#2015
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Anaheim-Santa Ana-Irvine, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Los Angeles-Long Beach-Glendale, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Oakland-Hayward-Berkeley, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Sacramentoâ°ÃÃRosevilleâ°ÃÃArden-Arcade, CA','Sacramento--Roseville--Arden-Arcade, CA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('San Francisco-Redwood City-South San Francisco, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Macon-Bibb County, GA','Macon, GA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Chicago-Naperville-Arlington Heights, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Elgin, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Detroit-Dearborn-Livonia, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Nashville-Davidsonâ°ÃÃMurfreesboroâ°ÃÃFranklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Tacoma-Lakewood, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2015].MSA_name = crime_dic[2015].MSA_name.replace('San German, Puerto Rico','San Germán, Puerto Rico')


#2106
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Anaheim-Santa Ana-Irvine, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Los Angeles-Long Beach-Anaheim, CA','Los Angeles-Long Beach-Anaheim, CA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Oakland-Hayward-Berkeley, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Sacramento - Roseville - Arden - Arcade, CA','Sacramento--Roseville--Arden-Arcade, CA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('San Francisco-Redwood City-South San Francisco, CA','San Francisco-Oakland-Hayward, CA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Fort Lauderdale-Pompano Beach-Deerfield Beach, FL','Fort Lauderdale-Pompano Beach-Deerfield Beach, FL')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Miami-Miami Beach-Kendall, FL','Miami-Fort Lauderdale-West Palm Beach, FL')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Chicago-Naperville-Arlington Heights, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Elgin, IL','Chicago-Naperville-Elgin, IL-IN-WI')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Boston, MA','Boston-Cambridge-Newton, MA-NH')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Cambridge-Newton-Framingham, MA','Boston-Cambridge-Newton, MA-NH')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Detroit-Dearborn-Livonia, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Warren-Troy-Farmington Hills, MI','Detroit-Warren-Dearborn, MI')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Camden, NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('New York-Jersey City-White Plains, NY-NJ','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Wilmington, DE-MD-NJ','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Newark, NJ-PA','New York-Newark-Jersey City, NY-NJ-PA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Philadelphia, PA','Philadelphia-Camden-Wilmington, PA-NJ-DE-MD')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Nashville - Davidson - Murfreesboro - Franklin, TN','Nashville-Davidson--Murfreesboro--Franklin, TN')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Dallas-Plano-Irving, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Fort Worth-Arlington, TX','Dallas-Fort Worth-Arlington, TX')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Seattle-Bellevue-Everett, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Tacoma-Lakewood, WA','Seattle-Tacoma-Bellevue, WA')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('Mayaguez, Puerto Rico','Mayagüez, Puerto Rico')
crime_dic[2016].MSA_name = crime_dic[2016].MSA_name.replace('San German, Puerto Rico','San Germán, Puerto Rico')


#need to find a way to replace this one with regular expressions
crime_dic[2007].MSA_name = crime_dic[2007].MSA_name.replace('SacramentoÃ¢ÂÂArden-ArcadeÃ¢ÂÂRoseville, CA','Sacramento--Arden-Arcade--Roseville, CA')



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


### Cases Where no matching census MSA could be found for an FBI MSA

##### In FBI data the following no equivalent in Census data for 2006: 
Bethesda-Gaithersburg-Frederick,MD; CamdenNJ; Edison, NJ; Carson City, NV; Warren-Troy-Farmington Hills, MI; West Palm Beach-Boca Raton-Boynton Beach, FL; Lewiston, ID-WA; Nassau-Suffolk, NY; Newark-Union, NJ-PA; Santa Ana-Anaheim-Irvine, CA

##### In FBI data the following no equivalent in Census data for 2007: 
Santa Ana-Anaheim-Irvine, CA; West Palm Beach-Boca Raton-Boynton Beach, FL; Cambridge-Newton-Framingham, MA; Peabody, MA; Bethesda-Frederick-Gaithersburg, MD; New York-White Plains-Wayne, NY-NJ; Newark-Union, NJ-PA; Carson City, NV; Nassau-Suffolk, NY; Lewiston, ID-WA; Tacoma, WA

##### In FBI data the following no equivalent in Census data for 2008: 
West Palm Beach-Boca Raton-Boynton Beach, FL; Peabody, MA; Cambridge-Newton-Framingham, MA; Bethesda-Frederick-Rockville, MD; Rockingham County-Strafford County, NH;  Edison-New Brunswick, NJ; Newark-Union, NJ-PA; Carson City, NV; Nassau-Suffolk, NY

##### In FBI data the following no equivalent in Census data for 2009: 
West Palm Beach-Boca Raton-Boynton Beach, FL; Bethesda-Rockville-Frederick, MD; Rockingham County-Strafford County, NH; Nassau-Suffolk, NY;Newark-Union, NJ-PA

##### In FBI data the following no equivalent in Census data for 2010: 
West Palm Beach-Boca Raton-Boynton Beach, FL; Cambridge-Newton-Framingham, MA; Peabody, MA; Bethesda-Rockville-Frederick, MD; Rockingham County-Strafford County, NH; Edison-New Brunswick, NJ; Newark-Union, NJ-PA

##### In FBI data the following no equivalent in Census data for 2011: 
West Palm Beach-Boca Raton-Boynton Beach, FL; Lake County-Kenosha County, IL-WI; Gary, IN; Cambridge-Newton-Framingham, MA; Peabody, MA; Bethesda-Rockville-Frederick, MD; Rockingham County-Strafford County, NH; Edison-New Brunswick, NJ; Newark-Union, NJ-PA; Newark-Union, NJ-PA

##### In FBI data the following no equivalent in Census data for 2012: 
West Palm Beach-Boca Raton-Delray Beach, FL; Lake County-Kenosha County, IL-WI; Gary, IN; Silver Spring-Frederick-Rockville, MD; Rockingham County-Strafford County, NH; Dutchess County-Putnam County, NY; Nassau County-Suffolk County, NY; Montgomery County-Bucks County-Chester County, PA; Seattle-Tacoma-Bellevue, WA; Lake County-Kenosha County, IL-WI; Arecibo, Puerto Rico; 

##### In FBI data the following no equivalent in Census data for 2013: 
San Rafael, CA; West Palm Beach-Boca Raton-Delray Beach, FL; Gary, IN; Cambridge-Newton-Framingham, MA; Silver Spring-Frederick-Rockville, MD; Rockingham County-Strafford County, NH; Dutchess County-Putnam County, NY;;  Nassau County-Suffolk County, NY; Montgomery County-Bucks County-Chester County, PA; 

##### In FBI data the following no equivalent in Census data for 2014: 
San Rafael, CA; West Palm Beach-Boca Raton-Delray Beach, FL; Lake County-Kenosha County, IL-WI; Silver Spring-Frederick-Rockville, MD; Dutchess County-Putnam County, NY; Nassau County-Suffolk County, NY; Lake County-Kenosha County, IL-WI; 

##### In FBI data the following no equivalent in Census data for 2015: 
San Rafael, CA; West Palm Beach-Boca Raton-Delray Beach, FL; Lake County-Kenosha County, IL-WI; Silver Spring-Frederick-Rockville, MD; Enid, OK; Montgomery County-Bucks County-Chester County, PA

##### In FBI data the following no equivalent in Census data for 2016: 
San Rafael, CA; West Palm Beach-Boca Raton-Delray Beach, FL; Lake County-Kenosha County, IL-WI; Silver Spring-Frederick-Rockville, MD; Rockingham County-Strafford County, NH; Nassau County-Suffolk County, NY; Montgomery County-Bucks County-Chester County, PA; Lake County-Kenosha County, IL-WI

In [8]:
#merge the census data and crime data for each year
dic_merge={}
for y in range(2006, 2017):
    dic_merge[y] = crime_dic[y].join(code_dic[y], on='MSA_name', how='inner')

In [9]:
#check the length in each year
for y in range(2006, 2017):
    print(y, ":", dic_merge[y].shape)

2006 : (335, 9)
2007 : (337, 9)
2008 : (336, 9)
2009 : (359, 9)
2010 : (356, 9)
2011 : (355, 9)
2012 : (364, 9)
2013 : (363, 9)
2014 : (358, 9)
2015 : (362, 9)
2016 : (368, 9)


In [10]:
#concatenate the years together to have one dataframe. 
df_full_crime_codes = pd.concat([dic_merge[2006], dic_merge[2007], dic_merge[2008], dic_merge[2009], dic_merge[2010],
                                dic_merge[2011], dic_merge[2012], dic_merge[2013], dic_merge[2014], dic_merge[2015],
                                dic_merge[2016]])


In [11]:
#view the results
list(df_full_crime_codes)

#get rid of the extra year column
df_full_crime_codes = df_full_crime_codes.drop('Year', 1) 
df_full_crime_codes.shape

(3893, 8)

### Add states

In [15]:
#read in data states
df_cbsa_state=pd.read_csv('raw data/cbsa_states.csv')

#rename variable, drop duplicates in terms of cbsa code, re index
df_cbsa_state = df_cbsa_state.rename(columns={'CBSA Code': 'MSA'})
df_cbsa_state=df_cbsa_state.drop_duplicates(subset='MSA', keep='first')
df_cbsa_state=df_cbsa_state.set_index('MSA')

df_cbsa_state.shape

(961, 11)

In [16]:
#merge state names in
df_crime_state = df_full_crime_codes.join(df_cbsa_state, on='MSA', how='left')
df_crime_state.shape

(3893, 19)

### Add Zipcodes

In [17]:
#read in table that matches zipcodes and MSAs (called CBSAs in this dataset)
df_zip=pd.read_csv('raw data/Zipcodes_to_MSA.csv')

#rename variable, drop duplicates in terms of cbsa code, re index
df_zip = df_zip.rename(columns={'CBSA': 'MSA'})
df_zip=df_zip.drop_duplicates(subset='MSA', keep='first')
df_zip=df_zip.set_index('MSA')

In [18]:
list(df_zip)

['ZCTA5',
 'MEMI',
 'POPPT',
 'HUPT',
 'AREAPT',
 'AREALANDPT',
 'ZPOP',
 'ZHU',
 'ZAREA',
 'ZAREALAND',
 'MPOP',
 'MHU',
 'MAREA',
 'MAREALAND',
 'ZPOPPCT',
 'ZHUPCT',
 'ZAREAPCT',
 'ZAREALANDPCT',
 'MPOPPCT',
 'MHUPCT',
 'MAREAPCT',
 'MAREALANDPCT']

In [19]:
#merge 
df_msa_state_zip_crime = df_crime_state.join(df_zip, how='left', on='MSA' )
df_msa_state_zip_crime.shape

(3893, 41)

In [20]:
for var in list(df_zip):
    del df_msa_state_zip_crime[var]

In [21]:
print('Crime Shape:', df_full_crime_codes.shape)
print('Zipcode Shape:',  df_zip.shape)
print('Crime and Zipcode Shape:' , df_msa_state_zip_crime.shape)


Crime Shape: (3893, 8)
Zipcode Shape: (955, 22)
Crime and Zipcode Shape: (3893, 19)


### Merge in gun laws data


In [22]:
#import gun dataset
df_gun_laws=pd.read_csv('raw data/state-firearms/raw_data.csv')

#keep only relevant years
df_gun_laws = df_gun_laws[df_gun_laws.year > 2005]

#rename column in crime data
df_msa_state_zip_crime = df_msa_state_zip_crime.rename(columns={'State Name': 'state'})
df_msa_state_zip_crime = df_msa_state_zip_crime.rename(columns={'Year': 'year'})

In [23]:
list(df_msa_state_zip_crime)

['Unnamed: 0',
 'MSA_name',
 'Total',
 'Estimated',
 'Rate',
 'MSA_original_name',
 'MSA',
 'year',
 'Metropolitan Division Code',
 'CSA Code',
 'CBSA Title',
 'Metropolitan/Micropolitan Statistical Area',
 'Metropolitan Division Title',
 'CSA Title',
 'County/County Equivalent',
 'state',
 'FIPS State Code',
 'FIPS County Code',
 'Central/Outlying County']

In [24]:
#merge in
df_crime_gunlaw_msacodes = df_msa_state_zip_crime.merge(df_gun_laws, on=['state', 'year'], how='left')

In [25]:
#check merge
print(df_crime_gunlaw_msacodes.shape)
print(df_msa_state_zip_crime.shape)

(3893, 153)
(3893, 19)


## Export to csv

In [26]:
#export a CSV
df_crime_gunlaw_msacodes.to_csv('cleaned data/df_crime_gunlaw_msacodes.csv')