# Unzip Dataset Archive

Run once when you start using this notebook. Make sure project_data.zip is in the local runtime storage.

In [None]:
!unzip project_data.zip

Archive:  project_data.zip
  inflating: cdc-fertility-2019.csv  
  inflating: __MACOSX/._cdc-fertility-2019.csv  
  inflating: cdc-firearm-deaths-2019.csv  
  inflating: __MACOSX/._cdc-firearm-deaths-2019.csv  
  inflating: cdc-homicides-2019.csv  
  inflating: __MACOSX/._cdc-homicides-2019.csv  
  inflating: cdc-infant-mortality-2019.csv  
  inflating: __MACOSX/._cdc-infant-mortality-2019.csv  
  inflating: cdc-life-expectancy-2019.csv  
  inflating: __MACOSX/._cdc-life-expectancy-2019.csv  
  inflating: cdc-marriage-2019.csv   
  inflating: __MACOSX/._cdc-marriage-2019.csv  
  inflating: cdc-teen-births-2019.csv  
  inflating: __MACOSX/._cdc-teen-births-2019.csv  
  inflating: fbi-homicides-2019.csv  
  inflating: __MACOSX/._fbi-homicides-2019.csv  
  inflating: mass_shootings.csv      
  inflating: __MACOSX/._mass_shootings.csv  
  inflating: partisan-gun-data.csv   
  inflating: __MACOSX/._partisan-gun-data.csv  
  inflating: state_gdp_raw_in_millions.csv  
  inflating: __MACOSX/._
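
The `__MACOSX/._*` entries in the output above are macOS metadata files, not data. As a minimal sketch, assuming you would rather extract from Python than with `!unzip`, the standard-library `zipfile` module can skip them. The helper name `extract_data_files` is hypothetical, and the demo archive is built in a temporary directory only so the example runs on its own.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_data_files(archive_path, target_dir):
    """Extract everything except macOS metadata; return the extracted names."""
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # skip the __MACOSX/ metadata entries and bare directory entries
            if name.startswith("__MACOSX/") or name.endswith("/"):
                continue
            archive.extract(name, target_dir)
            extracted.append(name)
    return extracted


# Demo on a throwaway archive (not the real project_data.zip)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("mass_shootings.csv", "location\n")
        archive.writestr("__MACOSX/._mass_shootings.csv", "junk")
    demo_names = extract_data_files(demo_zip, Path(tmp) / "out")

print(demo_names)
```

For the real archive, the call would be something like `extract_data_files("project_data.zip", "/content")`.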

# Import

The last import, `us_state_abbrev`, is an open-source module (us_state_abbrev.py), so double-check that the file is present in the runtime directory.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import us_state_abbrev as usa
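
A quick way to double-check that the modules are importable before running the rest of the notebook is sketched below. The helper name `module_available` is hypothetical; it only wraps the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Report anything the import cell above would fail on
for module in ["numpy", "pandas", "matplotlib", "us_state_abbrev"]:
    if not module_available(module):
        print(f"Missing module: {module}")
```

If `us_state_abbrev` is reported missing, upload us_state_abbrev.py to the runtime directory and rerun the import cell.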

# Data Cleaning - Shootings

Our dependent variable to predict here is gun violence numbers. We'll consider two types.

The first dataset to consider is from the FBI for 2019 on random homicides with weapon type. It has incomplete data for Alabama, Florida, and Illinois. We either drop these states or treat them as outliers.

The raw values are total counts; the last part of the code converts them to per capita rates (per 100,000 population).

In [None]:
homicides_fbi = pd.read_csv("/content/fbi-homicides-2019.csv")
# Drop suffix rows with notes
homicides_fbi = homicides_fbi.drop(homicides_fbi.index[51:])
# Clean up column names
homicides_fbi = homicides_fbi.rename(columns={'Total\nmurders1': 'Murders',
                                    'Total\nfirearms': 'Firearm',
                                    'Firearms\n(type\nunknown)': 'Unknown',
                                    'Knives or\ncutting\ninstruments':'Cutting',
                                    'Other\nweapons': 'Other',
                                    'Hands, fists,\nfeet, etc.2':'Fists, Etc.'})
# Simplify dataframe
homicides_fbi['Non-Firearm'] = homicides_fbi['Cutting'] + \
                               homicides_fbi['Other'] + \
                               homicides_fbi['Fists, Etc.']
# Reorganize dataframe
homicides_fbi = homicides_fbi[['State', 'Murders', 'Non-Firearm', 'Firearm',
                      'Handguns', 'Rifles', 'Unknown']]
# Work on an explicit copy and assign with .loc so pandas does not raise
# SettingWithCopyWarning
homicides_fbi = homicides_fbi.copy()
# Clean up state names
homicides_fbi.loc[0, 'State'] = "Alabama"
homicides_fbi.loc[9, 'State'] = "Florida"
homicides_fbi.loc[13, 'State'] = "Illinois"
# Make sure all non-name data is numeric
homicides_fbi.loc[4, 'Murders'] = 1679
homicides_fbi.loc[43, 'Murders'] = 1379
homicides_fbi['Murders'] = pd.to_numeric(homicides_fbi['Murders'])
homicides_fbi.loc[4, 'Firearm'] = 1142
homicides_fbi.loc[43, 'Firearm'] = 1064
homicides_fbi['Firearm'] = pd.to_numeric(homicides_fbi['Firearm'])
# Drop DC data because not a state
homicides_fbi = homicides_fbi.drop(8)
# Calculate per capita rates (per 100,000 population).
# NOTE: `population` is created in the partisan policy cell further down,
# so that cell has to be run before this one. Rows are matched by position,
# which assumes both tables list the 50 states in alphabetical order.
homicides_fbi = homicides_fbi.reset_index(drop=True)
per_capita = 100_000
for column in ['Murders', 'Non-Firearm', 'Firearm',
               'Handguns', 'Rifles', 'Unknown']:
  homicides_fbi[column] = \
      homicides_fbi[column].divide(population['Population']) * per_capita

#homicides_fbi
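
The per capita step above lines up the two tables by row position. As a hedged alternative sketch, joining on the state name removes the dependence on row order. The numbers below are made up for illustration and do not come from the FBI file, and the names `toy_counts` and `toy_population` are hypothetical.

```python
import pandas as pd

# Toy numbers for illustration only (not from the FBI file)
toy_counts = pd.DataFrame({"State": ["Alaska", "Wyoming"],
                           "Murders": [70, 12]})
# Deliberately listed in a different order than toy_counts
toy_population = pd.DataFrame({"State": ["Wyoming", "Alaska"],
                               "Population": [600_000, 700_000]})

# Join on the state name so row order no longer matters
merged = toy_counts.merge(toy_population, on="State", how="left")
merged["Murders per 100k"] = merged["Murders"] / merged["Population"] * 100_000

print(merged[["State", "Murders per 100k"]])
```

With `how="left"`, a state missing from the population table shows up as `NaN` in the result instead of silently shifting every row after it.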

Now for mass shootings. This dataset covers 1982-2023. We should probably update it with more recent mass shootings, since unfortunately they have become that common. We're mainly interested in state and (simplified) weapon type. There are also columns for the shooter's age and prior signs of mental health issues, which we'll include to give us more options.

For the weapon type, the labeling was very non-standardized, so we visually examined the dataset to get an idea of which weapons were and were not used, and that informed the choices below on how to simplify and standardize the column. In particular, the rifles we noticed were all semiautomatic (typically equivalent to the colloquial term "assault weapon") in 5.56mm or heavier calibers, with no manual types such as bolt-action. The two exceptions were .22 caliber rifles, which we gave their own separate marking, ".22"; all other rifles are marked "Rifle." A greater portion of handguns were revolvers, marked "Revolver." Semiautomatic handguns are marked "Handgun." Shotguns are marked "Shotgun." Anything else is marked "Other."

In [None]:
# dataset on mass shootings over time
massacres = pd.read_csv("/content/mass_shootings.csv")
massacres = massacres[['location',
                       'age_of_shooter',
                       'prior_signs_mental_health_issues',
                       'weapon_type']].copy()
# isolate states only
# (assumes "City, State" entries; splitting on whitespace instead would cut
# two-word names such as "New York" down to "York")
for i in massacres.index:
  state = massacres.loc[i, 'location'].split(",")[-1].strip()
  if state == "TN":
    massacres.loc[i, 'location'] = "Tennessee"
  else:
    massacres.loc[i, 'location'] = state
# standardize column for mental health issues
# by assuming unresolved question means No
massacres = massacres.rename(columns= \
                          {'prior_signs_mental_health_issues':'Mental Health'})
for i in massacres.index:
  mh = massacres.loc[i, 'Mental Health']
  if "-" in mh or "Unclear" in mh or "TBD" in mh:
    massacres.loc[i, 'Mental Health'] = "No"
  elif "yes" in mh:
    massacres.loc[i, 'Mental Health'] = "Yes"
# clean up weapon type column
for i in massacres.index:
  entry = massacres.loc[i, 'weapon_type']
  # rows 58 and 130 are the two .22 caliber rifles noted above
  if i == 58 or i == 130:
    massacres.loc[i, 'weapon_type'] = ".22"
  elif ("rifle" in entry or "Rifle" in entry
    or "assault" in entry or "Assault" in entry):
    massacres.loc[i, 'weapon_type'] = "Rifle"
  elif "shotgun" in entry or "Shotgun" in entry:
    massacres.loc[i, 'weapon_type'] = "Shotgun"
  elif "handgun" in entry or "Handgun" in entry:
    massacres.loc[i, 'weapon_type'] = "Handgun"
  elif "revolver" in entry or "Revolver" in entry:
    massacres.loc[i, 'weapon_type'] = "Revolver"
  else:
    massacres.loc[i, 'weapon_type'] = "Other"
# drop DC since it's not a state
massacres = massacres.drop(74)

#massacres
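
The weapon-type rules can also be written as a small function, which makes the priority order easier to see and to test. This is a sketch, not the cell's exact logic: the name `simplify_weapon` is hypothetical, matching is assumed to be case-insensitive, the example strings are invented, and the two hand-labeled ".22" rows are not handled here.

```python
def simplify_weapon(entry):
    """Collapse a free-text weapon description into one label.

    Checks run in the same priority order as the cell above, so an entry
    that mentions both a rifle and a handgun is labeled "Rifle".
    """
    text = entry.lower()  # assumption: matching is case-insensitive
    if "rifle" in text or "assault" in text:
        return "Rifle"
    if "shotgun" in text:
        return "Shotgun"
    if "handgun" in text:
        return "Handgun"
    if "revolver" in text:
        return "Revolver"
    return "Other"


# Invented descriptions, one per branch
examples = ["Semiautomatic rifle; semiautomatic handgun",
            "One shotgun",
            "semiautomatic handguns",
            ".38 revolver",
            "knife"]
labels = [simplify_weapon(e) for e in examples]
print(labels)  # → ['Rifle', 'Shotgun', 'Handgun', 'Revolver', 'Other']
```

A function like this could be applied to the whole column with `massacres['weapon_type'].map(simplify_weapon)` instead of a row-by-row loop.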

# Data Cleaning - Partisan Policy Data

Next is the partisan dataset from Gabby Giffords, a former representative from Arizona who lobbies on gun policy. Although the dataset is partisan, our reasoning is that an opposing partisan would not contest the distribution of grades so much as invert the scale, so that "good" becomes "bad" and vice versa. We don't care which end of the scale means which, so as long as the assumption of an agreed-upon distribution holds, we think this is a good baseline for which states have which gun control policies.

We're also taking population figures from this dataset, since it includes 2019 population by state, which is very handy for the per capita calculations on the other 2019 data.

In [None]:
partisan_data = pd.read_csv("/content/partisan-gun-data.csv")
# grab population data for doing per capita calculations
population = partisan_data[['state','pop2019']]
population = population.rename(columns={"state": "State", 
                                        "pop2019":"Population"})
population = population.sort_values("State")
# drop=True keeps the old row labels from being added as an extra column
population = population.reset_index(drop=True)
# get policy data itself from partisan data
policy = partisan_data[['state', 'grade2019']]
policy = policy.rename(columns={"state": "State", 
                                "grade2019": "Severity Gun Control"})

#population
#policy

# Data Cleaning - Economics

Most of these datasets span multiple years, but we only want 2019 so we can compare against the FBI dataset. So we read each CSV with pandas and trim the dataframe accordingly. Also, as noted above, we aren't considering Alabama, Florida, or Illinois in this analysis. We'll still talk about them elsewhere.

These are total levels. We take the Q4 value or the last month's value, whichever is appropriate, so that the figure reflects where things stood at the end of the year as a result of what happened in the state that year. For unemployment and earnings, the code below uses the column labeled `2020-01-01` and renames it `2019`.

In [None]:
# dataset on GDP per state in millions of dollars
gdp = pd.read_csv("/content/state_gdp_raw_in_millions.csv")
gdp = gdp[['Geography', '2019 Q4']]
gdp = gdp.rename(columns={'Geography': 'State', '2019 Q4': '2019'})
# drop US aggregate, and DC since it's not a state
gdp = gdp.drop([0,9])

#gdp

In [None]:
# dataset on unemployment rate per state in percent
unemployment = pd.read_csv("/content/unemployment_rate_raw.csv")
unemployment = unemployment[['Geography', '2020-01-01']]
unemployment = unemployment.rename(columns={'Geography': 'State', 
                                            '2020-01-01': '2019'})
# drop US aggregate and DC since it's not a state
unemployment = unemployment.drop([0,9])

#unemployment

In [None]:
# dataset on average weekly earnings per state in dollars
earnings = pd.read_csv("/content/weekly_earnings_raw.csv")
earnings = earnings[['Geography', '2020-01-01']]
earnings = earnings.rename(columns={'Geography': 'State', '2020-01-01': '2019'})
# drop US aggregate and DC since it's not a state
earnings = earnings.drop([0,9])

#earnings
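
The three cells above drop the US aggregate and DC by row position (`[0,9]`), which only works while the files keep that exact row order. A hedged sketch of dropping by name instead is shown below. The table is a toy with made-up values, and the labels "United States" and "District of Columbia" are assumptions about how the `Geography` column spells them.

```python
import pandas as pd

# Toy table shaped like the economic files (values are made up)
toy = pd.DataFrame({
    "Geography": ["United States", "Alabama",
                  "District of Columbia", "Wyoming"],
    "2019 Q4": [100.0, 10.0, 5.0, 2.0],
})

# Assumed spellings of the non-state rows; check them against the real file
non_states = ["United States", "District of Columbia"]
states_only = toy[~toy["Geography"].isin(non_states)].reset_index(drop=True)

print(states_only["Geography"].tolist())  # → ['Alabama', 'Wyoming']
```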

# Data Cleaning - Health

These are from the [CDC's state tracker](https://www.cdc.gov/nchs/pressroom/stats_of_the_states.htm). The states appear in the same alphabetical order as in the datasets above.

The data is from 2019, except for the COVID-19 data, which is from Q3 2022. For analysis, we may not want to plot all of these because that would take up a lot of room. We could probably do correlation matrices, and perhaps then plot the correlation values.

One of the main ones we probably do want to plot is a joint study of drug overdoses and firearm-injury deaths as a possible parallel to mental health. But that could be fraught with assumptions about mental health, which we should note explicitly. We should probably also dig into the literature a little (last minute) to consider whether it's appropriate.

Heat maps could also be cool.

Some of these include DC, which we drop in each case for consistency with the datasets above, particularly the base partisan policy rankings dataset.
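
As a rough sketch of the correlation-matrix idea, the tables can be merged on `State` and passed to `DataFrame.corr()`. Several of the CDC tables share the column name "Death Rate", so each one is renamed before merging. The values below are made up, and the names `toy_drugs` and `toy_firearms` are hypothetical stand-ins for the cleaned tables.

```python
import pandas as pd

# Made-up values standing in for two of the cleaned CDC tables
toy_drugs = pd.DataFrame({"State": ["A", "B", "C", "D"],
                          "Death Rate": [10.0, 20.0, 30.0, 40.0]})
toy_firearms = pd.DataFrame({"State": ["A", "B", "C", "D"],
                             "Death Rate": [5.0, 9.0, 16.0, 20.0]})

# Give each measure a unique column name, then join on the state
combined = (
    toy_drugs.rename(columns={"Death Rate": "Drug OD"})
    .merge(toy_firearms.rename(columns={"Death Rate": "Firearm"}), on="State")
    .set_index("State")
)

# Pairwise Pearson correlations between the numeric columns
corr = combined.corr()
print(corr.round(3))
```

The resulting square matrix is also what a heat map would take as input.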

In [None]:
# CDC dataset on covid-19 death rates as of Q3 2022
covid = pd.read_csv("/content/cdc-covid19-2022.csv")
covid = covid.loc[covid['Quarters'] == "Q3 2022"]
covid = covid[['STATE', 'RATE']]
covid = covid.rename(columns={"STATE": "State", "RATE": "Death Rate"})

#covid

In [None]:
# CDC dataset on divorce rates as of 2019
divorce = pd.read_csv("/content/cdc-divorce-2019.csv")
divorce = divorce.loc[divorce['YEAR'] == 2019]
divorce = divorce[['STATE', 'RATE']]
divorce = divorce.rename(columns={"STATE": "State", "RATE": "Divorce Rate"})
# drop DC since it's not a state
divorce = divorce.drop(110)
divorce = divorce.reset_index(drop=True)

#divorce

In [None]:
# CDC dataset on drug overdose deaths as of 2019
drugs = pd.read_csv("/content/cdc-drug-od-2019.csv")
drugs = drugs.loc[drugs['YEAR'] == 2019]
drugs = drugs[['STATE', 'RATE']]
drugs = drugs.rename(columns={"STATE": "State", "RATE": "Death Rate"})
drugs = drugs.reset_index(drop=True)

#drugs

In [None]:
# CDC dataset on total birth rate as of 2019
fertility = pd.read_csv("/content/cdc-fertility-2019.csv")
fertility = fertility.loc[fertility['YEAR'] == 2019]
fertility = fertility[['STATE', 'FERTILITY RATE']]
fertility = fertility.rename(columns={"STATE": "State", 
                                      "FERTILITY RATE": "Birth Rate"})
fertility = fertility.reset_index(drop=True)

#fertility

In [None]:
# CDC dataset on firearm-related deaths as of 2019
firearms = pd.read_csv("/content/cdc-firearm-deaths-2019.csv")
firearms = firearms.loc[firearms['YEAR'] == 2019]
firearms = firearms[['STATE', 'RATE']]
firearms = firearms.rename(columns={"STATE": "State", "RATE": "Death Rate"})
firearms = firearms.reset_index(drop=True)

#firearms

For the 2019 homicide data, it would be good to compare these CDC figures against the FBI's, since the two come from different sources.

In [None]:
# CDC dataset on homicides as of 2019
homicides_cdc = pd.read_csv("/content/cdc-homicides-2019.csv")
homicides_cdc = homicides_cdc.loc[homicides_cdc['YEAR'] == 2019]
homicides_cdc = homicides_cdc[['STATE', 'RATE']]
homicides_cdc = homicides_cdc.rename(columns={"STATE": "State", 
                                              "RATE": "Death Rate"})
homicides_cdc = homicides_cdc.reset_index(drop=True)

#homicides_cdc

In [None]:
# CDC dataset on infant mortality as of 2019
infant = pd.read_csv("/content/cdc-infant-mortality-2019.csv")
infant = infant.loc[infant['YEAR'] == 2019]
infant = infant[['STATE', 'RATE']]
infant = infant.rename(columns={"STATE": "State", "RATE": "Death Rate"})
infant = infant.reset_index(drop=True)

#infant

In [None]:
# CDC dataset on life expectancy as of 2019
lifespan = pd.read_csv("/content/cdc-life-expectancy-2019.csv")
lifespan = lifespan.loc[lifespan['YEAR'] == 2019]
lifespan = lifespan[['STATE', 'RATE']]
lifespan = lifespan.rename(columns={"STATE": "State", "RATE": "Years Lived"})
lifespan = lifespan.reset_index(drop=True)

#lifespan

In [None]:
# CDC dataset on marriage rates as of 2019
marriage = pd.read_csv("/content/cdc-marriage-2019.csv")
marriage = marriage.loc[marriage['YEAR'] == 2019]
marriage = marriage[['STATE', 'RATE']]
marriage = marriage.rename(columns={"STATE": "State", "RATE": "Marriage Rate"})
# drop DC since it's not a state
marriage = marriage.drop(110)
marriage = marriage.reset_index(drop=True)

#marriage

In [None]:
# CDC dataset on teen birth rate as of 2019
teen_births = pd.read_csv("/content/cdc-teen-births-2019.csv")
teen_births = teen_births.loc[teen_births['YEAR'] == 2019]
teen_births = teen_births[['STATE', 'RATE']]
teen_births = teen_births.rename(columns={"STATE" : "State",
                                          "RATE" : "Birth Rate"})
teen_births = teen_births.reset_index(drop=True)

#teen_births
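
Most of the CDC cells above repeat the same four steps: read, filter to 2019, keep two columns, rename. A hedged sketch of folding that into one helper follows. The name `load_cdc_2019` and its parameters are hypothetical, and the CSV text is a made-up stand-in for a CDC file.

```python
import io

import pandas as pd


def load_cdc_2019(source, value_name, rate_column="RATE"):
    """Read a CDC state table, keep the 2019 rows, and rename the columns."""
    table = pd.read_csv(source)
    table = table.loc[table["YEAR"] == 2019, ["STATE", rate_column]]
    table = table.rename(columns={"STATE": "State", rate_column: value_name})
    return table.reset_index(drop=True)


# Made-up CSV text standing in for a CDC file
toy_csv = io.StringIO("YEAR,STATE,RATE\n2018,AL,1.0\n2019,AL,2.0\n2019,AK,3.0\n")
toy_table = load_cdc_2019(toy_csv, "Death Rate")
print(toy_table)
```

The fertility file would need `rate_column="FERTILITY RATE"`, and the tables that include DC would still need that row dropped afterwards.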

Now we'll standardize state names between the CDC tables, which use abbreviations, and the other dataframes, which use full names. We chose full names for clarity. To speed this up, there's a handy open-source Python dictionary made for exactly this purpose, so we'll take it from the import above and use it to modify each of the CDC tables.

*NOTE: Do this once, because the dataframes are changed after that. To reset, run the above cells again first.*

In [None]:
state_dict = usa.abbrev_to_us_state
# every CDC table that labels states by abbreviation
cdc_tables = [covid, divorce, drugs, fertility, firearms,
              homicides_cdc, infant, lifespan, marriage, teen_births]
for table in cdc_tables:
  # replace the whole column at once instead of assigning row by row through
  # chained indexing; looking up state_dict directly still raises KeyError on
  # an unknown abbreviation, as the original loops did
  table['State'] = table['State'].map(lambda abbrev: state_dict[abbrev])
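
A hedged sketch of a variant that is safe to run more than once: map the abbreviations, list anything that did not match, and keep the original value where no full name was found. The dictionary `toy_dict` is a tiny hypothetical stand-in for `usa.abbrev_to_us_state`, and "ZZ" is an invented bad code.

```python
import pandas as pd

# Tiny stand-in for usa.abbrev_to_us_state (the real dictionary is larger)
toy_dict = {"AL": "Alabama", "AK": "Alaska"}
toy = pd.DataFrame({"State": ["AL", "AK", "ZZ"]})

# Series.map returns NaN for values that are not keys of the dictionary
full_names = toy["State"].map(toy_dict)
unmapped = toy.loc[full_names.isna(), "State"].tolist()
print(unmapped)  # → ['ZZ']

# Keep the original value wherever no full name was found
toy["State"] = full_names.fillna(toy["State"])
print(toy["State"].tolist())  # → ['Alabama', 'Alaska', 'ZZ']
```

Because unmatched values are left alone, running this a second time leaves the already-converted full names untouched.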

Finally, let's match homicides from the CDC against homicides from the FBI and use how they compare to fill the three gaps (Alabama, Florida, and Illinois) in the FBI's dataset with estimates. Specifically, we average the difference between the two sets (CDC minus FBI) over the other 47 states, then subtract that average from the CDC's value for each of the three missing states.

For the other columns, we use the column median. That's not great, but it's all we have, since the CDC doesn't really track those breakdowns.

In [None]:
# row positions of Alabama, Florida, and Illinois after DC was dropped
incomplete = [0, 8, 12]
# the 47 states with complete FBI data
complete = homicides_fbi.drop(index=incomplete)
# average CDC-minus-FBI gap, computed over the complete states only
diff = homicides_cdc['Death Rate'] - homicides_fbi['Murders']
mean_diff = diff.drop(index=incomplete).mean()
# estimate the murder rate from the CDC rate, shifted by the average gap
homicides_fbi.loc[incomplete, 'Murders'] = \
    homicides_cdc.loc[incomplete, 'Death Rate'] - mean_diff
# fill the remaining columns with the median of the complete states
for column in ['Non-Firearm', 'Firearm', 'Handguns', 'Rifles', 'Unknown']:
  homicides_fbi.loc[incomplete, column] = complete[column].median()

#homicides_fbi
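
To make the arithmetic concrete, here is a toy version of the gap-filling step with four invented states, one of which ("B") has incomplete FBI data. The numbers are made up and the variable names are hypothetical.

```python
import pandas as pd

# Made-up per capita rates; state "B" has incomplete FBI data
cdc = pd.Series([6.0, 8.0, 4.0, 5.0], index=["A", "B", "C", "D"])
fbi = pd.Series([5.0, 1.0, 3.5, 4.0], index=["A", "B", "C", "D"])
incomplete = ["B"]

# Average CDC-minus-FBI gap over the states with complete FBI data only:
# (1.0 + 0.5 + 1.0) / 3
gap = (cdc - fbi).drop(incomplete).mean()

# Estimate the missing FBI value as the CDC value minus the average gap
fbi_filled = fbi.copy()
fbi_filled.loc[incomplete] = cdc.loc[incomplete] - gap

print(round(gap, 4), round(fbi_filled["B"], 4))
```

Leaving "B" out of the average matters here: its incomplete FBI value would otherwise inflate the gap that is then used to estimate it.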

# Wrapping Up

Now let's save everything as new CSVs so they can be used in a separate data analysis notebook.

*NOTE: If the files aren't appearing in the left-hand bar, right-click on that space and click Refresh.*

In [None]:
!mkdir /content/original_data
!mv /content/*.csv /content/original_data
!mv /content/project_data.zip /content/original_data/

In [None]:
policy.to_csv("/content/gun_policy_by_state.csv", index=False)
homicides_fbi.to_csv("/content/homicides_by_state_fbi.csv", index=False)
massacres.to_csv("/content/mass_shootings.csv", index=False)
gdp.to_csv("/content/gdp_by_state.csv", index=False)
unemployment.to_csv("/content/unemployment_rate_by_state.csv", index=False)
earnings.to_csv("/content/earnings_by_state.csv", index=False)
covid.to_csv("/content/covid_deaths_by_state.csv", index=False)
divorce.to_csv("/content/divorce_rate_by_state.csv", index=False)
drugs.to_csv("/content/drug_od_deaths_by_state.csv", index=False)
fertility.to_csv("/content/fertility_rate_by_state.csv", index=False)
firearms.to_csv("/content/firearm_deaths_by_state.csv", index=False)
homicides_cdc.to_csv("/content/homicides_by_state_cdc.csv", index=False)
infant.to_csv("/content/infant_mortality_by_state.csv", index=False)
lifespan.to_csv("/content/life_expectancy_by_state.csv", index=False)
marriage.to_csv("/content/marriage_rate_by_state.csv", index=False)
teen_births.to_csv("/content/teen_birth_rate_by_state.csv", index=False)
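
The same save step can be driven from a dictionary of output names to dataframes, which keeps the file names in one place. This is only a sketch: `save_tables` is a hypothetical helper, and the demo writes a made-up table to a temporary directory rather than `/content`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_tables(tables, out_dir):
    """Write each dataframe to <out_dir>/<name>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, table in tables.items():
        path = out_dir / f"{name}.csv"
        table.to_csv(path, index=False)
        paths.append(path)
    return paths


# Demo with a made-up table written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    toy = {"gdp_by_state": pd.DataFrame({"State": ["Alabama"], "2019": [1.0]})}
    written = [p.name for p in save_tables(toy, tmp)]

print(written)  # → ['gdp_by_state.csv']
```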

In [None]:
!zip -r /content/cleaned_project_data.zip /content/*.csv
!rm /content/*.csv

from google.colab import files
files.download("/content/cleaned_project_data.zip")

  adding: content/covid_deaths_by_state.csv (deflated 37%)
  adding: content/divorce_rate_by_state.csv (deflated 39%)
  adding: content/drug_od_deaths_by_state.csv (deflated 37%)
  adding: content/earnings_by_state.csv (deflated 38%)
  adding: content/fertility_rate_by_state.csv (deflated 38%)
  adding: content/firearm_deaths_by_state.csv (deflated 38%)
  adding: content/gdp_by_state.csv (deflated 37%)
  adding: content/gun_policy_by_state.csv (deflated 42%)
  adding: content/homicides_by_state_cdc.csv (deflated 37%)
  adding: content/homicides_by_state_fbi.csv (deflated 46%)
  adding: content/infant_mortality_by_state.csv (deflated 37%)
  adding: content/life_expectancy_by_state.csv (deflated 41%)
  adding: content/marriage_rate_by_state.csv (deflated 39%)
  adding: content/mass_shootings.csv (deflated 74%)
  adding: content/teen_birth_rate_by_state.csv (deflated 38%)
  adding: content/unemployment_rate_by_state.csv (deflated 39%)


The cleaned-up data files should now be zipped and ready to go, with a download initiated.

Just unzip them in the next notebook with an `!unzip <filepath/filename>` command in a code cell.
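
As the `adding:` lines above show, the archive stores each file under a `content/` prefix, so unzipping creates a `content/` subfolder. A hedged alternative sketch for the next notebook is to read the CSVs straight out of the archive with the standard-library `zipfile` module. The helper name `read_csvs_from_zip` is hypothetical, and the demo uses an in-memory archive with made-up contents so it runs on its own.

```python
import io
import zipfile
from pathlib import PurePosixPath

import pandas as pd


def read_csvs_from_zip(archive):
    """Return {file stem: dataframe} for every CSV inside a zip archive."""
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                with zf.open(name) as handle:
                    # key by the bare file name, without folder or extension
                    tables[PurePosixPath(name).stem] = pd.read_csv(handle)
    return tables


# Demo on an in-memory archive that mimics the "content/" prefix
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("content/gdp_by_state.csv", "State,2019\nAlabama,1.0\n")
buffer.seek(0)

demo_tables = read_csvs_from_zip(buffer)
print(list(demo_tables))  # → ['gdp_by_state']
```

With the real file, the call would be something like `read_csvs_from_zip("cleaned_project_data.zip")`.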