<a href="https://colab.research.google.com/github/ttussing/US-Gun-Exploration-Visualization/blob/master/US_Gun_Exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
#Things to do 
#Describe how we gathered and cleaned the data 
#2-3 datasets 
#Feature Engineering 
#Describe features 
#Descriptive statistics 
#Describe why we chose the data? 

In [0]:
import pandas as pd
import numpy as np

In [0]:
world_url = 'https://raw.githubusercontent.com/ttussing/US-Gun-Exploration-Visualization/master/World_firearms.csv'
state_url = 'https://raw.githubusercontent.com/ttussing/US-Gun-Exploration-Visualization/master/Firearm%20Deaths%20Per%20State%20Per%20Year/All_Firearm_Deaths_State_Year.csv'
state_owner_url = 'https://raw.githubusercontent.com/ttussing/US-Gun-Exploration-Visualization/master/Gun%20Owner%20Stats/Gun_Owner_Statistics_Per_State.csv'
ms_url = 'https://raw.githubusercontent.com/ttussing/US-Gun-Exploration-Visualization/master/Mass%20Shooting/Mass_Shooting_Data_2013-2019.csv'
df_world = pd.read_csv(world_url)
df_state = pd.read_csv(state_url)
df_st_owner = pd.read_csv(state_owner_url)
df_ms = pd.read_csv(ms_url)

## World Firearms Data
**Allows us to see which countries have the highest firearm ownership rates and which have the highest murder rates.**

[Article Describing the Data Set](https://www.theguardian.com/news/datablog/2012/jul/22/gun-homicides-ownership-world-list#data)


---



**Description of the Dataset:**

- Data on the world's firearm murders and gun ownership were compiled in a [Google Sheet](https://docs.google.com/spreadsheets/d/1chqUZHuY6cXYrRYkuE0uwXisGaYvr7durZHJhpLGycs/edit#gid=0).
- The data is compiled from 2 data sources:

  - The data on firearm homicides was collected by the UNODC through its annual crime survey. This includes the percent of homicides committed with a firearm, the number of homicides committed with a firearm, and the firearm homicide rate per 100,000 population.
   
  - The data on gun ownership was collected by the Small Arms Survey. This includes the average firearms per 100 people, the average number of civilian firearms, and rank by the rate of gun ownership. The data has been normalized for a rate per 100,000 population.

- Limitations:
  - Data is missing for Russia, China, and Afghanistan.
  - While the Google Spreadsheet says that it is automatically updated every 5 minutes, the homicide columns collected from the UNODC do not state which year they cover.
   - The data is from an annual crime survey, and it does not say whether the numbers are averaged across all years or are for the latest year.
  - The data from the Small Arms Survey is only from 2007.


**Takeaways from this Dataset:**

- The United States has the highest gun ownership rate in the world.

- Central/South America leads on all murder statistics, far ahead of the US. The US is ahead in murder only when filtering for high-HDI countries.

- The US has less than 5% of the world's population but is home to about 35-50% of the world's civilian-owned guns. 

In [0]:
df_world.head()

Country/Territory,ISO code,Source,% of homicides by firearm,Number of homicides by firearm,"Homicide by firearm rate per 100,000 pop",Rank by rate of ownership,Average firearms per 100 people,Average total all civilian firearms
Albania,AL,CTS,65.9,56.0,1.76,70.0,8.6,270000.0
Algeria,DZ,CTS,4.8,20.0,0.06,78.0,7.6,1900000.0
Angola,AO,,,,,34.0,17.3,2800000.0
Anguilla,AI,WHO-MDB,24.0,1.0,7.14,,,
Argentina,AR,Ministry of Justice,52.0,1198.0,3.02,62.0,10.2,3950000.0


In [0]:
df_world = df_world.rename(columns={'Country/Territory':'country', 
                   'ISO code': 'iso_code',
                   'Source' : 'source', 
                   '% of homicides by firearm' : 'percent_hom_farm',
                   'Number of homicides by firearm' : 'num_hom_farm', 
                   'Homicide by firearm rate per 100,000 pop' : 'hom_farm_rate_100k',
                   'Rank by rate of ownership' : "rank_rate_ownership",
                   'Average firearms per 100 people' : 'avg_farm_100ppl',
                   'Average total all civilian firearms' : 'avg_tot_civ_farm'
                  }) 


### Feature Descriptions

**country:** The country name

**iso_code:** unique code for every country or territory

**source:** original data sources

**percent_hom_farm:** percent of homicides by firearms for every country

**num_hom_farm:** number of homicides by firearm

**hom_farm_rate_100k:** homicide by firearm rate per 100,000 people

**rank_rate_ownership:** rank by rate of ownership

**avg_farm_100ppl:** average firearms per 100 people

**avg_tot_civ_farm:** average total number of civilian firearms

**Compare the homicides and firearm ownership of the US to the average across the rest of the world**

In [13]:
# Descriptive Statistics for the United States

dfus = df_world[df_world.country == 'United States']
dfus

Unnamed: 0,country,iso_code,source,percent_hom_farm,num_hom_farm,hom_farm_rate_100k,rank_rate_ownership,avg_farm_100ppl,avg_tot_civ_farm
176,United States,US,CTS,60.0,9146.0,2.97,1.0,88.8,270000000.0


In [14]:
# List the average of each numeric column across all countries.
# Series.mean() skips null (NaN) values by default, so no explicit filtering is needed.

averages = {
    'percent_hom_farm': "Average percent of homicides by firearms",
    'num_hom_farm': "Average number of homicides by firearm",
    'hom_farm_rate_100k': "Average homicide by firearm rate for every 100k people",
    'rank_rate_ownership': "Average rank_rate_ownership",
    'avg_farm_100ppl': "Average firearms per 100 people",
    'avg_tot_civ_farm': "Average total civilians firearms",
}
for column, label in averages.items():
    print(label + " for all countries is " + str(df_world[column].mean()))

Average percent of homicides by firearms for all countries is 32.22931034482759
Average number of homicides by firearm for all countries is 1100.0603448275863
Average homicide by firearm rate for every 100k people for all countries is 4.885862068965518
Average rank_rate_ownership for all countries is 88.9090909090909
Average firearms per 100 people for all countries is 10.236931818181821
Average total civilians firearms for all countries is 3659138.6363636362
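
One way to make the comparison with the rest of the world explicit is to place the US value next to the mean of all other countries and take their ratio. The sketch below is only an illustration, not the method used in this notebook: `compare_to_rest` is a hypothetical helper, the sample rows are copied from the tables displayed above, and, unlike the averages printed above, the mean here excludes the United States.

```python
import pandas as pd

# Sample rows copied from the outputs shown earlier in this notebook.
# In the notebook itself, the full renamed df_world could be passed instead.
sample = pd.DataFrame({
    "country": ["Albania", "Algeria", "Argentina", "United States"],
    "hom_farm_rate_100k": [1.76, 0.06, 3.02, 2.97],
    "avg_farm_100ppl": [8.6, 7.6, 10.2, 88.8],
    "avg_tot_civ_farm": [270000.0, 1900000.0, 3950000.0, 270000000.0],
})


def compare_to_rest(df, country, cols):
    """Hypothetical helper: one row per column with the country's value,
    the mean over all other countries, and the ratio of the two."""
    is_country = df["country"] == country
    country_vals = df.loc[is_country, cols].iloc[0]
    rest_mean = df.loc[~is_country, cols].mean()  # NaN values are skipped by default
    return pd.DataFrame({
        country: country_vals,
        "rest_of_world_mean": rest_mean,
        "ratio": country_vals / rest_mean,
    })


comparison = compare_to_rest(
    sample,
    "United States",
    ["hom_farm_rate_100k", "avg_farm_100ppl", "avg_tot_civ_farm"],
)
print(comparison)
```

A ratio above 1 means the US value is higher than the average of the other countries in the frame. With only three comparison rows, the sample numbers are not meaningful on their own.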


## United States Firearms Data

**Firearm injury-related deaths per state and year due to suicides, homicides, and unintentional causes.**

---

**Description of the Dataset:**

- Data on fatal injuries due to firearms was taken from the [CDC](https://www.cdc.gov/injury/wisqars/fatal.html) fatal injury data.

  - WISQARS Fatal’s mortality reports provide tables of the total numbers of injury-related deaths and the death rates per 100,000 population. The reports list deaths according to cause (mechanism) and intent (manner) of injury by state, race, Hispanic origin, sex, and age groupings.
  
- Data was collected from 1999 - 2017.

- Mortality data reported in WISQARS come from death certificate data reported to the National Center for Health Statistics (NCHS), CDC. NCHS collects, compiles, verifies and prepares these data for release to the public.

- Data was filtered on firearms as the mechanism for 3 data exports:
  - The intent was filtered on suicides, homicides and unintentional causes separately and exported as 3 csv files. 
  - For ease of having one dataset, these csv files were combined in Excel with the intent listed under "Type".

- Limitations: 
  - For sub-national geography, death counts of nine or fewer, and death rates based on counts of nine or fewer, are not published and are present as "P" for the value. Since these values are not reliable, they will be replaced with "0" for numerical representation.
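
The combining and cleaning steps described above can be sketched in pandas. This is a minimal illustration under stated assumptions, not the workflow actually used (the files were combined in Excel): `combine_exports` and `replace_suppressed` are hypothetical helpers, the Alabama rows are taken from the dataset preview in this notebook, and the suppressed homicide row is invented purely to show the replacement.

```python
import pandas as pd


def combine_exports(frames_by_intent):
    """Hypothetical helper: stack one DataFrame per intent and
    record the intent in a "Type" column."""
    tagged = [frame.assign(Type=intent) for intent, frame in frames_by_intent.items()]
    return pd.concat(tagged, ignore_index=True)


def replace_suppressed(df, columns, marker="P"):
    """Hypothetical helper: replace the suppression marker with 0
    and convert the given columns to numeric dtypes."""
    cleaned = df.copy()
    for col in columns:
        # Keep values that are not the marker; put 0 where the marker appears.
        without_marker = cleaned[col].where(cleaned[col] != marker, 0)
        cleaned[col] = pd.to_numeric(without_marker)
    return cleaned


# Two rows from the preview of the state dataset (counts kept as text,
# as they would be when a column also contains "P").
unintentional = pd.DataFrame(
    {"State": ["Alabama", "Alabama"], "Year": [1999, 2000], "Deaths": ["51", "34"]}
)
# Invented row, only to demonstrate a suppressed value.
homicide = pd.DataFrame({"State": ["Example State"], "Year": [2010], "Deaths": ["P"]})

combined = combine_exports({"Unintentional": unintentional, "Homicide": homicide})
clean = replace_suppressed(combined, ["Deaths"])
print(clean)
```

Replacing a suppressed value with 0 understates counts that were actually between 1 and 9, so any totals built from the cleaned column are lower bounds.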

**Citation:** Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. Web-based Injury Statistics Query and Reporting System (WISQARS) [online]. (2005) [cited 2019 May 16]. Available from: www.cdc.gov/injury/wisqars

---
Disclaimer: This data is used for a school research project to report on firearms analysis and may not be repurposed for anything other than statistical reporting and analysis. Make no attempt to learn the identity of any person or establishment included in these data.
Make no disclosure or other use of the identity of any person or establishment discovered inadvertently and advise the NCHS Confidentiality Officer of any such discovery.

"The data source for WISQARS Fatal Injury Data Visualization is the National Vital Statistics System (NVSS) operated by the National Center for Health Statistics. WISQARS provides death counts and death rates for the United States and by state, county, age, race, Hispanic ethnicity, sex, leading cause of death, injury intent, and injury mechanism categories. WISQARS can be used to query death data for the years 2001 - 2017, of which the underlying cause of death is specified using ICD-10 codes. The National Center for Health Statistics (NCHS) in an agreement with the National Association of Public Health Statistics and Information Systems (NAPHSIS) has implemented a new, more restrictive rule for reporting state- and county-level death data for years 2008 and later from NVSS in order to avoid inadvertent disclosure of a decedent's identity. Therefore, the Statistics, Programming and Economics Branch, Division of Analysis, Research, and Practice Integration, NCIPC has modified WISQARS to accommodate the new data suppression rule; i.e., no figure, including totals, should be less than 10 in tabulations for sub-national geographic areas, regardless of the number of years combined with the 2008 and later data. Tabulations, charts, and maps produced by WISQARS using only NVSS death data for years prior to 2008 are not affected by this new rule. Therefore, queries of state-level data for years 1999 through 2007 will remain unrestricted; queries of state-level data that include 2008 or later are restricted. As a WISQARS user, please read the following data use restrictions and click "I Agree." You will then be given access to this WISQARS module."

In [0]:
df_state.head()

Type,Sex,Race,State,Ethnicity,Age Group,First Year,Last Year,Cause of Death,Year,Deaths,Population,Crude Rate,Age-Adjusted Rate
Unitentional,Both Sexes,All Races,Alabama,Both,All Ages,1999,2017,Unintentional Firearm,1999.0,51,4430143,1.15,1.15722772
Unitentional,Both Sexes,All Races,Alabama,Both,All Ages,1999,2017,Unintentional Firearm,2000.0,34,4451687,0.76,0.76222845
Unitentional,Both Sexes,All Races,Alabama,Both,All Ages,1999,2017,Unintentional Firearm,2001.0,41,4467634,0.92,0.926071313
Unitentional,Both Sexes,All Races,Alabama,Both,All Ages,1999,2017,Unintentional Firearm,2002.0,45,4480089,1.0,0.997164737
Unitentional,Both Sexes,All Races,Alabama,Both,All Ages,1999,2017,Unintentional Firearm,2003.0,38,4503491,0.84,0.82537211


## Gun Ownership Per State

**Gun Statistics per State**

[Data Source](http://demographicdata.org/facts-and-figures/gun-ownership-statistics/#)


---

**Description of the Dataset:** 

- This dataset contains a complete set of crime rates, gun deaths and gun ownership rates. There are a total of 51 records and 12 fields which include crime types such as violent, murder, rape, robbery, assault, property, burglary, larceny and motor-related crimes. All data are grouped by state.

- The only statistic we are interested in from this dataset is the gun ownership column, which we merge with the US Firearms Data.
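
A merge of the ownership column into the state-level deaths data could look like the sketch below. It is an illustration only: `add_ownership` and the output column name `gun_ownership_pct` are hypothetical, the sample rows are copied from the previews in this notebook, and the join assumes state names are spelled identically in both datasets.

```python
import pandas as pd

# Sample rows copied from the previews of the two datasets.
owners = pd.DataFrame({
    "State Name": ["Alabama", "Alaska"],
    "Gun Ownership (2007)": ["51.7%", "57.8%"],
})
deaths = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "Year": [1999.0, 2000.0],
    "Deaths": [51, 34],
})


def add_ownership(deaths_df, owners_df):
    """Hypothetical helper: attach the 2007 gun ownership percentage
    to every state/year row of the deaths data."""
    ownership = (
        owners_df[["State Name", "Gun Ownership (2007)"]]
        .rename(columns={
            "State Name": "State",
            "Gun Ownership (2007)": "gun_ownership_pct",
        })
        # "51.7%" -> 51.7
        .assign(gun_ownership_pct=lambda d: d["gun_ownership_pct"].str.rstrip("%").astype(float))
    )
    # A left join keeps every deaths row, even if a state has no ownership figure.
    return deaths_df.merge(ownership, on="State", how="left")


merged = add_ownership(deaths, owners)
print(merged)
```

Because the ownership figure is a single 2007 value per state, it repeats across all years for that state after the merge.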

In [4]:
df_st_owner.head()

Unnamed: 0,State Name,Gun Murder Rate per 100K (2010),Gun Ownership (2007),"Violent Crime (per 100,000) 2013",Murder and nonnegligent manslaughter (per 100K) 2013,Forcible rape (per 100K) 2013,Robbery (per 100K) 2013,Aggravated assault (per 100K) 2013,Property Crime (per 100K) 2013,Burglary (per 100K) 2013,Larceny (per 100K) 2013,Motor Theft (per 100K) 2013
0,Alabama,2.8,51.7%,449.9,7.1,26.9,104.1,311.8,3502.2,984.7,2312.8,204.8
1,Alaska,2.7,57.8%,603.2,4.1,79.7,86.1,433.2,2739.4,403.3,2128.0,208.1
2,Arizona,3.6,31.1%,428.9,5.5,34.7,112.7,276.0,3539.2,807.8,2439.1,292.3
3,Arkansas,3.2,55.3%,469.1,5.9,42.3,78.7,342.3,3660.1,1081.3,2384.7,194.1
4,California,3.4,21.3%,423.1,5.0,20.6,148.6,248.9,2758.7,646.1,1669.5,443.2


## Mass Shootings in the United States

In [0]:
df_ms.head()

date,name_semicolon_delimited,killed,wounded,city,state,sources_semicolon_delimited
5/15/19,Unkown (victims included a 5-yr-old girl),0,4,Los Angeles,CA,https://abc7.com/3-men-5-year-old-girl-shot-at...
5/13/19,Unknown,4,1,St. Louis,MO,https://www.stltoday.com/news/local/crime-and-...
5/13/19,Unknown (victims included 1 juvenile),0,4,New Orleans,LA,https://www.wdsu.com/article/four-injured-in-c...
5/12/19,Unknown,0,4,Paulsboro,NJ,https://6abc.com/5-hurt-after-gunmen-open-fire...
5/11/19,Thomas Modzel,0,4,Effort,PA,https://wnep.com/2019/05/11/four-people-shot-a...
