


# Project: FBI Firearm Background Check Analysis

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

For this project, I will use the FBI NICS Firearm Background Check and US Census data sets to explore the following questions:
<br>
<ul>
    <li> For each state, was there an increase or decrease in gun background checks per capita in 2016 compared to 2010?</li>
    <br>
    <li> Which state has had the highest growth in gun background checks per capita?</li>
    <br>   
    <li> Which state has the highest percentage of <a href="#long_gun">long gun</a> background checks per capita?</li>
    </ul>
    <br>
    
As the census data is given for the years 2010 and 2016, I will use these dates for my analysis.
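
The first two questions come down to a ratio and a percentage change. The sketch below is only an illustration of that arithmetic: the Alabama population figures are taken from the census file, while the annual check totals are hypothetical placeholders, and the function names are my own.

```python
def checks_per_capita(total_checks, population):
    """Return the number of background checks per resident."""
    return total_checks / population


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


# Alabama populations from the census file (2010 census, 2016 estimate)
pop_2010, pop_2016 = 4_779_736, 4_863_300
# Hypothetical annual check totals, used only to show the calculation
checks_2010, checks_2016 = 300_000, 600_000

rate_2010 = checks_per_capita(checks_2010, pop_2010)
rate_2016 = checks_per_capita(checks_2016, pop_2016)
growth = percent_change(rate_2010, rate_2016)
```

A positive `growth` means more checks per resident in 2016 than in 2010, a negative value means fewer.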


In [92]:
# Importing packages
%matplotlib inline

import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Data
df1 = pd.read_csv('/Users/lanapalmer/jupyter/FBI_Gun_Data/us_census_data.csv')
df2 = pd.read_csv('/Users/lanapalmer/jupyter/FBI_Gun_Data/gun_data.csv')


<a id='wrangling'></a>
## Data Wrangling

### General Properties: US Census Data

In [93]:
# Setting the maximum column width so long 'Fact' descriptions are not truncated
pd.set_option('display.max_colwidth', 2000)

In [94]:
#Initial data exploration - Head
df1.head()

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,"Population estimates, July 1, 2016, (V2016)",,4863300,741894,6931071,2988248,39250017,5540545,3576452,952065,...,865454.0,6651194.0,27862596,3051217,624594,8411808,7288000,1831102,5778708,585501
1,"Population estimates base, April 1, 2010, (V2016)",,4780131,710249,6392301,2916025,37254522,5029324,3574114,897936,...,814195.0,6346298.0,25146100,2763888,625741,8001041,6724545,1853011,5687289,563767
2,"Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016)",,1.70%,4.50%,8.40%,2.50%,5.40%,10.20%,0.10%,6.00%,...,0.063,0.048,10.80%,10.40%,-0.20%,5.10%,8.40%,-1.20%,1.60%,3.90%
3,"Population, Census, April 1, 2010",,4779736,710231,6392017,2915918,37253956,5029196,3574097,897934,...,814180.0,6346105.0,25145561,2763885,625741,8001024,6724540,1852994,5686986,563626
4,"Persons under 5 years, percent, July 1, 2016, (V2016)",,6.00%,7.30%,6.30%,6.40%,6.30%,6.10%,5.20%,5.80%,...,0.071,0.061,7.20%,8.30%,4.90%,6.10%,6.20%,5.50%,5.80%,6.50%


In [95]:
#Initial data exploration - Tail
df1.tail(10)

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
75,,,,,,,,,,,...,,,,,,,,,,
76,Value Flags,,,,,,,,,,...,,,,,,,,,,
77,-,"Either no or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest or upper interval of an open ended distribution.",,,,,,,,,...,,,,,,,,,,
78,D,Suppressed to avoid disclosure of confidential information,,,,,,,,,...,,,,,,,,,,
79,F,Fewer than 25 firms,,,,,,,,,...,,,,,,,,,,
80,FN,Footnote on this item in place of data,,,,,,,,,...,,,,,,,,,,
81,,Not available,,,,,,,,,...,,,,,,,,,,
82,S,Suppressed; does not meet publication standards,,,,,,,,,...,,,,,,,,,,
83,X,Not applicable,,,,,,,,,...,,,,,,,,,,
84,Z,Value greater than zero but less than half unit of measure shown,,,,,,,,,...,,,,,,,,,,


> The US Census Data file includes the following data for each state: 
<ul>
<li>Population estimate, as of July 1, 2016</li>
<li>Population estimate, as of April 1, 2010</li>
<li>Population percentage change, between April 1, 2010 and July 1, 2016</li>
<li>Population census: April 1, 2010</li>
</ul>

> Followed by demographic data, including:
<ul>
<li>Percentage of people under 5, under 18, and over 65</li>
<li>Percentage of females</li>
<li>Percentage by race, including White, Black or African American, Hispanic, and two or more races</li>
<li>Percentage of those without health insurance</li>
<li>Percentage of those with an undergraduate degree or higher</li>
<li>Percentage in civilian labor force</li>
<li>Percentage living in poverty</li>
<li>Per capita income</li>
<li>Number of veterans</li>
<li>Total employment</li>
</ul>

### General Properties: FBI NICS Firearm Background Check Data

In [76]:
df2.head(10)

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,...,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,...,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,...,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,...,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,...,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506
5,2017-09,Colorado,4356.0,0.0,15751.0,13448.0,1007.0,1062,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35873
6,2017-09,Connecticut,4343.0,673.0,4834.0,1993.0,274.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12117
7,2017-09,Delaware,275.0,0.0,1414.0,1538.0,66.0,68,0.0,0.0,...,0.0,0.0,0.0,55.0,34.0,3.0,1.0,2.0,0.0,3502
8,2017-09,District of Columbia,1.0,0.0,56.0,4.0,0.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,61
9,2017-09,Florida,10784.0,0.0,39199.0,17949.0,2319.0,1721,1.0,18.0,...,0.0,0.0,0.0,11.0,9.0,0.0,0.0,1.0,0.0,77390


> The FBI NICS Firearm Background Check Data is available <a href="https://www.fbi.gov/file-repository/nics_firearm_checks_-_month_year_by_state_type.pdf/view">at this link as a PDF</a>.
It is important to note that the data records background checks rather than actual gun sales. As the original FBI PDF states, <i>"based on varying state laws and purchase scenarios, a one-to-one correlation cannot be made between a firearm background check and a firearm sale"</i> (pg. 1).


> The data file includes the number of background checks per month per state. The permit, permit_recheck, handgun, long_gun, other, and multiple columns represent background checks initiated by an officially-licensed Federal Firearms Licensee (FFL) or a criminal justice/law enforcement agency prior to the issuance of a firearm-related permit or transfer.

A <a id="long_gun">'Long Gun'</a> is defined as <i>"a weapon designed or redesigned, made or remade, and intended to be fired from the shoulder, and designed or redesigned and made or remade to use the energy of the explosive in (a) a fixed metallic cartridge to fire a single projectile through a rifled bore for each single pull of the trigger; or (b) a fixed shotgun shell to fire through a smooth bore either a number of ball shot or a single projectile for each single pull of the trigger"</i> (pg. 1).
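
For the long gun question, one plausible starting point is the share of a state's checks that fall in the `long_gun` column. This is a sketch of that calculation under my own assumption about how the percentage is defined; the two rows are copied from the 2017-09 output above, and the `long_gun_pct` column name is hypothetical.

```python
import pandas as pd

# Two rows copied from the 2017-09 output above
checks = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "long_gun": [6320.0, 2930.0],
    "totals": [32019, 6303],
})

# Long gun checks as a percentage of all checks in the same row
checks["long_gun_pct"] = checks["long_gun"] / checks["totals"] * 100
```

Dividing by `totals` gives a share of all checks; dividing by population instead would give a per capita figure.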

> Other types of transactions are:
<ul>
 <li>Pre-Pawn—background checks requested by an officially-licensed FFL on prospective firearm transferees seeking to pledge or pawn a firearm as security for the payment or repayment of money, prior to actually pledging or pawning the firearm</li>
    <br>
    <li>Redemption—background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to regain possession of a firearm after pledging or pawning a firearm as security at a pawn shop</li> 
    <br>
    <li>Returned/Disposition—background checks requested by criminal justice/law enforcement agencies prior to returning a firearm in its possession to the respective transferee, to ensure the individual is not prohibited</li>
    <br>
    <li>Rentals—background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm when the firearm is loaned or rented for use off the premises of the business</li>
    <br>
    <li>Private Sale—background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm from a private party seller who is not an officially-licensed FFL</li>
    <br>
    <li>Return to Seller-Private Sale—background checks requested by an officially-licensed FFL on prospective firearm transferees attempting to possess a firearm from a private party seller who is not an officially-licensed FFL. </li>
    
</ul>
(pg. 1).

### Data Cleaning: US Census Data

> It appears that row 64 contains a 'FIPS Code', and rows 65 to 84 contain blank lines and coded footnotes on the data. I have decided to drop these rows, as well as the 'Fact Note' column, for the purposes of this analysis.

In [77]:
df1.tail(22)

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
63,"Land area in square miles, 2010",,50645.33,570640.95,113594.08,52035.48,155779.22,103641.89,4842.36,1948.54,...,75811,41234.9,261231.71,82169.62,9216.66,39490.09,66455.52,24038.21,54157.80,97093.14
64,FIPS Code,,"""01""","""02""","""04""","""05""","""06""","""08""","""09""","""10""",...,"""46""","""47""","""48""","""49""","""50""","""51""","""53""","""54""","""55""","""56"""
65,,,,,,,,,,,...,,,,,,,,,,
66,NOTE: FIPS Code values are enclosed in quotes to ensure leading zeros remain intact.,,,,,,,,,,...,,,,,,,,,,
67,,,,,,,,,,,...,,,,,,,,,,
68,Value Notes,,,,,,,,,,...,,,,,,,,,,
69,1,Includes data not distributed by county.,,,,,,,,,...,,,,,,,,,,
70,,,,,,,,,,,...,,,,,,,,,,
71,Fact Notes,,,,,,,,,,...,,,,,,,,,,
72,(a),Includes persons reporting only one race,,,,,,,,,...,,,,,,,,,,


In [96]:
#Removing "Fact Note" Column
df1.drop(['Fact Note'], axis=1, inplace= True) 

In [97]:
#Checking that "Fact Note" Column is dropped.
df1.head()

Unnamed: 0,Fact,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,...,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,"Population estimates, July 1, 2016, (V2016)",4863300,741894,6931071,2988248,39250017,5540545,3576452,952065,20612439,...,865454.0,6651194.0,27862596,3051217,624594,8411808,7288000,1831102,5778708,585501
1,"Population estimates base, April 1, 2010, (V2016)",4780131,710249,6392301,2916025,37254522,5029324,3574114,897936,18804592,...,814195.0,6346298.0,25146100,2763888,625741,8001041,6724545,1853011,5687289,563767
2,"Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016)",1.70%,4.50%,8.40%,2.50%,5.40%,10.20%,0.10%,6.00%,9.60%,...,0.063,0.048,10.80%,10.40%,-0.20%,5.10%,8.40%,-1.20%,1.60%,3.90%
3,"Population, Census, April 1, 2010",4779736,710231,6392017,2915918,37253956,5029196,3574097,897934,18801310,...,814180.0,6346105.0,25145561,2763885,625741,8001024,6724540,1852994,5686986,563626
4,"Persons under 5 years, percent, July 1, 2016, (V2016)",6.00%,7.30%,6.30%,6.40%,6.30%,6.10%,5.20%,5.80%,5.50%,...,0.071,0.061,7.20%,8.30%,4.90%,6.10%,6.20%,5.50%,5.80%,6.50%


In [98]:
#Dropping rows 64 to 84
df1.drop(df1.index[64:85], inplace= True)
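
Dropping by position works for this file, but it relies on the footnotes starting at exactly row 64. As an alternative sketch, assuming the 'Fact' column always contains a 'FIPS Code' row that marks the end of the data, the cut-off can be found by label. The small frame below is a stand-in for the census file, and the names `census` and `census_clean` are my own.

```python
import pandas as pd

# Stand-in for the census frame: two data rows, then the FIPS code and footnotes
census = pd.DataFrame({
    "Fact": [
        "Population, Census, April 1, 2010",
        "Land area in square miles, 2010",
        "FIPS Code",
        None,
        "Value Notes",
    ],
    "Alabama": ["4779736", "50645.33", '"01"', None, None],
})

# Position of the 'FIPS Code' row; everything from there on is metadata
fips_pos = census["Fact"].tolist().index("FIPS Code")
census_clean = census.iloc[:fips_pos].copy()
```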

In [99]:
#Transpose rows and columns so that the layout matches the NICS data set (one row per state).
df1 = df1.T

In [100]:
#Check that df1 is transposed.
df1.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,54,55,56,57,58,59,60,61,62,63
Fact,"Population estimates, July 1, 2016, (V2016)","Population estimates base, April 1, 2010, (V2016)","Population, percent change - April 1, 2010 (estimates base) to July 1, 2016, (V2016)","Population, Census, April 1, 2010","Persons under 5 years, percent, July 1, 2016, (V2016)","Persons under 5 years, percent, April 1, 2010","Persons under 18 years, percent, July 1, 2016, (V2016)","Persons under 18 years, percent, April 1, 2010","Persons 65 years and over, percent, July 1, 2016, (V2016)","Persons 65 years and over, percent, April 1, 2010",...,"Total nonemployer establishments, 2015","All firms, 2012","Men-owned firms, 2012","Women-owned firms, 2012","Minority-owned firms, 2012","Nonminority-owned firms, 2012","Veteran-owned firms, 2012","Nonveteran-owned firms, 2012","Population per square mile, 2010","Land area in square miles, 2010"
Alabama,4863300,4780131,1.70%,4779736,6.00%,6.40%,22.60%,23.70%,16.10%,13.80%,...,322025,374153,203604,137630,92219,272651,41943,316984,94.4,50645.33
Alaska,741894,710249,4.50%,710231,7.30%,7.60%,25.20%,26.40%,10.40%,7.70%,...,55521,68032,35402,22141,13688,51147,7953,56091,1.2,570640.95
Arizona,6931071,6392301,8.40%,6392017,6.30%,7.10%,23.50%,25.50%,16.90%,13.80%,...,451951,499926,245243,182425,135313,344981,46780,427582,56.3,113594.08
Arkansas,2988248,2916025,2.50%,2915918,6.40%,6.80%,23.60%,24.40%,16.30%,14.40%,...,198380,231959,123158,75962,35982,189029,25915,192988,56,52035.48
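
After a plain transpose, the fact descriptions sit in the first data row and the columns are numbered 0 to 63. One possible alternative, shown here on a small stand-in frame, is to make 'Fact' the index before transposing so that each column carries its own description. The short labels `pop_2016` and `pop_2010` are hypothetical; the cells that follow keep the numeric column labels.

```python
import pandas as pd

# Stand-in for the census frame, with hypothetical short fact labels
census = pd.DataFrame({
    "Fact": ["pop_2016", "pop_2010"],
    "Alabama": ["4863300", "4779736"],
    "Alaska": ["741894", "710231"],
})

# Facts become column names, states become the row index
by_state = census.set_index("Fact").T
by_state.index.name = "state"
```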


In [101]:
#Checking Data Type of each column
df1.dtypes

0     object
1     object
2     object
3     object
4     object
       ...  
59    object
60    object
61    object
62    object
63    object
Length: 64, dtype: object

> As the output above shows, every column is stored as object (string) type, so I will need to convert the data I use into integers. I will begin with:
<ul>
    <li>Column [0] - Population Estimate, July 1, 2016</li>
    <br>
        <li>Column [3] - Population Census, April 1, 2010</li>
    </ul>

In [102]:
#Extracting integers from strings (raw-string pattern; expand=False returns a Series)
df1[0] = df1[0].str.extract(r'(\d+)', expand=False).astype(int)

In [108]:
#Checking that type is now integer
df1.dtypes

0      int64
1     object
2     object
3      int64
4     object
       ...  
59    object
60    object
61    object
62    object
63    object
Length: 64, dtype: object

In [110]:
#Extracting integers from strings (raw-string pattern; expand=False returns a Series)
df1[3] = df1[3].str.extract(r'(\d+)', expand=False).astype(int)

AttributeError: Can only use .str accessor with string values!
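
pandas raises this AttributeError when `.str` is used on a column that no longer holds strings. The dtypes listing already shows column 3 as int64, so the conversion cell had presumably been run once before. Below is a sketch of a conversion that can be re-run safely, assuming the values are digit strings that may contain thousands separators; the helper name `to_int_column` is my own.

```python
import pandas as pd


def to_int_column(series):
    """Convert digit strings such as '741,894' to nullable integers."""
    # astype(str) makes the .str accessor usable even if the column is already numeric
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    # Anything that is not a number becomes missing instead of raising
    return pd.to_numeric(cleaned, errors="coerce").astype("Int64")


populations = pd.Series(["4863300", "741,894", "not a number"])
once = to_int_column(populations)
twice = to_int_column(once)  # a second call does not raise
```

Values that cannot be parsed, such as a text label, become missing rather than being reduced to the first digits they happen to contain.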

In [107]:
#Checking that type is now integer
df1.dtypes

0      int64
1     object
2     object
3      int64
4     object
       ...  
59    object
60    object
61    object
62    object
63    object
Length: 64, dtype: object

### Data Cleaning: FBI NICS Firearm Background Check data

> As I've decided to focus on the years provided in the US Census data (2010 and 2016), my first task is to create a new dataframe containing the annual total for each state for these years.
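
One way such a table could be built is sketched below on a few made-up rows that reuse the `month`, `state` and `totals` column names from the file: take the year from the `month` string, keep 2010 and 2016, and sum per state. The numbers and the `checks` and `annual` names are placeholders, not results.

```python
import pandas as pd

# Made-up rows with the same column names as the NICS file
checks = pd.DataFrame({
    "month": ["2016-02", "2016-01", "2010-02", "2010-01", "2017-09"],
    "state": ["Alabama"] * 5,
    "totals": [40, 30, 20, 10, 99],
})

# The first four characters of 'month' are the year
checks["year"] = checks["month"].str[:4].astype(int)

# One row per state, one column per year of interest
annual = (
    checks[checks["year"].isin([2010, 2016])]
    .groupby(["state", "year"])["totals"]
    .sum()
    .unstack("year")
)
```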

In [113]:
df2.head(100)

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,...,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,...,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,...,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,...,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,...,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,2017-08,Pennsylvania,24329.0,0.0,40628.0,13170.0,78.0,0,172.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,79020
96,2017-08,Puerto Rico,0.0,0.0,1169.0,227.0,42.0,34,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1490
97,2017-08,Rhode Island,0.0,0.0,883.0,596.0,68.0,154,0.0,0.0,...,1.0,0.0,0.0,7.0,3.0,0.0,0.0,1.0,0.0,1718
98,2017-08,South Carolina,12255.0,2077.0,8268.0,6421.0,412.0,361,0.0,17.0,...,0.0,0.0,0.0,22.0,11.0,0.0,1.0,0.0,0.0,31663
