<a href="https://colab.research.google.com/github/freedom-780/FBI-Firearm-Background-Check/blob/main/Firearm_Background_Check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a id='intro'></a>
## Introduction

> The National Instant Criminal Background Check System (NICS) dataset is managed by the FBI under the Brady Handgun Violence Prevention Act of 1993 (Brady Act), which was fully implemented on November 30, 1998. State laws differ in their background check requirements. Buyers who hold a valid ATF permit, and people picking up their own firearm after service or repair, do not require a background check. Data is collected with ATF Form 4473.

Types of background checks:

* Handgun: (a) any firearm which has a short stock and is designed to be held and fired by the use of a single hand; and (b) any combination of parts from which a firearm described in paragraph (a) can be assembled.
* Long Gun: a weapon designed or redesigned, made or remade, and intended to be fired from the shoulder, and designed or redesigned and made or remade to use the energy of the explosive in (a) a fixed metallic cartridge to fire a single projectile through a rifled bore for each single pull of the trigger; or (b) a fixed shotgun shell to fire through a smooth bore either a number of ball shot or a single projectile for each single pull of the trigger.
* Other: frames, receivers, and other firearms that are neither handguns nor long guns (rifles or shotguns), such as firearms having a pistol grip that expel a shotgun shell, or National Firearms Act firearms, including silencers. (Note: this column might be dropped or checked for outliers.)

Questions 

* Which type of gun sale is increasing the most?
* Are there any trends between type of gun ownership and mass shootings? (Data: https://github.com/StanfordGeospatialCenter/MSA/tree/master/Data)
* What are the trends in background checks and mass shootings for states with the weakest policies?
* Are there any interesting overall trends, such as spikes in gun ownership during certain periods, for example around policy changes?











In [None]:
# import packages 
import pandas as pd 
pd.options.display.float_format = '{:,.2f}'.format
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline



<a id='wrangling'></a>
## Data Wrangling


### General Properties

In [None]:
# Import the gun background check and mass shooting data

gun_data = "https://github.com/freedom-780/FBI-Firearm-Background-Check/blob/main/gun_data.xlsx?raw=true"
gun_background_check = pd.read_excel(gun_data)

stanford_msa = "https://raw.githubusercontent.com/freedom-780/FBI-Firearm-Background-Check/main/Stanford_MSA_Database.csv"
mass_shootings = pd.read_csv(stanford_msa)






In [None]:
# Check to see the column names to figure out which ones are important
gun_background_check.columns

Index(['month', 'state', 'permit', 'permit_recheck', 'handgun', 'long_gun',
       'other', 'multiple', 'admin', 'prepawn_handgun', 'prepawn_long_gun',
       'prepawn_other', 'redemption_handgun', 'redemption_long_gun',
       'redemption_other', 'returned_handgun', 'returned_long_gun',
       'returned_other', 'rentals_handgun', 'rentals_long_gun',
       'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other',
       'return_to_seller_handgun', 'return_to_seller_long_gun',
       'return_to_seller_other', 'totals'],
      dtype='object')

In [None]:
# Check the column names and decide which ones to keep
mass_shootings.columns

Index(['CaseID', 'Title', 'Location', 'City', 'State', 'Latitude', 'Longitude',
       'Number of Civilian Fatalities', 'Number of Civilian Injured',
       'Number of Enforcement Fatalities', 'Number of Enforcement Injured',
       'Total Number of Fatalities', 'Total Number of Victims', 'Description',
       'Date', 'Day of Week', 'Date - Detailed', 'Shooter Name',
       'Number of shooters', 'Shooter Age(s)', 'Average Shooter Age',
       'Shooter Sex', 'Shooter Race', 'Type of Gun - Detailed',
       'Type of Gun - General', 'Number of Shotguns', 'Number of Rifles',
       'Number of Handguns', 'Total Number of Guns',
       'Number of Automatic Guns', 'Number of Semi-Automatic Guns',
       'Fate of Shooter at the scene', 'Fate of Shooter',
       'Shooter's Cause of Death', 'School Related', 'Place Type',
       'Relationship to Incident Location', 'Targeted Victim/s - Detailed',
       'Targeted Victim/s - General', 'Possible Motive - Detailed',
       'Possible Motive - Genera

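Discard the columns not needed for the analysis.

For the background check data, the columns most useful for the analysis are `month`, `state`, `handgun`, `long_gun`, `other`, `multiple`, `private_sale_handgun`, `private_sale_long_gun`, `private_sale_other`, and `totals`.

For the mass shooting data, the columns kept are `State`, `Date`, `Type of Gun - General`, `Number of Shotguns`, `Number of Rifles`, `Number of Handguns`, `Total Number of Guns`, `Number of Automatic Guns`, `Number of Semi-Automatic Guns`, `Possible Motive - General`, and `History of Mental Illness - General`.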

In [None]:
# drop columns that are not needed for analysis
# (pass the labels by keyword; pandas 2.0+ no longer accepts a positional axis argument)
columns_to_keep = ['month', 'state', 'handgun', 'long_gun', 'other', 'multiple', 'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other', 'totals']
gun_background_check.drop(columns=gun_background_check.columns.difference(columns_to_keep), inplace=True)

In [None]:
# Check to see if the right columns were dropped 
gun_background_check.columns

Index(['month', 'state', 'handgun', 'long_gun', 'other', 'multiple',
       'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other',
       'totals'],
      dtype='object')

In [None]:
# get info and see if data types are correct 
gun_background_check.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12485 entries, 0 to 12484
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   month                  12485 non-null  object 
 1   state                  12485 non-null  object 
 2   handgun                12465 non-null  float64
 3   long_gun               12466 non-null  float64
 4   other                  5500 non-null   float64
 5   multiple               12485 non-null  int64  
 6   private_sale_handgun   2750 non-null   float64
 7   private_sale_long_gun  2750 non-null   float64
 8   private_sale_other     2750 non-null   float64
 9   totals                 12485 non-null  int64  
dtypes: float64(6), int64(2), object(2)
memory usage: 975.5+ KB
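
The `info()` output above shows `month` stored as `object` (a string), which makes time-based grouping awkward. A minimal sketch of converting it to a datetime follows. It assumes the values look like `'2017-09'` (year and month), which is an assumption about the file's format, and it uses a small hypothetical frame with made-up totals rather than the real data:

```python
import pandas as pd

# Hypothetical sample; the 'YYYY-MM' format and the totals are assumptions
sample = pd.DataFrame({
    "month": ["2017-09", "2017-08", "1998-11"],
    "state": ["Alabama", "Alabama", "Alabama"],
    "totals": [300, 500, 100],
})

# Parse the strings into timestamps (first day of each month)
sample["month"] = pd.to_datetime(sample["month"], format="%Y-%m")

# A datetime column exposes the .dt accessor, e.g. to group by year
totals_by_year = sample.groupby(sample["month"].dt.year)["totals"].sum()

print(sample.dtypes)
print(totals_by_year)
```

If the real column uses a different format, the `format` string would need to change accordingly.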


In [None]:
# Drop the columns that are not needed
# (pass the labels by keyword; pandas 2.0+ no longer accepts a positional axis argument)
mass_columns_to_keep = ['State','Date','Type of Gun - General','Number of Shotguns','Number of Rifles','Number of Handguns','Total Number of Guns','Number of Automatic Guns','Number of Semi-Automatic Guns','Possible Motive - General','History of Mental Illness - General']
mass_shootings.drop(columns=mass_shootings.columns.difference(mass_columns_to_keep), inplace=True)

In [None]:
# Check if the correct columns were dropped
mass_shootings.columns

Index(['State', 'Date', 'Type of Gun - General', 'Number of Shotguns',
       'Number of Rifles', 'Number of Handguns', 'Total Number of Guns',
       'Number of Automatic Guns', 'Number of Semi-Automatic Guns',
       'Possible Motive - General', 'History of Mental Illness - General'],
      dtype='object')

In [None]:
#Check data types of mass shooting
mass_shootings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 335 entries, 0 to 334
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   State                                335 non-null    object
 1   Date                                 335 non-null    object
 2   Type of Gun - General                335 non-null    object
 3   Number of Shotguns                   335 non-null    object
 4   Number of Rifles                     335 non-null    object
 5   Number of Handguns                   335 non-null    object
 6   Total Number of Guns                 335 non-null    object
 7   Number of Automatic Guns             335 non-null    object
 8   Number of Semi-Automatic Guns        335 non-null    object
 9   Possible Motive - General            333 non-null    object
 10  History of Mental Illness - General  335 non-null    object
dtypes: object(11)
memory usage: 28.9+ KB


The goal is to convert the gun-count columns (those whose names contain "Number") to float64, which will be accomplished in 4 steps.

In [None]:
# Create a dictionary mapping each column to its desired data type
mass_change_dtype = {'Number of Shotguns': 'float64', 
                     'Number of Rifles': 'float64', 
                     'Total Number of Guns': 'float64',
                     'Number of Automatic Guns': 'float64',
                     'Number of Semi-Automatic Guns': 'float64',
                     'Number of Handguns': 'float64'
                     }

In [None]:
''' step 1: look at the unique values in each variable
to determine what type of cleaning is required '''

for x in list(mass_change_dtype.keys()):
  print(x)
  print(mass_shootings[x].unique())



Number of Shotguns
['1' '0' '2' 'Unknown' 'Handgun']
Number of Rifles
['3' '0' '1' '0 (1)' '2' 'Unknown']
Total Number of Guns
['7' '1' '2' '3' '6' '10' '4' '5' 'Unknown' '0']
Number of Automatic Guns
['0' '1(0)' '2' '1' 'Unknown']
Number of Semi-Automatic Guns
['1' '0' '1(2)' '3' '2' '4' 'Unknown']
Number of Handguns
['3' '1' '2' '0' '2 (1)' '4' '7' '5' 'Unknown']


In [None]:
''' step 2: find the rows in each key that require cleaning '''

# select only rows with a parenthesis (raw strings avoid invalid-escape warnings)
mask = mass_shootings['Number of Rifles'].str.contains(r'\(')
mask_1 = mass_shootings['Number of Automatic Guns'].str.contains(r'\(')
mask_2 = mass_shootings['Number of Semi-Automatic Guns'].str.contains(r'\(')
mask_3 = mass_shootings['Number of Handguns'].str.contains(r'\(')

print(mass_shootings.loc[mask,"Number of Rifles"])
print(mass_shootings.loc[mask_1,"Number of Automatic Guns"])
print(mass_shootings.loc[mask_2,"Number of Semi-Automatic Guns"])
print(mass_shootings.loc[mask_3,"Number of Handguns"])


15    0 (1)
Name: Number of Rifles, dtype: object
15    1(0)
Name: Number of Automatic Guns, dtype: object
15    1(2)
Name: Number of Semi-Automatic Guns, dtype: object
15    2 (1)
Name: Number of Handguns, dtype: object


So the values with parentheses are all in a single row (index 15). Now it's time to clean this row.
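
A one-pass alternative to a split-based cleanup can be sketched with a regular expression that keeps only the leading count. The sample values are copied from the outputs above; the helper name `leading_count` is hypothetical and not part of the notebook:

```python
import pandas as pd

def leading_count(column: pd.Series) -> pd.Series:
    """Keep the leading integer of entries like '0 (1)'; leave other entries unchanged."""
    extracted = column.str.extract(r"^\s*(\d+)\s*\(", expand=False)
    # extract() yields NaN where the pattern did not match, so fall back to the original value
    return extracted.fillna(column)

rifles = pd.Series(["3", "0", "0 (1)", "2", "Unknown"])
print(leading_count(rifles).tolist())
```

Entries without a parenthesis, including `'Unknown'`, pass through untouched, so this step only handles the parenthesized values.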

In [None]:
# clean values that contain "("

for x in mass_change_dtype:
  # select only rows with a parenthesis
  mask_4 = mass_shootings[x].str.contains(r'\(')
  # skip columns that have no affected rows
  if mask_4.any():
    # keep only the text before the parenthesis, stripped of whitespace
    mass_shootings.loc[mask_4, x] = mass_shootings.loc[mask_4, x].str.split(r'\(', expand=True).iloc[:,0].str.strip()


In [None]:
# Check to see if the cleaning was done correctly 

for x in list(mass_change_dtype.keys()):
  print(mass_shootings[x].unique())

['1' '0' '2' 'Unknown' 'Handgun']
['3' '0' '1' '2' 'Unknown']
['7' '1' '2' '3' '6' '10' '4' '5' 'Unknown' '0']
['0' '1' '2' 'Unknown']
['1' '0' '3' '2' '4' 'Unknown']
['3' '1' '2' '0' '4' '7' '5' 'Unknown']


In [92]:
# drop rows with unknown values 
for i in mass_shootings:
  mass_shootings.drop(mass_shootings.index[mass_shootings[i] == 'Unknown'], inplace = True)
  
    
  

In [93]:

for x in list(mass_change_dtype.keys()):
  print(mass_shootings[x].unique())

['1' '0' '2']
['3' '0' '1' '2']
['7' '1' '2' '3' '6' '10' '4' '5']
['0' '1' '2']
['1' '0' '3' '2' '4']
['3' '1' '2' '0' '4' '7' '5']


In [95]:
#convert the columns in the keys to float64
mass_shootings = mass_shootings.astype(mass_change_dtype)

In [96]:
mass_shootings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 147 entries, 0 to 333
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   State                                147 non-null    object 
 1   Date                                 147 non-null    object 
 2   Type of Gun - General                147 non-null    object 
 3   Number of Shotguns                   147 non-null    float64
 4   Number of Rifles                     147 non-null    float64
 5   Number of Handguns                   147 non-null    float64
 6   Total Number of Guns                 147 non-null    float64
 7   Number of Automatic Guns             147 non-null    float64
 8   Number of Semi-Automatic Guns        147 non-null    float64
 9   Possible Motive - General            147 non-null    object 
 10  History of Mental Illness - General  147 non-null    object 
dtypes: float64(6), object(5)
memory 

Since the columns in both datasets now have the proper data types, it's time to do some final cleaning.

In [98]:
# print statistics about the gun background check data

print(f"Shape of gun background data: {gun_background_check.shape}")
print(f"Number of missing values in gun background data: \
{gun_background_check.isnull().sum().sum()}")
print(f"Number of duplicate rows in gun background data: \
{gun_background_check.duplicated().sum()}")

Shape of gun background data: (12485, 10)
Number of missing values in gun background data: 36229
Number of duplicate rows in gun background data: 0


In [99]:
gun_background_check.isnull().sum()

month                       0
state                       0
handgun                    20
long_gun                   19
other                    6985
multiple                    0
private_sale_handgun     9735
private_sale_long_gun    9735
private_sale_other       9735
totals                      0
dtype: int64

Since the private sale columns have so many null values (9,735 of 12,485 rows each), these columns should be dropped.
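
Dropping sparse columns can be sketched as follows. The frame is a small hypothetical stand-in for the background check data, and the 50% cutoff is an illustrative assumption rather than a threshold taken from this analysis:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the background check data
sample = pd.DataFrame({
    "handgun": [10.0, 12.0, np.nan, 8.0],
    "private_sale_handgun": [np.nan, np.nan, np.nan, 1.0],
    "totals": [30, 40, 25, 20],
})

# Fraction of missing values per column
null_fraction = sample.isnull().mean()

# Drop columns where more than half the values are missing (assumed cutoff)
sparse_columns = null_fraction[null_fraction > 0.5].index
trimmed = sample.drop(columns=sparse_columns)

print(null_fraction)
print(trimmed.columns.tolist())
```

Naming the columns explicitly in `drop(columns=[...])` works just as well when the sparse columns are already known.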

In [None]:
# print statistics about the mass shooting data

print(f"Shape of mass shooting data: {mass_shootings.shape}")
print(f"Number of missing values in mass shooting data: \
{mass_shootings.isnull().sum().sum()}")
print(f"Number of duplicate rows in mass shooting data: \
{mass_shootings.duplicated().sum()}")

In [None]:
mass_shootings.info()

In [None]:
mass_shootings.columns

In [None]:
us_census_data.columns


May want to change the column labels for the U.S. census data to make them more concise.

May want to change some of the int data types to floats.

The U.S. census dtypes are objects (strings), so they need to be converted from strings to floats.


### Data Cleaning


In [None]:
print(f"Shape of data: {permit_data.shape}")
print(f"Number of missing values in the data:\
{permit_data.isnull().sum().sum()}")
print(f"Number of duplicated values: {permit_data.duplicated().sum()}")

Remove missing values 

In [None]:
permit_data.dropna(inplace=True)

# Verify missing values 

print(f"Number of missing values in the data:\
{permit_data.isnull().sum().sum()}")

In [None]:
print(f"Shape of data: {us_census_data.shape}")
print(f"Number of missing values in the data:\
{us_census_data.isnull().sum().sum()}")
print(f"Number of duplicated values: {us_census_data.duplicated().sum()}")

In [None]:
permit_data.duplicated().sum()

In [None]:
us_census_data.isnull().sum()

In [None]:
us_census_data.duplicated().sum()

<a id='eda'></a>
## Exploratory Data Analysis


### Research Question 1

In [None]:
# What census data is most associated with high guns per capita?
# Which states have had the highest growth in gun registrations?
# What is the overall trend of gun purchases?


### Research Question 2



<a id='conclusions'></a>
## Conclusions
