<a href="https://colab.research.google.com/github/freedom-780/FBI-Firearm-Background-Check/blob/main/Firearm_Background_Check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a id='intro'></a>
## Introduction

> The National Instant Criminal Background Check System (NICS) dataset is managed by the FBI under the Brady Handgun Violence Prevention Act of 1993 (Brady Act), which was fully implemented on November 30, 1998. Background check requirements differ by state law. Buyers who hold valid ATF permits, and people picking up their own firearm after service or repair, do not require a background check. Data is collected with ATF Form 4473.

Types of background checks:

* Handgun—(a) any firearm which has a short stock and is designed to be held and fired by the use of a single hand; and (b) any combination of parts from which a firearm described in paragraph (a) can be assembled.
* Long Gun—a weapon designed or redesigned, made or remade, and intended to be fired from the shoulder, and designed or redesigned and made or remade to use the energy of the explosive in (a) a fixed metallic cartridge to fire a single projectile through a rifled bore for each single pull of the trigger; or (b) a fixed shotgun shell to fire through a smooth bore either a number of ball shot or a single projectile for each single pull of the trigger.
* Other—refers to frames, receivers, and other firearms that are neither handguns nor long guns (rifles or shotguns), such as firearms having a pistol grip that expel a shotgun shell, or National Firearms Act firearms, including silencers. (Note: these columns may need to be dropped or checked for outliers.)

Questions

* How do regular gun sales compare with private gun sales?
* Which type of gun sale is increasing the most?
* Are there any trends between type of gun ownership and mass shootings? (https://github.com/StanfordGeospatialCenter/MSA/tree/master/Data)
* What are the trends in background checks and mass shootings for states with the weakest policies?
* Are there any interesting overall trends, such as spikes in gun ownership during certain periods, for example around policy changes?
* Is there a relationship between gun prices and mass shootings?











In [99]:
# import packages 
import pandas as pd 
pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.width', 70)
pd.set_option('display.max_columns', 8)
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline



<a id='wrangling'></a>
## Data Wrangling


### General Properties

In [None]:
# Load both datasets into DataFrames for inspection

# FBI NICS background check counts (Excel file)
gun_data = "https://github.com/freedom-780/FBI-Firearm-Background-Check/blob/main/gun_data.xlsx?raw=true"
gun_background_check = pd.read_excel(gun_data)

# Stanford Mass Shootings in America (MSA) database (CSV file)
stanford_msa = "https://raw.githubusercontent.com/freedom-780/FBI-Firearm-Background-Check/main/Stanford_MSA_Database.csv"
mass_shootings = pd.read_csv(stanford_msa)






In [None]:
gun_background_check.columns

Index(['month', 'state', 'permit', 'permit_recheck', 'handgun', 'long_gun', 'other',
       'multiple', 'admin', 'prepawn_handgun', 'prepawn_long_gun', 'prepawn_other',
       'redemption_handgun', 'redemption_long_gun', 'redemption_other',
       'returned_handgun', 'returned_long_gun', 'returned_other', 'rentals_handgun',
       'rentals_long_gun', 'private_sale_handgun', 'private_sale_long_gun',
       'private_sale_other', 'return_to_seller_handgun',
       'return_to_seller_long_gun', 'return_to_seller_other', 'totals'],
      dtype='object')

In [None]:
mass_shootings.columns

Index(['CaseID', 'Title', 'Location', 'City', 'State', 'Latitude', 'Longitude',
       'Number of Civilian Fatalities', 'Number of Civilian Injured',
       'Number of Enforcement Fatalities', 'Number of Enforcement Injured',
       'Total Number of Fatalities', 'Total Number of Victims', 'Description',
       'Date', 'Day of Week', 'Date - Detailed', 'Shooter Name',
       'Number of shooters', 'Shooter Age(s)', 'Average Shooter Age', 'Shooter Sex',
       'Shooter Race', 'Type of Gun - Detailed', 'Type of Gun - General',
       'Number of Shotguns', 'Number of Rifles', 'Number of Handguns',
       'Total Number of Guns', 'Number of Automatic Guns',
       'Number of Semi-Automatic Guns', 'Fate of Shooter at the scene',
       'Fate of Shooter', 'Shooter's Cause of Death', 'School Related',
       'Place Type', 'Relationship to Incident Location',
       'Targeted Victim/s - Detailed', 'Targeted Victim/s - General',
       'Possible Motive - Detailed', 'Possible Motive - General',
   

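Discard the columns that are not needed for the analysis.

For the background check data, only the columns most useful for the analysis are kept: 'month', 'state', 'handgun', 'long_gun', 'other', 'multiple', 'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other', and 'totals'.

For the mass shooting data, the columns kept are 'State', 'Date', 'Type of Gun - General', 'Number of Shotguns', 'Number of Rifles', 'Number of Handguns', 'Total Number of Guns', 'Number of Automatic Guns', 'Number of Semi-Automatic Guns', 'Possible Motive - General', and 'History of Mental Illness - General'.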

In [None]:
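# drop columns that are not needed
# (use the columns= keyword; recent pandas versions reject a positional axis argument)
background_check_cols = ['month', 'state', 'handgun', 'long_gun', 'other', 'multiple', 'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other', 'totals']
gun_background_check.drop(columns=gun_background_check.columns.difference(background_check_cols), inplace=True)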

In [None]:
gun_background_check.columns

Index(['month', 'state', 'handgun', 'long_gun', 'other', 'multiple',
       'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other',
       'totals'],
      dtype='object')

In [None]:
gun_background_check.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12485 entries, 0 to 12484
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   month                  12485 non-null  object 
 1   state                  12485 non-null  object 
 2   handgun                12465 non-null  float64
 3   long_gun               12466 non-null  float64
 4   other                  5500 non-null   float64
 5   multiple               12485 non-null  int64  
 6   private_sale_handgun   2750 non-null   float64
 7   private_sale_long_gun  2750 non-null   float64
 8   private_sale_other     2750 non-null   float64
 9   totals                 12485 non-null  int64  
dtypes: float64(6), int64(2), object(2)
memory usage: 975.5+ KB
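
The `info()` output above shows that `month` is stored as `object`, that is, as strings such as `2017-09`. A minimal sketch of converting it to a datetime column follows. The frame `sample` and its second `totals` value are hypothetical stand-ins for `gun_background_check`, not data from the source file.

```python
import pandas as pd

# Hypothetical sample rows in the same layout as gun_background_check.
sample = pd.DataFrame({
    "month": ["2017-09", "2017-08", "1998-11"],
    "state": ["Alabama", "Alabama", "Virginia"],
    "totals": [32019, 35038, 24],
})

# Parse the 'YYYY-MM' strings; each value becomes the first day of that month.
sample["month"] = pd.to_datetime(sample["month"], format="%Y-%m")

# A datetime column supports year-based grouping directly.
yearly_totals = sample.groupby(sample["month"].dt.year)["totals"].sum()
print(yearly_totals)
```

With a real datetime column, later steps such as plotting a time series or aligning by year with the mass shooting dates become simpler than working with strings.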


In [None]:
mass_shootings.drop(columns=mass_shootings.columns.difference(['State','Date','Type of Gun - General','Number of Shotguns','Number of Rifles','Number of Handguns','Total Number of Guns','Number of Automatic Guns','Number of Semi-Automatic Guns','Possible Motive - General','History of Mental Illness - General']), inplace=True)

In [None]:
mass_shootings.columns

Index(['State', 'Date', 'Type of Gun - General', 'Number of Shotguns',
       'Number of Rifles', 'Number of Handguns', 'Total Number of Guns',
       'Number of Automatic Guns', 'Number of Semi-Automatic Guns',
       'Possible Motive - General', 'History of Mental Illness - General'],
      dtype='object')

In [None]:
mass_shootings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 335 entries, 0 to 334
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   State                                335 non-null    object
 1   Date                                 335 non-null    object
 2   Type of Gun - General                335 non-null    object
 3   Number of Shotguns                   335 non-null    object
 4   Number of Rifles                     335 non-null    object
 5   Number of Handguns                   335 non-null    object
 6   Total Number of Guns                 335 non-null    object
 7   Number of Automatic Guns             335 non-null    object
 8   Number of Semi-Automatic Guns        335 non-null    object
 9   Possible Motive - General            333 non-null    object
 10  History of Mental Illness - General  335 non-null    object
dtypes: object(11)
memory usage: 28.9+ KB


In [None]:
mass_change_dtype = {'Number of Shotguns': 'float64', 
                     'Number of Rifles': 'float64', 
                     'Total Number of Guns': 'float64',
                     'Number of Automatic Guns': 'float64',
                     'Number of Semi-Automatic Guns': 'float64'
                     }

In [None]:
mass_shootings.astype(mass_change_dtype)

ValueError: ignored

In [104]:
for i in mass_shootings:
  mass_shootings.drop(mass_shootings.index[mass_shootings[i] == 'Unknown'], inplace = True)
  mass_shootings.drop(mass_shootings.index[mass_shootings[i] == '0 (1)'], inplace = True)


In [107]:
mass_shootings = mass_shootings.astype(mass_change_dtype)
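
The loop above drops every row that contains `'Unknown'` in any column, which is why the frame shrinks from 335 to 146 rows. One possible alternative, sketched here on hypothetical rows rather than the real `mass_shootings` data, is to coerce non-numeric entries to `NaN` with `pd.to_numeric` so that a row is only lost for the columns where it actually lacks a number.

```python
import pandas as pd

# Hypothetical rows mimicking the raw string columns of mass_shootings.
sample = pd.DataFrame({
    "Number of Shotguns": ["0", "1", "Unknown"],
    "Number of Rifles": ["1", "Unknown", "0"],
})

# Coerce each column; entries that are not numbers (e.g. 'Unknown') become NaN.
numeric = sample.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric.isna().sum())
```

Whether keeping partially known rows is appropriate depends on the question being asked; this is a sketch of the trade-off, not a replacement for the cleaning decision made above.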

In [108]:
mass_shootings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 146 entries, 0 to 333
Data columns (total 11 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   State                                146 non-null    object 
 1   Date                                 146 non-null    object 
 2   Type of Gun - General                146 non-null    object 
 3   Number of Shotguns                   146 non-null    float64
 4   Number of Rifles                     146 non-null    float64
 5   Number of Handguns                   146 non-null    object 
 6   Total Number of Guns                 146 non-null    float64
 7   Number of Automatic Guns             146 non-null    float64
 8   Number of Semi-Automatic Guns        146 non-null    float64
 9   Possible Motive - General            146 non-null    object 
 10  History of Mental Illness - General  146 non-null    object 
dtypes: float64(5), object(6)
memory 

for the mass shootings database, the following colu

In [None]:
# print statistics about the background check data

print(f"Shape of permit data: {gun_background_check.shape}")
print(f"Number of missing values in permit data: \
{gun_background_check.isnull().sum().sum()}")
print(f"Number of duplicate values in permit data: \
{gun_background_check.duplicated().sum()}")

Shape of permit data: (12485, 10)
Number of missing values in permit data: 36229
Number of duplicate values in permit data: 0


In [None]:
gun_background_check.isnull().sum()

month                       0
state                       0
handgun                    20
long_gun                   19
other                    6985
multiple                    0
private_sale_handgun     9735
private_sale_long_gun    9735
private_sale_other       9735
totals                      0
dtype: int64
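
The missing counts above are concentrated in `other` and the `private_sale_*` columns. One way to check whether a column is simply absent from earlier months is to find the first month in which it has a value. The sketch below uses a hypothetical miniature frame; its months and numbers are invented for illustration and do not reflect when NICS actually began reporting each category.

```python
import pandas as pd

# Hypothetical miniature of gun_background_check: early months lack newer columns.
sample = pd.DataFrame({
    "month": ["1998-11", "2005-01", "2013-06", "2017-09"],
    "handgun": [14.0, 500.0, 800.0, 5734.0],
    "other": [None, 20.0, 35.0, 221.0],
    "private_sale_handgun": [None, None, 4.0, 9.0],
})

# For each column, find the earliest month with a non-null value.
# 'YYYY-MM' strings sort chronologically, so min() on the strings is safe.
first_reported = {
    col: sample.loc[sample[col].notna(), "month"].min()
    for col in sample.columns.drop("month")
}
print(first_reported)
```

If the real data shows the same pattern, the missing values mark periods before a category was recorded rather than gaps in reporting, which affects whether filling or dropping them is reasonable.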

In [None]:
# print statistics about mass shootings data

print(f"Shape of mass shooting data: {mass_shootings.shape}")
print(f"Number of missing values in mass shooting data: \
{mass_shootings.isnull().sum().sum()}")
print(f"Number of duplicate values in mass shooting data: \
{mass_shootings.duplicated().sum()}")

Shape of mass shooting data: (335, 11)
Number of missing values in mass shooting data: 2
Number of duplicate values in mass shooting data: 1


In [None]:
mass_shootings.isnull().any()

State                                  False
Date                                   False
Type of Gun - General                  False
Number of Shotguns                     False
Number of Rifles                       False
Number of Handguns                     False
Total Number of Guns                   False
Number of Automatic Guns               False
Number of Semi-Automatic Guns          False
Possible Motive - General               True
History of Mental Illness - General    False
dtype: bool

In [None]:
mass_shootings_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 335 entries, 0 to 334
Data columns (total 55 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   CaseID                                335 non-null    int64  
 1   Title                                 335 non-null    object 
 2   Location                              335 non-null    object 
 3   City                                  335 non-null    object 
 4   State                                 335 non-null    object 
 5   Latitude                              335 non-null    float64
 6   Longitude                             335 non-null    float64
 7   Number of Civilian Fatalities         335 non-null    int64  
 8   Number of Civilian Injured            335 non-null    int64  
 9   Number of Enforcement Fatalities      335 non-null    int64  
 10  Number of Enforcement Injured         335 non-null    int64  
 11  Total Number of Fat

In [None]:
mass_shootings_data.columns

Index(['CaseID', 'Title', 'Location', 'City', 'State', 'Latitude', 'Longitude',
       'Number of Civilian Fatalities', 'Number of Civilian Injured',
       'Number of Enforcement Fatalities', 'Number of Enforcement Injured',
       'Total Number of Fatalities', 'Total Number of Victims', 'Description',
       'Date', 'Day of Week', 'Date - Detailed', 'Shooter Name',
       'Number of shooters', 'Shooter Age(s)', 'Average Shooter Age',
       'Shooter Sex', 'Shooter Race', 'Type of Gun - Detailed',
       'Type of Gun - General', 'Number of Shotguns', 'Number of Rifles',
       'Number of Handguns', 'Total Number of Guns',
       'Number of Automatic Guns', 'Number of Semi-Automatic Guns',
       'Fate of Shooter at the scene', 'Fate of Shooter',
       'Shooter's Cause of Death', 'School Related', 'Place Type',
       'Relationship to Incident Location', 'Targeted Victim/s - Detailed',
       'Targeted Victim/s - General', 'Possible Motive - Detailed',
       'Possible Motive - Genera

In [None]:
us_census_data.columns


Index(['Fact', 'Fact Note', 'Alabama', 'Alaska', 'Arizona', 'Arkansas',
       'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida',
       'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas',
       'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
       'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia',
       'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype='object')

In [None]:
permit_data.head(10)


Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,prepawn_long_gun,prepawn_other,redemption_handgun,redemption_long_gun,redemption_other,returned_handgun,returned_long_gun,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,21.0,2.0,1378.0,1262.0,1.0,0.0,0.0,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,2.0,0.0,200.0,154.0,2.0,28.0,30.0,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,6.0,0.0,1474.0,748.0,3.0,82.0,5.0,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,13.0,0.0,1296.0,1824.0,4.0,0.0,0.0,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,0.0,0.0,535.0,397.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506
5,2017-09,Colorado,4356.0,0.0,15751.0,13448.0,1007.0,1062,0.0,0.0,0.0,0.0,0.0,0.0,0.0,202.0,46.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,35873
6,2017-09,Connecticut,4343.0,673.0,4834.0,1993.0,274.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12117
7,2017-09,Delaware,275.0,0.0,1414.0,1538.0,66.0,68,0.0,0.0,1.0,0.0,26.0,16.0,3.0,0.0,0.0,0.0,0.0,0.0,55.0,34.0,3.0,1.0,2.0,0.0,3502
8,2017-09,District of Columbia,1.0,0.0,56.0,4.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,61
9,2017-09,Florida,10784.0,0.0,39199.0,17949.0,2319.0,1721,1.0,18.0,7.0,0.0,3657.0,1416.0,6.0,264.0,28.0,0.0,0.0,0.0,11.0,9.0,0.0,0.0,1.0,0.0,77390


In [None]:
print(f"Shape of Permit Data: {permit_data.shape} ")


Shape of Permit Data: (12485, 27) 


In [None]:
permit_data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
permit,12461.0,6413.629404,23752.338269,0.0,0.0,518.0,4272.0,522188.0
permit_recheck,1100.0,1165.956364,9224.200609,0.0,0.0,0.0,0.0,116681.0
handgun,12465.0,5940.881107,8618.58406,0.0,865.0,3059.0,7280.0,107224.0
long_gun,12466.0,7810.847585,9309.84614,0.0,2078.25,5122.0,10380.75,108058.0
other,5500.0,360.471636,1349.478273,0.0,17.0,121.0,354.0,77929.0
multiple,12485.0,268.603364,783.185073,0.0,15.0,125.0,301.0,38907.0
admin,12462.0,58.89809,604.814818,0.0,0.0,0.0,0.0,28083.0
prepawn_handgun,10542.0,4.828021,10.907756,0.0,0.0,0.0,5.0,164.0
prepawn_long_gun,10540.0,7.834156,16.468028,0.0,0.0,1.0,8.0,269.0
prepawn_other,5115.0,0.165591,1.057105,0.0,0.0,0.0,0.0,49.0


May want to change the column labels for the U.S. census data to make them more concise.
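
A minimal sketch of one way to shorten and normalize the census column labels, assuming a snake_case style like the one already used in the permit data (`long_gun`, `private_sale_handgun`). The frame `census` is a hypothetical one-row slice, not the full `us_census_data`.

```python
import pandas as pd

# Hypothetical slice of us_census_data with a few of its real column names.
census = pd.DataFrame({
    "Fact": ["Population estimates, July 1, 2016, (V2016)"],
    "Fact Note": [None],
    "New Hampshire": ["1334795"],
})

# Lowercase the labels and replace spaces with underscores.
census.columns = (
    census.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)
print(list(census.columns))
```

If the census data is later joined to the permit data on state name, the state labels would need to stay in (or be mapped back to) the form used in the `state` column.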

In [None]:
permit_data

Unnamed: 0,month,state,permit,permit_recheck,handgun,long_gun,other,multiple,admin,prepawn_handgun,prepawn_long_gun,prepawn_other,redemption_handgun,redemption_long_gun,redemption_other,returned_handgun,returned_long_gun,returned_other,rentals_handgun,rentals_long_gun,private_sale_handgun,private_sale_long_gun,private_sale_other,return_to_seller_handgun,return_to_seller_long_gun,return_to_seller_other,totals
0,2017-09,Alabama,16717.0,0.0,5734.0,6320.0,221.0,317,0.0,15.0,21.0,2.0,1378.0,1262.0,1.0,0.0,0.0,0.0,0.0,0.0,9.0,16.0,3.0,0.0,0.0,3.0,32019
1,2017-09,Alaska,209.0,2.0,2320.0,2930.0,219.0,160,0.0,5.0,2.0,0.0,200.0,154.0,2.0,28.0,30.0,0.0,0.0,0.0,17.0,24.0,1.0,0.0,0.0,0.0,6303
2,2017-09,Arizona,5069.0,382.0,11063.0,7946.0,920.0,631,0.0,13.0,6.0,0.0,1474.0,748.0,3.0,82.0,5.0,0.0,0.0,0.0,38.0,12.0,2.0,0.0,0.0,0.0,28394
3,2017-09,Arkansas,2935.0,632.0,4347.0,6063.0,165.0,366,51.0,12.0,13.0,0.0,1296.0,1824.0,4.0,0.0,0.0,0.0,0.0,0.0,13.0,23.0,0.0,0.0,2.0,1.0,17747
4,2017-09,California,57839.0,0.0,37165.0,24581.0,2984.0,0,0.0,0.0,0.0,0.0,535.0,397.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,123506
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12480,1998-11,Virginia,0.0,,14.0,2.0,,8,0.0,,,,,,,,,,,,,,,,,,24
12481,1998-11,Washington,1.0,,65.0,286.0,,8,1.0,,,,,,,,,,,,,,,,,,361
12482,1998-11,West Virginia,3.0,,149.0,251.0,,5,0.0,,,,,,,,,,,,,,,,,,408
12483,1998-11,Wisconsin,0.0,,25.0,214.0,,2,0.0,,,,,,,,,,,,,,,,,,241


In [None]:
us_census_data.head()

Unnamed: 0,Fact,Fact Note,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,Georgia,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Missouri,Montana,Nebraska,Nevada,New Hampshire,New Jersey,New Mexico,New York,North Carolina,North Dakota,Ohio,Oklahoma,Oregon,Pennsylvania,Rhode Island,South Carolina,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,"Population estimates, July 1, 2016, (V2016)",,4863300,741894,6931071,2988248,39250017,5540545,3576452,952065,20612439,10310371,1428557,1683140,12801539,6633053,3134693,2907289,4436974,4681666,1331479,6016447,6811779,9928300,5519952,2988726,6093000,1042520,1907116,2940058,1334795,8944469,2081015.0,19745289.0,10146788.0,757952.0,11614373.0,3923561.0,4093465.0,12784227.0,1056426.0,4961119.0,865454.0,6651194.0,27862596,3051217,624594,8411808,7288000,1831102,5778708,585501
1,"Population estimates base, April 1, 2010, (V2...",,4780131,710249,6392301,2916025,37254522,5029324,3574114,897936,18804592,9688680,1360301,1567650,12831574,6484136,3046869,2853129,4339344,4533479,1328364,5773786,6547813,9884129,5303924,2968103,5988928,989414,1826334,2700691,1316461,8791953,2059198.0,19378110.0,9535688.0,672591.0,11536727.0,3751615.0,3831072.0,12702857.0,1052940.0,4625410.0,814195.0,6346298.0,25146100,2763888,625741,8001041,6724545,1853011,5687289,563767
2,"Population, percent change - April 1, 2010 (es...",,1.70%,4.50%,8.40%,2.50%,5.40%,10.20%,0.10%,6.00%,9.60%,6.40%,5.00%,7.40%,-0.20%,2.30%,2.90%,1.90%,2.20%,3.30%,0.20%,4.20%,4.00%,0.40%,4.10%,0.70%,1.70%,5.40%,4.40%,8.90%,1.40%,1.70%,0.011,0.019,0.064,0.127,0.007,0.046,0.068,0.006,0.003,0.073,0.063,0.048,10.80%,10.40%,-0.20%,5.10%,8.40%,-1.20%,1.60%,3.90%
3,"Population, Census, April 1, 2010",,4779736,710231,6392017,2915918,37253956,5029196,3574097,897934,18801310,9687653,1360301,1567582,12830632,6483802,3046355,2853118,4339367,4533372,1328361,5773552,6547629,9883640,5303925,2967297,5988927,989415,1826341,2700551,1316470,8791894,2059179.0,19378102.0,9535483.0,672591.0,11536504.0,3751351.0,3831074.0,12702379.0,1052567.0,4625364.0,814180.0,6346105.0,25145561,2763885,625741,8001024,6724540,1852994,5686986,563626
4,"Persons under 5 years, percent, July 1, 2016, ...",,6.00%,7.30%,6.30%,6.40%,6.30%,6.10%,5.20%,5.80%,5.50%,6.40%,6.40%,6.80%,6.00%,6.40%,6.40%,6.70%,6.20%,6.60%,4.90%,6.10%,5.30%,5.80%,6.40%,6.30%,6.10%,6.00%,7.00%,6.30%,4.80%,5.80%,0.062,0.059,0.06,0.073,0.06,0.068,0.058,0.056,0.052,0.059,0.071,0.061,7.20%,8.30%,4.90%,6.10%,6.20%,5.50%,5.80%,6.50%


May want to change some of the int data types to floats

In [None]:
permit_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12485 entries, 0 to 12484
Data columns (total 27 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   month                      12485 non-null  object 
 1   state                      12485 non-null  object 
 2   permit                     12461 non-null  float64
 3   permit_recheck             1100 non-null   float64
 4   handgun                    12465 non-null  float64
 5   long_gun                   12466 non-null  float64
 6   other                      5500 non-null   float64
 7   multiple                   12485 non-null  int64  
 8   admin                      12462 non-null  float64
 9   prepawn_handgun            10542 non-null  float64
 10  prepawn_long_gun           10540 non-null  float64
 11  prepawn_other              5115 non-null   float64
 12  redemption_handgun         10545 non-null  float64
 13  redemption_long_gun        10544 non-null  flo

In [None]:
# Count the missing values in each column.
# (Masking with permit_data[permit_data.isnull()] yields an all-NaN frame, and
# .count without parentheses only returns the method object.)
permit_data.isnull().sum()

The U.S. census columns all have dtype `object`, meaning the values are stored as strings, so they need to be converted from strings to floats.
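
A sketch of such a conversion, under the assumption that the cells hold plain numbers, numbers with thousands separators, percentages such as `1.70%`, or bare decimals such as `0.011` (both percentage forms appear in the `head()` output above). The helper `to_float` is hypothetical, and treating `1.70%` as the fraction `0.017` is an assumption about how the two forms relate.

```python
import pandas as pd

def to_float(value):
    """Convert strings such as '1.70%', '4,863,300', or '0.011' to float."""
    if pd.isna(value):
        return float("nan")
    text = str(value).strip().replace(",", "")
    if text.endswith("%"):
        return float(text[:-1]) / 100  # express percentages as fractions
    return float(text)

# Hypothetical two-row slice using values shown in the head() output.
sample = pd.DataFrame({
    "Alabama": ["4863300", "1.70%"],
    "New Mexico": ["2081015.0", "0.011"],
})

# Apply the helper to every cell, column by column.
converted = sample.apply(lambda col: col.map(to_float))
print(converted.dtypes)
```

Any cell containing other text would raise a `ValueError` here, so the real data may need an extra cleaning pass first.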

In [None]:
us_census_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 52 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Fact            80 non-null     object
 1   Fact Note       28 non-null     object
 2   Alabama         65 non-null     object
 3   Alaska          65 non-null     object
 4   Arizona         65 non-null     object
 5   Arkansas        65 non-null     object
 6   California      65 non-null     object
 7   Colorado        65 non-null     object
 8   Connecticut     65 non-null     object
 9   Delaware        65 non-null     object
 10  Florida         65 non-null     object
 11  Georgia         65 non-null     object
 12  Hawaii          65 non-null     object
 13  Idaho           65 non-null     object
 14  Illinois        65 non-null     object
 15  Indiana         65 non-null     object
 16  Iowa            65 non-null     object
 17  Kansas          65 non-null     object
 18  Kentucky    


### Data Cleaning



In [None]:
print(f"Shape of data: {permit_data.shape}")
print(f"Number of missing values in the data:\
{permit_data.isnull().sum().sum()}")
print(f"Number of duplicated values: {permit_data.duplicated().sum()}")

Shape of data: (12485, 27)
Number of missing values in the data:154595
Number of duplicated values: 0


Remove missing values 

In [None]:
permit_data.dropna(inplace=True)

# Verify missing values 

print(f"Number of missing values in the data:\
{permit_data.isnull().sum().sum()}")

Number of missing values in the data:0
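
Calling `dropna()` without arguments removes every row that has a missing value in any of the 27 columns. Since `info()` shows that `permit_recheck` has only 1100 non-null values, at most 1100 of the 12485 rows can survive this step. A sketch of a narrower alternative, on a hypothetical three-row frame, restricts the check to the columns the analysis actually uses.

```python
import pandas as pd

# Hypothetical rows: older months lack the newer reporting columns.
sample = pd.DataFrame({
    "month": ["2017-09", "2005-01", "1998-11"],
    "handgun": [5734.0, 500.0, None],
    "private_sale_handgun": [9.0, None, None],
    "totals": [32019, 900, 24],
})

# dropna() with no arguments keeps only rows complete in every column.
all_complete = sample.dropna()

# Restricting the check to the needed columns keeps more rows.
core_complete = sample.dropna(subset=["handgun"])

print(len(sample), len(all_complete), len(core_complete))
```

Which columns belong in `subset` is a judgment call that depends on the research question; the choice of `handgun` here is only illustrative.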


In [None]:
print(f"Shape of data: {us_census_data.shape}")
print(f"Number of missing values in the data:\
{us_census_data.isnull().sum().sum()}")
print(f"Number of duplicated values: {us_census_data.duplicated().sum()}")

Shape of data: (85, 52)
Number of missing values in the data:1062
Number of duplicated values: 3


In [None]:
permit_data.duplicated().sum()

0

In [None]:
us_census_data.isnull().sum()

Fact               5
Fact Note         57
Alabama           20
Alaska            20
Arizona           20
Arkansas          20
California        20
Colorado          20
Connecticut       20
Delaware          20
Florida           20
Georgia           20
Hawaii            20
Idaho             20
Illinois          20
Indiana           20
Iowa              20
Kansas            20
Kentucky          20
Louisiana         20
Maine             20
Maryland          20
Massachusetts     20
Michigan          20
Minnesota         20
Mississippi       20
Missouri          20
Montana           20
Nebraska          20
Nevada            20
New Hampshire     20
New Jersey        20
New Mexico        20
New York          20
North Carolina    20
North Dakota      20
Ohio              20
Oklahoma          20
Oregon            20
Pennsylvania      20
Rhode Island      20
South Carolina    20
South Dakota      20
Tennessee         20
Texas             20
Utah              20
Vermont           20
Virginia     

In [None]:
us_census_data.duplicated().sum()

3
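
The census data reports 3 duplicated rows. A minimal sketch of inspecting and then removing duplicates follows, using a hypothetical frame in which one row is repeated; the real duplicates may look different and should be inspected before dropping.

```python
import pandas as pd

# Hypothetical census-like rows with one exact repeat.
sample = pd.DataFrame({
    "Fact": ["Population estimates", "Persons under 5 years", "Population estimates"],
    "Alabama": ["4863300", "6.00%", "4863300"],
})

# Show every row that takes part in a duplicate group before removing anything.
print(sample[sample.duplicated(keep=False)])

# Keep the first occurrence of each row and renumber the index.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```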

<a id='eda'></a>
## Exploratory Data Analysis


### Research Question 1

In [None]:
# What census data is most associated with a high number of guns per capita?
# Which states have had the highest growth in gun registrations?
# What is the overall trend of gun purchases?
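
A rough sketch of how the trend and growth questions could be approached, assuming background check `totals` are used as a proxy for purchases (they are not a direct count of guns sold). The frame `sample` and all of its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical background check totals for two states over two years.
sample = pd.DataFrame({
    "month": pd.to_datetime(
        ["2016-01", "2016-01", "2017-01", "2017-01"], format="%Y-%m"
    ),
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "totals": [100, 50, 150, 60],
})
sample["year"] = sample["month"].dt.year

# Yearly totals per state: one row per year, one column per state.
yearly = sample.groupby(["year", "state"])["totals"].sum().unstack("state")

# Overall trend: sum across states for each year.
national = yearly.sum(axis=1)

# Relative growth per state between the first and last year in the sample.
growth = (yearly.loc[2017] - yearly.loc[2016]) / yearly.loc[2016]
print(national)
print(growth.sort_values(ascending=False))
```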


### Research Question 2

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.
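
For the question in the Introduction about regular versus private sales, one possible starting point is the share of checks that come from private sales. The sketch below uses the Alabama and Alaska rows for 2017-09 shown earlier. Treating `handgun`, `long_gun`, and `other` as "regular" sales that do not already include the `private_sale_*` counts is an assumption, not something the source confirms.

```python
import pandas as pd

# Values taken from the 2017-09 rows for Alabama and Alaska shown earlier.
sample = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "handgun": [5734.0, 2320.0],
    "long_gun": [6320.0, 2930.0],
    "other": [221.0, 219.0],
    "private_sale_handgun": [9.0, 17.0],
    "private_sale_long_gun": [16.0, 24.0],
    "private_sale_other": [3.0, 1.0],
})

regular_cols = ["handgun", "long_gun", "other"]
private_cols = ["private_sale_handgun", "private_sale_long_gun", "private_sale_other"]

# Sum each group of columns per row (assumed to be non-overlapping counts).
regular = sample[regular_cols].sum(axis=1)
private = sample[private_cols].sum(axis=1)

# Share of checks attributed to private sales.
sample["private_share"] = private / (regular + private)
print(sample[["state", "private_share"]])
```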


<a id='conclusions'></a>
## Conclusions
