# Exploratory Data Analysis: New York City Shooting Incidents 2006-2021


##  Introduction
* Hate crimes and shooting incidents have been on the rise in the United States, and the issue has become part of the national debate over gun control laws. An understanding of criminal activity based on statistical descriptions, summaries, and insights gained from available data, such as the *New York City Shooting Incidences dataset*, will therefore be helpful in formulating the best police enforcement and intervention strategies.

* This trend means that the streets are becoming more violent and thus unsafe, so an urgent solution is needed.

From this data set ([click here to get the data set](https://www.kaggle.com/thedataperson/nypd-shooting-incident-data-20062021/download)):

* We shall draw insights from the shooting (location, date, time) and the shooter/victim (age, gender, race).
* We shall narrow down on the locations with the most incidents and draw insights.
* We shall also pose some investigative questions about our data set.
> Narrowing down on the areas with the most incidents is essential: if law enforcers and policy makers can find ways of curtailing incidents in these areas, some of those strategies can be used to reduce or prevent incidents in other areas.
>> It is for this reason that the analysis will narrow down to the places with the most occurrences.

* Looking at the locations with the fewest occurrences might be equally important, since they might have done things to keep the numbers low.

*Crime statistics and analysis* are important for several reasons.

* They help criminal justice systems with **predictive policing**.
* In the age of community policing, such statistics help **improve relationships with the community**.
* They support **law initiatives** that aim to decrease crime.

[Click here for more details on why crime statistics are important.](https://www.waldenu.edu/online-bachelors-programs/bs-in-criminal-justice/resource/why-national-crime-statistics-are-important)


### Outline
* **Clean the data:** Here we shall find and fill missing values.
* **Transform the data:** Here we shall find and remove duplicates.
* **Do exploratory data analysis:** Here we shall produce statistical descriptions and summaries, draw insights, and ask and answer some investigative questions.
* Visualize the data using Matplotlib, Seaborn, and Plotly.
* Make conclusions and recommendations.
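The cleaning and transformation steps in the outline can be sketched on a tiny made-up table before applying them to the real data. This is a minimal illustration only: the rows below are hypothetical, and the choice of `"UNKNOWN"` as a fill value is an assumption, not necessarily what the analysis uses later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records: column names mirror the data set, values are made up.
toy_df = pd.DataFrame({
    "INCIDENT_KEY": [1, 2, 2, 3],
    "BORO": ["BRONX", "QUEENS", "QUEENS", "BROOKLYN"],
    "PERP_SEX": ["M", np.nan, np.nan, "F"],
})

# 1. Find missing values: count the NaN entries in each column.
missing_counts = toy_df.isna().sum()

# 2. Fill missing values with an explicit placeholder (assumed here to be "UNKNOWN").
filled_df = toy_df.fillna({"PERP_SEX": "UNKNOWN"})

# 3. Find and remove exact duplicate rows.
n_duplicates = int(filled_df.duplicated().sum())
clean_df = filled_df.drop_duplicates().reset_index(drop=True)

print(missing_counts)
print(n_duplicates, len(clean_df))
```

Filling with a named placeholder rather than dropping rows keeps every incident in the table, which matters when the later steps count occurrences per location.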


In [2]:
#Import Libraries

import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns

%matplotlib inline


# Set Style

sns.set_style("darkgrid") # Set the parameters that control the general style of the plots.


# Customize Matplotlib

matplotlib.rcParams["font.size"] = 14
matplotlib.rcParams["figure.figsize"] = (10,5)
matplotlib.rcParams["figure.facecolor"] = "#00000000"

#### How to use the Kaggle API to download Kaggle data sets

1. Install the `opendatasets` library.
2. Use the `opendatasets.download` helper function.
3. Get your Kaggle credentials.
    * Download the `kaggle.json` file.
    * Enter your user name and Kaggle API key when prompted, or store the `kaggle.json` file in the same directory as the Jupyter notebook.
4. Query the directory where the dataset has been downloaded to using the `os` module.
    * The module is part of the Python standard library.
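Before running the download, it can help to check whether a credentials file is already sitting next to the notebook. The helper below is a hypothetical sketch (the name `has_kaggle_credentials` is ours, not part of `opendatasets`). It assumes the usual layout of Kaggle's API token file, a JSON object with `username` and `key` fields.

```python
import json
import os


def has_kaggle_credentials(directory="."):
    """Return True if `directory` holds a kaggle.json with a username and a key."""
    path = os.path.join(directory, "kaggle.json")
    if not os.path.isfile(path):
        return False
    try:
        with open(path, encoding="utf-8") as f:
            creds = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed file: treat it as missing credentials.
        return False
    return (
        isinstance(creds, dict)
        and bool(creds.get("username"))
        and bool(creds.get("key"))
    )


# Check the current working directory (the notebook's folder in Jupyter).
print(has_kaggle_credentials("."))
```

If this prints `False`, expect to be prompted for the user name and key during the download.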


In [3]:
# Install the library for downloading Kaggle datasets

#! pip install opendatasets --upgrade --quiet

# opendatasets: a Python library for downloading datasets from Kaggle,
# Google Drive, and other online sources
import opendatasets as od

# Download the dataset from Kaggle.
download_url = 'https://www.kaggle.com/thedataperson/nypd-shooting-incident-data-20062021/download'

# If the dataset is already present, the output will be "Skipping, found downloaded files"
od.download(download_url)

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username:
Your Kaggle Key:
Downloading nypd-shooting-incident-data-20062021.zip to .\nypd-shooting-incident-data-20062021


100%|██████████| 831k/831k [00:01<00:00, 429kB/s]





In [4]:
# Import the os module.
# It helps with querying the directory where the dataset has been downloaded to
# by interacting with the underlying operating system. With the module one can:
#   * create and remove a folder (directory)
#   * fetch the contents of a directory
#   * change and identify the current directory, among other operations

import os

data_dir = './nypd-shooting-incident-data-20062021'

os.listdir(data_dir)

['NYPD Shooting Incident - Data 2006-2021.csv']
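Rather than copying the file name by hand from the listing, the CSV files in the download folder can be collected programmatically. The helper below is a small sketch; `find_csv_files` is a hypothetical name introduced here for illustration. It uses `os.path.join` so the paths are valid whichever separator the operating system uses.

```python
import os


def find_csv_files(directory):
    """Return the sorted full paths of the .csv files directly inside `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
    )


# Only list the folder if the download step has already created it.
data_dir = './nypd-shooting-incident-data-20062021'
if os.path.isdir(data_dir):
    print(find_csv_files(data_dir))
```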

In [5]:
# Build the path with os.path.join so the right separator (/ or \) is used on any OS

data_filename = os.path.join(data_dir, 'NYPD Shooting Incident - Data 2006-2021.csv')

In [6]:
#Open the dataset

nyc_df = pd.read_csv(data_filename)

In [7]:
# New York shooting incidents DataFrame

nyc_df

Unnamed: 0,INCIDENT_KEY,OCCUR_DATE,OCCUR_TIME,BORO,PRECINCT,JURISDICTION_CODE,LOCATION_DESC,STATISTICAL_MURDER_FLAG,PERP_AGE_GROUP,PERP_SEX,PERP_RACE,VIC_AGE_GROUP,VIC_SEX,VIC_RACE,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,Lon_Lat
0,226323781,3/30/2021,23:45:00,QUEENS,100,0.0,MULTI DWELL - PUBLIC HOUS,False,,,,25-44,M,BLACK,1036867,153432,40.587664,-73.810560,POINT (-73.81055977899997 40.587663570000075)
1,226323779,3/30/2021,16:20:00,BROOKLYN,73,2.0,MULTI DWELL - PUBLIC HOUS,False,25-44,M,BLACK,25-44,M,BLACK,1009548,187629,40.681647,-73.908790,POINT (-73.90879049699998 40.68164709200005)
2,226323782,3/30/2021,23:15:00,BRONX,42,2.0,MULTI DWELL - PUBLIC HOUS,False,,,,18-24,M,BLACK,1012074,240410,40.826510,-73.899465,POINT (-73.89946470899997 40.82650984800006)
3,226321042,3/30/2021,13:35:00,MANHATTAN,7,0.0,,False,18-24,M,BLACK,18-24,M,BLACK,987721,202253,40.721822,-73.987479,POINT (-73.98747935099993 40.72182201900005)
4,226320600,3/30/2021,22:23:00,BRONX,45,0.0,,True,,,,18-24,M,BLACK HISPANIC,1032091,241976,40.830722,-73.827126,POINT (-73.82712605899997 40.83072232800004)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23860,9953252,1/1/2006,2:22:00,MANHATTAN,28,0.0,NONE,True,25-44,M,BLACK,25-44,M,BLACK,998816,233545,40.807699,-73.947385,POINT (-73.94738485799998 40.807699163000045)
23861,139716503,1/1/2006,12:30:00,BROOKLYN,77,0.0,PVT HOUSE,True,,,,25-44,M,BLACK,996442,184160,40.672153,-73.956050,POINT (-73.95604992899997 40.67215322100003)
23862,9953246,1/1/2006,5:51:00,BRONX,44,0.0,NONE,False,25-44,M,WHITE HISPANIC,18-24,M,WHITE HISPANIC,1007418,243859,40.835990,-73.916276,POINT (-73.91627635999998 40.83598980000005)
23863,9953247,1/1/2006,3:30:00,BROOKLYN,67,0.0,,False,UNKNOWN,U,UNKNOWN,18-24,M,BLACK,999316,176460,40.651014,-73.945707,POINT (-73.94570651699998 40.651013998000046)
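The header of the output above begins with `Unnamed: 0`. If that is a real column in the DataFrame rather than an artifact of how the output was exported, it usually means the CSV was saved with its row index and no name for that column. The sketch below shows one way to handle this, using two rows abridged from the output above and only a few of the columns. It also parses `OCCUR_DATE` into a datetime, assuming the month/day/year layout seen in the sample rows.

```python
import io

import pandas as pd

# Two rows abridged from the output above; the first header field is empty,
# which is what makes pandas label the column "Unnamed: 0" by default.
sample_csv = io.StringIO(
    ",INCIDENT_KEY,OCCUR_DATE,OCCUR_TIME,BORO\n"
    "0,226323781,3/30/2021,23:45:00,QUEENS\n"
    "1,226323779,3/30/2021,16:20:00,BROOKLYN\n"
)

# index_col=0 uses the first column as the row index instead of a data column.
sample_df = pd.read_csv(sample_csv, index_col=0)

# Parse the date strings (assumed month/day/year) into datetime values.
sample_df["OCCUR_DATE"] = pd.to_datetime(sample_df["OCCUR_DATE"], format="%m/%d/%Y")

print(sample_df.dtypes)
```

The same `index_col=0` argument could be passed when reading `data_filename`, if the extra column turns out to be present in the full file.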
