# <center>**Predicting Crime: Uncovering Crime Patterns with Data Science**</center>
### <center>***By Jarred De Azevedo, Misha Khan, Kenneth Swindell, Chethan Agarwal, David Bond, Carson Hagman***</center>

#### Contributions: 

A: Header with contributions and Introduction - Jarred

B: Dataset Curation and Preprocessing - Misha

C: Data Exploration and Summary Statistics - Kenneth

D: ML Algorithm Design/Development, Training and Test Data Analysis - Chethan

E: Visualization - David 

F: Result Analysis and Conclusion - Carson

### <center>Introduction</center>

Understanding crime patterns is a crucial step toward building safer, more equitable communities. In this tutorial, we explore how data science can be used to analyze crime and policing data to uncover meaningful trends and inform public policy. Our project centers on two key questions: "Can historical crime trends help predict future criminal activity?" and "Are certain geographic areas more prone to specific types of crime?"

These questions matter because answering them supports a data-driven approach to public safety, one that goes beyond anecdotal evidence and offers insights grounded in empirical analysis. Predictive models can help law enforcement allocate resources more efficiently, while geographic patterns of crime can inform urban planning and community intervention programs. At the same time, critically analyzing policing data allows us to consider potential biases and systemic issues in law enforcement practices.

We will guide readers step by step through the full data science pipeline: acquiring and curating data, parsing it into a queryable format, exploratory data analysis, hypothesis testing and machine learning, and explaining the results with words and visualizations. Our goal is not only to generate insights from the data but also to provide a clear, reproducible framework for others to do the same.

### <center>Data Curation and Preprocessing</center>

The dataset we chose is NYPD Shooting Incident Data (Historic) from the City of New York, published on Data.gov. It contains detailed information on past shooting incidents: each incident records the date, time, New York borough, location, and several details about the perpetrator and the victim. To prepare the data for analysis, we load it into a pandas DataFrame below and convert the date column to a datetime type.

Source: data.cityofnewyork.us. (2025, April 19). City of New York - NYPD Shooting Incident Data (Historic). Data.gov catalog. https://catalog.data.gov/dataset/nypd-shooting-incident-data-historic

In [5]:
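import pandas as pd
import scipy.stats as stats

# Load the NYPD shooting incident data from the local CSV file
df = pd.read_csv('NYPD_Shooting_Incident_Data__Historic_.csv')

# Convert the date column to datetime; unparseable values become NaT
df['OCCUR_DATE'] = pd.to_datetime(df['OCCUR_DATE'], errors='coerce')

# Display the DataFrame (the last expression in a cell is rendered as output)
df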

Unnamed: 0,INCIDENT_KEY,OCCUR_DATE,OCCUR_TIME,BORO,LOC_OF_OCCUR_DESC,PRECINCT,JURISDICTION_CODE,LOC_CLASSFCTN_DESC,LOCATION_DESC,STATISTICAL_MURDER_FLAG,...,PERP_SEX,PERP_RACE,VIC_AGE_GROUP,VIC_SEX,VIC_RACE,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,Lon_Lat
0,231974218,2021-08-09,01:06:00,BRONX,,40,0.0,,,False,...,,,18-24,M,BLACK,1.006343e+06,234270.000000,40.809673,-73.920193,POINT (-73.92019278899994 40.80967347200004)
1,177934247,2018-04-07,19:48:00,BROOKLYN,,79,0.0,,,True,...,M,WHITE HISPANIC,25-44,M,BLACK,1.000083e+06,189064.671875,40.685610,-73.942913,POINT (-73.94291302299996 40.685609672000055)
2,255028563,2022-12-02,22:57:00,BRONX,OUTSIDE,47,0.0,STREET,GROCERY/BODEGA,False,...,(null),(null),25-44,M,BLACK,1.020691e+06,257125.000000,40.872349,-73.868233,POINT (-73.868233 40.872349)
3,25384540,2006-11-19,01:50:00,BROOKLYN,,66,0.0,,PVT HOUSE,True,...,U,UNKNOWN,18-24,M,BLACK,9.851073e+05,173349.796875,40.642490,-73.996912,POINT (-73.99691224999998 40.642489932000046)
4,72616285,2010-05-09,01:58:00,BRONX,,46,0.0,,MULTI DWELL - APT BUILD,True,...,M,BLACK,<18,F,BLACK,1.009854e+06,247502.562500,40.845984,-73.907461,POINT (-73.90746098599993 40.84598358900007)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28557,82565818,2012-01-10,16:52:00,MANHATTAN,,23,2.0,,MULTI DWELL - PUBLIC HOUS,False,...,M,BLACK HISPANIC,25-44,M,WHITE HISPANIC,1.000102e+06,229680.187500,40.797089,-73.942750,POINT (-73.94275038599994 40.79708909900006)
28558,52550581,2008-10-27,19:00:00,BROOKLYN,,83,0.0,,MULTI DWELL - APT BUILD,False,...,M,WHITE HISPANIC,25-44,M,WHITE HISPANIC,1.004686e+06,193261.375000,40.697119,-73.926302,POINT (-73.92630225199997 40.697119222000026)
28559,23354135,2006-07-10,19:47:00,BROOKLYN,,60,0.0,,,False,...,M,BLACK,<18,M,BLACK,9.841473e+05,150277.703125,40.579162,-74.000371,POINT (-74.00037110599999 40.57916181000007)
28560,59753078,2009-03-20,20:02:00,BROOKLYN,,72,0.0,,MULTI DWELL - APT BUILD,False,...,M,WHITE HISPANIC,18-24,M,WHITE HISPANIC,9.809204e+05,174343.578125,40.645217,-74.012000,POINT (-74.01199971799997 40.645217064000065)
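
The preview above shows that missing perpetrator details appear in several forms: empty cells, the string `(null)`, and codes such as `U` and `UNKNOWN`. One possible preprocessing step, sketched here under the assumption that all of these mean "not recorded", is to map them to a single missing-value marker. The helper `normalize_missing` and the `PLACEHOLDERS` list are hypothetical names introduced for illustration; the sample rows are copied from rows 0-4 of the preview so the sketch runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Sample copied from rows 0-4 of the preview above (three columns only).
sample = pd.DataFrame({
    "BORO": ["BRONX", "BROOKLYN", "BRONX", "BROOKLYN", "BRONX"],
    "PERP_SEX": [np.nan, "M", "(null)", "U", "M"],
    "PERP_RACE": [np.nan, "WHITE HISPANIC", "(null)", "UNKNOWN", "BLACK"],
})

# Assumption: these strings all mean "not recorded" and can be treated alike.
PLACEHOLDERS = ["(null)", "U", "UNKNOWN"]


def normalize_missing(frame, columns):
    """Return a copy with placeholder strings in `columns` replaced by NaN."""
    cleaned = frame.copy()
    cleaned[columns] = cleaned[columns].replace(PLACEHOLDERS, np.nan)
    return cleaned


cleaned = normalize_missing(sample, ["PERP_SEX", "PERP_RACE"])

# Count missing values per column after normalization.
missing_counts = cleaned[["PERP_SEX", "PERP_RACE"]].isna().sum()
print(missing_counts)
```

Whether unknown values should be dropped, kept as their own category, or imputed depends on the later analysis, so this step only unifies how they are represented.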


### <center>Data Exploration and Summary Statistics</center>

In [None]:
## Insert Code Here
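
As a possible starting point for this section, the sketch below computes three simple summaries that relate to the tutorial's two questions: incidents per borough, incidents per year, and the share of incidents flagged as murders in each borough. It is an illustration, not the project's final analysis. The sample rows are copied from the nine rows shown in the preprocessing preview, so the numbers say nothing about the full dataset; on the real data the same calls could be applied to `df`.

```python
import pandas as pd

# Sample copied from the nine rows visible in the preprocessing preview.
sample = pd.DataFrame({
    "OCCUR_DATE": pd.to_datetime([
        "2021-08-09", "2018-04-07", "2022-12-02", "2006-11-19", "2010-05-09",
        "2012-01-10", "2008-10-27", "2006-07-10", "2009-03-20",
    ]),
    "BORO": [
        "BRONX", "BROOKLYN", "BRONX", "BROOKLYN", "BRONX",
        "MANHATTAN", "BROOKLYN", "BROOKLYN", "BROOKLYN",
    ],
    "STATISTICAL_MURDER_FLAG": [
        False, True, False, True, True, False, False, False, False,
    ],
})

# Number of incidents per borough (geographic pattern).
by_boro = sample["BORO"].value_counts()

# Number of incidents per calendar year (historical trend).
by_year = sample["OCCUR_DATE"].dt.year.value_counts().sort_index()

# Share of incidents flagged as murders in each borough;
# the mean of a boolean column is the fraction of True values.
murder_share = sample.groupby("BORO")["STATISTICAL_MURDER_FLAG"].mean()

print(by_boro)
print(by_year)
print(murder_share)
```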

### <center>ML Algorithm Design/Development, Training and Test Data Analysis</center>

In [None]:
## Insert Code Here

### <center>Visualization</center>

In [None]:
## Insert Code Here

### <center>Result Analysis and Conclusion</center>

Insert Text Here