# Data Source

The data can be found on [Seattle Open Data](https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5/about_data). It covers crimes reported by the Seattle Police Department (SPD) from 2008 to the present. The file is over 200 MB, which is too large to include in the GitHub repository. In the code below, "SPD_Crime_Data__2008-Present_20240212.csv" is the CSV file downloaded from Seattle Open Data.

In [1]:
# Libraries
import pandas as pd

In [2]:
# Load the data and inspect its columns
crime_df = pd.read_csv('SPD_Crime_Data__2008-Present_20240212.csv')
crime_df.columns

Index(['Report Number', 'Offense ID', 'Offense Start DateTime',
       'Offense End DateTime', 'Report DateTime', 'Group A B',
       'Crime Against Category', 'Offense Parent Group', 'Offense',
       'Offense Code', 'Precinct', 'Sector', 'Beat', 'MCPP',
       '100 Block Address', 'Longitude', 'Latitude'],
      dtype='object')
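
Because the CSV is over 200 MB, loading every column may be more than an analysis needs. A minimal sketch, assuming only a few of the columns listed above are required: `pd.read_csv` accepts a `usecols` argument that parses only a subset of columns. The selection in `WANTED_COLUMNS` and the two sample rows are hypothetical, made up so the example runs without the real file.

```python
import io

import pandas as pd

# Hypothetical subset of columns; pick whichever ones the analysis actually needs.
WANTED_COLUMNS = ['Report Number', 'Offense', 'Report DateTime', 'Precinct']

# Tiny in-memory stand-in for the real CSV (hypothetical values) so the sketch runs on its own.
sample_csv = io.StringIO(
    "Report Number,Offense ID,Report DateTime,Offense,Precinct,Longitude,Latitude\n"
    "2022-000001,1,01/03/2022 10:15:00 AM,Theft From Building,N,-122.3,47.6\n"
    "2021-000002,2,12/30/2021 11:45:00 PM,Simple Assault,S,-122.3,47.5\n"
)

# usecols makes read_csv keep only the listed columns, which lowers memory use;
# dtype=str keeps identifiers such as 'Report Number' as text instead of guessing types.
df = pd.read_csv(sample_csv, usecols=WANTED_COLUMNS, dtype=str)
print(df.columns.tolist())
```

Note that the resulting columns follow the order in the file, not the order of the `usecols` list.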

In [3]:
# Regex patterns
regx_time = r'\d\d:\d\d:\d\d (AM|PM)'
regx_year = r'([0-9]{4})'

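# Remove the time of day from 'Report DateTime', keeping only the date in a new 'Date' column
crime_df['Date'] = crime_df['Report DateTime'].str.replace(regx_time, '', regex=True).str.strip()
# Pull the first four-digit run (the year) out of 'Date' into an integer 'Year' column;
# expand=False returns a Series instead of a one-column DataFrame
crime_df['Year'] = crime_df['Date'].str.extract(regx_year, expand=False).astype(int)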

# Keep only reports filed in 2022 or later (based on 'Report DateTime')
crime_df = crime_df[crime_df['Year'] >= 2022]
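
The regex approach above works on the raw strings. An alternative sketch parses 'Report DateTime' into real timestamps with `pd.to_datetime` and reads the year from `.dt.year`. It assumes the column uses the `MM/DD/YYYY hh:mm:ss AM/PM` layout; the time part matches `regx_time`, but the month-first date order is an assumption, and any value that does not fit the format becomes missing rather than raising an error. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from the 'Report DateTime' column.
sample = pd.DataFrame({
    'Report DateTime': [
        '01/03/2022 10:15:00 AM',
        '12/30/2021 11:45:00 PM',
        '07/04/2023 12:00:00 PM',
    ]
})

# Assumed format: MM/DD/YYYY hh:mm:ss AM/PM. errors='coerce' turns values that
# do not match the format into NaT instead of raising an exception.
report_dt = pd.to_datetime(sample['Report DateTime'],
                           format='%m/%d/%Y %I:%M:%S %p',
                           errors='coerce')

# .dt.year gives the year as a number; rows that failed to parse stay missing.
sample['Year'] = report_dt.dt.year

# Same filter as the notebook cell: keep reports from 2022 onward.
recent = sample[sample['Year'] >= 2022]
print(recent)
```

Parsing to datetimes also makes month, weekday, or hour available later through other `.dt` accessors, at the cost of depending on the exact string format.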


In [4]:
# Save the filtered DataFrame to a new CSV file
crime_df.to_csv('crime_data_2022_up.csv', index=False)
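
The filtered file may still be large. One option, sketched here under the assumption that a compressed file is acceptable to whatever reads it next, is to let pandas gzip the output by giving the path a `.gz` suffix. The `filtered` DataFrame and its values are hypothetical stand-ins for `crime_df`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the filtered crime_df.
filtered = pd.DataFrame({
    'Report Number': ['2022-000001', '2023-000002'],
    'Year': [2022, 2023],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'crime_data_2022_up.csv.gz')
    # With the default compression='infer', the '.gz' suffix makes to_csv write gzip.
    filtered.to_csv(path, index=False)
    # read_csv also infers gzip from the suffix, so the round trip needs no extra options.
    round_trip = pd.read_csv(path, dtype={'Report Number': str})
    size_bytes = os.path.getsize(path)

print(round_trip.equals(filtered), size_bytes)
```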