# Crime Project 

This assignment focuses on real data from data.police.uk on crimes reported across Northern Ireland in February 2023. It also makes use of Northern Ireland ward and county boundary data.

## Overview 

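This script loads and explores the February 2023 crime records with pandas and GeoPandas. It then converts the CSV of crime locations into a point shapefile that can be opened in GIS software, and finally loads the Northern Ireland wards shapefile so that the crime points can be merged with the ward boundaries.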

Run the following cell by pressing Shift + Enter. This imports the initial modules needed for the first part of the script.

In [None]:
import geopandas as gpd
import pandas as pd

The next step is to load the crime data. As before, press Shift + Enter to run the cell.

In [None]:
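crime_df = pd.read_csv("data_files/NI_crime_feb_23.csv")  # load the crime CSV as a table
points = gpd.points_from_xy(crime_df['Longitude'], crime_df['Latitude'])  # x = longitude, y = latitude
crime = gpd.GeoDataFrame(crime_df, geometry=points, crs='EPSG:4326')  # sets point crs (WGS84 lat/lon)
print(crime.head())  # displays first 5 rows of crime data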

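The output above shows only the first five rows, but the full dataset contains 12264 rows, meaning 12264 crimes were reported across Northern Ireland in February 2023. You can confirm the number of crimes by running the next cell. Note that '.count()' counts only non-missing values, so if some rows have a blank 'Crime ID' it will return fewer than the total number of rows, whereas 'len()' counts every row.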

In [None]:
print(len(crime))  # counts every row, i.e. the number of crimes reported
print(crime['Crime ID'].count())  # counts only the rows with a non-missing Crime ID
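A small made-up table makes the difference between the two counts concrete. The values below are invented for illustration and are not taken from the crime file; in police.uk data some records (for example anti-social behaviour) may have no Crime ID, which is exactly the case this sketch mimics.

```python
import pandas as pd

# Invented rows standing in for the crime data; two rows have no Crime ID.
toy = pd.DataFrame({
    "Crime ID": ["a1", None, "c3", None],
    "Crime type": ["Burglary", "Anti-social behaviour", "Drugs", "Anti-social behaviour"],
})

n_rows = len(toy)                # every row, whether or not Crime ID is filled in
n_ids = toy["Crime ID"].count()  # only the non-missing Crime ID values
n_rows_shape = toy.shape[0]      # .shape is (rows, columns), so this equals len(toy)

print(n_rows, n_ids, n_rows_shape)  # 4 2 4
```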

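You can also look at a single row, or a specific set of rows, in a GeoDataFrame by index label using '.loc'. Because the crime data uses the default index (0, 1, 2, ...), 'crime.loc[1]' returns the second row: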

In [None]:
print(crime.loc[1])
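'.loc' selects by index label, while the related '.iloc' selects by integer position. With the default index the two agree, but they diverge once the index has other labels, and their slices behave differently. The sketch below uses a made-up table with hypothetical labels 10 to 13 to show this.

```python
import pandas as pd

# Invented rows; the non-default index shows how .loc and .iloc differ.
toy = pd.DataFrame(
    {"Crime type": ["Burglary", "Drugs", "Shoplifting", "Robbery"]},
    index=[10, 11, 12, 13],
)

row_by_label = toy.loc[11]      # label 11 -> the "Drugs" row
row_by_position = toy.iloc[1]   # position 1 -> also the "Drugs" row here
label_slice = toy.loc[11:12]    # label slices include the end label (2 rows)
position_slice = toy.iloc[1:2]  # position slices exclude the end position (1 row)

print(row_by_label["Crime type"], len(label_slice), len(position_slice))  # Drugs 2 1
```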

You can also filter the dataset to find specific information. For example, the next cell returns all crimes reported with a 'Crime type' of 'Anti-social behaviour'; there should be 3452 such rows.

Adding 'Crime type' as the second argument to '.loc' restricts the output to the 'Crime type' column instead of all 13.

In [None]:
print(crime.loc[crime['Crime type'] == 'Anti-social behaviour', 'Crime type'])
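To check the number of matching rows directly, or to see the totals for every crime type at once, you can sum the boolean filter or use 'value_counts()'. This sketch runs on a small invented table; on the real data, 'crime['Crime type'].value_counts()' should list the 3452 anti-social behaviour records mentioned above.

```python
import pandas as pd

# Invented crime types for illustration only.
toy = pd.DataFrame({"Crime type": ["Anti-social behaviour", "Drugs",
                                   "Anti-social behaviour", "Burglary"]})

# Boolean mask: True where the crime type matches
is_asb = toy["Crime type"] == "Anti-social behaviour"
n_asb = int(is_asb.sum())  # True counts as 1, so the sum is the number of matches

# Counts for every crime type, sorted from most to least frequent
counts = toy["Crime type"].value_counts()

print(n_asb)  # 2
print(counts)
```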

You can repeat this for any other crime type, but for now we will move on to the next part of the script: turning the CSV file of crime records into a shapefile.

Run the next cell to import the Point class from Shapely, load the data into a pandas DataFrame and print the first few rows.

In [None]:
from shapely.geometry import Point

df = pd.read_csv('data_files/NI_crime_feb_23.csv')  # loads point data

print(df.head())  # prints initial subset of dataframe

As you can see from the above cell, the dataframe has more columns than we need. We can tidy this up by running the next cell, which will drop specific columns. 

In [None]:
df = df.drop(columns=['Crime ID', 'Falls within', 'LSOA code', 'LSOA name', 'Last outcome category', 'Context'])

print(df.head())
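An equivalent approach is to list the columns you want to keep rather than the ones to drop, which can be easier to read when most columns are being removed. The table below is a cut-down, invented version of the crime data with only two extra columns; the keep-list matches the columns described in the next paragraph.

```python
import pandas as pd

# Invented single-row table with a subset of the real column names.
toy = pd.DataFrame({
    "Crime ID": ["a1"], "Month": ["2023-02"], "Reported by": ["Police Service of Northern Ireland"],
    "Falls within": ["Police Service of Northern Ireland"], "Longitude": [-5.93], "Latitude": [54.60],
    "Location": ["On or near a made-up street"], "Crime type": ["Burglary"],
})

keep = ["Month", "Reported by", "Longitude", "Latitude", "Location", "Crime type"]
tidy = toy[keep]  # keep only these columns, in this order

dropped = toy.drop(columns=["Crime ID", "Falls within"])  # the drop-based equivalent

print(list(tidy.columns))
print(tidy.equals(dropped))  # True
```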

Now the dataframe consists only of the month the crime was reported, who it was reported by, the latitude and longitude points of the data, the location of the crime and the crime type. 

From this dataframe we will now define the geometry to begin the process of changing the csv file into a shapefile. 

In [None]:
# add a geometry column from the longitude and latitude coordinates for each crime reported

df['geometry'] = list(zip(df['Longitude'], df['Latitude']))
df['geometry'] = df['geometry'].apply(Point)
print(df)
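GeoPandas also provides 'points_from_xy', which builds all the points in one vectorised call instead of zipping the columns and applying 'Point' row by row. The coordinates below are approximate, invented positions used only to illustrate; note that longitude is passed as x and latitude as y.

```python
import geopandas as gpd
import pandas as pd

# Approximate, illustrative coordinates (not records from the crime file).
toy = pd.DataFrame({"Longitude": [-5.93, -7.31], "Latitude": [54.60, 54.99]})

# One shapely Point per row: x = longitude, y = latitude
geom = gpd.points_from_xy(toy["Longitude"], toy["Latitude"])
gdf = gpd.GeoDataFrame(toy, geometry=geom, crs="EPSG:4326")

print(gdf.geometry.x.tolist(), gdf.crs)
```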

Again, you can tidy up the dataframe by removing the columns we no longer need, i.e., the latitude and longitude columns because they are now in the new geometry column. Run the next cell to do this. 

In [None]:
df = df.drop(columns=['Longitude', 'Latitude'])

print(df)

Now a new GeoDataFrame can be created from the DataFrame, using EPSG:4326, the code for the WGS84 latitude/longitude coordinate reference system.

In [None]:
gdf = gpd.GeoDataFrame(df, geometry='geometry')  # use the 'geometry' column of Point objects
gdf.set_crs("EPSG:4326", inplace=True)  # sets the coordinate reference system
print(gdf)
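'set_crs' only attaches a CRS label to coordinates that are already in that system; it does not move them. To actually convert coordinates into another system you use 'to_crs'. The sketch below uses one approximate point and, as an assumed example of a projected CRS, Irish Grid (EPSG:29903), which is not part of the original script.

```python
import geopandas as gpd
from shapely.geometry import Point

# One approximate point in longitude/latitude degrees, with no CRS yet.
pts = gpd.GeoDataFrame(geometry=[Point(-5.93, 54.60)])

labelled = pts.set_crs("EPSG:4326")        # attaches the label; coordinates unchanged
projected = labelled.to_crs("EPSG:29903")  # reprojects to Irish Grid, units in metres

print(labelled.geometry.x.iloc[0])   # -5.93
print(projected.geometry.x.iloc[0])  # an easting in metres
```

Assigning the wrong CRS with 'set_crs' leaves the data silently misplaced, so it is worth checking a layer's existing CRS before labelling it.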

Now save the GeoDataFrame as a shapefile, which you can then load into GIS software such as ArcGIS to analyse the data further.

In [None]:
gdf.to_file('data_files/NIcrimefeb.shp')

Your shapefile should look similar to this once loaded into GIS software.

[Figure: the crime points shapefile displayed in GIS software]

Now that the crime data has been converted to a shapefile, we can look at the ward data for Northern Ireland and merge the two datasets. The first step is to load the wards data by running the cell below.

In [None]:
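wards = gpd.read_file('data_files/NI_wards.shp')  # load wards shapefile
print(wards.crs)  # check the CRS recorded in the shapefile's .prj file
# reproject to WGS84 to match the crime data (if wards.crs is None, use wards.set_crs('EPSG:4326') instead)
wards = wards.to_crs('EPSG:4326')
print(wards.head())  # display subset of wards data
print(wards['Ward'].count())  # counts the number of wards in the dataset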
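The merge itself is not shown in this section. One common way to combine point and polygon layers is a spatial join, which attaches to each crime the attributes of the ward it falls within; the sketch below assumes that is the intended merge. It uses two hypothetical square "wards" and three invented crime points rather than the real files.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical square wards and invented crime points, all in EPSG:4326.
wards = gpd.GeoDataFrame(
    {"Ward": ["Ward A", "Ward B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
crimes = gpd.GeoDataFrame(
    {"Crime type": ["Drugs", "Burglary", "Anti-social behaviour"]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Both layers must share a CRS before joining
assert crimes.crs == wards.crs

# Attach the attributes of the ward each crime point lies within
joined = gpd.sjoin(crimes, wards, how="left", predicate="within")

# Number of crimes per ward
per_ward = joined.groupby("Ward").size()
print(per_ward)
```

With the real data, the same call would take 'gdf' (or 'crime') and 'wards' once both are in EPSG:4326.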