# Crime Project 

This assignment focuses on real data collected from data.police.uk on crimes reported across Northern Ireland in February 2023. The script also uses Northern Ireland Wards and Counties data. 

## Overview 

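This script loads the February 2023 crime data, inspects it, converts the CSV file into a point shapefile, and then loads the Northern Ireland wards data so that the two can be merged.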

Run the following cell by pressing Shift + Enter. This will import the initial modules needed for the first element of this script. 

In [1]:
import geopandas as gpd
import pandas as pd

The next step is to load the crime data for the script. Again, to run each cell, press Shift + Enter.

In [2]:
crime = gpd.read_file("data_files/NI_crime_feb_23.csv")
crime.crs = 'epsg:4326'  # sets point crs for crime data
print(crime.head())  # displays first 5 rows of crime data 

  Crime ID    Month                         Reported by   
0           2023-02  Police Service of Northern Ireland  \
1           2023-02  Police Service of Northern Ireland   
2           2023-02  Police Service of Northern Ireland   
3           2023-02  Police Service of Northern Ireland   
4           2023-02  Police Service of Northern Ireland   

                         Falls within  Longitude   Latitude   
0  Police Service of Northern Ireland  -5.960566  54.630260  \
1  Police Service of Northern Ireland  -6.275733  54.864132   
2  Police Service of Northern Ireland  -6.355642  54.474319   
3  Police Service of Northern Ireland  -5.951435  54.589790   
4  Police Service of Northern Ireland  -5.881625  54.603207   

                      Location LSOA code LSOA name             Crime type   
0      On or near Meyrick Park                      Anti-social behaviour  \
1     On or near Bridge Street                      Anti-social behaviour   
2  On or near Silverwood Court     

The full dataset has 12264 rows, indicating that 12264 crimes were reported across Northern Ireland in February 2023. You can confirm the number of crimes reported by running the following cell.

In [3]:
print(crime['Crime ID'].count())  # counts the number of crimes reported

12264
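
Note that '.count()' counts only the non-missing values in a column, so it matches the number of rows only when the column has no missing entries. In the cell above the two appear to agree because the empty 'Crime ID' fields were read as empty strings rather than as missing values. A minimal sketch of the difference, using a small made-up DataFrame (the 'toy' name and its values are illustrative, not taken from the crime data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the three 'Crime ID' values are missing
toy = pd.DataFrame({
    "Crime ID": ["abc123", np.nan, np.nan],
    "Crime type": ["Burglary", "Anti-social behaviour", "Anti-social behaviour"],
})

n_rows = len(toy)                        # number of rows, regardless of missing values
n_non_missing = toy["Crime ID"].count()  # number of non-missing 'Crime ID' values

print(n_rows, n_non_missing)
```

If the goal is simply the number of records, 'len()' on the whole DataFrame is the safer choice.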


You can also look at an individual row, or a specific set of rows, within a GeoDataFrame by using the '.loc' indexer:

In [4]:
print(crime.loc[1])

Crime ID                                                   
Month                                               2023-02
Reported by              Police Service of Northern Ireland
Falls within             Police Service of Northern Ireland
Longitude                                         -6.275733
Latitude                                          54.864132
Location                           On or near Bridge Street
LSOA code                                                  
LSOA name                                                  
Crime type                            Anti-social behaviour
Last outcome category                                      
Context                                                    
geometry                                               None
Name: 1, dtype: object


You can examine the dataset further with conditional statements to find specific information. For example, the cell below returns all reported crimes that have a 'Crime type' of 'Anti-social behaviour'. The number of rows should be 3452. 

Adding 'Crime type' as the second argument to '.loc' shows only the 'Crime type' column, instead of all 13. 

In [5]:
print(crime.loc[crime['Crime type'] == 'Anti-social behaviour', 'Crime type'])

0       Anti-social behaviour
1       Anti-social behaviour
2       Anti-social behaviour
3       Anti-social behaviour
4       Anti-social behaviour
                ...          
3447    Anti-social behaviour
3448    Anti-social behaviour
3449    Anti-social behaviour
3450    Anti-social behaviour
3451    Anti-social behaviour
Name: Crime type, Length: 3452, dtype: object
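
Rather than filtering one crime type at a time, all types can be tallied in a single step with the pandas 'value_counts()' method. The sketch below uses a small made-up DataFrame ('toy' and its values are illustrative only); on the real data the same call would be applied to crime['Crime type'].

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Crime type' column
toy = pd.DataFrame({
    "Crime type": ["Anti-social behaviour", "Burglary", "Anti-social behaviour", "Drugs"],
})

# Count how many rows fall under each crime type, most frequent first
type_counts = toy["Crime type"].value_counts()
print(type_counts)
```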


You can do this for each crime type, but for now we will leave it here and move on to the next element of the script: turning this CSV file of crime records into a shapefile. 

Run the next cell to import the necessary modules, load the data and print a subset of the DataFrame.

In [6]:
from shapely.geometry import Point

df = pd.read_csv('data_files/NI_crime_feb_23.csv')  # loads point data

print(df.head())  # prints initial subset of dataframe

  Crime ID    Month                         Reported by   
0      NaN  2023-02  Police Service of Northern Ireland  \
1      NaN  2023-02  Police Service of Northern Ireland   
2      NaN  2023-02  Police Service of Northern Ireland   
3      NaN  2023-02  Police Service of Northern Ireland   
4      NaN  2023-02  Police Service of Northern Ireland   

                         Falls within  Longitude   Latitude   
0  Police Service of Northern Ireland  -5.960566  54.630260  \
1  Police Service of Northern Ireland  -6.275733  54.864132   
2  Police Service of Northern Ireland  -6.355642  54.474319   
3  Police Service of Northern Ireland  -5.951435  54.589790   
4  Police Service of Northern Ireland  -5.881625  54.603207   

                      Location  LSOA code  LSOA name             Crime type   
0      On or near Meyrick Park        NaN        NaN  Anti-social behaviour  \
1     On or near Bridge Street        NaN        NaN  Anti-social behaviour   
2  On or near Silverwood Cour

As you can see from the above cell, the dataframe has more columns than we need. We can tidy this up by running the next cell, which will drop specific columns. 

In [7]:
df = df.drop(columns=['Crime ID', 'Falls within', 'LSOA code', 'LSOA name', 'Last outcome category', 'Context'])

print(df.head())

     Month                         Reported by  Longitude   Latitude   
0  2023-02  Police Service of Northern Ireland  -5.960566  54.630260  \
1  2023-02  Police Service of Northern Ireland  -6.275733  54.864132   
2  2023-02  Police Service of Northern Ireland  -6.355642  54.474319   
3  2023-02  Police Service of Northern Ireland  -5.951435  54.589790   
4  2023-02  Police Service of Northern Ireland  -5.881625  54.603207   

                      Location             Crime type  
0      On or near Meyrick Park  Anti-social behaviour  
1     On or near Bridge Street  Anti-social behaviour  
2  On or near Silverwood Court  Anti-social behaviour  
3      On or near Empire Drive  Anti-social behaviour  
4      On or near Devon Parade  Anti-social behaviour  
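
An alternative to dropping columns after loading is to load only the columns you want in the first place, using the 'usecols' argument of 'pd.read_csv'. This is a sketch under assumptions: the in-memory CSV text below is a made-up two-row stand-in with only some of the real file's columns, and the names 'csv_text', 'keep' and 'small' are illustrative.

```python
import io
import pandas as pd

# Hypothetical two-row CSV with a subset of the columns in the crime file
csv_text = (
    "Crime ID,Month,Longitude,Latitude,Location,Crime type\n"
    ",2023-02,-5.960566,54.630260,On or near Meyrick Park,Anti-social behaviour\n"
    ",2023-02,-6.275733,54.864132,On or near Bridge Street,Anti-social behaviour\n"
)

keep = ["Month", "Longitude", "Latitude", "Location", "Crime type"]

# usecols tells read_csv to load only the listed columns
small = pd.read_csv(io.StringIO(csv_text), usecols=keep)
print(small.columns.tolist())
```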


Now the dataframe consists only of the month the crime was reported, who it was reported by, the latitude and longitude points of the data, the location of the crime and the crime type. 

From this dataframe we will now define the geometry to begin the process of changing the csv file into a shapefile. 

In [8]:
# add a geometry column from the longitude and latitude coordinates for each crime reported

df['geometry'] = list(zip(df['Longitude'], df['Latitude']))
df['geometry'] = df['geometry'].apply(Point)
print(df)

         Month                         Reported by  Longitude   Latitude   
0      2023-02  Police Service of Northern Ireland  -5.960566  54.630260  \
1      2023-02  Police Service of Northern Ireland  -6.275733  54.864132   
2      2023-02  Police Service of Northern Ireland  -6.355642  54.474319   
3      2023-02  Police Service of Northern Ireland  -5.951435  54.589790   
4      2023-02  Police Service of Northern Ireland  -5.881625  54.603207   
...        ...                                 ...        ...        ...   
12259  2023-02  Police Service of Northern Ireland  -5.840630  54.853524   
12260  2023-02  Police Service of Northern Ireland  -5.980109  54.619357   
12261  2023-02  Police Service of Northern Ireland  -5.925203  54.602102   
12262  2023-02  Police Service of Northern Ireland  -5.813526  54.860263   
12263  2023-02  Police Service of Northern Ireland  -5.934706  54.652491   

                          Location             Crime type   
0          On or near Meyr
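
GeoPandas also provides 'gpd.points_from_xy', which builds the point geometries in one call instead of zipping the columns and applying 'Point'. This is an alternative to the cell above, not the method the script uses. The sketch below uses two coordinate pairs copied from the output shown earlier; the 'toy' and 'points' names are illustrative.

```python
import geopandas as gpd
import pandas as pd

# Two coordinate pairs copied from the first rows of the crime data
toy = pd.DataFrame({
    "Longitude": [-5.960566, -6.275733],
    "Latitude": [54.630260, 54.864132],
})

# points_from_xy takes x (longitude) first, then y (latitude)
points = gpd.points_from_xy(toy["Longitude"], toy["Latitude"])
toy["geometry"] = points

print(toy["geometry"].iloc[0])
```

Whichever approach is used, the order matters: longitude is the x coordinate and latitude is the y coordinate.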

Again, you can tidy up the dataframe by removing the columns we no longer need, i.e., the latitude and longitude columns because they are now in the new geometry column. Run the next cell to do this. 

In [9]:
df = df.drop(columns=['Longitude', 'Latitude'])

print(df)

         Month                         Reported by   
0      2023-02  Police Service of Northern Ireland  \
1      2023-02  Police Service of Northern Ireland   
2      2023-02  Police Service of Northern Ireland   
3      2023-02  Police Service of Northern Ireland   
4      2023-02  Police Service of Northern Ireland   
...        ...                                 ...   
12259  2023-02  Police Service of Northern Ireland   
12260  2023-02  Police Service of Northern Ireland   
12261  2023-02  Police Service of Northern Ireland   
12262  2023-02  Police Service of Northern Ireland   
12263  2023-02  Police Service of Northern Ireland   

                          Location             Crime type   
0          On or near Meyrick Park  Anti-social behaviour  \
1         On or near Bridge Street  Anti-social behaviour   
2      On or near Silverwood Court  Anti-social behaviour   
3          On or near Empire Drive  Anti-social behaviour   
4          On or near Devon Parade  Anti-socia

Now a new GeoDataFrame can be created from the DataFrame, using EPSG:4326, the EPSG code that represents WGS84 latitude/longitude. 

In [10]:
gdf = gpd.GeoDataFrame(df)
gdf.set_crs("EPSG:4326", inplace=True)  # sets the coordinates reference system
print(gdf)

         Month                         Reported by   
0      2023-02  Police Service of Northern Ireland  \
1      2023-02  Police Service of Northern Ireland   
2      2023-02  Police Service of Northern Ireland   
3      2023-02  Police Service of Northern Ireland   
4      2023-02  Police Service of Northern Ireland   
...        ...                                 ...   
12259  2023-02  Police Service of Northern Ireland   
12260  2023-02  Police Service of Northern Ireland   
12261  2023-02  Police Service of Northern Ireland   
12262  2023-02  Police Service of Northern Ireland   
12263  2023-02  Police Service of Northern Ireland   

                          Location             Crime type   
0          On or near Meyrick Park  Anti-social behaviour  \
1         On or near Bridge Street  Anti-social behaviour   
2      On or near Silverwood Court  Anti-social behaviour   
3          On or near Empire Drive  Anti-social behaviour   
4          On or near Devon Parade  Anti-socia
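
The geometry column and the CRS can also be passed directly to the 'GeoDataFrame' constructor. It is worth keeping in mind that setting a CRS only labels the existing coordinates, whereas 'to_crs' reprojects them into a different system. The sketch below illustrates both on a one-row made-up DataFrame; the target EPSG:2157 (Irish Transverse Mercator) is an assumed example and is not used in the original script.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical one-row DataFrame using the first point from the crime data
toy = pd.DataFrame({
    "Location": ["On or near Meyrick Park"],
    "geometry": [Point(-5.960566, 54.630260)],
})

# Name the geometry column and label the CRS when building the GeoDataFrame
toy_gdf = gpd.GeoDataFrame(toy, geometry="geometry", crs="EPSG:4326")

# to_crs reprojects the coordinates; EPSG:2157 is an assumed example target
toy_itm = toy_gdf.to_crs(epsg=2157)

print(toy_gdf.crs.to_epsg(), toy_itm.crs.to_epsg())
```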

Now it is time to save the GeoDataFrame as a shapefile, which you can then load into GIS software such as ArcGIS and analyse from there.

In [11]:
gdf.to_file('data_files/NIcrimefeb.shp')
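
The shapefile format limits attribute field names to 10 characters, so a longer column name such as 'Reported by' (11 characters) is likely to be truncated when the file is written. One way to keep control over the names is to rename the columns before saving. The sketch below shows the idea on an empty made-up DataFrame; the short names 'Reporter' and 'CrimeType' are hypothetical choices for illustration.

```python
import pandas as pd

# Column names as they stand after the clean-up steps above
toy = pd.DataFrame(columns=["Month", "Reported by", "Location", "Crime type"])

# Hypothetical short names; any unique names of 10 characters or fewer would do
short_names = {"Reported by": "Reporter", "Crime type": "CrimeType"}
renamed = toy.rename(columns=short_names)

# Check that no column name exceeds the 10-character limit
too_long = [name for name in renamed.columns if len(name) > 10]
print(renamed.columns.tolist(), too_long)
```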



Your shapefile should look similar to this once loaded into GIS software. 

PICTURE.

Now that our crime data has been converted to a shapefile, we can begin to look at the wards data for Northern Ireland and merge the two together. The first step is to load the wards data. Do this by running the cell below.

In [12]:
wards = gpd.read_file('data_files/NI_wards.shp')  # load wards shapefile
wards.crs = 'epsg:4326'  # set wards crs
print(wards.head())  # display subset of wards data
print(wards['Ward'].count())  # counts the number of wards in the dataset

  Ward Code          Ward  Population   
0    95DD05     Ballykeel        1739  \
1    95DD06  Ballyloughan        2588   
2    95DD03      Ardeevin        3503   
3    95DD04        Ballee        1926   
4    95DD09  Craigywarren        2590   

                                            geometry  
0  POLYGON ((-6.25014 54.85879, -6.25015 54.85875...  
1  POLYGON ((-6.28913 54.86946, -6.28917 54.86949...  
2  POLYGON ((-6.29527 54.84752, -6.29527 54.84751...  
3  POLYGON ((-6.26087 54.84528, -6.26096 54.84524...  
4  POLYGON ((-6.24807 54.89652, -6.24819 54.89645...  
582
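
One common way to combine point data with polygon data is a spatial join, which attaches the attributes of the polygon each point falls within. The sketch below is an illustration under assumptions, not necessarily the merge this script goes on to use: the two GeoDataFrames are made-up toy data, and the 'predicate' argument of 'gpd.sjoin' assumes a reasonably recent GeoPandas version (older releases used 'op' instead).

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical toy wards: two unit squares side by side
toy_wards = gpd.GeoDataFrame(
    {"Ward": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Hypothetical toy crimes: two points in ward A, one in ward B
toy_crimes = gpd.GeoDataFrame(
    {"Crime type": ["Burglary", "Drugs", "Burglary"]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Attach the ward attributes to every crime point that lies within a ward
joined = gpd.sjoin(toy_crimes, toy_wards, how="left", predicate="within")

# Count crimes per ward
crimes_per_ward = joined.groupby("Ward").size()
print(crimes_per_ward)
```

Both layers need to share the same CRS before joining, which is why both toy layers are labelled EPSG:4326 here.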
