<a href="https://colab.research.google.com/github/redrum88/data_science/blob/main/uk_crimes_2023_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⚠️ Crimes in the UK, January 2023 ⚠️

## About Dataset

### General Information

* **Title:** ASB Incidents, Crime and Outcomes
* **Theme:** Crime and Criminal Justice
* **Description:** Individual crime and anti-social behaviour (ASB) incidents, including street-level location information and subsequent police and court outcomes associated with the crime.
* **Keywords:** police, courts, crime, anti-social behaviour

* **Geographic Coverage:** England, Wales, Northern Ireland
* **Publisher:** Single Online Home National Digital Team
* **Licence:** Open Government Licence v3.0
* **Language:** en-GB


### CSV Columns
The columns in the CSV files are as follows:

* `Crime ID`
* `Month` (when the crime was committed)
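* `Reported by` (the force that reported the crime)
* `Falls within` (currently also the force that provided the data about the crime; this is being looked into and is likely to change in the near future)
* `Longitude` and `Latitude` (anonymised coordinates of the crime)
* `Location` (a short anonymised description of the place, e.g. "On or near Longdon Drive")
* `LSOA code` and `LSOA name` (the Lower Layer Super Output Area the location falls into)
* `Crime type`
* `Last outcome category` (a reference to whichever of the outcomes associated with the crime occurred most recently)
* `Context` (additional human-readable data about individual crimes)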

## Check the data

In [1]:
# Download data
!wget https://github.com/redrum88/data_science/raw/main/data/uk_crimes_2023_01.zip

--2023-04-05 00:12:08--  https://github.com/redrum88/data_science/raw/main/data/uk_crimes_2023_01.zip
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/redrum88/data_science/main/data/uk_crimes_2023_01.zip [following]
--2023-04-05 00:12:08--  https://raw.githubusercontent.com/redrum88/data_science/main/data/uk_crimes_2023_01.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20696787 (20M) [application/zip]
Saving to: ‘uk_crimes_2023_01.zip.1’


2023-04-05 00:12:09 (214 MB/s) - ‘uk_crimes_2023_01.zip.1’ saved [20696787/20696787]
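Because `uk_crimes_2023_01.zip` already existed, wget saved this download as `uk_crimes_2023_01.zip.1`, while the following cells keep reading the older `uk_crimes_2023_01.zip`. A minimal sketch of a download-once step in plain Python; `download_once` is a hypothetical helper that simply skips the download when the target file is present (wget's `-nc`/`--no-clobber` flag behaves similarly).

```python
import os
import urllib.request


def download_once(url: str, dest: str) -> bool:
    """Download url to dest unless dest already exists.

    Hypothetical helper. Returns True if a download happened and False if an
    existing file was reused, so re-running the notebook never creates
    numbered duplicates such as "uk_crimes_2023_01.zip.1".
    """
    if os.path.exists(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

Usage: `download_once("https://github.com/redrum88/data_science/raw/main/data/uk_crimes_2023_01.zip", "uk_crimes_2023_01.zip")`. Note that an existing file is reused as-is, so a partially downloaded zip would have to be deleted by hand.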



In [2]:
# Import tools
import os
import zipfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
## Unzip downloaded dataset
# Set zip file path
zip_path = "uk_crimes_2023_01.zip"

# Create a new folder "dataset"
os.makedirs("dataset", exist_ok=True)

# Extract contents to "dataset" folder
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall("dataset")
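The loading cell below reads only `.csv` files sitting directly inside `dataset`, so it can be worth checking what the archive holds before extracting it. A small sketch using `zipfile.ZipFile.namelist()`; `list_csv_members` is a hypothetical name.

```python
import zipfile


def list_csv_members(zip_path: str) -> list:
    """Return the names of the .csv entries in zip_path without extracting them."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".csv")]
```

Usage: `list_csv_members("uk_crimes_2023_01.zip")`. Names containing a `/` would indicate files in subfolders, which a plain `os.listdir` of `dataset` does not see.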

In [4]:
# Read every CSV extracted into the "dataset" folder (relative path, no os.chdir needed)
data_dir = "dataset"
df_list = []
for file in os.listdir(data_dir):
    if file.endswith(".csv"):
        df_list.append(pd.read_csv(os.path.join(data_dir, file)))

# Concatenate into one DataFrame with a fresh 0..n-1 index
df = pd.concat(df_list, ignore_index=True)

df.head()

|   | Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | efa8939fa30f266f2a79c6c3c9778c13670aeedbb0190e... | 2023-01 | West Midlands Police | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | E01009417 | Birmingham 001A | Burglary | Under investigation |  |
| 1 | 5f1bd5581cec329ec985a779b0e59567ef17496a5f28e4... | 2023-01 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Burglary | Investigation complete; no suspect identified |  |
| 2 | 246cca11540fe6cf584ce6e2bac43a93c2788aae9f017a... | 2023-01 | West Midlands Police | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | E01009417 | Birmingham 001A | Burglary | Under investigation |  |
| 3 | 173d517361cf6d4586e81e0e096193ae6bccd783d0fa04... | 2023-01 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |  |
| 4 | 932d4d4d6896e51746781de92b3102cb2018c55b78fcfc... | 2023-01 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |  |
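An alternative sketch of this loading step uses `pathlib` and records which file each row came from, which helps if one file turns out to be malformed. `load_csv_folder` and the added `source_file` column are hypothetical names.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder: str) -> pd.DataFrame:
    """Stack every *.csv directly inside folder into one DataFrame.

    Hypothetical helper: adds a 'source_file' column with each row's file name.
    """
    paths = sorted(Path(folder).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(path).assign(source_file=path.name) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

Usage: `df = load_csv_folder("dataset")`, after which `df["source_file"].value_counts()` shows the number of rows per file.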


## Data Cleaning

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 478347 entries, 0 to 478346
Data columns (total 12 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   Crime ID               413159 non-null  object 
 1   Month                  478347 non-null  object 
 2   Reported by            478347 non-null  object 
 3   Falls within           478347 non-null  object 
 4   Longitude              470741 non-null  float64
 5   Latitude               470741 non-null  float64
 6   Location               478347 non-null  object 
 7   LSOA code              458368 non-null  object 
 8   LSOA name              458368 non-null  object 
 9   Crime type             478347 non-null  object 
 10  Last outcome category  404512 non-null  object 
 11  Context                0 non-null       float64
dtypes: float64(3), object(9)
memory usage: 43.8+ MB


In [6]:
df.columns

Index(['Crime ID', 'Month', 'Reported by', 'Falls within', 'Longitude',
       'Latitude', 'Location', 'LSOA code', 'LSOA name', 'Crime type',
       'Last outcome category', 'Context'],
      dtype='object')

In [7]:
df.isnull().sum()

Crime ID                  65188
Month                         0
Reported by                   0
Falls within                  0
Longitude                  7606
Latitude                   7606
Location                      0
LSOA code                 19979
LSOA name                 19979
Crime type                    0
Last outcome category     73835
Context                  478347
dtype: int64
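Raw counts are easier to judge next to the share of rows they represent (478,347 rows here). A small sketch, run on a made-up toy frame rather than the real data; `missing_report` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, largest first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "percent": (counts / len(df) * 100).round(2)})
    return report.sort_values("missing", ascending=False)


# Toy data (made up): one column fully empty, like "Context" in the real file
toy = pd.DataFrame({
    "Crime ID": ["a1", None, "c3", None],
    "Longitude": [-1.85, -1.84, np.nan, -0.12],
    "Context": [np.nan] * 4,
})
print(missing_report(toy))
```

On the notebook's data the same call would be `missing_report(df)`.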

In [8]:
# Drop the row identifier and the month (this extract covers January 2023 only)
df = df.drop(columns=["Crime ID", "Month"])
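Dropping `Month` loses nothing only if it holds a single value. A quick check, sketched with a hypothetical `constant_columns` helper on a toy frame, lists columns with at most one distinct non-null value; calling it on `df` before the drop would show whether `Month` qualifies (it would also flag the all-empty `Context`).

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Columns with at most one distinct non-null value (all-null columns included)."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]


# Toy data (made up)
toy = pd.DataFrame({
    "Month": ["2023-01", "2023-01", "2023-01"],
    "Crime type": ["Burglary", "Drugs", "Burglary"],
    "Context": [None, None, None],
})
print(constant_columns(toy))  # ['Month', 'Context']
```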

In [9]:
df["Reported by"].value_counts()


Metropolitan Police Service           87211
West Midlands Police                  30937
West Yorkshire Police                 27442
Thames Valley Police                  16893
South Yorkshire Police                14770
Hampshire Constabulary                14578
Kent Police                           14573
Northumbria Police                    14513
Merseyside Police                     14147
Essex Police                          13935
Lancashire Constabulary               13923
Avon and Somerset Constabulary        12566
Police Service of Northern Ireland    11970
Sussex Police                         11899
South Wales Police                    10187
West Mercia Police                     9510
Staffordshire Police                   9008
Nottinghamshire Police                 8914
Derbyshire Constabulary                8884
Leicestershire Police                  8676
Humberside Police                      8583
Hertfordshire Constabulary             8168
Cleveland Police                
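`matplotlib` is imported at the top but not used yet; a horizontal bar chart makes a ranking like this easier to scan. A sketch, assuming a hypothetical `plot_top_counts` helper that accepts any categorical column:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_top_counts(values: pd.Series, top_n: int = 10, title: str = ""):
    """Horizontal bar chart of the top_n most frequent values; returns the Axes."""
    # Ascending order puts the largest bar at the top of a barh chart
    counts = values.value_counts().head(top_n).sort_values()
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(counts) + 1))
    counts.plot.barh(ax=ax)
    ax.set_xlabel("Number of records")
    ax.set_title(title)
    fig.tight_layout()
    return ax
```

Usage: `plot_top_counts(df["Reported by"], top_n=15, title="Records per reporting force")`; with `%matplotlib inline` the figure is shown below the cell.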

In [10]:
df[df['Reported by'] != df['Falls within']]

|   | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context |
|---|---|---|---|---|---|---|---|---|---|---|
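An empty result means that no row has differing values in the two columns. The same check can return a single boolean instead; since `!=` treats two missing values as different, this hypothetical `columns_identical` helper counts rows where both sides are missing as matching.

```python
import pandas as pd


def columns_identical(df: pd.DataFrame, a: str, b: str) -> bool:
    """True if columns a and b agree on every row, treating NaN == NaN as a match."""
    same = (df[a] == df[b]) | (df[a].isna() & df[b].isna())
    return bool(same.all())


# Toy data (made up)
toy = pd.DataFrame({
    "Reported by": ["Kent Police", "Essex Police", None],
    "Falls within": ["Kent Police", "Essex Police", None],
})
print(columns_identical(toy, "Reported by", "Falls within"))  # True
```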


In [11]:
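# "Falls within" matches "Reported by" on every row (empty result above),
# "Context" is entirely empty and "LSOA code" identifies the same area as "LSOA name"
df = df.drop(columns=["LSOA code", "Context", "Falls within"])
df.head()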

|   | Reported by | Longitude | Latitude | Location | LSOA name | Crime type | Last outcome category |
|---|---|---|---|---|---|---|---|
| 0 | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | Birmingham 001A | Burglary | Under investigation |
| 1 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Burglary | Investigation complete; no suspect identified |
| 2 | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | Birmingham 001A | Burglary | Under investigation |
| 3 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |
| 4 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |


In [12]:
# Drop every row with at least one missing value (coordinates, LSOA name or outcome)
df = df.dropna()

In [13]:
len(df)

397205
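`dropna()` removed 81,142 of the 478,347 rows. One assumption worth checking: police.uk appears to publish anti-social behaviour incidents without a Crime ID or an outcome, so this step likely removed that whole category (the crime types listed after cleaning contain no `Anti-social behaviour` entry). A sketch for measuring what a filter removed per category, assuming a copy such as `raw = df.copy()` is kept before `df = df.dropna()`; `rows_lost_by_category` and the toy rows are hypothetical.

```python
import pandas as pd


def rows_lost_by_category(before: pd.DataFrame, after: pd.DataFrame, col: str) -> pd.DataFrame:
    """Per-category row counts before and after a filtering step, most-affected first."""
    table = pd.DataFrame({
        "before": before[col].value_counts(),
        "after": after[col].value_counts(),
    }).fillna(0).astype(int)  # categories that vanished entirely get after = 0
    table["lost"] = table["before"] - table["after"]
    return table.sort_values("lost", ascending=False)


# Toy rows (made up) mimicking records that have no outcome
raw = pd.DataFrame({
    "Crime type": ["Burglary", "Burglary", "Anti-social behaviour", "Anti-social behaviour", "Drugs"],
    "Last outcome category": ["Under investigation", None, None, None, "Local resolution"],
})
print(rows_lost_by_category(raw, raw.dropna(), "Crime type"))
```

On the real data the call would be `rows_lost_by_category(raw, df, "Crime type")`.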

In [14]:
df.isnull().sum()

Reported by              0
Longitude                0
Latitude                 0
Location                 0
LSOA name                0
Crime type               0
Last outcome category    0
dtype: int64

In [15]:
df.head()

|   | Reported by | Longitude | Latitude | Location | LSOA name | Crime type | Last outcome category |
|---|---|---|---|---|---|---|---|
| 0 | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | Birmingham 001A | Burglary | Under investigation |
| 1 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Burglary | Investigation complete; no suspect identified |
| 2 | West Midlands Police | -1.851067 | 52.593204 | On or near Longdon Drive | Birmingham 001A | Burglary | Under investigation |
| 3 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |
| 4 | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | Birmingham 001A | Criminal damage and arson | Investigation complete; no suspect identified |


In [16]:
df["Crime type"].unique()

array(['Burglary', 'Criminal damage and arson', 'Vehicle crime',
       'Violence and sexual offences', 'Public order', 'Bicycle theft',
       'Other theft', 'Shoplifting', 'Other crime',
       'Theft from the person', 'Drugs', 'Possession of weapons',
       'Robbery'], dtype=object)

In [17]:
df["LSOA name"].nunique()

31387
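With over 31,000 distinct LSOA names, it can help to roll them up to a coarser area. LSOA names in this file look like `Birmingham 001A`, an area name followed by a four-character code; assuming that pattern holds, stripping the code leaves a name that usually corresponds to the local authority district. A sketch with a hypothetical `area_from_lsoa` helper; names that do not match the pattern are left unchanged.

```python
import pandas as pd


def area_from_lsoa(names: pd.Series) -> pd.Series:
    """Drop a trailing ' 001A'-style code (space, three digits, one capital letter).

    Assumes names follow the 'Birmingham 001A' pattern; others are returned unchanged.
    """
    return names.str.replace(r"\s+\d{3}[A-Z]$", "", regex=True)


# Toy names (made up, apart from the pattern seen in the data)
toy = pd.Series(["Birmingham 001A", "Birmingham 002B", "Leeds 110C", "Unmatched name"])
print(area_from_lsoa(toy).value_counts())
```

On the real data: `area_from_lsoa(df["LSOA name"]).value_counts().head(10)`.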

In [18]:
df["Last outcome category"].nunique()

13

In [19]:
df["Last outcome category"].unique()

array(['Under investigation',
       'Investigation complete; no suspect identified',
       'Unable to prosecute suspect', 'Local resolution',
       'Action to be taken by another organisation',
       'Awaiting court outcome', 'Offender given a caution',
       'Further investigation is not in the public interest',
       'Formal action is not in the public interest',
       'Further action is not in the public interest',
       'Offender given penalty notice',
       'Suspect charged as part of another case'], dtype=object)
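A natural next step is to relate outcomes to crime types. `pd.crosstab` with `normalize="index"` gives the share of each outcome within each crime type, so every row sums to 1. A sketch on made-up toy rows; `outcome_shares` is a hypothetical helper, and on the real data it would be called as `outcome_shares(df)`.

```python
import pandas as pd


def outcome_shares(df: pd.DataFrame,
                   type_col: str = "Crime type",
                   outcome_col: str = "Last outcome category") -> pd.DataFrame:
    """Share of each last-outcome category within each crime type (rows sum to 1)."""
    return pd.crosstab(df[type_col], df[outcome_col], normalize="index")


# Toy rows (made up)
toy = pd.DataFrame({
    "Crime type": ["Burglary", "Burglary", "Burglary", "Drugs"],
    "Last outcome category": [
        "Under investigation",
        "Investigation complete; no suspect identified",
        "Investigation complete; no suspect identified",
        "Local resolution",
    ],
})
shares = outcome_shares(toy)
print(shares.round(2))
```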