This repository is dedicated to raising awareness about possible crimes in Derbyshire through the critical analysis of the different crimes

siraug/Derbyshire-Crime-Analysis-With-R

Derbyshire-Crime-Analysis-using-R

Introduction

This project aims to analyze the crime data set from different regions in Derbyshire. The objectives of this project are to:

  • Gain a better understanding of the data set through descriptive visualizations
  • Evaluate the relationship between variables in the data set using linear regression

Dataset Description

The provided dataset includes data on the number of criminal incidents reported in various Lower Layer Super Output Areas (LSOAs) of Derbyshire. The dataset contains 642 observations and 18 variables.

EDA

Figure 1: Total Population by City

Figure 2: Total Crime by City

Figure 3: Map Plot of Population Density

Linear Regression

Summary Statistics - Normality Distribution

Skewness measures how asymmetric a distribution is, and so how far it departs from the symmetric shape of a normal distribution (Gawali, 2021). In the density plot below, the tails of the distributions are longer on the positive side, indicating that the crime variables are positively skewed.

Figure 1: Density Plot of the Crime Data
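
As a rough illustration of the skewness check described above, the sketch below computes a moment-based skewness coefficient in base R. The `sample_skewness` helper and the simulated `crime_counts` vector are hypothetical stand-ins, not the project's own code or data.

```r
# Moment-based skewness: third central moment divided by the cubed
# standard deviation (population form, no small-sample bias correction).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

# Hypothetical right-skewed counts standing in for one crime variable
# (642 values, matching the number of observations in the dataset).
set.seed(42)
crime_counts <- round(rlnorm(642, meanlog = 3, sdlog = 0.8))

skew_raw <- sample_skewness(crime_counts)
print(skew_raw)  # values above 0 indicate a longer right tail
```

A coefficient near zero suggests a roughly symmetric distribution, while a clearly positive value matches the long right tails visible in the density plot.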

Because of this skew, the data were log-transformed. A log transformation reduces positive skewness and brings the distribution closer to a bell curve, as shown below (Htoon, 2020):

Figure 2: Density Plot of the Log Transformed Crime Data

The log transformation has reduced the skewness of the variable distributions. The means are now lower than the medians, indicating negative skewness: the lower half of the data is more dispersed than the upper half.
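
The effect of the transformation can be sketched in base R as follows. The data are simulated, and the use of `log1p` (rather than `log` or some other offset) is an assumption, since the text does not say how zero counts were handled.

```r
# Moment-based skewness (population form); hypothetical helper.
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

# Hypothetical right-skewed counts, not the project's data.
set.seed(42)
crime_counts <- round(rlnorm(642, meanlog = 3, sdlog = 0.8))

# log1p(x) = log(1 + x) stays finite when a count is zero.
# Assumption: the original analysis may have used a different transform.
log_counts <- log1p(crime_counts)

skew_raw <- sample_skewness(crime_counts)
skew_log <- sample_skewness(log_counts)
print(c(raw = skew_raw, logged = skew_log))

# Comparing mean and median gives a quick sign check on the skew:
# mean < median suggests a longer left tail, mean > median a longer right tail.
print(c(mean = mean(log_counts), median = median(log_counts)))
```

Comparing the two skewness values before and after the transform is a simple way to confirm that the transformation did what it was meant to do.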

Model Results

The slope coefficient (β1) is positive for each crime type, indicating a positive association between population and the number of crimes. The intercept of each model is negative. Because the intercept is the fitted value when the predictor equals zero, which lies outside the range of the observed data, it has no meaningful practical interpretation here. The models fit the data moderately to well: R-squared ranges from 0.5972 to 0.8772, so the models account for 59.72% to 87.72% of the variation in the dependent variables.
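
A minimal sketch of how such a model could be fitted and read in base R is shown below. The data are simulated, the variable names are hypothetical, and regressing log crime on log population is an assumption; the text does not state the scale of the predictor.

```r
set.seed(1)
n <- 642

# Hypothetical LSOA populations and one log-transformed crime variable.
population <- round(runif(n, min = 1000, max = 3000))
log_population <- log(population)
log_crime <- -6 + 1.2 * log_population + rnorm(n, sd = 0.3)

# Simple linear regression: log_crime = beta0 + beta1 * log_population + error.
fit <- lm(log_crime ~ log_population)

intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["log_population"]]
r_squared <- summary(fit)$r.squared

print(c(intercept = intercept, slope = slope, r_squared = r_squared))
```

Here `slope` plays the role of β1, and `r_squared` is the share of variance in the response that the fitted line explains.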

Linearity:

Linear regression assumes a linear relationship between the dependent and independent variables. If the relationship is nonlinear, a linear model may produce unreliable results. For these models, the linearity assumption appears to hold, as Figure 3 shows.

Figure 3: Linearity Plot of the Models
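
One common way to check linearity is a residuals-versus-fitted plot, optionally backed by a test for curvature. The sketch below uses simulated data and hypothetical variable names; it is not necessarily how the plot above was produced.

```r
set.seed(1)
n <- 642
log_population <- log(round(runif(n, min = 1000, max = 3000)))  # hypothetical
log_crime <- -6 + 1.2 * log_population + rnorm(n, sd = 0.3)     # hypothetical

fit <- lm(log_crime ~ log_population)

# Residuals-versus-fitted plot: a patternless band around zero is
# consistent with linearity, while a curve suggests a missing nonlinear term.
# Drawn only in an interactive session.
if (interactive()) {
  plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
}

# Numeric cross-check: does adding a squared term improve the fit?
fit_quadratic <- lm(log_crime ~ log_population + I(log_population^2))
p_value <- anova(fit, fit_quadratic)[["Pr(>F)"]][2]
print(p_value)  # a large p-value gives no evidence of curvature
```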

Correlation:

The residuals are only weakly correlated toward the bottom left of the heatmap, while the pairs toward the top right are more strongly correlated. Anti-social behaviour and Public Order correlate at 0.8, as do Anti-social behaviour and Violent Crimes, and Anti-social behaviour and Criminal Damage and Arson.

Figure 4: Correlation Heatmap of Residuals
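
A residual correlation matrix of this kind could be built roughly as follows: fit one model per crime type on the same predictor, collect the residuals, and correlate them. The data are simulated and the column names are hypothetical; a shared noise term is added deliberately so that the residuals correlate.

```r
set.seed(7)
n <- 642
log_population <- log(round(runif(n, min = 1000, max = 3000)))  # hypothetical

# Shared area-level noise, so residuals of different crime types correlate.
shared <- rnorm(n, sd = 0.3)

crimes <- data.frame(
  anti_social_behaviour = -6 + 1.2 * log_population + shared + rnorm(n, sd = 0.2),
  public_order          = -7 + 1.1 * log_population + shared + rnorm(n, sd = 0.2),
  violent_crimes        = -5 + 1.0 * log_population + shared + rnorm(n, sd = 0.2)
)

# One regression per crime type; each column of the result holds that
# model's residuals.
residual_matrix <- sapply(crimes, function(y) resid(lm(y ~ log_population)))

# Pairwise Pearson correlations between the residual columns.
residual_cor <- cor(residual_matrix)
print(round(residual_cor, 2))
```

High off-diagonal values mean that the crime types deviate from their population-based predictions in the same areas, which points to shared factors that population alone does not capture.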

Hierarchical Clustering:

Hierarchical clustering groups similar objects into clusters, and a dendrogram is the tree diagram that displays the result. It shows graphically how the clusters relate to one another. In Figure 5, the data fall into three clusters, coloured red, green, and blue.

Figure 5: Cluster Dendrogram
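
For readers who want to reproduce a three-cluster dendrogram, a minimal base R sketch follows. The feature matrix is simulated, and the Euclidean distance and Ward linkage are assumptions; the text does not say which distance or linkage method was used.

```r
set.seed(3)

# Hypothetical 2-D features for 60 areas forming three well-separated groups.
features <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 8, sd = 0.5), ncol = 2)
)

# Standardise the columns, then compute pairwise Euclidean distances.
distances <- dist(scale(features))

# Agglomerative clustering with Ward linkage (an assumed choice).
tree <- hclust(distances, method = "ward.D2")

# Cut the tree into three clusters and count the members of each.
clusters <- cutree(tree, k = 3)
print(table(clusters))

# Draw the dendrogram and outline the three clusters (interactive sessions only).
if (interactive()) {
  plot(tree, labels = FALSE, main = "Cluster Dendrogram")
  rect.hclust(tree, k = 3, border = c("red", "green", "blue"))
}
```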
