gis-yang/Crime-prediction

ST-Cokriging ArcGIS extension for crime prediction

Citation

Yang, B., Liu, L., Lan, M., Wang, Z., Zhou, H., Yu, H., (2020). A spatio-temporal method for crime prediction using historical crime data and transitional zones identified from nightlight imagery. International Journal of Geographical Information Science, 1-25. DOI: 10.1080/13658816.2020.1737701

Introduction

Crime Prediction

Goal: The goal of this project is to use machine learning and statistical methods to predict crime occurrence, crime hotspots, and crime type.

Data

Data sources include:

  1. Crime data: This dataset contains crime incidents, including the type of crime, location, and time of the incident. Publicly available crime data can be found on city or police department websites, such as the San Francisco Police Department.

  2. Auxiliary data: This dataset includes demographic, economic, and other relevant factors that may contribute to crime occurrence. Auxiliary data can be obtained from sources like the United States Census Bureau.

Method

  1. Data preprocessing: Clean and preprocess the data to remove irrelevant information and transform the data into a suitable format for analysis.

  2. Feature engineering: Generate new features that may help improve the predictive power of the model. Examples include time-based features (e.g., hour of the day, day of the week) and spatial features (e.g., distance to the nearest police station).

  3. Model training: Train various machine learning models (e.g., logistic regression, random forest, deep learning) and select the best performing model based on evaluation metrics such as accuracy, precision, recall, and F1 score.

  4. Model evaluation: Assess the performance of the selected model using a test dataset to ensure its generalizability to new data.

  5. Prediction: Use the trained model to predict crime occurrence, hotspots, and crime type based on the input features.

Applications

The results of this project can be used by law enforcement agencies to allocate resources more effectively, identify potential crime hotspots, and develop crime prevention strategies.

Accurate crime prediction can help allocate police resources for crime reduction and prevention. There are two popular approaches to predicting criminal activity: one is based on historical crime, and the other is based on environmental variables correlated with criminal patterns. Previous research on geo-statistical modeling mainly considered one type of data in the space-time domain, and few studies sought to blend multi-source data. In this research, we propose a spatio-temporal Cokriging (ST-Cokriging) algorithm that integrates historical crime data and urban transitional zones for more accurate crime prediction. Time-series historical crime data are used as the primary variable, while urban transitional zones identified from VIIRS nightlight imagery are used as the secondary co-variable.

Figure 1. Data processing flowchart of ST-Cokriging method

2. ST-Cokriging workflow

In the ST-Cokriging formulation, we assume that the primary variable of interest consists of coarse-spatial-resolution images sampled at a high temporal frequency (high temporal resolution), and that the secondary variable (co-variable) consists of fine-spatial-resolution images sparsely sampled over time (low temporal resolution), as shown in Figure 1. Without loss of generality, the mathematical formulation of the ST-Cokriging method considers only one co-variable observed at multiple time points. The extension to two or more co-variables observed at multiple time points is straightforward.

3. Extension implementation

In the Cokriging linear system, the covariance matrix C is very large. Because the co-variable is observed at high spatial resolution, its covariance matrix is likely to be very large. Similarly, although the primary variable is observed at coarse spatial resolution and therefore has a smaller spatial dimension, adding the temporal dimension can still make its covariance matrix large, because the primary variable is observed at high temporal frequency. Solving such a high-dimensional linear system can be computationally infeasible. One popular way to alleviate this difficulty is to force small values in the matrix (vector) to zero, known as thresholding or tapering. We also take advantage of the regularly gridded data in the application presented in Section 3, which facilitates efficient parallel computing of the Cokriging predictor and variance.
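The thresholding idea above can be illustrated with a minimal pure-Python sketch. It is not the toolbox's implementation: the exponential covariance model, the 1-D coordinates, the function names, and the cutoff `eps` are all assumptions chosen for illustration.

```python
import math

def exponential_cov(h, sill=1.0, corr_range=2.0):
    """Exponential covariance; a stand-in model, not the toolbox's fitted one."""
    return sill * math.exp(-h / corr_range)

def build_cov_matrix(coords, cov_fn):
    """Dense covariance matrix for 1-D locations (hypothetical helper)."""
    n = len(coords)
    return [[cov_fn(abs(coords[i] - coords[j])) for j in range(n)]
            for i in range(n)]

def threshold_matrix(C, eps):
    """Force entries with absolute value below eps to exactly zero."""
    return [[v if abs(v) >= eps else 0.0 for v in row] for row in C]

def nonzero_fraction(C):
    """Share of entries that are still nonzero after thresholding."""
    total = sum(len(row) for row in C)
    nonzero = sum(1 for row in C for v in row if v != 0.0)
    return nonzero / total

coords = list(range(50))                    # 50 regularly spaced locations
C = build_cov_matrix(coords, exponential_cov)
C_sparse = threshold_matrix(C, eps=0.05)    # eps is an arbitrary cutoff
print(nonzero_fraction(C), nonzero_fraction(C_sparse))
```

After thresholding, only a narrow band around the diagonal remains nonzero, which is what makes sparse solvers applicable. Note that hard thresholding does not by itself guarantee a positive-definite matrix; tapering, which multiplies the covariance by a compactly supported correlation function, is the usual way to preserve that property.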

Source and Environment

  1. Source code download. Download the toolbox (yangtoolbox_crime.tbx) for ArcGIS from the repository folder Toolbox and Codes. Download the Python scripts for ST-Cokriging from the same folder:

    • FittingVariog_crime.py to be linked with toolbox FittingVariogram
    • ImageFusion_Speedy_crime.py to be linked with toolbox STCoKriging_crime
    • SemiVariog.py to be linked with toolbox Variogram
  2. Environmental setup

    • Enable extensions in ArcMap
    • Set up the Geoprocessing options
    • Open the ArcToolbox window, right-click, and add a toolbox
    • Navigate to the toolbox you just downloaded and select it
    • Expand the toolbox; the scripts should appear inside it
    • Right-click the ST-Cokriging script and select Properties
    • Click the Source tab and link the ST-Cokriging script to the toolbox
    • Repeat the same procedure for the semi-variogram script and the FittingVariog_crime script, linking each downloaded script to the corresponding ArcToolbox script.

Prediction via toolbox

  • Step I: use the spatio-temporal data to calculate the spatial and temporal semi-variograms
  • Estimate the spatio-temporal semi-variograms
  • Input the parameters as shown in the figure below. All 13 quad-week images should be included in the calculation and arranged in chronological order.
  • The input spatial raster should be the quad-week period with the largest number of crimes.
  • The spatial sample ratio is the percentage of spatial samples drawn as a subset. It depends on the resolution and the total number of pixels in the study region; the subset should contain roughly 3,000 to 10,000 samples.
  • Finally, select the path where the txt files of the spatial and temporal semi-variograms will be stored.
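As a rough illustration of Step I, the sketch below picks a sample ratio that yields a target subset size and computes an empirical temporal semi-variogram from a random subset of pixels. It is not the code in SemiVariog.py: the function names, the nested-list raster layout `stack[t][row][col]`, the toy data, and the choice to return the ratio as a fraction (rather than a percentage) are all assumptions.

```python
import random

def suggest_sample_ratio(n_rows, n_cols, target_samples=5000):
    """Fraction of pixels to sample so the subset has about target_samples."""
    total_pixels = n_rows * n_cols
    return min(1.0, target_samples / total_pixels)

def sample_pixel_series(stack, ratio, seed=0):
    """Draw a random subset of pixels and return each pixel's time series."""
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    k = max(1, round(ratio * len(cells)))
    chosen = random.Random(seed).sample(cells, k)
    return [[layer[r][c] for layer in stack] for r, c in chosen]

def temporal_semivariogram(series_list, max_lag):
    """gamma(h) = sum of squared differences at lag h / (2 * number of pairs)."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        sq_sum, n_pairs = 0.0, 0
        for series in series_list:
            for t in range(len(series) - lag):
                diff = series[t + lag] - series[t]
                sq_sum += diff * diff
                n_pairs += 1
        if n_pairs:
            gamma[lag] = sq_sum / (2.0 * n_pairs)
    return gamma

# Toy stack: 13 periods on a 20 x 20 grid, in chronological order.
stack = [[[r + c + t for c in range(20)] for r in range(20)]
         for t in range(13)]
ratio = suggest_sample_ratio(20, 20, target_samples=100)
series = sample_pixel_series(stack, ratio)
gamma_t = temporal_semivariogram(series, max_lag=4)
print(ratio, gamma_t)
```

For a real study area, `suggest_sample_ratio(n_rows, n_cols)` with a target between 3,000 and 10,000 gives a starting value for the spatial sample ratio; a spatial semi-variogram would be computed analogously, using distances between sampled pixels instead of time lags.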

  • Step II: fit the spatial and temporal semi-variograms
  • The outputs are txt files of the spatial and temporal semi-variograms
  • Input the spatial and temporal semi-variogram files to be fitted to a function
  • Choose the fitting function based on the shape of the semi-variogram. In the example below, the spatial semi-variogram is fitted with a Gaussian function, while the temporal semi-variogram is fitted with an Exponential function
  • The fitted outputs are two txt files that use the nugget, sill, and range parameters to describe the spatial and temporal dependence

  • The spatial and temporal semi-variograms are converted to covariance/correlation and then combined into a spatio-temporal covariance function
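The conversion described in Step II can be sketched as follows. This is an illustration under stated assumptions, not the code in FittingVariog_crime.py: it uses one common parameterization of the Gaussian and Exponential models, treats `sill` as the total sill (nugget plus partial sill), and assumes a separable product form for the spatio-temporal covariance, which the README does not specify. All function names and parameter values are hypothetical.

```python
import math

def gaussian_variogram(h, nugget, sill, rng):
    """Gaussian semi-variogram; sill is the total sill (assumed convention)."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-(h / rng) ** 2))

def exponential_variogram(h, nugget, sill, rng):
    """Exponential semi-variogram with the same parameter convention."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))

def to_covariance(variogram, h, nugget, sill, rng):
    """Covariance from a bounded semi-variogram: C(h) = sill - gamma(h)."""
    return sill - variogram(h, nugget, sill, rng)

def to_correlation(variogram, h, nugget, sill, rng):
    """Correlation: covariance divided by the variance C(0) = sill."""
    return to_covariance(variogram, h, nugget, sill, rng) / sill

def st_covariance(hs, ht, spatial_params, temporal_params, st_sill):
    """Separable (product) spatio-temporal covariance; an assumed form."""
    rho_s = to_correlation(gaussian_variogram, hs, *spatial_params)
    rho_t = to_correlation(exponential_variogram, ht, *temporal_params)
    return st_sill * rho_s * rho_t

# Hypothetical fitted parameters: (nugget, sill, range).
spatial_params = (0.1, 1.0, 500.0)   # range in map units
temporal_params = (0.05, 1.0, 3.0)   # range in quad-week periods

print(st_covariance(0.0, 0, spatial_params, temporal_params, st_sill=2.0))
print(st_covariance(250.0, 1, spatial_params, temporal_params, st_sill=2.0))
```

The product form is only one of several ways to combine spatial and temporal dependence; it is used here because it needs nothing beyond the two fitted marginal models.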

Step III: predict using ST-Cokriging

  • Input the secondary co-variable image and the time-series primary variable; the primary variable must be supplied in chronological order.
  • The number of time-series images should be no less than 3 for spatio-temporal prediction.
  • Input the fitted functions of the spatial and temporal semi-variograms from Step II, and assign the other parameters.

Troubleshooting

  1. Both the time-series primary variable and the co-variable should be re-projected to the same coordinate system and the same datum.
  2. The co-variable should be at a finer or equal spatial resolution. In this version, the co-variable and the time-series primary variable must have the same coverage and the same spatial resolution.
  3. If using a 32-bit operating system or software, the maximum raster size is 6000 rows by 6000 columns.
  4. The fusion process may take a while (about 1 hour, depending on the CPU and RAM of the computer). Once it starts running, please do not use other programs at the same time; otherwise Windows 10 might report that ArcGIS is not responding. If that happens, choose to wait rather than kill the process.
  5. The final product will have a 1-pixel-wide edge with a constant (or very smooth) value. This is normal: the current version does not model edge effects and leaves the edge to show the trend value.
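Several of the checks above can be run before starting a long fusion job. The sketch below is a hypothetical pre-flight helper, not part of the toolbox: it assumes rasters have already been read into non-empty nested lists (`grid[row][col]`), and it cannot verify projection or datum, which still have to be checked in ArcGIS.

```python
def check_inputs(primary_series, covariable, max_side=None):
    """Return a list of problems found in the inputs (empty list means OK).

    primary_series: list of 2-D grids in chronological order (assumed layout).
    covariable:     one 2-D grid.
    max_side:       optional row/column limit, e.g. 6000 on 32-bit setups.
    """
    problems = []
    if len(primary_series) < 3:
        problems.append("need at least 3 time-series images")

    shape = (len(covariable), len(covariable[0]))
    for i, grid in enumerate(primary_series):
        grid_shape = (len(grid), len(grid[0]))
        if grid_shape != shape:
            problems.append(
                f"image {i} has shape {grid_shape}, co-variable has {shape}")

    if max_side is not None and max(shape) > max_side:
        problems.append(f"raster exceeds {max_side} rows or columns")
    return problems

def make_grid(n_rows, n_cols, value=0.0):
    """Build a constant toy grid for the example."""
    return [[value] * n_cols for _ in range(n_rows)]

good = check_inputs([make_grid(4, 5) for _ in range(13)], make_grid(4, 5))
bad = check_inputs([make_grid(4, 5), make_grid(3, 5)], make_grid(4, 5),
                   max_side=4)
print(good)
print(bad)
```

Returning a list of messages rather than raising on the first failure lets all input problems be reported in one pass.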

About

Initiate the crime prediction algorithm
