# STAT 201 Group Project: Assessing the Effectiveness of VPD Bike Theft Prevention Programs

Angela Felicia, Christin Wang, Linda Chu, Yifan Hao

### Background Information
Bike theft is a major issue in Vancouver. According to the Vancouver Police Department (VPD), over 2000 bikes are reported stolen in Vancouver every year (https://vpd.ca/crime-prevention-safety/bike-theft-protection/). In fact, Vancouver currently has the highest rate of bike theft per capita of all cities in Canada (https://www.cbc.ca/news/canada/british-columbia/vancouver-still-has-the-most-bike-thefts-per-capita-among-major-canadian-cities-despite-efforts-1.5898575).

In 2015, the VPD partnered with Project 529, an online database for bicycles, in hopes of reducing bike theft in the city. Cyclists can register their bikes in the database and display a Project 529 decal (otherwise known as a "shield") on their bikes to deter thieves and to make tracking stolen bikes easier.

The VPD stated that the introduction of the Project 529 program was effective in reducing bike theft in Vancouver. According to this article (https://vancouversun.com/news/local-news/the-state-of-bike-thefts-in-vancouver), since 2015, bike theft has dropped by over 50%.

### Our Question
We want to investigate whether or not the VPD's claims are supported by statistical inference. In particular, we want to compare the proportion of all crimes that are bike theft from a year before the introduction of Project 529 (2012) to the proportion of all crimes that are bike theft from a year after the introduction of Project 529 (2022). 

We also want to construct a confidence interval for the difference in the proportion of crimes that are bike theft between the two years, to quantify how much the proportion changed after the introduction of Project 529.
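As a rough sketch of how such an interval could be obtained, the following uses the bootstrap workflow from the infer package on made-up data. The `toy_crimes` data frame, its counts, the seed, and the 95% confidence level are all assumptions for illustration, not our actual data or results.

```r
library(dplyr)
library(infer)

set.seed(201)  # arbitrary seed, used only for reproducibility

# Hypothetical stand-in for the stacked 2012 and 2022 records (made-up counts).
toy_crimes <- tibble(
  year = factor(rep(c("2012", "2022"), times = c(200, 200))),
  bike_theft = factor(c(rep(c("yes", "no"), times = c(30, 170)),   # 2012 rows
                        rep(c("yes", "no"), times = c(20, 180))))  # 2022 rows
)

# Bootstrap distribution of the difference in proportions (2022 minus 2012).
boot_dist <- toy_crimes %>%
  specify(bike_theft ~ year, success = "yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "diff in props", order = c("2022", "2012"))

# Percentile-based 95% confidence interval for the difference.
ci_95 <- boot_dist %>%
  get_confidence_interval(level = 0.95, type = "percentile")
```

An interval that lies entirely below zero would be consistent with a decrease in the bike theft proportion; an interval that contains zero would not.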

We focus on the Central Business District neighbourhood, since the article listed in our References reports that bike theft was densest in Downtown Vancouver in 2015-2016.

### Our Dataset
The dataset we will use to investigate our question is the VPD Crime dataset. This dataset is extracted from the PRIME BC Police Records Management System (RMS) and contains information about the type of crime, date/time of crime, as well as the location of the crime. The data ranges from 2003 to 2023 and covers all neighbourhoods in Vancouver.

## Preliminary Results: Loading the dataset 


In [1]:
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




In [2]:
cbd_2012 <- read_csv("crimedata_csv_Central Business District_2012.csv")
cbd_2022 <- read_csv("crimedata_csv_Central Business District_2022.csv")

head(cbd_2012)
head(cbd_2022)

ERROR: Error: 'crimedata_csv_Central Business District_2012.csv' does not exist in current working directory ('/Users/angelafelicia/Downloads/STAT-201-Group-Project').


In [36]:
# Proportion of all reported crimes in 2012 that are bike thefts
bike_theft_2012 <- cbd_2012 %>%
    summarize(prop_bike_theft = mean(TYPE == "Theft of Bicycle")) %>%
    pull(prop_bike_theft)

# Proportion of all reported crimes in 2022 that are bike thefts
bike_theft_2022 <- cbd_2022 %>%
    summarize(prop_bike_theft = mean(TYPE == "Theft of Bicycle")) %>%
    pull(prop_bike_theft)

# Display the two sample proportions
bike_theft_2012
bike_theft_2022

## Methods

### What do you expect to find?
We expect to find that the proportion of crimes that are bike theft in the Central Business District neighbourhood is lower in 2022 than in 2012, which would be consistent with the Project 529 program being effective.

### Step 1: Sample distribution
Plot the sample distribution for the 2012 and 2022 datasets as bar charts, showing the number of reported incidents for each type of crime.
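A minimal sketch of such a plot, using a small made-up data frame in place of the real VPD files. Only the `TYPE` column name and the "Theft of Bicycle" label come from our dataset; the `YEAR` column, the other crime labels, and all counts in `toy_crimes` are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the combined 2012 and 2022 crime records.
toy_crimes <- tibble(
  YEAR = factor(rep(c("2012", "2022"), each = 6)),
  TYPE = rep(c("Theft of Bicycle", "Mischief", "Theft of Bicycle",
               "Other Theft", "Mischief", "Other Theft"), times = 2)
)

# Count the number of incidents for each crime type within each year.
type_counts <- toy_crimes %>%
  count(YEAR, TYPE, name = "n_incidents")

# Bar chart of counts per crime type, one panel per year.
# Run print(crime_type_plot) to display it.
crime_type_plot <- ggplot(type_counts, aes(x = TYPE, y = n_incidents)) +
  geom_col() +
  facet_wrap(~YEAR) +
  coord_flip() +
  labs(x = "Type of crime", y = "Number of reported incidents",
       title = "Reported crimes by type (toy data)")
```

Because crime type is categorical, a bar chart of counts is used here rather than a histogram.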

### Step 2: Calculate the observed statistics
Calculate the observed bike theft proportion for each of the original 2012 and 2022 datasets.

### Step 3: Perform hypothesis test
Set a seed first so that the results are reproducible. Using the infer package, build a null distribution by generating 1000 permuted samples from the original data and calculating the difference in bike theft proportions for each one. Use a "diff in props" hypothesis test to determine whether the bike theft proportion decreased from 2012 to 2022. Let p1 be the 2022 proportion and p2 be the 2012 proportion.
H0: p1 - p2 = 0, Ha: p1 - p2 < 0 (left-tailed test)
Compute the p-value from the null distribution.
Visualise the null distribution and shade the left-tail region that corresponds to the p-value.
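The steps above could be sketched with the infer package as follows. This is an illustration on made-up data, not our final analysis: the `toy_crimes` data frame, its counts, and the seed value are assumptions, and the real analysis would first stack the 2012 and 2022 datasets and derive a yes/no bike theft column from `TYPE`.

```r
library(dplyr)
library(infer)

set.seed(201)  # arbitrary seed, used only for reproducibility

# Hypothetical stand-in for the stacked 2012 and 2022 records (made-up counts).
toy_crimes <- tibble(
  year = factor(rep(c("2012", "2022"), times = c(200, 200))),
  bike_theft = factor(c(rep(c("yes", "no"), times = c(30, 170)),   # 2012 rows
                        rep(c("yes", "no"), times = c(20, 180))))  # 2022 rows
)

# Observed difference in proportions: p1 (2022) minus p2 (2012).
obs_diff <- toy_crimes %>%
  specify(bike_theft ~ year, success = "yes") %>%
  calculate(stat = "diff in props", order = c("2022", "2012"))

# Null distribution: shuffle the year labels 1000 times under H0.
null_dist <- toy_crimes %>%
  specify(bike_theft ~ year, success = "yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("2022", "2012"))

# Left-tailed p-value for Ha: p1 - p2 < 0.
p_value <- null_dist %>%
  get_p_value(obs_stat = obs_diff, direction = "less")

# Null distribution with the left tail shaded.
# Run print(null_plot) to display it.
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_diff, direction = "less")
```

Putting 2022 first in `order` makes the statistic match p1 - p2 as defined in the hypotheses, so `direction = "less"` corresponds to the left-tailed alternative.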

### Step 4: Interpret results
Set a significance level to determine the outcome of the test. Use several significance levels and compare the results.
If the p-value is less than or equal to the significance level, reject H0 (with the risk of a Type I error). This would suggest that the proportion of bike thefts was significantly lower in 2022 than in 2012, which is consistent with Project 529 being effective in reducing bike theft in the Central Business District.
If the p-value is greater than the significance level, do not reject H0 (with the risk of a Type II error). This indicates that there is not enough evidence to conclude that the proportion decreased; it does not show that Project 529 is ineffective in reducing bike theft in the Central Business District.
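The decision rule above can be applied at several significance levels at once. In this small sketch, the p-value of 0.03 and the three levels are placeholder values chosen for illustration, not results from our data.

```r
# Hypothetical p-value, standing in for the value from the permutation test.
p_value <- 0.03

# Significance levels to compare (common choices, picked for illustration).
alphas <- c(0.01, 0.05, 0.10)

# Reject H0 whenever the p-value is less than or equal to the level.
decisions <- data.frame(
  alpha = alphas,
  reject_H0 = p_value <= alphas
)
decisions
```

With these placeholder values, H0 would be rejected at the 0.05 and 0.10 levels but not at 0.01, which shows why the significance level should be chosen before looking at the p-value.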


## Discussion

### What impact could such findings have?
Note that our analysis cannot establish that Project 529 directly caused any change in bike theft, since other factors are in play as well.

Our findings would indicate whether the data support the claim that the project was effective in reducing bike theft. If the program appears to be effective, this may justify allocating current and potentially additional resources and funding to it, as well as establishing similar programs in other neighbourhoods where bike theft is common. If we find no significant change, this may indicate that a better strategy is needed to reduce bike theft, and the government could adjust its policies accordingly. The findings could also raise public awareness of bike theft and encourage higher participation in the program, which could reduce bike theft further. A decrease in crime rates would improve the overall quality of life for residents.

### What future questions could this lead to?
Since we only looked at the Central Business District, our inference would apply only to that neighbourhood. We could therefore examine whether the project was effective in other neighbourhoods in Vancouver.
Further research could examine the cost-effectiveness of Project 529. Is the reduction in bike theft worth the investment in the program? Are there more cost-efficient alternatives? If the project is successful, what are some ways to promote it in other cities and neighbourhoods?
Could we conduct similar hypothesis tests on other types of crime in the dataset to help improve policing methods? In addition, if Project 529 is found to be successful, it may be useful to evaluate other crime prevention programs.


## References 

Dataset source: https://geodash.vpd.ca/opendata 

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6594600/
This article investigates the spatial resolution of crime patterns using the same database. One of the four crimes it investigates is theft of bicycle (TOB), chosen because it is voluminous (at least 2000 events per year; it is unclear whether this figure refers only to Vancouver). The authors note that bike theft happens mostly around transport hubs and places of employment, where bikes are more available and accessible. Their plot of 2015-2016 data also shows that crime density was highest in Downtown Vancouver. (So Hastings-Sunrise has bike theft, but not as much as Downtown.)
