# Project Proposal
---

## 1.0 Introduction

The true cost of crime encompasses a wide range of factors that extend beyond the immediate impact of the crime itself. Gabor (2015) identifies four broad categories that contribute to the aggregate cost of crime:
1. Victim costs - direct economic losses to crime victims, including property loss and damage, lost wages, and medical costs related to injuries;
2. Criminal justice costs - expenditures for law enforcement, the courts, and correctional facilities, programs, and services;
3. Opportunity costs - costs incurred when an individual chooses to participate in illegal activities as opposed to the legitimate marketplace;
4. Intangible costs - costs related to pain and suffering of crime victims and a diminished quality of life.

While the true cost of crime is difficult to quantify, and Canadian data do not currently permit an annual assessment of the cost, a 2014 study published by the Fraser Institute (Easton et al., 2014) estimates that crime costs Canadians over $85 billion annually through victimization and the expense of catching and punishing offenders. Crime and its associated costs clearly pose a significant burden to society. More accurate information on crime could guide our legal, political, and cultural stance toward crime and allow informed prioritization and development of programs that curtail criminal activity (Anderson, 1999).

The Vancouver Police Department (VPD) crime data set, publicly available from the [VPD open data portal](https://geodash.vpd.ca/opendata/), contains 10 variables describing 854,615 reported crime incidents in the City of Vancouver between 2003 and 2022, as follows:

|Variable | Description |
|---------| ----------- |
| TYPE    | Type of crime activity (11 unique types)             |
| YEAR    | A four-digit field that indicates the year when the reported crime activity occurred |
| MONTH   | A numeric field that indicates the month when the reported crime activity occurred |
| DAY     | A two-digit field that indicates the day of the month when the reported crime activity occurred |
| HOUR    | A two-digit field that indicates the hour (in 24-hour format) when the reported crime activity occurred |
| MINUTE  | A two-digit field that indicates the minute when the reported crime activity occurred |
| HUNDRED_BLOCK | Generalized location of the reported crime activity |
| NEIGHBOURHOOD | Vancouver neighbourhood location of reported crime activity (24 unique neighbourhoods) |
| X             | X-coordinate location of reported crime activity (UTM Zone 10) |
| Y             | Y-coordinate location of reported crime activity (UTM Zone 10) |
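
Because the date and time of each incident are stored in five separate fields, any temporal analysis first needs them combined into a single timestamp. The following is a minimal sketch in base R using one record from the data set. The names `incident` and `timestamp` are illustrative, and the time zone is an assumption: the variable descriptions above do not state one.

```r
# One record from the VPD data set (date and time columns only)
incident <- data.frame(YEAR = 2018, MONTH = 3, DAY = 22, HOUR = 9, MINUTE = 0)

# Combine the separate fields into a single POSIXct timestamp.
# ASSUMPTION: times are local to Vancouver; the source does not state a time zone.
incident$timestamp <- ISOdatetime(
  year = incident$YEAR, month = incident$MONTH, day = incident$DAY,
  hour = incident$HOUR, min = incident$MINUTE, sec = 0,
  tz = "America/Vancouver"
)

format(incident$timestamp, "%Y-%m-%d %H:%M")
# "2018-03-22 09:00"
```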

We consider the VPD crime data set to represent a sample of all crime incidents that occurred in Vancouver between 2003 and 2022. This project aims to answer a question of the form: can we use the VPD data set to determine whether certain Vancouver neighbourhoods experience a higher annual number of crime incidents than others?

To focus our analysis, we will constrain the crime type to *theft from vehicle*, which accounts for the highest proportion (27.9%) of reported crime incidents in the data set. Furthermore, a reasonable hypothesis is that annual incidents of theft from vehicle are higher in poorer neighbourhoods. The 2016 median household incomes for the Strathcona and Kitsilano neighbourhoods, available on the [Canada Mortgage and Housing Corporation (CMHC) website](https://www03.cmhc-schl.gc.ca/hmip-pimh/en/TableMapChart/TableMatchingCriteria?GeographyType=MetropolitanMajorArea&GeographyId=2410&CategoryLevel1=Population%2C%20Households%20and%20Housing%20Stock&CategoryLevel2=Household%20Income&ColumnField=HouseholdIncomeRange&RowField=Neighbourhood&SearchTags%5B0%5D.Key=Households&SearchTags%5B0%5D.Value=Number&SearchTags%5B1%5D.Key=Statistics&SearchTags%5B1%5D.Value=AverageAndMedian), are reported to be approximately \\$23,000 and \\$70,000, respectively. We therefore refine our objective to answer the specific question: is the annual number of theft-from-vehicle incidents greater in Strathcona than in Kitsilano?
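
One possible way to formalize this question, offered as a sketch and not as the final test specification, is a one-sided comparison of two population parameters. The symbols $\mu_S$ and $\mu_K$ are introduced here for illustration and denote the mean annual number of theft-from-vehicle incidents in Strathcona and Kitsilano. Using the mean, and not, for example, the median, is an assumption.

$$
H_0: \mu_S = \mu_K \qquad \text{vs.} \qquad H_1: \mu_S > \mu_K
$$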






        

In [1]:
# load libraries
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [2]:
# read data set from GitHub
crime_data_raw <- read_csv("https://github.com/jburden1/STAT201_Project_Group14/raw/main/vpd_crime_data/crimedata_csv_AllNeighbourhoods_AllYears.csv")
head(crime_data_raw)

Rows: 854615 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): TYPE, HUNDRED_BLOCK, NEIGHBOURHOOD
dbl (7): YEAR, MONTH, DAY, HOUR, MINUTE, X, Y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| TYPE | YEAR | MONTH | DAY | HOUR | MINUTE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y |
|------|------|-------|-----|------|--------|---------------|---------------|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| Theft from Vehicle | 2018 | 3 | 22 | 9 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495329.0 | 5459026 |
| Theft from Vehicle | 2004 | 4 | 6 | 7 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495341.2 | 5459026 |
| Theft from Vehicle | 2003 | 2 | 24 | 0 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495354.4 | 5459026 |
| Theft from Vehicle | 2019 | 8 | 19 | 16 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495354.4 | 5459026 |
| Theft from Vehicle | 2018 | 11 | 8 | 1 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495356.6 | 5459026 |
| Theft from Vehicle | 2005 | 10 | 15 | 12 | 0 | 19XX TRIUMPH ST | Grandview-Woodland | 495357.0 | 5459017 |
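
The column-specification message printed when the file is read can be avoided by declaring the column types explicitly, which also documents the expected schema. The following is a minimal sketch, assuming readr 2.x (which accepts literal text wrapped in `I()`). The names `sample_csv` and `crime_sample` are illustrative, and the two rows are copied from the preview above. Reading the date and time fields as integers, where the original call read them as doubles, is a choice made for this example and not something the original call does.

```r
library(readr)

# Two rows copied from the data preview, supplied as literal text via I()
sample_csv <- I(paste(
  "TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y",
  "Theft from Vehicle,2018,3,22,9,0,19XX TRIUMPH ST,Grandview-Woodland,495329.0,5459026",
  "Theft from Vehicle,2004,4,6,7,0,19XX TRIUMPH ST,Grandview-Woodland,495341.2,5459026",
  sep = "\n"
))

# Declaring col_types up front suppresses the column-specification message
crime_sample <- read_csv(
  sample_csv,
  col_types = cols(
    TYPE = col_character(),
    YEAR = col_integer(),
    MONTH = col_integer(),
    DAY = col_integer(),
    HOUR = col_integer(),
    MINUTE = col_integer(),
    HUNDRED_BLOCK = col_character(),
    NEIGHBOURHOOD = col_character(),
    X = col_double(),
    Y = col_double()
  )
)

crime_sample
```

The same `col_types` argument could be passed to the `read_csv()` call that loads the full data set from GitHub.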


In [18]:
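# total number of records, and the distinct crime types and neighbourhoods
nrow(crime_data_raw)
unique(crime_data_raw$TYPE)
unique(crime_data_raw$NEIGHBOURHOOD)

# count and proportion of incidents by crime type, most frequent first
# (named type_summary to avoid masking base::summary)
type_summary <- crime_data_raw |> 
    group_by(TYPE) |> 
    summarize(n = n(), prop = n / nrow(crime_data_raw)) |>
    arrange(desc(n))

# annual theft-from-vehicle counts for Strathcona and Kitsilano,
# excluding 2023, which falls outside the 2003-2022 study period
theft_from_auto <- crime_data_raw |> 
    select(TYPE, YEAR, NEIGHBOURHOOD) |> 
    filter(TYPE == "Theft from Vehicle", NEIGHBOURHOOD %in% c("Strathcona", "Kitsilano"), YEAR != 2023) |>
    group_by(YEAR, NEIGHBOURHOOD) |>
    summarize(n = n()) |>
    arrange(desc(NEIGHBOURHOOD))

theft_from_auto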

[1m[22m`summarise()` has grouped output by 'YEAR'. You can override using the
`.groups` argument.


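| YEAR | NEIGHBOURHOOD | n |
|------|---------------|---|
| `<dbl>` | `<chr>` | `<int>` |
| 2003 | Strathcona | 1155 |
| 2004 | Strathcona | 1066 |
| 2005 | Strathcona | 977 |
| 2006 | Strathcona | 697 |
| 2007 | Strathcona | 627 |
| 2008 | Strathcona | 522 |
| 2009 | Strathcona | 501 |
| 2010 | Strathcona | 407 |
| 2011 | Strathcona | 393 |
| 2012 | Strathcona | 378 |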
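
As a rough illustration of the point estimates that could be derived from these counts, the sketch below summarizes only the ten Strathcona values displayed above (2003 to 2012), not the full 2003 to 2022 series. The names `strathcona`, `mean_n`, `median_n`, and `pct_change` are illustrative.

```r
# Strathcona theft-from-vehicle counts copied from the output above (2003-2012 only)
strathcona <- data.frame(
  YEAR = 2003:2012,
  n = c(1155, 1066, 977, 697, 627, 522, 501, 407, 393, 378)
)

# Point estimates for the displayed years
mean_n <- mean(strathcona$n)      # 672.3
median_n <- median(strathcona$n)  # 574.5

# Percent change from the first to the last displayed year
first_n <- strathcona$n[1]
last_n <- strathcona$n[nrow(strathcona)]
pct_change <- 100 * (last_n - first_n) / first_n

round(pct_change, 1)
# -67.3
```

The sharp decline over the displayed years suggests that any comparison between neighbourhoods may need to account for the time trend and not treat the annual counts as exchangeable.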


## 2.0 Preliminary Results

## 3.0 Methods: Plan

## 4.0 References
Anderson, D. A. (1999). The aggregate burden of crime. The Journal of Law & Economics, 42(2), 611–642.

Easton, S., Brantingham, P., & Furness, H. (2014). The cost of crime in Canada. Canada: Fraser Institute.

Gabor, T. (2015). Costs of crime and criminal justice responses. Ottawa, ON: Public Safety Canada.


