    INTRODUCTION

This Exploratory Data Analysis (EDA) examines crime data for Vancouver from 2003 to 2023. Understanding crime patterns is crucial for informing public safety measures, shaping law enforcement strategies, and supporting effective urban planning.

    Importing Libraries

In [465]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import calendar
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)


     Data Processing and Transformation

In [466]:
df = pd.read_csv("/Users/princegill/Documents/VSCode/Practice3/Data/data.csv")

In [467]:
df.head()

Unnamed: 0,TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y
0,Break and Enter Commercial,2012,12,14,8,52,,Oakridge,491285.0,5453433.0
1,Break and Enter Commercial,2019,3,7,2,6,10XX SITKA SQ,Fairview,490612.9648,5457110.0
2,Break and Enter Commercial,2019,8,27,4,12,10XX ALBERNI ST,West End,491004.8164,5459177.0
3,Break and Enter Commercial,2021,4,26,4,44,10XX ALBERNI ST,West End,491007.7798,5459174.0
4,Break and Enter Commercial,2014,8,8,5,13,10XX ALBERNI ST,West End,491015.9434,5459166.0


In [468]:
df.shape

(881242, 10)

     Data Duplication

In [469]:
df.duplicated().sum()


32211

* A total of 32,211 rows are exact duplicates of an earlier row, as counted with pandas' duplicated() method.

In [470]:
df1 = df.drop_duplicates()

* Having quantified the duplication, we removed the redundant rows from the dataset.
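Before dropping the duplicates, it may be worth checking which crime types they come from. The data documentation (see the legal disclaimer at the end) notes that locations for Offences Against a Person are randomized and offset to an intersection, so rows that look identical may not always be the same incident. The sketch below uses a hypothetical helper, duplicates_by_type, and a made-up toy frame to count the repeated rows per TYPE; on the real data it would be called as duplicates_by_type(df).

```python
import pandas as pd


def duplicates_by_type(df: pd.DataFrame) -> pd.Series:
    """Count duplicate rows per crime TYPE.

    duplicated() with its default keep="first" flags every repeat after the
    first copy, so the Series total equals df.duplicated().sum().
    """
    dup_mask = df.duplicated()
    return df.loc[dup_mask, "TYPE"].value_counts()


# Toy frame with made-up rows: rows 0/1 and rows 2/3 are exact copies.
toy = pd.DataFrame({
    "TYPE": ["Mischief", "Mischief", "Other Theft", "Other Theft", "Other Theft"],
    "YEAR": [2019, 2019, 2020, 2020, 2021],
    "NEIGHBOURHOOD": ["West End", "West End", "Fairview", "Fairview", "Fairview"],
})
print(duplicates_by_type(toy))
```

If most duplicates turn out to be Offence Against a Person rows, dropping them deserves a second look.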

     Missing Data

In [471]:
df1.isnull().sum()

TYPE               0
YEAR               0
MONTH              0
DAY                0
HOUR               0
MINUTE             0
HUNDRED_BLOCK     12
NEIGHBOURHOOD    142
X                 75
Y                 75
dtype: int64

* Rows with missing values are few relative to the roughly 849,000 rows remaining after deduplication (at most 142 nulls in any single column), so removing them is a reasonable choice.
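The per-column totals above can overlap: for example, the 75 missing X values and 75 missing Y values may well belong to the same rows, so adding the columns up overstates how many rows are affected. A small sketch (missing_row_summary is a hypothetical helper, the toy values are made up) counts rows with at least one null, which is exactly what dropna() removes:

```python
import pandas as pd


def missing_row_summary(df: pd.DataFrame) -> dict:
    """Compare the sum of per-column null counts with rows that have any null."""
    per_column_total = int(df.isnull().sum().sum())
    rows_with_any_null = int(df.isnull().any(axis=1).sum())
    return {
        "per_column_total": per_column_total,
        "rows_with_any_null": rows_with_any_null,
        "share_of_rows": rows_with_any_null / len(df),
    }


# Toy frame: row 1 misses three values, row 2 misses two.
toy = pd.DataFrame({
    "NEIGHBOURHOOD": ["West End", None, "Fairview", "Oakridge"],
    "X": [491004.8, None, None, 490612.9],
    "Y": [5459177.0, None, None, 5457110.0],
})
print(missing_row_summary(toy))
```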

In [472]:
data = df1.dropna()

In [473]:
data.head()

Unnamed: 0,TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y
1,Break and Enter Commercial,2019,3,7,2,6,10XX SITKA SQ,Fairview,490612.9648,5457110.0
2,Break and Enter Commercial,2019,8,27,4,12,10XX ALBERNI ST,West End,491004.8164,5459177.0
3,Break and Enter Commercial,2021,4,26,4,44,10XX ALBERNI ST,West End,491007.7798,5459174.0
4,Break and Enter Commercial,2014,8,8,5,13,10XX ALBERNI ST,West End,491015.9434,5459166.0
5,Break and Enter Commercial,2020,7,28,19,12,10XX ALBERNI ST,West End,491015.9434,5459166.0


* 'data' is the cleaned DataFrame used for all subsequent analysis, visualizations, and pivot tables.

In [474]:
data.TYPE.unique()

array(['Break and Enter Commercial', 'Break and Enter Residential/Other',
       'Homicide', 'Mischief', 'Offence Against a Person', 'Other Theft',
       'Theft from Vehicle', 'Theft of Bicycle', 'Theft of Vehicle',
       'Vehicle Collision or Pedestrian Struck (with Fatality)',
       'Vehicle Collision or Pedestrian Struck (with Injury)'],
      dtype=object)


     Exploratory Data Analysis

        Crime Count Visualization on Yearly Basis

In [475]:

yearly_data = data.groupby(["YEAR"]).size().reset_index(name = "count")
fig = px.line(yearly_data, x = "YEAR" , y = "count")
fig.update_layout(
    title="Yearly Crime Count",
    xaxis_title="Year",
    yaxis_title="Crime Count",
    font=dict(family="Arial", size=12, color="black"),
)

fig.show()

The temporal analysis of crime counts in Vancouver from 2003 to 2023 reveals two distinct phases in the trajectory of crime over the period.

1. Initial High Count and Subsequent Decline (2003-2011):

      Observation: Crime counts were highest at the start of the analyzed period, around 2003, and then declined consistently and substantially until around 2011.

      Interpretation: This decline could reflect a variety of factors, including changes in law enforcement strategies, community-based initiatives, or economic and demographic shifts. Identifying the specific causes would offer valuable insights into effective crime mitigation.

2. Resurgence After 2011, with Peaks and Valleys:

      Observation: From 2011 onwards, crime counts rise again and fluctuate, peaking most prominently in 2019 before a sudden, pronounced dip in 2021.

      Interpretation: The post-2011 resurgence suggests changes in the factors that drive crime rates. The sharp 2019 peak and the 2021 trough warrant closer examination of the specific events or policy interventions that could explain these abrupt shifts.
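The text does not state whether the first and last years (2003 and 2023) are covered in full. If the data were extracted partway through a year, that year's total would be artificially low, which matters when reading the endpoints of the yearly line. A minimal check, sketched with hypothetical helper names and a toy frame, lists years whose records do not span all twelve months:

```python
import pandas as pd


def months_covered(df: pd.DataFrame) -> pd.Series:
    """Number of distinct months with at least one record, per YEAR."""
    return df.groupby("YEAR")["MONTH"].nunique()


def incomplete_years(df: pd.DataFrame) -> list:
    """Years whose records do not span all twelve months."""
    coverage = months_covered(df)
    return coverage[coverage < 12].index.tolist()


# Toy data: 2021 and 2022 are complete, 2023 stops after March.
toy = pd.DataFrame({
    "YEAR": [2021] * 12 + [2022] * 12 + [2023] * 3,
    "MONTH": list(range(1, 13)) * 2 + [1, 2, 3],
})
print(incomplete_years(toy))
```

On the real data this would be incomplete_years(data). A year can still be cut off partway through its final month, so checking the latest DAY recorded in that month is a sensible follow-up.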

     Crime Count Visualization on Monthly Basis

In [476]:
monthly_data = data.groupby(["MONTH"]).size().reset_index(name = "count")
monthly_data['MONTH'] = monthly_data['MONTH'].apply(lambda x: calendar.month_abbr[x])
fig = px.line(monthly_data, x = "MONTH" , y = "count")

fig.update_layout(
    title="Monthly Crime Count",
    xaxis_title="Month",
    yaxis_title="Crime Count",
    font=dict(family="Arial", size=12, color="black"),
)

fig.show()

Monthly Crime Count Dynamics: A Quick Overview

1. February Dip:

   Observation: Crime counts dip noticeably in February.

   Interpretation: February is also the shortest month (28 or 29 days, versus 30 or 31 for the others), so its aggregated count is lower even if the daily rate is unchanged. Seasonal factors, reduced outdoor activity, or specific law enforcement initiatives may also contribute.

2. August Peak and Local Maxima in March, May, and October:

   Observation: August has the highest crime count of any month, with local peaks in March, May, and October.

   Interpretation: Seasonal influences, community events, or socioeconomic factors may contribute to increased criminal activity in these months. Month length may also play a part, since all four of these months have 31 days.

3. December Decline and January Resurgence:

   Observation: Crime counts drop sharply in December, followed by a local maximum in January.

   Interpretation: Because December also has 31 days, month length does not explain this dip. Holiday-related factors and law enforcement dynamics during the festive season could lead to the December decline, while the January rise may reflect post-holiday activity.
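The monthly chart sums raw counts, so a 31-day month has roughly 11% more days than a 28-day February (31/28 ≈ 1.11). Dividing each month's total by the number of days it actually contains gives a per-day rate that separates month length from genuine seasonal effects. A sketch, with monthly_rate_per_day as a hypothetical helper and made-up toy data:

```python
import calendar

import pandas as pd


def monthly_rate_per_day(df: pd.DataFrame) -> pd.Series:
    """Average number of records per calendar day for each month, pooled over years.

    Assumes every (YEAR, MONTH) pair in the study period has at least one
    record; a month with no records would be missing from the day totals.
    """
    counts = df.groupby(["YEAR", "MONTH"]).size().reset_index(name="count")
    # Days in each specific (year, month), so leap-year Februaries get 29 days.
    counts["days"] = [
        calendar.monthrange(int(year), int(month))[1]
        for year, month in zip(counts["YEAR"], counts["MONTH"])
    ]
    totals = counts.groupby("MONTH")[["count", "days"]].sum()
    rate = totals["count"] / totals["days"]
    rate.index = [calendar.month_abbr[int(m)] for m in rate.index]
    return rate


# Toy data: one record per day in January 2021 (31) and February 2021 (28).
toy = pd.DataFrame({"YEAR": [2021] * 59, "MONTH": [1] * 31 + [2] * 28})
print(monthly_rate_per_day(toy))
```

In the toy data the raw counts differ (31 versus 28) while the per-day rates are identical. Plotting monthly_rate_per_day(data) in place of the raw counts would show whether the February dip and the 31-day-month peaks persist.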


    Crime Count Visualization by Day of the Month

In [477]:
daily_data = data.groupby(["DAY"]).size().reset_index(name = "count")
fig = px.line(daily_data, x = "DAY" , y = "count")

fig.update_layout(
    title="Daily Crime Count",
    xaxis_title="Day of the Month",
    yaxis_title="Crime Count",
    font=dict(family="Arial", size=12, color="black"),
)

fig.show()

The day-of-month analysis of crime in Vancouver from 2003 to 2023 shows only subtle variation. No strong pattern is evident across the month, but a dip at the end of the month and a surge at the beginning stand out.

1. Lack of Clear Day-of-Month Pattern:

    Observation: For most of the month, the number of crime incidents varies little from one day to the next.

    Interpretation: Criminal activity does not appear to depend strongly on the date within the month; other temporal factors or situational contexts are likely more influential.

2. End-of-Month Dip and Start-of-Month Surge:

    Observation: A notable dip in crime incidents appears towards the end of the month, contrasted by a surge at the beginning.

    Interpretation: Much of the end-of-month dip is expected from the calendar alone: day 31 occurs in only 7 months of the year, day 30 in 11, and day 29 in 11 (12 in leap years), while days 1 to 28 occur in every month, so the later days accumulate fewer incidents in total. The start-of-month surge is not explained this way and may relate to financial cycles, law enforcement presence, or other monthly rhythms in community life.
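Part of any end-of-month dip is a calendar effect, and it can be quantified. Over 2003 to 2023 (21 years), each day number from 1 to 28 occurs 252 times (12 months × 21 years), day 29 occurs 236 times, day 30 occurs 231 times, and day 31 only 147 times. Dividing the day-of-month counts by these occurrences gives a per-date rate. A sketch with hypothetical helper names, assuming every month from 2003 through 2023 is fully covered (if the final year is partial, last_year should be adjusted):

```python
import calendar
from collections import Counter

import pandas as pd


def day_of_month_occurrences(first_year: int, last_year: int) -> pd.Series:
    """How many calendar dates carry each day number (1-31) from first_year to last_year."""
    occurrences = Counter()
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            days_in_month = calendar.monthrange(year, month)[1]
            for day in range(1, days_in_month + 1):
                occurrences[day] += 1
    return pd.Series(occurrences).sort_index()


def per_occurrence_rate(day_counts: pd.Series, first_year: int, last_year: int) -> pd.Series:
    """Divide crime counts indexed by DAY by how often each day number occurs."""
    occurrences = day_of_month_occurrences(first_year, last_year)
    return day_counts / occurrences.reindex(day_counts.index)


occ = day_of_month_occurrences(2003, 2023)
print(occ.loc[[1, 28, 29, 30, 31]].tolist())  # [252, 252, 236, 231, 147]
```

On the real data, per_occurrence_rate(daily_data.set_index("DAY")["count"], 2003, 2023) would give average incidents per date for each day number.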

    Crime Count Visualization over Time of the Day

In [478]:
hourly_data = data.groupby(["HOUR"]).size().reset_index(name="count")
fig = px.line(hourly_data, x="HOUR", y="count")
fig.update_xaxes(ticktext=["12 AM", "1 AM", "2 AM", "3 AM", "4 AM", "5 AM", "6 AM", "7 AM", "8 AM", "9 AM", "10 AM", "11 AM",
                           "12 PM", "1 PM", "2 PM", "3 PM", "4 PM", "5 PM", "6 PM", "7 PM", "8 PM", "9 PM", "10 PM", "11 PM"],
                 tickvals=list(range(24)))

fig.update_layout(
    title="Hourly Crime Count",
    yaxis_title="Crime Count",
    xaxis_title="Hour of the Day",
    font=dict(family="Arial", size=12, color="black"),
)


fig.show()

The hourly analysis of crime count in Vancouver from 2003 to 2023 exposes a distinctive daily rhythm, with notable patterns observed throughout the 24-hour cycle.

1. Midnight Peak at 12 AM:

     Observation: Crime counts peak sharply in the midnight hour (12 AM to 12:59 AM).

     Interpretation: The surge in this hour may be influenced by reduced visibility, a lower law enforcement presence, or greater opportunities for illicit activity under the cover of darkness.

2. Quiet Period from 1 AM to 6 AM:

     Observation: Between 1 AM and 6 AM, crime incidents fall to a relative lull.

     Interpretation: The reduced activity during these hours may reflect fewer people and vehicles on the streets, and therefore fewer opportunities for crimes such as theft.

3. Slight Uptick as the Day Begins:

     Observation: Crime counts rise slightly as the day begins and remain at a relatively consistent level throughout the day.

     Interpretation: This increase could reflect more opportunities, higher population density, or changes in routine activities that create more favorable conditions for certain crimes.
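In some records systems, an incident whose exact time is unknown is stored with a default time such as 00:00. The source does not say whether that happens in this dataset, so it is a hypothesis to test rather than an explanation of the midnight peak. If it did happen, records in the midnight hour would be unusually concentrated at MINUTE 0 compared with other hours. A sketch (minute_zero_share is a hypothetical helper; the toy values are made up):

```python
import pandas as pd


def minute_zero_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of records in each HOUR whose MINUTE is exactly 0."""
    is_minute_zero = df["MINUTE"] == 0
    return is_minute_zero.groupby(df["HOUR"]).mean()


toy = pd.DataFrame({
    "HOUR":   [0, 0, 0, 0, 13, 13, 13, 13],
    "MINUTE": [0, 0, 0, 45, 0, 10, 20, 30],
})
print(minute_zero_share(toy))
```

Because people also tend to round reported times to the hour, the useful comparison is HOUR 0 against the other hours, not against a uniform 1-in-60 baseline.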


       Visualization of the Distribution of Crime Types

In [479]:
value_count = data["TYPE"].value_counts()
fig = px.bar(x=value_count.index, y=value_count.values)

fig.update_layout(
    title="Distribution of Crime Types",
    xaxis_title="Crime Type",
    yaxis_title="Count",
    font=dict(family="Arial", size=12, color="black"),
)


fig.show()

The analysis of crime types in Vancouver from 2003 to 2023 shows a wide range in both severity and prevalence.

1. Most Prevalent: Theft from Vehicle and Other Theft:

   Observation: Theft from Vehicle and Other Theft are the most frequently reported crime types over the period.

   Interpretation: The prevalence of these crimes may be attributed to opportunistic behavior, economic motives, or insufficient security measures. Addressing them requires a multifaceted approach involving community engagement and targeted preventive measures.

2. Least Prevalent: Homicide and Vehicle Collisions with Fatality:

   Observation: Homicide and Vehicle Collision or Pedestrian Struck (with Fatality) are the least frequently reported categories.

   Interpretation: The infrequency of these severe incidents may reflect their inherent rarity or the effectiveness of law enforcement and traffic safety measures. Sustaining this low incidence likely depends on continued efforts in crime prevention, traffic safety, and community education.
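Raw bar heights make it hard to say how dominant the top categories are. Expressing each type as a percentage of all records gives a compact summary; a sketch with a hypothetical type_shares helper and toy data:

```python
import pandas as pd


def type_shares(df: pd.DataFrame) -> pd.Series:
    """Percentage of all records belonging to each TYPE, rounded to one decimal."""
    return (df["TYPE"].value_counts(normalize=True) * 100).round(1)


toy = pd.DataFrame({"TYPE": ["Theft from Vehicle"] * 3 + ["Homicide"]})
print(type_shares(toy))
```

On the real data, type_shares(data) would give each category's share of the cleaned dataset.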

     Crime Count Visualization by Neighbourhood

In [480]:
value_count = data["NEIGHBOURHOOD"].value_counts()
fig = px.bar(x=value_count.index, y=value_count.values)

fig.update_layout(
    title="Crime Count by Neighbourhood",
    xaxis_title="Neighbourhood",
    yaxis_title="Crime Count",
    font=dict(family="Arial", size=12, color="black"),
)

fig.show()

The neighbourhood-level analysis of crime in Vancouver from 2003 to 2023 shows that certain areas are hotspots for criminal activity while others record far fewer incidents. These are raw counts, not rates adjusted for population or area.

1. High Crime Counts in the Central Business District and West End:

     Observation: The Central Business District and West End record the highest crime counts, marking them as hotspots for criminal activity.

     Interpretation: Contributing factors may include higher population density, more commercial activity, and amenities that attract both residents and non-residents.

2. Low Crime Counts in Stanley Park and Musqueam:

     Observation: Stanley Park and Musqueam record the lowest crime counts.

     Interpretation: These low counts may reflect small resident populations, well-maintained public spaces, or community-oriented initiatives. Because the figures are not adjusted for population, they do not necessarily mean these areas are safer per person.
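The neighbourhood chart ranks raw counts, which favour large or busy areas. Comparing areas fairly would need a denominator such as resident population, which is not part of this dataset and would have to come from another source (for example, census tables). The sketch below shows the arithmetic with crimes_per_1000, a hypothetical helper; the area names and numbers are placeholders, not real Vancouver figures.

```python
import pandas as pd


def crimes_per_1000(counts: pd.Series, population: pd.Series) -> pd.Series:
    """Crime count per 1,000 residents, highest first.

    Areas with no population figure get NaN and sort to the end.
    """
    aligned_population = population.reindex(counts.index)
    return (counts * 1000 / aligned_population).sort_values(ascending=False)


# Placeholder values for illustration only (not real counts or populations).
counts = pd.Series({"Area A": 500, "Area B": 200, "Area C": 50})
population = pd.Series({"Area A": 50_000, "Area B": 10_000})
print(crimes_per_1000(counts, population))
```

Here Area B ranks above Area A despite a lower raw count. Even per-resident rates can overstate risk in areas like the Central Business District, where many visitors and workers are not residents.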

     Distribution of Crime Types over the Years

In [481]:
data_1 = data.groupby(["TYPE", "YEAR"]).size().reset_index(name = "count")

In [482]:
fig = px.line(data_1 , x = "YEAR" , y = "count" , color = "TYPE")
fig.update_layout(width=1000, height=500)
fig.show()

The temporal breakdown of crime categories in Vancouver from 2003 to 2023 highlights distinct trends in theft-related crimes, with notable fluctuations in the Theft from Vehicle and Other Theft categories.

1. Dominance of Theft from Vehicle:

     Observation: Theft from Vehicle is the most prevalent crime category in most years.

     Interpretation: The persistently high occurrence of theft from vehicles suggests enduring challenges in addressing this type of crime, calling for targeted preventive measures and community engagement initiatives.

2. Exceptional Period (2009-2015): Dip in Theft from Vehicle, Surge in Other Theft:

     Observation: Between 2009 and 2015, Other Theft surpasses Theft from Vehicle because of a dip in the latter category.

     Interpretation: This temporary shift may be influenced by specific events, law enforcement initiatives, or changes in criminal behavior during the period. Understanding what drove it could inform strategies for sustained crime prevention.

3. Steady Trends in Other Crime Categories, Minor Increase in Mischief after 2015:

     Observation: Other crime categories follow relatively steady trends, with a minor increase in Mischief after 2015.

     Interpretation: The overall stability in other categories may suggest a consistent enforcement landscape, while the uptick in Mischief warrants further investigation into the factors behind it.
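To pin down exactly which years Other Theft overtakes Theft from Vehicle, rather than reading it off the line chart, a year-by-type cross-tabulation can report the leading category per year. A sketch with hypothetical helper names and toy data:

```python
import pandas as pd


def yearly_type_share(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each TYPE within each YEAR (each row sums to 1)."""
    return pd.crosstab(df["YEAR"], df["TYPE"], normalize="index")


def dominant_type_by_year(df: pd.DataFrame) -> pd.Series:
    """Most frequent TYPE in each YEAR."""
    return pd.crosstab(df["YEAR"], df["TYPE"]).idxmax(axis=1)


# Toy data: Theft from Vehicle leads in 2008, Other Theft in 2010.
toy = pd.DataFrame({
    "YEAR": [2008] * 4 + [2010] * 3,
    "TYPE": ["Theft from Vehicle"] * 3 + ["Other Theft"]
            + ["Theft from Vehicle"] + ["Other Theft"] * 2,
})
print(dominant_type_by_year(toy))
```

On the real data, dominant_type_by_year(data) lists the leading type for each year, and yearly_type_share(data) shows each category's share, which helps separate a shift in the crime mix from a change in the overall total.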

      Crime Count Heatmap by Day of Week and Hour

In [483]:
data_weekly = data.copy()
data_weekly["Date"] = pd.to_datetime(data_weekly[["YEAR", "MONTH", "DAY"]])

In [484]:
data_weekly.head()

Unnamed: 0,TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y,Date
1,Break and Enter Commercial,2019,3,7,2,6,10XX SITKA SQ,Fairview,490612.9648,5457110.0,2019-03-07
2,Break and Enter Commercial,2019,8,27,4,12,10XX ALBERNI ST,West End,491004.8164,5459177.0,2019-08-27
3,Break and Enter Commercial,2021,4,26,4,44,10XX ALBERNI ST,West End,491007.7798,5459174.0,2021-04-26
4,Break and Enter Commercial,2014,8,8,5,13,10XX ALBERNI ST,West End,491015.9434,5459166.0,2014-08-08
5,Break and Enter Commercial,2020,7,28,19,12,10XX ALBERNI ST,West End,491015.9434,5459166.0,2020-07-28


In [485]:
data_weekly["Day_of_week"] = data_weekly["Date"].dt.day_name()

In [486]:
filtered_df = data_weekly.groupby(["HOUR" , "Day_of_week"]).size().reset_index(name = "count")

In [487]:
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']


In [488]:
heatmap_data = filtered_df.pivot(index='HOUR', columns='Day_of_week', values='count')
heatmap_data = heatmap_data[day_order]

In [489]:
fig = px.imshow(heatmap_data,
                labels=dict(x="Day of Week", y="Time of Day", color="Crime Count"),
                x=heatmap_data.columns,
                y=heatmap_data.index,
                color_continuous_scale='Reds',
                text_auto=True,
                )

fig.update_xaxes(side="top")
# Label the 24 hourly rows with 12-hour clock times
fig.update_yaxes(tickmode='array',
                 tickvals=list(range(24)),
                 ticktext=["12 AM", "1 AM", "2 AM", "3 AM", "4 AM", "5 AM", "6 AM", "7 AM", "8 AM", "9 AM", "10 AM", "11 AM",
                           "12 PM", "1 PM", "2 PM", "3 PM", "4 PM", "5 PM", "6 PM", "7 PM", "8 PM", "9 PM", "10 PM", "11 PM"])
fig.show()

The heatmap of crime incidents in Vancouver from 2003 to 2023 reveals a distinctive temporal pattern, with criminal activity concentrated around midnight, especially on weekends.

1. High Crime Incidence around 12 AM:

     Observation: Crime incidents consistently peak in the midnight hour (12 AM).

     Interpretation: As noted in the hourly analysis, this late-night spike may be influenced by reduced visibility, fewer witnesses, and more opportunities for illicit behavior.

2. Weekend-specific Trend:

     Observation: The elevated count around 12 AM is particularly prominent on weekends. The 12 AM hour on a Saturday or Sunday corresponds to the end of Friday or Saturday night.

     Interpretation: Weekends bring increased social activity, nightlife, and gatherings, which may create conditions conducive to late-night crime. Factors such as alcohol consumption and relaxed routines may contribute to this pattern.
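The pivot above leaves NaN in any hour/day cell with no records and needs a separate step to order the columns. A cross-tabulation reindexed to all 24 hours and a fixed Monday-to-Sunday order produces the same matrix in one step, with explicit zeros. A sketch (hour_by_weekday and DAY_ORDER are illustrative names; the toy dates are chosen only for their weekdays):

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def hour_by_weekday(df: pd.DataFrame) -> pd.DataFrame:
    """24 x 7 table of record counts by HOUR (rows) and weekday (columns)."""
    dates = pd.to_datetime(df[["YEAR", "MONTH", "DAY"]])
    weekday = dates.dt.day_name()
    table = pd.crosstab(df["HOUR"], weekday)
    # Guarantee all 24 hours and all 7 days, in order; empty cells become 0.
    return table.reindex(index=range(24), columns=DAY_ORDER, fill_value=0)


# Toy data: 1 Jan 2024 was a Monday and 6 Jan 2024 a Saturday.
toy = pd.DataFrame({
    "YEAR": [2024, 2024, 2024, 2024],
    "MONTH": [1, 1, 1, 1],
    "DAY": [1, 6, 6, 6],
    "HOUR": [0, 0, 0, 23],
})
print(hour_by_weekday(toy).loc[[0, 23]])
```

The result of hour_by_weekday(data) can be passed to px.imshow in the same way as heatmap_data.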


     City MAP Visualization

In [490]:
value_count = data["NEIGHBOURHOOD"].value_counts()

In [491]:
import json
import geopandas as gpd

# Load the Vancouver local-area boundary GeoJSON
with open('/Users/princegill/Documents/VSCode/Practice3/Data/local-area-boundary.geojson', 'r') as f:
    geojson_data = json.load(f)

# Build a GeoDataFrame from the GeoJSON features
gdf_boundaries = gpd.GeoDataFrame.from_features(geojson_data['features'])

# Turn the neighbourhood counts into a (name, count) table so the merge key and
# the count column have explicit names regardless of pandas version
crime_counts = value_count.rename_axis('name').reset_index(name='count')

# Left merge: every boundary polygon is kept; polygons without a matching name get NaN
gdf_combined = gdf_boundaries.merge(crime_counts, how='left', on='name')

# Choropleth of crime counts per neighbourhood on a Mapbox base map
fig = px.choropleth_mapbox(gdf_combined,
                           geojson=gdf_combined.geometry,
                           locations=gdf_combined.index,
                           color='count',
                           color_continuous_scale="Viridis",
                           mapbox_style="carto-positron",
                           center={"lat": gdf_boundaries.geometry.centroid.y.mean(),
                                   "lon": gdf_boundaries.geometry.centroid.x.mean()},
                           zoom=10,
                           opacity=0.6,
                           hover_name='name'
                          )

fig.show()
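With how='left', the map keeps every boundary polygon, but a neighbourhood whose name in the crime data does not exactly match the GeoJSON 'name' property contributes no count, and the matching polygon (if any) gets a missing value (NaN). It is not stated here whether all NEIGHBOURHOOD values match the boundary file, so a quick comparison is a sensible check before trusting the colours. A sketch; the helper name is hypothetical, and on the real data the inputs would be value_count.index and gdf_boundaries['name']:

```python
def unmatched_names(crime_names, boundary_names):
    """Names that appear in the crime data but not in the boundary file, sorted."""
    return sorted(set(crime_names) - set(boundary_names))


# Hypothetical name lists, only to illustrate a spelling mismatch.
crime_names = ["West End", "Fairview", "Area X"]
boundary_names = ["West End", "Fairview", "Area-X"]
print(unmatched_names(crime_names, boundary_names))  # ['Area X']
```

Any names it reports could be mapped to their boundary equivalents with a small dictionary and Series.replace before the merge.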


# SUMMARY

---

**Comprehensive Analysis of Crime in Vancouver (2003-2023): Unveiling Patterns and Hotspots**

This exploration of two decades of Vancouver crime data brought several key patterns and trends to light. During preprocessing, duplicate rows and rows with missing values were removed to prepare the dataset for analysis.

**Temporal Trends:**
- Crime counts were highest around 2003 and declined gradually until 2011. After 2011 they rose again, peaking in 2019 before a sudden dip in 2021.
- Monthly analysis revealed a dip in February, an overall peak in August, and local peaks in March, May, and October, along with a decline in December and a resurgence in January.

**Daily and Hourly Dynamics:**
- The day-of-month distribution showed no strong pattern, apart from a dip at the end of the month and a surge at the beginning.
- Hourly analysis highlighted a sharp peak around midnight, a quiet period from 1 AM to 6 AM, and a slight rise as the day begins. Weekends showed a particularly high crime count around 12 AM.

**Crime Types:**
- Theft from Vehicle was the dominant category in most years, except between 2009 and 2015, when Other Theft overtook it because of a dip in thefts from vehicles. Its persistence highlights the need for targeted preventive measures.

**Neighborhood-wise Analysis:**
- The Central Business District and West End reported the highest crime counts, while Stanley Park and Musqueam reported the lowest.

**Spatial Analysis:**
- A choropleth map of Vancouver, colored on the Viridis scale from dark violet (fewest incidents) to yellow (most incidents), shows how crime counts are distributed across neighborhoods.

**Conclusion:**
This comprehensive analysis not only unveils temporal and spatial patterns in crime but also emphasizes the importance of targeted strategies for specific crime types and areas. The insights gained can guide law enforcement, city planning, and community engagement initiatives to foster a safer and more secure Vancouver.

---


Legal Disclaimer 

The release of Vancouver Police Department (VPD) crime data is intended to enhance community awareness of policing activity in Vancouver. Users are cautioned not to rely on the information provided to make decisions about the specific safety level of a specific location or area. By using this data the user agrees and understands that neither the Vancouver Police Department, Vancouver Police Board nor the City of Vancouver assumes liability for any decisions made or actions taken or not taken by the user in reliance upon any information or data provided. 

The data provided in GeoDASH is extracted from the PRIME BC Police Records Management System (RMS). Specific filters, categorizations, and conditions are applied to ensure the data is relevant to public safety, and that it adheres to the BC Freedom of Information & Protection of Privacy Act (BC FIPPA). The broad category of 'Offence Against a Person' includes all violent incidents (e.g., robbery, assault, sexual assault, domestic assault), with the exception of 'Assaults Against Police'. Further, the category is intentionally designed as a large aggregate of multiple subcategories of violent incidents in order to reduce the likelihood of specific incidents being used to reveal Personal Identifiable Information (PII). Property incidents do not fall within the same guidelines, allowing for additional categories to be displayed with a more granular breakdown. The 'Other Theft' category includes a range of property related incidents such as shoplifting, theft of personal property (over / under $5000), mail theft, and utilities theft. Please see the above Frequently Asked Questions (FAQ) guideline for additional details. The data provided uses the 'All Offence' reporting method, with the additional condition of 'Founded' incidents, denoting those incidents where it was determined after police investigation that the violation had occurred. Please note that aggregate categories tend to mask variations in specific areas, as trends in subcategories are not readily identifiable in large datasets. Further, the data and statistical summaries provided in GeoDASH are not comparable to Statistics Canada reporting. Statistics Canada generally use the 'UCR Survey' scoring rules, whereby only the 'Most Serious Offence (MSO)' in an incident is counted. The two methods are not cross comparable. Please see Statistics Canada website (https://www23.statcan.gc.ca/imdb-bmdi/pub/document/3302_D2_T9_V3-eng.pdf) for additional details.

While every effort has been made to be transparent in this process, users should be aware that this data is designed to provide individuals with a general overview of incidents falling into several crime categories. The information provided therefore does not reflect the total number of calls or complaints made to the VPD. The crime classification and file status may change at any time based on the dynamic nature of police investigations. The VPD has taken great care to protect the privacy of all parties involved in the incidents reported. No personal or identifying information has been provided in the data. Locations for reported incidents involving Offences Against a Person have been deliberately randomized to several blocks and offset to an intersection. No time or street location name will be provided for these offences. For property related offences, the VPD has provided the location to the hundred block of these incidents within the general area of the block. All data must be considered offset and users should not interpret any locations as related to a specific person or specific property.
