# **Step 1 : Ask**
Dataset source:
The dataset comes from the Vancouver Open Data Catalogue.
It was extracted on 2017-07-18 and contains 530,652 records from 2003-01-01 to 2017-07-13.


Questions that we want to answer from this case study:

1. Which neighbourhoods are listed in this dataset?
2. How many crimes were committed in these neighbourhoods?
3. Which year had the most crimes?
4. What types of crimes were committed?

# **Step 2 : Prepare**

Import the libraries we need and load the dataset so we can explore it.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
data = pd.read_csv("../input/crime-in-vancouver/crime.csv")
data.head()
# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# **Step 3 : Process**

In [None]:
# data.shape
print('number of rows in the dataset are : ', data.shape[0], 'number of columns in the dataset are : ', data.shape[1] )

In [None]:
data.dtypes

In [None]:
data.nunique()

From this summary we can see :

* There are 11 distinct types of crimes in this dataset
* 24 noted neighbourhoods
* The numbers of distinct values for the day and year columns are plausible: no more than 31 days, and 15 years (2003 to 2017)
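The plausibility check on the date columns can also be done explicitly rather than by reading off unique counts. The sketch below is a minimal example on a made-up frame (`toy`), not the real crime data; the column names `YEAR`, `MONTH` and `DAY` are assumed to match the dataset, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy frame (made-up values) with the assumed date columns
toy = pd.DataFrame({
    "YEAR": [2003, 2010, 2017],
    "MONTH": [1, 6, 7],
    "DAY": [1, 15, 13],
})

# Range checks: years inside the extraction window, days between 1 and 31
year_ok = toy["YEAR"].between(2003, 2017).all()
day_ok = toy["DAY"].between(1, 31).all()

# Stricter check: assemble real dates; impossible combinations become NaT
dates = pd.to_datetime(
    toy[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower),
    errors="coerce",
)
all_dates_valid = dates.notna().all()

print(year_ok, day_ok, all_dates_valid)
```

Building full dates catches cases that range checks miss, such as a 31st day in a 30-day month.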

In [None]:
data.isnull().sum()

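From this summary, these columns contain missing values :
* HOUR
* MINUTE
* HUNDRED_BLOCK
* NEIGHBOURHOOD

We drop the columns that are not needed for this analysis (MINUTE and the coordinate columns), then drop the remaining rows that have missing values.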

In [None]:
data.drop(['MINUTE', 'X', 'Y', 'Latitude', 'Longitude'], axis=1, inplace=True)
data.dropna(inplace=True)
data.shape

In [None]:
data.reset_index(drop=True, inplace=True)
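Before dropping rows, it can be useful to see what share of each column is missing, since `dropna` removes every row that has a gap in any column. The following is a minimal sketch on a made-up frame (`toy`); the column names are borrowed from the dataset, but the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the crime data (values are made up)
toy = pd.DataFrame({
    "TYPE": ["Theft from Vehicle", "Mischief", "Theft of Bicycle", "Mischief"],
    "HOUR": [14.0, np.nan, 9.0, 22.0],
    "NEIGHBOURHOOD": ["West End", None, "Kitsilano", "West End"],
})

# Percentage of missing values in each column
missing_pct = toy.isnull().mean() * 100
print(missing_pct)

# Drop rows with any missing value, then rebuild a clean 0..n-1 index
cleaned = toy.dropna().reset_index(drop=True)
print(cleaned.shape)
```

Comparing the shape before and after `dropna` shows how many records the cleaning step costs.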

# **Step 4 : Analysis**

**1. How many neighbourhoods are listed and what are they?**

In [None]:
print('We have :', data.NEIGHBOURHOOD.nunique(), 'neighbourhoods')
print('List of neighbourhoods :')
print(data.NEIGHBOURHOOD.unique())

**2. How many crimes were committed in these neighbourhoods?**

In [None]:
count_type= data[['NEIGHBOURHOOD','YEAR','TYPE']].groupby(['YEAR','NEIGHBOURHOOD']).count().reset_index(drop=False)
print(count_type)

In [None]:
import plotly.express as px
import plotly.graph_objects as go

In [None]:
def group_data(df, col1, col2):
    """Count the records for each (col1, col2) pair and return them in a 'Counts' column."""
    subdata = df.groupby([col1, col2]).count().reset_index(drop=False)
    subdata = subdata[[col1, col2, 'TYPE']]
    subdata.columns = [col1, col2, 'Counts']
    return subdata

# Yearly crime counts per neighbourhood, coloured by year
att_dept = group_data(data, 'NEIGHBOURHOOD', 'YEAR')
fig3 = px.bar(att_dept, x='NEIGHBOURHOOD', y='Counts', color='YEAR', title='Total Crimes by Neighbourhood', color_discrete_sequence=['blue', 'red'])
fig3.show()

This shows that Central Business District has the highest total number of crimes, with West End second.
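The ranking read off the chart can also be checked numerically. Totals and per-year averages are different quantities, so the sketch below computes both. It is a minimal example on a made-up frame (`toy`), so the counts are not the real Vancouver figures; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy records (made-up data, one row per crime)
toy = pd.DataFrame({
    "NEIGHBOURHOOD": [
        "Central Business District", "Central Business District",
        "Central Business District", "West End", "West End", "Kitsilano",
    ],
    "YEAR": [2003, 2003, 2004, 2003, 2004, 2004],
})

# Total crimes per neighbourhood, sorted from highest to lowest
totals = toy["NEIGHBOURHOOD"].value_counts()
print(totals)

# Average crimes per year for each neighbourhood (over the years observed)
per_year = toy.groupby(["NEIGHBOURHOOD", "YEAR"]).size()
yearly_mean = per_year.groupby(level="NEIGHBOURHOOD").mean()
print(yearly_mean)
```

On the real data, `totals.head()` would list the highest-ranked neighbourhoods directly.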

**3. What year were most crimes committed?**

In [None]:
year_crime = data[['YEAR','TYPE']]
year_crime = year_crime.groupby('YEAR').count().reset_index(drop=False)
year_crime.columns = ['YEAR', 'Counts']
fig = px.line(year_crime, x='YEAR',y='Counts',title='Number of Crimes in Vancouver')
fig.show()

From this diagram, we can see that crime in Vancouver decreased overall, but the graph shows a spike between 2013 and 2016. The 2017 figure covers only part of the year, since the records end on 2017-07-13.
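The trend can be quantified with year-over-year changes. Because the records end on 2017-07-13, the last year is incomplete and is best left out of such a comparison. The sketch below uses made-up yearly counts (`toy_counts`), not the real totals, and only illustrates the calculation.

```python
import pandas as pd

# Hypothetical yearly counts (made-up numbers, not the real totals)
toy_counts = pd.DataFrame({
    "YEAR": [2014, 2015, 2016, 2017],
    "Counts": [100, 110, 121, 60],
})

# 2017 is a partial year in the dataset, so exclude it from the comparison
full_years = toy_counts[toy_counts["YEAR"] < 2017].copy()

# Percentage change relative to the previous year
full_years["pct_change"] = full_years["Counts"].pct_change() * 100

# Year with the highest count among the complete years
peak_year = full_years.loc[full_years["Counts"].idxmax(), "YEAR"]

print(full_years)
print(peak_year)
```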


**4. What types of crime were committed?**

In [None]:
type_crime = pd.DataFrame(data['TYPE']).value_counts().reset_index(drop=False)
type_crime.columns = ['TYPE', 'Counts']
fig3=px.bar(type_crime,x='TYPE',y='Counts',title='Total Crimes by Type',color_discrete_sequence=['blue','red'])
fig3.show()

From this diagram we can see that Theft from Vehicle is the most common crime in Vancouver.
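Raw counts are easier to compare when expressed as shares of all records. The sketch below is a minimal example on a made-up series (`toy_types`), so the percentages are illustrative and not the real distribution.

```python
import pandas as pd

# Hypothetical crime types (made-up data, one entry per record)
toy_types = pd.Series(
    ["Theft from Vehicle", "Mischief", "Theft from Vehicle", "Theft of Bicycle"],
    name="TYPE",
)

# Share of each type as a percentage of all records, highest first
share = toy_types.value_counts(normalize=True) * 100
print(share)
```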

# **Conclusion**

1. From the crime records, there are 24 neighbourhoods in Vancouver, and Central Business District is the neighbourhood with the highest number of crimes.
2. The number of crimes decreased from 2004 to 2011, but there was a spike between 2013 and 2016.
3. Theft from Vehicle is the most common crime, with a count of 170889.