**Introduction**

This data comes from the Vancouver Open Data Catalogue.
It was extracted on 2017-07-18 and contains 530,652 records dated from 2003-01-01 to 2017-07-13.

My focus will be on:
* What neighbourhoods are mentioned in this dataset
* How many crimes were committed in these neighbourhoods
* What year had the most crimes
* What types of crimes were committed

**Import libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

**Load dataset**

In [None]:
crime = pd.read_csv("../input/crime-in-vancouver/crime.csv")

**Understanding the data**

In [None]:
crime.shape

In [None]:
print(f'The dataset has {crime.shape[0]} rows and {crime.shape[1]} columns.')

In [None]:
crime.head()

In [None]:
crime.tail()

In [None]:
crime.nunique()

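This tells us a few things:
* There are 11 distinct types of crime in this dataset
* There are 24 distinct neighbourhoods
* The numbers of distinct values in DAY and YEAR are plausible: no more than 31 days, and no more than 15 years for data spanning 2003 to 2017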
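By default `nunique()` counts distinct non-missing values, so the neighbourhood count ignores rows where NEIGHBOURHOOD is empty. Passing `dropna=False` counts the missing value as one extra category. The sketch below uses a few hypothetical rows (not the real data) to show both behaviours. It also includes a simple range check of the kind that supports calling the DAY and YEAR values valid. The 1 to 31 and 2003 to 2017 bounds are assumptions taken from the calendar and from the date range in the introduction.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the notebook itself uses the full crime.csv
sample = pd.DataFrame({
    'NEIGHBOURHOOD': ['West End', 'Kitsilano', np.nan, 'West End'],
    'YEAR': [2003, 2010, 2017, 2017],
    'DAY': [1, 15, 31, 7],
})

# Default behaviour: the missing neighbourhood is not counted as a distinct value
distinct = sample.nunique()
print(distinct['NEIGHBOURHOOD'])  # 2

# Count the missing value as its own category
print(sample['NEIGHBOURHOOD'].nunique(dropna=False))  # 3

# Range checks (inclusive bounds): YEAR within the documented span, DAY a valid day of month
years_ok = sample['YEAR'].between(2003, 2017).all()
days_ok = sample['DAY'].between(1, 31).all()
print(years_ok, days_ok)  # True True
```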

**Checking for missing data**

In [None]:
crime.isnull().sum()
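Raw missing-value counts are easier to weigh against roughly 530,000 rows when they are expressed as a share. `isnull()` returns booleans, so `.mean()` gives the fraction of rows missing a value in each column. Here is a minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in some columns
sample = pd.DataFrame({
    'TYPE': ['Mischief', 'Other Theft', 'Mischief', 'Theft of Bicycle'],
    'NEIGHBOURHOOD': ['West End', np.nan, 'Fairview', np.nan],
    'HOUR': [13.0, np.nan, 22.0, 9.0],
})

# Number of missing values per column, as in the cell above
missing_count = sample.isnull().sum()

# Percentage of rows missing a value in each column
missing_pct = sample.isnull().mean() * 100
print(missing_pct)  # TYPE 0.0, NEIGHBOURHOOD 50.0, HOUR 25.0
```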

**Cleaning the data**

I'll drop the columns I won't be using, then drop every row that still has a missing value in any of the remaining columns.

In [None]:
crime.drop(['MINUTE', 'X', 'Y', 'Latitude', 'Longitude'], axis=1, inplace=True)

In [None]:
crime.dropna(inplace=True)

In [None]:
crime.shape

In [None]:
crime.tail()

Reset the index so the remaining rows are numbered consecutively from 0:

In [None]:
crime.reset_index(drop=True, inplace=True)

In [None]:
crime

All rows with missing values have been dropped, along with the columns I will not focus on.
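The order of the cleaning steps matters. `dropna()` removes any row with at least one missing value, so dropping the unused columns first means that a missing coordinate alone no longer removes a row. The sketch below reproduces drop, `dropna()` and `reset_index(drop=True)` on hypothetical toy rows, with a single `X` column standing in for the dropped coordinate columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; X stands in for the coordinate columns dropped above
df = pd.DataFrame({
    'TYPE': ['Mischief', 'Theft from Vehicle', 'Other Theft', 'Mischief'],
    'NEIGHBOURHOOD': ['West End', np.nan, 'Kitsilano', 'Fairview'],
    'HOUR': [13.0, 2.0, np.nan, 22.0],
    'X': [491000.0, np.nan, 490000.0, np.nan],
})

# Calling dropna() before dropping X would also discard row 3, whose only gap is X
print(len(df.dropna()))  # 1

# Drop the unused column first, then remove rows with any remaining gap
cleaned = df.drop(columns=['X']).dropna()
print(cleaned.index.tolist())  # [0, 3]

# drop=True renumbers the rows without keeping the old labels as a new column
cleaned = cleaned.reset_index(drop=True)
print(cleaned.index.tolist())  # [0, 1]
```

If only some columns should decide whether a row is dropped, `dropna(subset=[...])` restricts the check to those columns.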

**The analysis**

**1. How many neighbourhoods are listed and what are they?**

In [None]:
print('There are ' + str(crime.NEIGHBOURHOOD.nunique()) + ' listed neighbourhoods, and they are:')
print(crime.NEIGHBOURHOOD.unique())

**2. How many crimes were committed in these neighbourhoods?**

In [None]:
# nc = neighbourhood-crime
nc = crime[['YEAR', 'NEIGHBOURHOOD', 'TYPE']].copy()

In [None]:
nc.head()
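`pd.DataFrame([crime['YEAR'], ...]).T` and `crime[['YEAR', 'NEIGHBOURHOOD', 'TYPE']]` both produce a frame with these three columns, but their dtypes differ. The first form stacks each Series as a row and then transposes, which mixes integers and strings so that every column ends up as `object`. The second form keeps YEAR as a native integer column, which uses less memory and is faster to process. The grouping in this notebook works either way. The sketch below shows the difference on hypothetical rows:

```python
import pandas as pd

# Hypothetical stand-in for the crime DataFrame
crime_sample = pd.DataFrame({
    'YEAR': [2003, 2004],
    'NEIGHBOURHOOD': ['West End', 'Kitsilano'],
    'TYPE': ['Mischief', 'Other Theft'],
})

# Stacking Series as rows mixes ints and strings, so after .T every column is object
stacked = pd.DataFrame([crime_sample['YEAR'],
                        crime_sample['NEIGHBOURHOOD'],
                        crime_sample['TYPE']]).T
print(stacked.dtypes)  # all object

# Selecting a list of columns keeps each column's original dtype
selected = crime_sample[['YEAR', 'NEIGHBOURHOOD', 'TYPE']]
print(selected.dtypes)  # YEAR stays an integer dtype
```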

In [None]:
ncavg = nc.groupby(['YEAR','NEIGHBOURHOOD']).count().reset_index()
ncavg.head()

In [None]:
# Average the yearly crime counts within each neighbourhood
ncavg = ncavg.drop('YEAR', axis=1)
ncavg.columns = ['Neighbourhood', 'Avg']
ncavg = ncavg.groupby('Neighbourhood')['Avg'].mean()
ncavg.head()

In [None]:
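# Sort so the neighbourhoods with the highest averages appear first
plotnc = ncavg.sort_values(ascending=False).plot(kind='bar', title='Average number of crimes per year by neighbourhood')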

This shows that the Central Business District has the highest average number of crimes per year, with the West End second.
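The average above is a two-step aggregation. It counts crimes per (year, neighbourhood) pair, then takes the mean of those counts for each neighbourhood. One subtlety is that a neighbourhood-year with no recorded crimes has no row at all, so that year is left out of the mean instead of counting as zero. Also, 2017 covers only about half a year but is averaged alongside full years. The sketch below uses hypothetical toy rows and `groupby(...).size()`, which counts rows directly, instead of `count()`. The zero-filled variant is an alternative, not what the notebook computes:

```python
import pandas as pd

# Hypothetical toy data: one row per recorded crime
nc = pd.DataFrame({
    'YEAR': [2003, 2003, 2003, 2004, 2004, 2004],
    'NEIGHBOURHOOD': ['West End', 'West End', 'Kitsilano',
                      'West End', 'West End', 'West End'],
    'TYPE': ['Mischief', 'Other Theft', 'Mischief',
             'Mischief', 'Mischief', 'Theft of Bicycle'],
})

# Step 1: number of crimes for each (year, neighbourhood) pair
per_year = nc.groupby(['YEAR', 'NEIGHBOURHOOD']).size()

# Step 2: mean of those yearly counts within each neighbourhood
avg_per_year = per_year.groupby(level='NEIGHBOURHOOD').mean()
print(avg_per_year)  # Kitsilano 1.0 (only 2003 present), West End 2.5

# Alternative: treat missing neighbourhood-years as zero crimes before averaging
avg_with_zeros = per_year.unstack(fill_value=0).mean()
print(avg_with_zeros)  # Kitsilano 0.5, West End 2.5
```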

**3. In which year were the most crimes committed?**

In [None]:
yc = crime[['YEAR', 'TYPE']].copy()
yc.head()

In [None]:
#yc = year-crime
ycTotal = yc.groupby(['YEAR']).count().reset_index()
ycTotal.columns = ['Year','Total']
ycTotal

In [None]:
sns.lineplot(data=ycTotal, x='Year', y='Total').set_title('Number of crimes per year in Vancouver')

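Overall, crime in Vancouver decreased over the period, but the graph shows a rise between 2013 and 2016. Note that 2017 only includes records up to 2017-07-13, so its low total reflects a partial year.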
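You can also read the busiest year off directly instead of from the plot. `value_counts()` on YEAR gives the same totals as the `groupby(...).count()` above, and `idxmax()` returns the year with the largest total. Here is a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy data standing in for crime[['YEAR', 'TYPE']]
yc = pd.DataFrame({
    'YEAR': [2015, 2015, 2016, 2016, 2016, 2017],
    'TYPE': ['Mischief', 'Other Theft', 'Mischief',
             'Theft from Vehicle', 'Other Theft', 'Mischief'],
})

# Rows per year, ordered chronologically
per_year = yc['YEAR'].value_counts().sort_index()
print(per_year)

# Label (year) with the largest count
busiest_year = per_year.idxmax()
print(busiest_year)  # 2016

# Keep only complete years; in the real data 2017 runs only to 2017-07-13
full_years = per_year.loc[:2016]
```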

**4. What types of crime were committed?**

In [None]:
#tc = type-crime
tc = pd.DataFrame(crime['TYPE'])
tc.head()

In [None]:
tcCount = tc['TYPE'].value_counts()
tcCount

In [None]:
plottc = tcCount.plot(kind = 'bar')

Theft from Vehicle is the most common type of crime in this dataset.
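Raw counts don't show how dominant one crime type is. `value_counts(normalize=True)` returns each type's share of all records instead. `value_counts()` sorts by count in descending order by default, so the first index label is the most common type. Here is a sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical TYPE values
types = pd.Series(['Theft from Vehicle', 'Mischief',
                   'Theft from Vehicle', 'Other Theft'])

# Counts, sorted from most to least common
counts = types.value_counts()
print(counts.index[0])  # Theft from Vehicle

# Share of all records for each type; the shares sum to 1
shares = types.value_counts(normalize=True)
print(shares['Theft from Vehicle'])  # 0.5
```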

**Conclusion**

1. There are 24 neighbourhoods, and the Central Business District has the highest average number of crimes per year.
2. I've visualized the number of crimes per year from 2003 to 2017 and highlighted the rise between 2013 and 2016.
3. Theft from Vehicle is the most common type of crime, with a count of 170,889.