### Necessary Python Imports and Setup

In [34]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from calendar import month_name 

---
# Cleaning and Organizing the Data for Analysis

Alright, so let's get our data in here and take a look at what we've got.

In [39]:
crime_df = pd.read_csv("vancouver_crime.csv")

crime_df.head()

Unnamed: 0,TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,HUNDRED_BLOCK,NEIGHBOURHOOD,X,Y,Latitude,Longitude
0,Other Theft,2003,5,12,16.0,15.0,9XX TERMINAL AVE,Strathcona,493906.5,5457452.47,49.269802,-123.083763
1,Other Theft,2003,5,7,15.0,20.0,9XX TERMINAL AVE,Strathcona,493906.5,5457452.47,49.269802,-123.083763
2,Other Theft,2003,4,23,16.0,40.0,9XX TERMINAL AVE,Strathcona,493906.5,5457452.47,49.269802,-123.083763
3,Other Theft,2003,4,20,11.0,15.0,9XX TERMINAL AVE,Strathcona,493906.5,5457452.47,49.269802,-123.083763
4,Other Theft,2003,4,12,17.0,45.0,9XX TERMINAL AVE,Strathcona,493906.5,5457452.47,49.269802,-123.083763
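
The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was saved together with its row index. A minimal sketch, assuming that is what happened here, of how `index_col=0` reads that column as the index instead. The `csv_text` string is a made-up stand-in for the real file:

```python
import io

import pandas as pd

# Hypothetical two-row CSV text that mimics a file saved with its index.
csv_text = (
    ",TYPE,YEAR,MONTH\n"
    "0,Other Theft,2003,5\n"
    "1,Other Theft,2003,4\n"
)

# Default read: the unlabeled first column comes back as "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(clean.columns))
```

The rest of this notebook keeps the default read, so the `Unnamed: 0` column stays visible in the outputs that follow.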


*For the purposes of this project, I am going to pare down the data a little bit into something more workable. There is some great info in here, but also some extra details that I don't want to deal with while I work with the data.*

### Dropping Extra Columns
First things first, let's get rid of the X, Y, Latitude, and Longitude columns. I am satisfied with letting `NEIGHBOURHOOD` and `HUNDRED_BLOCK` represent the location of the crime.

Next, there is a LOT of "time" information here.
Some of these categories are useful on their own, like the Year and Month of the crime.

I am less concerned about which day in the month a crime happened. I believe this is caught well enough by the month column, so I am just going to drop this "Day" column too.

Similarly, I'm less interested in where in the hour a crime happened than when in the day it happened, so I'm going to fold the Minute column into the Hour column. The result reads as `HH.MM`: a value of 16.15 means 16:15, not 16 hours plus 0.15 of an hour.



In [40]:
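# Work on a copy so crime_df keeps the raw data and this cell can be re-run.
df = crime_df.copy()

# Fold the minutes into the hour as HH.MM (e.g. 16:15 becomes 16.15).
df["HOUR"] = df["HOUR"] + 0.01 * df["MINUTE"]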


df.drop(
    columns = ["X", "Y", "Latitude", "Longitude", "DAY", "MINUTE"],
    inplace = True,
)

df.head()

Unnamed: 0,TYPE,YEAR,MONTH,HOUR,HUNDRED_BLOCK,NEIGHBOURHOOD
0,Other Theft,2003,5,16.15,9XX TERMINAL AVE,Strathcona
1,Other Theft,2003,5,15.2,9XX TERMINAL AVE,Strathcona
2,Other Theft,2003,4,16.4,9XX TERMINAL AVE,Strathcona
3,Other Theft,2003,4,11.15,9XX TERMINAL AVE,Strathcona
4,Other Theft,2003,4,17.45,9XX TERMINAL AVE,Strathcona
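
Adding `0.01 * MINUTE` to the hour stores the time as `HH.MM`, which is easy to read but is not a fractional hour. A small sketch on made-up rows (the `toy` frame and the `HOUR_HHMM` and `HOUR_DECIMAL` column names are illustrative, not part of the dataset) puts that encoding next to a true decimal hour, which may be the better choice if averages or histograms over time of day are wanted later:

```python
import pandas as pd

# Hypothetical toy rows standing in for the crime data.
toy = pd.DataFrame({"HOUR": [16.0, 15.0, 11.0], "MINUTE": [15.0, 30.0, 45.0]})

# The HH.MM encoding used in this notebook, computed on whole columns.
toy["HOUR_HHMM"] = toy["HOUR"] + 0.01 * toy["MINUTE"]

# Alternative: true fractional hours, where 15:30 becomes 15.5.
toy["HOUR_DECIMAL"] = toy["HOUR"] + toy["MINUTE"] / 60

print(toy)
```

With `HH.MM`, the gap between 15.59 and 16.00 is one minute but looks like 0.41, so arithmetic on that column should be treated with care.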


This is looking pretty good, and very workable. 

The last thing we are going to do is turn the months of the year back into their names, for readability.

In [41]:
# Look up each month number (1-12) in calendar.month_name.
df["MONTH"] = df["MONTH"].map(lambda m: month_name[m])

df.head()

Unnamed: 0,TYPE,YEAR,MONTH,HOUR,HUNDRED_BLOCK,NEIGHBOURHOOD
0,Other Theft,2003,May,16.15,9XX TERMINAL AVE,Strathcona
1,Other Theft,2003,May,15.2,9XX TERMINAL AVE,Strathcona
2,Other Theft,2003,April,16.4,9XX TERMINAL AVE,Strathcona
3,Other Theft,2003,April,11.15,9XX TERMINAL AVE,Strathcona
4,Other Theft,2003,April,17.45,9XX TERMINAL AVE,Strathcona
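
One side effect of storing month names as plain strings is that sorting or grouping orders them alphabetically (April, August, ...). A hedged sketch, using a made-up `months` Series rather than the real column, of keeping calendar order with an ordered `pd.Categorical`. It assumes the default English month names from `calendar`:

```python
from calendar import month_name

import pandas as pd

# Hypothetical sample of month numbers, as in the raw MONTH column.
months = pd.Series([5, 5, 4, 12, 1])

# month_name[0] is an empty string, so skip it when building the category list.
calendar_order = list(month_name)[1:]

named = pd.Series(
    pd.Categorical(
        months.map(lambda m: month_name[m]),
        categories=calendar_order,
        ordered=True,
    )
)

# Sorting now follows the calendar rather than the alphabet.
print(named.sort_values().tolist())
```

The same ordering carries over to `groupby` results and to the axis order in most plots, which may save some manual reordering later.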


This leaves us with a compact set of useful variables to work with!
We have the following types of data in our study:
1. **Type** - a nominal categorical variable describing the type of crime committed.
2. **Year** - an ordinal variable describing the year the crime was committed.
3. **Month** - an ordinal categorical variable describing the month of the year the crime was committed.
4. **Hour** - a discrete numerical variable describing the time of day (hour and minute, stored as `HH.MM`) that the crime was committed.
5. **Hundred Block** - a nominal categorical variable describing the rough block in Vancouver where the crime was committed.
6. **Neighbourhood** - a nominal categorical variable describing the neighbourhood in Vancouver where the crime was committed.
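
These variable types can also be recorded in the DataFrame itself. The following is only a sketch on a made-up three-row frame (the `toy` rows, including the "Mischief" and "Kitsilano" values, are placeholders), showing unordered categories for the nominal columns and an ordered category for the year:

```python
import pandas as pd

# Hypothetical rows; the values are placeholders, not taken from the dataset.
toy = pd.DataFrame({
    "TYPE": ["Other Theft", "Mischief", "Other Theft"],
    "YEAR": [2003, 2004, 2003],
    "NEIGHBOURHOOD": ["Strathcona", "Kitsilano", "Strathcona"],
})

# Nominal variables: categories with no inherent order.
toy["TYPE"] = toy["TYPE"].astype("category")
toy["NEIGHBOURHOOD"] = toy["NEIGHBOURHOOD"].astype("category")

# Ordinal variable: categories with an explicit order.
year_dtype = pd.CategoricalDtype(sorted(toy["YEAR"].unique()), ordered=True)
toy["YEAR"] = toy["YEAR"].astype(year_dtype)

print(toy.dtypes)
```

Whether to convert at all is a judgment call: category dtypes tend to save memory on repeated strings, but leaving `YEAR` as an integer keeps arithmetic on it simple.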
---