### Illegal Parking in Toronto

<span> This notebook is a descriptive analysis of illegal parking in Toronto. The data has been sourced from Open Data Toronto.
    
Dataset: [2016 Parking Ticket Information](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#75d14c24-3b7e-f344-4412-d8fd41f89455)

### Import Preliminaries

In [1]:
# Import generic data science packages and configurations
%matplotlib inline
%config InlineBackend.figure_format='retina'

# Import modules
import itertools
import keras
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib as mpl
import numpy as np
import pandas as pd 
import sklearn
import seaborn as sns
import warnings

# Import database and text-similarity utilities
from sqlalchemy import create_engine
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Set pandas options (full option names avoid ambiguous pattern matches)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 30)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Set plotting options
mpl.rcParams['figure.figsize'] = (8.0, 7.0)

# Ignore Warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


### Import Data

In [47]:
# Import data from local directory
p1 = pd.read_csv('Data/Toronto Parking Tickets/Parking_Tags_Data_2016_1.csv')
p2 = pd.read_csv('Data/Toronto Parking Tickets/Parking_Tags_Data_2016_2.csv')
p3 = pd.read_csv('Data/Toronto Parking Tickets/Parking_Tags_Data_2016_3.csv')
p4 = pd.read_csv('Data/Toronto Parking Tickets/Parking_Tags_Data_2016_4.csv')
pk = pd.concat([p1,p2,p3,p4], axis=0)

# Print the dataframe shapes
print('DataFrame Shape p1:', p1.shape)
print('DataFrame Shape p2:', p2.shape)
print('DataFrame Shape p3:', p3.shape)
print('DataFrame Shape p4:', p4.shape)
print('DataFrame Shape pk:', pk.shape)

DataFrame Shape p1: (750000, 11)
DataFrame Shape p2: (750000, 11)
DataFrame Shape p3: (750000, 11)
DataFrame Shape p4: (4761, 11)
DataFrame Shape pk: (2254761, 11)
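
The four files above share a naming pattern, so the loading step can also be written as a loop. The following is a minimal sketch, not the notebook's own code: `load_parking_tags` and its `pattern` argument are hypothetical names introduced here. It passes `ignore_index=True` so the combined frame gets a fresh 0..n-1 index instead of repeating each file's row labels.

```python
import glob
import os

import pandas as pd


def load_parking_tags(directory, pattern="Parking_Tags_Data_2016_*.csv"):
    """Read every CSV in `directory` matching `pattern` into one DataFrame.

    Hypothetical helper: the default pattern is assumed from the file
    names used in this notebook.
    """
    # Sort the paths so the files are concatenated in a stable order
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows instead of keeping per-file labels
    return pd.concat(frames, axis=0, ignore_index=True)
```

With the directory layout used in this notebook, the call would look like `pk = load_parking_tags('Data/Toronto Parking Tickets')`.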


In [25]:
pk.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2254761 entries, 0 to 4760
Data columns (total 11 columns):
tag_number_masked         object
date_of_infraction        int64
infraction_code           float64
infraction_description    object
set_fine_amount           int64
time_of_infraction        float64
location1                 object
location2                 object
location3                 object
location4                 object
province                  object
dtypes: float64(2), int64(2), object(7)
memory usage: 206.4+ MB


In [31]:
# View a sample of the dataframe
pk.head()

| | tag_number_masked | date_of_infraction | infraction_code | infraction_description | set_fine_amount | time_of_infraction | location1 | location2 | location3 | location4 | province |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ***03850 | 20160101 | 29.0 | PARK PROHIBITED TIME NO PERMIT | 30 | 0.0 | NR | 49 GLOUCESTER ST | | | ON |
| 1 | ***03851 | 20160101 | 29.0 | PARK PROHIBITED TIME NO PERMIT | 30 | 1.0 | NR | 45 GLOUCESTER ST | | | ON |
| 2 | ***98221 | 20160101 | 29.0 | PARK PROHIBITED TIME NO PERMIT | 30 | 2.0 | NR | 274 GEORGE ST | | | ON |
| 3 | ***85499 | 20160101 | 29.0 | PARK PROHIBITED TIME NO PERMIT | 30 | 2.0 | NR | 270 GEORGE ST | | | ON |
| 4 | ***03852 | 20160101 | 406.0 | PARK-VEH. W/O VALID ONT PLATE | 40 | 2.0 | NR | 45 GLOUCESTER ST | | | ON |


### Cleaning Data

In [48]:
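# Convert the integer date (YYYYMMDD) to a string so it can be sliced
pk.date_of_infraction = pk.date_of_infraction.astype(str)
# Read time_of_infraction as a count of seconds (unit='s'); missing values become NaT
time = pd.to_datetime(pk["time_of_infraction"], unit='s')
# Build a 'YYYY-MM-DD-HH:MM' string column from the date parts and the time
pk['Datetime'] = (pk.date_of_infraction.str.slice(0, 4) + "-"
                  + pk.date_of_infraction.str.slice(4, 6) + "-"
                  + pk.date_of_infraction.str.slice(6) + "-"
                  + time.dt.strftime('%H:%M'))
pk['Datetime']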

0       2016-01-01-00:00
1       2016-01-01-00:00
2       2016-01-01-00:00
3       2016-01-01-00:00
4       2016-01-01-00:00
5       2016-01-01-00:00
6       2016-01-01-00:00
7       2016-01-01-00:00
8       2016-01-01-00:00
9       2016-01-01-00:00
10      2016-01-01-00:00
11      2016-01-01-00:00
12      2016-01-01-00:00
13      2016-01-01-00:00
14      2016-01-01-00:00
              ...       
4746    2016-12-31-00:39
4747    2016-12-31-00:39
4748    2016-12-31-00:39
4749    2016-12-31-00:39
4750    2016-12-31-00:39
4751    2016-12-31-00:39
4752    2016-12-31-00:39
4753    2016-12-31-00:39
4754    2016-12-31-00:39
4755    2016-12-31-00:39
4756    2016-12-31-00:39
4757    2016-12-31-00:39
4758    2016-12-31-00:39
4759      2016-12-31-NaT
4760      2016-12-31-NaT
Name: Datetime, Length: 2254761, dtype: object
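
The last rows above show `00:39`, which is what a raw value of 2359 produces when it is read as seconds (2359 s is 39 min 19 s). If `time_of_infraction` stores a 24-hour clock time in HHMM form (an assumption, not confirmed anywhere in this notebook), the hours and minutes have to be split out arithmetically instead. The sketch below illustrates that alternative. `combine_date_time` is a hypothetical helper, and it expects the date column in its original YYYYMMDD form (integer or string).

```python
import numpy as np
import pandas as pd


def combine_date_time(dates, times):
    """Combine YYYYMMDD dates and HHMM times into a datetime64 Series.

    Assumes `times` encodes a 24-hour clock as HHMM (e.g. 2359.0 for 23:59).
    Rows with a missing or out-of-range time become NaT.
    """
    day = pd.to_datetime(dates.astype(str), format="%Y%m%d")
    hours = times // 100    # leading digits of HHMM
    minutes = times % 100   # trailing two digits of HHMM
    # Comparisons against NaN are False, so missing times are marked invalid
    valid = (times >= 0) & (hours <= 23) & (minutes <= 59)
    offset = pd.to_timedelta(hours * 60 + minutes, unit="m")
    return (day + offset).where(valid)
```

Under that assumption, `pk['Datetime'] = combine_date_time(pk.date_of_infraction, pk.time_of_infraction)` would give a true `datetime64[ns]` column, which supports `.dt` accessors and resampling directly, unlike the string column built above.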

In [34]:
pk.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2254761 entries, 0 to 4760
Data columns (total 11 columns):
tag_number_masked         object
date_of_infraction        datetime64[ns]
infraction_code           float64
infraction_description    object
set_fine_amount           int64
time_of_infraction        float64
location1                 object
location2                 object
location3                 object
location4                 object
province                  object
dtypes: datetime64[ns](1), float64(2), int64(1), object(7)
memory usage: 206.4+ MB


0       00:00
1       00:00
2       00:00
3       00:00
4       00:00
5       00:00
6       00:00
7       00:00
8       00:00
9       00:00
10      00:00
11      00:00
12      00:00
13      00:00
14      00:00
        ...  
4746    00:39
4747    00:39
4748    00:39
4749    00:39
4750    00:39
4751    00:39
4752    00:39
4753    00:39
4754    00:39
4755    00:39
4756    00:39
4757    00:39
4758    00:39
4759      NaT
4760      NaT
Name: time_of_infraction, Length: 2254761, dtype: object