# Parking Data Analytics
###### by Simon Huang (27067380)

## Questions
* Is there a monthly ticket quota for the agents issuing tickets?
* Does population density affect the number of tickets issued?
* Does the age of the population affect the number of tickets issued?

## Data Set Sources
##### Los Angeles Parking Citations
https://www.kaggle.com/cityofLA/los-angeles-parking-citations

##### 2010 Census Populations by Zip Code
https://data.lacity.org/dataset/2010-Census-Populations-by-Zip-Code/nxs9-385f

##### Zip Codes in Southern California
https://controllerdata.lacity.org/dataset/Zip-Code-Areas/9uax-58sb

# Loading Data
The data sets cannot be fetched directly with the `urllib` module, so we assume that all of them are available locally.

The data sets are expected at the following paths:

`data\raw\los-angeles-parking-citations\parking-citations.csv`

`data\raw\2010_Census_Populations_by_Zip_Code.csv`

`data\raw\Zip Code Areas.geojson`

*Note that `parking-citations.csv` is very large (~1.3GB) and may take time to load*
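
Because everything is read from local files, it can help to confirm that the three files are in place before starting the slow load. The following is a minimal sketch, assuming the notebook runs from the project root; `RAW_DIR`, `EXPECTED_FILES`, and `find_missing` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Paths as listed above, relative to the notebook's working directory
# (assumption: the notebook is started from the project root).
RAW_DIR = Path("data") / "raw"
EXPECTED_FILES = [
    RAW_DIR / "los-angeles-parking-citations" / "parking-citations.csv",
    RAW_DIR / "2010_Census_Populations_by_Zip_Code.csv",
    RAW_DIR / "Zip Code Areas.geojson",
]


def find_missing(paths):
    """Return the paths that do not point to an existing file (hypothetical helper)."""
    return [p for p in paths if not p.is_file()]


missing = find_missing(EXPECTED_FILES)
if missing:
    print("Missing data files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All data files found.")
```

Using `pathlib` keeps the check independent of whether the paths are written with forward slashes or backslashes.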

In [1]:
import pandas as pd

# Removing scientific notation from prints
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [2]:
parking_data = pd.read_csv("./data/raw/los-angeles-parking-citations/parking-citations.csv")
parking_data.head()

# A warning may occur due to the large size of the .csv



| | Ticket number | Issue Date | Issue time | Meter Id | Marked Time | RP State Plate | Plate Expiry Date | VIN | Make | Body Style | Color | Location | Route | Agency | Violation code | Violation Description | Fine amount | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1103341116 | 2015-12-21T00:00:00 | 1251.0 | | | CA | 200304.0 | | HOND | PA | GY | 13147 WELBY WAY | 01521 | 1.0 | 4000A1 | NO EVIDENCE OF REG | 50.0 | 99999.0 | 99999.0 |
| 1 | 1103700150 | 2015-12-21T00:00:00 | 1435.0 | | | CA | 201512.0 | | GMC | VN | WH | 525 S MAIN ST | 1C51 | 1.0 | 4000A1 | NO EVIDENCE OF REG | 50.0 | 99999.0 | 99999.0 |
| 2 | 1104803000 | 2015-12-21T00:00:00 | 2055.0 | | | CA | 201503.0 | | NISS | PA | BK | 200 WORLD WAY | 2R2 | 2.0 | 8939 | WHITE CURB | 58.0 | 6439997.9 | 1802686.4 |
| 3 | 1104820732 | 2015-12-26T00:00:00 | 1515.0 | | | CA | | | ACUR | PA | WH | 100 WORLD WAY | 2F11 | 2.0 | 000 | 17104h | | 6440041.1 | 1802686.2 |
| 4 | 1105461453 | 2015-09-15T00:00:00 | 115.0 | | | CA | 200316.0 | | CHEV | PA | BK | GEORGIA ST/OLYMPIC | 1FB70 | 1.0 | 8069A | NO STOPPING/STANDING | 93.0 | 99999.0 | 99999.0 |
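
The notebook does not show the full warning text. If it is pandas' `DtypeWarning` about mixed column types (an assumption), one common way to avoid it is to pass explicit column types to `pd.read_csv`, optionally together with `low_memory=False`. The sketch below demonstrates this on a tiny in-memory sample built from rows shown above; the choice of which columns to read as strings (`column_types`) is an illustrative assumption, not a setting taken from the notebook.

```python
import io

import pandas as pd

# A few columns from the first rows of parking-citations.csv, as displayed above.
sample_csv = io.StringIO(
    "Ticket number,Route,Violation code,Fine amount\n"
    "1103341116,01521,4000A1,50.0\n"
    "1104803000,2R2,8939,58.0\n"
    "1104820732,2F11,000,\n"
)

# Assumption: identifier-like columns are read as strings so that values such
# as "01521" and "000" keep their leading zeros and are not parsed as numbers.
column_types = {"Ticket number": str, "Route": str, "Violation code": str}

sample = pd.read_csv(sample_csv, dtype=column_types, low_memory=False)

print(sample.dtypes)
print(sample)
```

The same `dtype` and `low_memory` arguments can be passed when reading the full file, at the cost of higher memory use while parsing.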


In [3]:
population_data = pd.read_csv("./data/raw/2010_Census_Populations_by_Zip_Code.csv")
population_data.head()

| | Zip Code | Total Population | Median Age | Total Males | Total Females | Total Households | Average Household Size |
|---|---|---|---|---|---|---|---|
| 0 | 91371 | 1 | 73.5 | 0 | 1 | 1 | 1.0 |
| 1 | 90001 | 57110 | 26.6 | 28468 | 28642 | 12971 | 4.4 |
| 2 | 90002 | 51223 | 25.5 | 24876 | 26347 | 11731 | 4.36 |
| 3 | 90003 | 66266 | 26.3 | 32631 | 33635 | 15642 | 4.22 |
| 4 | 90004 | 62180 | 34.8 | 31302 | 30878 | 22547 | 2.73 |


The ZIP code areas data is stored as GeoJSON, so we load it with the `geopandas` module instead of plain `pandas`.

`geopandas` can be installed using the command `conda install geopandas`

In [4]:
import geopandas as gpd

In [5]:
zip_data = gpd.read_file('./data/raw/Zip Code Areas.geojson')
zip_data.head()

| | external_i | name | mtfcc10 | display_na | intptlat10 | set | awater10 | slug | zcta5ce10 | funcstat10 | aland10 | geoid10 | kind | intptlon10 | classfp10 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90001 | 90001 | G6350 | 90001 ZIP Code Tabulation Area (2012) | 33.9740268 | ZIP Code Tabulation Areas (2012) | 0 | 90001-zip-code-tabulation-area-2012 | 90001 | S | 9071359 | 90001 | ZIP Code Tabulation Area (2012) | -118.2495088 | B5 | (POLYGON ((-118.2651510000001 33.9702490000000... |
| 1 | 90002 | 90002 | G6350 | 90002 ZIP Code Tabulation Area (2012) | 33.9490988 | ZIP Code Tabulation Areas (2012) | 0 | 90002-zip-code-tabulation-area-2012 | 90002 | S | 7930684 | 90002 | ZIP Code Tabulation Area (2012) | -118.2467371 | B5 | (POLYGON ((-118.2373700000001 33.9585210000000... |
| 2 | 90003 | 90003 | G6350 | 90003 ZIP Code Tabulation Area (2012) | 33.9641307 | ZIP Code Tabulation Areas (2012) | 403 | 90003-zip-code-tabulation-area-2012 | 90003 | S | 9197637 | 90003 | ZIP Code Tabulation Area (2012) | -118.2727831 | B5 | (POLYGON ((-118.2651740000001 33.9818280000000... |
| 3 | 90004 | 90004 | G6350 | 90004 ZIP Code Tabulation Area (2012) | 34.0761981 | ZIP Code Tabulation Areas (2012) | 0 | 90004-zip-code-tabulation-area-2012 | 90004 | S | 7894525 | 90004 | ZIP Code Tabulation Area (2012) | -118.3107225 | B5 | (POLYGON ((-118.3116010000001 34.0689580000000... |
| 4 | 90005 | 90005 | G6350 | 90005 ZIP Code Tabulation Area (2012) | 34.0591634 | ZIP Code Tabulation Areas (2012) | 0 | 90005-zip-code-tabulation-area-2012 | 90005 | S | 2807559 | 90005 | ZIP Code Tabulation Area (2012) | -118.3068924 | B5 | (POLYGON ((-118.2916380000001 34.0617930000000... |


# Data Set Descriptions

## Checking if any null values are present

In [6]:
parking_data.isnull().sum()

Ticket number                  0
Issue Date                   568
Issue time                  2925
Meter Id                 7031696
Marked Time              9163348
RP State Plate               765
Plate Expiry Date         866347
VIN                      9459249
Make                        9521
Body Style                  9930
Color                       4523
Location                     938
Route                      70783
Agency                       578
Violation code                 0
Violation Description       1011
Fine amount                 7126
Latitude                       4
Longitude                      4
dtype: int64

In [7]:
population_data.isnull().sum()

Zip Code                  0
Total Population          0
Median Age                0
Total Males               0
Total Females             0
Total Households          0
Average Household Size    0
dtype: int64

In [8]:
zip_data.isnull().sum()

external_i    0
name          0
mtfcc10       0
display_na    0
intptlat10    0
set           0
awater10      0
slug          0
zcta5ce10     0
funcstat10    0
aland10       0
geoid10       0
kind          0
intptlon10    0
classfp10     0
geometry      0
dtype: int64

We notice that only `parking_data` has null values. 

However, not all missing values are declared as NaN. 

For example, the `Latitude` and `Longitude` columns use 99999 as a placeholder for missing coordinates.

We check whether 99999 is the only such placeholder.

In [14]:
parking_data['Latitude'].value_counts(bins=[-1000000,0,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000])

(6000000.0, 7000000.0]    8074769
(0.0, 1000000.0]          1400782
(7000000.0, 8000000.0]          0
(5000000.0, 6000000.0]          0
(4000000.0, 5000000.0]          0
(3000000.0, 4000000.0]          0
(2000000.0, 3000000.0]          0
(1000000.0, 2000000.0]          0
(-1000000.001, 0.0]             0
Name: Latitude, dtype: int64
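
Once a placeholder such as 99999 is confirmed, one option (not necessarily the approach this notebook takes later) is to convert it to NaN so that `isnull()` counts those rows as missing. The following is a minimal sketch on a toy frame whose values come from the rows displayed earlier; `MISSING_COORDINATE`, `toy`, and `cleaned` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: 99999 is the placeholder used for a missing coordinate.
MISSING_COORDINATE = 99999.0

# Toy stand-in for parking_data, using coordinate values shown in the head() output.
toy = pd.DataFrame({
    "Latitude": [99999.0, 6439997.9, 6440041.1, 99999.0],
    "Longitude": [99999.0, 1802686.4, 1802686.2, 99999.0],
})

coordinate_columns = ["Latitude", "Longitude"]

# Work on a copy so the original frame stays untouched.
cleaned = toy.copy()
cleaned[coordinate_columns] = cleaned[coordinate_columns].replace(
    MISSING_COORDINATE, np.nan
)

# The placeholder rows are now reported as nulls.
print(cleaned.isnull().sum())
```

Applying the same replacement to `parking_data` would let the null counts shown earlier include the rows without usable coordinates.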

In [10]:
parking_data['Longitude'].value_counts()

99999.000      1400782
1819688.456       8610
1859071.166       7610
1882601.871       5720
1849114.300       5041
1859071.200       4795
1849114.334       4401
1836817.208       4326
1849114.334       4215
1864751.557       4196
1883363.553       3843
1852080.794       3560
1837269.893       3527
1845112.570       3521
1819197.428       3483
1876409.078       3481
1840433.993       3481
1849336.560       3397
1845451.300       3327
1845451.348       3305
1848423.248       3271
1858229.000       3222
1857542.012       3185
1857542.000       3185
1858835.417       3155
1859516.042       3089
1848423.200       3049
1882601.900       2995
1803997.602       2890
1819197.428       2866
                ...   
1832483.744          1
1840585.351          1
1934631.030          1
1903949.900          1
1848483.584          1
1883095.633          1
1861520.860          1
1801712.181          1
1839372.809          1
1903982.400          1
1844281.500          1
1808854.681          1
1869526.494