# Better parking in Seattle
## Phase 1 Business understanding

<i>The initial phase is to understand the project's objective from the business or application perspective. Then, you need to translate this knowledge into a machine learning problem with a preliminary plan to achieve the objectives.</i><p>

Imagine you love your car and don't want it to be damaged. Arriving in Seattle, your question is <i><b>"Where should I park my car to minimize the probability of it getting damaged?"</b></i><p>
   This question is answered here based on the <a href="https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv">ArcGIS data</a> (metadata described <a href="https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf">here</a>), assuming that these numbers contain all car collisions with parked cars.




## Phase 2 Data understanding

<i>In this phase, you need to collect or extract the dataset from various sources such as csv file or SQL database. Then, you need to determine the attributes (columns) that you will use to train your machine learning model. Also, you will assess the condition of chosen attributes by looking for trends, certain patterns, skewed information, correlations, and so on.</i><p>

Looking at the <a href="https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Metadata.pdf">Metadata</a>, we see that the following attributes may help us answer the question:
 
|Attribute|Data type, length|Description|
|------|-----|-----|
|SHAPE|Geometry|ESRI geometry field|
|HITPARKEDCAR | Text, 1 | Whether or not the collision involved hitting a parked car. (Y/N)|
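
Since only a few attributes are of interest, one option is to load just those columns and check how complete they are. The following is a minimal sketch, not the approach used in this notebook: the three-row CSV excerpt is made up for illustration, and the column names `X` and `Y` assume the geometry is exported as longitude and latitude.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt standing in for Data-Collisions.csv
sample_csv = io.StringIO(
    "X,Y,SEVERITYCODE,HITPARKEDCAR\n"
    "-122.323148,47.703140,2,N\n"
    "-122.334540,47.607871,1,Y\n"
    ",,1,N\n"
)

# Read only the columns needed for the parking question
cols = ["X", "Y", "HITPARKEDCAR"]
subset = pd.read_csv(sample_csv, usecols=cols)

# Count missing values per selected column
missing = subset.isna().sum()
print(missing)
```

Rows without coordinates cannot be placed on a map, so a count like this indicates how many records would drop out of any location-based analysis.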
    
### Raw data
Let us have a look at the raw data. Here the geometry is given as longitude (X) and latitude (Y):

In [1]:
import pandas as pd

df = pd.read_csv("https://s3.us.cloud-object-storage.appdomain.cloud/cf-courses-data/CognitiveClass/DP0701EN/version-2/Data-Collisions.csv", parse_dates=['INCDATE', 'INCDTTM'], low_memory=False)
display(df.head(5))
display(df.dtypes)

Unnamed: 0,SEVERITYCODE,X,Y,OBJECTID,INCKEY,COLDETKEY,REPORTNO,STATUS,ADDRTYPE,INTKEY,...,ROADCOND,LIGHTCOND,PEDROWNOTGRNT,SDOTCOLNUM,SPEEDING,ST_COLCODE,ST_COLDESC,SEGLANEKEY,CROSSWALKKEY,HITPARKEDCAR
0,2,-122.323148,47.70314,1,1307,1307,3502005,Matched,Intersection,37475.0,...,Wet,Daylight,,,,10,Entering at angle,0,0,N
1,1,-122.347294,47.647172,2,52200,52200,2607959,Matched,Block,,...,Wet,Dark - Street Lights On,,6354039.0,,11,From same direction - both going straight - bo...,0,0,N
2,1,-122.33454,47.607871,3,26700,26700,1482393,Matched,Block,,...,Dry,Daylight,,4323031.0,,32,One parked--one moving,0,0,N
3,1,-122.334803,47.604803,4,1144,1144,3503937,Matched,Block,,...,Dry,Daylight,,,,23,From same direction - all others,0,0,N
4,2,-122.306426,47.545739,5,17700,17700,1807429,Matched,Intersection,34387.0,...,Wet,Daylight,,4028032.0,,10,Entering at angle,0,0,N


SEVERITYCODE                    int64
X                             float64
Y                             float64
OBJECTID                        int64
INCKEY                          int64
COLDETKEY                       int64
REPORTNO                       object
STATUS                         object
ADDRTYPE                       object
INTKEY                        float64
LOCATION                       object
EXCEPTRSNCODE                  object
EXCEPTRSNDESC                  object
SEVERITYCODE.1                  int64
SEVERITYDESC                   object
COLLISIONTYPE                  object
PERSONCOUNT                     int64
PEDCOUNT                        int64
PEDCYLCOUNT                     int64
VEHCOUNT                        int64
INCDATE           datetime64[ns, UTC]
INCDTTM                datetime64[ns]
JUNCTIONTYPE                   object
SDOT_COLCODE                    int64
SDOT_COLDESC                   object
INATTENTIONIND                 object
UNDERINFL   

In [2]:
print(f'Oldest entry: {df.INCDATE.min():%Y-%m-%d}')
print(f'Newest entry: {df.INCDATE.max():%Y-%m-%d}')

print(f'Count of records: {df.shape[0]}')
print(f'Count of records with parked cars involved: {df[df.HITPARKEDCAR == "Y"].shape[0]}')

Oldest entry: 2004-01-01
Newest entry: 2020-05-20
Count of records: 194673
Count of records with parked cars involved: 7216
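
To put the two counts above into proportion, the share of collisions that involved a parked car can be computed directly. This is a small sketch: the `toy` frame is made up to show the `value_counts` call, while `full_share` uses the counts printed above.

```python
import pandas as pd

# Made-up miniature stand-in for the collision table
toy = pd.DataFrame({"HITPARKEDCAR": ["N", "Y", "N", "N", "Y", "N", "N", "N"]})

# Absolute and relative frequency of each flag value
counts = toy["HITPARKEDCAR"].value_counts()
shares = toy["HITPARKEDCAR"].value_counts(normalize=True)

# Same ratio using the counts printed above for the full dataset
full_share = 7216 / 194673
print(f"{full_share:.1%}")
```

With the numbers above, this comes to roughly 3.7% of all recorded collisions, so the target class is strongly underrepresented.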


### Viewing the data
Let us take a first look at the data on a map.

In [None]:
import folium
from folium.plugins import HeatMap

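# select only collisions that involved a parked car
df_copy = df[df.HITPARKEDCAR == 'Y'].copy()

# weight of 1 per collision, summed per location below
df_copy['count'] = 1

# map centered on downtown Seattle
base_map = folium.Map(location=[47.608013, -122.335167], control_scale=True, zoom_start=12)
# HeatMap expects [latitude, longitude, weight] rows; groupby drops rows with missing coordinates
HeatMap(data=df_copy[['Y', 'X', 'count']] \
        .groupby(['Y', 'X'])\
        .sum().reset_index().values.tolist(), 
        radius=8, max_zoom=13).add_to(base_map)
base_map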

<img src="seattle.png">
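
The heat map groups collisions by exact coordinates. One possible way to turn the visual impression into a ranked list is to snap coordinates to a coarse grid and count collisions per cell. The sketch below assumes a cell size of 0.001 degrees (on the order of 100 m) and uses made-up coordinates; both the grid size and the names `pts`, `cells`, and `hotspots` are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# Made-up coordinates standing in for the parked-car collisions
pts = pd.DataFrame({
    "Y": [47.60781, 47.60779, 47.60812, 47.65001, None],
    "X": [-122.33452, -122.33449, -122.33391, -122.34702, -122.30000],
})

# Drop rows without coordinates, then snap to a grid of 0.001 degrees
cells = pts.dropna(subset=["X", "Y"]).round({"Y": 3, "X": 3})

# Count collisions per grid cell, busiest first
hotspots = (
    cells.groupby(["Y", "X"]).size()
    .reset_index(name="count")
    .sort_values("count", ascending=False)
    .reset_index(drop=True)
)
print(hotspots)
```

Applied to `df_copy[['Y', 'X']]`, the same steps would list the grid cells with the most parked-car collisions, which are the areas to avoid when parking.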