# Metered Parking in Boston
We are going to do some analysis on what metered parking is available in the Boston area, using data taken from Boston's [Open Data Portal](https://data.boston.gov/dataset/parking-meters).

This file is available in the repository as a [csv](https://www.computerhope.com/issues/ch001356.htm) (comma-separated values file, a plain-text table similar to the tabular data you would work with in Excel).  
![Parking Map](./data/parking.png)

#### Exercise Notes:
  `Syntax will be contained in code blocks like this.`
  
*Italicized portions of the example syntax should be replaced with your own variables*. Normal text (not italicized) should be copied exactly.

We will cover:  
[Step 1: Importing Libraries](#Step-1:-Import-the-libraries-you-plan-to-use)  
[Step 2: Loading a CSV](#Step-2:-Loading-a-CSV)  
[Step 3: Exploring the data](#Step-3:-Exploring-the-data)  
[Step 4: Reorganizing the Data](#Step-4:-Reorganizing-the-Data)  
[Step 5: Mapping the Data](#Step-5:-Mapping-the-Data)  
[Step 6: Exporting Files](#Step-6:-Exporting-Files)


  

## Step 1: Import the libraries you plan to use

(This is done in the first lines of your script. Keep in mind that a script runs in order, from top to bottom, and can't use variables or functions that are defined later in the file, just as you couldn't give someone the weather report before you had looked it up.)

We will use:
- [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) to easily manipulate and analyze tabular data.
- [geopandas](https://geopandas.org/) to turn our coordinates into geographic data and export it as GeoJSON
- [folium](https://python-visualization.github.io/folium/) for data visualization with Leaflet maps
- [geopy](https://github.com/geopy/geopy) for converting coordinates to addresses (reverse geocoding)

Importing pandas "as pd" gives the library a nickname, so instead of typing the full name every time we can type "pd".  
Example: `pandas.read_csv()` can instead be typed `pd.read_csv()`

In [2]:
import pandas as pd
import geopandas as gp
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter  # spaces out geocoding requests (used in Step 4)
import folium


## Step 2: Loading a CSV 
![csv example](./data/meters_csv.png)

Pandas comes with built-in functionality to read in a CSV file.  
The syntax is:  
`pd.read_csv('`*`file_path`*`')`
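If you want to try `read_csv` without the real file, a minimal sketch is to read a few rows from a string. The column names below come from the meters file, but the rows and the name `sample_meters` are made up for illustration.

```python
import io

import pandas as pd

# A tiny, hypothetical CSV using a few of the column names from the meters file.
# The values are illustrative only; they are not rows from the real dataset.
csv_text = """OBJECTID,VENDOR,STREET,LONGITUDE,LATITUDE,BASE_RATE
1,IPS,CAMBRIDGE ST,-71.060641,42.360431,0.25
2,Parkeon,NEWBURY ST,-71.083408,42.349293,0.25
"""

# read_csv accepts a file path or any file-like object; StringIO wraps a string as one
sample_meters = pd.read_csv(io.StringIO(csv_text))
print(sample_meters)
```

Passing a path such as `'./data/parking_meters_boston.csv'` instead of the `StringIO` object reads the real file in exactly the same way.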

To make this data easier to refer back to later, we are going to save it to a variable name of our choice. I'm going to call it boston_meters. The Charlestown files are loaded the same way.

In [3]:
# Remember, variable names cannot contain spaces (or dashes, which Python reads as subtraction).
# To make a name more readable, separate words with_underscores

boston_meters = pd.read_csv('./data/parking_meters_boston.csv')
charlestown_pay = pd.read_csv('./data/charlestown_pay.csv')


In [4]:
#load the charlestown_location.csv
charlestown_locations = pd.read_csv('./data/charlestown_location.csv')




## Step 3: Exploring the data
There are several techniques we can use to get a sense of what sort of data is available. 

Keep in mind that code does not always display its results. In a notebook, only the value of the last line of a cell is shown automatically; to see anything else (or to see output when running a script), wrap the command, or the variable it is saved to, in a `print()` function.


#### How many datapoints?
To start, let's find out how much data we are dealing with. Since each row gives information about a specific parking meter, we can find out how many parking meters are reported in this dataset by getting a row count for our CSV.

The syntax is:
*`dataframe`*`.shape`
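`.shape` returns a `(rows, columns)` tuple, so an output like `(6955, 14)` means 6,955 rows (meters) described by 14 columns. A small sketch with a hypothetical three-meter dataframe shows how to unpack it:

```python
import pandas as pd

# Hypothetical three-meter dataframe with two columns
meters = pd.DataFrame({
    "STREET": ["CAMBRIDGE ST", "NEWBURY ST", "CLARENDON ST"],
    "BASE_RATE": [0.25, 0.25, 0.25],
})

# .shape is a (rows, columns) tuple, so it can be unpacked into two names
n_rows, n_cols = meters.shape
print(f"{n_rows} meters, {n_cols} columns")  # 3 meters, 2 columns

# len() of a dataframe gives the same row count
assert len(meters) == n_rows
```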

In [5]:
# Remember we named our dataframe "boston_meters" in Step 2
# Only the last line of a notebook cell is displayed automatically,
# so use print() whenever you want to be sure a result is shown

print(boston_meters.shape)

(6955, 14)


#### What columns does this csv have?
Let's take a look at the data available in the CSV by printing the column headings. The data structure is identical for the Charlestown and Boston dataframes.

The syntax is:
*`dataframe`*`.columns`

In [6]:
print(boston_meters.columns)

Index(['OBJECTID', 'METER_ID', 'VENDOR', 'PAY_POLICY', 'PARK_NO_PAY',
       'TOW_AWAY', 'BLK_NO', 'STREET', 'LONGITUDE', 'LATITUDE', 'G_DISTRICT',
       'G_SUBZONE', 'G_ZONE', 'BASE_RATE'],
      dtype='object')


#### How many cells are missing data?
Syntax:  
*`dataframe`*`.isnull().sum()`
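`isnull().sum()` chains two steps: `isnull()` builds a True/False table marking the missing cells, and `sum()` adds up each column, counting every True as 1. A sketch on a small hypothetical dataframe (the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few gaps (np.nan marks a missing value)
meters = pd.DataFrame({
    "VENDOR": ["IPS", np.nan, "Parkeon"],
    "TOW_AWAY": [np.nan, np.nan, "YES"],
    "BASE_RATE": [0.25, 0.25, 0.25],
})

# Step 1: a True/False dataframe with the same shape, True where a value is missing
missing_mask = meters.isnull()

# Step 2: sum each column; True counts as 1 and False as 0
missing_per_column = missing_mask.sum()
print(missing_per_column)
```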

In [7]:
boston_meters.isnull().sum()

OBJECTID          0
METER_ID       6831
VENDOR            1
PAY_POLICY        2
PARK_NO_PAY       2
TOW_AWAY       6901
BLK_NO            1
STREET            1
LONGITUDE         1
LATITUDE          1
G_DISTRICT       15
G_SUBZONE        15
G_ZONE           15
BASE_RATE         3
dtype: int64

#### Finding all unique values
The "VENDOR" column tells us which company services each meter. How many vendors service the Boston-area meters?  
Syntax: *`dataframe.column`*`.unique()`
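`unique()` lists `nan` as one of the values, which is why an output like `['IPS' 'Parkeon' nan]` has three entries for two vendors. To count rather than list, `nunique()` and `value_counts()` are the usual tools; a sketch on a hypothetical vendor column:

```python
import numpy as np
import pandas as pd

# Hypothetical VENDOR column with one missing value
vendors = pd.Series(["IPS", "Parkeon", "IPS", np.nan, "IPS"], name="VENDOR")

print(vendors.unique())                    # distinct values, NaN included
print(vendors.nunique())                   # number of distinct values, NaN excluded by default
print(vendors.value_counts(dropna=False))  # how many meters each vendor services
```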

In [8]:
print(boston_meters.VENDOR.unique())

['IPS' 'Parkeon' nan]


In [9]:
# What are the distinct types of pay policies for meters? 

print(boston_meters.PAY_POLICY.unique())


['08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:00PM SAT $0.25 120,06:00PM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM MON-SAT $0.25 120' '11:00AM-08:00PM MON-SAT $0.25 120'
 '08:00AM-06:00PM MON-SAT $0.25 240' '08:00AM-05:00PM MON-SAT $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 09:30AM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM MON-SAT $0.25 720' '08:00AM-06:00PM MON-SAT $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120' '08:00AM-04:00PM MON-SAT $0.25 120'
 '08:00AM-04:00PM MON-FRI $0.25 120, 08:00AM-08:00PM SAT $0.25 120, 06:00PM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 09:30AM-04:00PM MON-FRI $0.25 120, 06:00PM-08:00PM MON-FRI $0.25 120'
 '09:00AM-05:00PM MON-SAT $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 10:00AM-04:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 10:00AM-06:00PM MON-FRI $0.25 120' nan
 '08:00AM-08:00PM MON-FRI $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 09:30AM-06:00PM MON-FRI $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 09:30AM-04:00PM MON-FR

  
  
  
  
## Step 4: Reorganizing the Data

#### Dropping Columns and Rows
Since we will ultimately be putting this data on a map, we want to drop every row that doesn't include a location. First we drop the columns we won't use; then we call `dropna` with a subset of columns, so that only rows missing `LONGITUDE` or `LATITUDE` are removed.

Syntax:  
*`DataFrame`*`.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`

*`DataFrame`*`.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')`
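The `subset` argument is what keeps `dropna` from throwing away too much: without it, a row is dropped if *any* column is empty. A sketch on a hypothetical three-row dataframe shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical meters: the second row has no coordinates, two rows have no TOW_AWAY value
meters = pd.DataFrame({
    "STREET": ["CAMBRIDGE ST", "NEWBURY ST", "BOYLSTON ST"],
    "TOW_AWAY": [np.nan, np.nan, "YES"],
    "LATITUDE": [42.360431, np.nan, 42.351037],
    "LONGITUDE": [-71.060641, np.nan, -71.073597],
})

# Drop a column by name; columns= does the same thing as passing the labels with axis=1
no_tow = meters.drop(columns=["TOW_AWAY"])

# Drop only the rows whose coordinates are missing
located = no_tow.dropna(subset=["LATITUDE", "LONGITUDE"])

# Without subset, any NaN anywhere drops the row, so only one meter survives
every_column = meters.dropna()
print(located)
```

Both methods return a new dataframe by default (`inplace=False`), so `meters` itself is unchanged.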

In [5]:
edited_columns = boston_meters.drop(['TOW_AWAY', 'G_DISTRICT', 'G_ZONE', 'G_SUBZONE', 'METER_ID'], axis=1)
edited_columns.head()

| | OBJECTID | VENDOR | PAY_POLICY | PARK_NO_PAY | BLK_NO | STREET | LONGITUDE | LATITUDE | BASE_RATE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | IPS | 08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:0... | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ... | SOME | CAMBRIDGE ST | -71.060641 | 42.360431 | 0.25 |
| 1 | 1002 | IPS | 08:00AM-08:00PM MON-SAT $0.25 120 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ... | SOME | CAMBRIDGE ST | -71.060526 | 42.360332 | 0.25 |
| 2 | 1003 | IPS | 08:00AM-08:00PM MON-SAT $0.25 120 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ... | SOME | CAMBRIDGE ST | -71.060479 | 42.360287 | 0.25 |
| 3 | 1004 | IPS | 08:00AM-08:00PM MON-SAT $0.25 120 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ... | SOME | CAMBRIDGE ST | -71.060373 | 42.360188 | 0.25 |
| 4 | 1005 | IPS | 08:00AM-08:00PM MON-SAT $0.25 120 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ... | SOME | CAMBRIDGE ST | -71.06033 | 42.360139 | 0.25 |


In [6]:
# Keep only the rows that have both a longitude and a latitude
no_na = edited_columns.dropna(subset=['LONGITUDE', 'LATITUDE'])

#### Merging Dataframes  

![Merge Types](./data/merges.png)

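We now have one dataframe of Boston meters, but the Charlestown data is split across two files: `charlestown_pay` (payment information) and `charlestown_locations` (meter locations).
First we merge those two into a single Charlestown dataframe, then we add the Charlestown rows underneath the Boston rows so that all the meters are in one dataframe.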


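Syntax: *`dataframe`*`.merge(`*`dataframe_2`*`, how="`*`merge type`*`")`   (`how` can be `"inner"`, `"outer"`, `"left"` or `"right"`; the default is an inner merge. Without an `on=` argument, pandas joins on every column name the two dataframes share.)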
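To see what `how` changes, here is a sketch with two hypothetical tables keyed by `OBJECTID` (the real Charlestown files may share different or additional columns). It also shows `pd.concat`, which stacks rows instead of matching them, the operation used to combine Boston and Charlestown.

```python
import pandas as pd

# Hypothetical stand-ins for charlestown_pay and charlestown_locations
pay = pd.DataFrame({"OBJECTID": [1, 2, 3], "BASE_RATE": [0.25, 0.25, 0.50]})
locations = pd.DataFrame({"OBJECTID": [2, 3, 4], "LATITUDE": [42.37, 42.38, 42.39]})

# No on= is given, so both merges join on the shared column, OBJECTID
inner = pay.merge(locations, how="inner")  # only keys found in both tables: 2 and 3
outer = pay.merge(locations, how="outer")  # keys found in either table: 1 to 4, gaps become NaN

# Stacking one table's rows under another's is a different operation: pd.concat.
# sort=True sorts the combined column names alphabetically.
stacked = pd.concat([pay, locations], sort=True)
print(outer)
```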


In [8]:
#We will merge charlestown_pay & charlestown_locations
charlestown_meters = charlestown_pay.merge(charlestown_locations, how="outer")

In [9]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
all_meters = pd.concat([no_na, charlestown_meters], sort=True)


#### Filtering
Sometimes you only want data with certain attributes. You can filter the data and save the result to a new dataframe, or use a filter to delete rows from the table. Filtering is also useful when you want a count of the rows that match your query.

In [101]:
sun = all_meters[all_meters['PAY_POLICY'].str.contains('SUN', na=False)]
print(sun)

      BASE_RATE BLK_NO G_DISTRICT G_SUBZONE G_ZONE   LATITUDE  LONGITUDE  \
2071       0.25   CLAR        NaN       NaN    NaN  42.351037 -71.073597   

      METER_ID  OBJECTID                                       PARK_NO_PAY  \
2071       NaN        72  00:00AM-08:00AM SUN-SAT, 08:00PM-24:00AM SUN-SAT   

                             PAY_POLICY           STREET  TOW_AWAY   VENDOR  
2071  08:00AM-08:00PM SUN-SAT $0.25 120  BOYLSTON ST C-B       NaN  Parkeon  



![string.contains documentation](./data/str-contains-method.png)  
One way to do this is to check if the cell contains a certain string (remember a string is a sequence of characters).
Syntax: *`dataframe[dataframe['column']`*`.str.contains(`*`'string we are looking for'`*`)]`

This returns every row for which the condition evaluates to True. In the next example we want all the rows that *do not contain* a certain string. We are in luck! We can invert the results by putting `~` in front of the condition, like this: *`dataframe[~dataframe['column']`*`.str.contains(`*`'string'`*`)]`

Other useful string methods include `str.startswith("")` and `str.endswith("")`.
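The `na=False` argument and the `~` operator interact in a way that is easy to miss: a missing value first becomes False ("does not contain"), and `~` then flips it to True, so rows with no data end up *in* the inverted result. A sketch with three hypothetical pay policies:

```python
import pandas as pd

# Hypothetical pay policies; None stands in for a meter with no policy recorded
policies = pd.Series([
    "08:00AM-08:00PM MON-SAT $0.25 120",
    "08:00AM-08:00PM MON-FRI $0.25 120",
    None,
], name="PAY_POLICY")

# na=False turns the missing policy into False instead of NaN
has_sat = policies.str.contains("SAT", na=False)

# ~ flips every value, so the missing policy becomes True and is kept
no_sat = policies[~has_sat]

# A True/False mask can also be counted, since True sums as 1
print(has_sat.sum())  # 1

# startswith and endswith take the same na= argument
starts_8am = policies.str.startswith("08:00AM", na=False)
```

`str.contains` treats its pattern as a regular expression by default, so when searching for characters such as `$` it is safer to pass `regex=False`.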


In [112]:
# Let's find out which meters don't require payment on Saturdays.
# na=False makes missing PAY_POLICY values count as "does not contain 'SAT'",
# so after ~ inverts the mask, meters with no recorded pay policy are included too.

free_saturdays = all_meters[~ all_meters['PAY_POLICY'].str.contains('SAT', na=False)]
print(free_saturdays)

# Renumber the rows 0, 1, 2, ... and keep them as a separate dataframe for geocoding
free_saturday = free_saturdays.reset_index(drop=True)


      BASE_RATE BLK_NO G_DISTRICT G_SUBZONE G_ZONE   LATITUDE  LONGITUDE  \
1222       0.25   CHAN        NaN       NaN    NaN  42.346890 -71.073805   
1381       0.25   EAST        NaN       NaN    NaN  42.342063 -71.067369   
2037       0.25   FAIR        NaN       NaN    NaN  42.349293 -71.083408   
4575       0.25   BEAC        NaN       NaN    NaN  42.354180 -71.076863   
4576       0.25   BEAC        NaN       NaN    NaN  42.354121 -71.076832   
...         ...    ...        ...       ...    ...        ...        ...   
4882       0.25   MARL        NaN       NaN    NaN  42.351200 -71.086933   
4883       0.25   MARL        NaN       NaN    NaN  42.351254 -71.086960   
4884       0.25   MARL        NaN       NaN    NaN  42.351308 -71.086986   
4885       0.25   MARL        NaN       NaN    NaN  42.351362 -71.087013   
4886       0.25   MARL        NaN       NaN    NaN  42.351416 -71.087040   

      METER_ID  OBJECTID                                        PARK_NO_PAY  \
1222    

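#### Reverse Geocoding
Reverse geocoding converts a pair of coordinates into a street address. Before running it on our meters, let's test geopy's Nominatim geocoder (which uses OpenStreetMap data) on a sample coordinate in Manchester, UK.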

In [113]:
locator = Nominatim(user_agent="test")
coordinates = "53.480837, -2.244914"
location = locator.reverse(coordinates)
location.raw

{'place_id': 96393663,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 37139875,
 'lat': '53.4809597',
 'lon': '-2.2450668274629235',
 'display_name': 'Eagle Insurance Buildings, 68, Cross Street, City Centre, Manchester, Greater Manchester, North West England, England, M2 4JG, United Kingdom',
 'address': {'building': 'Eagle Insurance Buildings',
  'house_number': '68',
  'road': 'Cross Street',
  'suburb': 'City Centre',
  'city': 'Manchester',
  'county': 'Greater Manchester',
  'state_district': 'North West England',
  'state': 'England',
  'postcode': 'M2 4JG',
  'country': 'United Kingdom',
  'country_code': 'gb'},
 'boundingbox': ['53.480856', '53.4810634', '-2.2451761', '-2.2449576']}

In [123]:
# Create a "geom" column holding each meter's coordinates as a "latitude,longitude" string
free_saturday["geom"] = free_saturday["LATITUDE"].map(str) + "," + free_saturday["LONGITUDE"].map(str)
free_saturday.head()

| | BASE_RATE | BLK_NO | G_DISTRICT | G_SUBZONE | G_ZONE | LATITUDE | LONGITUDE | METER_ID | OBJECTID | PARK_NO_PAY | PAY_POLICY | STREET | TOW_AWAY | VENDOR | geom | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.25 | CHAN | NaN | NaN | NaN | 42.34689 | -71.073805 | NaN | 2223 | NaN | NaN | COLUMBUS AV | NaN | IPS | 42.34689,-71.07380500000001 | (None, (0.0, 0.0)) |
| 1 | 0.25 | EAST | NaN | NaN | NaN | 42.342063 | -71.067369 | NaN | 2382 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-08:00PM MON-FRI $0.25 120 | WASHINGTON ST | NaN | IPS | 42.342063,-71.067369 | (None, (0.0, 0.0)) |
| 2 | 0.25 | FAIR | NaN | NaN | NaN | 42.349293 | -71.083408 | NaN | 38 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-08:00PM MON-FRI $0.25 120 | NEWBURY ST F-G | NaN | Parkeon | 42.349292999999996,-71.08340799999999 | (None, (0.0, 0.0)) |
| 3 | 0.25 | BEAC | NaN | NaN | NaN | 42.35418 | -71.076863 | NaN | 3576 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-06:00PM MON-FRI $0.25 120 | CLARENDON ST | NaN | IPS | 42.35418,-71.076863 | (None, (0.0, 0.0)) |
| 4 | 0.25 | BEAC | NaN | NaN | NaN | 42.354121 | -71.076832 | NaN | 3577 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-06:00PM MON-FRI $0.25 120 | CLARENDON ST | NaN | IPS | 42.354121,-71.076832 | (None, (0.0, 0.0)) |
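`map(str)` writes out every digit needed to reproduce each stored float, so tiny floating-point differences show up as strings like `-71.07380500000001`. One alternative, sketched here on two hypothetical points, is to format every coordinate with a fixed six decimals (roughly 0.1 m of precision). geopy's `reverse()` also accepts a `(latitude, longitude)` tuple, so building a string is optional.

```python
import pandas as pd

# Hypothetical points using the same column names as free_saturday
points = pd.DataFrame({
    "LATITUDE": [42.34689, 42.342063],
    "LONGITUDE": [-71.073805, -71.067369],
})

# Format each coordinate with six decimals so every string has the same precision
fmt = "{:.6f}".format
points["geom"] = points["LATITUDE"].map(fmt) + "," + points["LONGITUDE"].map(fmt)
print(points["geom"].tolist())  # ['42.346890,-71.073805', '42.342063,-71.067369']

# Alternative: store (latitude, longitude) tuples instead of strings
points["latlon"] = list(zip(points["LATITUDE"], points["LONGITUDE"]))
```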


In [129]:

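# Define the reverse geocoding function.
# Nominatim's usage policy allows at most 1 request per second, so RateLimiter waits 1 s between calls
locator = Nominatim(user_agent="free_parking_saturdays", timeout=10)
reverse_geocode = RateLimiter(locator.reverse, min_delay_seconds=1)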

# Create an address column by calling reverse_geocode on each coordinate string
free_saturday["address"] = free_saturday["geom"].apply(reverse_geocode)

#Check out results
free_saturday.head()

| | BASE_RATE | BLK_NO | G_DISTRICT | G_SUBZONE | G_ZONE | LATITUDE | LONGITUDE | METER_ID | OBJECTID | PARK_NO_PAY | PAY_POLICY | STREET | TOW_AWAY | VENDOR | geom | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.25 | CHAN | NaN | NaN | NaN | 42.34689 | -71.073805 | NaN | 2223 | NaN | NaN | COLUMBUS AV | NaN | IPS | 42.34689,-71.07380500000001 | (290, Columbus Avenue, Chinatown, South End, B... |
| 1 | 0.25 | EAST | NaN | NaN | NaN | 42.342063 | -71.067369 | NaN | 2382 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-08:00PM MON-FRI $0.25 120 | WASHINGTON ST | NaN | IPS | 42.342063,-71.067369 | (38, Savoy Street, Chinatown, South End, Bosto... |
| 2 | 0.25 | FAIR | NaN | NaN | NaN | 42.349293 | -71.083408 | NaN | 38 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-08:00PM MON-FRI $0.25 120 | NEWBURY ST F-G | NaN | Parkeon | 42.349292999999996,-71.08340799999999 | (Beantown Pho & Grill, 272, Newbury Street, Bl... |
| 3 | 0.25 | BEAC | NaN | NaN | NaN | 42.35418 | -71.076863 | NaN | 3576 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-06:00PM MON-FRI $0.25 120 | CLARENDON ST | NaN | IPS | 42.35418,-71.076863 | (285, Clarendon Street, Chinatown, Back Bay, B... |
| 4 | 0.25 | BEAC | NaN | NaN | NaN | 42.354121 | -71.076832 | NaN | 3577 | 00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ... | 08:00AM-06:00PM MON-FRI $0.25 120 | CLARENDON ST | NaN | IPS | 42.354121,-71.076832 | (285, Clarendon Street, Chinatown, Back Bay, B... |
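Nominatim's usage policy allows at most one request per second, so geocoding every meter takes a while. When several meters share the same coordinate string, one option is to look up each distinct string once and copy the answers back to every matching row. In this sketch, `fake_reverse` is a hypothetical stand-in for `reverse_geocode` so the code runs offline, and the coordinates are illustrative.

```python
import pandas as pd

calls = []


def fake_reverse(query):
    """Hypothetical stand-in for reverse_geocode; records each query it receives."""
    calls.append(query)
    return f"address for {query}"


meters = pd.DataFrame({"geom": [
    "42.354180,-71.076863",
    "42.354121,-71.076832",
    "42.354180,-71.076863",  # same spot as the first row
]})

# Geocode each distinct coordinate string exactly once...
lookup = {geom: fake_reverse(geom) for geom in meters["geom"].unique()}

# ...then copy the answers back onto every row that shares that string
meters["address"] = meters["geom"].map(lookup)
print(len(calls))  # 2 lookups for 3 rows
```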



## Step 5: Mapping the Data

#### Initializing the map
Start by creating a map object, specifying where the map should be centered and which basemap to use.

Syntax:  
`folium.Map(location=[`*`latitude, longitude`*`], zoom_start=`*`number`*`, tiles='`*`optional custom tiles`*`')`

Note: higher `zoom_start` values zoom in closer, showing a smaller area in more detail.


In [130]:
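# Center the map on downtown Boston. Stamen tiles are now served by Stadia Maps and may
# not load without an API key, so this uses the CartoDB Positron basemap instead.
meter_map = folium.Map(location=[42.3621, -71.0570], zoom_start=14, tiles='cartodbpositron')

# Add one circle marker for each meter that is free on Saturdays
for _, row in free_saturday.iterrows():
    try:
        location = [float(row['LATITUDE']), float(row['LONGITUDE'])]
        # str() turns the geopy result (or None if no address was found) into popup text
        popup_text = str(row['address'])
        folium.CircleMarker(location=location, radius=10, popup=popup_text,
                            color='#FA8072', fill=True, fill_color='#FA8072').add_to(meter_map)
    except Exception as exception:
        # Report rows that can't be mapped (e.g. missing coordinates) and move on
        print("exception:", exception)

# As the last line of the cell, the map is displayed in the notebook
meter_map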

## Step 6: Exporting Files

Now we want to save the data we have just created. We could save it as a CSV, but what if we need it in another format? Let's write it out as GeoJSON using built-in functionality from the GeoPandas library.  
Syntax:  
*`geo_dataframe`*` = gp.GeoDataFrame(`*`dataframe`*`, geometry=gp.points_from_xy(`*`dataframe`*`.LONGITUDE, `*`dataframe`*`.LATITUDE))`
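Note the argument order: `points_from_xy` takes x before y, so longitude comes first, and GeoJSON stores each point as `[longitude, latitude]`. A sketch on a hypothetical two-meter dataframe, using `to_json()` to inspect the GeoJSON as a string instead of writing a file; setting `crs="EPSG:4326"` (plain latitude/longitude, WGS 84) is an assumption about the data that matches how the coordinates look.

```python
import json

import geopandas as gp
import pandas as pd

# Hypothetical two-meter dataframe using the same column names as the meters data
meters = pd.DataFrame({
    "STREET": ["CAMBRIDGE ST", "COLUMBUS AV"],
    "LONGITUDE": [-71.060641, -71.073805],
    "LATITUDE": [42.360431, 42.346890],
})

# x = longitude, y = latitude; EPSG:4326 declares plain latitude/longitude coordinates
meters_gdf = gp.GeoDataFrame(
    meters,
    geometry=gp.points_from_xy(meters.LONGITUDE, meters.LATITUDE),
    crs="EPSG:4326",
)

# to_json() returns the GeoJSON text; to_file(..., driver="GeoJSON") writes the same to disk
geojson = json.loads(meters_gdf.to_json())
print(geojson["features"][0]["geometry"])
```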

In [131]:
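# points_from_xy takes x (longitude) before y (latitude); EPSG:4326 marks the coordinates as latitude/longitude
geo_dataframe = gp.GeoDataFrame(free_saturdays, geometry=gp.points_from_xy(free_saturdays.LONGITUDE, free_saturdays.LATITUDE), crs="EPSG:4326")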

print(geo_dataframe.head())

      BASE_RATE BLK_NO G_DISTRICT G_SUBZONE G_ZONE   LATITUDE  LONGITUDE  \
1222       0.25   CHAN        NaN       NaN    NaN  42.346890 -71.073805   
1381       0.25   EAST        NaN       NaN    NaN  42.342063 -71.067369   
2037       0.25   FAIR        NaN       NaN    NaN  42.349293 -71.083408   
4575       0.25   BEAC        NaN       NaN    NaN  42.354180 -71.076863   
4576       0.25   BEAC        NaN       NaN    NaN  42.354121 -71.076832   

      METER_ID  OBJECTID                                        PARK_NO_PAY  \
1222       NaN      2223                                                NaN   
1381       NaN      2382  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...   
2037       NaN        38  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...   
4575       NaN      3576  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...   
4576       NaN      3577  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...   

                             PAY_POLICY          STREET  TOW_AWAY   

In [132]:
geo_dataframe.to_file("parking.geojson", driver='GeoJSON')