# Collingswood ADA Curb Ramp Analysis
---
Every curb ramp within a half-mile radius of the PATCO train station in Collingswood, NJ was inspected and rated for compliance with the Americans with Disabilities Act of 1990 (ADA).  Reasons to rate a ramp as non-compliant include but are not limited to:

- No detectable warning surface (DWS)/damaged DWS
- Running slope too steep (>5%)
- Cross slope too steep (>2%)
- Misaligned ramp/DWS
- No ramp
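
As a rough illustration of how the checks above combine, the sketch below flags a ramp from a handful of measurements.  The function name, its parameters, and the idea of recording slopes numerically are assumptions made for illustration; the inspection data in this notebook only record a Y/N result and free-text notes.  The 5% and 2% thresholds are the ones listed above, and the list of checks is not exhaustive.

```python
def ramp_issues(has_ramp, has_intact_dws, running_slope_pct, cross_slope_pct, aligned):
    """Return a list of reasons a ramp fails; an empty list means no issue was found."""
    if not has_ramp:
        return ["No ramp"]  # nothing else can be measured without a ramp

    issues = []
    if not has_intact_dws:
        issues.append("No DWS/damaged DWS")
    if running_slope_pct > 5:
        issues.append("Running slope too steep")
    if cross_slope_pct > 2:
        issues.append("Cross slope too steep")
    if not aligned:
        issues.append("Misaligned ramp/DWS")
    return issues


# Hypothetical measurements, not taken from the inspection data
print(ramp_issues(True, True, 4.0, 1.5, True))
print(ramp_issues(True, False, 6.2, 1.0, True))
```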

Over 400 ramps/corners were observed, and more than 280 were deemed non-compliant.  The purpose of this notebook is to accurately map all inspected ramps--down to the corner of the specified intersection, if possible--and depict them in various visualizations to gain any potential geospatial insight from the data.

A possible extension would use the data to plot the most efficient ADA-compliant route from any given point within the half-mile radius to the train station.


Last Update: 8/28/2021

In [None]:
# (seemingly) required for pandas.read_excel()
!pip install openpyxl

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd

from geopandas.tools import geocode
import folium

agent = "my_colls_app"

# Skip
**If Checkpoint \#1 has been reached (below), skip down and run from there.**

Everything between here and Checkpoint \#1:

1. Reads in the raw data
2. Renames and reorders columns
3. Handles NaNs
4. Cleans and standardizes cross-street location data
5. Automatically geocodes as many locations as possible using the Google Maps API
6. Exports a new CSV with most locations geocoded (to minimize API calls)
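
The checkpoint idea behind these steps can be sketched with the standard library alone.  The helper name `load_or_build` and its arguments are hypothetical; the notebook itself does the same thing by hand with `to_csv` and `read_csv`.  The sketch assumes the expensive step returns at least one row, and note that values read back from a CSV are strings.

```python
import csv
from pathlib import Path


def load_or_build(checkpoint_path, build_rows):
    """Load rows from the checkpoint CSV if present; otherwise build and save them."""
    path = Path(checkpoint_path)
    if path.exists():
        with path.open(newline="") as f:
            return list(csv.DictReader(f))

    rows = build_rows()  # the expensive step, e.g. geocoding every intersection
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

On the first call the rows are built and written; on later calls the file is read and `build_rows` is never invoked, which is what keeps the API call count down.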

In [None]:
ramp_path = '../input/rampdatalog/RampDataLog.xlsm'
ramp_df = pd.read_excel(ramp_path, sheet_name='data')
ramp_df.head()

### Data Input
The raw data consist of five columns:
- Cross Street 1: First location identifier
- Cross Street 2: Second location identifier (if applicable)
- Corner: Cardinal direction indicating which corner of the intersection was inspected
- Compliance: "Y" or "N" deeming compliance with ADA
- Notes: Reasons, if any, for marking ramp as compliant or not

In [None]:
# Rename cross street columns to shorten and get rid of spaces
ramp_df.rename(columns={'Cross Street 1': 'CS_1', 'Cross Street 2': 'CS_2'}, inplace=True)
ramp_df.head()

### Locations
While most locations were simply intersections of two cross streets, a handful did not fit into this format so easily.  Some occurred at crosswalks halfway between blocks (Haddon Ave between Washington Ave and Irvin Ave).  Some occurred at non-standard locations (Wawa exit, alleyway).  Others were observed at misaligned four-way intersections (i.e. Maple Ave west of Woodlawn Ave is not in line with Maple Ave east of Woodlawn Ave), hence cardinal corners like "SE (to NE)" and "NW (to SW)."
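
Corner labels such as "SE (to NE)" can be split into their two parts with a small regular expression.  This is only a sketch: `parse_corner` is a hypothetical helper, and it assumes the only corners in the data are NE, NW, SE and SW, optionally followed by a "(to XX)" qualifier.

```python
import re

# Matches "NE" or "SE (to NE)"; the second group is optional
CORNER_RE = re.compile(r"^(NE|NW|SE|SW)(?: \(to (NE|NW|SE|SW)\))?$")


def parse_corner(label):
    """Split a corner label into (corner, linked_corner); linked_corner may be None."""
    match = CORNER_RE.match(label.strip())
    if match is None:
        return None  # leave unrecognized labels for manual review
    return match.group(1), match.group(2)


print(parse_corner("SE (to NE)"))
print(parse_corner(" NE"))
```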

In [None]:
# raw data QA/QC
for col in ramp_df.columns.tolist()[:-1]:
    print(ramp_df[col].unique())

In [None]:
# "NE" appears twice; one with leading whitespace
ramp_df.replace(' NE', 'NE', inplace=True) #correct instance
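# Assign the result back; in-place fillna on a selected column is unreliable in recent pandas
ramp_df['Notes'] = ramp_df['Notes'].fillna('None') #fill blank notes fields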

ramp_df.head(20)

# Data Cleaning
### Unique Street Name Cases

* Two-worded streets (N Atlantic)
* Three-worded streets (S Newton Lake)
* No second cross-street (PATCO Entrance)
* Entries "between" cross-streets (Haddon btw Washington to Irvin)
* Street suffixes already present (Maple Terr)
* Unique second cross-street cases:
    * "alleyway"
    * "Wawa exit"
* Cardinal direction in parentheses in second cross-street (Park (N))

**Assume street suffix is "Ave" unless otherwise specified**
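
The default-suffix rule can be illustrated in isolation.  The helper below is a hypothetical sketch, not the function used in this notebook: it compares whole words against the suffixes observed in the data ("Ave", "Ln", "Terr"), so a street name that merely contains those letters would not be mistaken for one that already has a suffix.

```python
SUFFIXES = {"Ave", "Ln", "Terr"}  # suffixes observed in the inspection data


def with_default_suffix(street):
    """Append "Ave" unless one of the words is already a known suffix."""
    words = street.split()
    if any(word in SUFFIXES for word in words):
        return street
    return street + " Ave"


print(with_default_suffix("N Atlantic"))
print(with_default_suffix("Maple Terr"))
```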

In [None]:
import math

# Combine cross streets into one intersection, if applicable
def cs_comb_str(cs1, cs2):
    """
    Take an intersection's cross streets as arguments and return
    a string of the complete location
    """
    
    inter = ""
    suffixes = ['Ave', 'Ln', 'Terr'] # list of all street suffixes observed
    
    # Add "Ave" suffix before "between" descriptor
    if 'btw' in cs1:
        inter = cs1.split('btw')[0] + 'Ave btw ' + cs2
    # If second location is NaN, use only the first location
    elif not isinstance(cs2, str):
        inter = cs1
    # Add "Ave" suffix before cardinal direction descriptor
    elif '(' in cs2:
        inter = cs1 + ' Ave and ' + cs2.split()[0] + ' Ave ' + cs2.split()[1]
    else:
        # If first location doesn't have a specified suffix, assume "Ave"
        if not any([suf in cs1 for suf in suffixes]):
            cs1 += ' Ave'
        
        # If second location doesn't have a specified suffix, special location, or cardinal identifier, append an "Ave"
        if (not any([suf in cs2 for suf in suffixes]) and
            not any([landmark in cs2 for landmark in ['alleyway', 'exit']]) and
            len(cs2.split()[-1]) > 1):
            cs2 += ' Ave'
        
        inter = ' and '.join([cs1, cs2])
        
    return inter

In [None]:
ramp_df['Inter'] = ramp_df.apply(lambda row: cs_comb_str(row['CS_1'], row['CS_2']), axis=1)

ramp_df.head()

# Geocoding
***Note:*** *This portion of the notebook has been run only **once** to minimize repeat calls to the Google Maps API.*

Using an API key stored as a Kaggle secret, each location of a standard format ("XYZ Ave and ABC Ave") is passed to the Google Maps API via a URL request.  The resulting JSON is parsed to extract the latitude and longitude of each location.  All non-standard locations are assigned NaN for the time being.
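
The request and parsing steps can be sketched without touching the network.  The helper names below are hypothetical, and the sample response is hand-written to mimic only the fields this notebook reads (`results[0].geometry.location`); it is not real API output.  Here `urlencode` is used to escape the address, which is a small departure from joining words with `+` by hand.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(intersection, api_key):
    """Return the request URL for one intersection in Collingswood, NJ."""
    query = urlencode({"address": intersection + ", Collingswood, NJ", "key": api_key})
    return GEOCODE_URL + "?" + query


def extract_lat_lng(resp_json):
    """Return (lat, lng) from a parsed response, or None if there were no results."""
    results = resp_json.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample with made-up coordinates
sample = {"results": [{"geometry": {"location": {"lat": 39.91, "lng": -75.06}}}]}
print(build_geocode_url("Park Ave and Ogden Ave", "YOUR_KEY"))
print(extract_lat_lng(sample))
```

Returning `None` for an empty result list lets the caller record NaN for that location instead of failing partway through the loop.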

In [None]:
# GEOCODE INTERSECTIONS USING GOOGLE MAPS API #
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("api_key")

In [None]:
import requests

lat = []
long = []

url_boiler = 'https://maps.googleapis.com/maps/api/geocode/json?address='
# Ignore any non-standard location formats for automated geocoding
# Plain substring checks are used below, so the parenthesis must not be escaped
unusual_phrases = ['btw', '(', 'alleyway', 'exit']

for intersection in ramp_df.Inter:
    if not any([phrase in intersection for phrase in unusual_phrases]):
        url_address = '+'.join(intersection.split()) + ',+Collingswood,+NJ'
        url_complete = url_boiler + url_address + '&key=' + api_key
        
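        response = requests.get(url_complete)
        resp_json = response.json()
        
        # An unmatched address comes back with an empty "results" list; record NaN instead of raising
        if resp_json.get('results'):
            coords = resp_json['results'][0]['geometry']['location']
            lat.append(coords['lat'])
            long.append(coords['lng'])
        else:
            lat.append(np.nan)
            long.append(np.nan)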
    else:
        lat.append(np.nan)
        long.append(np.nan)

In [None]:
ramp_df["Lat"] = pd.Series(lat)
ramp_df["Long"] = pd.Series(long)

# Rearrange columns
cols = ramp_df.columns.tolist()
cols = cols[:2] + cols[-3:] + cols[2:5]
ramp_df = ramp_df[cols]

In [None]:
# Export as checkpoint so Google Maps API doesn't need to be requested again
ramp_df.to_csv('CollingswoodADA_LatLong_checkpoint.csv', index=False)

# Checkpoint \#1
---
Start from here by importing the cleaned data with latitudes and longitudes, rather than querying the Google Maps API every time.

In [None]:
ramp_df = pd.read_csv('../input/collingswoodada-clean-latlong/CollingswoodADA_LatLong_checkpoint.csv')
ramp_df.head()

### Geocoding Shortcut Functions
The `geo_short()` function is a thin wrapper around the GeoPandas geocoding tool.  It uses Nominatim to query OpenStreetMap for additional (and free) geocoding requests.

*This may be eventually replaced with either a GoogleMaps API request or hardcoding the lat/long if only used for a few locations.  In general, Nominatim has trouble geocoding cross streets and is less accurate than the GoogleMaps API.*

The `basemap_with_buffer()` function establishes a Folium basemap using OpenStreetMap tiles.  The map is centered around a given location, and a circle of a given radius (in miles) is drawn around that point.  This function should be called each time a new map representation is desired (i.e. rather than appending new markers to a current basemap).
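
Because the circle's radius is supplied in meters, the mile-based radius has to be converted first.  A minimal worked example of that conversion, using the exact definition of the international mile (the constant and function names are chosen here for illustration):

```python
METERS_PER_MILE = 1609.344  # exact, by definition of the international mile


def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


# Half-mile buffer radius around the station, in meters
print(miles_to_meters(0.5))
```

Converting through feet (multiply by 5280, divide by 3.28084) gives the same radius to well within a centimeter.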

In [None]:
def geo_short(location):
    """
    Take an address, cross-street, etc. and return a geocoded point from which
    lat/long can be conveniently accessed.  Uses Nominatim.
    """
    pt = geocode(location, provider="nominatim", user_agent=agent)
    return pt.geometry.iloc[0]

In [None]:
def basemap_with_buffer(location, buffer_radius_miles):
    """
    Return a Folium basemap centered on the geocoded location, with a circle
    of the given radius (in miles) drawn around it.
    """
    centerpoint = geo_short(location)
    basemap = folium.Map(location=[centerpoint.y, centerpoint.x], tiles="openstreetmap", zoom_start=15)
    
    buffer_radius_meters = buffer_radius_miles * 5280 / 3.28084 # miles to feet to meters
    folium.Circle(location=[centerpoint.y, centerpoint.x],
                  radius=buffer_radius_meters).add_to(basemap)
    
    return basemap

## Geospatial Data Visualization
Now that latitudes/longitudes have been acquired for most of the curb ramp data, they can be visualized on a map using various tools provided by Folium.

First, the basemap is displayed below, including the half-mile radius centered around the Collingswood PATCO Station.

In [None]:
patco_address = "100 Lees Ave, Collingswood, NJ 08108"
patco_base = basemap_with_buffer(patco_address, 0.5)
patco_base

In [None]:
# Establish a fresh basemap
patco_base = basemap_with_buffer(patco_address, 0.5)

# For each location that's not NaN, add a generic marker to the basemap.
# Enable the hover functionality for intersection data for QA/QC purposes.
for i, location in ramp_df.iterrows():
    if not np.isnan(location.Lat) and not np.isnan(location.Long):
        folium.Marker(location=[location.Lat, location.Long],tooltip=location.Inter).add_to(patco_base)

patco_base

### Location Markers
As can be seen above, most of the locations have been accurately mapped, but a small handful appear clearly out of place, as evidenced by their distance outside the circle.  Hovering over these points, one can see which intersection the API was *trying* to geocode.  It appears that most of the problem points are ones involving Park Avenue.

*Note that a few markers are correctly mapped outside of the circle but only because the inspection was conservative in its data collection.*
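
Out-of-place points could also be flagged numerically rather than by eye.  The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the helper names, the tolerance parameter, and the center coordinates are assumptions for illustration (the center is a placeholder, not the geocoded station location).

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius, a common approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_buffer(center, point, radius_m, tolerance_m=0.0):
    """True if the point lies farther from the center than radius + tolerance."""
    return haversine_m(center[0], center[1], point[0], point[1]) > radius_m + tolerance_m


center = (39.9135, -75.0645)        # placeholder center, for illustration only
point = (39.9103431, -75.0586094)   # a coordinate pair that appears later in this notebook
print(outside_buffer(center, point, radius_m=804.672, tolerance_m=100))
```

A nonzero tolerance leaves room for the ramps that were deliberately inspected just beyond the half-mile circle.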

Another way of visualizing the data is with a Folium plugin known as `MarkerCluster()`.  Depending on the zoom level, this tool groups nearby markers together, labels each group with its marker count, and color-codes the group by that count.  This representation is also more complete, as *all* data points remain reachable, whereas in the marker map above, markers placed on the same spot overlap so that only one is visible.  Click on any cluster to have it display the markers it contains.
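
To see how many records share a single coordinate, and are therefore hidden behind one another on a plain marker map, the points can be counted directly.  This is a sketch with a hypothetical helper name; the sample points reuse coordinate pairs that appear later in this notebook, duplicated here purely for illustration.

```python
from collections import Counter


def colocated_counts(points, ndigits=6):
    """Count how many records share each (lat, lng) after rounding."""
    return Counter((round(lat, ndigits), round(lng, ndigits)) for lat, lng in points)


sample_points = [
    (39.9103431, -75.0586094),
    (39.9103431, -75.0586094),  # same intersection, different corner
    (39.9109968, -75.0606443),
]
for coords, count in colocated_counts(sample_points).items():
    print(coords, count)
```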

In [None]:
from folium.plugins import MarkerCluster

# Establish a fresh basemap so as not to overlap previous markers
patco_base = basemap_with_buffer(patco_address, 0.5)
marker_cluster = MarkerCluster()

for i, location in ramp_df.iterrows():
    if not np.isnan(location.Lat) and not np.isnan(location.Long):
        marker_cluster.add_child(folium.Marker([location.Lat, location.Long], tooltip=location.Inter))

patco_base.add_child(marker_cluster)
patco_base

### Preliminary Analysis
The simplest way of gaining even a little insight out of the data is to color-code the mapped locations based on their compliance with ADA.  The code cell below maps locations as circles: green if compliant (Y) and red if not (N).  From here, any groupings or patterns can be observed.

These data can and must be improved, however.  Not every ramp is displayed here: only one per location is currently portrayed.  Further, the locations with non-standard formats, as explained above, have still not been included.  We will come back to this after some more data cleaning.
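
One possible way to show every ramp at an intersection, offered only as a sketch and not as what this notebook does, is to nudge each point a few meters toward its recorded corner.  The helper name, the 8-meter default, and the flat-earth conversion factor are all assumptions, and the approach presumes streets run roughly north-south and east-west, which may not hold everywhere in town.

```python
import math

CORNER_SIGNS = {"NE": (1, 1), "NW": (1, -1), "SE": (-1, 1), "SW": (-1, -1)}  # (north, east)
METERS_PER_DEG_LAT = 111320  # rough average length of one degree of latitude


def offset_for_corner(lat, lng, corner, meters=8):
    """Shift a point a few meters toward the given corner of the intersection."""
    north, east = CORNER_SIGNS[corner]
    dlat = north * meters / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    dlng = east * meters / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lng + dlng


print(offset_for_corner(39.9103431, -75.0586094, "NE"))
print(offset_for_corner(39.9103431, -75.0586094, "SW"))
```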

In [None]:
patco_base = basemap_with_buffer(patco_address, 0.5)

for i, location in ramp_df.iterrows():
    if not np.isnan(location.Lat) and not np.isnan(location.Long):
        color = "green" if location.Compliance == "Y" else "red"
        folium.Circle(location=[location.Lat, location.Long], radius=10,
                      color=color, tooltip=location.Inter).add_to(patco_base)

patco_base

In [None]:
# THIS IS TO BE UPDATED TO ACCOUNT FOR ALL MISCODED LOCATIONS #

# Only non-compliant locations are displayed below.  Miscoded locations seen above have been hard-coded
# with their respective latitudes and longitudes for simplicity purposes.
patco_base = basemap_with_buffer(patco_address, 0.5)

# (CS_1, CS_2) -> corrected (lat, lng) for intersections the API misplaced
coord_fixes = {
    ("Park", "Cuthbert"): (39.9103431, -75.0586094),
    ("Park", "Ogden"): (39.9109968, -75.0606443),
    ("Park", "Conard"): (39.9115682, -75.062489),
    ("Laurel", "Lincoln"): (39.9201176, -75.0627509),
    ("Cuthbert", "Lindisfarne"): (39.9121536, -75.0576633),
}

for i, location in ramp_df.iterrows():
    if not np.isnan(location.Lat) and not np.isnan(location.Long) and location.Compliance == "N":
        # Fall back to the geocoded coordinates when no correction is listed,
        # so no point reuses stale values from a previous iteration
        lat, lng = coord_fixes.get((location.CS_1, location.CS_2), (location.Lat, location.Long))
        
        # Every location drawn here is non-compliant, so the color is always red
        folium.Circle(location=[lat, lng], radius=10,
                      color="red", tooltip=location.Inter).add_to(patco_base)

patco_base

# Sandbox
Below are cells of code acting as notes for future developments of the dataset's analysis.

In [None]:
dummy = ["951 Oriental Ave", "103 E Linden Ave", "100 E Homestead Ave", "101 E Stiles Ave", "900 Haddon Ave"]
dummy = [address + ", Collingswood, NJ 08108" for address in dummy]

for add in dummy:
    add_pt = geo_short(add)
    folium.Circle(location=[add_pt.y, add_pt.x], radius=5, tooltip=add.split(',')[0], color='red').add_to(patco_base)

# Geocode each endpoint once rather than once per coordinate
start_pt, end_pt = geo_short(dummy[1]), geo_short(dummy[2])
folium.PolyLine([[start_pt.y, start_pt.x], [end_pt.y, end_pt.x]]).add_to(patco_base)

patco_base

In [None]:
# Geocode each address once instead of separately for Lat and Long
dummy_pts = [geo_short(dum) for dum in dummy]

dummy_df = pd.DataFrame({
    'Address_Long': dummy,
    'Address_Short': [dum.split(',')[0] for dum in dummy],
    'Town': [dum.split(',')[1].strip() for dum in dummy],
    'State': [dum.split(',')[2].split(' ')[1] for dum in dummy],
    'Zip': [dum.split(',')[2].split(' ')[2] for dum in dummy],
    'Lat': [pt.y for pt in dummy_pts],
    'Long': [pt.x for pt in dummy_pts]
})

# The geocoded lat/long values are WGS 84 coordinates, so record the CRS explicitly
dummy_gdf = gpd.GeoDataFrame(dummy_df, geometry=gpd.points_from_xy(dummy_df.Long, dummy_df.Lat), crs="EPSG:4326")

In [None]:
from shapely.geometry import Point, LineString