# Estimate Bus Stops

My first thought for a solution was clustering, but I wanted more than a discrete estimate: a probability distribution that tells us how likely it is that a bus stop is at a given location. I used an adaptation of the [Gaussian kernel density estimator](https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation) from SciPy to estimate this distribution from the points. I needed to adapt the algorithm because I wanted to be able to adjust a point's importance by certain attributes (e.g. accuracy).
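
To make the idea of a weighted kernel density estimate concrete, here is a minimal sketch. It is not the code of the estimation package: the function name `weighted_gaussian_kde`, the isotropic kernel and the fixed `bandwidth` are assumptions chosen for illustration only.

```python
import numpy as np


def weighted_gaussian_kde(points, weights, grid, bandwidth):
    """Evaluate a weighted Gaussian KDE at the given grid locations.

    Hypothetical helper for illustration, not part of the package.
    points:    (n, 2) array of sample coordinates
    weights:   (n,) array of non-negative point weights
    grid:      (m, 2) array of locations at which to evaluate the density
    bandwidth: standard deviation of the isotropic Gaussian kernel
    """
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Squared distances between every grid location and every sample: (m, n)
    diff = grid[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)

    # Isotropic 2-D Gaussian kernel, normalized to integrate to 1
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)

    # Weighted sum of the kernels centered on the samples
    return kernel @ weights


# Toy data: two samples, the second one weighted three times as heavily
samples = np.array([[0.0, 0.0], [1.0, 0.0]])
density = weighted_gaussian_kde(samples, [1.0, 3.0], samples, bandwidth=0.2)
```

In this toy example the density at the second sample comes out higher than at the first, because its kernel contributes three times as much. With equal weights the estimate reduces to an ordinary kernel density estimate.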

This notebook visualizes the results and shows how to use the estimation package.

First we load the data and calculate the weights:

In [None]:
import matplotlib
import folium
from folium import plugins
import numpy as np
import dalladalla.dallacrowd
import geopandas

points = geopandas.read_file('data/activity_points.geojson')
routes = geopandas.read_file('data/routes.geojson')

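# use the reported accuracy of each point as its weight
weights = np.array(points.accuracy, dtype=float)
#weights = 1/weights
# min-max normalize the weights to the range [0, 1]
weights = (weights - weights.min()) / (weights.max() - weights.min())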

Now we can estimate the probability and the predicted bus stops:

In [2]:
estimator = dalladalla.dallacrowd.Estimator(points.geometry, weights)
bus_stops = estimator.estimate_stops(weighted=True, resolution=200, snap_to_street=False)

We can use `folium` to visualize the results. Estimated locations are marked by standard markers and datapoints are shown by polygon markers. The estimated probability is shown by an overlay.

In [None]:
%matplotlib inline

In [8]:
from matplotlib import cm

center = (estimator.boundaries[[2,3]].mean(),
          estimator.boundaries[[0,1]].mean())

map = folium.Map(location=center)

# add a standard marker for every estimated bus stop
for stop in bus_stops:
    map.add_child(folium.Marker(stop))

# add the raw data points as green polygon markers in a clustered view
coordinates = np.array([p.xy for p in points.geometry])
mc = folium.MarkerCluster(coordinates)
for coord in coordinates:
    # geometries store (lon, lat); folium expects (lat, lon)
    m = folium.RegularPolygonMarker((coord[1],coord[0]), weight=1, opacity=0.4, color='green')
    mc.add_child(m)

map.add_child(mc)

for route in routes.geometry:
    coordinate_list = np.array([route.xy[1],route.xy[0]]).T.tolist()
    map.add_child(folium.PolyLine(locations=coordinate_list))


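# copy the Reds colormap and make its lowest entry fully transparent,
# so that areas with the lowest probability do not hide the map
import copy
red_alpha = copy.copy(cm.Reds)
red_alpha._init()
red_alpha._lut[0,3] = 0

# rescale the estimated probabilities to [0, 1] and add them as an image overlay
probabilities = estimator.stop_prob
normed_data = (probabilities - probabilities.min()) / (probabilities.max() - probabilities.min())
bounds = estimator.boundaries
# reorder the boundaries into two corner points in (lat, lon) order
bounds = np.array([bounds[[2,0]], bounds[[3,1]]]).tolist()
map.add_child(plugins.ImageOverlay(red_alpha(normed_data), bounds, opacity=.8, origin='lower'))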
map

# Down-weight points farther away from the routes

I wasn't too convinced by the results, so I added the distance to the given routes as a weighting factor.

In [9]:
distance = np.array(estimator.route_dist(routes.geometry))
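
The implementation of `route_dist` is not shown in this notebook. As an illustration of the underlying idea, the following sketch computes the planar distance from each point to its nearest route, treating routes as polylines. The helper names are hypothetical, and the distance is measured in coordinate units (degrees for lon/lat data), which is an assumption about what the package returns.

```python
import numpy as np


def point_to_polyline_distance(point, polyline):
    """Smallest Euclidean distance from a point to a polyline.

    Hypothetical helper for illustration, not part of the package.
    point:    (2,) coordinate
    polyline: (k, 2) array of consecutive vertices, k >= 2
    """
    p = np.asarray(point, dtype=float)
    line = np.asarray(polyline, dtype=float)
    start, end = line[:-1], line[1:]
    seg = end - start
    seg_len_sq = (seg ** 2).sum(axis=1)
    # Avoid division by zero for degenerate (zero-length) segments
    safe_len_sq = np.where(seg_len_sq > 0, seg_len_sq, 1.0)
    # Relative position of the projection of p on each segment, clamped to [0, 1]
    t = np.clip(((p - start) * seg).sum(axis=1) / safe_len_sq, 0.0, 1.0)
    closest = start + t[:, None] * seg
    return np.sqrt(((p - closest) ** 2).sum(axis=1)).min()


def route_distances(points, polylines):
    """Distance from every point to its nearest polyline."""
    return np.array([
        min(point_to_polyline_distance(p, line) for line in polylines)
        for p in points
    ])


# Toy data: one horizontal and one vertical route, three points
toy_routes = [np.array([[0.0, 0.0], [2.0, 0.0]]),
              np.array([[0.0, 3.0], [0.0, 5.0]])]
toy_points = np.array([[1.0, 1.0], [3.0, 0.0], [0.5, 4.0]])
toy_dist = route_distances(toy_points, toy_routes)  # -> [1.0, 1.0, 0.5]
```

For real-world distances in meters, the coordinates would first have to be projected to a metric coordinate system.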

In [10]:
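# scale the distances so that the closest point maps to 0 and a distance of 0.01
# (in the units returned by route_dist) maps to 1; cap everything beyond that at 1
d = (distance - distance.min()) / (0.01 - distance.min())
d[d > 1] = 1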

In [11]:
# average the accuracy-based weight with the route proximity (1 - d)
estimator.weights = (estimator.weights + (1-d))/2
bus_stops = estimator.estimate_stops(weighted=True, resolution=200, snap_to_street=False)
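
The arithmetic of the cells above can be checked on a few made-up numbers. The function names and the toy values below are hypothetical; the cap of `0.01` is the one used above, and the sketch assumes that the smallest distance is below the cap.

```python
import numpy as np


def route_proximity_score(distance, cap=0.01):
    """Map distances to [0, 1]: 1 for the closest point, 0 at or beyond cap.

    Hypothetical helper; assumes distance.min() < cap.
    """
    distance = np.asarray(distance, dtype=float)
    d = (distance - distance.min()) / (cap - distance.min())
    return 1.0 - np.clip(d, 0.0, 1.0)


def combine_weights(accuracy_weight, distance, cap=0.01):
    """Average the accuracy-based weight with the route-proximity score."""
    accuracy_weight = np.asarray(accuracy_weight, dtype=float)
    return (accuracy_weight + route_proximity_score(distance, cap)) / 2


# Toy values: three points with increasing distance to the nearest route
toy_weights = np.array([0.2, 0.8, 1.0])
toy_distance = np.array([0.0, 0.005, 0.02])
combined = combine_weights(toy_weights, toy_distance)  # -> [0.6, 0.65, 0.5]
```

The third point has the highest accuracy weight but lies beyond the cap, so its proximity score is 0 and its combined weight drops to 0.5.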

In [12]:
map = folium.Map(location=center)

# add a standard marker for every estimated bus stop
for stop in bus_stops:
    map.add_child(folium.Marker(stop))

# add the raw data points as green polygon markers in a clustered view
coordinates = np.array([p.xy for p in points.geometry])
mc = folium.MarkerCluster(coordinates)
for coord in coordinates:
    # geometries store (lon, lat); folium expects (lat, lon)
    m = folium.RegularPolygonMarker((coord[1],coord[0]), weight=1, opacity=0.4, color='green')
    mc.add_child(m)

map.add_child(mc)

for route in routes.geometry:
    coordinate_list = np.array([route.xy[1],route.xy[0]]).T.tolist()
    map.add_child(folium.PolyLine(locations=coordinate_list))


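# copy the Reds colormap and make its lowest entry fully transparent,
# so that areas with the lowest probability do not hide the map
import copy
red_alpha = copy.copy(cm.Reds)
red_alpha._init()
red_alpha._lut[0,3] = 0

# rescale the estimated probabilities to [0, 1] and add them as an image overlay
probabilities = estimator.stop_prob
normed_data = (probabilities - probabilities.min()) / (probabilities.max() - probabilities.min())
bounds = estimator.boundaries
# reorder the boundaries into two corner points in (lat, lon) order
bounds = np.array([bounds[[2,0]], bounds[[3,1]]]).tolist()
map.add_child(plugins.ImageOverlay(red_alpha(normed_data), bounds, opacity=.8, origin='lower'))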
map

This slightly changes the picture, but doesn't improve it very much.

# Ground Truth

I noticed that bus stops are already mapped on OSM, so this is the "true" result I should be able to predict. I used the Overpass API to extract the known bus stops from OSM and display them on the map:

In [16]:
import overpass

api = overpass.API()
query = 'node["name"="Dar es Salaam"];node(around:15000)["highway"="bus_stop"];node(around:10000)["amenity"="bus_station"];'
true_stops = api.Get(query, asGeoJSON=True)

In [14]:
for stop in true_stops['features']:
    coord = stop["geometry"]["coordinates"]
    m = folium.RegularPolygonMarker((coord[1],coord[0]), weight=.5, opacity=1, color='red', fill_color='red')
    map.add_child(m)


In [15]:
map

This shows quite plainly that, with the given amount of data, we are still far from a good prediction in most cases. The method seems promising for the stops in the center, for which we have enough data. It would be good to filter out data that is probably not caused by bus traffic before starting the estimation. This could be based on the other features we have for the data points.
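
The visual comparison could also be turned into a number, for example the distance from every known stop to the closest predicted stop. The sketch below is one possible way to do this and was not part of the original analysis. The function names are hypothetical, and it assumes that both inputs are arrays of `(lat, lon)` pairs in degrees, so the `(lon, lat)` GeoJSON coordinates of the OSM stops would have to be swapped first.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1 = np.radians(lat1), np.radians(lon1)
    lat2, lon2 = np.radians(lat2), np.radians(lon2)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def nearest_prediction_distance(true_latlon, predicted_latlon):
    """For every true stop, the distance in meters to the closest prediction."""
    true_latlon = np.asarray(true_latlon, dtype=float)
    predicted_latlon = np.asarray(predicted_latlon, dtype=float)
    # Pairwise distances between true stops (rows) and predictions (columns)
    dist = haversine_m(true_latlon[:, None, 0], true_latlon[:, None, 1],
                       predicted_latlon[None, :, 0], predicted_latlon[None, :, 1])
    return dist.min(axis=1)


# Made-up coordinates: the second known stop lies 0.01 degrees of latitude
# away from the only prediction
toy_true = np.array([[-6.80, 39.28], [-6.81, 39.28]])
toy_predicted = np.array([[-6.80, 39.28]])
errors = nearest_prediction_distance(toy_true, toy_predicted)
```

A summary such as the median of these distances, or the share of known stops with a prediction within some radius, would make it easier to compare different weighting schemes.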