# 7-Calculate Cluster Names

In this notebook we match the locations that were used to create the clusters against OpenStreetMap data to derive a human-readable name for each cluster. These clusters are mainly street intersections in Ann Arbor, so the names try to reflect that.

**Requirements:**

- Please run the `5-trip-endpoints.ipynb` notebook and its dependencies first.
- Recommended install: [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/user_install.html). Enable using `jupyter nbextension enable --py widgetsnbextension --sys-prefix` for Jupyter Notebook and `jupyter labextension install @jupyter-widgets/jupyterlab-manager` for Jupyter Lab.

In [1]:
import os
import math
import folium
import requests
import json
import random
import numpy as np
import matplotlib.pyplot as plt

from folium.vector_layers import PolyLine, CircleMarker
from h3 import h3
from sqlapi import VedDb
from tqdm.notebook import tqdm
from osm.models import OSMNode, OSMWay, OSMNet
from shapely.geometry import Polygon
from shapely.ops import cascaded_union

Create an object of the `VedDb` type to interface with the database.

In [2]:
db = VedDb()

The function `get_cluster_points` retrieves all the geographic locations that define a particular cluster, or endpoint. Note that the single query parameter is passed wrapped in a list.

In [3]:
def get_cluster_points(cluster):
    sql = """
    select latitude, longitude from cluster_point where cluster_id=?
    """
    return db.query(sql, [cluster])
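
As a minimal sketch of why the parameter is wrapped in a list, the snippet below uses Python's built-in `sqlite3` module, whose `?` placeholders are bound from a sequence of values. It assumes that `VedDb.query` forwards its arguments to a similar DB-API call, which this notebook does not confirm. The in-memory table, its rows, and the `query` helper are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the project's database: an in-memory table
# with the same columns that get_cluster_points reads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cluster_point (cluster_id integer, latitude real, longitude real)"
)
conn.executemany(
    "insert into cluster_point values (?, ?, ?)",
    [(10, 42.28, -83.74), (10, 42.29, -83.75), (11, 42.30, -83.70)],
)


def query(sql, parameters=()):
    """Run a query and return all rows; parameters must be a sequence."""
    return conn.execute(sql, parameters).fetchall()


# One placeholder in the statement, so the sequence holds exactly one value.
points = query(
    "select latitude, longitude from cluster_point where cluster_id=?", [10]
)
print(points)
```

Passing a bare `10` instead of `[10]` would fail in `sqlite3`, because the driver expects a sequence (or mapping) of bindings rather than a single scalar.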

Retrieve all the cluster identifiers from the database.

In [4]:
sql = """
select distinct cluster_id from cluster_point
"""
clusters = [c[0] for c in db.query(sql)]

In this section we calculate the cluster names using data from OpenStreetMap's [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API).

In [5]:
overpass_url = "http://overpass-api.de/api/interpreter"

The code below calculates the bounding boxes for all clusters.

In [6]:
sql = """
select   cluster_id
,        min(latitude)
,        min(longitude)
,        max(latitude)
,        max(longitude)
from     cluster_point
group by cluster_id
"""
# s, w, n, e
bounding_boxes = {bb[0]: (bb[1], bb[2], bb[3], bb[4]) for bb in db.query(sql)}
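
The tuple order matters because Overpass reads a bounding box as (south, west, north, east). The sketch below reproduces the SQL aggregation in plain Python for a few made-up points and shows how the resulting tuple renders when formatted into the query string used later in this notebook. The `bounding_box` helper is hypothetical and not part of the project.

```python
def bounding_box(points):
    """Return (south, west, north, east) for (latitude, longitude) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


# Made-up coordinates, for illustration only.
sample = [(42.25, -83.75), (42.5, -83.5), (42.375, -83.625)]
bbox = bounding_box(sample)

# str.format renders the tuple with parentheses and commas, which is the
# shape of an Overpass bounding-box filter.
query = "[out:json];(way[highway]{0};);(._;>;);out body;".format(bbox)
print(query)
```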

Create a directory for the data sourced from OSM per cluster.

In [7]:
os.makedirs("./data/bbox", exist_ok=True)

Now, iterate over the clusters and retrieve the respective OSM data. If the cache file does not exist, get the data from the OSM API and store it in a cluster-specific file. The conversion magic happens in the `OSMNet` class, in `./osm/models.py`. The function `from_overpass` creates an `OSMNet` object per cluster containing two collections: nodes (`OSMNode`) and ways (`OSMWay`).

In [9]:
c_nets = dict()
c_names = dict()
for cluster in tqdm(clusters):
    file_name = "./data/bbox/bb-{0}.json".format(cluster)
    
    if os.path.isfile(file_name):
        with open(file_name) as f:
            txt = f.read()
            osm_data = json.loads(txt)
    else:
        overpass_query = "[out:json];(way[highway]{0};);(._;>;);out body;".format(bounding_boxes[cluster])
        response = requests.get(overpass_url, params={'data': overpass_query})
        response.raise_for_status()  # Do not cache the body of a failed request
        osm_data = response.json()
        
        with open(file_name, "wt") as f:
            f.write(json.dumps(osm_data))
            
    c_nets[cluster] = OSMNet.from_overpass(osm_data)
        
    points = np.array(get_cluster_points(cluster))
    c_names[cluster] = c_nets[cluster].get_name(points)
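
The cache-or-fetch step in the loop above can also be isolated into a small helper, which makes the pattern easier to test without network access. The following is a sketch only: `load_or_fetch` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the Overpass request.

```python
import json
import os
import tempfile


def load_or_fetch(file_name, fetch):
    """Return cached JSON from file_name, calling fetch() only on a cache miss."""
    if os.path.isfile(file_name):
        with open(file_name) as f:
            return json.load(f)
    data = fetch()
    with open(file_name, "wt") as f:
        json.dump(data, f)
    return data


# Hypothetical fetcher standing in for the Overpass request; it records
# each call so the caching behaviour is visible.
calls = []


def fake_fetch():
    calls.append(1)
    return {"elements": []}


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "bb-10.json")
    first = load_or_fetch(path, fake_fetch)   # miss: calls fake_fetch
    second = load_or_fetch(path, fake_fetch)  # hit: reads the cached file

print(len(calls))
```

In the notebook's loop, the second argument could be a function that wraps the `requests.get(...)` call and returns `response.json()`.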

The `create_map_polygon` function below receives a list of point coordinates and plots them on a map as a closed polygon. We will use it to display the cluster's outer shape.

In [10]:
def create_map_polygon(xy, tooltip='',
                       color='#3388ff',
                       opacity=0.7,
                       fill_color='#3388ff',
                       fill_opacity=0.4, 
                       weight=3):
    points = [[x[0], x[1]] for x in xy]
    polygon = folium.vector_layers.Polygon(locations=points,
                                           tooltip=tooltip,
                                           fill=True,
                                           color=color,
                                           fill_color=fill_color,
                                           fill_opacity=fill_opacity,
                                           weight=weight,
                                           opacity=opacity)
    return polygon

In [20]:
cluster = 10

Retrieve all the H3 hexagon codes pertaining to a given cluster.

In [21]:
sql = """
select h3 from cluster_point where cluster_id = ?
"""
hexes = list({h[0] for h in db.query(sql, [cluster])})

Create the Folium map, as usual.

In [22]:
tiles = "cartodbpositron"
map = folium.Map(prefer_canvas=True, control_scale=True, max_zoom=18)
t = folium.TileLayer(tiles).add_to(map)

Collect the hexagons' boundary coordinates (not the underlying point coordinates), which are used later to calculate the bounding box for the whole cluster, and build one polygon per hexagon.

In [23]:
bb_list = []  # List for the bounding-box calculation
polygons = []
for h in hexes:
    points = h3.h3_to_geo_boundary(h3_address=h)
    xy = [[x[1], x[0]] for x in points]
    xy.append([points[0][1], points[0][0]])
    polygons.append(Polygon(xy))
    bb_list.extend(points)
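
Shapely works with (x, y) coordinates, that is (longitude, latitude), while the hexagon boundary is handled here as (latitude, longitude) pairs, hence the index swap and the repeated first vertex that closes the ring. A minimal sketch of that conversion, using a made-up square instead of a real H3 boundary and a hypothetical `to_closed_ring` helper:

```python
def to_closed_ring(boundary):
    """Swap (lat, lon) pairs to [lon, lat] and repeat the first vertex at the end."""
    ring = [[lon, lat] for lat, lon in boundary]
    ring.append(list(ring[0]))
    return ring


# Made-up square standing in for a hexagon boundary, as (lat, lon) pairs.
boundary = [(42.0, -83.0), (42.0, -82.0), (43.0, -82.0), (43.0, -83.0)]
ring = to_closed_ring(boundary)
print(ring)
```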

Merge all hexagons into a single polygon.

In [24]:
merged = cascaded_union(polygons)

Retrieve the cluster name, based on OSM information.

In [25]:
cluster_name = c_names[cluster]

Place all cluster points on the map.

In [26]:
for point in get_cluster_points(cluster):
    CircleMarker(point, radius=1).add_to(map)

Create and display the cluster polygon on the map.

In [27]:
if merged.geom_type == "MultiPolygon":
    for geom in merged.geoms:
        xy = geom.exterior.coords.xy
        lxy = list(zip(xy[1], xy[0]))
        create_map_polygon(lxy, tooltip=cluster_name).add_to(map)
elif merged.geom_type == "Polygon":
    xy = merged.exterior.coords.xy
    lxy = list(zip(xy[1], xy[0]))

    create_map_polygon(lxy, tooltip=cluster_name).add_to(map)

Finally, fit the map to the bounding box of the hexagons and display it. The cluster name appears in the polygon's tooltip.

In [28]:
locations = np.array(bb_list)
min_lat, max_lat = locations[:, 0].min(), locations[:, 0].max()
min_lon, max_lon = locations[:, 1].min(), locations[:, 1].max()
map.fit_bounds([[min_lat, min_lon], [max_lat, max_lon]])
map