# 7-Cluster Naming

In this notebook we match the locations that were used to create the clusters against OpenStreetMap's data in order to derive a human-readable name for each cluster. These clusters are mainly street intersections in Ann Arbor, so the names will try to reflect that.

**Requirements:**

- Please run the `06-cluster-geofencing.ipynb` notebook first and its dependencies.
- Recommended install: [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/user_install.html). Enable using `jupyter nbextension enable --py widgetsnbextension --sys-prefix` for Jupyter Notebook and `jupyter labextension install @jupyter-widgets/jupyterlab-manager` for Jupyter Lab.

In [None]:
import os
import math
import folium
import requests
import json
import random
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets

from folium.vector_layers import PolyLine, CircleMarker
from h3 import h3
from sqlapi import VedDb
from tqdm.auto import tqdm
from osm.models import OSMNode, OSMWay, OSMNet
from shapely.geometry import Polygon
from shapely.ops import cascaded_union
from ipywidgets import interact, interactive, fixed, interact_manual

Create an object of the `VedDb` type to interface with the database.

In [None]:
db = VedDb()

The function `get_cluster_points` retrieves all the geographic locations that define a particular cluster, or endpoint. Note how the single parameter is passed enclosed in a list.

In [None]:
def get_cluster_points(cluster):
    sql = "select latitude, longitude from cluster_point where cluster_id=?"
    return db.query(sql, [cluster])
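The `VedDb` wrapper is not shown in this notebook, so as a rough illustration of the same parameter-passing convention, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The helper name `fetch_cluster_points` and the sample rows are made up for the example; the only point is that a single `?` placeholder still needs a one-element sequence.

```python
import sqlite3

# In-memory stand-in for the real database (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cluster_point (cluster_id integer, latitude real, longitude real)"
)
conn.executemany(
    "insert into cluster_point values (?, ?, ?)",
    [(0, 42.28, -83.74), (0, 42.29, -83.75), (1, 42.30, -83.70)],
)


def fetch_cluster_points(conn, cluster):
    """Hypothetical equivalent of get_cluster_points for plain sqlite3."""
    sql = "select latitude, longitude from cluster_point where cluster_id=?"
    # The parameters argument must be a sequence, even for a single value.
    return conn.execute(sql, [cluster]).fetchall()


points = fetch_cluster_points(conn, 0)
print(points)
```

Passing the bare value instead of `[cluster]` would fail, because the driver expects a sequence (or mapping) of parameters rather than a scalar.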

Retrieve all the cluster identifiers from the database.

In [None]:
sql = "select distinct cluster_id from cluster_point"
clusters = [c[0] for c in db.query(sql)]

## Querying OpenStreetMap Data

In this section we calculate the cluster names using data from OpenStreetMap's [Overpass API](https://wiki.openstreetmap.org/wiki/Overpass_API).

We start by calculating all the clusters' bounding boxes, using the code below. Each bounding box delimits the query to OSM's API to get all the relevant geographic information.

In [None]:
sql = """
select   cluster_id
,        min(latitude)
,        min(longitude)
,        max(latitude)
,        max(longitude)
from     cluster_point
group by cluster_id
"""
# s, w, n, e
bounding_boxes = {bb[0]: (bb[1], bb[2], bb[3], bb[4]) for bb in db.query(sql)}
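For readers who prefer to see the aggregation spelled out, here is a minimal plain-Python sketch of what the `group by` query computes. The rows and the `sample_boxes` name are invented for illustration; the tuple order mirrors the `s, w, n, e` comment above.

```python
from collections import defaultdict

# Invented (cluster_id, latitude, longitude) rows, for illustration only.
rows = [(0, 42.28, -83.74), (0, 42.29, -83.75), (1, 42.30, -83.70)]

# Group the points by cluster identifier.
grouped = defaultdict(list)
for cluster_id, lat, lon in rows:
    grouped[cluster_id].append((lat, lon))

# Reduce each group to (south, west, north, east).
sample_boxes = {}
for cluster_id, pts in grouped.items():
    lats = [p[0] for p in pts]
    lons = [p[1] for p in pts]
    sample_boxes[cluster_id] = (min(lats), min(lons), max(lats), max(lons))

print(sample_boxes)
```

A cluster with a single point collapses to a zero-area box, which is worth keeping in mind when the box is later used as a query filter.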

Create a directory for the data sourced from OSM per cluster.

In [None]:
os.makedirs("./data/bbox", exist_ok=True)

Now, iterate over the clusters and retrieve the respective OSM data. If the cache file does not exist, get the data from the OSM API and store it in a cluster-specific file. The conversion magic happens in the `OSMNet` class, in `./osm/models.py`. The function `from_overpass` creates an `OSMNet` object per cluster containing two collections: nodes (`OSMNode`) and ways (`OSMWay`).

The `get_osm_data` function retrieves the OSM data associated with a cluster while abstracting away the file cache.

In [None]:
def get_osm_data(cluster, overpass_url="http://overpass-api.de/api/interpreter"):
    file_name = "./data/bbox/bb-{0}.json".format(cluster)

    if os.path.isfile(file_name):
        # Cache hit: read the previously downloaded response.
        with open(file_name) as f:
            osm_data = json.load(f)
    else:
        # The bounding box tuple formats as "(s, w, n, e)", the Overpass bbox filter.
        overpass_query = "[out:json];(way[highway]{0};);(._;>;);out body;".format(bounding_boxes[cluster])
        response = requests.get(overpass_url, params={'data': overpass_query})
        # Fail early on HTTP errors so that a bad response is never cached.
        response.raise_for_status()
        osm_data = response.json()

        with open(file_name, "wt") as f:
            json.dump(osm_data, f)
    return osm_data
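To make the query string easier to read, the sketch below builds it step by step. The helper name `build_overpass_query` and the coordinates are hypothetical. In Overpass QL, `way[highway](s,w,n,e)` selects the ways tagged `highway` inside the bounding box, `(._;>;)` adds the nodes those ways refer to, and `out body` prints the result.

```python
def build_overpass_query(bbox):
    """Build an Overpass QL query for all highways inside a bounding box.

    bbox is a (south, west, north, east) tuple in decimal degrees.
    """
    south, west, north, east = bbox
    bbox_filter = "({0},{1},{2},{3})".format(south, west, north, east)
    # Ways tagged as highways in the box, plus the nodes they reference.
    return "[out:json];(way[highway]{0};);(._;>;);out body;".format(bbox_filter)


# Invented coordinates near Ann Arbor, for illustration only.
query = build_overpass_query((42.27, -83.75, 42.28, -83.74))
print(query)
```

Formatting the four values explicitly avoids relying on how Python happens to print a tuple.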

In [None]:
c_nets = dict()    # OSMNet dict
c_names = dict()   # Cluster name dict
for cluster in tqdm(clusters):
    osm_data = get_osm_data(cluster)

    c_nets[cluster] = OSMNet.from_overpass(osm_data)
        
    points = np.array(get_cluster_points(cluster))
    c_names[cluster] = c_nets[cluster].get_name(points)
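The `OSMNet` implementation lives in `./osm/models.py` and is not reproduced here. As a rough idea of what splitting an Overpass JSON response into nodes and ways can look like, here is a minimal sketch; `split_overpass_elements` and the sample payload are assumptions for illustration and are not the notebook's actual code.

```python
def split_overpass_elements(osm_data):
    """Split an Overpass JSON response into node and way dictionaries."""
    nodes = {}
    ways = {}
    for element in osm_data.get("elements", []):
        if element["type"] == "node":
            # Nodes carry their own coordinates.
            nodes[element["id"]] = (element["lat"], element["lon"])
        elif element["type"] == "way":
            # Ways list node identifiers and may carry a street name tag.
            ways[element["id"]] = {
                "nodes": element.get("nodes", []),
                "name": element.get("tags", {}).get("name"),
            }
    return nodes, ways


# Invented payload in the shape returned by "out body".
sample = {
    "elements": [
        {"type": "node", "id": 1, "lat": 42.28, "lon": -83.74},
        {"type": "node", "id": 2, "lat": 42.29, "lon": -83.74},
        {"type": "way", "id": 10, "nodes": [1, 2],
         "tags": {"highway": "residential", "name": "Example Street"}},
    ]
}
nodes, ways = split_overpass_elements(sample)
print(len(nodes), ways[10]["name"])
```

Not every way has a `name` tag, which is why the sketch falls back to `None` rather than raising.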

The `create_map_polygon` function below receives a list of point coordinates and plots them on a map as a closed polygon. We will use it to display the cluster's outer shape.

In [None]:
def create_map_polygon(xy, tooltip='',
                       color='#3388ff',
                       opacity=0.7,
                       fill_color='#3388ff',
                       fill_opacity=0.4, 
                       weight=3):
    points = [[x[0], x[1]] for x in xy]
    polygon = folium.vector_layers.Polygon(locations=points,
                                           tooltip=tooltip,
                                           fill=True,
                                           color=color,
                                           fill_color=fill_color,
                                           fill_opacity=fill_opacity,
                                           weight=weight,
                                           opacity=opacity)
    return polygon

In [None]:
def get_cluster_hexes(cluster_id):
    sql = "select h3 from cluster_point where cluster_id = ?"
    hexes = list({h[0] for h in db.query(sql, [cluster_id])})
    return hexes

In [None]:
def get_hexagon(h):
    geo_lst = [p for p in h3.h3_to_geo_boundary(h)]
    geo_lst.append(geo_lst[0])
    return np.array(geo_lst)

In [None]:
def show_named_cluster_map(cluster_id):
    html_map = folium.Map(prefer_canvas=True, control_scale=True, 
                          max_zoom=18, tiles="cartodbpositron")
    
    bb_list = []  # List for the bounding-box calculation
    polygons = []
    hexes = get_cluster_hexes(cluster_id)
    for h in hexes:
        points = get_hexagon(h)
        xy = [[x[1], x[0]] for x in points]
        xy.append([points[0][1], points[0][0]])
        polygons.append(Polygon(xy))
        bb_list.extend(points)
        
    merged = cascaded_union(polygons)
    cluster_name = c_names[cluster_id]
    
    for point in get_cluster_points(cluster_id):
        CircleMarker(point, radius=1).add_to(html_map)
    
    if merged.geom_type == "MultiPolygon":
        for geom in merged.geoms:
            xy = geom.exterior.coords.xy
            lxy = list(zip(xy[1], xy[0]))
            create_map_polygon(lxy, tooltip=cluster_name).add_to(html_map)
    elif merged.geom_type == "Polygon":
        xy = merged.exterior.coords.xy
        lxy = list(zip(xy[1], xy[0]))

        create_map_polygon(lxy, tooltip=cluster_name).add_to(html_map)
        
    locations = np.array(bb_list)
    min_lat, max_lat = locations[:, 0].min(), locations[:, 0].max()
    min_lon, max_lon = locations[:, 1].min(), locations[:, 1].max()
    html_map.fit_bounds([[min_lat, min_lon], [max_lat, max_lon]])
    return html_map

Get the number of clusters from the database (_I could have used the length of the_ `bounding_boxes` _dictionary, I know..._)

In [None]:
sql = "select count(distinct cluster_id) from cluster_point"
cluster_count = db.query_scalar(sql)

Let's see the result of our efforts in an interactive fashion!

In [None]:
ii = interact(show_named_cluster_map, cluster_id=widgets.IntSlider(min=0, max=cluster_count-1, step=1, value=0))

## Serialize Clusters to the Database

We can now serialize the cluster names to the database. As before, we create a new table named `cluster` with the cluster identifier and its newfound name. If the table already exists, we merely delete all its contents.

In [None]:
if not db.table_exists("cluster"):
    sql = """
    CREATE TABLE cluster (
        cluster_id      INTEGER PRIMARY KEY ASC,
        cluster_name    TEXT
    )
    """
    db.execute_sql(sql)
else:
    db.execute_sql("DELETE FROM cluster")

Collect all the cluster identifiers and their names into a list of tuples, the format expected by the bulk insertion process.

In [None]:
cluster_data = [(c, c_names[c]) for c in range(cluster_count)]

Now insert the data into the database using a pre-stored insert statement. You can find the statement definition under the `/db/sql/cluster` folder.

In [None]:
db.insert_list("cluster/insert", cluster_data)
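The stored statement and the `VedDb` helpers are not shown in this notebook. Assuming the backing store is SQLite, the create-or-clear step and the bulk insert could be sketched with the standard `sqlite3` module as follows. The helper names, the insert statement text, and the cluster names are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_exists(conn, name):
    """Check the SQLite catalog for a table with the given name."""
    sql = "select count(*) from sqlite_master where type='table' and name=?"
    return conn.execute(sql, [name]).fetchone()[0] > 0


def reset_cluster_table(conn):
    """Create the cluster table, or empty it if it is already there."""
    if not table_exists(conn, "cluster"):
        conn.execute(
            "CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY ASC, cluster_name TEXT)"
        )
    else:
        conn.execute("DELETE FROM cluster")


def insert_clusters(conn, rows):
    """Bulk insert (cluster_id, cluster_name) tuples in one call."""
    sql = "insert into cluster (cluster_id, cluster_name) values (?, ?)"
    conn.executemany(sql, rows)
    conn.commit()


# Invented names, for illustration only.
cluster_data = [(0, "Example St / Sample Ave"), (1, "Demo Rd / Test Blvd")]

# Running the two steps twice simulates re-executing the notebook.
for _ in range(2):
    reset_cluster_table(conn)
    insert_clusters(conn, cluster_data)

row_count = conn.execute("select count(*) from cluster").fetchone()[0]
print(row_count)
```

Clearing the table before inserting keeps the step repeatable: a second run would otherwise violate the primary key on `cluster_id`.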

We are done. What more explorations await us?