# 7-Cluster Naming Using Quadkeys

_Quadkey_ version - Uses Bing Maps Tile System quadkeys to cache the OSM data.

In this notebook we match the locations that were used to create the clusters against OpenStreetMap's data in order to derive a human-readable name for each cluster. These clusters are mainly street intersections in Ann Arbor, so the names try to reflect that.

**Requirements:**

- Please run the `06-cluster-geofencing.ipynb` notebook first and its dependencies.
- Recommended install: [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/user_install.html). Enable using `jupyter nbextension enable --py widgetsnbextension --sys-prefix` for Jupyter Notebook and `jupyter labextension install @jupyter-widgets/jupyterlab-manager` for Jupyter Lab.

In [None]:
import os
import math
import folium
import requests
import json
import random
import numpy as np
import ipywidgets as widgets

from pyquadkey2 import quadkey
from pyquadkey2.quadkey import TileAnchor, QuadKey
from folium.vector_layers import PolyLine, CircleMarker
from h3 import h3
from db.api import VedDb
from tqdm.auto import tqdm
from osm.models import OSMNode, OSMWay, OSMNet
from shapely.geometry import Polygon
from shapely.ops import cascaded_union
from ipywidgets import interact, interactive, fixed, interact_manual

Create an object of the `VedDb` type to interface with the database.

In [None]:
db = VedDb()

The function `get_cluster_points` retrieves all the geographic locations that define a particular cluster, or endpoint. Note how the single parameter is passed enclosed in a list.

In [None]:
def get_cluster_points(cluster):
    sql = "select latitude, longitude from cluster_point where cluster_id=?"
    return db.query(sql, [cluster])
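
The parameter binding mentioned above can be tried in isolation. The sketch below assumes that `VedDb.query` forwards its parameter list to a DB-API cursor (the source does not show this) and uses an in-memory SQLite table with made-up rows as a stand-in for `cluster_point`. The name `get_points` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the cluster_point table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table cluster_point (cluster_id integer, latitude real, longitude real)"
)
conn.executemany(
    "insert into cluster_point values (?, ?, ?)",
    [(0, 42.28, -83.74), (0, 42.29, -83.75), (1, 42.30, -83.70)],
)


def get_points(conn, cluster):
    sql = "select latitude, longitude from cluster_point where cluster_id=?"
    # The parameters argument must be a sequence, even for a single value.
    return conn.execute(sql, [cluster]).fetchall()


points = get_points(conn, 0)
print(points)
```

Passing the bare value instead of a one-element list would make the driver reject the call, which is why the list wrapper matters even for one parameter.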

Retrieve all the cluster identifiers from the database.

In [None]:
sql = "select distinct cluster_id from cluster_point"
clusters = [c[0] for c in db.query(sql)]

## Querying OpenStreetMap Data

In this section we calculate the cluster names using data from OpenStreetMap's [OverPass API](https://wiki.openstreetmap.org/wiki/Overpass_API).

We start by calculating all the clusters' bounding boxes, using the code below. Each bounding box delimits the query to OSM's API to get all the relevant geographic information.

In [None]:
sql = """
select   cluster_id
,        min(latitude)
,        min(longitude)
,        max(latitude)
,        max(longitude)
from     cluster_point
group by cluster_id
"""
# s, w, n, e
bounding_boxes = {bb[0]: (bb[1], bb[2], bb[3], bb[4]) for bb in db.query(sql)}
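
For readers without the database at hand, the same grouping can be sketched in plain Python. The rows below are hypothetical, and `compute_bounding_boxes` is an illustrative name, not part of the notebook.

```python
from collections import defaultdict

# Hypothetical sample rows: (cluster_id, latitude, longitude).
rows = [
    (0, 42.28, -83.74),
    (0, 42.29, -83.75),
    (1, 42.30, -83.70),
]


def compute_bounding_boxes(rows):
    # Group the points by cluster identifier.
    grouped = defaultdict(list)
    for cluster_id, lat, lon in rows:
        grouped[cluster_id].append((lat, lon))

    boxes = {}
    for cluster_id, pts in grouped.items():
        lats = [p[0] for p in pts]
        lons = [p[1] for p in pts]
        # Same order as the SQL query: (south, west, north, east).
        boxes[cluster_id] = (min(lats), min(lons), max(lats), max(lons))
    return boxes


boxes = compute_bounding_boxes(rows)
print(boxes)
```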

Create a directory for the data sourced from OSM. This directory will contain OSM data partitioned by quadkey, one file per quadkey code.

In [None]:
os.makedirs("./data/qk-cache", exist_ok=True)

Now, iterate over the clusters and retrieve the respective OSM data. If the cache file for a quadkey does not exist, get the data from the OSM API and store it in a quadkey-specific file. The conversion magic happens in the `OSMNet` class, in `./osm/models.py`. The function `from_overpass` creates an `OSMNet` object per cluster containing two collections: nodes (`OSMNode`) and ways (`OSMWay`).

The `query_osm` function queries the OSM servers for a "rectangle" of data delimited by the location parameters: _south_, _west_, _north_ and _east_.

In [None]:
def query_osm(s, w, n, e, overpass_url="http://overpass-api.de/api/interpreter"):
    # The tuple's repr, "(s, w, n, e)", doubles as the Overpass bounding-box filter.
    overpass_query = "[out:json];(way[highway]{0};);(._;>;);out body;".format((s, w, n, e))
    response = requests.get(overpass_url, params={'data': overpass_query})
    osm_data = response.json()
    return osm_data
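
To see what is actually sent to the server, the query string can be built without making a request. This is a minimal sketch; `build_overpass_query` is a hypothetical helper that formats the bounding box explicitly instead of relying on the tuple's string representation, and the coordinates are made up.

```python
def build_overpass_query(s, w, n, e):
    # Overpass bounding boxes are ordered (south, west, north, east).
    bbox = "({0},{1},{2},{3})".format(s, w, n, e)
    # Select highway ways inside the box, then recurse down to their nodes.
    return "[out:json];(way[highway]{0};);(._;>;);out body;".format(bbox)


query = build_overpass_query(42.27, -83.75, 42.28, -83.74)
print(query)
```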

The `get_osm_by_qk` function retrieves the OSM data given a single QuadKey code. If the data is already in the cache, it is not requested again from the server.

In [None]:
def get_osm_by_qk(qk):
    file_name = "./data/qk-cache/{0}.json".format(qk)
    
    if os.path.isfile(file_name):
        with open(file_name) as f:
            txt = f.read()
            osm_data = json.loads(txt)
    else:
        sw = qk.to_geo(anchor=TileAnchor.ANCHOR_SW)
        ne = qk.to_geo(anchor=TileAnchor.ANCHOR_NE)
        osm_data = query_osm(sw[0], sw[1], ne[0], ne[1])

        with open(file_name, "wt") as f:
            f.write(json.dumps(osm_data))
    return osm_data
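
The read-through cache pattern used here can be exercised without network access. In the sketch below, `cached_fetch` and `fake_fetch` are hypothetical names, the quadkey string is arbitrary, and a temporary directory stands in for `./data/qk-cache`.

```python
import json
import os
import tempfile


def cached_fetch(key, fetch, cache_dir):
    """Return the cached JSON for `key`, calling `fetch(key)` only on a miss."""
    file_name = os.path.join(cache_dir, "{0}.json".format(key))
    if os.path.isfile(file_name):
        with open(file_name) as f:
            return json.load(f)
    data = fetch(key)
    with open(file_name, "wt") as f:
        json.dump(data, f)
    return data


calls = []


def fake_fetch(key):
    # Hypothetical stand-in for the Overpass request; records each call.
    calls.append(key)
    return {"elements": [], "key": key}


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_fetch("0302231", fake_fetch, cache_dir)
    second = cached_fetch("0302231", fake_fetch, cache_dir)

print(len(calls))
```

The second call is served from disk, so the fetch function runs only once per key.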

The `get_osm_elements` function extracts a single type of element from an OSM dictionary. Here, we will only use the `node` and `way` types.

In [None]:
def get_osm_elements(osm, type_name):
    return [n for n in osm["elements"] if n["type"] == type_name]
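
As a quick illustration, the function can be applied to a small hand-written dictionary. The sample below is hypothetical and only mimics the shape of an Overpass JSON response.

```python
# Hypothetical, hand-written sample in the shape of an Overpass JSON response.
sample_osm = {
    "elements": [
        {"type": "node", "id": 1, "lat": 42.28, "lon": -83.74},
        {"type": "node", "id": 2, "lat": 42.29, "lon": -83.75},
        {"type": "way", "id": 10, "nodes": [1, 2],
         "tags": {"highway": "residential", "name": "Example Street"}},
    ]
}


def get_osm_elements(osm, type_name):
    return [n for n in osm["elements"] if n["type"] == type_name]


nodes = get_osm_elements(sample_osm, "node")
ways = get_osm_elements(sample_osm, "way")
print(len(nodes), len(ways))
```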

The `merge_osm` function merges two OSM dictionaries into one.

In [None]:
def merge_osm(osm0, osm1):
    node_dict = {n["id"]: n for n in get_osm_elements(osm0, "node")}
    for n in get_osm_elements(osm1, "node"):
        if n["id"] not in node_dict:
            node_dict[n["id"]] = n
            
    ways0 = {w["id"]: w for w in get_osm_elements(osm0, "way")} 
    ways1 = {w["id"]: w for w in get_osm_elements(osm1, "way")}
    
    way_list = []
    
    for w_id in ways0.keys():
        if w_id in ways1:
            w0 = ways0[w_id]
            w1 = ways1[w_id]
            nodes = list(set(w0["nodes"]).union(set(w1["nodes"])))
            way_list.append({"type": "way",
                             "id": w_id,
                             "nodes": nodes,
                             "tags": w0["tags"]})
        else:
            way_list.append(ways0[w_id])
            
    for w_id in ways1.keys():
        if w_id not in ways0:
            way_list.append(ways1[w_id])
    
    osm = {"elements": list(node_dict.values()) + way_list}
    return osm
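
Note that `merge_osm` combines the node lists of a way found in both dictionaries through a set union, which does not preserve the order of the nodes along the way. The source does not show whether `OSMNet` depends on that order. If it does, an order-preserving merge might look like the sketch below, where `merge_way_nodes` is a hypothetical helper.

```python
def merge_way_nodes(nodes0, nodes1):
    # dict.fromkeys keeps the first occurrence of each id, in insertion order,
    # so the sequence from nodes0 survives and unseen ids from nodes1 follow.
    return list(dict.fromkeys(list(nodes0) + list(nodes1)))


merged_nodes = merge_way_nodes([5, 3, 9], [3, 9, 7])
print(merged_nodes)
```

Like the set union, this drops the repeated closing node of a closed way, so it is only a starting point.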

The `merge_osm_list` function merges a list of OSM dictionaries into one. Note that only the `elements` key is kept.

In [None]:
def merge_osm_list(osm_list):
    osm = osm_list[0]
    for i in range(1, len(osm_list)):
        osm = merge_osm(osm, osm_list[i])
    return osm
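
The same fold can also be written with `functools.reduce`. The sketch below is one possible alternative, not the notebook's method: `merge_elements` is a simplified, hypothetical stand-in for `merge_osm`, and the empty initial value lets `merge_all` accept an empty list.

```python
from functools import reduce


def merge_elements(osm0, osm1):
    # Simplified stand-in for merge_osm: keep the first element per (type, id).
    merged = {(e["type"], e["id"]): e for e in osm0["elements"]}
    for e in osm1["elements"]:
        merged.setdefault((e["type"], e["id"]), e)
    return {"elements": list(merged.values())}


def merge_all(osm_list):
    # The initial value makes an empty input return an empty dictionary.
    return reduce(merge_elements, osm_list, {"elements": []})


tiles = [
    {"elements": [{"type": "node", "id": 1}]},
    {"elements": [{"type": "node", "id": 1}, {"type": "node", "id": 2}]},
]
merged = merge_all(tiles)
empty = merge_all([])
print(merged)
```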

The `get_cluster_qk` function calculates all the unique quadkey codes corresponding to the points in a cluster.

In [None]:
def get_cluster_qk(cluster, level=18):
    return list(set([quadkey.from_geo(p, level) for p in get_cluster_points(cluster)]))
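
For intuition about what a quadkey encodes, here is a pure-Python sketch of the conversion as described in the Bing Maps Tile System documentation. The function names are hypothetical, and `pyquadkey2` may differ in rounding details, so treat this as an illustration rather than a drop-in replacement.

```python
import math


def geo_to_tile(lat, lon, level):
    # Clip the latitude to the range supported by the Web Mercator projection.
    lat = max(min(lat, 85.05112878), -85.05112878)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    map_size = 256 << level
    pixel_x = min(max(int(x * map_size + 0.5), 0), map_size - 1)
    pixel_y = min(max(int(y * map_size + 0.5), 0), map_size - 1)
    return pixel_x // 256, pixel_y // 256


def tile_to_quadkey(tile_x, tile_y, level):
    # Interleave the bits of the tile coordinates, most significant first.
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def geo_to_quadkey(lat, lon, level=18):
    return tile_to_quadkey(*geo_to_tile(lat, lon, level), level)


# Approximate Ann Arbor coordinates, used only as sample input.
qk = geo_to_quadkey(42.28, -83.74)
print(qk)
```

Points that fall in the same level-18 tile produce the same string, which is why collecting the codes in a set yields one entry per tile.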

The `get_osm_data` function retrieves all the OSM data for a single cluster, discretized as fixed-level QuadKeys.

In [None]:
def get_osm_data(cluster, overpass_url ="http://overpass-api.de/api/interpreter", level=18):
    qk_list = get_cluster_qk(cluster, level=level)
    osm_list = [get_osm_by_qk(qk) for qk in qk_list]

    osm_data = merge_osm_list(osm_list)
    return osm_data

In [None]:
c_nets = dict()    # OSMNet dict
c_names = dict()   # Cluster name dict
for cluster in tqdm(clusters):
    osm_data = get_osm_data(cluster)

    c_nets[cluster] = OSMNet.from_overpass(osm_data)
        
    points = np.array(get_cluster_points(cluster))
    c_names[cluster] = c_nets[cluster].get_name(points)

The `create_map_polygon` function below receives a list of point coordinates and plots them on a map as a closed polygon. We will use it to display the cluster's outer shape.

In [None]:
def create_map_polygon(xy, tooltip='',
                       color='#3388ff',
                       opacity=0.7,
                       fill_color='#3388ff',
                       fill_opacity=0.4, 
                       weight=3):
    points = [[x[0], x[1]] for x in xy]
    polygon = folium.vector_layers.Polygon(locations=points,
                                           tooltip=tooltip,
                                           fill=True,
                                           color=color,
                                           fill_color=fill_color,
                                           fill_opacity=fill_opacity,
                                           weight=weight,
                                           opacity=opacity)
    return polygon

The `create_map_quadkey` function creates the map polygon for the projected square corresponding to a quadkey code.

In [None]:
def create_map_quadkey(qk, tooltip='',
                       color='#3388ff',
                       opacity=0.7,
                       fill_color='#3388ff',
                       fill_opacity=0.4, 
                       weight=3):
    sw = qk.to_geo(anchor=TileAnchor.ANCHOR_SW)
    ne = qk.to_geo(anchor=TileAnchor.ANCHOR_NE)
    s, w = sw[0], sw[1]
    n, e = ne[0], ne[1]
    points = [[n, e], [s, e], [s, w], [n, w]]
    return create_map_polygon(points, tooltip=tooltip, 
                              color=color, opacity=opacity, 
                              fill_color=fill_color, fill_opacity=fill_opacity,
                              weight=weight)
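
The corner coordinates that `to_geo` returns can be approximated from the tile indices alone. The sketch below uses the standard Web Mercator tile formulas; `tile_bounds` and `tile_ring` are hypothetical helpers, and the ring follows the same corner order as `create_map_quadkey`.

```python
import math


def tile_bounds(tile_x, tile_y, level):
    n = 1 << level  # Number of tiles per axis at this level

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    west, east = lon(tile_x), lon(tile_x + 1)
    north, south = lat(tile_y), lat(tile_y + 1)
    return south, west, north, east


def tile_ring(tile_x, tile_y, level):
    s, w, n, e = tile_bounds(tile_x, tile_y, level)
    # Corner order: NE, SE, SW, NW, each as [latitude, longitude].
    return [[n, e], [s, e], [s, w], [n, w]]


ring = tile_ring(0, 0, 1)
print(ring)
```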

In [None]:
def get_cluster_hexes(cluster_id):
    sql = "select h3 from cluster_point where cluster_id = ?"
    hexes = list({h[0] for h in db.query(sql, [cluster_id])})
    return hexes

In [None]:
def get_hexagon(h):
    geo_lst = [p for p in h3.h3_to_geo_boundary(h)]
    geo_lst.append(geo_lst[0])
    return np.array(geo_lst)

In [None]:
def show_named_cluster_map(cluster_id):
    html_map = folium.Map(prefer_canvas=True, control_scale=True, 
                          max_zoom=18, tiles="cartodbpositron")
        
    cluster_qk = get_cluster_qk(cluster_id)
    for qk in cluster_qk:
        create_map_quadkey(qk, tooltip=qk, weight=1, color='red', fill_color='white', fill_opacity=0).add_to(html_map)
    
    bb_list = []  # List for the bounding-box calculation
    polygons = []
    hexes = get_cluster_hexes(cluster_id)
    for h in hexes:
        points = get_hexagon(h)
        xy = [[x[1], x[0]] for x in points]
        xy.append([points[0][1], points[0][0]])
        polygons.append(Polygon(xy))
        bb_list.extend(points)
        
    merged = cascaded_union(polygons)
    cluster_name = c_names[cluster_id]
    
    for point in get_cluster_points(cluster_id):
        CircleMarker(point, radius=1).add_to(html_map)
    
    if merged.geom_type == "MultiPolygon":
        # Draw every disjoint part of the cluster shape.
        for geom in merged.geoms:
            xy = geom.exterior.coords.xy
            lxy = list(zip(xy[1], xy[0]))
            create_map_polygon(lxy, tooltip=cluster_name).add_to(html_map)
    elif merged.geom_type == "Polygon":
        xy = merged.exterior.coords.xy
        lxy = list(zip(xy[1], xy[0]))

        create_map_polygon(lxy, tooltip=cluster_name).add_to(html_map)
        
    locations = np.array(bb_list)
    min_lat, max_lat = locations[:, 0].min(), locations[:, 0].max()
    min_lon, max_lon = locations[:, 1].min(), locations[:, 1].max()
    html_map.fit_bounds([[min_lat, min_lon], [max_lat, max_lon]])
    return html_map

Get the number of clusters from the database (_I could have used the length of the_ `bounding_boxes` _dictionary, I know..._)

In [None]:
sql = "select count(distinct cluster_id) from cluster_point"
cluster_count = db.query_scalar(sql)

Let's see the result of our efforts in an interactive fashion!

In [None]:
ii = interact(show_named_cluster_map, cluster_id=widgets.IntSlider(min=0, max=cluster_count-1, step=1, value=0))

## Serialize Clusters to the Database

We can now serialize the cluster names to the database. As before, we create a new table named `cluster` with the cluster identifier and its newfound name. If the table already exists, we merely delete all its contents.

In [None]:
if not db.table_exists("cluster"):
    sql = """
    CREATE TABLE cluster (
        cluster_id      INTEGER PRIMARY KEY ASC,
        cluster_name    TEXT
    )
    """
    db.execute_sql(sql)
else:
    db.execute_sql("DELETE FROM cluster")
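
The create-or-clear logic can be tried on its own. The sketch below assumes an SQLite backend, which the `?` placeholders and the `INTEGER PRIMARY KEY ASC` syntax suggest but the source does not state. `table_exists` and `reset_cluster_table` are hypothetical stand-ins for the `VedDb` methods.

```python
import sqlite3


def table_exists(conn, name):
    # SQLite lists its tables in the sqlite_master catalog.
    sql = "select count(*) from sqlite_master where type='table' and name=?"
    return conn.execute(sql, [name]).fetchone()[0] > 0


def reset_cluster_table(conn):
    if not table_exists(conn, "cluster"):
        conn.execute(
            "CREATE TABLE cluster ("
            "cluster_id INTEGER PRIMARY KEY ASC, cluster_name TEXT)"
        )
    else:
        conn.execute("DELETE FROM cluster")


conn = sqlite3.connect(":memory:")
reset_cluster_table(conn)  # First call creates the table.
conn.execute("INSERT INTO cluster VALUES (0, 'stale name')")
reset_cluster_table(conn)  # Second call only clears it.
remaining = conn.execute("SELECT count(*) FROM cluster").fetchone()[0]
print(remaining)
```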

Collect all the cluster identifiers and their names to a list of tuples, the format for the bulk insertion process.

In [None]:
cluster_data = [(c, c_names[c]) for c in range(cluster_count)]

Now insert the data into the database using a pre-stored insert statement. You can find the statement definition under the `/db/sql/cluster` folder.

In [None]:
db.insert_list("cluster/insert", cluster_data)
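
The stored statement itself is not shown here, so the following is only a sketch of what a bulk insertion of `(cluster_id, cluster_name)` tuples might look like with the standard `sqlite3` module. The SQL text and the sample names are assumptions, not the repository's actual definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY ASC, cluster_name TEXT)"
)

# Hypothetical names, in the (cluster_id, cluster_name) tuple format.
cluster_data = [(0, "Main Street / First Avenue"), (1, "Second Avenue")]

# Assumed statement; the real one lives in the repository's SQL folder.
insert_sql = "INSERT INTO cluster (cluster_id, cluster_name) VALUES (?, ?)"
with conn:  # Commits on success, rolls back on error.
    conn.executemany(insert_sql, cluster_data)

rows = conn.execute(
    "SELECT cluster_id, cluster_name FROM cluster ORDER BY cluster_id"
).fetchall()
print(rows)
```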

We are done. What more explorations await us?