# 9-Cluster Trajectories
In this notebook we use the results from the previous ones to display trajectory clusters on a map. We have already identified the most important stopping locations and the trips among them. Now we can look at the trips between a given pair of locations and determine how many distinct trajectories connect them.

**Requirements:**

- Please run the `07-cluster-names.ipynb` notebook first and its dependencies.
- Recommended install: [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/user_install.html). Enable using `jupyter nbextension enable --py widgetsnbextension --sys-prefix` for Jupyter Notebook and `jupyter labextension install @jupyter-widgets/jupyterlab-manager` for Jupyter Lab.

In [None]:
import numpy as np
import pandas as pd
import folium
import hdbscan
import ipywidgets as widgets

from sqlapi import VedDb
from distances.frechet import FastDiscreteFrechetMatrix, LinearDiscreteFrechet, earth_haversine
from tqdm.auto import tqdm
from colour import Color

from h3 import h3
from shapely.geometry import Polygon
from shapely.ops import cascaded_union
from ipywidgets import interact, fixed

Create an object of the `VedDb` type to interface with the database.

In [None]:
db = VedDb()

## Supporting Functions

These are the supporting functions for the whole notebook. The `get_trajectories` function returns all the trips between the given cluster identifiers as a list of NumPy arrays, one per trip, where each row holds a (latitude, longitude) pair.

In [None]:
def get_trajectories(cluster_ini, cluster_end):
    sql = """
    select   vehicle_id
    ,        day_num
    ,        ts_ini
    ,        ts_end
    from     move
    where    cluster_ini = ? and cluster_end = ?;"""
    moves = db.query(sql, (cluster_ini, cluster_end))
    
    sql = """
    select   latitude
    ,        longitude
    from     signal
    where    vehicle_id = ? and day_num = ? and time_stamp <= ?
    """
    
    trajectories = []
    for move in tqdm(moves):
        trajectory = db.query_df(sql, parameters=[move[0], move[1], move[3]]) \
                       .drop_duplicates(subset=["latitude", "longitude"]) \
                       .to_numpy()
        trajectories.append(trajectory)
    return trajectories

The `get_cluster_hexes` function retrieves from the database all the H3 indexes that make up a given cluster.

In [None]:
def get_cluster_hexes(cluster_id):
    sql = "select h3 from cluster_point where cluster_id = ?"
    hexes = list({h[0] for h in db.query(sql, [cluster_id])})
    return hexes

The `get_hexagon` function converts an H3 index to a geospatial polygon.

In [None]:
def get_hexagon(h):
    geo_lst = list(h3.h3_to_geo_boundary(h))
    geo_lst.append(geo_lst[0])
    return np.array(geo_lst)

The `create_map_polygon` function creates a Folium polygon.

In [None]:
def create_map_polygon(xy, tooltip='',
                       color='#3388ff',
                       opacity=0.7,
                       fill_color='#3388ff',
                       fill_opacity=0.4, 
                       weight=3):
    points = [[x[0], x[1]] for x in xy]
    polygon = folium.vector_layers.Polygon(locations=points,
                                           tooltip=tooltip,
                                           fill=True,
                                           color=color,
                                           fill_color=fill_color,
                                           fill_opacity=fill_opacity,
                                           weight=weight,
                                           opacity=opacity)
    return polygon

The `create_map_polyline` function creates a Folium polyline.

In [None]:
def create_map_polyline(xy, tooltip='',
                        color='#3388ff',
                        opacity=0.7, 
                        weight=3):
    polyline = folium.vector_layers.PolyLine(locations=xy,
                                             tooltip=tooltip,
                                             color=color,
                                             weight=weight,
                                             opacity=opacity)
    return polyline

The `get_trajectory_group_bb` function calculates the bounding box for the group of trajectories reported by `get_trajectories`.

In [None]:
def get_trajectory_group_bb(trjs):
    min_lat = min([t[:, 0].min() for t in trjs])
    min_lon = min([t[:, 1].min() for t in trjs])
    max_lat = max([t[:, 0].max() for t in trjs])
    max_lon = max([t[:, 1].max() for t in trjs])
    return [[min_lat, min_lon], [max_lat, max_lon]]
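
As a quick sanity check, the bounding-box reduction can be tried on synthetic data. The sketch below repeats the same min/max logic under a different name (`trajectory_group_bb`) so that it runs on its own; the two toy trajectories are made up for illustration and do not come from the database.

```python
import numpy as np

def trajectory_group_bb(trjs):
    """Standalone copy of the min/max reduction, returning plain floats."""
    min_lat = float(min(t[:, 0].min() for t in trjs))
    min_lon = float(min(t[:, 1].min() for t in trjs))
    max_lat = float(max(t[:, 0].max() for t in trjs))
    max_lon = float(max(t[:, 1].max() for t in trjs))
    # Folium's fit_bounds expects [[south, west], [north, east]]
    return [[min_lat, min_lon], [max_lat, max_lon]]

# Two hypothetical trajectories as (latitude, longitude) rows
toy = [
    np.array([[42.28, -83.74], [42.30, -83.70]]),
    np.array([[42.25, -83.75], [42.29, -83.72]]),
]
bb = trajectory_group_bb(toy)
print(bb)
```

The box spans all trajectories in the group, which is why the single-trajectory map further down still zooms to the extent of the whole group.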

Given a set of HDBSCAN-calculated cluster identifiers, the `get_cluster_colors` function assigns a color to each of the clusters, excluding the outlier indicator (-1).

In [None]:
def get_cluster_colors(clusters):
    blue = Color("blue")
    red = Color("red")

    ids = np.unique(clusters)
    if np.isin(-1, ids):
        color_range = list(blue.range_to(red, ids.shape[0] - 1))
    else:
        color_range = list(blue.range_to(red, ids.shape[0]))
    return color_range
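
To make the label-to-color mapping concrete, here is a minimal sketch that uses only the standard library. It swaps the `colour` package for `colorsys`, so the intermediate shades may differ from what `range_to` produces; the `hue_ramp` helper and the label list are hypothetical and exist only for this illustration.

```python
import colorsys

def hue_ramp(n):
    """Return n hex colors from blue to red by interpolating the HSL hue."""
    blue_hue, red_hue = 2.0 / 3.0, 0.0
    colors = []
    for k in range(n):
        t = k / (n - 1) if n > 1 else 0.0
        hue = blue_hue + t * (red_hue - blue_hue)
        r, g, b = colorsys.hls_to_rgb(hue, 0.5, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Hypothetical HDBSCAN labels: three clusters (0, 1, 2) plus one outlier (-1)
labels = [0, 1, -1, 1, 2, 0]

# One color per real cluster; the outlier label gets no palette entry
n_clusters = len(set(labels) - {-1})
palette = hue_ramp(n_clusters)

# Cluster labels start at 0, so they index the palette directly
line_colors = ["#777777" if label == -1 else palette[label] for label in labels]
print(line_colors)
```

Because the palette excludes the outlier label, the caller has to handle `-1` before indexing, which is what the map-drawing code does with its gray fallback.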

The `create_map` function creates an HTML Folium map.

In [None]:
def create_map():
    html_map = folium.Map(prefer_canvas=True, control_scale=True,
                          max_zoom=18, tiles="cartodbpositron")
    return html_map

The `add_cluster_polygon` function adds the H3-based cluster geofence to an existing Folium map.

In [None]:
def add_cluster_polygon(html_map, cluster_id):
    polygons = []
    hexes = get_cluster_hexes(cluster_id)
    for h in hexes:
        points = get_hexagon(h)
        xy = [[x[1], x[0]] for x in points]
        xy.append([points[0][1], points[0][0]])
        polygons.append(Polygon(xy))
        
    merged = cascaded_union(polygons)
    
    if merged.geom_type == "MultiPolygon":
        # Draw every disjoint part of the cluster as its own polygon
        for geom in merged.geoms:
            xy = geom.exterior.coords.xy
            lxy = list(zip(xy[1], xy[0]))
            create_map_polygon(lxy, tooltip=str(cluster_id)).add_to(html_map)
    elif merged.geom_type == "Polygon":
        xy = merged.exterior.coords.xy
        lxy = list(zip(xy[1], xy[0]))

        create_map_polygon(lxy, tooltip=str(cluster_id)).add_to(html_map)
    return html_map

The `show_trajectory_group_map` function displays all the trajectories on a map, optionally colored by cluster.

In [None]:
def show_trajectory_group_map(html_map, trajectories, clusterer=None):
    cluster_colors = ['#3388ff']
    
    if clusterer is not None:
        cluster_colors = get_cluster_colors(clusterer.labels_)
    
    for i, trajectory in enumerate(trajectories):
        
        if clusterer is None:
            color = '#3388ff'
        else:
            color_idx = clusterer.labels_[i]
            if color_idx == -1 or clusterer.outlier_scores_[i] > 0.9:
                color = '#777777'
            else:
                color = cluster_colors[color_idx].hex
        
        polyline = create_map_polyline(trajectory, color=color)
        polyline.add_to(html_map)

    bound_box = get_trajectory_group_bb(trajectories)
    html_map.fit_bounds(bound_box)
    return html_map

The `show_single_trajectory` function displays a map with a single trajectory from the set of existing trajectories. This function is used below to help audit the individual trajectories.

In [None]:
def show_single_trajectory(trajectories, trajectory_num):
    html_map = create_map()
    bound_box = get_trajectory_group_bb(trajectories)
    html_map.fit_bounds(bound_box)

    trajectory = trajectories[trajectory_num]

    color = '#3388ff'
    
    polyline = create_map_polyline(trajectory, color=color)
    polyline.add_to(html_map)

    return html_map

## Trajectory Display
Here, we display all trajectories together in one Folium map. Use the variables below to set the starting cluster identifier and the ending cluster identifier. A nice query to extract promising pairs of clusters is this:
```
select * from (
    select   cluster_ini
    ,        cluster_end
    ,        count(move_id) as move_count
    from     move
    where    cluster_ini <> -1 and 
             cluster_end <> -1 and 
             cluster_ini <> cluster_end
    group by cluster_ini, cluster_end
) tt order by tt.move_count desc;
```
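
To try the query without the full database, it can be run against a toy table. The sketch below assumes a SQLite backend (the `?` placeholders used in this notebook suggest one, but that is an assumption) and a reduced `move` table containing only the three columns the query needs; the rows are invented.

```python
import sqlite3

# In-memory stand-in for the real database, with a reduced move table
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table move (
        move_id     integer primary key,
        cluster_ini integer,
        cluster_end integer
    )""")

# Hypothetical trips: three from 9 to 6, two from 6 to 9, one round trip,
# and two trips that start or end outside any cluster (-1)
rows = [(9, 6), (9, 6), (9, 6), (6, 9), (6, 9), (3, 3), (-1, 6), (9, -1)]
conn.executemany(
    "insert into move (cluster_ini, cluster_end) values (?, ?)", rows)

sql = """
select * from (
    select   cluster_ini
    ,        cluster_end
    ,        count(move_id) as move_count
    from     move
    where    cluster_ini <> -1 and
             cluster_end <> -1 and
             cluster_ini <> cluster_end
    group by cluster_ini, cluster_end
) tt order by tt.move_count desc;
"""
pairs = conn.execute(sql).fetchall()
conn.close()

for cluster_ini, cluster_end, move_count in pairs:
    print(cluster_ini, cluster_end, move_count)
```

The pairs at the top of the result have the most trips, so they are the most likely to show several distinct routes.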

In [None]:
cluster_ini = 9
cluster_end = 6

## Display the Grouped Trajectories

We start by displaying the collected trajectories on a map. For the trips starting at cluster 9 and ending at 6, there are two clear clusters of trajectories along with some outliers.

In [None]:
trajectories = get_trajectories(cluster_ini, cluster_end)
html_map = create_map()
html_map = show_trajectory_group_map(html_map, trajectories)
html_map = add_cluster_polygon(html_map, cluster_ini)
html_map = add_cluster_polygon(html_map, cluster_end)
html_map

## Cluster the Trajectories

To cluster the trajectories with HDBSCAN, we must first calculate the symmetric matrix of pairwise distances between trajectories. The `calculate_distance_matrix` function does just that, using the discrete Fréchet distance with the haversine formula as the point-to-point metric.

In [None]:
def calculate_distance_matrix(trajectories):
    n_traj = len(trajectories)
    dist_mat = np.zeros((n_traj, n_traj), dtype=np.float64)
    dfd = FastDiscreteFrechetMatrix(earth_haversine)

    for i in range(n_traj - 1):
        p = trajectories[i]
        for j in range(i + 1, n_traj):
            q = trajectories[j]

            # Make sure the distance matrix is symmetric
            dist_mat[i, j] = dfd.distance(p, q)
            dist_mat[j, i] = dist_mat[i, j]
    return dist_mat
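
For readers unfamiliar with the metric, the sketch below shows the textbook dynamic-programming form of the discrete Fréchet distance. It is not the `FastDiscreteFrechetMatrix` implementation, which is presumably optimized; the names `haversine_m` and `discrete_frechet` are hypothetical, and the haversine helper returns meters, which may not match the units of `earth_haversine`.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    radius = 6371000.0  # mean Earth radius in meters
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

def discrete_frechet(p, q, dist):
    """Discrete Frechet distance between point sequences p and q."""
    n, m = len(p), len(q)
    # ca[i][j] is the coupling distance between p[:i + 1] and q[:j + 1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

# Two made-up routes between the same end points
route_a = [(42.280, -83.740), (42.285, -83.735), (42.290, -83.730)]
route_b = [(42.280, -83.740), (42.280, -83.730), (42.290, -83.730)]

d_ab = discrete_frechet(route_a, route_b, haversine_m)
d_ba = discrete_frechet(route_b, route_a, haversine_m)
d_aa = discrete_frechet(route_a, route_a, haversine_m)
print(d_ab, d_ba, d_aa)
```

The distance is symmetric and zero between identical sequences, which is why only the upper triangle needs to be computed and the diagonal can stay at zero.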

In [None]:
dist_mat = calculate_distance_matrix(trajectories)

After calculating the distance matrix, we can run it through the HDBSCAN algorithm and collect the calculated cluster identifiers.

In [None]:
clusterer = hdbscan.HDBSCAN(metric='precomputed', min_cluster_size=2, min_samples=1, cluster_selection_method='leaf')

In [None]:
clusterer.fit(dist_mat)

In [None]:
clusterer.labels_
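
The raw label array is easier to read as a per-cluster count. The sketch below uses a made-up label list as a stand-in; with a fitted clusterer, `clusterer.labels_.tolist()` could be passed instead.

```python
from collections import Counter

# Hypothetical stand-in for clusterer.labels_.tolist()
labels = [0, 0, 1, -1, 1, 1, 0, -1]

counts = Counter(labels)
n_outliers = counts.pop(-1, 0)  # HDBSCAN marks outliers with -1
n_clusters = len(counts)

print(f"{n_clusters} clusters, {n_outliers} outliers")
for cluster_id in sorted(counts):
    print(f"cluster {cluster_id}: {counts[cluster_id]} trajectories")
```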

We can now display the colored version of the map above, by using the HDBSCAN-calculated cluster identifiers. Note that outlier trajectories are drawn in gray.

In [None]:
html_map = create_map()
html_map = show_trajectory_group_map(html_map, trajectories, clusterer)
html_map = add_cluster_polygon(html_map, cluster_ini)
html_map = add_cluster_polygon(html_map, cluster_end)
html_map