In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys; sys.path.extend(["../src", ".."])
import os
import config
import sensai
import logging

c = config.get_config(reload=True)
sensai.util.logging.configureLogging(level=logging.INFO)

# Coordinate Clustering

In addition to supporting different clustering algorithms, sensAI provides methods specific to
clustering geospatial data. These include utilities for wrangling geometric data, computing spanning trees, and
persisting and visualizing results. They interoperate with geopandas and shapely.
This notebook gives an overview of the main coordinate clustering functions.

In [None]:
import geopandas as gp
from pprint import pprint
import numpy as np

from sklearn.cluster import DBSCAN

from sensai.geoanalytics.geopandas.graph import CoordinateSpanningTree
from sensai.geoanalytics.geopandas.coordinate_clustering import SkLearnCoordinateClustering
from sensai.geoanalytics.geopandas.geometry import alphaShape

## Loading Data and Fitting a Clusterer

The library contains utilities for loading coordinates from files and for wrapping arbitrary scikit-learn-compatible
clustering algorithms. Custom clustering algorithms can be implemented by inheriting from the base class
`EuclideanClusterer`.
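
To illustrate what "scikit-learn-compatible" usually means, here is a minimal sketch of a toy clusterer that follows the scikit-learn convention of a `fit` method that stores one label per sample in a `labels_` attribute. The class `GridCellClusterer` and its `cell_size` parameter are hypothetical and not part of sensAI or scikit-learn. Whether the sensAI wrapper needs anything beyond `fit` and `labels_` (for example a numpy array instead of a list) is an assumption to check against the library.

```python
class GridCellClusterer:
    """Hypothetical toy clusterer: points in the same square grid cell share a label."""

    def __init__(self, cell_size):
        self.cell_size = cell_size

    def fit(self, X, y=None):
        # Following the scikit-learn convention, fit() stores the result in
        # `labels_` and returns self. A plain list is used here for brevity;
        # scikit-learn estimators store a numpy array.
        cell_to_label = {}
        labels = []
        for x_coord, y_coord in X:
            cell = (int(x_coord // self.cell_size), int(y_coord // self.cell_size))
            # New cells get the next free integer label, known cells reuse theirs
            labels.append(cell_to_label.setdefault(cell, len(cell_to_label)))
        self.labels_ = labels
        return self


toy_points = [(1, 1), (2, 3), (15, 1), (3, 9), (18, 4)]
toy_clusterer = GridCellClusterer(cell_size=10).fit(toy_points)
print(toy_clusterer.labels_)
```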

In [None]:
sampleFile = c.datafile_path("sample", stage=c.RAW) # this can point to a directory or a shp/geojson file
sampleGeoDF = gp.read_file(sampleFile)
sampleGeoDF

In [None]:
dbscan = SkLearnCoordinateClustering(DBSCAN(eps=150, min_samples=20))
dbscan.fit(sampleGeoDF)

The fitted instance exposes the resulting clusters.
You can retrieve them individually or via a generator; the noise cluster is accessed separately.

In [None]:
print(f"Clusters found: {dbscan.numClusters}")

clustersMin50 = list(dbscan.clusters(condition=lambda x: len(x) >= 50))

print(f"Clusters with at least 50 members: {len(clustersMin50)}")
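
The retrieval pattern used above (iterate over clusters, optionally filtered by a condition, with noise kept apart) can be sketched in plain Python. This is not sensAI's implementation; the helper names `group_by_label` and `iter_clusters` and the toy data are hypothetical. The only fact taken from scikit-learn is that DBSCAN marks noise points with the label `-1`.

```python
from collections import defaultdict

NOISE_LABEL = -1  # scikit-learn's DBSCAN assigns -1 to points that belong to no cluster


def group_by_label(points, labels):
    """Map each label to the list of points carrying that label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return dict(groups)


def iter_clusters(groups, condition=None):
    """Yield (label, members) for every non-noise cluster that satisfies `condition`."""
    for label, members in sorted(groups.items()):
        if label == NOISE_LABEL:
            continue
        if condition is None or condition(members):
            yield label, members


# Hypothetical toy data: six points with hand-assigned labels
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = [0, 0, 0, 1, 1, -1]

groups = group_by_label(points, labels)
num_clusters = sum(1 for _ in iter_clusters(groups))
large = list(iter_clusters(groups, condition=lambda members: len(members) >= 3))
noise = groups.get(NOISE_LABEL, [])

print(f"Clusters found: {num_clusters}")
print(f"Clusters with at least 3 members: {len(large)}")
print(f"Noise points: {len(noise)}")
```

Using a generator keeps the filter lazy, so only clusters that pass the condition are materialized by the caller.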

## Analysis and Visualization

Individual clusters, which are instances of `EuclideanClusterer.Cluster`,
can be retrieved from the fitted instance and visualized. Most objects, including the clusterer itself, have a built-in plot method.

In [None]:
dbscan.plot(markersize=0.2)

We can apply a condition to the clusters to be plotted and pass additional arguments affecting the display.

In [None]:
dbscan.plot(condition=lambda x: len(x) >= 50, cmap='plasma')

### Properties of Individual Clusters

Individual clusters can be plotted, too.

In [None]:
sampleCluster = dbscan.getCluster(0)

sampleCluster.plot()

Clusters have an identifier and coordinates. It is easy to extract additional information,
e.g. via the `summaryDict` method.

In [None]:
pprint(sampleCluster.summaryDict())

A single cluster is just a wrapper around its coordinates. They can be
retrieved as a numpy array, a GeoDataFrame, or a MultiPoint object.
The latter is useful for geometric operations, e.g. computing hulls.
In [None]:
clusterMultipoint = sampleCluster.asMultipoint()
clusterMultipoint.convex_hull
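
For intuition about what a convex hull is, the following sketch computes one for a handful of 2D points with Andrew's monotone chain algorithm and measures its area with the shoelace formula. It is a dependency-free illustration only: shapely delegates the actual computation to GEOS, and the function names and toy coordinates here are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would create a right turn or a straight line
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last vertex of each chain is the first vertex of the other chain
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical toy cluster: the corners of a square plus one interior point
toy_coordinates = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(toy_coordinates)
print(hull)
print(polygon_area(hull))
```

The interior point is discarded because it never lies on the boundary of the smallest convex polygon enclosing all points.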

In [None]:
# we also provide a utility for computing alpha shapes for such objects

alphaShape(clusterMultipoint)

sensAI also provides utilities for computing trees, e.g. the minimum spanning tree of a cluster's coordinates.

In [None]:
sampleTree = CoordinateSpanningTree(sampleCluster)
sampleTree.plot()
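
As background, a minimum spanning tree connects all points with the smallest possible total edge length. The sketch below builds one with Prim's algorithm on the complete Euclidean graph of a few toy points. How `CoordinateSpanningTree` constructs its tree internally is not shown in this notebook, so this is an assumption-free illustration of the concept rather than the library's method; the function name and data are hypothetical.

```python
import math


def minimum_spanning_tree(coords):
    """Return MST edges (i, j, length) over 2D points using Prim's algorithm."""
    n = len(coords)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection of each point to the tree
    best_parent = [-1] * n      # tree vertex that realizes that connection
    best_dist[0] = 0.0          # start growing the tree from the first point
    edges = []
    for _ in range(n):
        # Add the point outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        # Update the cheapest connection of the remaining points
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(coords[u][0] - coords[v][0], coords[u][1] - coords[v][1])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Hypothetical toy coordinates
toy_coords = [(0, 0), (1, 0), (1, 1), (5, 5)]
mst_edges = minimum_spanning_tree(toy_coords)
total_length = sum(length for _, _, length in mst_edges)
print(mst_edges)
print(total_length)
```

This variant runs in quadratic time, which is fine for small clusters; for large point sets a Delaunay-based or heap-based approach would scale better.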

Most objects provide a way to extract a summary, either as a dict or as a data frame.

In [None]:
print("cluster summary:")
pprint(sampleCluster.summaryDict())

In [None]:
dbscan.summaryDF().head()

## Saving and Loading

All of the objects used above can be exported to a GeoDataFrame using the `toGeoDF` method. This GeoDataFrame
can then be persisted as usual.

In addition, `CoordinateClusteringAlgorithm` has its own save method, which persists the object as a pickle.
An instance can be loaded using the `load` classmethod.
This way of persisting the fitted algorithm is _much more efficient and general_ than saving the corresponding GeoDataFrame.

Individual clusters also have save and load methods,
with the difference that they are persisted as (and instantiated from) shapefiles.
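
The save/load pattern described here can be sketched with the standard library alone. The class `FittedClustering` below is a hypothetical stand-in, not a sensAI class, and it pickles only plain built-in containers instead of the whole object, which is a design choice of this sketch and not necessarily what sensAI does.

```python
import os
import pickle
import tempfile


class FittedClustering:
    """Hypothetical stand-in for a fitted clusterer: coordinates plus one label per point."""

    def __init__(self, points, labels):
        self.points = list(points)
        self.labels = list(labels)

    def save(self, path):
        # Persist only built-in containers so loading does not depend on class import paths
        with open(path, "wb") as f:
            pickle.dump({"points": self.points, "labels": self.labels}, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        return cls(state["points"], state["labels"])


original = FittedClustering([(0.0, 0.0), (1.0, 0.0), (9.0, 9.0)], [0, 0, -1])

# The temporary directory is removed automatically when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clustering_sample.pickle")
    original.save(path)
    restored = FittedClustering.load(path)

print(restored.points == original.points and restored.labels == original.labels)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.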

In [None]:
dbscanGeoDF = dbscan.toGeoDF() # here again a condition for filtering clusters can be passed
clusterGeoDF = sampleCluster.toGeoDF()
treeGeoDF = sampleTree.toGeoDF()
dbscanGeoDF.head()

In [None]:
dbscanSavedPath = os.path.join(c.temp, f"{dbscan}_sample.pickle")
clusterSavedPath = os.path.join(c.temp, f"sampleCluster_{sampleCluster.identifier}")


dbscan.save(dbscanSavedPath)
sampleCluster.save(clusterSavedPath)

In [None]:
loadedDBSCAN = SkLearnCoordinateClustering.load(dbscanSavedPath)
loadedCluster = SkLearnCoordinateClustering.Cluster.load(clusterSavedPath)

In [None]:
# The loaded objects are equal to the ones we persisted

print(loadedCluster.identifier == sampleCluster.identifier)
print(np.array_equal(sampleCluster.datapoints, loadedDBSCAN.getCluster(0).datapoints))

# Cleaning up
import shutil

shutil.rmtree(clusterSavedPath)
os.remove(dbscanSavedPath)