Spectral Clustering Example.

The image loaded here is a cropped portion of the ``MERCATOR_LC80210392016114LGN00_B10.TIF`` LANDSAT image included as a public [datashader example](http://datashader.org/topics/landsat.html).

In addition to `dask-ml`, we'll use `intake` and `xarray` to read the data and `holoviews` to plot the figures.
I'm just working on my laptop, so either the threaded or the distributed scheduler would work. I'll use the distributed scheduler for its diagnostics.

In [None]:
import rasterio
import numpy as np
import xarray as xr
import holoviews as hv
from holoviews.operation.datashader import regrid
import dask.array as da
from dask_ml.cluster import SpectralClustering
from dask.distributed import Client
hv.extension('bokeh')

In [None]:
client = Client(processes=False)
client

In [None]:
import intake
cat = intake.open_catalog('../catalog.yml')
list(cat)

In [None]:
l5_img = cat.l5.read_chunked()
l5_img

In [None]:
bands = l5_img
bands.coords['band'] = [1, 2, 3, 4, 5, 6]
bands = bands[:, 2500:5000, 2000:4500]
bands.data[bands.data == -9999.0] = 0.0  # zero out the nodata sentinel
bands = bands.fillna(0.0)  # fillna returns a new object, so reassign it
bands = bands.astype(float)
bands = (bands - bands.mean()) / bands.std()
bands
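
The preprocessing in the cell above can be illustrated on a tiny NumPy array. This is a minimal sketch, not the notebook's data: the array `toy` and the helper `standardize` are hypothetical, and `-9999.0` is assumed to be the nodata sentinel, as in the cell above.

```python
import numpy as np

NODATA = -9999.0  # sentinel value assumed from the cell above


def standardize(arr, nodata=NODATA):
    """Zero out nodata and NaN cells, then scale to zero mean and unit std."""
    out = np.asarray(arr, dtype=float).copy()
    out[out == nodata] = 0.0       # replace the sentinel value
    out[np.isnan(out)] = 0.0       # replace NaNs, like fillna(0.0)
    return (out - out.mean()) / out.std()


# Hypothetical 2 x 3 "band" with one sentinel and one NaN.
toy = np.array([[1.0, NODATA, 3.0],
                [np.nan, 5.0, 6.0]])
scaled = standardize(toy)
print(scaled.mean(), scaled.std())
```

Like the cell above, the sketch uses a single global mean and standard deviation rather than per-band statistics.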

In [None]:
%%opts Image [invert_yaxis=True width=250 height=250 tools=['hover']] (cmap='viridis')
hv.Layout([regrid(hv.Image(band, kdims=['x', 'y'])) for band in bands[:3]])

In [None]:
%%opts Image [invert_yaxis=True width=250 height=250 tools=['hover']] (cmap='viridis')
hv.Layout([regrid(hv.Image(band, kdims=['x', 'y'])) for band in bands[3:]])

In [None]:
flat_input = bands.stack(z=('y', 'x'))
flat_input

In [None]:
flat_input.shape

We'll reshape the image into the layout dask-ml / scikit-learn expect: `(n_samples, n_features)`, where each pixel is a sample and each of the 6 bands is a feature. Then we'll persist that in memory. The dataset is still small at this point. The large dataset, which dask helps us manage, is the intermediate `n_samples x n_samples` array that spectral clustering operates on. For our 2,500 x 2,500 pixel subset, that's ~50
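
To get a feel for the scale of an `n_samples x n_samples` array, here is a back-of-the-envelope calculation. It assumes a fully dense array of 8-byte (float64) entries, which is an assumption made for illustration and may not match how the estimator actually stores or approximates the array.

```python
n_samples = 2_500 * 2_500        # pixels in the cropped subset
n_entries = n_samples ** 2       # entries in a dense n_samples x n_samples array
bytes_per_entry = 8              # assumption: float64
total_tb = n_entries * bytes_per_entry / 1e12

print(f"{n_samples:,} samples")
print(f"{n_entries:.3e} entries")
print(f"{total_tb:,.1f} TB as dense float64")
```

Under those assumptions the dense array would have about 3.9 x 10^13 entries, far more than fits in a laptop's memory.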

In [None]:
X = flat_input.values.astype('float').T
X.shape
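
The stack-and-transpose step can be checked on a toy array. This sketch uses plain NumPy with made-up sizes, and assumes the stacked pixel axis is ordered with `y` varying slowest, the same order a row-major reshape produces.

```python
import numpy as np

n_bands, ny, nx = 6, 4, 5                       # toy sizes, not the real image
cube = np.arange(n_bands * ny * nx, dtype=float).reshape(n_bands, ny, nx)

# Collapse (y, x) into a single pixel axis, then put pixels on the rows.
flat = cube.reshape(n_bands, ny * nx)           # (n_bands, n_pixels)
X_toy = flat.T                                  # (n_samples, n_features)
print(X_toy.shape)

# Row i holds every band value of pixel (y, x) = (i // nx, i % nx).
i = 7
pixel = cube[:, i // nx, i % nx]
print(np.array_equal(X_toy[i], pixel))
```

Each row of `X_toy` is one pixel described by its 6 band values, which is the layout the estimator expects.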

In [None]:
X = da.from_array(X, chunks=100_000)
X = client.persist(X)

And we'll fit the estimator.

In [None]:
clf = SpectralClustering(n_clusters=4, random_state=0,
                         gamma=None,
                         kmeans_params={'init_max_iter': 5},
                         persist_embedding=True)

In [None]:
%time clf.fit(X)
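
For intuition about what a spectral clustering fit computes, here is a minimal dense NumPy sketch on toy data. It is not dask-ml's implementation and makes no attempt to scale: it builds the full affinity matrix, handles only two clusters, and splits on the sign of one eigenvector instead of running k-means. The points, the `gamma` value, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy blobs in 2-D, 20 points each (hypothetical data).
blob_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=3.0, scale=0.3, size=(20, 2))
pts = np.vstack([blob_a, blob_b])

# Dense RBF affinity: W[i, j] = exp(-gamma * ||x_i - x_j||^2).
gamma = 0.5
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-gamma * sq_dists)

# Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
lap = np.eye(len(pts)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# eigh returns eigenvalues in ascending order. The eigenvector of the
# second-smallest eigenvalue changes sign between the two groups.
eigvals, eigvecs = np.linalg.eigh(lap)
second = eigvecs[:, 1]
toy_labels = (second > 0).astype(int)
print(toy_labels)
```

Which blob receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful, not the label values.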

In [None]:
labels = clf.assign_labels_.labels_.compute()
labels.shape

In [None]:
labels = labels.reshape(bands[0].shape)

In [None]:
%%opts Image [invert_yaxis=True width=250 height=250 tools=['hover']] (cmap='viridis')
hv.Layout([regrid(hv.Image(band, kdims=['x', 'y'])) for band in bands])

In [None]:
%%opts Image [invert_yaxis=True width=250 height=250 tools=['hover']] (cmap='viridis')
hv.Layout([regrid(hv.Image(band, kdims=['x', 'y'])) for band in bands[3:]])

In [None]:
%%opts Image [invert_yaxis=True width=250 height=250 tools=['hover']] (cmap='viridis')
hv.Image(labels)