# Cluster Outlines & Contours

In this notebook we're going to learn how to use line annotations to emphasize cluster boundaries.

In [None]:
# If you run this notebook in Google Colab, you need to manually install the following packages.
# !pip install --quiet jupyter-scatter
# !pip install --quiet scipy

## Cluster Outlines

In the following, we render cluster outlines (i.e., hulls) as line annotations. For that, we first set up a `scatter` instance with some dummy data.

In [None]:
import jscatter
import numpy as np
import pandas as pd

n = 1000

x1, y1, g1 = np.random.normal(-1, 0.2, n), np.random.normal(+1, 0.05, n), np.repeat(1, n)
x2, y2, g2 = np.random.normal(+1, 0.5, n), np.random.normal(+1, 0.15, n), np.repeat(2, n)
x3, y3, g3 = np.random.normal(+1, 0.3, n), np.random.normal(-1, 0.33, n), np.repeat(3, n)
x4, y4, g4 = np.random.normal(-1, 0.7, n), np.random.normal(-1, 0.25, n), np.repeat(4, n)

df = pd.DataFrame({
    'x': np.concatenate((x1, x2, x3, x4)),
    'y': np.concatenate((y1, y2, y3, y4)),
    'g': np.concatenate((g1, g2, g3, g4)),
})
df['g'] = df['g'].astype('category')

scatter = jscatter.Scatter(
    data=df,
    x='x', x_scale=(-2, 2),
    y='y', y_scale=(-2, 2),
    color_by='g',
    height=400,
)

Then we compute the convex hull of each cluster using SciPy's `ConvexHull` class and draw each hull as a line annotation using Jupyter Scatter's `Line()` class.

<div class="alert alert-block alert-info">
<b>Note:</b> You need to have Jupyter Scatter &ge; <em>v0.18.0</em> installed for the following to work.
</div>

In [None]:
from scipy.spatial import ConvexHull

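# Reuse the scatter's category colors so that each hull matches its cluster.
# Note that `_color_map` is a private attribute and might change between versions.
cmap = scatter._color_map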

def get_hull_line(x, y, color):
    # Stack the coordinates into an (n, 2) array as expected by ConvexHull
    points = np.column_stack((x, y))
    # Indices of the points that form the hull (counterclockwise for 2D data)
    vertices = ConvexHull(points).vertices.tolist()
    vertices.append(vertices[0]) # We append the first vertex to close the line
    return jscatter.Line([(x[v], y[v]) for v in vertices], line_color=color)

hull1 = get_hull_line(x1, y1, cmap[0])
hull2 = get_hull_line(x2, y2, cmap[1])
hull3 = get_hull_line(x3, y3, cmap[2])
hull4 = get_hull_line(x4, y4, cmap[3])

scatter.annotations([hull1, hull2, hull3, hull4])

scatter.show()
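
To see what `ConvexHull(...).vertices` returns and why the first vertex is appended again, here is a minimal sketch on a toy point set. The points and the names `toy_points`, `hull_indices`, and `outline` are made up for illustration and are not part of the example above.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Toy data (made up for illustration): four corners of a unit square
# plus one point in its interior
toy_points = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
    [0.5, 0.5],  # lies inside the square, so it is not part of the hull
])

hull = ConvexHull(toy_points)

# `vertices` holds indices into `toy_points`. For 2D input, SciPy returns
# them in counterclockwise order.
hull_indices = hull.vertices.tolist()

# Repeat the first index so that the outline forms a closed loop
closed_indices = hull_indices + hull_indices[:1]
outline = toy_points[closed_indices]

print(hull_indices)
print(outline)
```

Only the four corner indices show up in `hull_indices`. The interior point is dropped, which is why a hull line only traces the outermost points of a cluster.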

## Cluster Contours

To further emphasize the density of clusters, we can also draw [contour lines](https://en.wikipedia.org/wiki/Contour_line) using Jupyter Scatter's `Contour()` class.

<div class="alert alert-block alert-info">
<b>Note:</b> You need to have Jupyter Scatter &ge; <em>v0.19.0</em> installed for the following to work.
</div>

**🚨 Shout-out Alert:** Internally, Jupyter Scatter uses [Seaborn's wonderful `kdeplot()`](https://seaborn.pydata.org/generated/seaborn.kdeplot.html), which relies on [SciPy's Gaussian kernel-density estimation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde).
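
To build some intuition for what a contour line represents, the following minimal sketch evaluates a Gaussian kernel-density estimate on a grid with SciPy directly. It is not Jupyter Scatter's or Seaborn's actual code: the toy cluster, the grid size, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Toy cluster (made up for illustration), centered at (1, -1)
xs = rng.normal(1.0, 0.3, 500)
ys = rng.normal(-1.0, 0.3, 500)

# gaussian_kde expects an array of shape (n_dimensions, n_points)
kde = gaussian_kde(np.vstack([xs, ys]))

# Evaluate the density estimate on a regular 50 x 50 grid
grid_x, grid_y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
grid_points = np.vstack([grid_x.ravel(), grid_y.ravel()])
density = kde(grid_points).reshape(grid_x.shape)

# The estimated density is higher near the cluster center than far from it
density_at_center = kde([[1.0], [-1.0]])[0]
density_far_away = kde([[-1.5], [1.5]])[0]

print(density.shape)
print(density_at_center > density_far_away)
```

A contour line then connects locations of equal estimated density on such a grid, so tightly nested contours indicate a steep change in point density.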

In [None]:
import seaborn as sns

geyser = sns.load_dataset("geyser")
geyser.head(3)

In [None]:
scatter_geyser = jscatter.Scatter(
    data=geyser,
    x='waiting',
    y='duration',
    color_by='kind',
    annotations=[jscatter.Contour(by='kind')],
    size=5,
    legend=True,
)

scatter_geyser.show()