# Linking Multiple Scatter Plots

In this example we're going to link multiple scatter plots and later compose them into a grid.

In [1]:
import numpy as np
import pandas as pd
import jscatter

### Scatter Plots with 1-to-1 Correspondences

In the straightforward case, we have two scatter plots with a one-to-one correspondence between the data points. (For example, think of comparing two embeddings of the same data that were generated with two different dimensionality reduction methods.)
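
To make the one-to-one assumption concrete, here is a minimal sketch in plain NumPy. It is not how jscatter works internally; the array names and the box selection are illustrative assumptions. The point is only that row `i` of both coordinate arrays describes the same item, so a selection made in one view is a set of row indices that can be looked up directly in the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical 2D embeddings of the same 500 items: row i in both
# arrays refers to the same item, which makes the match one-to-one.
embedding_a = rng.random((500, 2))
embedding_b = rng.random((500, 2))

# "Select" the items in the lower-left quadrant of embedding A.
in_box = (embedding_a[:, 0] < 0.5) & (embedding_a[:, 1] < 0.5)
selected = np.flatnonzero(in_box)

# The very same row indices identify the matching points in embedding B.
matched_in_b = embedding_b[selected]
```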

In [2]:
import jscatter
import numpy as np

x = np.random.rand(500)
y = np.random.rand(500)

scatterA = jscatter.Scatter(x=x, y=y)
scatterB = jscatter.Scatter(x=x, y=y)

jscatter.link([scatterA, scatterB])


In the following, we'll look at a real-world use case for the linking feature: exploring four different embeddings of the Fashion MNIST dataset.

We'll start by loading the data...

In [3]:
import io
import pyarrow as pa
import requests

r = requests.get('https://storage.googleapis.com/flekschas/regl-scatterplot/fashion-mnist-embeddings.arrow')

with pa.ipc.open_file(io.BytesIO(r.content)) as reader:
    embeddings = reader.read_pandas()
    embeddings['class'] = embeddings['class'].astype('category')

...and then instantiate four scatters with a popping color map...

In [4]:
config = dict(
    background_color='#111111',
    color_by='class',
    color_map=['#FFFF00', '#1CE6FF', '#FF34FF', '#FF4A46', '#008941', '#006FA6', '#A30059', '#FFDBE5', '#7A4900', '#0000A6']
)

scatter_pca = jscatter.Scatter(data=embeddings, x='pcaX', y='pcaY', **config)
scatter_tsne = jscatter.Scatter(data=embeddings, x='tsneX', y='tsneY', **config)
scatter_umap = jscatter.Scatter(data=embeddings, x='umapX', y='umapY', **config)
scatter_cae = jscatter.Scatter(data=embeddings, x='caeX', y='caeY', **config)

jscatter.link([scatter_pca, scatter_tsne, scatter_umap, scatter_cae], rows=2, row_height=240)


In clockwise order, the embeddings are: PCA, t-SNE, convolutional autoencoder, and UMAP.
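
The clockwise order follows from how the four scatters are laid out with `rows=2`. The sketch below uses plain Python and hypothetical labels, and it assumes the grid is filled row by row from left to right, which is consistent with the order stated above.

```python
# Hypothetical labels for the four scatters, in the order passed to link().
names = ["PCA", "t-SNE", "UMAP", "CAE"]
rows = 2
cols = len(names) // rows  # 2 columns for a 2x2 grid

# Assumption: cells are filled row by row, left to right.
grid = [names[r * cols:(r + 1) * cols] for r in range(rows)]

# Walk the 2x2 grid clockwise, starting at the top-left cell.
clockwise = [grid[0][0], grid[0][1], grid[1][1], grid[1][0]]
print(clockwise)  # ['PCA', 't-SNE', 'CAE', 'UMAP']
```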

### Scatter Plots with N-to-M Correspondences

Now to the more complex case where we have an N-to-M correspondence between the data points.

To simulate this case, we first generate some dummy grid data where the first dataset has `4` points, the second has `16`, and the third has `64` points.

Hence, a point in the first scatter plot corresponds to 4 points in the second scatter plot and 16 points in the third scatter plot.

To be able to determine the correspondences between points, we need a column that specifies the membership of points in a common set of groups. In this example we have four groups (`0` to `3`), and every point in the three datasets is assigned to one of these four groups.
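
As a quick sketch of how such a group column can be built, `np.repeat` repeats each of the four labels a fixed number of times. The variable names here are illustrative only.

```python
import numpy as np

n_groups = 4

# Repeating each label k times yields k members per group:
# k=1 gives 4 points, k=4 gives 16 points, k=16 gives 64 points.
group_columns = {k: np.repeat(np.arange(n_groups), k) for k in (1, 4, 16)}

for k, column in group_columns.items():
    labels, counts = np.unique(column, return_counts=True)
    print(len(column), labels.tolist(), counts.tolist())
```

Each printed line shows the dataset size, the four group labels, and the number of points per group.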

In [5]:
X1, Y1 = np.mgrid[0:2:1, 0:2:1]
X2, Y2 = np.mgrid[0:4:1, 0:4:1]
X3, Y3 = np.mgrid[0:8:1, 0:8:1]

df1 = pd.DataFrame(
    np.concatenate((
        np.expand_dims(X1.flatten(), axis=1),
        np.expand_dims(Y1.flatten(), axis=1),
        np.expand_dims(np.repeat(np.arange(4), 1), axis=1)
    ), axis=1),
    columns=['x', 'y', 'group'],
)
df1.group = df1.group.astype('category')

df2 = pd.DataFrame(
    np.concatenate((
        np.expand_dims(X2.flatten(), axis=1),
        np.expand_dims(Y2.flatten(), axis=1),
        np.expand_dims(np.repeat(np.arange(4), 4), axis=1)
    ), axis=1),
    columns=['x', 'y', 'group']
)
df2.group = df2.group.astype('category')

df3 = pd.DataFrame(
    np.concatenate((
        np.expand_dims(X3.flatten(), axis=1),
        np.expand_dims(Y3.flatten(), axis=1),
        np.expand_dims(np.repeat(np.arange(4), 16), axis=1)
    ), axis=1),
    columns=['x', 'y', 'group']
)
df3.group = df3.group.astype('category')

For instance, in the second dataset (`df2`) we see that the first four points are assigned to group `0`. So each of these four points corresponds to the first point of the first dataset (`df1`), as that point is also assigned to group `0`.

In [6]:
df2.head(6)

|   | x | y | group |
|---|---|---|-------|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 2 | 0 |
| 3 | 0 | 3 | 0 |
| 4 | 1 | 0 | 1 |
| 5 | 1 | 1 | 1 |


Next, we create three scatter plot instances as usual.

In [7]:
sc1 = jscatter.Scatter(data=df1, x='x', y='y', color_by='group', size=24)
sc2 = jscatter.Scatter(data=df2, x='x', y='y', color_by='group', size=24)
sc3 = jscatter.Scatter(data=df3, x='x', y='y', color_by='group', size=24)

Finally, to tell jscatter about the point correspondences, all we have to do is to specify which column contains the group information using the `match_by` argument.

In [8]:
jscatter.link([sc1, sc2, sc3], match_by='group')


Try selecting a point in the left scatter plot to see how the corresponding points in the other two scatter plots are selected.
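
Conceptually, group-based matching amounts to a lookup on the shared column. The following is a sketch of that idea in plain pandas, not jscatter's implementation; `matching_rows` and the two small DataFrames are hypothetical stand-ins.

```python
import pandas as pd

# Small stand-ins for the first two datasets: 4 points and 16 points,
# each carrying a `group` label from 0 to 3.
left = pd.DataFrame({"group": [0, 1, 2, 3]})
right = pd.DataFrame({"group": [g for g in range(4) for _ in range(4)]})


def matching_rows(source, target, selected, column="group"):
    """Return target row indices whose group matches any selected source row."""
    selected_groups = source.loc[selected, column].unique()
    return target[target[column].isin(selected_groups)].index.tolist()


print(matching_rows(left, right, [0]))  # [0, 1, 2, 3]
print(matching_rows(right, left, [5]))  # [1]
```

Selecting one point on the side with fewer points expands to every point in the same group on the other side, while selecting a single point on the larger side collapses to its one group representative.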

Note that the hover state is limited to a single point, which means when you mouse over the top-left point in the left-most scatter plot, only one out of the `4` and `16` corresponding points in the other two scatter plots is highlighted. Unfortunately, there is no way around this limitation at the moment.

### Control Linking

If you need control over which interaction (view, selection, or hover) is linked, use `compose()` instead of `link()`. The APIs are very similar. In fact, `link()` is just a shorthand for `compose(sync_view=True, sync_selection=True, sync_hover=True)`.
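
The shorthand relationship can be pictured with `functools.partial`. Note that `compose_stub` and `link_stub` below are hypothetical stand-ins written for this illustration, not jscatter functions, and the `False` defaults are an assumption.

```python
from functools import partial


def compose_stub(scatters, sync_view=False, sync_selection=False,
                 sync_hover=False, **layout):
    """Hypothetical stand-in for compose(): report which interactions sync."""
    return {
        "n_scatters": len(scatters),
        "view": sync_view,
        "selection": sync_selection,
        "hover": sync_hover,
    }


# A link-like shorthand is compose with every sync flag switched on.
link_stub = partial(
    compose_stub, sync_view=True, sync_selection=True, sync_hover=True
)

everything_linked = link_stub(["a", "b"])
views_independent = compose_stub(
    ["a", "b"], sync_selection=True, sync_hover=True
)
```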

For example, below we show four copies of the random scatter plot from the first example, but this time the views are _not_ linked.

In [9]:
jscatter.compose(
    [jscatter.Scatter(x=x, y=y) for i in range(4)],
    sync_selection=True,
    sync_hover=True,
    rows=2
)
