# Welcome to Jupyter Scatter 👋

In this notebook, we're going over the basics to get you started quickly!

## The Very Basics

All you need to get going is a dataset with at least two variables. For instance, to visualize cities by their longitude/latitude (according to [GeoNames](https://geonames.org)) and color-code them by continent, we create a `Scatter` instance as follows.

In [None]:
import jscatter
import pandas as pd

geonames = pd.read_parquet('https://paper.jupyter-scatter.dev/geonames.pq')
scatter = jscatter.Scatter(
    data=geonames,
    x='Longitude',
    y='Latitude',
    color_by='Continent',
    height=360,
)

scatter.show()

Jupyter Scatter offers many ways to customize the plot via topic-specific methods. For instance, in the following we adjust the point opacity, size, and color.

In [None]:
from matplotlib.colors import AsinhNorm, LogNorm

scatter.opacity(0.5)
scatter.size(by='Population', map=(1, 8, 10), norm=AsinhNorm())
scatter.color(by='Population', map='magma', norm=LogNorm(), order='reverse')
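
The `norm` arguments control how raw values are rescaled to the unit interval before they are mapped to sizes or colors. As a rough illustration of why a logarithmic norm suits heavily skewed values such as city populations, the following sketch re-implements the core idea in plain Python. The functions `log_normalize` and `linear_normalize` and the sample values are hypothetical; they are not part of Jupyter Scatter or Matplotlib.

```python
import math


def log_normalize(value, vmin, vmax):
    """Map value from [vmin, vmax] to [0, 1] on a logarithmic scale."""
    if not 0 < vmin < vmax:
        raise ValueError('expected 0 < vmin < vmax')
    return (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))


def linear_normalize(value, vmin, vmax):
    """Map value from [vmin, vmax] to [0, 1] on a linear scale."""
    return (value - vmin) / (vmax - vmin)


# Made-up populations spanning several orders of magnitude
populations = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
vmin, vmax = min(populations), max(populations)

for population in populations:
    linear = linear_normalize(population, vmin, vmax)
    logarithmic = log_normalize(population, vmin, vmax)
    print(f'{population:>10,}  linear={linear:.4f}  log={logarithmic:.4f}')
```

On the linear scale, all values except the largest cluster near zero, whereas the logarithmic scale spreads them evenly across the range.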

To aid interpretation of individual points and point clusters, Jupyter Scatter includes legends, axis labels, and tooltips.

In [None]:
scatter.legend(True)
scatter.axes(True, labels=True)
scatter.tooltip(True, properties=['color', 'Latitude', 'Country'], preview='Name')

Exploring a scatterplot often involves studying subsets of the points. To select points, one can either long press and lasso-select points interactively or query-select points programmatically. Here we select all cities with a population greater than ten million.

In [None]:
scatter.selection(geonames.query('Population > 10_000_000').index)

The selection works both ways: to retrieve the indices of the selected points, call `scatter.selection()` without arguments. We can use these indices to get back the related data records.

In [None]:
geonames.iloc[scatter.selection()]
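
Note that `DataFrame.iloc` looks rows up by position, while `DataFrame.query(...).index` returns index labels. The two coincide for a default integer index, which is presumably the case for `geonames`. The following sketch uses a small, made-up DataFrame (`cities` and its values are purely illustrative) to show how labels could be converted to positions when a DataFrame has a custom index. Whether Jupyter Scatter performs such a conversion for you is not covered here.

```python
import pandas as pd

# Hypothetical data with a non-default, string-based index
cities = pd.DataFrame(
    {'Population': [500_000, 12_000_000, 21_000_000]},
    index=['a', 'b', 'c'],
)

# Index labels of the rows matching the query
labels = cities.query('Population > 10_000_000').index

# Convert labels to row positions, which is what `iloc` expects
positions = cities.index.get_indexer(labels)

print(list(labels))        # ['b', 'c']
print(positions.tolist())  # [1, 2]
print(cities.iloc[positions])
```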

## Integration with Jupyter Widgets

Since Jupyter Scatter builds upon Traitlets, you can easily integrate it with other Jupyter Widgets by observing changes.

For instance, the following example shows how we can link a [UMAP](https://umap-learn.readthedocs.io/en/latest/) embedding scatterplot of the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, where points represent images, to a widget showing the actual images of the selected points.

First we're going to create a `Scatter` instance as before.

In [None]:
import jscatter

fashion_mnist = pd.read_parquet(
    'https://paper.jupyter-scatter.dev/fashion-mnist-embeddings.pq'
)

scatter = jscatter.Scatter(
    data=fashion_mnist,
    x='umapX',
    y='umapY',
    color_by='class',
    background_color='black',
    axes=False,
    height=480,
)

Next, we create a widget for displaying images in a grid. Don't worry about the details of this image widget here.

In [None]:
import anywidget
import traitlets
import traittypes


class ImagesWidget(anywidget.AnyWidget):
    _esm = """
    const baseUrl = 'https://paper.jupyter-scatter.dev/fashion-mnist-images/';
    
    function render({ model, el }) {
      const container = document.createElement('div');
      container.classList.add('images-container');
      el.appendChild(container);

      const grid = document.createElement('div');
      grid.classList.add('images-grid');
      container.appendChild(grid);

      function renderImages() {
        grid.textContent = '';
        
        model.get('images').forEach((image) => {
          const imgId = String(image).padStart(5, '0');
        
          const img = document.createElement('div');
          img.classList.add('images-fashion-mnist');
          img.style.backgroundImage = `url(${baseUrl}${imgId}.png)`;
        
          grid.appendChild(img);
        });
      }

      model.on("change:images", renderImages);
      
      renderImages();
    }

    export default { render };
    """

    _css = """
    .images-container {
      position: absolute;
      inset: 0;
      overflow: auto;
      background: black;
    }
    
    .images-grid {
      display: grid;
      grid-template-columns: repeat(auto-fit, minmax(32px, 1fr));
      align-content: flex-start;
      gap: 8px;
    }
    
    .images-fashion-mnist {
      width: 32px;
      height: 32px;
      background-repeat: no-repeat;
      background-position: center;
    }
    """

    images = traittypes.Array(default_value=[]).tag(sync=True)


images = ImagesWidget()
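
For the curious, the one detail worth knowing is how the widget finds an image: its JavaScript zero-pads each point index to five digits and appends `.png` to the base URL. A Python equivalent of that mapping might look as follows. The helper `image_url` is hypothetical and only mirrors the `padStart(5, '0')` call in the widget code.

```python
BASE_URL = 'https://paper.jupyter-scatter.dev/fashion-mnist-images/'


def image_url(point_index):
    """Return the image URL for a point index, zero-padded to five digits."""
    return f'{BASE_URL}{int(point_index):05d}.png'


print(image_url(572))
# https://paper.jupyter-scatter.dev/fashion-mnist-images/00572.png
```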

And finally, to link the point selection of our scatter to the image widget, all we have to do is to observe the scatter widget's `selection` property for changes.

In [None]:
import ipywidgets


def selection_change_handler(change):
    images.images = change['new']


scatter.widget.observe(selection_change_handler, names=['selection'])

ipywidgets.AppLayout(center=scatter.show(), right_sidebar=images)

Try selecting some points to see what images they represent!

In [None]:
# fmt: off
scatter.selection([
    1254,  52549, 47543, 11095, 34364, 36959, 11363,  9277, 23068,
    8921,  54801, 46398, 51721, 20057, 50162,   572, 59831, 43542,
    13883, 21882, 27737,  3578, 21036, 35325,  6552, 44735, 29358,
    46910,  4645, 28069, 25871, 44880,  7053, 25587, 54431, 43876,
    19916, 20364, 26526, 39428, 52338, 15187, 15646, 41574, 33875,
    3613,  58362, 26254,  1274,  9648, 27629, 32981, 47433, 25390,
    15293,  9619,   872, 20886, 57006, 42770, 41476, 54424, 34547,
    6570,   5556, 36400, 14179, 16730, 15361,  5192, 58429, 59357,
    2789,  30767, 46375, 45233, 32280, 58065, 20809, 17061, 27960,
])
# fmt: on

## Composing Multiple Scatter Plots

Visualizing two or more related scatter plots can be useful for comparing datasets. Jupyter Scatter makes this easy with synchronized hover, view, and point selections via its `compose` function.

For instance, in the following, we compose a two-by-two grid of four embeddings of the same [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset from before: [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA), [UMAP](https://umap-learn.readthedocs.io/en/latest/), [t-SNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html), and a convolutional autoencoder.

In [None]:
config = dict(
    data=fashion_mnist,
    color_by='class',
    legend=True,
    axes=False,
    zoom_on_selection=True,
)

pca = jscatter.Scatter(x='pcaX', y='pcaY', **config)
tsne = jscatter.Scatter(x='tsneX', y='tsneY', **config)
umap = jscatter.Scatter(x='umapX', y='umapY', **config)
cae = jscatter.Scatter(x='caeX', y='caeY', **config)

jscatter.compose(
    [(pca, 'PCA'), (tsne, 't-SNE'), (umap, 'UMAP'), (cae, 'CAE')],
    sync_selection=True,
    sync_hover=True,
    rows=2,
)

### Next

If you like what you saw and want to learn more, go to https://jupyter-scatter.dev for more guides and API docs. For a full-blown tutorial, check out https://github.com/flekschas/jupyter-scatter-tutorial, which I initially presented in a [SciPy '23 talk](https://www.youtube.com/watch?v=RyC5ixtQG-Q).

If you have ideas for improving Jupyter Scatter, found a bug, or want to give us a ⭐️, head over to https://github.com/flekschas/jupyter-scatter.