# Interactive Hover for Big Data
When visualizing large datasets with [Datashader](https://datashader.org/), we can easily spot interesting patterns and details. However, converting all that data into a smooth image can hide the finer points behind each pixel. In simple terms, while the big picture is clear, the details of individual data points get lost along the way.

To solve this problem, HoloViews offers the `selector` keyword, which makes it possible to acquire more information about the underlying data when using various Datashader operations like `rasterize` or `datashade`. The `selector` lets you retrieve key details for each pixel on the server side and dynamically push these to the front end, avoiding searching through the entire dataset or sending all the data to the browser. This makes it easier for everyone to dive into and understand complex visualizations, keeping the interactive experience fast and smooth even with very large datasets.

This notebook demonstrates how to use `selector` to create a dynamic hover tool for rasterized and datashaded plots.

:::{note}
This notebook uses dynamic updates, which require a running live Jupyter or Bokeh server. When viewed statically, the plots will not update when you zoom and pan, and hover information will not be available.
:::

Let's start by creating a `Points` element from a DataFrame that combines five synthetic datasets:

In [None]:
import datashader as ds
import holoviews as hv
import numpy as np
import pandas as pd
import panel as pn
from holoviews.operation.datashader import datashade, dynspread, rasterize

hv.extension("bokeh")

# Set default hover tools on various plot types
hv.opts.defaults(hv.opts.RGB(tools=["hover"]), hv.opts.Image(tools=["hover"]))


def create_synthetic_dataset(x, y, s, val, cat):
    seed = np.random.default_rng(1)
    num = 10_000
    return pd.DataFrame(
        {"x": seed.normal(x, s, num), "y": seed.normal(y, s, num), "s": s, "val": val, "cat": cat}
    )


df = pd.concat(
    {
        cat: create_synthetic_dataset(x, y, s, val, cat)
        for x, y, s, val, cat in [
            (2, 2, 0.03, 0, "d1"),
            (2, -2, 0.10, 1, "d2"),
            (-2, -2, 0.50, 2, "d3"),
            (-2, 2, 1.00, 3, "d4"),
            (0, 0, 3.00, 4, "d5"),
        ]
    },
    ignore_index=True,
)


points = hv.Points(df)

## Datashader Operations

Datashader is used to convert the points into a rasterized image. Two common operations are:

- **`rasterize`**: Converts points into an image grid where each pixel aggregates data. The default is to count the number of points per pixel.
- **`datashade`**: Applies a color map to the rasterized data, outputting RGBA values.

The default aggregator counts the points per pixel, but you can specify a different aggregator, for example, `ds.mean("s")` to calculate the mean of the `s` column. For more information, see the [Large Data user guide](./15-Large_Data.ipynb).

In [None]:
rasterized = rasterize(points)
shaded = datashade(points)
rasterized + shaded

## What is the difference between an Aggregator and a Selector?

An aggregator in Datashader is a function that combines data points that fall into the same pixel.

For example:

- `ds.count()`: Calculate the number of points (does not depend on any specific column).
- `ds.mean("s")`: Calculate the mean of column `s` for all points in the pixel.
- `ds.max("s")`: Select the maximum value of column `s` for all points in the pixel.

Note the difference between the first two aggregators and the last one: the first two calculate a new value, whereas the last one selects a value from an existing row. This is what defines a `selector`.
Selectors are therefore a subset of aggregators: they only select a value and do not calculate anything.
Currently, the following selectors are supported: `ds.min`, `ds.max`, `ds.first`, and `ds.last`.

Because selectors are a subset of aggregators, a selector can also be passed as the `aggregator` of an operation.

In [None]:
rasterized_mean = rasterize(points, aggregator=ds.mean("s"))
rasterized_max = rasterize(points, aggregator=ds.max("s"))
rasterized_mean + rasterized_max
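
The distinction between calculating and selecting can also be seen without Datashader. The following is a minimal sketch in plain pandas, assuming a hypothetical toy table in which a `pixel` column already records which pixel each point falls into. It only illustrates the idea and is not how Datashader computes its aggregates.

```python
import pandas as pd

# Hypothetical toy data: five points that fall into two "pixels".
toy = pd.DataFrame(
    {
        "pixel": [0, 0, 0, 1, 1],
        "s": [0.5, 0.1, 0.3, 2.0, 1.0],
        "cat": ["a", "b", "c", "d", "e"],
    }
)

grouped = toy.groupby("pixel")["s"]

# Aggregators that calculate: the result need not match any single row.
count_per_pixel = grouped.count()
mean_per_pixel = grouped.mean()

# Selector-style aggregation: pick the row holding the maximum of "s",
# so every other column of that row (here "cat") is available as well.
selected_rows = toy.loc[grouped.idxmax()]

print(count_per_pixel.to_dict())
print(mean_per_pixel.to_dict())
print(selected_rows)
```

Because a selector points back at one concrete row, the other columns of that row can be reported for the pixel, which a mean or a count cannot offer.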

## Server-side HoverTool

The key idea behind the server-side `HoverTool` is:

1. **Hover event**: When a user hovers over a plot, the pixel coordinates are sent to the server.
2. **Data lookup**: The server uses these coordinates to look up the corresponding aggregated data from the pre-computed dataset.
3. **Update display**: The hover information is updated and sent back to the front end to display detailed data.

This design avoids sending all the raw data to the client and transmits only the necessary information on demand. You can enable this by adding a `selector` to a `rasterize` or `datashade` operation.

In [None]:
rasterized_with_selector = rasterize(points, aggregator=ds.mean("s"), selector=ds.min("s"))
rasterized_with_selector
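
The data lookup step described above can be sketched in plain NumPy and pandas. This is an illustration under stated assumptions, not HoloViews' actual implementation: the `row_index` array, the `lookup` function, and the 2x2 image size are hypothetical, and the hover position is assumed to arrive in data coordinates.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data that stays on the server.
data = pd.DataFrame(
    {"x": [0.1, 0.9, 0.4], "y": [0.2, 0.8, 0.6], "cat": ["d1", "d2", "d3"]}
)

# Hypothetical pre-computed selector output for a 2x2 image: the index of
# the selected row for each pixel, or -1 where the pixel is empty.
# Shape is (height, width); row 0 holds the lowest y values.
row_index = np.array([[0, -1], [2, 1]])
x_range, y_range = (0.0, 1.0), (0.0, 1.0)


def lookup(x, y):
    """Return the selected row for the pixel under (x, y), or None if empty."""
    height, width = row_index.shape
    col = int((x - x_range[0]) / (x_range[1] - x_range[0]) * width)
    row = int((y - y_range[0]) / (y_range[1] - y_range[0]) * height)
    # Clamp so positions on the upper edge still map to the last pixel.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    idx = row_index[row, col]
    if idx < 0:
        return None
    return data.iloc[idx].to_dict()


print(lookup(0.3, 0.7))    # pixel containing the "d3" point
print(lookup(0.75, 0.25))  # empty pixel
```

Only the single dictionary returned by `lookup` would need to travel to the browser; the full `data` table never leaves the server.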

The `spread` and `dynspread` operations are useful companions: they enlarge each pixel in the output, which makes individual points easier to see and to hover over when zooming in.

In [None]:
dynspreaded = dynspread(rasterized_with_selector)
rasterized_with_selector + dynspreaded
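
To see why spreading makes isolated points easier to hit with the cursor, here is a simplified NumPy sketch. It assumes a square neighbourhood merged with a maximum, and the helper `spread_square` is hypothetical; Datashader's own spreading supports other shapes and compositing options, so treat this only as an illustration of the idea.

```python
import numpy as np


def spread_square(agg, px=1):
    """Expand every pixel by `px` pixels on all sides, merging with max."""
    height, width = agg.shape
    padded = np.pad(agg, px, mode="constant", constant_values=0)
    out = np.zeros_like(agg)
    # Take the maximum over every shifted view within the neighbourhood.
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out = np.maximum(out, padded[dy:dy + height, dx:dx + width])
    return out


agg = np.zeros((5, 5), dtype=int)
agg[2, 2] = 7  # a single isolated point
spread_agg = spread_square(agg, px=1)
print(spread_agg)
```

The single non-empty pixel grows into a 3x3 block carrying the same value, so the hover target is nine pixels instead of one.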

:::{seealso}
[Large Data](./Large_Data.html): An introduction to Datashader and HoloViews
:::