Effectively representing temporal dynamics in large datasets requires selecting appropriate visualization techniques that ensure responsiveness while providing both a macroscopic view of overall trends and a microscopic view of fine details. This guide will explore various methods, such as WebGL Rendering, LTTB Downsampling, Datashader, and Minimap, each suited for different aspects of large timeseries data visualization. We predominantly demonstrate the use of hvPlot syntax, leveraging HoloViews for more complex requirements. Although hvPlot supports multiple backends, including Matplotlib and Plotly, our focus will be on Bokeh due to its advanced capabilities in handling large timeseries data.


## Getting the data 

Here we have a DataFrame with 1.2 million rows containing standardized data from 5 different sensors.

In [None]:
import pandas as pd
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)

df = pd.read_parquet("https://datasets.holoviz.org/sensor/v1/data.parq")
df

In [None]:
# Select a single sensor for the single-trace examples below
df0 = df[df.sensor=='0']

Let's go ahead and plot this data using various approaches.

## WebGL Rendering

Rendering in hvPlot and HoloViews for Bokeh plots has evolved significantly. Prior to HoloViews version 1.16.0 (May 2023), Bokeh's custom HTML **Canvas** rendering was the default. This approach works well for datasets up to a few tens of thousands of points but struggles above 100K points, particularly in terms of zooming and panning speed. Now, if you want to utilize Canvas rendering, set `hv.renderer("bokeh").webgl = False` prior to creating your hvPlot object.

Around mid-2023, the adoption of improved **WebGL** as the default for hvPlot and HoloViews allowed for smoother interactions with larger datasets by utilizing GPU-acceleration. It's important to note that WebGL performance can vary based on your machine's specifications. For example, some Mac models may not exhibit a marked improvement in WebGL performance over Canvas due to GPU specifications.

In [None]:
df0.hvplot(x="time", y="value", responsive=True, min_height=300, autorange='y', title="WebGL")

<div class="alert alert-info">

<b>Note:</b> `autorange='y'` is used here for automatic y-axis scaling, a feature from HoloViews 1.17 and hvPlot 0.9.0. You can omit that option if you prefer to set the y scaling manually using the zoom tool.

</div>

WebGL plotting should work well for up to a million points, but beyond that, it suffers from the fact that standard Bokeh plotting sends *all* of your data from the server to the local browser. Especially if your code is running on a remote server, transferring that data can be very slow, and updating it on pan and zoom can take a long time even with WebGL for larger datasets. The rest of the methods below use various techniques for not sending all of that data, focusing only on the data needed at any one time.
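As a rough back-of-the-envelope illustration of why transfer size matters, the sketch below estimates the raw payload for a two-column timeseries. The 8 bytes per value is an assumption (64-bit floats for both time and value), the helper name `payload_mb` is hypothetical, and the figures ignore any encoding or compression overhead:

```python
BYTES_PER_VALUE = 8  # assumption: time and value are both sent as 64-bit floats

def payload_mb(n_points, n_columns=2):
    """Approximate raw payload in megabytes for n_points rows."""
    return n_points * n_columns * BYTES_PER_VALUE / 1e6

for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} points ~ {payload_mb(n):,.1f} MB")
```

The payload grows linearly with the number of points, which is what the techniques below are designed to avoid.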

## LTTB Downsampling

To reduce the amount of data sent to the browser, you could simply plot a subset of the datapoints, for example 500 randomly selected rows using `df.sample`:

In [None]:
df0.sample(500).hvplot(x="time", y="value", responsive=True, min_height=300, autorange='y',
                       title = "Decimation: Don't do this!")

However, arbitrary subsampling like that (whether random or every _n_th point) will suffer from [aliasing](https://en.wikipedia.org/wiki/Downsampling_(signal_processing)), e.g. misrepresenting the curve because the selected samples miss important peaks, troughs, or slopes in the signal. In the example here, large spikes are clearly seen in the WebGL plot but are not visible in the decimated plot, which is why simple decimation of this type is not recommended.
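The effect is easy to reproduce on synthetic data. The following minimal sketch uses a made-up signal (a slow sine wave with one single-sample spike, not the sensor dataset) and keeps every 20th sample; the spike falls between the retained samples and disappears:

```python
import math

# Hypothetical signal: one period of a sine wave with a single-sample spike.
n = 10_000
signal = [math.sin(2 * math.pi * i / n) for i in range(n)]
signal[5_003] = 10.0  # the spike sits at an index that is not a multiple of 20

# Strided decimation: keep every 20th sample.
stride = 20
decimated = signal[::stride]

print("samples kept:", len(decimated))
print("max of full signal:", max(signal))
print("max after decimation:", max(decimated))
```

The decimated series still looks like a clean sine wave, so nothing in the plot would hint that a large excursion was dropped.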

Instead, there are ways to reduce the number of samples while preserving the curve shape, such as the [Largest Triangle Three Buckets](https://skemman.is/handle/1946/15343) algorithm (LTTB). LTTB allows data points not contributing significantly to the visible shape to be dropped, reducing the amount of data to send to the browser but preserving the appearance (and particularly the envelope, i.e. highest and lowest values in a region).
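To give a feel for how LTTB chooses points, here is a minimal pure-Python sketch of the algorithm. It is an illustration only, not the implementation hvPlot or HoloViews uses; the function name `lttb` and the synthetic test signal are assumptions made for this example. The first and last points are always kept, the interior points are split into equal-sized buckets, and from each bucket the point forming the largest triangle with the previously kept point and the average of the next bucket is retained:

```python
import math

def lttb(xs, ys, n_out):
    """Downsample (xs, ys) to n_out points using Largest Triangle Three Buckets."""
    n = len(xs)
    if n_out >= n or n_out < 3:
        return list(xs), list(ys)  # nothing to do (or too few output points)

    every = (n - 2) / (n_out - 2)  # bucket width for the interior points
    out_x, out_y = [xs[0]], [ys[0]]
    a = 0  # index of the most recently kept point

    for i in range(n_out - 2):
        # Current bucket covers indices [start, end).
        start = int(i * every) + 1
        end = int((i + 1) * every) + 1

        # The average of the next bucket is the third corner of the triangle.
        nxt_end = min(int((i + 2) * every) + 1, n)
        count = nxt_end - end
        avg_x = sum(xs[end:nxt_end]) / count
        avg_y = sum(ys[end:nxt_end]) / count

        # Keep the point in the current bucket with the largest triangle area.
        best, best_area = start, -1.0
        for j in range(start, end):
            area = abs((xs[a] - avg_x) * (ys[j] - ys[a])
                       - (xs[a] - xs[j]) * (avg_y - ys[a])) / 2
            if area > best_area:
                best, best_area = j, area
        out_x.append(xs[best])
        out_y.append(ys[best])
        a = best

    out_x.append(xs[-1])
    out_y.append(ys[-1])
    return out_x, out_y

# Hypothetical signal: a sine wave with one single-sample spike.
n = 10_000
xs = list(range(n))
ys = [math.sin(2 * math.pi * i / n) for i in range(n)]
ys[5_003] = 10.0

ds_x, ds_y = lttb(xs, ys, n_out=500)
print("points kept:", len(ds_x))
print("max after LTTB:", max(ds_y))
```

Because the spike forms by far the largest triangle in its bucket, it survives the reduction from 10,000 to 500 points, unlike in plain strided decimation.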

In hvPlot, adding `downsample=True` will enable the LTTB algorithm, which will automatically choose an appropriate number of samples for the current plot, updating with additional samples as you zoom in:

In [None]:
df0.hvplot(x="time", y="value", downsample=True, responsive=True, min_height=300, autorange='y', title="LTTB")

Here you should see that the LTTB plot is visually quite similar to the WebGL plot, but it is rendered much more quickly (especially for local browsing of remote computation). The plot will then be updated with additional detail automatically if you zoom in, as long as the Python code underlying this page is still running. LTTB depends on a live Python process to recompute the samples, whereas a Bokeh WebGL plot depends only on JavaScript running in your web browser, which is an advantage when sharing static HTML files.

With LTTB, it is now practical to include all of the different sensors in a single plot without slowdown, updating to show more detail when zooming in:

In [None]:
df.hvplot(x="time", y="value", downsample=True, by='sensor', responsive=True, min_height=300, autorange='y',
          title="Categorical LTTB")

LTTB is thus a good default way to browse a timeseries dataset if you don't know how large it might be or if you already know it is too large for WebGL.

## Datashader rasterizing

In [None]:
from holoviews.operation.resample import ResampleOperation2D
ResampleOperation2D.width=1200
ResampleOperation2D.height=500

<div class="alert alert-info">

<b>Note:</b> The code above sets the default image size for Datashader renderings. It ensures images appear at high resolution when the notebook is displayed on our website, addressing the absence of dynamic resizing in this context. For interactive or local use, this adjustment is not critical.

</div>




Bokeh WebGL and LTTB both send data to the web browser and ask the web browser to "connect the dots" between them by drawing a line in the browser page, with LTTB simply sending fewer points. [Datashader](https://datashader.org) works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points. Thus Datashader will send only a fixed amount of data (the rendered plot), potentially greatly speeding up plots of the largest datasets. As with LTTB, plots will only be updated after a zoom or pan if Python is still running, because Python is what renders and supplies the updated image. Setting the argument `line_width` to a value above 0 will enable [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing) of the line.

In [None]:
df0.hvplot(x="time", y="value", rasterize=True, cnorm='eq_hist', padding=(0, 0.1), line_width=1,
           responsive=True, min_height=300, autorange='y', title="Rasterize")

If you zoom in enough, you'll see a normal line, but for a long timeseries in a zoomed out plot like this one, what you will see is Datashader's "aggregation" of *all* the line segments between the points, with darker colors indicating areas where the data trace goes back and forth multiple times in a single pixel (with the number of "switchbacks" indicated in the color key). This representation conveys a lot more about the behavior of this data, whereas the previous plots show a single solid color regardless of how many line segments crossed that pixel. Datashader rendering can be used to get a good overview of the full shape of a long timeseries, helping you understand how the signal varies even when the steps involved are smaller than the pixels on the screen.
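The idea of aggregating into a fixed-size buffer can be sketched in a few lines of plain Python. This is a deliberately simplified illustration and not how Datashader is implemented: it counts individual points per cell rather than line segments, and the function name `rasterize_points` and the random test data are assumptions made for this example:

```python
import random

def rasterize_points(xs, ys, width, height, x_range, y_range):
    """Count how many points fall into each cell of a width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the visible range
        # Map data coordinates to a cell; clamp the upper edge into the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

random.seed(0)
for n_points in (1_000, 100_000):
    px = [random.random() for _ in range(n_points)]
    py = [random.gauss(0, 1) for _ in range(n_points)]
    grid = rasterize_points(px, py, width=60, height=20,
                            x_range=(0, 1), y_range=(-5, 5))
    cells = sum(len(row) for row in grid)
    print(f"{n_points:>7,} points -> {cells} cells, busiest cell holds {max(map(max, grid))}")
```

The number of cells stays the same however many points go in; only the counts inside the cells grow, and those counts are what a colormap such as `cnorm='eq_hist'` then turns into colors.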

For data with different "categories" (sensors, in this case), Datashader can assign a different color to each of the sensor categories and then aggregate all of them into the final display by mixing their colors:

In [None]:
df.hvplot(x="time", y="value", datashade=True, hover=True, padding=(0, 0.1), responsive=True,
          min_height=300, autorange='y', line_width=1, by='sensor', title="Rasterize categories")

This categorical color mixing can help indicate when traces overlap each other, to give you a clue when to zoom in, and becomes particularly important the more categories there are.
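As a conceptual sketch of color mixing, the snippet below blends the colors of two categories for a single pixel, weighted by how many times each category hit that pixel. It is a simplification for illustration only (Datashader's actual shading also handles transparency and count normalization), and the function name `mix_colors` and the palette are hypothetical:

```python
def mix_colors(counts, palette):
    """Blend category colors for one pixel, weighted by per-category counts."""
    total = sum(counts.values())
    if total == 0:
        return None  # nothing was drawn in this pixel
    return tuple(
        round(sum(palette[cat][ch] * n for cat, n in counts.items()) / total)
        for ch in range(3)  # red, green, blue channels
    )

# Hypothetical palette: sensor "0" in red, sensor "1" in blue.
palette = {"0": (200, 0, 0), "1": (0, 0, 200)}

print(mix_colors({"0": 4, "1": 0}, palette))  # only sensor "0" crosses this pixel
print(mix_colors({"0": 2, "1": 2}, palette))  # both sensors overlap equally
print(mix_colors({"0": 3, "1": 1}, palette))  # sensor "0" dominates
```

A pixel whose color lies between the palette colors therefore signals that several traces overlap there.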

[The example above needs `rasterize`, plus instant inspection. Also needs to illustrate what happens when very large numbers of traces overlap.] 

## Minimap

The LTTB and Datashader options are about rendering or omitting datapoints when showing a large time range that would include many data points. What if you have years of data, but the timescale involved is such that you typically study a single day or a single hour? In that case the new "minimap" approach can help you ensure that you see the larger context while actually plotting only the smaller time range.

A minimap is added using the HoloViews RangeToolLink:

In [None]:
from holoviews.plotting.links import RangeToolLink

# Does not yet work with downsample1d. For now, to make it easier on the browser,
# take every 10th row of the single-sensor data and plot that subset
downsampled_df = df0.iloc[::10]

plot    = downsampled_df.hvplot(x="time", y="value", height=500)
minimap = downsampled_df.hvplot(x="time", y="value", height=150).opts(ylabel='', xlabel='')

link = RangeToolLink(minimap, plot, axes=["x", "y"], boundsx=(None, pd.Timestamp("2022-02-01")), boundsy=(-5, 5))

(plot + minimap).opts(shared_axes=False).cols(1)

Here, you can drag the grey box on the bottom plot and the top plot will update to show that range of the data, letting you explore a large dataset while plotting only a short stretch at a time.
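Conceptually, the grey box defines a (start, end) window over a long, time-sorted series, and only that window needs to be shown in detail. The stdlib-only sketch below illustrates that idea on made-up hourly data; it is not how `RangeToolLink` works internally, and the helper name `select_window` is hypothetical:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Hypothetical data: one year of hourly readings.
start = datetime(2022, 1, 1)
times = [start + timedelta(hours=h) for h in range(24 * 365)]
values = [h % 24 for h in range(len(times))]

def select_window(times, values, lo, hi):
    """Return the samples whose timestamps fall within [lo, hi] (times must be sorted)."""
    i = bisect_left(times, lo)
    j = bisect_right(times, hi)
    return times[i:j], values[i:j]

win_t, win_v = select_window(times, values, datetime(2022, 3, 1), datetime(2022, 3, 2))
print(len(times), "samples in total,", len(win_t), "in the selected window")
```

Dragging the box corresponds to changing `lo` and `hi`: the overview stays fixed while the detailed view covers only a small fraction of the samples.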