decimate transforms #1966

Fil · 2024-01-02T15:56:07Z

A data decimation transform can be used to simplify dense line charts by removing many of the points that don't add visual information to a line path.

The decimation strategy is inspired by M4 [1]: group the points into clusters along the main axis (say, x = date for a time series), one cluster per pixel, and in each cluster retain the points that have the minimum and maximum x and y values.
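Here is a minimal TypeScript sketch of the basic M4 idea. It makes several assumptions. The helper name `m4Indices` is hypothetical and is not Plot's API. The x and y values are assumed to be already scaled to pixel coordinates. Clusters are keyed by `Math.floor(x / pixelSize)`. For each cluster, the sketch keeps the indices of the points with minimum and maximum x and y, then returns them deduplicated and in index order.

```typescript
// Hypothetical helper (not Plot's API): plain M4 decimation.
// Assumes X and Y are already in pixel coordinates; points need not be sorted.
function m4Indices(X: number[], Y: number[], pixelSize: number): number[] {
  // cluster key -> [argminX, argmaxX, argminY, argmaxY]
  const bins: { [key: string]: number[] } = {};
  for (let i = 0; i < X.length; ++i) {
    const key = String(Math.floor(X[i] / pixelSize));
    const b = bins[key];
    if (b === undefined) {
      bins[key] = [i, i, i, i]; // first point seen in this cluster
    } else {
      if (X[i] < X[b[0]]) b[0] = i;
      if (X[i] > X[b[1]]) b[1] = i;
      if (Y[i] < Y[b[2]]) b[2] = i;
      if (Y[i] > Y[b[3]]) b[3] = i;
    }
  }
  // Flag the retained indices; scanning the flags deduplicates them
  // and returns them in index order.
  const keep: boolean[] = [];
  Object.keys(bins).forEach((key) => {
    bins[key].forEach((i) => {
      keep[i] = true;
    });
  });
  const result: number[] = [];
  for (let i = 0; i < X.length; ++i) if (keep[i]) result.push(i);
  return result;
}

// Usage: in the first cluster, the point at index 3 is not an extremum and is dropped.
const px = [0, 0.1, 0.2, 0.3, 0.4, 1.0];
const py = [5, 1, 9, 3, 4, 2];
console.log(m4Indices(px, py, 0.5)); // [0, 1, 2, 4, 5]
```

Because each cluster contributes at most four distinct indices, the output size depends on the number of clusters rather than on the number of input points.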

This implementation goes a bit further: it does not assume that the points are ordered along x, and it supports curves (such as catmull-rom) that may need more control points than these four within a given cluster. So we retain not only argminX, argmaxX, argminY, and argmaxY (this is M4), but also the first and last points and, for some curves, the second and next-to-last points. We also keep the retained points in the order in which they appear in the index.

This extension of M4 raises the maximum number of points per pixel from 4 to 6 for regular (monotone) curves, and to 8 for irregular (quadratic, etc.) curves. This seems a modest price to pay for a generic transform that we can apply systematically.
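The extended selection can be sketched in the same way. This is a hypothetical helper (`decimateIndices`) and not the PR's implementation. It assumes pixel coordinates and clusters keyed by `Math.floor(x / pixelSize)`. An `irregular` flag stands in for whatever tells the transform that the curve needs the second and next-to-last points. Points are visited in index order, so the first and last entries of each cluster are its first and last points, even when the data is not sorted along x.

```typescript
// Hypothetical helper (not Plot's API): M4 extended with the first/last
// points of each cluster and, for irregular curves, the second and
// next-to-last points. Assumes X and Y are already in pixel coordinates.
function decimateIndices(X: number[], Y: number[], pixelSize: number, irregular: boolean): number[] {
  const n = X.length;
  // cluster key -> indices of the points in that cluster, in index order
  const bins: { [key: string]: number[] } = {};
  for (let i = 0; i < n; ++i) {
    const key = String(Math.floor(X[i] / pixelSize));
    (bins[key] = bins[key] || []).push(i);
  }
  const keep: boolean[] = [];
  Object.keys(bins).forEach((key) => {
    const I = bins[key];
    // M4: argmin/argmax of x and y within the cluster
    let minX = I[0], maxX = I[0], minY = I[0], maxY = I[0];
    I.forEach((i) => {
      if (X[i] < X[minX]) minX = i;
      if (X[i] > X[maxX]) maxX = i;
      if (Y[i] < Y[minY]) minY = i;
      if (Y[i] > Y[maxY]) maxY = i;
    });
    // Extension: first and last points (at most 6 per cluster)...
    const picked = [minX, maxX, minY, maxY, I[0], I[I.length - 1]];
    // ...plus second and next-to-last for irregular curves (at most 8).
    if (irregular && I.length > 1) picked.push(I[1], I[I.length - 2]);
    picked.forEach((i) => {
      keep[i] = true;
    });
  });
  // Return the retained indices deduplicated and in index order.
  const result: number[] = [];
  for (let i = 0; i < n; ++i) if (keep[i]) result.push(i);
  return result;
}

// Usage: in the first cluster, indices 1 and 4 are only kept for irregular curves.
const sx = [0.0, 0.1, 0.2, 0.3, 0.4, 0.45, 1.0];
const sy = [5, 3, 1, 9, 4, 6, 2];
console.log(decimateIndices(sx, sy, 0.5, false)); // [0, 2, 3, 5, 6]
console.log(decimateIndices(sx, sy, 0.5, true)); // [0, 1, 2, 3, 4, 5, 6]
```

Storing every index per cluster keeps the sketch short. A streaming version could track the same eight candidates in constant memory per cluster.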

The areaY, lineY, and differenceY marks now transparently call decimateX. The areaX, lineX (and differenceX in the future, cf. #1920) marks now transparently call decimateY.

The only supported option is pixelSize, which sets the quantization step, in pixels, along the main axis (x for decimateX, y for decimateY), and defaults to 0.5. Setting this option to 0 makes the transform return early, effectively disabling it.
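The effect of pixelSize on the output size can be checked with a little arithmetic. The following helper, `maxRetained`, is hypothetical and not part of Plot. It assumes the main-axis positions lie in [0, extent) pixels, which gives `Math.ceil(extent / pixelSize)` clusters. Each cluster keeps at most 6 points for monotone curves or 8 for irregular ones. A `pixelSize` of 0 models the early return, so all n points are kept. The 640-pixel extent in the usage lines is only an example.

```typescript
// Hypothetical helper (not Plot's API): upper bound on the number of points
// the decimation keeps. Assumes main-axis positions lie in [0, extent) pixels.
function maxRetained(n: number, extent: number, irregular: boolean, pixelSize = 0.5): number {
  if (pixelSize === 0) return n; // pixelSize 0: the transform returns early
  const clusters = Math.ceil(extent / pixelSize); // one cluster per quantization step
  const perCluster = irregular ? 8 : 6; // 6 for monotone curves, 8 for irregular ones
  return Math.min(n, clusters * perCluster);
}

// Usage with 100,000 points on an example 640-pixel axis:
console.log(maxRetained(100000, 640, false)); // 7680 (default pixelSize 0.5)
console.log(maxRetained(100000, 640, true)); // 10240
console.log(maxRetained(100000, 640, false, 0)); // 100000 (transform disabled)
```

Halving pixelSize doubles the number of clusters, and therefore doubles the bound.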

I would also recommend calling the decimate transform on the tip mark for very large datasets, to make it faster. It would not be a good idea to do so systematically, though, since the user might be interested in all the intermediate points that fall on the same x pixel.

todo:

documentation
maybe replace the automatic selection of the main channel (x, x1, or x2) with explicit function names such as decimateX2?

closes #1707

[1] https://www.vldb.org/pvldb/vol7/p797-jugel.pdf; see also @jheer’s notebook https://observablehq.com/@uwdata/m4-scalable-time-series-visualization for a nice walk-through and implementation of M4 with Plot.


decimate transforms

0466f08

closes #1707

Fil requested a review from mbostock January 2, 2024 15:56

Fil added 2 commits January 2, 2024 17:15

documentation placeholder

8c20048

We should probably favor x2 over x, since there are cases where x is the midpoint of x2 and x1, and might be rendered null if x1 is defined as -x2.

a5cfd1e
