A data decimation transform can be used to simplify dense line charts by removing many of the points that don't add visual information to a line path.
The decimation strategy is inspired by M4 [1]: cluster the values by grouping them on the main axis (say, x = date for time series) for each given pixel, and in each cluster retain the points that give the minimum and maximum x and y values.
This implementation goes a bit further: it does not assume that the points are ordered along x, and it supports curves (such as catmull-rom) that might need more control points than these 4 inside a given cluster. So we retain not only argminX, argmaxX, argminY, and argmaxY (this is M4), but also the first and last points and, for some curves, the second and next-to-last points. We also keep them in the order in which they appear in the index.
This extension of M4 raises the maximum number of points per pixel from 4 to 6 for regular (monotone) curves, and to 8 for irregular (quadratic, etc.) curves. This seems like a modest price to pay for a generic transform that we can apply systematically.
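To make the strategy concrete, here is a minimal sketch of the 6-point variant. It is not the code from this pull request: the function name `decimateIndex`, its signature, and the choice to form clusters from consecutive runs of points that share the same quantized x are all assumptions made for illustration.

```javascript
// Hypothetical sketch; not the implementation in this pull request.
// I: index array in drawing order; X, Y: pixel coordinates addressed by index.
function decimateIndex(I, X, Y, pixelSize = 0.5) {
  if (!(pixelSize > 0)) return I; // pixelSize = 0 neutralizes the transform
  const kept = [];
  let start = 0;
  while (start < I.length) {
    // A cluster is a run of consecutive points that fall in the same x bucket
    // (assumption: this is how unordered data is handled in this sketch).
    const bucket = Math.floor(X[I[start]] / pixelSize);
    let end = start + 1;
    while (end < I.length && Math.floor(X[I[end]] / pixelSize) === bucket) ++end;
    // Find the positions of argminX, argmaxX, argminY, argmaxY in the cluster.
    let minX = start, maxX = start, minY = start, maxY = start;
    for (let k = start + 1; k < end; ++k) {
      const x = X[I[k]], y = Y[I[k]];
      if (x < X[I[minX]]) minX = k;
      if (x > X[I[maxX]]) maxX = k;
      if (y < Y[I[minY]]) minY = k;
      if (y > Y[I[maxY]]) maxY = k;
    }
    // Keep first, last, and the four extrema (at most 6 distinct positions),
    // emitted in the order they appear in the index.
    const positions = new Set([start, minX, maxX, minY, maxY, end - 1]);
    for (const k of [...positions].sort((a, b) => a - b)) kept.push(I[k]);
    start = end;
  }
  return kept;
}
```

For curves that need more control points, the second and next-to-last positions (`start + 1` and `end - 2` in this sketch) could be added to the set, which would raise the cap to 8 points per cluster.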
The areaY, lineY, and differenceY marks now transparently call decimateX. The areaX, lineX (and differenceX in the future, cf. #1920) marks now transparently call decimateY.
The only supported option is pixelSize, which gives the step of the quantization on x (in pixels), and defaults to 0.5. Setting this option to 0 makes the transform return early, effectively neutralizing it.
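As an illustration of what the quantization step means, the sketch below maps a pixel coordinate to a cluster key. The helper name `clusterKey` and the use of `Math.floor` are assumptions made for this example, not necessarily what the transform does internally.

```javascript
// Hypothetical helper: quantize a pixel coordinate into a cluster key.
// Returns null when pixelSize is 0 (or invalid), meaning "do not decimate".
function clusterKey(x, pixelSize = 0.5) {
  if (!(pixelSize > 0)) return null;
  return Math.floor(x / pixelSize);
}

// With the default step of 0.5px, x = 10.2 and x = 10.4 share a cluster,
// while x = 10.6 falls in the next one.
const keys = [10.2, 10.4, 10.6].map((x) => clusterKey(x));
```

A smaller step keeps more points and follows the original path more closely; a larger step decimates more aggressively.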
I would also recommend calling the decimate transform on the tip mark for very heavy datasets, to make it faster. It would not be a good idea to do so systematically, though, since the user might be interested in all the intermediate points that are aligned on the same x pixel.
todo:
closes #1707
[1] https://www.vldb.org/pvldb/vol7/p797-jugel.pdf ; see also @jheer’s notebook https://observablehq.com/@uwdata/m4-scalable-time-series-visualization for a nice walk-through and implementation of M4 with Plot.