data decimation transform #1707
A good place to apply decimation (almost transparently) is just before rendering. The index is filtered, and the X and Y values are scaled. We can filter the index again, so that the rendering is (almost) the same, but with a lighter path/footprint. This notebook implements the M4 strategy on the line mark: a line chart based on 10 million points, which was previously impossible to render, becomes possible. A brush (#5) can even be added, and it still enjoys interactive speed.
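To make the M4 idea concrete, here is a minimal sketch (not Plot's actual implementation, and not the notebook's code): for each pixel-wide bucket along x, keep only the first, last, minimum-y, and maximum-y points, then filter the index down to those. The function name and signature are illustrative only.

```javascript
// Minimal M4-style decimation sketch: x and y are arrays of numbers
// already scaled to pixels; index lists the points to consider.
function decimateM4(index, x, y, pixelSize = 1) {
  const buckets = new Map();
  for (const i of index) {
    const key = Math.floor(x[i] / pixelSize); // which pixel column?
    let b = buckets.get(key);
    if (b === undefined) {
      buckets.set(key, {first: i, last: i, min: i, max: i});
    } else {
      b.last = i; // assumes index is sorted by x
      if (y[i] < y[b.min]) b.min = i;
      if (y[i] > y[b.max]) b.max = i;
    }
  }
  // Collect at most four representative indexes per bucket.
  const keep = new Set();
  for (const b of buckets.values()) {
    keep.add(b.first).add(b.min).add(b.max).add(b.last);
  }
  return index.filter((i) => keep.has(i)); // preserves original order
}
```

Because each bucket contributes at most four points, the filtered index scales with the number of pixel columns rather than the number of data points, which is why a 10-million-point line becomes renderable.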
I would love for us to do decimation automatically (and transparently) when rendering areas and lines. (Even if only for the linear curve… and maybe we can make it work for the step curve too.)
I think I've solved the issue with curves (of all known types) in the prototype notebook. This is now using an extension of M4, where I add the first and last points, and for some curves the second and next-to-last points too. I'll work on a PR.
I have two requests:
Thanks!
For 1, let me refer to this notebook: https://observablehq.com/@fil/time-series-topological-subsampling. This framework offers a good way to think about the problem (for example, by formally defining what a "peak" is), and the algorithm is pretty fast. There is a link to a second notebook that uses it with Plot. For 2, if you still want to use the M4 strategy, you could adapt the decimateIndex function I'm suggesting in the PR. It uses normalized values (in pixels), with a scaling factor pixelSize that you can tweak to decide which values of the horizontal component fall into the same "bucket". For example, if X contains dates and the unit bucket you're considering is an hour, you would use pixelSize: 3_600_000 (60 × 60 × 1000 milliseconds).
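To illustrate the pixelSize idea (this is a hypothetical helper, not the PR's actual decimateIndex, whose signature may differ): two timestamps fall into the same bucket exactly when they share the same integer multiple of pixelSize.

```javascript
// Hypothetical bucket-key helper: timestamps (epoch milliseconds) that
// land in the same pixelSize-wide interval get the same key.
const HOUR = 3_600_000; // 60 * 60 * 1000 milliseconds

function bucketKey(t, pixelSize = HOUR) {
  return Math.floor(t / pixelSize);
}

const t0 = Date.UTC(2023, 0, 1, 10, 5);  // 10:05
const t1 = Date.UTC(2023, 0, 1, 10, 55); // 10:55, same hour
const t2 = Date.UTC(2023, 0, 1, 11, 0);  // 11:00, next hour
```

With pixelSize: 3_600_000, t0 and t1 share a bucket while t2 starts a new one, so an hourly decimation would keep at most a few representative points per hour.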
Thanks for the suggestions! I have taken a look at your notebook and it looks great, but it is also considerably beyond my skill level in this field :-) So hopefully you will one day make this an easy-to-use transform. But let me elaborate a bit on why I need this kind of smoothing. The time series contains e.g. 16,000 daily data points, which looks a bit erratic when plotted without smoothing. I am also using your brushing / selection feature, so the user can select a range of the plot and copy the data. But the extremes are fairly important in this application, so the user may be surprised if the copied data has more extreme values than what is shown in the plot. That's why I think it may be good to smooth the data while keeping the extremes.
See also AM4, described in the DashQL paper: https://arxiv.org/pdf/2306.03714.pdf
A transform to decimate (sample) data by filtering the index.
Possible strategies:
In practice we probably don't need all the methods; having one by default would be enough. M4 is easy to implement.