data decimation transform #1707
A good place to apply decimation (almost transparently) is just before rendering. The index is filtered, and the X and Y values are scaled. We can filter the index again, so that the rendering is (almost) the same, but with a lighter path/footprint. This notebook implements the M4 strategy on the line mark: a line chart based on 10 million points, which was previously impossible to render, becomes possible. A brush (#5) can even be added, and it still enjoys interactive speed.
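To make the M4 idea concrete, here is a minimal sketch (not Plot's actual implementation, and not the notebook's code): for each pixel-wide bucket along x, keep only the first, last, minimum-y, and maximum-y points, then filter the index down to those. The function name and signature are illustrative only.

```javascript
// Minimal M4-style decimation sketch: x and y are arrays of numbers
// already scaled to pixels; index lists the points to consider.
function decimateM4(index, x, y, pixelSize = 1) {
  const buckets = new Map();
  for (const i of index) {
    const key = Math.floor(x[i] / pixelSize); // which pixel column?
    let b = buckets.get(key);
    if (b === undefined) {
      buckets.set(key, {first: i, last: i, min: i, max: i});
    } else {
      b.last = i; // assumes index is sorted by x
      if (y[i] < y[b.min]) b.min = i;
      if (y[i] > y[b.max]) b.max = i;
    }
  }
  // Collect at most four representative indexes per bucket.
  const keep = new Set();
  for (const b of buckets.values()) {
    keep.add(b.first).add(b.min).add(b.max).add(b.last);
  }
  return index.filter((i) => keep.has(i)); // preserves original order
}
```

Because each bucket contributes at most four points, the filtered index scales with the number of pixel columns rather than the number of data points, which is why a 10-million-point line becomes renderable.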
I would love for us to do decimation automatically (and transparently) when rendering areas and lines. (Even if only for the linear curve… and maybe we can make it work for the step curve too.)
I think I've solved the issue with curves (of all known types) in the prototype notebook. This is now using an extension of M4, where I add the first and last points, and for some curves the second and next-to-last points too. I'll work on a PR.
I have two requests:
Thanks!
For 1, let me refer to this notebook: https://observablehq.com/@fil/time-series-topological-subsampling. This framework offers a good way to think about the problem (for example, by formally defining what a "peak" is), and the algorithm is pretty fast. There is a link to a second notebook that uses it with Plot. For 2, if you still want to use the M4 strategy, you could adapt the decimateIndex function I'm suggesting in the PR. It uses normalized values (in pixels), with a scaling factor pixelSize that you can tweak to decide which values of the horizontal component fall into the same "bucket". For example, if X contains dates and the unit bucket you're considering is an hour, you would use pixelSize: 3_600_000 (60 × 60 × 1000 milliseconds).
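To illustrate the pixelSize idea (this is a hypothetical helper, not the PR's actual decimateIndex, whose signature may differ): two timestamps fall into the same bucket exactly when they share the same integer multiple of pixelSize.

```javascript
// Hypothetical bucket-key helper: timestamps (epoch milliseconds) that
// land in the same pixelSize-wide interval get the same key.
const HOUR = 3_600_000; // 60 * 60 * 1000 milliseconds

function bucketKey(t, pixelSize = HOUR) {
  return Math.floor(t / pixelSize);
}

const t0 = Date.UTC(2023, 0, 1, 10, 5);  // 10:05
const t1 = Date.UTC(2023, 0, 1, 10, 55); // 10:55, same hour
const t2 = Date.UTC(2023, 0, 1, 11, 0);  // 11:00, next hour
```

With pixelSize: 3_600_000, t0 and t1 share a bucket while t2 starts a new one, so an hourly decimation would keep at most a few representative points per hour.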
Thanks for the suggestions! I have taken a look at your notebook and it looks great, but it is also considerably beyond my skill level in this field :-) So hopefully you will one day make this an easy-to-use transform. But let me elaborate a bit on why I need this kind of smoothing. The time series contains e.g. 16,000 daily data points, which looks a bit erratic when plotted without smoothing. I am also using your brushing / selection feature, so the user can select a range of the plot and copy the data. But the extremes are fairly important in this application, so the user may be surprised if the copied data has more extreme values than what is shown in the plot. That's why I think it may be good to smooth the data while keeping the extremes.
See also AM4, described in the DashQL paper: https://arxiv.org/pdf/2306.03714.pdf
A transform to decimate (sample) data by filtering the index.
Possible strategies:
In practice we probably don't need all the methods; having one by default would be enough. M4 is easy to implement.