leeoniya (Owner) commented Dec 27, 2024

That's not a typo in the title; this PR makes a single 10-million-point series render in 100ms instead of 300ms (on a 15W laptop CPU from 2021).

[blinky - animated demo GIF]

But how did the fastest JS plotting lib leave so much performance on the table?

Well, datasets this large are rare. Once you ask for this much data, both the query latency and download time will likely make you rethink your life choices long before you notice a 200ms difference in rendering time. More typical queries [that return fewer than 50k points] already rendered in < 10ms, so there wasn't much reason to deep dive into the profiler - everything was fine.

So, why now?

I'm getting started on uPlot 1.7 - a stepping stone to 2.0 - and since this will likely be the last 1.x bump, I wanted to either ship or discard some ideas I've had over the years. One of these was to switch the all-important linear path builder from its modified M4 decimation to LTTB decimation, or to adapt Simplify.js to improve drawing performance; ctx.lineTo() tends to be the bottleneck for large datasets, so doing less of it without affecting the rendered result would be great.

An LTTB branch showed some promise in the past, but issues like timescale/timescaledb-toolkit#501 make me wonder whether the nature of the algorithm (bucketing by pre-defined steps) could miss spikes at bucket boundaries with certain data shape + threshold combos. Perhaps these concerns are unfounded [1], but not knowing whether there are easy-to-miss regressions would haunt me. uPlot's current approach has been battle-tested for multiple years with millions of users, and the ROI is not big enough to offset the risk, IMHO.

This week, I went ahead and adapted Simplify.js to uPlot's columnar data layout and its linear path builder in a simplify-js branch, but the resulting output was underwhelming:

before: [screenshot]

after: [screenshot]
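For reference, the core of that adaptation is not complicated; a minimal sketch of Simplify.js's radial-distance pass rewritten for columnar xs/ys arrays might look like this (illustrative only, not the actual code in the simplify-js branch):

```js
// Illustrative sketch (not the actual simplify-js branch): Simplify.js's
// radial-distance pass over columnar xs/ys arrays, returning the indices
// of points that survive a squared-distance tolerance.
function simplifyRadialDist(xs, ys, i0, i1, sqTolerance) {
  const keep = [i0];
  let prevX = xs[i0], prevY = ys[i0];

  for (let i = i0 + 1; i < i1; i++) {
    const dx = xs[i] - prevX;
    const dy = ys[i] - prevY;

    // keep the point only if it moved far enough from the last kept point
    if (dx * dx + dy * dy > sqTolerance) {
      keep.push(i);
      prevX = xs[i];
      prevY = ys[i];
    }
  }

  // always keep the final point
  keep.push(i1);
  return keep;
}
```

(Simplify.js also has a Douglas-Peucker pass, which adapts to the columnar layout the same way.)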

Nevertheless, the process of trying out path simplification did lead me to stress test an artificial 10M point dataset against the current M4 approach. I quickly noticed that Chrome's profiler failed to properly attribute the cost of some known-hot functions while surfacing others that were obviously inexpensive. After wrestling the profiler into submission (a rant for another day), it highlighted some valid points of concern. The M4 decimation, as implemented, worked exclusively in the pixel domain, which meant running all 10M raw x and y values through the u.valToPos() scaling functions - these lit up in the profiler like a Christmas tree. To make matters worse, the scaling functions did some work at runtime that could have been done once at init time.
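To make that last point concrete, here's a rough, hypothetical illustration of what "doing it once at init time" buys for a linear scale. None of these names or signatures are uPlot's actual internals, just the shape of the idea: once the scale range and plot geometry are known, a value-to-pixel conversion can collapse to a single multiply and add.

```js
// Hypothetical sketch: bake the scale range and plot geometry into two
// constants up front, so the per-point hot path is one mul + one add.
// (The real scaling functions also have to deal with non-linear
// distributions, axis direction, etc.)
function makeValToPosX(scaleMin, scaleMax, plotLeft, plotWidth) {
  const mult = plotWidth / (scaleMax - scaleMin);
  const off  = plotLeft - scaleMin * mult;
  return val => val * mult + off;
}

// y is the same idea, just flipped, since canvas pixels grow downward
function makeValToPosY(scaleMin, scaleMax, plotTop, plotHeight) {
  const mult = -plotHeight / (scaleMax - scaleMin);
  const off  = plotTop + plotHeight - scaleMin * mult;
  return val => val * mult + off;
}
```

The returned closures carry no branches, so they stay cheap even when called millions of times - and, per the changes below, they're now also called far less often.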

So in this PR...

  • Decimation is now skipped when a series has fewer datapoints than 4x the plot's pixel width, since the added complexity buys nothing there and hurts high-frequency updates such as streaming a few thousand points
  • Decimation is now done almost entirely in the raw value domain (both x and y) rather than the pixel domain, bypassing the scaling functions 99.99% of the time (see the sketch after this list)
  • All scaling functions are initialized with known constants at init time, reducing the hot path to simple, branchless math
  • The functions for finding data min/max have been slightly optimized and combined
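As promised above, here's a hedged sketch of what per-pixel-column min/max/first/last decimation looks like when done in the raw value domain. The function name, signature, and details below are illustrative rather than uPlot's actual code; it assumes xs is sorted ascending and that ys has no null gaps.

```js
// Illustrative sketch of value-domain, per-pixel-column decimation:
// for each pixel column, keep the first, min, max, and last datapoints,
// working entirely in raw x/y units. Returns the indices to draw.
function decimateM4(xs, ys, i0, i1, xMin, xMax, plotWidthPx) {
  const bucketWidth = (xMax - xMin) / plotWidthPx; // raw x units per pixel column

  const keep = [];
  let bucketEnd = xMin + bucketWidth;
  let iFirst = i0, iMin = i0, iMax = i0, iLast = i0;

  const flush = () => {
    // first, min, max, last of the column, deduped, in index order
    const idxs = [...new Set([iFirst, iMin, iMax, iLast])].sort((a, b) => a - b);
    for (const idx of idxs) keep.push(idx);
  };

  for (let i = i0 + 1; i <= i1; i++) {
    if (xs[i] > bucketEnd) {
      flush();
      // advance to the column that contains this point and start a new bucket
      while (xs[i] > bucketEnd) bucketEnd += bucketWidth;
      iFirst = iMin = iMax = iLast = i;
    } else {
      if (ys[i] < ys[iMin]) iMin = i;
      if (ys[i] > ys[iMax]) iMax = i;
      iLast = i;
    }
  }

  flush();
  return keep;
}
```

Only the handful of surviving indices per pixel column then needs to pass through the scaling functions and ctx.lineTo(), which is where the bulk of the win comes from; and per the first bullet, the whole pass would be skipped when the slice holds fewer than roughly 4x plotWidthPx points.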

[1] The paper does discuss issues that can arise with this algorithm: it works best on regularly-spaced data, and since only one datapoint is chosen per bucket, no bucket threshold can catch both a downward and an upward spike in the same bucket.
