Poor series focus performance with noisy data #787

Open
flyingmutant opened this issue Jan 15, 2023 · 14 comments

Comments

@flyingmutant
Collaborator

Following VKCOM/statshouse#134, as promised.

Note that we use a workaround of setting focus.alpha = 1.1; this way the plot is not redrawn every time focus changes (which can happen every frame for dense data, and tanks performance completely). Instead, we simply highlight the focused series in the legend.

Here is a self-contained file that reproduces the problem: uplot-focus-perf.zip, using real production data. Hovering over the plot is janky and slow with series focus enabled, and smooth otherwise.
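For readers unfamiliar with the workaround described above, here is a minimal sketch of the options involved. `focus.alpha`, `cursor.focus.prox` and the `setSeries` hook are existing uPlot options; the `makeFocusOpts` helper, the `prox` value of 30 and the `onFocus` callback are assumptions made for illustration, not the actual StatsHouse code.

```javascript
// Builds the focus-related part of a uPlot options object.
// `onFocus` is a hypothetical callback that highlights the legend entry.
function makeFocusOpts(onFocus) {
  return {
    cursor: {
      // Focus a series when the cursor is within 30 CSS pixels of it (assumed value).
      focus: { prox: 30 },
    },
    // The workaround from this issue: an alpha above 1 so that
    // unfocused series are not dimmed and redrawn on focus change.
    focus: { alpha: 1.1 },
    hooks: {
      // seriesIdx is null when no series is focused.
      setSeries: [(u, seriesIdx) => onFocus(seriesIdx)],
    },
  };
}
```

The returned object would be merged into the rest of the chart options before constructing the plot.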

@leeoniya
Owner

leeoniya commented Jan 15, 2023

thanks for the info. currently on phone but if your attached data looks like the screenshot in the linked issue, then i wonder if it makes sense to offer some kind of median, geomean, or custom fn mode for focus that can do some reduce op on surrounding points of each series and then use that smoothed value to determine cursor proximity. that should be much more stable and reduce redraw bouncing while still maintaining focus highlight.
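A minimal sketch of the kind of reduce op suggested here, using a median over the surrounding points of one series. The function name `localMedian` and the `radius` parameter are hypothetical; nothing like this exists in uPlot as of this comment.

```javascript
// Median of the non-null values within `radius` points of `idx`.
// Returns null when every value in the window is null.
function localMedian(values, idx, radius) {
  const lo = Math.max(0, idx - radius);
  const hi = Math.min(values.length - 1, idx + radius);
  const win = [];
  for (let i = lo; i <= hi; i++) {
    if (values[i] != null) win.push(values[i]);
  }
  if (win.length === 0) return null;
  win.sort((a, b) => a - b);
  const mid = win.length >> 1;
  // Odd count: middle element. Even count: mean of the two middle elements.
  return win.length % 2 ? win[mid] : (win[mid - 1] + win[mid]) / 2;
}
```

A median ignores isolated spikes, so the value used for cursor proximity should move less from one data index to the next than the raw value does.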

@flyingmutant
Collaborator Author

Some kind of fast approximate focus mode sounds great. Here is how the data looks:

Screenshot from 2023-01-15 17-06-54

@leeoniya changed the title from "Poor performance of series focus with lots of data" to "Poor performance of series focus noisy data" Jan 15, 2023
@leeoniya changed the title from "Poor performance of series focus noisy data" to "Poor series focus performance with noisy data" Jan 16, 2023
@leeoniya
Owner

leeoniya commented Jan 17, 2023

i didn't realize there were 100 series here, with most of them concentrated at the bottom 20% of the range. whatever we do to improve this, it will still lag badly when hovering anything below 25k, and may also not be great above 25k, since it will require averaging ~10pts or more on every mousemove for all 100 series.

at some point you need to do something else. e.g. turning a scatter plot with millions of points into an aggregated heatmap with 10k cells.
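To make the heatmap idea concrete, here is a rough sketch of binning scatter points into a fixed grid of counts. The function name and parameters are made up for illustration; a 100 x 100 grid would give the 10k cells mentioned above.

```javascript
// Bins (x, y) points into a cols x rows grid of counts, stored row-major.
// Points outside [xMin, xMax] x [yMin, yMax] and null values are skipped.
function binToHeatmap(xs, ys, cols, rows, xMin, xMax, yMin, yMax) {
  const counts = new Uint32Array(cols * rows);
  const xScale = cols / (xMax - xMin);
  const yScale = rows / (yMax - yMin);
  for (let i = 0; i < xs.length; i++) {
    const x = xs[i], y = ys[i];
    if (x == null || y == null) continue;
    if (x < xMin || x > xMax || y < yMin || y > yMax) continue;
    // Clamp so that points exactly on the max edge land in the last cell.
    const cx = Math.min(cols - 1, Math.floor((x - xMin) * xScale));
    const cy = Math.min(rows - 1, Math.floor((y - yMin) * yScale));
    counts[cy * cols + cx]++;
  }
  return counts;
}
```

The cost of hovering then depends on the number of cells rather than on the number of points.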

some kind of anomaly detection and smoothing would be good here. can you try doing this smoothing in advance? this way you won't have a null-filled ocean, and the series will be smoothed and aligned enough for the default focus to work better. see ASAP here: https://leeoniya.github.io/uPlot/demos/data-smoothing.html, or you can do it server-side.

i'll keep this open just to experiment in the future, but i dont think this will solve this case adequately.

original ASAP code: http://www.futuredata.io.s3-website-us-west-2.amazonaws.com/asap/
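As a rough illustration of smoothing in advance, the sketch below applies a plain centered moving average to a whole series before it is handed to the chart. This is not ASAP, which picks its window size automatically; the function name and the fixed `radius` are assumptions.

```javascript
// Centered moving average over the non-null values in each window.
// A null input point is filled from its neighbors; the output is null
// only where the whole window is null.
function movingAverage(values, radius) {
  const out = new Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const lo = Math.max(0, i - radius);
    const hi = Math.min(values.length - 1, i + radius);
    let sum = 0, n = 0;
    for (let j = lo; j <= hi; j++) {
      if (values[j] != null) { sum += values[j]; n++; }
    }
    out[i] = n > 0 ? sum / n : null;
  }
  return out;
}
```

Because isolated nulls are filled from their neighbors, sparse series end up with values at most x positions, which is the alignment effect mentioned above.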

@leeoniya
Owner

leeoniya commented Jan 17, 2023

horizon plots are better for things like this. they give you a dedicated, fixed height hover area for each series.

https://datavis.blog/2022/04/30/horizon-charts-in-tableau/
https://observablehq.com/@d3/horizon-chart
https://bernatgel.github.io/karyoploter_tutorial/Tutorial/PlotHorizon/PlotHorizon.html

could be interesting to do something like this in a uPlot demo.

i have a y-shifted demo, which is close but not the same: https://leeoniya.github.io/uPlot/demos/y-shifted-series.html
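For reference, the core data transform behind a horizon chart can be sketched as follows: each series is sliced into equal-height bands that are then drawn on top of each other in progressively darker shades within one fixed-height row. The function name and the assumption of non-negative values are mine, and this is not taken from any uPlot demo.

```javascript
// Splits non-negative values into `bands` layers of equal height h = max / bands.
// Layer b holds the part of each value that falls inside [b*h, (b+1)*h].
function horizonBands(values, bands, max) {
  const h = max / bands;
  const layers = [];
  for (let b = 0; b < bands; b++) {
    layers.push(values.map(v =>
      v == null ? null : Math.min(h, Math.max(0, v - b * h))
    ));
  }
  return layers;
}
```

For example, `horizonBands([0, 5, 15, 30], 3, 30)` yields three layers, each clipped to a height of 10, so every series needs only a third of the vertical space and gets its own hover row.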

@flyingmutant
Collaborator Author

I am a fan of horizon plots too and am thinking about how we can incorporate them quite often :-)

The problem here is that our project provides a UI for interactive data exploration (in fact, this is its main mode of operation), where users control how much data is shown. We provide the ability to group time series by tags, and to select how many of the aggregated series to show (top 5 by default, but it can be top 100 like in this example). When the plot looks and works great with top 5 but becomes very janky with top 100, the user experience is not great, and right now uPlot looks to be the bottleneck in this case (backend and JSON decoding performance do not degrade this much). Dynamically switching the display based on the amount of data is unfortunately out of the question, as we want the user experience to always be the same, without drastic transition points.

@flyingmutant
Collaborator Author

Also, I am personally quite against data smoothing, and consider it to be an antipattern (at least as the default). We do, however, select the aggregation interval based on the plot width (targeting 1 value every several pixels), so the screenshot above is probably near the worst case of data density. Most plots are much closer to this:

Screenshot from 2023-01-17 16-33-57
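The width-based choice of aggregation interval described above could look roughly like the sketch below. The function name, the list of allowed intervals and the pixels-per-point target are all assumptions; the actual StatsHouse logic is not shown in this thread.

```javascript
// Picks the smallest allowed interval (in seconds) that yields at most
// one point per `pxPerPoint` pixels across the plot width.
// Assumes `allowedSec` is non-empty; falls back to the largest interval.
function pickInterval(rangeSec, plotWidthPx, pxPerPoint, allowedSec) {
  const maxPoints = Math.max(1, Math.floor(plotWidthPx / pxPerPoint));
  const sorted = [...allowedSec].sort((a, b) => a - b);
  for (const step of sorted) {
    if (rangeSec / step <= maxPoints) return step;
  }
  return sorted[sorted.length - 1];
}
```

With a one-day range, a 1200 px wide plot and a target of 4 px per point, at most 300 points fit, so from the intervals `[1, 5, 15, 60, 300, 900, 3600]` the 300 s step is chosen (288 points).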

@leeoniya
Owner

leeoniya commented Jan 17, 2023

i dont have a lot of free time this month, so you might need to get your hands dirty and make a PR so we can test it out.

maybe add a setting for sampling surrounding values like value?: (u: uPlot, seriesIdx: number, dataIdx: number) => number here:

uPlot/dist/uPlot.d.ts

Lines 513 to 516 in 9b6888c

export interface Focus {
    /** minimum cursor proximity to datapoint in CSS pixels for focus activation */
    prox: number;
}

then update around here to invoke cursor.focus.value(), which will handle sampling/reducing the surrounding points to the value you want to use:

uPlot/src/uPlot.js

Lines 2485 to 2494 in 9b6888c

let yPos = yVal2 == null ? -10 : incrRoundUp(valToPosY(yVal2, mode == 1 ? scales[s.scale] : scales[s.facets[1].scale], yDim, 0), 1);

if (yPos > 0 && mode == 1) {
    let dist = abs(yPos - mouseTop1);

    if (dist <= closestDist) {
        closestDist = dist;
        closestSeries = i;
    }
}
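A standalone sketch of how such a callback could plug into the closest-series search above. It is a simplification, not a patch: `closestSeriesIdx`, its parameters and the `valueFn(values, dataIdx)` signature are hypothetical and differ from the proposed `(u, seriesIdx, dataIdx) => number` so that the example runs without uPlot.

```javascript
// Returns the index of the series whose (optionally reduced) value at dataIdx
// is vertically closest to the cursor, or null if none is within `prox` pixels.
// valToPosY maps a data value to a pixel offset; valueFn is the optional reducer.
function closestSeriesIdx(ySeries, dataIdx, mouseTop, valToPosY, prox, valueFn) {
  let closestDist = Infinity;
  let closestSeries = null;
  for (let i = 0; i < ySeries.length; i++) {
    const yVal = valueFn ? valueFn(ySeries[i], dataIdx) : ySeries[i][dataIdx];
    if (yVal == null) continue;
    const dist = Math.abs(valToPosY(yVal) - mouseTop);
    // Same tie-breaking as the excerpt above: a later series wins on equal distance.
    if (dist <= closestDist) {
      closestDist = dist;
      closestSeries = i;
    }
  }
  return closestDist <= prox ? closestSeries : null;
}
```

Without `valueFn` this behaves like the default raw-value comparison; passing a reducer such as a local mean or median changes only which value is converted to a pixel position.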

@leeoniya
Owner

i added you as a collaborator so you can create new branches that we can both push to. make something like flyingmutant/cursor-focus-sampling

@flyingmutant
Collaborator Author

Thanks! Unfortunately can't promise right now when I'll be able to find time to dig into this.

@flyingmutant
Collaborator Author

I've made a bit of progress:

Screenshot from 2023-02-02 12-25-29

I am able to reproduce the problem with very little data: only 3 series, 1 of which is very sparse. I am not very proficient with Chrome's profiler, but it shows that almost all the time is spent in "System", with no way to dig into it (?):

Screenshot from 2023-02-02 12-27-28

I've tried the Firefox profiler and it is so much more helpful:

Screenshot from 2023-02-02 12-37-10

Screenshot from 2023-02-02 12-38-05

Screenshot from 2023-02-02 12-38-10

Looks like a lot of slow redraw is happening, for some reason.

@leeoniya
Owner

leeoniya commented Feb 3, 2023

it's not a question of how many series, but how many series are overlapping each other, and how much noise and nulls there are. i said this in my first comment. the focus can change on every mousemove event in noisy overlapping data because it is based on closest datapoint to cursor position. redrawing hundreds of times per second is always going to suck.

take a look at the last commit in the cursor.focus.value-2 branch. it tries to do a local 20-point average to stabilize the focus proximity. as i expected, it helps, but does not solve this in cases when the series are too densely packed or too noisy, so their local averages alternate.

the question was never why or where there was a problem, but how you expect a solution to work with such data. do you have an example in another charting library that works as you expect with the same data?

@flyingmutant
Copy link
Collaborator Author

Can we avoid the redraw completely for cases where we don't want focused and de-focused series to be styled differently, and only want to update the legend and the overlay with the crosshair and highlighted points closest to the cursor? I think that would be a great compromise between speed and functionality.

@leeoniya
Owner

leeoniya commented Feb 3, 2023

i'll look into adding selective alpha for series, legend, and hover points. something like cursor.focus.alpha: [seriesAlpha, legendAlpha, cursorPtsAlpha] (when hovering plotting area) and legend.focus.alpha: [seriesAlpha, legendAlpha, cursorPtsAlpha] (when hovering legend)
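One way to read this proposal, sketched under stated assumptions: each entry of the tuple controls whether one layer is dimmed on focus change, and an alpha of 1 or more means that layer can be left alone. The tuple shape comes from the comment above; the `planFocusUpdate` helper and its interpretation are hypothetical and not a shipped uPlot API.

```javascript
// Decides which layers need work when the focused series changes,
// given a hypothetical [seriesAlpha, legendAlpha, cursorPtsAlpha] tuple.
function planFocusUpdate([seriesAlpha, legendAlpha, cursorPtsAlpha]) {
  return {
    // Only dimming the series themselves requires a canvas redraw.
    redrawSeries: seriesAlpha < 1,
    dimLegend: legendAlpha < 1,
    dimCursorPts: cursorPtsAlpha < 1,
  };
}
```

Under this reading, `[1, 0.3, 1]` would highlight the focused series in the legend only and skip the expensive redraw, which is the compromise requested earlier in the thread.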

@Minardil

Minardil commented Feb 6, 2024

> it's not a question of how many series, but how many series are overlapping each other, and how much noise and nulls there are. i said this in my first comment. the focus can change on every mousemove event in noisy overlapping data because it is based on closest datapoint to cursor position. redrawing hundreds of times per second is always going to suck.

> take a look at the last commit in the cursor.focus.value-2 branch. it tries to do a local 20-point average to stabilize the focus proximity. as i expected, it helps, but does not solve this in cases when the series are too densely packed or too noisy, so their local averages alternate.

> the question was never why or where there was a problem, but how you expect a solution to work with such data. do you have an example in another charting library that works as you expect with the same data?

We have such a proprietary library, which I want to get rid of. uPlot has the closest performance but still has two bottlenecks: focus and resizing. We are doing focus on a different canvas.
