Metrics tab extremely unstable #142

Closed
lukaszsamson opened this issue May 27, 2020 · 10 comments · Fixed by #144
Labels
help wanted Extra attention is needed

Comments

@lukaszsamson

Environment

  • Elixir version (elixir -v): Erlang/OTP 23 [erts-11.0.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]
    Elixir 1.10.3 (compiled with Erlang/OTP 23)
  • Phoenix version (mix deps): 1.5.3
  • Phoenix LiveView version (mix deps): 0.13.1
  • Phoenix Dashboard version (mix deps): master@53d95a1
  • Operating system: macos 10.15.5
  • Browsers you attempted to reproduce this bug on (the more the merrier): chrome

Actual behavior

I have a few metrics defined that each output ~200 data points/s.

On Chrome:
The metric boxes flicker a lot or stop displaying any data.
A lot of JS errors are reported in the console. From time to time the UI navigates to the index page by itself.

On Firefox:
Metrics look good but the browser UI becomes unresponsive. It looks like the live dashboard is starving the UI thread.
I couldn't get the console to load.

On Safari:
The UI is totally unresponsive and metrics are not showing.

Example errors logged in chrome:

metrics?group=phoenix:105 Uncaught TypeError: Cannot read property 'push' of undefined
    at Pt (metrics?group=phoenix:105)
    at metrics?group=phoenix:105
    at Array.forEach (<anonymous>)
    at jt (metrics?group=phoenix:105)
    at Mt (metrics?group=phoenix:105)
    at Ht (metrics?group=phoenix:105)
    at ft (metrics?group=phoenix:105)
    at Object.ht [as setData] (metrics?group=phoenix:105)
    at e.value (metrics?group=phoenix:105)
    at e.value (metrics?group=phoenix:105)

metrics?group=my_app:105 Uncaught TypeError: Cannot read property '2' of undefined
    at metrics?group= my_app:105
    at Array.map (<anonymous>)
    at Object.values (metrics?group= my_app:105)
    at metrics?group= my_app:105
    at Array.forEach (<anonymous>)
    at jt (metrics?group= my_app:105)
    at Mt (metrics?group= my_app:105)
    at Ht (metrics?group= my_app:105)
    at ft (metrics?group= my_app:105)
    at Object.ht [as setData] (metrics?group=my_app:105)

It seems the stability deteriorated in the last week, as I didn't notice such problems on previous versions.

Expected behavior

No crashes, no UI hangs

@josevalim
Member

Can you please let us know what was the previous commit or version that you were running on?

@lukaszsamson
Author

lukaszsamson commented May 27, 2020 via email

@lukaszsamson
Author

Unfortunately, I couldn't get older versions to run any more. LiveView kept crashing, unable to start the channel.

How can I run my app with live dashboard in dev mode? It would help with tracking down the issue if I were able to see unminified JS.

@josevalim
Member

You would need to use LV 0.13.0 with an older version. But I believe @mcrumm has already identified the root cause. :)

@mcrumm
Member

mcrumm commented May 28, 2020

Hi @lukaszsamson! It appears that our switch to millisecond timestamps, a move that was intended to improve chart stability by reducing duplicate values, caused a major hit to performance over time, as we're now drawing significantly more points/lines.

First, I'll note that static chart performance is not impacted, even with very large datasets. Our current problem is that we are redrawing the charts far too often. Chart interactions (hover, drag, etc.) exacerbate the problem, as they also redraw, and with enough interactions while events are in-flight, animations stack up, and eventually we crash the channel (maybe because we missed a heartbeat? I'm not 100% sure on that part yet).

I have what may ultimately be only a half-measure in #135, to set a max number of events per chart, but rate of input appears to be the bigger problem.
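To make the "max number of events per chart" idea concrete, here is a minimal sketch. It is not the change in #135; `appendCapped` and `maxPoints` are hypothetical names, and the `[timestamps, values]` layout is assumed because it matches the columnar arrays that uPlot's `setData` takes.

```javascript
// Hypothetical helper (not from the dashboard source): append one point to
// columnar series data and drop the oldest points once the cap is exceeded.
// `data` is assumed to be [timestamps, values].
function appendCapped(data, timestamp, value, maxPoints) {
  const [xs, ys] = data;
  xs.push(timestamp);
  ys.push(value);
  const overflow = xs.length - maxPoints;
  if (overflow > 0) {
    // Remove the same number of entries from both columns so they stay aligned.
    xs.splice(0, overflow);
    ys.splice(0, overflow);
  }
  return data;
}

// Usage: keep only the 3 most recent of 5 points.
const capped = [[], []];
for (let i = 1; i <= 5; i++) appendCapped(capped, i * 1000, i * 10, 3);
console.log(capped); // [[3000, 4000, 5000], [30, 40, 50]]
```

A cap like this bounds how much is drawn per redraw, but it does nothing about how often redraws happen, which is the concern raised next.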

I wonder if we need a way to "turn off the firehose", so to speak, when a user is interacting with the charts. Further, I wonder if we should batch events on the client and only redraw with new events every N seconds. I'm open to suggestions! :)
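One possible shape for the two ideas above (batching on the client and pausing delivery during interaction) is sketched below. This is only an illustration under assumptions: `EventBatcher`, `pause`, `resume`, and `flush` are hypothetical names, and the timer that calls `flush` every N seconds is left to the caller.

```javascript
// Hypothetical client-side batcher: buffers incoming events and hands them
// to `redraw` at most once per flush. Not taken from the dashboard source.
class EventBatcher {
  constructor(redraw) {
    this.redraw = redraw; // called with an array of buffered events
    this.pending = [];
    this.paused = false;  // set to true while the user hovers or drags
  }

  push(event) {
    this.pending.push(event);
  }

  pause() { this.paused = true; }
  resume() { this.paused = false; }

  // Intended to be called on a timer, e.g. setInterval(() => batcher.flush(), 1000).
  // Returns the number of events delivered.
  flush() {
    if (this.paused || this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    this.redraw(batch);
    return batch.length;
  }
}

// Usage with a stand-in redraw function that records batch sizes.
const redraws = [];
const batcher = new EventBatcher((batch) => redraws.push(batch.length));
for (let i = 0; i < 200; i++) batcher.push({ x: i, y: i });
batcher.flush();                   // one redraw for 200 events
batcher.pause();
batcher.push({ x: 200, y: 200 });
batcher.flush();                   // skipped while paused
batcher.resume();
batcher.flush();                   // delivers the event buffered during the pause
console.log(redraws);              // [200, 1]
```

Note that the buffer keeps growing while paused, so in practice this would probably need to be combined with some cap on buffered events.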

@mcrumm added the "help wanted" (Extra attention is needed) label May 28, 2020
@leeoniya

leeoniya commented May 28, 2020

> I'm open to suggestions! :)

some thoughts:

you guys may want 2 modes.

one where it's limited by event count (configurable by user) rather than time. this mode retains all the data up to the defined limit and you can zoom in for full details. this would be good for debugging scenarios, and does not completely neuter the fire-hose scenario.

the other mode can be an accumulator with a predefined bucket size (set by the user, maybe 5s default) and a max total bucket count (set by user, maybe 24h worth). this mode basically ingests everything, accumulates the timestamp/in/min/max/out per 5s bucket and then feeds it into the final data array that has a length of 24h in 5s buckets. this mode would not be zoomable to higher resolutions than the bucket size. the rendering for this mode can look something like OHLC bars: leeoniya/uPlot#241 (comment)
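A rough sketch of the accumulator mode described above, under a few assumptions: "in" and "out" are read here as the first and last values seen in a bucket, events are assumed to arrive in time order, and `BucketAccumulator` with its fields is a hypothetical name, not uPlot or dashboard API. With 5s buckets, 24h of history would be 17280 buckets.

```javascript
// Hypothetical accumulator: folds raw events into fixed-width time buckets,
// keeping first/min/max/last per bucket (similar in spirit to OHLC bars).
class BucketAccumulator {
  constructor(bucketMs, maxBuckets) {
    this.bucketMs = bucketMs;
    this.maxBuckets = maxBuckets;
    this.buckets = []; // oldest first; assumes events arrive in time order
  }

  add(timestampMs, value) {
    // Align the timestamp to the start of its bucket.
    const start = Math.floor(timestampMs / this.bucketMs) * this.bucketMs;
    const current = this.buckets[this.buckets.length - 1];
    if (current && current.start === start) {
      current.min = Math.min(current.min, value);
      current.max = Math.max(current.max, value);
      current.last = value;
    } else {
      this.buckets.push({ start, first: value, min: value, max: value, last: value });
      // Evict the oldest bucket once the configured history length is exceeded.
      if (this.buckets.length > this.maxBuckets) this.buckets.shift();
    }
  }
}

// Usage: 5s buckets, keep at most 2 of them.
const acc = new BucketAccumulator(5000, 2);
acc.add(1000, 7);
acc.add(2000, 3);
acc.add(4999, 9);
acc.add(5000, 4);
const firstBucket = { ...acc.buckets[0] };
console.log(firstBucket); // { start: 0, first: 7, min: 3, max: 9, last: 9 }
acc.add(12000, 1);        // opens a third bucket, so the oldest one is evicted
console.log(acc.buckets.map((b) => b.start)); // [5000, 10000]
```

The chart then only ever sees one row per bucket, so the redraw cost is bounded by the bucket count regardless of the input rate.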

@lukaszsamson
Author

@mcrumm I checked the latest master with #144 merged in and the performance now looks much better but the JS errors are still there.

@mcrumm
Member

mcrumm commented May 30, 2020

@lukaszsamson Can you check against d028a86? The latest uPlot should resolve the remaining JS errors.

@lukaszsamson
Author

I just checked against bdb5918 and the JS errors are still there, e.g.:

metrics?group=my_app:101 Uncaught TypeError: Cannot read property '2' of undefined
    at metrics?group= my_app:101
    at Array.map (<anonymous>)
    at Object.values (metrics?group= my_app:101)
    at metrics?group= my_app:101
    at Array.forEach (<anonymous>)
    at jt (metrics?group= my_app:101)
    at Mt (metrics?group= my_app:101)
    at Ht (metrics?group= my_app:101)
    at pt (metrics?group= my_app:101)
    at Object.i.setSize (metrics?group= my_app:101)

Can you provide source maps or unminified JS so I can debug it?

@lukaszsamson
Author

I'm no longer seeing those crashes on master@21d61a464434606e19c39464ffcc71c690f5f358

4 participants