Scattered3D plot loads cpu and memory up to total system freeze #1525

Closed
kelegorm opened this issue Mar 29, 2017 · 24 comments · Fixed by #1570
Comments

@kelegorm

kelegorm commented Mar 29, 2017

Example: https://codepen.io/anon/pen/VpVLMN
OS X, Win 10, Chrome.

How to reproduce: run the example linked above and wait a while (10-30 min). While waiting, open Chrome's task manager to see how much memory and CPU the browser tab with the example takes.

It's not a usual memory leak; it's a memory rage. The default, minimal memory is ~150 MB. It grows to 170 MB within a few seconds, then drops back to the default after the garbage collector has done its job. The max memory level grows too; I have seen the tab take more than 1 GB (comment out marker and colors in the example to reach 1 GB fast).

When the job for the GC gets bigger, it takes more CPU, up to 100% on my Mac.

It's not a problem when you need to show charts for just a few minutes; nobody notices. But for dashboards it's a critical bug.

If something isn't clear, ask me.

@rreusser
Contributor

Have you edited and re-saved the codepen? It looks like most of it is commented out. I just want to be sure we're looking at the same thing.

@kelegorm
Author

kelegorm commented Mar 29, 2017

Yes, I have tried commenting out different things to check whether they affect the problem somehow. They just make the leaks faster.

Last time I left it with x, y, z data only.

@rreusser
Contributor

rreusser commented Mar 29, 2017

I'm not currently able to reproduce it, and I'm not sure I fully understand exactly the scenario that leads to the memory leak. I certainly would believe there could be a memory leak, but when I run the above codepen it just sits idle. In the four minutes after the initial plot, it allocates about 700kb of memory, mostly after clicking the animate button (which to be clear just runs an action once and then does nothing further) and a bit more when moving the mouse a bit. I'm running Chrome 56.0 on Mac OS X.

The memory profile I see for that plot:

screen shot 2017-03-29 at 16 36 12

If possible, could you profile memory usage and confirm that it's what happens when you run the codepen with exactly the parts commented as are currently saved in that pen?

Additional notes on what might be the problem:

I believe there's been an effort to stamp out memory leaks in the gl code, so to anticipate what might be the problem here, one small note on animations is that if you keep adding frames, they will persist in memory.

You don't need to add frames in order to animate though. The Python documentation for animations has a nice table on the arguments for animations. Here's a corresponding JavaScript example. The important thing to note is that if the changes are ephemeral (i.e. if you don't need to retain all named frames in memory permanently), then you can use the alternate inline syntax. That is, instead of writing

Plotly.addFrames('graph', [{name: 'foo', data: {data: [...], layout: {...}}}])
Plotly.animate('graph', ['foo'])

You can simply write:

Plotly.animate('graph', [{data: {data: [...], layout: {...}}}])

The behavior is identical, except the second doesn't store a frame in memory that can be referenced by name. To keep the API surface area small the function is a bit more overloaded than I love because animation can mean so many different things to different people. I'll add a note to the docs that makes this distinction on memory retention slightly clearer.
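As a rough sketch of the inline form for a plot that is updated repeatedly, the helper below passes an unnamed frame straight to `animate`, so nothing is registered through `addFrames`. The helper name `animateEphemeral`, the injected `plotly` parameter (used so the snippet runs without a browser), the single-object frame shape `{data: [...]}`, and the duration values are assumptions for illustration, not something taken from this thread.

```javascript
// Hypothetical helper: animate to new trace data without storing a named frame.
// `plotly` is assumed to expose the same `animate(graph, frame, options)` call
// as the global Plotly object; it is injected here so the code can run anywhere.
function animateEphemeral(plotly, graphId, traceUpdates, durationMs) {
  // An inline frame has no `name`, so it cannot be referenced again later.
  const frame = { data: traceUpdates };
  const options = {
    transition: { duration: durationMs },
    frame: { duration: durationMs, redraw: true },
  };
  return plotly.animate(graphId, frame, options);
}
```

In a page this would be called with the real library, for example `animateEphemeral(Plotly, 'graph', [{ z: newZ }], 500)`, where `newZ` is whatever array the dashboard has just received.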

(One last note: Regarding the usage of deleteFrames, I've filed what I think could be considered a bug (missing feature?) here: #1528 )

@kelegorm
Author

kelegorm commented Mar 29, 2017

I've recorded a video for you about that problem; I hope you can take a look (stupid flash player :)). If you aren't able to reproduce it, can you tell me what else I can do to show you the problem?

P.S.

So, animation is not a problem at all.

@rreusser
Contributor

The main thing I don't understand is that from the codepen, it sounds like this has nothing to do with animation since there's only a single animate call that runs once and then does nothing further. If that's the case, then this would probably still be an issue with a static gl3d plot. Do you see a memory leak on gl3d plots that are just sitting there not doing anything?

@kelegorm
Author

Sorry, I forgot to leave a link to the video. I recorded a new one instead; take a look: video
Example from video: here.

@rreusser
Contributor

rreusser commented Mar 30, 2017

Thanks for the extra info. I don't happen to have flash player installed, and to be honest I'm not sure a video is the most effective way to understand it. If possible, could you provide a minimal codepen that reproduces it and a screenshot of the profiler that indicates the memory leak? By minimal, I mean exactly the code corresponding to the profiled memory leak and with no commented code. Thanks!

Edit: minimal also meaning that if it leaks without any animation at all or without a continuously running animation, then that's where we need to start and can address more complicated cases later.

@kelegorm
Author

kelegorm commented Mar 30, 2017

So, I just opened the example link, waited 10 minutes, and took a profile snapshot.

Summary time:
screen shot 2017-03-30 at 18 27 26

Overall snapshot:
screen shot 2017-03-30 at 18 26 43

Some period with big load:
screen shot 2017-03-30 at 18 28 49

JS Code tree:
screen shot 2017-03-30 at 18 27 12

Example on codepen: here

@etpinard
Contributor

@kelegorm thanks very much for this report.

If you're interested, it would be helpful for us to see if the same large memory fluxes are happening for other 3d trace types: surface and mesh3d.

@dfcreative any thoughts on how to reduce these memory fluxes?

@rreusser
Contributor

rreusser commented Mar 30, 2017

Might be obvious to those who know the code better, but I tracked the raf down to gl-plot3d which seems to allocate about 1.5MB on the js heap per raf. All I can think of is to 🔪 large sections of that file one at a time to get a better sense for whether the allocation is evenly distributed throughout or whether there's a hotspot that's doing something problematic.
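One way to get a rough per-frame allocation figure like the one above is to sample the JS heap once per animation frame and average the increases. The sketch below is only an illustration: the function names are made up, and `performance.memory` is a non-standard, Chrome-only API with coarse values, so the numbers it gives are approximate at best.

```javascript
// Average growth between consecutive heap samples, ignoring drops
// (a drop means the garbage collector ran between the two samples).
function averageGrowthPerSample(heapSamples) {
  let total = 0;
  let count = 0;
  for (let i = 1; i < heapSamples.length; i++) {
    const delta = heapSamples[i] - heapSamples[i - 1];
    if (delta > 0) {
      total += delta;
      count += 1;
    }
  }
  return count === 0 ? 0 : total / count;
}

// Browser-only sampling loop (assumes Chrome's non-standard performance.memory).
// Collects `frameCount` samples, then reports the average growth in bytes.
function sampleHeapPerFrame(frameCount, onDone) {
  const samples = [];
  function tick() {
    samples.push(performance.memory.usedJSHeapSize);
    if (samples.length < frameCount) {
      requestAnimationFrame(tick);
    } else {
      onDone(averageGrowthPerSample(samples));
    }
  }
  requestAnimationFrame(tick);
}
```

Running `sampleHeapPerFrame(600, console.log)` in the console of the tab with the plot would print an estimate in bytes; the heap timeline in DevTools remains the more reliable tool.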

@dy
Contributor

dy commented Mar 31, 2017

In Chrome's task manager, the memory variance grows larger and larger while the average increases very slowly, and it is tied to raf: an inactive tab holds the same amount of memory. It does not seem to depend on the input data size.
It feels like the redraw method requires more and more memory over time, though that memory is effectively collected every time, so the heap just increases or so.
One quick possible solution is redrawing only on certain events rather than every frame (as used in gl-waveform). That would only postpone the problem indefinitely, though.

I think I am going to open 5 windows with different guesses about the memory consumption and leave them overnight to see the problem.
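The idea of redrawing only on events can be sketched as a small scheduler with a dirty flag. This is a generic illustration, not the code from gl-plot3d or gl-waveform: `createRenderLoop`, `invalidate`, and the optional `schedule` parameter are hypothetical names.

```javascript
// Hypothetical render scheduler: draws only after something marked the scene
// dirty, instead of redrawing on every animation frame.
// `draw` is the expensive redraw; `schedule` defaults to requestAnimationFrame
// and can be swapped out (for tests or non-browser environments).
function createRenderLoop(draw, schedule) {
  const nextFrame = schedule || ((cb) => requestAnimationFrame(cb));
  let dirty = false;
  let pending = false;

  function frame() {
    pending = false;
    if (dirty) {
      dirty = false;
      draw();
    }
  }

  return {
    // Call from event handlers (camera move, resize, data update).
    // Several calls before the next frame collapse into a single draw.
    invalidate() {
      dirty = true;
      if (!pending) {
        pending = true;
        nextFrame(frame);
      }
    },
  };
}
```

With this shape, a static plot schedules no frames at all until an input event calls `invalidate()`, which is why it reduces idle CPU load but does nothing about allocation inside `draw` itself.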

@rreusser
Contributor

rreusser commented Mar 31, 2017

With little knowledge of the particular code, I'd be willing to bet it's something like an 80/20 or 90/10 distribution of code causing lots of GC. That is, we can probably pick off some low-hanging fruit, but it's a matter of degrees since there will always be plenty to GC. A quick survey of the source might be nice.

The main question I have at the moment is whether this actually leaks memory. For me it fluctuated but seemed kinda static overall.

@dy
Contributor

dy commented Mar 31, 2017

@rreusser I def see an increase in a background Chrome window, but once I hover over it, GC triggers and cleans down to a static low value

@kelegorm
Author

kelegorm commented Mar 31, 2017

@dfcreative As far as I can see, the browser tab has to be present on the screen (on OS X) and be the active tab of the browser window. If you make it inactive or minimize the window, it calms down: memory goes to its lower level, and CPU load does too. But it doesn't matter if another application overlaps the browser tab.

I guess it depends on the input data. The more data you put in, the faster the max level grows.

I'm not sure which is the bigger bug, the leaks or the CPU load, but for sure after some time the CPU problem becomes a serious one, especially if the chart is used on a thin client like an iPad mounted on a wall.

@dy
Contributor

dy commented Mar 31, 2017

Ran gl-plot3d/example/scatter for 1.5h. Memory consumption more than doubled.
image
↑ started from around 17,000K

But looking at the heap allocation timeline, there are no signs of not-retained memory.
image
Still, constant reallocation does not look sane for a static render.

I should also note that CPU load is no less important: a 15% idle load is annoying, and my laptop started overheating after half an hour.

I would try to focus on reducing CPU load first.

@rreusser
Contributor

rreusser commented Mar 31, 2017

The breakdown, IMO:

  1. Re: cpu usage. 👍 Would be great to have better dirty flags that avoid redrawing when static.
  2. Re: static plot: I'm still not quite satisfied with the answer to whether a static plot allocates memory until the computer freezes. That would be surprising to me since the only obvious way for that to happen is either plotly retaining lots of references (which should be obvious because those references have to go somewhere, right?) or a browser-level memory leak (which seems unlikely though not impossible. misuse of a web api or something?). As pointed out, fixing (1) would only delay the inevitable.
  3. Re: computer freezing: If a static plot doesn't crash the computer, then I'm suspicious that the animation API might leak memory, whether internally/accidentally or through incorrect usage of the api that should be better communicated (as described above).

Fixing (1) would be great, but I wouldn't consider it an effective way to address (2) or (3). IMO next is to nail down (2).

Specifically, @dfcreative it sounds like you're seeing allocation but not an unbounded leak. @kelegorm is this consistent with what you're seeing?

@dy
Contributor

dy commented Mar 31, 2017

Sometimes there are uncollected allocations inside 3d-view-controls. Possibly a leak.
image
image

@dy
Contributor

dy commented Mar 31, 2017

  1. Easy-made shortcut to reduce CPU load a bit: PR
  2. Turns out there is a typo causing memory bloat: PR

Those two seem to solve that problem.

@kelegorm
Author

kelegorm commented Apr 3, 2017

@rreusser, as far as I can see it's not strictly a leak, but something like one. There are two problems.

  1. A static 3D plot causes intensive memory usage, but the garbage collector works perfectly. Over time the plot just takes more and more memory, and thankfully GC frees all of it. I saw usage over 1 GB in less than half an hour. But that is not such a big deal except for the next case.
  2. CPU usage. When memory is allocated and then freed, it makes work for the CPU. And the bigger the memory allocation, the bigger the job. So when it gets to 1 GB, the CPU is loaded a lot.

About the computer lags: it's just that the CPU is used so heavily that it causes performance problems in all other applications.
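The link between per-frame allocation and GC work can be illustrated with a generic pattern: a function that returns a fresh array on every call versus one that writes into a buffer allocated once. This is not the change made in the PRs mentioned above (the thread attributes the bloat to a typo); the function names are hypothetical and the arithmetic is a stand-in for real per-frame work.

```javascript
// Allocating version: creates a new typed array on every call, so calling it
// once per frame produces garbage that the collector must later reclaim.
function scalePointsAllocating(points, factor) {
  return points.map((value) => value * factor);
}

// Reusing version: writes into a caller-provided buffer of the same length,
// so repeated calls allocate nothing new.
function scalePointsInto(out, points, factor) {
  for (let i = 0; i < points.length; i++) {
    out[i] = points[i] * factor;
  }
  return out;
}
```

In a render loop, the scratch buffer would be created once outside the loop (for example `new Float32Array(points.length)`) and passed to `scalePointsInto` on every frame.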

@kelegorm
Author

kelegorm commented Apr 6, 2017

@dfcreative PR 2 is merged.

@dy
Contributor

dy commented Apr 6, 2017

Merged and published both gl-plot3d and matrix-camera-controller.

@kelegorm
Author

kelegorm commented Apr 7, 2017

I have tested with the updated gl-plot3d 1.5.4, and the bug is really fixed! Thank you, guys! When is the milestone release?

@cpsievert

it would be helpful for us to see if the same large memory fluxes are happening for other 3d trace types: surface and mesh3d.

Not totally sure if this is relevant, but -- plotly/plotly.R#483

Also, a bit off topic, as this seems to apply to all trace types, but there seem to be some issues around selfcontained HTML via pandoc -- plotly/plotly.R#721

@rreusser
Contributor

rreusser commented Apr 7, 2017

@cpsievert yeah something is definitely up. Firefox just screeches to a halt in a way that makes it difficult to debug. Not sure how releases work, but I get Plotly.version = "1.6.1" in chrome where it's perfectly fine. Seems the first thing might be to confirm whether it's the case with more recent plotly.js. It doesn't quite seem consistent with the slow, small (is it always small?) memory leak observed here though.

Also a small note that #1445 is very likely not relevant here.


5 participants