Server-side of HTTP admin interface uses a lot of resources on large clusters #1660

Closed · danielmewes opened this issue Nov 19, 2013 · 13 comments

@danielmewes (Member)

On a 32-node cluster with something like 200 tables, just opening the HTTP admin interface can max out a core on the server. It remains maxed-out until a moment after I close the browser window.

One or more operations there seem to become very expensive on large clusters.
I will profile this at some point.

ghost assigned danielmewes on Nov 19, 2013

neumino (Member) commented Nov 19, 2013

There is just too much data in /ajax, some of which is not really useful for the web interface.

Sending diffs would be a quick workaround (and not too hard?).
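
A rough sketch of what diff-based updates could look like (purely illustrative: the snapshot type, key names, and functions here are invented for the example and are not the actual /ajax implementation). The idea is to keep the last snapshot sent to a client and transmit only the entries that changed since then:

```cpp
// Rough sketch of diff-based /ajax updates (hypothetical, not the actual
// implementation): keep the last snapshot that was sent to the client and
// send only the keys whose serialized value changed or disappeared.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using snapshot_t = std::map<std::string, std::string>;  // key -> serialized JSON

// Entries that are new or changed, plus keys that were removed.
std::pair<snapshot_t, std::vector<std::string>>
compute_diff(const snapshot_t &prev, const snapshot_t &curr) {
    snapshot_t changed;
    std::vector<std::string> removed;
    for (const auto &kv : curr) {
        auto it = prev.find(kv.first);
        if (it == prev.end() || it->second != kv.second) {
            changed.insert(kv);
        }
    }
    for (const auto &kv : prev) {
        if (curr.count(kv.first) == 0) {
            removed.push_back(kv.first);
        }
    }
    return {changed, removed};
}

int main() {
    snapshot_t prev = {{"table_a", "{\"shards\":4}"}, {"table_b", "{\"shards\":2}"}};
    snapshot_t curr = {{"table_a", "{\"shards\":8}"}, {"table_c", "{\"shards\":1}"}};
    auto diff = compute_diff(prev, curr);
    // Only table_a (changed), table_c (new) and the removal of table_b would
    // need to go over the wire instead of the whole snapshot.
    std::cout << diff.first.size() << " changed/new, "
              << diff.second.size() << " removed\n";
    return 0;
}
```

On a cluster where only a handful of tables change between polls, this keeps each response proportional to the amount of change rather than to the total cluster size.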

@danielmewes (Member, Author)

Thanks for chiming in @neumino.
I could imagine that there are other, even easier things we could optimize though. Let's wait for the profiling results and then see how best to proceed from there.

neumino (Member) commented Nov 19, 2013

Oh, there are also the stats and the distribution requests, which could be expensive (there is a timeout for the stats though).

@danielmewes (Member, Author)

Also, when I click "Tables" to get a table listing, Firefox complains about a JS script that is taking too long, e.g. "http://magneto:8082/cluster-min.js?v=foo:16246" (foo is the version of my server). The exact line number varies, though.
@neumino: At some point, maybe we could have a look at that together?

neumino (Member) commented Nov 19, 2013

Hmm, my guess is that we just create too many Backbone views.
We create one per table and two per database. If you have 500 tables, I can imagine it breaking.

I would tend to think that the current architecture of the code in admin/static/coffee/namespaces/index.coffee is just over-engineered and creates too many views. I can try a quick/dirty change to reduce the number of views, and we can check whether it works better.

neumino (Member) commented Nov 19, 2013

And we compute the state of each namespace, which can be expensive (especially if you have tons of shards).

@danielmewes (Member, Author)

The profiler shows that most of the server-side time is spent in progress_app_t::handle(). This is when viewing the dashboard, not the table view.
The function seems to be so slow because it copies std::map<peer_id_t, cluster_directory_metadata_t> a lot, which makes sense. I think I can optimize this quite easily by making use of some of the new functions that I added for improving overall directory scalability. I will give it a try.
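
For illustration, here is a minimal sketch of the kind of change this amounts to. Only the type names peer_id_t and cluster_directory_metadata_t come from the issue; the struct contents, function names, and main are invented for the example and are not RethinkDB's actual code. Passing the directory map by const reference instead of by value avoids a full copy of all peer metadata on every call:

```cpp
// Hypothetical illustration of the copying problem; peer_id_t and
// cluster_directory_metadata_t stand in for the real RethinkDB types,
// everything else is invented for this sketch.
#include <cstddef>
#include <map>
#include <string>

struct peer_id_t {
    int id;
    bool operator<(const peer_id_t &other) const { return id < other.id; }
};

struct cluster_directory_metadata_t {
    std::string serialized_state;  // stands in for the real metadata fields
};

using directory_t = std::map<peer_id_t, cluster_directory_metadata_t>;

// Before: the directory is passed by value, so every call copies all n
// entries (and every nested field), once per request and per helper call.
std::size_t count_peers_by_value(directory_t directory) {
    return directory.size();
}

// After: a const reference reads the same data without copying anything.
std::size_t count_peers_by_ref(const directory_t &directory) {
    return directory.size();
}

int main() {
    directory_t directory;
    for (int i = 0; i < 32; ++i) {
        directory[peer_id_t{i}] = cluster_directory_metadata_t{"..."};
    }
    // Same result, but the by-value version pays for a full copy of the map.
    return count_peers_by_value(directory) == count_peers_by_ref(directory) ? 0 : 1;
}
```

The real fix in branch daniel_1660 works against the actual directory types and the scalability helpers mentioned above; the sketch only shows why the repeated map copies are the expensive part.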

@danielmewes (Member, Author)

A fix for the main server-side problem (excessive copying, up to something like O(n^2 log n)) is in code review 1036 by @neumino, and implemented in branch daniel_1660.

The client-side inefficiency when opening the "Tables" page is a different problem, and I'm going to open a separate issue for that.

@danielmewes (Member, Author)

In next as of f242d98

@josephglanville

I have been using the /ajax endpoint for doing cluster monitoring.

It strikes me that this could be split up into multiple endpoints rather than just being a massive 'everything in one go' sort of thing.

neumino (Member) commented Mar 16, 2014

The content in /ajax used to be split into multiple endpoints, but they were merged at some point.

Having its content spread across multiple URLs would mean that the web interface would have to make more HTTP requests to retrieve all the data it wants, which isn't a good thing.

@coffeemug (Contributor)

@josephglanville -- FYI, you can hit a specific path to do monitoring. For example, you can curl /ajax/directory to get only the contents of the directory.

On a different note, the monitoring experience currently leaves a lot to be desired. Would you mind writing up your experiences in #1392? That is: what was it like to set up monitoring, what's good about it, what's bad about it, what could be improved, etc. If you could take a few minutes to do that, it would help immensely (and would make monitoring better for you!).

@josephglanville

@coffeemug Cheers for the heads up.
