Freeze on graph page when working with a large amount of metrics due to no upper limit on insertable metric dropdown. #5421

Open
SpencerMalone opened this Issue Mar 29, 2019 · 4 comments

SpencerMalone commented Mar 29, 2019

Bug Report

What did you do?
After upgrading from 2.6.1 to 2.8.0, we started seeing long page freezes just after page load. The sequence is: the page loads, you click around for a moment and interact with the expression bar, then there's a heavy loading pause (10 seconds?), and then things go back to normal with some brief pauses afterwards. We have seen this across all of our 2.8.0 instances, but be aware that we do have pretty beefy deployments.

Once it finishes loading, it's often OK, but that initial chug is painful.

Read below for details, but the timing with the upgrade turned out to be a coincidence. The real problem was a large increase in metric names, which made populating the DOM nodes of the insertable metric dropdown cause heavy slowdowns. We should have an upper limit on the number of DOM nodes we create in https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L276

What did you expect to see?
Not the spinning loading wheel.

What did you see instead? Under which circumstances?
An inability to interact with the GUI

Environment

  • System information:
    Linux 4.4.161-1.el7.elrepo.x86_64 x86_64

  • Prometheus version:

prometheus --version
prometheus, version 2.8.0 (branch: HEAD, revision: 59369491cfdfe8dcb325723d6d28a837887a07b9)
  build user:       root@4c4d5c29b71f
  build date:       20190312-07:46:58
  go version:       go1.11.5
  • Logs:
    There are no JS errors logged, but here is a gif of the behavior. What you are looking at is the lengthy freeze where the text cursor stops blinking and the expression bar stays highlighted.

Screen Recording 2019-03-29 at 03 48 PM

@SpencerMalone changed the title from "UI has length page freeze on load / touching the expression text box after 2.8.0 upgrade" to "UI has lengthy page freeze on load / touching the expression text box after 2.8.0 upgrade" Mar 29, 2019

Author

SpencerMalone commented Apr 4, 2019

I'm thinking this has something to do with https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L262 or the typeahead, but am having trouble reliably profiling exactly what is causing the pain. I found a small improvement in using a doc fragment for creating options, but nothing huge yet.

Author

SpencerMalone commented Apr 10, 2019

OK, I am pretty confident that this is the insertable metric selector. I'm gonna update the title.

Note that we fixed this for our instance. The tl;dr is that we had a cardinality explosion caused by someone putting templated, high-cardinality data into a metric name. After fixing the problem and deleting the data, we were unaware that tombstone cleaning was required to remove the entries from the API method called by the UI. Cleaning those fixed us up, BUT I still think it's worth having an upper limit on the "insert metrics" dropdown selection.
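
For anyone else who ends up here, a minimal sketch of triggering the tombstone cleanup through the TSDB admin API. It assumes the server was started with --web.enable-admin-api; the base URL is a placeholder, and the helper name cleanTombstones and its injectable fetchImpl parameter are illustrative, not part of Prometheus.

```javascript
// Sketch: ask Prometheus to clean tombstones via the TSDB admin API.
// Assumptions: the admin API is enabled (--web.enable-admin-api) and
// baseUrl points at your server. cleanTombstones is a hypothetical helper;
// fetchImpl is injectable so the function can be exercised without a server.
async function cleanTombstones(baseUrl, fetchImpl = fetch) {
  // Strip trailing slashes so the path is not doubled up.
  const url = baseUrl.replace(/\/+$/, "") + "/api/v1/admin/tsdb/clean_tombstones";
  const res = await fetchImpl(url, { method: "POST" });
  if (!res.ok) {
    throw new Error("clean_tombstones failed with HTTP " + res.status);
  }
  return res.status; // the API documents 204 on success
}
```

Usage would be something like `cleanTombstones("http://localhost:9090")`, where the host and port are whatever your instance listens on.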

@SpencerMalone changed the title from "UI has lengthy page freeze on load / touching the expression text box after 2.8.0 upgrade" to "Freeze page when working with a large amount of metrics due to no upper limit on insertable metric dropdown." Apr 10, 2019

@SpencerMalone changed the title from "Freeze page when working with a large amount of metrics due to no upper limit on insertable metric dropdown." to "Freeze on graph page when working with a large amount of metrics due to no upper limit on insertable metric dropdown." Apr 10, 2019

Member

brian-brazil commented Apr 10, 2019

We already have a 10k limit as of 2.8. Are you sure you're running at least that?

Author

SpencerMalone commented Apr 11, 2019

My understanding is that the 2.8 limit was for the lookahead stuff, correct? These heavy pauses lined up with the number of DOM nodes inserted in this code block...

        pageConfig.allMetrics = json.data; // todo: do we need self.allMetrics? Or can it just live on the page
        for (var i = 0; i < pageConfig.allMetrics.length; i++) {
          self.insertMetric[0].options.add(new Option(pageConfig.allMetrics[i], pageConfig.allMetrics[i]));
        }

at https://github.com/prometheus/prometheus/blob/master/web/ui/static/js/graph/index.js#L274 for the insert metric at cursor dropdown population

When I used the Chrome debugger to artificially limit the amount of data allowed in that loop, the pause decreased dramatically. The hardest part is communicating that the number of metrics populated into that dropdown has been limited. I was looking at something like...

        pageConfig.allMetrics = json.data; // todo: do we need self.allMetrics? Or can it just live on the page
        var insertMetric = self.insertMetric[0];
        var length = json.data.length;
        if (length > 50000) {
          // Too many names to render: add no options and say so in the placeholder entry.
          length = 0;
          insertMetric.options[0].text = "- disabled due to volume (too many metric names)";
        }

        // Build the options off-DOM and attach them with a single append.
        var fragment = document.createDocumentFragment();
        for (var i = 0; i < length; i++) {
          var el = document.createElement('option');
          el.value = json.data[i];
          el.text = json.data[i];
          fragment.appendChild(el);
        }

        insertMetric.appendChild(fragment);

But that totally disables that functionality when this occurs. Is that OK?

It ends up looking like so:
Screen Shot 2019-04-10 at 8 37 37 PM
Screen Shot 2019-04-10 at 8 37 40 PM
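
An alternative to disabling the dropdown outright would be to cap it and say so in the placeholder entry. The following is only a sketch of that idea: the names MAX_INSERTABLE_METRICS, planInsertMetricOptions and populateInsertMetric are hypothetical, the cap of 10000 is an arbitrary number rather than a measured threshold, and the default placeholder wording is assumed, not copied from the source.

```javascript
// Sketch: cap the "insert metric" dropdown instead of emptying it.
// All names below are hypothetical; the cap is an arbitrary illustration.
var MAX_INSERTABLE_METRICS = 10000;

// Pure helper: decide which names to show and what the first (placeholder)
// option should say. Kept DOM-free so it can be tested on its own.
function planInsertMetricOptions(allMetrics, limit) {
  var shown = allMetrics.slice(0, limit);
  var truncated = allMetrics.length > shown.length;
  return {
    names: shown,
    placeholder: truncated
      ? "- insert metric at cursor (first " + shown.length + " of " + allMetrics.length + " shown) -"
      : "- insert metric at cursor -" // assumed default wording
  };
}

// Browser-only wiring: build the options off-DOM, then append once.
// selectEl is expected to already contain the placeholder as options[0].
function populateInsertMetric(selectEl, allMetrics) {
  var plan = planInsertMetricOptions(allMetrics, MAX_INSERTABLE_METRICS);
  selectEl.options[0].text = plan.placeholder;
  var fragment = document.createDocumentFragment();
  plan.names.forEach(function (name) {
    fragment.appendChild(new Option(name, name));
  });
  selectEl.appendChild(fragment);
}
```

In the snippet above this would be called as `populateInsertMetric(self.insertMetric[0], json.data)`. The trade-off is that names past the cap are not selectable from the dropdown, but the feature stays usable and the placeholder tells the user why the list is incomplete.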
