This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Minimal code changes run rls at 500% CPU for minutes #688

Closed
sfackler opened this issue Jan 29, 2018 · 23 comments
Labels: bug, racer (an issue caused by or could be solved by Racer)

Comments

@sfackler
Member

  1. Clone https://github.com/sfackler/typed-headers
  2. Write a line of code somewhere.
  3. Watch the rls binary run at 500-600% CPU for 1-2 minutes.

I'm using rls-preview 0.124.0-nightly (511321a 2018-01-23) from the latest nightly. Did request cancellation break maybe?

@sfackler changed the title from "Adding a new function to a crate pegs rls at 500% CPU for minutes" to "Minimal code changes run rls at 500% CPU for minutes" on Jan 29, 2018
@sfackler
Member Author

I tried running with RUST_LOG=debug, but it produced so much output that vscode basically soft-locked. Most of it seemed to be coming from racer, though.

@sfackler
Member Author

This also appears to be the case for the rls that's shipped with the current stable release.

@nrc added the bug label on Feb 1, 2018
@nrc
Member

nrc commented Feb 1, 2018

So, I tried to repro but Cargo couldn't make the patch for http work, so I removed that from the Cargo.toml. On the altered project I couldn't repro - I got 120% CPU for a short time - 30 sec or less. (I'm using master, but I don't think much has changed compared with 0.124).

One thing that changed somewhat recently is that if there is no build running, we start one immediately on typing instead of waiting for some time (we only wait on subsequent builds). We still don't run more than one build at once (though we may be running a build at the same time as indexing the previous build). We've never been able to cancel builds once they start.
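
In pseudo-Rust, that scheduling looks roughly like the following minimal sketch. All names here (BuildScheduler, DEBOUNCE, run_cargo_check) are hypothetical, not the actual RLS code: the first build after an idle period starts immediately, later edits are coalesced, and at most one build runs at a time.

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical debounce; the real RLS value may differ.
const DEBOUNCE: Duration = Duration::from_millis(500);

#[derive(Default)]
struct State {
    build_running: bool,
    build_pending: bool,
}

#[derive(Clone, Default)]
struct BuildScheduler {
    state: Arc<Mutex<State>>,
}

impl BuildScheduler {
    // Called on every edit notification from the client.
    fn on_change(&self) {
        let mut s = self.state.lock().unwrap();
        if s.build_running {
            // A build is in flight: remember that another is wanted,
            // but never run more than one build at once.
            s.build_pending = true;
        } else {
            // No build in flight: start one immediately, no wait.
            s.build_running = true;
            drop(s);
            self.spawn_build();
        }
    }

    fn spawn_build(&self) {
        let this = self.clone();
        thread::spawn(move || {
            run_cargo_check(); // stand-in for `cargo check` + save-analysis
            let mut s = this.state.lock().unwrap();
            s.build_running = false;
            if s.build_pending {
                // Subsequent builds wait for the debounce timeout first.
                s.build_pending = false;
                s.build_running = true;
                drop(s);
                thread::sleep(DEBOUNCE);
                this.spawn_build();
            }
        });
    }
}

fn run_cargo_check() {}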

@nrc
Member

nrc commented Feb 5, 2018

I have a hypothesis for something which might cause this. Builds in the RLS are a cargo check followed by post-processing the generated save-analysis data. The former usually takes much longer than the latter. The way we queue builds is that we never start one until the current one has finished, and we wait for a timeout before starting the next. However, we don't queue the save-analysis indexing - we start that as soon as the build is finished, and it can run in parallel with another build.

It is possible that if you have a crate where the indexing time is greater than the build time, then you could end up with lots of indexing happening at the same time, even though multiple builds are not. I wonder if something like that is happening here?
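
To make the hypothesis concrete, here is a minimal sketch (hypothetical durations, not real RLS code) of how unqueued indexing could pile up when indexing outlives the next build:

use std::thread;
use std::time::Duration;

// Hypothetical timings: the build is shorter than the indexing step.
const BUILD_TIME: Duration = Duration::from_secs(2);
const INDEX_TIME: Duration = Duration::from_secs(10);

fn main() {
    let mut indexers = Vec::new();
    // Builds are serialised: the next one only starts after the previous
    // one finishes. Indexing, however, is fired off unqueued.
    for i in 0..5 {
        thread::sleep(BUILD_TIME); // stand-in for `cargo check`
        indexers.push(thread::spawn(move || {
            // Stand-in for post-processing save-analysis data. Because
            // INDEX_TIME > BUILD_TIME, several of these run concurrently,
            // which is the multi-core CPU spike hypothesised above.
            println!("indexing build {}", i);
            thread::sleep(INDEX_TIME);
        }));
    }
    for h in indexers {
        h.join().unwrap();
    }
}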

@sfackler could you run with RUST_LOG=rls=debug please? That would avoid getting spammed by Racer or the compiler.

@sfackler
Member Author

sfackler commented Feb 5, 2018 via email

@alexheretic
Member

I can reproduce this behaviour. The culprit in my case is racer. For this to happen, racer completion must take far longer than our request timeout, so it shows up in medium/large crates. After the timeout these tasks remain running on their threads. Though we do limit the max tasks to num_cpus, this does technically allow textDocument/completion spam in such crates to cause 100% CPU for as long as these racer tasks take.

This is how I reproduce it: by rapidly typing qwer qwer qwer qwer qwer qwer qwer qwer etc., causing my client to spam completion requests to the server.

First balm for the wound will be failing requests as soon as our request work pool is full (currently they'll just stack up). This will at least bound the 100% CPU time to the completion of N requests. I'll raise a PR to do this, and log a warning about which tasks have consumed the work pool. This will also help confirm my suspicion that racer is causing this in general.
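
Roughly, the idea is something like the following sketch of a counting gate over a fixed-size pool; all names here are hypothetical, not the code in the PR.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical sketch of "fail fast when the request work pool is full".
struct WorkPool {
    active: Arc<AtomicUsize>,
    max: usize, // e.g. the number of CPUs in the real thing
}

enum Dispatch {
    Accepted,
    Rejected, // the request is failed immediately instead of stacking up
}

impl WorkPool {
    fn try_run<F: FnOnce() + Send + 'static>(&self, name: &str, work: F) -> Dispatch {
        let prev = self.active.fetch_add(1, Ordering::SeqCst);
        if prev >= self.max {
            self.active.fetch_sub(1, Ordering::SeqCst);
            // Log which kinds of tasks have consumed the pool.
            eprintln!("work pool full, failing `{}` request", name);
            return Dispatch::Rejected;
        }
        let active = self.active.clone();
        thread::spawn(move || {
            work(); // e.g. a racer completion that may outlive its timeout
            active.fetch_sub(1, Ordering::SeqCst);
        });
        Dispatch::Accepted
    }
}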

Really we'd want to be able to cancel long running threads, like racer requests and builds.

@nrc
Member

nrc commented Feb 7, 2018

Interesting, thanks for investigating!

Instead of/as well as limiting the number of active requests we should probably limit the number of active Racer tasks, probably to 1 (like we do for builds). That would allow other requests without us getting bogged down in Racer jobs. I hope we can do that without a whole queuing infrastructure like we do for builds...

@alexheretic
Member

alexheretic commented Feb 8, 2018

Unlike the builds, where we have to deal with slowness, request actions are expected to be fast because they're generally useless unless that's true. Because of this expectation we time out such requests. Without cancellation we are faced with either blocking future tasks until the timed-out task finishes, or starting the new ones while trusting the timed-out tasks to finish before long.

We go with the latter; #702, then, is about mitigating the worst edge cases that can cause this issue. I was thinking it better to address this generally rather than with racer-specific rules. Racer shouldn't be so slow and intensive, right?

The #702 rules shouldn't generally be hit: for example, for textDocument/completion to be failed on a 4-thread CPU we'd need 2 tasks already running. Since each task is waited on for 1.5 seconds, that means one of those tasks has been running for at least 3 seconds and the other for 1.5 seconds. The new thinking is: once a task is taking that long, it becomes acceptable to shift our trust and cancel future tasks of the same type.

For racer to cause this issue (6-thread utilisation) with the current logic, a racer task had to be running for more than 7.5-9 seconds. #702 will help, but after merging can we find out why racer is so slow and intensive?

@alexheretic
Member

After I added the long-task logging I see:

WARN 2018-02-08T13:27:37Z: rls::actions::requests: `textDocument/completion` took 62.5s
WARN 2018-02-08T13:27:36Z: rls::actions::requests: `textDocument/completion` took 62.7s

So beyond #702 we need to get to the bottom of why racer takes so long. Maybe in the meantime add logic to disable racer completion functionality after we detect it's slow?

@nrc
Member

nrc commented Feb 11, 2018

cc @jwilm since this is a Racer issue

@nrc
Member

nrc commented Feb 11, 2018

Without cancellation we are faced with either blocking future tasks until the timed out task finishes

This seems pretty fine as a throttling mechanism to me. It effectively detects slow runs of Racer and prevents it running - it is not optimal since we are still running Racer and thus using some CPU, but we shouldn't really crush multiple cores because we're only launching one task at a time.

I was thinking it was better to address this generally rather than racer specific rules

I'm in favour of both general and specific mechanisms here - I don't think it is over-specific - we only outsource work to rustc, Racer, and Rustfmt, so I think specific mechanisms for all three are still OK, complexity-wise.

@alexheretic
Member

alexheretic commented Feb 12, 2018

This seems pretty fine as a throttling mechanism to me. It effectively detects slow runs of Racer and prevents it running - it is not optimal since we are still running Racer and thus using some CPU, but we shouldn't really crush multiple cores because we're only launching one task at a time.

Yep, we can absolutely do better with what we have; that's what #702 is about. But cancellation just seems such a good fit for the RLS, which is why I keep harping on about it. At least it's something to revisit when a decent solution exists in std or a multi-platform crate.

I'm in favour of both general and specific mechanisms here

Fair enough. It is worth noting that on some projects racer isn't slow and our assumptions work. I don't think it was particularly wrong to have an optimistic "tasks should finish fast" approach, where we leave timed-out tasks on the pool and expect them not to live long.

That's why I think it's ok to have 2 concurrent racer-completion tasks running. When the 2nd one runs and finds another already running, it means one racer task has timed out. Optimistically we should assume that the first will finish soon and go ahead with the new request (not generally wanting to cancel requests). However, when a 3rd request comes in and we have 2 ongoing tasks, we can be much more sure there's an issue and fast-fail the request.

In this way we handle the cases where completion is fast, where completion is roughly equal to the timeout, and where completion is slow. Worst case: 2 cores are used for the duration of a single racer execution, while typing.
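
A minimal sketch of that per-request-type rule (hypothetical names and structure, not the #702 code itself): track in-flight tasks per request type and fast-fail once two of the same type are already running.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical: at most 2 in-flight tasks per request type; the 3rd
// concurrent request of the same type is failed immediately.
const MAX_PER_TYPE: usize = 2;

#[derive(Clone, Default)]
struct TypeGate {
    in_flight: Arc<Mutex<HashMap<&'static str, usize>>>,
}

impl TypeGate {
    // Returns false if the request should be fast-failed.
    fn try_start(&self, req_type: &'static str) -> bool {
        let mut map = self.in_flight.lock().unwrap();
        let count = map.entry(req_type).or_insert(0);
        if *count >= MAX_PER_TYPE {
            // Two tasks of this type are already running, so at least one
            // has blown well past its timeout: assume something is wrong.
            return false;
        }
        *count += 1;
        true
    }

    fn finish(&self, req_type: &'static str) {
        let mut map = self.in_flight.lock().unwrap();
        if let Some(count) = map.get_mut(req_type) {
            *count = count.saturating_sub(1);
        }
    }
}

A handler would call try_start("textDocument/completion") before dispatching to racer and finish(...) when the task eventually returns, even if the 1.5 second response timeout has already fired.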

I think this reasoning will apply to every kind of request.

But of course if we can improve the worst case without harming the others all the better!

  • Maybe we should dynamically disable completion when we see it taking >N seconds (see the sketch after this list)? Because when racer is taking, say, 5 seconds, you can't really use it and it's likely to be taking up a couple of cores.
  • Maybe we should allow disabling completion in config, so people having issues can manually remove functionality to address CPU usage?
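
The first idea could look something like the following sketch (hypothetical names and threshold; not an existing RLS feature): once a single completion is observed blowing past a slow limit, flip a flag and skip racer from then on.

use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

// Hypothetical threshold for "completion is too slow to be useful".
const SLOW_LIMIT: Duration = Duration::from_secs(5);

static COMPLETION_DISABLED: AtomicBool = AtomicBool::new(false);

fn complete(run_racer: impl FnOnce() -> Vec<String>) -> Vec<String> {
    if COMPLETION_DISABLED.load(Ordering::Relaxed) {
        return Vec::new(); // completion has been dynamically disabled
    }
    let start = Instant::now();
    let results = run_racer();
    if start.elapsed() > SLOW_LIMIT {
        eprintln!("racer completion took {:?}, disabling completion", start.elapsed());
        COMPLETION_DISABLED.store(true, Ordering::Relaxed);
    }
    results
}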

@mooman219

I hate to +1 with a comment, but this bug is extremely frustrating to the point where I cannot edit projects using the latest RLS build. I've reverted back to the 01-01-2018 build where I only occasionally get the issue. Restarting the RLS temporarily resolved the issue.

@jedisct1

jedisct1 commented Mar 2, 2018

I completely stopped using RLS. It happens within the first minute, so RLS has become unusable for me.

@sfackler
Member Author

sfackler commented Mar 2, 2018

I still see this when working on typed-headers - it's just pinned at 200% CPU now due to #702.

@nrc
Member

nrc commented Mar 2, 2018

Maybe in the meantime add logic to disable racer completion functionality after we detect it's slow?

We should think about how/if we can do this. It will be pretty annoying to disable Racer since it means no more code completion, but that seems better than no code completion and all of the CPU. What I'm trying to say is we need to be careful not to get too many false positives.

@alexheretic
Member

alexheretic commented Mar 2, 2018

We should think about how/if we can do this. It will be pretty annoying to disable Racer since it means no more code completion, but that seems better than no code completion and all of the CPU. What I'm trying to say is we need to be careful not to get too many false positives.

Yes, we need to be careful with disabling autocomplete automatically, but it would be useful if we can do it well.

However, having an RLS config option for autocomplete would be fine and would allow manual disabling for large projects where autocomplete isn't behaving anyway.

# atom rls.toml
autocomplete = false

On another front, has anyone raised an issue for racer to try to improve this upstream? The issue should be reproducible with a more basic/direct racer plugin.

What's the status of autocomplete without racer? I'm guessing that's still far off?

@cynecx

cynecx commented Mar 3, 2018

Sorry if this isn't the right place to ask, and I'm not exactly familiar with how rls works internally, but why is rls actually using racer? Why can't rls "just use" the data from rustc? I'm not quite sure what the status of racer is, but it seems quite buggy (it can't auto-complete features it cannot parse, I assume? see #588; it also seems to use a really old version of syntex_syntax, which may be the cause of some parsing issues).

@nrc
Member

nrc commented Mar 3, 2018

@cynecx basically because we can't compile the program fast enough - once the user is typing, we need to compile the program and calculate the code completion results fast enough to be useful (approx. <= 0.25s). We can't get anywhere near that right now. Hopefully, as incremental compilation gets better, plus some other changes, we will be able to.

@mooman219

I'm very much for a manual option to disable code completion. There's still a lot of value in error/warning highlighting that I can't get right now because code completion is preventing that.

@alexheretic
Member

alexheretic commented Mar 4, 2018

I actually cloned typed-headers and wow, this project really suffers from this; you weren't joking, @sfackler.

I can confirm that the culprit is indeed racer completion. It's the first two requests that time out and keep running for ... I waited a long time to see how long they take. I think this project may somehow cause an infinite loop in racer. This should be reported on https://github.com/racer-rust/racer.

When I restarted the RLS with racer completion disabled, #747 style, the problem vanished. That projects like this can be such hard edge cases for racer is reason enough to provide the config option.

alexheretic added a commit to alexheretic/rls that referenced this issue Mar 4, 2018
Provides manual workaround for rust-lang#688
@alexheretic
Member

With #747 now merged, the racer_completion config option should be available with the next rls release.

So, for example, in Atom, adding the following line to your rls.toml would disable racer completion.

# rls.toml for atom
racer_completion = false

@nrc
Member

nrc commented Jun 21, 2018

I assume this is fixed by the last racer update; if not, let me know.

@nrc closed this as completed on Jun 21, 2018