Managed concurrency #923
Conversation
matklad changed the title from "rManaged concurrency" to "Managed concurrency" on Jul 1, 2018
matklad force-pushed the matklad:managed-concurrency branch from ba80d25 to 8d33e48 on Jul 1, 2018
Thread sleep is inelegant, but is it a big problem in tests? You wait a bit for what you expect and fail if it never arrives, or arrives wrongly. The absolute worst case is that the tests can take a while to fail, and the current timeout seems a little high. But I'm not sure it shows that we lack good control of concurrency in RLS. Here I'd be tempted to define "good" as sufficient. Shouldn't we add this ability only when we have a production-code need for it? The other issue is that I think we'd generally want to move the feature tests out into integration-style tests (or whatever you want to call them): tests that start rls as a process and talk to it like an LSP client. If that's still true, then in-process hooks won't be of any use to the tests.
It certainly is a problem: in #918 (comment), quite a few cycles were spent figuring out that the test fails on CI but not locally because of a timing issue, and not because of some Windows-specific behavior. In other projects I've worked on, unrestricted concurrency also posed challenges to writing robust and fast tests. In general, I strongly believe that logical clocks and causality should be used instead of timeouts if possible, and that APIs should be designed in such a way that using logical clocks is possible (i.e., if something spawns a concurrent thread of execution, it should return a handle which can be used to wait for that thread to finish). I don't have concrete strong arguments here except that it... feels right, I guess? :)
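As a rough illustration of the handle-returning idea, here is a minimal hypothetical sketch (the schedule_work functions are made up, not RLS code):

```rust
use std::thread;

// Fire and forget: nothing is returned, so a test can only busy-wait or sleep
// until the side effects become visible.
fn schedule_work_fire_and_forget() {
    thread::spawn(|| {
        // ... background work ...
    });
}

// Handle-returning variant: the caller gets something it can block on, which
// gives the test a causal "the work has finished" point.
fn schedule_work() -> thread::JoinHandle<()> {
    thread::spawn(|| {
        // ... background work ...
    })
}

fn main() {
    schedule_work_fire_and_forget(); // no way to synchronize with this
    let handle = schedule_work();
    handle.join().unwrap(); // deterministic: the work is done past this point
}
```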
I agree that we should add tests that literally spawn the RLS process and communicate with it via standard streams: currently this layer seems untested? However, while I am in general a strong believer in integration tests, I personally don't think that moving all feature tests to integration testing would be a good idea, for two reasons:

Long-term, I think we should split RLS into an "rls-library" layer, which is more or less oblivious to the details of the LSP, and an "rls-lsp binary", which wraps the API of the library into the specific protocol. That way, we'll have a smallish number of integration tests for the binary, which spawn the process and check that the output indeed conforms to the LSP, while most feature tests would be done at the library level.
Well, I do agree that eliminating the messaging timeout issue in the tests would be really nice. I'm just not sure keeping track of the jobs with this much extra prod code is worth it. For me the test code is there to make the prod code better; this feels like the other way around. On the other hand, perhaps it will serve the greater good by allowing better testing, and I do feel the rls test code isn't really good enough at the moment.
matklad referenced this pull request on Jul 2, 2018: Number of progress messages seems undeterministic #925 (closed)
Agree that this brings in a significant amount of "clever" code, with tricks like uninhabited enums and drop bombs, which is a big drawback. A simpler alternative is possible: instead of keeping track of jobs, we can just drain all the work queues explicitly. Still, I think that explicit tracking of all concurrent tasks is better, long term, than tracking of potential sources of concurrency, although it's more code.
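For a concrete, if toy, contrast between the two options, here is a self-contained sketch using plain std threads; none of these names come from RLS:

```rust
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    // Option A, "drain the sources of concurrency": join every worker we know
    // about and declare the system quiescent afterwards.
    let workers: Vec<_> = (0..4).map(|i| thread::spawn(move || i * 2)).collect();
    for w in workers {
        w.join().unwrap();
    }

    // Option B, "track every task": each spawned task carries a token, and we
    // wait until every token has been dropped (i.e. every task completed).
    let (tx, rx) = channel::<()>();
    for i in 0..4 {
        let token = tx.clone();
        thread::spawn(move || {
            let _keep_alive = token; // dropped when the task finishes
            let _ = i * 2;
        });
    }
    drop(tx); // keep only the per-task senders alive
    while rx.recv().is_ok() {} // returns Err once every token is gone
    // Every task is done here, without sleeps or timeouts.
}
```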
@nrc curious what you think about the problem in general and this specific approach? Is it something reasonable, or should we just stick with thread::sleep?
Sorry for the delay. My initial opinions (I haven't looked at the code yet):
I do agree with this sentiment too - there is something to be said for 'if it ain't broke, don't fix it'.
nrc reviewed on Jul 15, 2018
```diff
@@ -73,7 +74,7 @@ impl BlockingNotificationAction for DidOpenTextDocument {
 }

 impl BlockingNotificationAction for DidChangeTextDocument {
-    fn handle<O: Output>(params: Self::Params, ctx: &mut InitActionContext, out: O) -> Result<(), ()> {
+    fn handle<O: Output>(params: Self::Params, jobs: &mut Jobs, ctx: &mut InitActionContext, out: O) -> Result<(), ()> {
```
matklad (Author, Member) commented on Jul 16, 2018:
Moved jobs to the Context. Initially I avoided that because Context is Clone, and so an Arc<Mutex> is required. Having an explicit &mut Jobs argument and a -> ConcurrentJob return type gives a better idea of when concurrency happens. However, that is indeed a lot of extra typing compared to just stuffing jobs into the ctx.
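A minimal sketch of what the Arc<Mutex> variant implies (hypothetical; only the InitActionContext and Jobs names come from the PR, the fields and add_job are made up):

```rust
use std::sync::{Arc, Mutex};

// The context is Clone, so a shared job registry has to sit behind Arc<Mutex<...>>.
#[derive(Default)]
struct Jobs {
    active: Vec<String>, // stand-in for whatever per-job bookkeeping is kept
}

#[derive(Clone)]
struct InitActionContext {
    jobs: Arc<Mutex<Jobs>>,
    // ... other shared state ...
}

impl InitActionContext {
    fn add_job(&self, name: &str) {
        // Every clone of the context sees the same registry.
        self.jobs.lock().unwrap().active.push(name.to_string());
    }
}

fn main() {
    let ctx = InitActionContext { jobs: Arc::new(Mutex::new(Jobs::default())) };
    let ctx2 = ctx.clone();
    ctx.add_job("build");
    ctx2.add_job("reindex");
    assert_eq!(ctx.jobs.lock().unwrap().active.len(), 2);
}
```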
```rust
expect_messages(
    results.clone(),
    &[
        ExpectedMessage::new(Some(0)).expect_contains(r#""codeLensProvider":{"resolveProvider":false}"#),
```
```rust
impl Drop for ConcurrentJob {
    fn drop(&mut self) {
        if self.is_abandoned || self.is_completed() || thread::panicking() {
```
nrc (Member) commented on Jul 15, 2018:
Why don't we panic on abandoned jobs? Seems like a bad thing if Jobs is dropped without completing the jobs
matklad (Author, Member) commented on Jul 16, 2018:
Hm, indeed, for a normal termination it is reasonable to require that all the jobs are awaited!
Still not sure what's the best choice for abnormal termination (some thoughts in af2534f), but probably just abandoning jobs is fine, as things have gone south anyway.
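A minimal sketch of the drop behavior being discussed (hypothetical, not the PR's exact code; the completed flag is a stand-in for however completion is actually tracked):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

struct ConcurrentJob {
    is_abandoned: bool,
    completed: Arc<AtomicBool>, // the worker flips this when the job is done
}

impl ConcurrentJob {
    fn is_completed(&self) -> bool {
        self.completed.load(Ordering::SeqCst)
    }
}

impl Drop for ConcurrentJob {
    fn drop(&mut self) {
        // Abandoned jobs and drops during unwinding are tolerated; a pending
        // job dropped in normal operation is a programming error, so panic.
        if self.is_abandoned || self.is_completed() || thread::panicking() {
            return;
        }
        panic!("ConcurrentJob dropped while the job is still pending");
    }
}

fn main() {
    let completed = Arc::new(AtomicBool::new(false));
    let job = ConcurrentJob { is_abandoned: false, completed: Arc::clone(&completed) };
    completed.store(true, Ordering::SeqCst); // pretend the worker finished
    drop(job); // fine: the job is completed, so no panic
}
```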
matklad force-pushed the matklad:managed-concurrency branch from 8d33e48 to ad66472 on Jul 16, 2018
I think we should do this. I'm pretty excited to improve our flaky tests. Once we land it I'll publish a new version and that should show up any bugs pretty soon. @matklad could you rebase and address the review comments, then I'll merge. Thanks!
@nrc rebased before your comment :) Fixing the comments now!
matklad force-pushed the matklad:managed-concurrency branch 3 times, most recently from ee1b647 to 4962bb7, on Jul 16, 2018
I think I've addressed all comments.
CI should be fixed by #944
matklad force-pushed the matklad:managed-concurrency branch from 4962bb7 to dd1f805 on Jul 16, 2018
Rebased!
bors try

We've tried to set up bors here, let's see if it is working...
Existing reviewers: click here to make matklad a reviewer
I get the above when I bors try, cc @nrc
Existing reviewers: click here to make Xanewok a reviewer
Sorry about that. I updated my bors-ng instance's code and synced reviewers and members with the people on this repo who have push permission. Try again?
bors try (I don't need the @?) |
bors-voyager bot added a commit that referenced this pull request on Jul 16, 2018
try: Build succeeded
@nrc yep, this is a different bors, it doesn't need an @. And you need bors r+ to do the merge. I've tried try just to check if bors is listening :-)
bors r+
bors-voyager bot added a commit that referenced this pull request on Jul 17, 2018
Build failed
Ok, one more round of clippy bumping!
matklad force-pushed the matklad:managed-concurrency branch from dd1f805 to 833a2a9 on Jul 17, 2018
Resolved conflict, squashing changes to a single commit in the process.
nrc merged commit f7b4a9b into rust-lang:master on Jul 18, 2018
Nice! Didn't mean to add extra work for you with that 2018 edition commit...
matklad commented on Jun 30, 2018
Hi!
In tests, we currently do synchronization via thread::sleep. It is rather problematic by itself, but it also shows that we don't have very good control of concurrency inside RLS :(

I think this is a big problem: RLS is going to be a highly concurrent application, so it's better to establish reliable concurrency practices from the start. I don't have too much experience with designing concurrent systems though, so I am not sure if the proposed approach of dealing with the problem is not insane :D
In my experience, a lot of problems with managing concurrency stem from the "fire and forget" approach: if a function schedules some work to be executed outside the dynamic extent of the function call itself (for example, by spawning a thread, or by submitting the work to some queue), and doesn't return a handle to the yet-to-be-computed result, then such a function will be impossible to test without resorting to busy-waiting.
The current PR proposes to use a future-like ConcurrentJob object to keep track of tasks. Any background operation returns this object, which can be used to wait for that operation to finish. Dropping a ConcurrentJob without waiting for the result in general results in a panic, to protect from accidentally forgetting about a background operation.

The end result is that LsService now has a wait_for_background_jobs which can be used to make sure that all background ops are finished.
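A minimal sketch of the overall shape described above (hypothetical; names such as Jobs::wait_for_all and concurrent_job are illustrative, only ConcurrentJob and the wait-for-all-background-jobs behavior come from this description):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// A background operation hands back a ConcurrentJob; the matching JobToken
// travels with the work and signals completion simply by being dropped.
struct ConcurrentJob {
    rx: Receiver<()>,
}

struct JobToken {
    _tx: Sender<()>,
}

fn concurrent_job() -> (ConcurrentJob, JobToken) {
    let (tx, rx) = channel();
    (ConcurrentJob { rx }, JobToken { _tx: tx })
}

// Registry of in-flight jobs, roughly what a server could keep around.
#[derive(Default)]
struct Jobs {
    jobs: Vec<ConcurrentJob>,
}

impl Jobs {
    fn add(&mut self, job: ConcurrentJob) {
        self.jobs.push(job);
    }

    // Blocks until every registered job has dropped its token.
    fn wait_for_all(&mut self) {
        for job in self.jobs.drain(..) {
            // recv() returns Err once the JobToken (the only Sender) is gone.
            while job.rx.recv().is_ok() {}
        }
    }
}

fn main() {
    let mut jobs = Jobs::default();
    let (job, token) = concurrent_job();
    jobs.add(job);

    thread::spawn(move || {
        // ... background work ...
        drop(token); // marks the job as finished
    });

    // A test can now wait deterministically instead of sleeping.
    jobs.wait_for_all();
}
```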