Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes in async_semaphore for concurrency ID calculation #10436

Merged
merged 2 commits into from Jul 24, 2020

Conversation

gshuflin
Copy link
Contributor

@gshuflin gshuflin commented Jul 23, 2020

Problem

This is an attempt to fix an issue where the concurrency ID passed into processes executed under the BoundedCommandRunner is being limited to a smaller set of IDs than the limit of the BoundedCommandRunner.

We were previously using the (usize) value of available_permits on the inner, locked portion of the AsyncSemaphor data structure that BoundedCommandRunner uses internally, as the value for the concurrency ID. However, this resulted in undesirable behavior where IDs passed to processes would count down from the local concurrency limit value (e.g. 16) to 1, and then never rise above 1 again. Every time a task finished, it would increment available_permits from 0 to 1, and then wake up the next process waiting on the queue, which would decrement available_permits again when it began to run and use 1 as its ID.

There was also an issue with always waking up the task in the front position in the queue. In the case where a large number of processes were all scheduled at once, they might all call wake_by_id on the same waiting task, meaning that any other waiting tasks after the first would wouldn't get awoken until either the first task, or a newly-scheduled task, completed and woke up tasks in the queue beyond the first.

Solution

To solve the concurrency ID issue, we add a new queue data structure to Inner, that keeps track of discrete IDs. When a permit to run is issued (in PermitFuture::poll), an ID is popped from the front of this queue, and when a task completes (in Permit::drop) that ID is pushed onto the back of the queue. So, a limited number of IDs will always be cycling through the AsyncSemaphor as long as it is running, and we won't see the behavior. This change also has the effect of causing IDs to count up from 1 rather than down from the local concurrency limit, which is only an implementation detail, since these IDs are meant to be opaque identifiers.

When an asynchronous task completes, instead of calling wake_by_ref() on the waiter at the front of the queue, use available_permits to index into the waiters queue and wake up a task. This ensures that each completing task will wake up at least one yet-unwoken waiting task (if there are any such waiting tasks), which will cause those waiting tasks to get scheduled immediately and take up a concurrency slot.

Also, a debug logger with the concurrency ID is added in this commit, to facilitate debugging any further concurrency-ID-related issues we may run into.

@gshuflin gshuflin changed the title WIP testing async_semaphor: wake more than a single waiting process on Drop Jul 23, 2020
@gshuflin gshuflin requested review from tdyas and stuhood July 23, 2020 21:23
@gshuflin gshuflin marked this pull request as ready for review July 23, 2020 21:23
@coveralls
Copy link

coveralls commented Jul 23, 2020

Coverage Status

Coverage remained the same at 0.0% when pulling 0d623e9 on gshuflin:fixing_concurrency_id into ef79642 on pantsbuild:master.

@gshuflin gshuflin changed the title async_semaphor: wake more than a single waiting process on Drop fixes in async_semaphor for concurrency ID caculation Jul 23, 2020
@gshuflin
Copy link
Contributor Author

As an illustration of the changes this patch makes, here is some log output from pants running ./pants --no-pantsd --local-execution-root-dir=/home/gregs/large-partition/ test src/python/pants:: on my system with "info" level logging of the concurrency ID:

Old:
old-code.log

New:
newer-code.log

@stuhood stuhood changed the title fixes in async_semaphor for concurrency ID caculation fixes in async_semaphore for concurrency ID caculation Jul 24, 2020
@stuhood stuhood changed the title fixes in async_semaphore for concurrency ID caculation fixes in async_semaphore for concurrency ID calculation Jul 24, 2020
Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

There was also an issue with always waking up the task in the front position in the queue. In the case where a large number of processes were all scheduled at once, they might all call wake_by_id on the same waiting task, meaning that any other waiting tasks after the first would wouldn't get awoken until either the first task, or a newly-scheduled task, completed and woke up tasks in the queue beyond the first.

Good fix. But IMO, we should really, really switch to tokio semaphore. I'm fine with not doing it in this PR as long as you're confident in the tests.

@@ -158,7 +166,8 @@ impl Future for PermitFuture {
// queue, so we don't have to waste time searching for it in the Drop
// handler.
self.waiter_id = None;
Some(inner.available_permits)
let id = inner.available_ids.pop_front().unwrap_or(0); // The unwrap_or case should never happen.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be less error prone for this to either:

  1. replace the inner.available_permits counter entirely, so that we can be sure that we don't promise a permit unless there is an available_id.
  2. be decoupled from the semaphore and completely independent (such that it generated tokens lazily if it was empty, counting upward or something).

But if you trust that the test is sufficient, it's not a blocker per-se.

src/rust/engine/process_execution/src/lib.rs Outdated Show resolved Hide resolved
@stuhood stuhood merged commit 4441dd3 into pantsbuild:master Jul 24, 2020
@gshuflin gshuflin deleted the fixing_concurrency_id branch July 24, 2020 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants