Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force sequential runs under pantsd #7781

Merged

Conversation

Projects
None yet
4 participants
@blorente
Copy link
Contributor

commented May 21, 2019

Problem

It's not currently safe to have multiple parallel pants runs with the daemon enabled.
However, this was not enforced (context: #7751).

Solution

  • Implement a lock in PailgunServer, so that each request thread waits to acquire it (in PailgunServer.process_request_thread).
  • Implement an option that is passed via the environment, to each request, to communicate how long should a request wait for the lock before timing out.

Result

In the cases where a single pants run is running at a time, the UX is unchanged.
When a request is run, we have three cases:

  • If we specify a positive timeout (or if we leave it by default), and the request finishes before we time out, the request waits for a bit and then runs:
$ ./pants --enable-pantsd --pantsd-run-start-timeout=20 list src/scala::
Another pants run was found, starting waiting for up to 20.0s for it to finish
Waiting for request to finish (1.0s total)...
Waiting for request to finish (2.0s total)...
Waiting for request to finish (3.0s total)...
Waiting for request to finish (4.0s total)...
Waiting for request to finish (5.0s total)...
Waiting for request to finish (6.0s total)...
Waiting for request to finish (7.0s total)...
Waiting for request to finish (8.0s total)...
Waiting for request to finish (9.0s total)...
src/scala/org/pantsbuild/zinc/cache:cache
src/scala/org/pantsbuild/zinc/analysis:analysis
src/scala/org/pantsbuild/zinc/options:options
src/scala/org/pantsbuild/zinc/compiler:compiler
src/scala/org/pantsbuild/zinc/extractor:extractor
src/scala/org/pantsbuild/zinc/scalautil:scalautil
src/scala/org/pantsbuild/zinc/bootstrapper:bootstrapper
src/scala/org/pantsbuild/zinc/util:util
  • If we specify a timeout and hit it, the run will fail with a useful error:
$ ./pants --enable-pantsd --pantsd-run-start-timeout="0" list ::
Another pants run was found, starting waiting for up to 0.0s for it to finish
Traceback (most recent call last):
  File "/Users/bescobar/workspace/otherpants/src/python/pants/pantsd/pailgun_server.py", line 259, in process_request_thread
    with self.ensure_request_is_exclusive(environment, request):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/bescobar/workspace/otherpants/src/python/pants/pantsd/pailgun_server.py", line 246, in ensure_request_is_exclusive
    raise ExclusiveRequestTimeout("Timed out while waiting for another pants run to finish")
pants.pantsd.pailgun_server.ExclusiveRequestTimeout: Timed out while waiting for another pants run to finish
  • If we specify a negative timeout, the request will block potentially forever:
$ ./pants --enable-pantsd --pantsd-run-start-timeout=-1 list src/scala::
Another pants run was found, starting waiting forever for it to finish
Waiting for request to finish (1.0s total)...
Waiting for request to finish (2.0s total)...
Waiting for request to finish (3.0s total)...
Waiting for request to finish (4.0s total)...
Waiting for request to finish (5.0s total)...
Waiting for request to finish (6.0s total)...
Waiting for request to finish (7.0s total)...
Waiting for request to finish (8.0s total)...
Waiting for request to finish (9.0s total)...
Waiting for request to finish (10.0s total)...
Waiting for request to finish (11.0s total)...
src/scala/org/pantsbuild/zinc/extractor:extractor
src/scala/org/pantsbuild/zinc/compiler:compiler
src/scala/org/pantsbuild/zinc/analysis:analysis
src/scala/org/pantsbuild/zinc/bootstrapper:bootstrapper
src/scala/org/pantsbuild/zinc/options:options
src/scala/org/pantsbuild/zinc/util:util
src/scala/org/pantsbuild/zinc/scalautil:scalautil
src/scala/org/pantsbuild/zinc/cache:cache

All of the new output is to stderr

Fixes #7751.
Commits should mostly make sense independently.

@blorente blorente force-pushed the blorente:blorente/7751/disallow-parallel-runs branch 2 times, most recently from f04ded1 to f25fa70 May 23, 2019

@blorente blorente changed the title [wip] Disallow parallel runs Disallow parallel runs May 23, 2019

@blorente blorente marked this pull request as ready for review May 23, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

I have removed the WIP tag, this is ready for review.

I'm very interested in opinions about the UX, in particular about the usefulness of messages.

@Eric-Arellano
Copy link
Contributor

left a comment

Very sound improvement. Thank you.

Show resolved Hide resolved src/python/pants/option/global_options.py Outdated
Show resolved Hide resolved src/python/pants/option/global_options.py Outdated
Show resolved Hide resolved src/python/pants/option/global_options.py Outdated
Show resolved Hide resolved src/python/pants/option/global_options.py Outdated
Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py
Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py Outdated
Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py Outdated
Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py Outdated
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
@Eric-Arellano
Copy link
Contributor

left a comment

Thanks for the changes and especially for testing for determinism :)

Show resolved Hide resolved src/python/pants/bin/remote_pants_runner.py Outdated
@@ -273,6 +273,16 @@ def register_bootstrap_options(cls, register):
help='Create a new pantsd server, and use it, and shut it down immediately after. '
'If pantsd is already running, it will shut it down and spawn a new instance (Beta)')

# NB: We really don't want this option to invalidate the daemon, because different clients might have
# different needs. For instance, an IDE might have a very long timeout because it only wants to refresh
# a project in the background, while a user might want a shorter timeout for interactivity.

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano May 23, 2019

Contributor

This note could be made a bit more clear what the takeaway is. Maybe something like:

NB: We intentionally set the default this high because a timeout will raise an exception that then invalidates the daemon. Still, we set this as an option because different clients have different needs, such as an IDE using a very long timeout because it only wants to refresh a project in the background vs. a user wanting a shorter timeout for interactivity.

However, question on this. Does it make sense for the examples to be based on different "clients," i.e. runtime contexts like IDE vs interactive user? I think the relevant context is how long the pants goals take to run, e.g. ./pants list might be really fast but ./pants test might take a long time. Whether you run Pants through your IDE or on the command line, it doesn't really matter; what matters is the goals you use.

This comment has been minimized.

Copy link
@blorente

blorente May 23, 2019

Author Contributor

So, a couple of points: This option doesn't invalidate the daemon, especially because it would be really annoying if the IDE is running something in the background, and a client makes a request with a different timeout and crashes the run of the IDE (or vice versa).

Then, the timeout should be set according to "what I expect the other runs are doing", not "how long this particular run is going to take". Since in general this is unknowable, I'm not sure how to phase it better. That said, I agree that 60s might be too long, I'm more than open to suggestions for other numbers.

This comment has been minimized.

Copy link
@stuhood

stuhood May 28, 2019

Member

Then, the timeout should be set according to "what I expect the other runs are doing", not "how long this particular run is going to take". Since in general this is unknowable, I'm not sure how to phase it better. That said, I agree that 60s might be too long, I'm more than open to suggestions for other numbers.

Our p90 is way past 60 seconds, so if anything, 60 isn't long enough. Something like 10 minutes might be better, but is easy to iterate on and/or override in our repo.

Show resolved Hide resolved src/python/pants/option/global_options.py

@blorente blorente changed the title Disallow parallel runs Force sequential runs under pantsd May 23, 2019

# NB: We really don't want this option to invalidate the daemon, because different clients might have
# different needs. For instance, an IDE might have a very long timeout because it only wants to refresh
# a project in the background, while a user might want a shorter timeout for interactivity.
register('--pantsd-start-invocation-timeout', advanced=True, type=float, default=60.0, daemon=False,

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 24, 2019

Contributor

This option name initially made me think this was a timeout for pantsd to start (e.g. watchman init), not for the request to start.

Maybe we could name this --pantsd-multiple-invocation-start-timeout? Not great either, but maybe better?

This comment has been minimized.

Copy link
@blorente

blorente May 28, 2019

Author Contributor

'--pantsd-timeout-when-multiple-invocations', ? I not much better either, but I think this option should be rare enough that we should favour expressiveness over conciseness,

Show resolved Hide resolved src/python/pants/option/global_options.py Outdated
self.logger.debug("released request lock.")

time_polled = 0.0
polling_interval = 1.0 # check the lock every second

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 24, 2019

Contributor

I don't know python's threading.Event, but 1s seems like a large polling interval; is there a reason not to drop this by an order of magnitude or two?

This comment has been minimized.

Copy link
@blorente

blorente May 28, 2019

Author Contributor

So the polling_interval should probably be called user_notification_interval, since we don't actually block for the whole second, we block until either the lock is released or a second has passed, and the notify the user of the timeout has been reached.

So this variable is purely user-facing (to know when to display the "Waiting for 1.0s for a thread to start..." sort of messages), as a request will immediately start when the lock is released.

Will change the name of the variable.

Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py Outdated
Show resolved Hide resolved src/python/pants/pantsd/pailgun_server.py Outdated
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
def yield_and_release(time_waited):
try:
self.logger.debug("request lock acquired {}.".format("on the first try" if time_waited == 0 else "in {} seconds".format(time_waited)))
self.free_to_handle_request_lock.clear()

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 24, 2019

Contributor

I don't know threading.Event, but from reading its documentation it seems like nothing guarantees that only one thread can clear an Event at a time. I suspect you may either need to either guard the clearing in a lock, or use a threading.Condition (or possibly threading.Semaphore, or just a threading.Lock with blocking=False) to ensure that only one thread can be woken up at a time.

It would be good to change the test to coordinate 3 or 10 threads, rather than 2, to make sure this works for n>2.

This comment has been minimized.

Copy link
@blorente

blorente May 28, 2019

Author Contributor

Changed for a temporary implementation of Lock with Conditional and a boolean. This should be gone after Py3 is the only thing we care about, but for now it will do.

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 24, 2019

I'm still addressing comments, might want to hold off on reviews.

@blorente blorente force-pushed the blorente:blorente/7751/disallow-parallel-runs branch from cc7b7f5 to 7232e33 May 28, 2019

@blorente blorente requested a review from illicitonion May 28, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

Commits should be independenty reviewable (probablly won't run independently, but they're good units of work)

@illicitonion
Copy link
Contributor

left a comment

Works for me, if CI's green! (Modulo comments :))

Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
if self.threads_running == 0:
self.threads_running_cond.notify_all()
else:
while not self.threads_running == 0:

This comment has been minimized.

Copy link
@illicitonion

illicitonion May 28, 2019

Contributor
Suggested change
while not self.threads_running == 0:
while self.threads_running != 0:
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py Outdated
Show resolved Hide resolved tests/python/pants_test/pantsd/test_pailgun_server.py

@blorente blorente force-pushed the blorente:blorente/7751/disallow-parallel-runs branch from fe0155f to c5c9f5a May 28, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

Please do not stop any of these CI runs, I need them both. If the first one succeeds, we don't need to blacklist the test.

@stuhood stuhood added this to the 1.16.x milestone May 28, 2019

@stuhood
Copy link
Member

left a comment

Thanks, looks great!

@@ -273,6 +273,16 @@ def register_bootstrap_options(cls, register):
help='Create a new pantsd server, and use it, and shut it down immediately after. '
'If pantsd is already running, it will shut it down and spawn a new instance (Beta)')

# NB: We really don't want this option to invalidate the daemon, because different clients might have
# different needs. For instance, an IDE might have a very long timeout because it only wants to refresh
# a project in the background, while a user might want a shorter timeout for interactivity.

This comment has been minimized.

Copy link
@stuhood

stuhood May 28, 2019

Member

Then, the timeout should be set according to "what I expect the other runs are doing", not "how long this particular run is going to take". Since in general this is unknowable, I'm not sure how to phase it better. That said, I agree that 60s might be too long, I'm more than open to suggestions for other numbers.

Our p90 is way past 60 seconds, so if anything, 60 isn't long enough. Something like 10 minutes might be better, but is easy to iterate on and/or override in our repo.

self.logger.debug("released request lock.")

time_polled = 0.0
user_notification_interval = 1.0 # Stop polling to notify the user every second.

This comment has been minimized.

Copy link
@stuhood

stuhood May 28, 2019

Member

5 seconds might be more reasonable given that this could be multiple minutes of waiting.

@stuhood

This comment has been minimized.

Copy link
Member

commented May 28, 2019

Both CI runs passed. I'll merge without the blacklist commit in a few hours if you don't do that first. Thanks!

@stuhood stuhood force-pushed the blorente:blorente/7751/disallow-parallel-runs branch from 9617409 to c5c9f5a May 29, 2019

@stuhood stuhood merged commit bd5e70d into pantsbuild:master May 29, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

stuhood added a commit that referenced this pull request May 29, 2019

Force sequential runs under pantsd (#7781)
It's not currently safe to have multiple parallel pants runs with the daemon enabled.
However, this was not enforced (context: #7751).

- Implement a lock in `PailgunServer`, so that each request thread waits to acquire it (in `PailgunServer.process_request_thread`).
- Implement an option that is passed via the environment, to each request, to communicate how long should a request wait for the lock before timing out.

In the cases where a single pants run is running at a time, the UX is unchanged.
When a request is run, we have three cases:
- If we specify a positive timeout (or if we leave it by default), and the request finishes before we time out, the request waits for a bit and then runs:
```
$ ./pants --enable-pantsd --pantsd-run-start-timeout=20 list src/scala::
Another pants run was found, starting waiting for up to 20.0s for it to finish
Waiting for request to finish (1.0s total)...
Waiting for request to finish (2.0s total)...
Waiting for request to finish (3.0s total)...
Waiting for request to finish (4.0s total)...
Waiting for request to finish (5.0s total)...
Waiting for request to finish (6.0s total)...
Waiting for request to finish (7.0s total)...
Waiting for request to finish (8.0s total)...
Waiting for request to finish (9.0s total)...
src/scala/org/pantsbuild/zinc/cache:cache
src/scala/org/pantsbuild/zinc/analysis:analysis
src/scala/org/pantsbuild/zinc/options:options
src/scala/org/pantsbuild/zinc/compiler:compiler
src/scala/org/pantsbuild/zinc/extractor:extractor
src/scala/org/pantsbuild/zinc/scalautil:scalautil
src/scala/org/pantsbuild/zinc/bootstrapper:bootstrapper
src/scala/org/pantsbuild/zinc/util:util
```

- If we specify a timeout and hit it, the run will fail with a useful error:
```
$ ./pants --enable-pantsd --pantsd-run-start-timeout="0" list ::
Another pants run was found, starting waiting for up to 0.0s for it to finish
Traceback (most recent call last):
  File "/Users/bescobar/workspace/otherpants/src/python/pants/pantsd/pailgun_server.py", line 259, in process_request_thread
    with self.ensure_request_is_exclusive(environment, request):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/bescobar/workspace/otherpants/src/python/pants/pantsd/pailgun_server.py", line 246, in ensure_request_is_exclusive
    raise ExclusiveRequestTimeout("Timed out while waiting for another pants run to finish")
pants.pantsd.pailgun_server.ExclusiveRequestTimeout: Timed out while waiting for another pants run to finish
```

- If we specify a **negative** timeout, the request will block _potentially_ forever:
```
$ ./pants --enable-pantsd --pantsd-run-start-timeout=-1 list src/scala::
Another pants run was found, starting waiting forever for it to finish
Waiting for request to finish (1.0s total)...
Waiting for request to finish (2.0s total)...
Waiting for request to finish (3.0s total)...
Waiting for request to finish (4.0s total)...
Waiting for request to finish (5.0s total)...
Waiting for request to finish (6.0s total)...
Waiting for request to finish (7.0s total)...
Waiting for request to finish (8.0s total)...
Waiting for request to finish (9.0s total)...
Waiting for request to finish (10.0s total)...
Waiting for request to finish (11.0s total)...
src/scala/org/pantsbuild/zinc/extractor:extractor
src/scala/org/pantsbuild/zinc/compiler:compiler
src/scala/org/pantsbuild/zinc/analysis:analysis
src/scala/org/pantsbuild/zinc/bootstrapper:bootstrapper
src/scala/org/pantsbuild/zinc/options:options
src/scala/org/pantsbuild/zinc/util:util
src/scala/org/pantsbuild/zinc/scalautil:scalautil
src/scala/org/pantsbuild/zinc/cache:cache
```
**All of the new output is to stderr**

Fixes #7751.
Commits should _mostly_ make sense independently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.