Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have the daemon wait until we stop writing to the socket. #7782

Merged

Conversation

Projects
None yet
3 participants
@blorente
Copy link
Contributor

commented May 21, 2019

Problem

As described in #7465, when we pipe the output of a pants run with pantsd into anything, it may truncate output.
The root cause:

  • NailgunStreamWriter runs in another thread, and is periodically reading from std{out,err} and writing it in the Nailgun socket. Among other uses, we create one of these whenever we need to redirect stdout/err from pants into the client.
  • In the current implementation of pantsd, this happens in DaemonPantsRunner, whenever we call to nailgunned_stdio and either stdout, err or in are not TTYs (there's more nuance, but it's not relevant). We create pipes and give the writing end to the DaemonPantsRunner, while the NailgunStreamWriter listens to the reading end for output. The DaemonPantsRunner then redirects sys.stdout/err (and some other stuff) to the reading ends of those pipes, through stdio_as. This happens in two places:
    • When we warm up the v2 product graph, where things like v2 list are computed.
    • When we actually execute the pants run under pantsd.
  • When the daemon is finished with nailgunned_stdio, it stops the NailgunStreamWriter thread that is transmitting output to the client through the socket, and then it joins until it's done.
    - The issue: When the daemon stop()s the thread, this stops reading immediately, even if it hasn't finished transmitting every chunk to the client. Therefore, there is a race condition between NailgunStreamWriter trying to send all the chunk through the socket and DaemonPantsRunner trying to close it. This can be checked by inserting a time.sleep(0.1) here, which will make the issue repro consistently.

Solution

Depends on #7740, and includes commits from that.

The only commit worth reviewing is b9d0c95

The DaemonPantsRunner must no longer stop() the thread, but rather mark the pipes that are is stdout/err as "I'm not going to write to this anymore, please close yourself when you're done reading from this".

We create the class PipedNailgunStreamWriter, wich is aware that it's reading end belongs to a pipe. Whenever we don't have anything to read, we check if the write end of the pipe is closed. If so, we stop trying to read from that pipe. When no pipes are left, we close ourselves.

Result

Hopefully, ./pants --enable-pantsd list :: | wc -l should give consistent output, and #7465 should be resolved.

@stuhood

This comment has been minimized.

Copy link
Member

commented May 21, 2019

I'm not really able to add anything useful here, because it's not clear to me which portion represents the fix vs refactoring to enable the fix. Not a big deal (ie, no need to rebase things out), but it would be helpful to call that out in the description before review time.

@blorente blorente force-pushed the blorente:blorente/7465/pantsd-output-truncated branch from 14be5f3 to b9d0c95 May 21, 2019

@blorente blorente changed the title [wip] Have the daemon wait until we stop writing to the socket. Have the daemon wait until we stop writing to the socket. May 21, 2019

@blorente

This comment has been minimized.

Copy link
Contributor Author

commented May 21, 2019

Force-pushed and removed WIP tag, this is more or less ready for review. Please review only b9d0c95 as the rest are from other PRs.

@stuhood
Copy link
Member

left a comment

Thanks!

And the description is wonderful, thank you.

@@ -229,10 +232,11 @@ def finalizer():
stderr.flush()
finally:
time.sleep(.001) # HACK: Sleep 1ms in the main thread to free the GIL.
writer.stop()
stdout_pipe.stop_writing()

This comment has been minimized.

Copy link
@stuhood

stuhood May 21, 2019

Member

So, files/pipes already support a "I'm closed" flag. Would closing the write side (ie, sending EOF) work here?

This comment has been minimized.

Copy link
@blorente

blorente May 22, 2019

Author Contributor

So, if we didn't use the Pipe abstraction, that's what we would have to do. However, pipes are not closed when the write end is closed. Instead, the select in NailgunStreamWriter will just not return the read fd ever. This makes the cases "I haven't read anything yet" and "I have finished reading and the pipe is closed" indistinguishable for NSW as it stands before this PR.

Therefore, the PipedNailgunStreamWriter would have to keep track of the write file descriptors to be able to tell when a pipe is closed. That's feasible, but given that we have the Pipe abstraction, I thought it best to use that for readability.

This comment has been minimized.

Copy link
@stuhood
Show resolved Hide resolved src/python/pants/bin/daemon_pants_runner.py

blorente and others added some commits May 21, 2019

@stuhood stuhood force-pushed the blorente:blorente/7465/pantsd-output-truncated branch from b9d0c95 to 02e6507 May 21, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented May 21, 2019

Have rebased this.

Show resolved Hide resolved src/python/pants/java/nailgun_io.py Outdated
Show resolved Hide resolved src/python/pants/java/nailgun_io.py Outdated
Show resolved Hide resolved tests/python/pants_test/java/test_nailgun_io.py Outdated

blorente and others added some commits May 22, 2019

@illicitonion illicitonion merged commit f55c850 into pantsbuild:master May 22, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

illicitonion added a commit that referenced this pull request May 22, 2019

Have the daemon wait until we stop writing to the socket. (#7782)
## Problem

As described in #7465, when we pipe the output of a pants run with pantsd into anything, it may truncate output.
The root cause:
- `NailgunStreamWriter` runs in another thread, and is periodically reading from `std{out,err}` and writing it in the Nailgun socket. Among other uses, we create one of these whenever we need to redirect stdout/err from pants into the client.
- In the current implementation of pantsd, this happens in DaemonPantsRunner, whenever we call to `nailgunned_stdio` and either stdout, err or in are not TTYs (there's more nuance, but it's not relevant). We create pipes and give the writing end to the `DaemonPantsRunner`, while the `NailgunStreamWriter` listens to the reading end for output. The `DaemonPantsRunner` then redirects `sys.stdout/err` (and some other stuff) to the reading ends of those pipes, through `stdio_as`. This happens in two places:
  - When we warm up the v2 product graph, where things like v2 list are computed.
  - When we actually execute the pants run under pantsd.
- When the daemon is finished with `nailgunned_stdio`, it stops the `NailgunStreamWriter` thread that is transmitting output to the client through the socket, and then it joins  until it's done.
**- The issue:** When the daemon `stop()`s the thread, this stops reading immediately, even if it hasn't finished transmitting every chunk to the client. Therefore, there is a race condition between `NailgunStreamWriter` trying to send all the chunk through the socket and `DaemonPantsRunner` trying to close it. This can be checked by inserting a `time.sleep(0.1)` [here](https://github.com/pantsbuild/pants/blob/3c17cd2b54b468590ddf8f486f37652ed0db6afc/src/python/pants/java/nailgun_io.py#L201), which will make the issue repro consistently.

## Solution

Depends on #7740, and includes commits from that.

**The only commit worth reviewing is b9d0c95**

The `DaemonPantsRunner` must no longer `stop()` the thread, but rather mark the pipes that are is stdout/err as "I'm not going to write to this anymore, please close yourself when you're done reading from this".

We create the class `PipedNailgunStreamWriter`, wich is aware that it's reading end belongs to a pipe. Whenever we don't have anything to read, we check if the write end of the pipe is closed. If so, we stop trying to read from that pipe. When no pipes are left, we close ourselves.

## Result

Hopefully, `./pants --enable-pantsd list :: | wc -l` should give consistent output, and #7465 should be resolved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.