Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bash-engine: Check stdout readable before getline #211

Merged
merged 2 commits into from
Jul 22, 2017

Conversation

mbland
Copy link
Contributor

@mbland mbland commented Jul 20, 2017

Pull #193 (written to resolve #192) caused the mbland/go-script-bash Travis build to fail. That build clones SimonKagstrom/kcov each time, and after that pull request, kcov began hanging after printing the number of Bats test cases on Linux, e.g.:

This change resolves the issue by calling file_readable on standard output before calling getline.

Though I don't have a perfect understanding yet, it appears that Bash scripts that run for some amount of time and produce some amount of standard output eventually end up blocking when trying to write. Previously, when the parent kcov process would subsequently block in getline, the child process running the Bash script somehow wasn't able to resume, implying some sort of deadlock.

At any rate, the new bash_handle_all_output test case reliably reproduces the failure and validates the fix. It required adding a timeout capability to KcovTestCase.do.

I've also removed @unittest.expectedFailure from bash_subshell, as it isn't necessary and it confirms that the behavior implemented by #193 remains intact.

Pull SimonKagstrom#193 (written to resolve SimonKagstrom#192)
caused the mbland/go-script-bash Travis build to fail. That build clones
SimonKagstrom/kcov each time, and after that pull request, kcov began
hanging after printing the number of Bats test cases on Linux, e.g.:

  https://travis-ci.org/mbland/go-script-bash/jobs/246933167

This change resolves the issue by calling `file_readable` on standard
output before calling `getline`. However, I am still working to write a
focused test case, both to understand the exact problem and to prevent a
regression.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 63.234% when pulling 88ef253 on mbland:fix-#193 into e55b297 on SimonKagstrom:master.

@mbland
Copy link
Contributor Author

mbland commented Jul 20, 2017

Hmm, there's something decidedly funky about that failed Linux Clang build. Maybe just restart that one task?

Also, the reason the coverage went down slightly is because the new handleStdout implementation has fewer lines of code.

@mbland
Copy link
Contributor Author

mbland commented Jul 20, 2017

And, duh, an strace -f kcov ... just now revealed the smoking gun: The xtraceFd (FD 782) buffer was getting full:

[pid 30363] <... read resumed> "550\n", 4096) = 4
[pid 30363] write(1, "550\n", 4550
)        = 4
[pid 30363] read(5,  <unfinished ...>
[pid 30366] <... write resumed> )       = 4
[pid 30366] write(782, "kcov@handle-all-output.sh@3@(( +"..., 38) = 38
[pid 30366] write(782, "kcov@handle-all-output.sh@3@(( i"..., 44) = 44
[pid 30366] write(782, "kcov@handle-all-output.sh@4@echo"..., 37

This is where it started to hang. So a buffer was getting filled, but it wasn't stdout. I'm actually going to update the last commit to include this message; guess it'll kick off some new builds as well.

Per the previous commit, this reproduces the problem caused by
SimonKagstrom#193 whereby multiple calls to `getline` within the
`handleStdout` function in `src/engines/bash-engine.cc` would cause kcov
to hang indefinitely.

The problem was due to the fact that the buffer to which Bash was
writing kcov tracing messages was getting full, so Bash would eventually
block on `write` calls to the `xtraceFd` (FD 782). After that point,
calls to `getline` to read the standard output from Bash would block as
well, creating a deadlock between the kcov parent and Bash child
processes.

Using a build of kcov from before the previous commit, running an
`strace` command similar to the following revealed that the xtraceFd (FD
782) buffer was getting full:

```
$ strace -f build/src/kcov coverage_dir tests/bash/handle-all-output.sh

[ ...snip ... ]

[pid 30363] <... read resumed> "550\n", 4096) = 4
[pid 30363] write(1, "550\n", 4550
)        = 4
[pid 30363] read(5,  <unfinished ...>
[pid 30366] <... write resumed> )       = 4
[pid 30366] write(782, "kcov@handle-all-output.sh@3@(( +"..., 38) = 38
[pid 30366] write(782, "kcov@handle-all-output.sh@3@(( i"..., 44) = 44
[pid 30366] write(782, "kcov@handle-all-output.sh@4@echo"..., 37
```

At this point the kcov (pid 30363) and Bash (pid 30366) processes hang
indefinitely.

The new `bash_handle_all_output` test case reliably reproduces the
failure and validates the fix from the previous commit. It required
adding a timeout capability to `KcovTestCase.do`.

I've also removed `@unittest.expectedFailure` from `bash_subshell`, as
it isn't necessary and confirms that the desired behavior implemented by
SimonKagstrom#193 remains intact.
@codecov
Copy link

codecov bot commented Jul 20, 2017

Codecov Report

Merging #211 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #211      +/-   ##
==========================================
- Coverage   62.69%   62.68%   -0.02%     
==========================================
  Files          56       56              
  Lines        3699     3698       -1     
==========================================
- Hits         2319     2318       -1     
  Misses       1380     1380
Impacted Files Coverage Δ
src/engines/bash-engine.cc 86.54% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e55b297...08fb2a4. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 20, 2017

Codecov Report

Merging #211 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #211      +/-   ##
==========================================
- Coverage   62.69%   62.68%   -0.02%     
==========================================
  Files          56       56              
  Lines        3699     3698       -1     
==========================================
- Hits         2319     2318       -1     
  Misses       1380     1380
Impacted Files Coverage Δ
src/engines/bash-engine.cc 86.54% <100%> (-0.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e55b297...08fb2a4. Read the comment docs.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 63.234% when pulling 08fb2a4 on mbland:fix-#193 into e55b297 on SimonKagstrom:master.

@mbland
Copy link
Contributor Author

mbland commented Jul 21, 2017

Wow, there's something seriously strange going on with that Linux Clang build. Wonder if it has to do with the Trusty update noted just before the logs:

This job ran on our Trusty environment, which was updated on Wednesday, July 12th. Read all about it in the docs and take note that you can add group: deprecated-2017Q3 in your .travis.yml file to use the previous version.

@SimonKagstrom SimonKagstrom merged commit b048ef0 into SimonKagstrom:master Jul 22, 2017
@mbland mbland deleted the fix-#193 branch July 22, 2017 18:53
@SimonKagstrom
Copy link
Owner

The Clang issue is probably caused by me using clang-3.9 from a PPA. In the system-mode-ng branch, I've changed a few things around this, so the clang builds should work again once that's merged. Don't worry about that problem.

Anyway, thanks for the fix! Also compliments for the very detailed commit message and bug report!

@mbland
Copy link
Contributor Author

mbland commented Jul 22, 2017

My pleasure! Thanks for merging, and for such a great tool! (I mean, c'mon, code coverage for Bash—who knew?) 😄

mbland added a commit to mbland/go-script-bash that referenced this pull request Jun 9, 2018
I noticed during the build for #227 that test coverage dropped
precipitously for no apparent reason:

  https://coveralls.io/builds/14508796

The coverage dropped because SimonKagstrom/kcov@0dd22bd added the
ability for kcov to automatically discover all scripts residing in the
same directory of the script under test. In this case, it unfortunately
discovered all the scripts in ./git and tests/kcov, as well as various
*.md and *.yml files that happened to have "bash" in the first line
regardless of the presence of a #!.

The interim fix for this was to add the --bash-dont-parse-binary-dir and
to update the tests accordingly.

Also, since a bit of time elapsed since I returned to the problem, I
realized the latest tip of kcov's master branch exhibited the hanging bug I
helped squash in SimonKagstrom/kcov#211. The bug was reintroduced in
SimonKagstrom/kcov@ad17136, which tried to fix a bug reported in
SimonKagstrom/kcov#248 whereby kcov would hang on output that didn't
contain newlines.

SimonKagstrom/kcov#249 attempted to resolve the bug, but didn't quite
work and was abandoned. I may take a stab at fixing it one day, but in
the meanwhile, kcov v35 with --bash-dont-parse-binary-dir works to get
this project's test coverage back in shape.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing some bash stdout with self-built binary
3 participants