Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix testsuite --chain-lint option, add timeout to some tests, and other testing fixes #1125

Merged
merged 7 commits into from Aug 2, 2017

Conversation

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented Jul 25, 2017

This branch started out as a small fix to propagate sharness' --chain-lint option to test_under_flux so that broken &&-chains can be detected for tests that run under a flux instance.

After that, I went through all the tests and fixed broken &&-chains so that --chain-lint ran clean.
In most cases the broken && chain wasn't an actual problem, but I fixed the test anyway. In other cases it wasn't possible to make --chain-lint happy (e.g. test that use & to run a process in the background). For these cases I added a NO_CHAIN_LINT prereq that can be explicitly added to a tests so that test is skipped during a run with --chain-lint.

Finally, the flux version check (once it was fixed) didn't work in travis (at least on my personal branch) because travis uses a shallow clone of the repo. A workaround was added to .travis.yml to fetch tags and "unshallow" the clone so that hopefully git describe can create a valid version.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 25, 2017

Coverage Status

Coverage decreased (-0.02%) to 78.422% when pulling 59974bc on grondo:chain-lint into ccf25dc on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 25, 2017

Codecov Report

Merging #1125 into master will increase coverage by 0.26%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1125      +/-   ##
==========================================
+ Coverage   77.34%   77.61%   +0.26%     
==========================================
  Files         184      157      -27     
  Lines       31887    28679    -3208     
==========================================
- Hits        24662    22258    -2404     
+ Misses       7225     6421     -804
Impacted Files Coverage Δ
src/common/libkvs/jansson_dirent.c 64.44% <0%> (-2.23%) ⬇️
src/common/libflux/request.c 87.17% <0%> (-1.29%) ⬇️
src/broker/overlay.c 71.67% <0%> (-0.7%) ⬇️
src/common/libkvs/kvs_txn.c 60.5% <0%> (-0.64%) ⬇️
src/common/libflux/message.c 81.29% <0%> (-0.12%) ⬇️
t/loop/log.c
t/module/parent.c
t/barrier/tbarrier.c
t/request/treq.c
t/kvs/blobref.c
... and 33 more
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 25, 2017

Coverage Status

Coverage decreased (-0.004%) to 78.441% when pulling 59974bc on grondo:chain-lint into ccf25dc on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 25, 2017

This PR has been having trouble with intermittent hangs in Travis. Not sure what is going on here, but another variable is that Travis-CI is transitioning to their Trusty environment (Ubuntu 14.04). However, other PRs don't seem to have a problem, so this one's not a merge candidate for now.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 25, 2017

Coverage Status

Coverage decreased (-0.2%) to 78.238% when pulling bd06cda on grondo:chain-lint into ccf25dc on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch 2 times, most recently from 66b3504 to 6c2a053 Jul 25, 2017

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 25, 2017

Just had a spin through and looks like a lot of good changes. I didn't spot anything wrong.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 26, 2017

Still can't pin down why this PR seems to cause intermittent but frequent hangs in Travis.

What's the equivalent of tearing a PR into pieces, throwing in the garbage, and lighting it on fire?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Changes Unknown when pulling 6c2a053 on grondo:chain-lint into ** on flux-framework:master**.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.09%) to 78.003% when pulling 6c2a053 on grondo:chain-lint into 5f48548 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from 6c2a053 to 76ea515 Jul 26, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.06%) to 78.036% when pulling 76ea515 on grondo:chain-lint into 5f48548 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from 76ea515 to d5f24d1 Jul 26, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.06%) to 78.036% when pulling d5f24d1 on grondo:chain-lint into 5f48548 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from d5f24d1 to 092b481 Jul 26, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.05%) to 78.047% when pulling 092b481 on grondo:chain-lint into 5f48548 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 26, 2017

Ok, I reworked the changes in t1002-kvs-cmd.t and now the hangs seem to have stopped, though I didn't see anything wrong that would cause hangs. I suppose also something could be different this AM in Travis itself. 🤷‍♂️

I also added to this PR some debug for issue #997 (will fix the typo when we figure the problem with the test)

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 26, 2017

I spoke too soon, one of the tests (for grondo/flux-core, not flux-framework/flux-core) did hang, but very early in the unit tests (apparently, difficult to tell) before the sharness tests under t even started:

PASS: test_tagpool.t 16 - group: tagpool_alloc worked 256 times
PASS: test_tagpool.t 17 - group: pool depleted by 256
PASS: test_tagpool.t 18 - group: allocated tags contain no duplicates
PASS: test_tagpool.t 19 - group: tagpool_free restored all to pool
PASS: test_tagpool.t 20 - group: entire poool allocated by tagpool_alloc loop
PASS: test_tagpool.t 21 - group: pool is exhausted
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

@grondo grondo force-pushed the grondo:chain-lint branch 2 times, most recently from ac97a50 to 092b481 Jul 26, 2017

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 26, 2017

21 is the last tagpool test. Maybe the hanging one is libflux/test/security.c, which comes after (normally). I wonder if it would make sense to call alarm() in that one, e.g.

diff --git a/src/common/libflux/test/security.c b/src/common/libflux/test/security.c
index 8b2bcaf..ede851f 100644
--- a/src/common/libflux/test/security.c
+++ b/src/common/libflux/test/security.c
@@ -469,10 +469,19 @@ void test_curve (void)
     unlink_recursive (path);
 }
 
+void alarm_callback (int arg)
+{
+    diag ("test timed out");
+    exit (1);
+}
+
 int main (int argc, char *argv[])
 {
     plan (NO_PLAN);
 
+    signal (SIGALRM, alarm_callback);
+    alarm (30);
+
     test_ctor_dtor ();
     test_keygen ();
     test_munge ();
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 26, 2017

Hm, not a bad idea, though I've never seen that test hang before. At least then we'd know something. I retried my original branch and it hung 2/5 times (at least), so repushed my latest and we'll see what happens. Maybe I'll also include your debug patch above in case we hit the other "hang"

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.07%) to 78.021% when pulling ac97a50 on grondo:chain-lint into 5f48548 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from 092b481 to ba91008 Jul 26, 2017

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Jul 26, 2017

I rebased on master with @garlick's rc2_timeout patch and added a default timeout of 5m to all tests.

Unfortunately I don't think the timer is compatible with the libfaketime-based cron tests, as these are now always timing out, though I thought ev_timer watchers measured real time not wall clock time, which should be the only thing libfaketime affects (but I can't remember the exact distinctions right now).

For now, I'll add an option to disable the timeout on a per-test basis.

I'll also add @garlick's patch for test_security.t above because the travis runs hit a hang there as well.

@grondo grondo force-pushed the grondo:chain-lint branch from ba91008 to da1c938 Jul 26, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.07%) to 77.998% when pulling ba91008 on grondo:chain-lint into e047dd4 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage remained the same at 78.068% when pulling 751bf7e on grondo:chain-lint into e047dd4 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage decreased (-0.04%) to 78.027% when pulling 8115a38 on grondo:chain-lint into e047dd4 on flux-framework:master.

@chu11 chu11 referenced this pull request Jul 27, 2017

Merged

KVS more oom() fixes #1128

@grondo grondo force-pushed the grondo:chain-lint branch from c3db917 to 94cbfe4 Jul 27, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 28, 2017

Coverage Status

Coverage decreased (-0.1%) to 77.932% when pulling 406f397 on grondo:chain-lint into e047dd4 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from 406f397 to efd3d46 Jul 28, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 28, 2017

Coverage Status

Coverage decreased (-0.03%) to 77.924% when pulling efd3d46 on grondo:chain-lint into 38fc4a6 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from efd3d46 to e0fdd90 Jul 31, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 31, 2017

Coverage Status

Coverage decreased (-0.04%) to 77.917% when pulling e0fdd90 on grondo:chain-lint into 38fc4a6 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from e0fdd90 to 0213075 Jul 31, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 31, 2017

Coverage Status

Coverage decreased (-0.07%) to 77.884% when pulling 0213075 on grondo:chain-lint into 38fc4a6 on flux-framework:master.

@grondo grondo force-pushed the grondo:chain-lint branch from 0213075 to 088df09 Aug 1, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 1, 2017

Coverage Status

Coverage increased (+0.05%) to 77.982% when pulling 088df09 on grondo:chain-lint into ae10412 on flux-framework:master.

grondo added some commits Jul 26, 2017

flux-sharness.sh: add default timeout to test_under_flux
Add a default timeout for test_under_flux() of 5 minutes, to detect
and kill possibly hung tests rather than hanging indefinitely.
flux-sharness.sh: pass --chain-lint to sub-sharness
When running test_under_flux(), do not drop --chain-lint in
sub-sharness run in flux sesssion.

This allows --chain-lint to be used to check for broken `&&` chains
in tests.
flux-sharness.sh: add NO_CHAIN_LINT prereq
Some sharness tests are not able to  make --chain-lint happy, so add a prereq
`NO_CHAIN_LINT` so that these tests are skipped during runs with --chain-lint,
and the explicit broken &&-chain is explicitly purposeful.

The main case where &&-chains are broken and it is ok is when a process
is sent into the background during a test.
codecov: fix ignore of test code under t/.*
Attempt to fix the ignore pattern for files under t/* in codecov.io
travis-ci: output test_reactor.log on failure
Add debugging to travis for failures in test_reactor.t test.
test/libflux: add timeout to security test
Add timeout patch from @garlick to detect hung test_security.t
test in travis-ci builds.
t0016-cron-faketime.t: disable test_under_flux timeout
Disable the timeout for this test in case it is not compatible
with libfaketime monkeying around with the reported time.

@grondo grondo force-pushed the grondo:chain-lint branch from ae6f515 to 6ce23ce Aug 1, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 1, 2017

Coverage Status

Coverage increased (+0.03%) to 77.96% when pulling 6ce23ce on grondo:chain-lint into ae10412 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented Aug 1, 2017

I pared this PR down to a very small set of fixes that do not seem to cause travis-ci hangs (today at least) anymore.

The only fixes now included is the fix to propagate --chain-lint through test_under_flux, addition of a default 5m timeout to all test_under_flux runs using the new init.rc2_timeout broker attr, the addition of a timeout to test_security.t as proposed by @garlick, along with a couple other debug and codecov.io fixes.

I propose we just merge this for now, and then I can push test fixes on top one by one to see what the hangs were caused by (or just drop those fixes for now)

@grondo grondo changed the title Fix testsuite --chain-lint option and many broken &&-chains in tests Fix testsuite --chain-lint option, add timeout to some tests, and other testing fixes Aug 1, 2017

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Aug 2, 2017

Right let's do it!

@garlick garlick merged commit 71b40ff into flux-framework:master Aug 2, 2017

4 checks passed

codecov/patch Coverage not affected when comparing ae10412...6ce23ce
Details
codecov/project 77.61% (+0.26%) compared to ae10412
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 77.96%
Details

@grondo grondo referenced this pull request Aug 23, 2017

Closed

0.8.0 Release #1160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.