spurious hydra test failures in travis #1169

Open
grondo opened this Issue Aug 29, 2017 · 3 comments

@grondo
Contributor

commented Aug 29, 2017

not ok 5 - Flux libpmi-client wire protocol works with Hydra

Saw the failure above in a travis test run on master.

Build log

I didn't see any clues in the log.

@garlick

Member

commented Aug 31, 2017

This just failed in travis:

Hydra sets PMI_RANK to unique value
expecting success: 
	test `mpiexec.hydra -n 4 printenv PMI_SIZE | sort | uniq | wc -l` -eq 1
not ok 4 - Hydra sets PMI_SIZE to uniform value

Nothing much in the logs here either, although config.log did contain a write error:

conftest.c:64:25: error: duplicate case value '0'
switch (0) case 0: case (sizeof (long long) == 4):;
                        ^
conftest.c:64:17: note: previous case defined here
switch (0) case 0: case (sizeof (long long) == 4):;
                ^
1 error generated.
configure:13591: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "flux-core"
| #define PACKAGE_TARNAME "flux-core"cat: write error: Resource temporarily unavailable
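
A note on the failing check quoted above: because the mpiexec.hydra output is piped straight into `sort | uniq | wc -l`, a run that produces no output at all looks the same in the log as one that produces mismatched values. Below is a minimal sketch of a check that captures the output first and prints it on failure. The helper name `check_uniform` and the file names are hypothetical; this is not what t2004-hydra.t currently does.

```shell
#!/bin/sh
# Hypothetical helper (not part of t2004-hydra.t): succeed only if every
# line of a captured output file carries the same value; otherwise dump
# what was captured so an empty or mixed result is visible in the log.
check_uniform() {
    file=$1
    lines=$(wc -l <"$file" | tr -d ' ')
    distinct=$(sort -u "$file" | wc -l | tr -d ' ')
    if test "$distinct" -eq 1; then
        return 0
    fi
    echo "check_uniform: $lines line(s), $distinct distinct value(s) in $file" >&2
    sed 's/^/  | /' "$file" >&2
    return 1
}

# Intended use inside a test (assumes mpiexec.hydra is on PATH):
#   mpiexec.hydra -n 4 printenv PMI_SIZE >size.out 2>size.err
#   check_uniform size.out
```

With this shape, an empty size.out is reported as "0 line(s), 0 distinct value(s)", which separates the lost-output case from a genuine PMI_SIZE mismatch.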

@garlick changed the title from "not ok 5 - Flux libpmi-client wire protocol works with Hydra" to "spurious hydra test failures in travis" on Aug 31, 2017

grondo added a commit to grondo/flux-core that referenced this issue Oct 20, 2017

t2004-hydra.t: enhance debug output
Add more debug output to the hydra tests to attempt to capture data
useful for resolving flux-framework#1169.

@garlick

Member

commented Oct 13, 2018

I hit the PMI_RANK test failure on my desktop and noticed that the output file was empty. This makes me wonder whether mpiexec.hydra is exiting without flushing its output, so that a very short test with very little output gets cut off before it produces anything.

Two mpiexec.hydra options might be interesting to play with:

-outfile-pattern                 direct stdout to file
-launcher                        launcher to use (ssh rsh fork slurm ll lsf sge manual persist)

If mpiexec.hydra uses buffered I/O, redirecting to a file with -outfile-pattern might end in an fclose(file), which implicitly flushes, whereas stdout might be left open and never flushed?

The default launcher is ssh, which is clearly not necessary in the test case. I don't know whether switching it to fork is likely to help with this problem, but it would at least eliminate one more place where I/O is buffered (in ssh or sshd).

I should mention I am hitting this on Ubuntu 18.04.1 LTS with mpich 3.3~a2-4.
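
Since the failure is sporadic, one way to compare options such as `-launcher fork` is to repeat the command many times and count how often stdout comes back empty. A rough sketch, assuming a POSIX shell; `count_empty_runs` is a hypothetical helper, not something in the flux-core testsuite:

```shell
#!/bin/sh
# Hypothetical helper: run a command N times, capture stdout to a
# temporary file each time, and print how many runs produced no output.
count_empty_runs() {
    n=$1
    shift
    empty=0
    i=0
    out=$(mktemp) || return 1
    while test "$i" -lt "$n"; do
        "$@" >"$out"
        # test -s is true only when the file exists and is non-empty
        test -s "$out" || empty=$((empty + 1))
        i=$((i + 1))
    done
    rm -f "$out"
    echo "$empty"
}
```

For example, `count_empty_runs 100 mpiexec.hydra -launcher fork -n 4 printenv PMI_RANK` prints how many of 100 runs produced no output. Running it with and without `-launcher fork` would show whether the option makes a measurable difference on a given machine.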

garlick added a commit to garlick/flux-core that referenced this issue Oct 13, 2018

testsuite: run mpiexec.hydra -launcher fork
Problem: occasionally mpiexec output from spawned tasks
is lost, causing test to fail sporadically.

Try adding the "-launcher fork" option.  This overrides
the default launcher, which is "ssh".

Maybe this will fix flux-framework#1169

garlick added a commit to garlick/flux-core that referenced this issue Oct 13, 2018

testsuite: run mpiexec.hydra -outfile
Problem: occasionally mpiexec output from spawned tasks
is lost, causing tests to fail sporadically.

Instead of redirecting stdout, use the mpiexec -outfile
option to let mpiexec redirect the output internally.

Maybe this will fix flux-framework#1169

@garlick

Member

commented Oct 16, 2018

I've seen that failure again even with the proposed mpiexec options, so I'll drop those suggested fixes from my PR.
