Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wreck: add wreck plugin for spectrum MPI #1588

Merged
merged 2 commits into from Jul 23, 2018

Conversation

Projects
None yet
6 participants
@SteVwonder
Copy link
Member

SteVwonder commented Jul 20, 2018

Add support for spectrum MPI on Sierra via optional wreck plugin.
Activated by calling wreckrun or submit with -o mpi=spectrum.

Credit goes to @grondo and @garlick.

Closes #1584
Closes #1382

@SteVwonder SteVwonder added this to the release 0.10.0 milestone Jul 20, 2018

@SteVwonder SteVwonder requested review from garlick and grondo Jul 20, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 20, 2018

Not sure how to test this with the CI other than maybe checking that the environment variables and stack limit were set. Thoughts?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented Jul 20, 2018

@SteVwonder: don't think there is a way to run spectrum MPI directly on Travis CI at all. I like your suggestion: a launch job can simply echo back the environment variables and make sure those are set correctly?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 20, 2018

Yes, that would be a good sanity check IMO

@grondo

grondo approved these changes Jul 20, 2018

Copy link
Contributor

grondo left a comment

Looks good to me on a cursory check! Thanks!

@garlick
Copy link
Member

garlick left a comment

@dongahn's suggestion for a test is a good one if you have time. If not we can maybe circle back and add a new sharness script that checks all the "mpi plugins" in that way.

@@ -0,0 +1,47 @@
-- Set environment specific to spectrum_mpi (derived from openmpi)

This comment has been minimized.

@garlick

garlick Jul 20, 2018

Member

Commit message body should be filled in. Maybe something like

Add mpi "personality" for IBM spectrum MPI, enabled by user with wreckrun -o mpi-spectrum.
Note that this plugin assumes MPI is installed in /opt/ibm/spectrum_mpi.
It also disables PAMI, the spectrum enhanced collectives due to their dependency on the RM providing
a PMIx server. See #1382 for further details.
It also sets the soft stack limit to a value the MPI runtime seems to require.

See #1382 for more details.

Fixes #1584

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

Will do. Sorry for the sloppiness there.

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

Done


function rexecd_task_init ()
-- approximately ulimit -Ss 10240
posix.setrlimit ("stack", 10485760)

This comment has been minimized.

@garlick

garlick Jul 20, 2018

Member

If we know a version of lua.posix that this doesn't work on, perhaps note it here in a comment?

Also perhaps note that this is set to shut up the spectrum MXM plugin.

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

I will look into seeing exactly when luaposix switched this interface to deprecated and add a conditional that uses the newer interface on newer luaposix libraries.

And I will add the comment about the spectrum plugin

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

If you look at src/bindings/lua/flux/posix.lua we already have a wrapper around the luaposix module to "work around" at least one compat issue.

You could use the same pcall() trick (used in there to check for posix.time) to check for posix.sys.resource and use that module's setrlimit() instead of the one from deprecated.lua.

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

Though now that I think about it I wonder if flux.posix is loaded in wrexecd context... have to check on that

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

I was going down the route of parsing the luaposix version string and specializing based on that version, when I realized posix.setrlimit ("stack", 10485760, -1) always does the right thing. As long as we are OK using this deprecated function, there is no need for any of the version string parsing nastiness, etc. Thoughts?

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

Does that fail if the current hard limit isn't unlimited?

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

Spoke with @grondo and @garlick in person. For the sake of getting this out the door, the existing code should be fine. We can revisit if any problems in the wild pop up.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jul 20, 2018

Coverage Status

Coverage increased (+0.04%) to 79.411% when pulling 1c784c3 on SteVwonder:spectrum-mpi into 5ad5396 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 20, 2018

Codecov Report

Merging #1588 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1588      +/-   ##
==========================================
+ Coverage   79.19%   79.24%   +0.04%     
==========================================
  Files         171      171              
  Lines       31340    31340              
==========================================
+ Hits        24820    24834      +14     
+ Misses       6520     6506      -14
Impacted Files Coverage Δ
src/bindings/lua/lua-hostlist/hostlist.c 59.89% <0%> (-0.22%) ⬇️
src/common/libflux/message.c 81.14% <0%> (+0.23%) ⬆️
src/common/libflux/future.c 91.15% <0%> (+0.88%) ⬆️
src/modules/wreck/wrexecd.c 72.89% <0%> (+0.99%) ⬆️
@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 20, 2018

So much nicer without all the noise!

$ flux wreckrun -o mpi=spectrum -n 32 ./hello
0: completed MPI_Init in 2.368s.  There are 32 tasks
0: completed first barrier in 0.002s
0: completed MPI_Finalize in 0.218s

That was a single node test. Let me try one across nodes.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 20, 2018

Within a bsub allocation obtained with

$ bsub -Is -nnodes 4 -G guests /usr/bin/bash

I ran a script containing:

#!/bin/bash
jsrun -a 40 -c ALL_CPUS --bind=none -r 1 -n 4 \
   $HOME/flux-core/src/cmd/flux start \
	flux wreckrun -o mpi=spectrum -n 128 $HOME/flux-core/t/mpi/hello

and got

... Warning: more than 1 task/rank assigned to a core 
0: completed MPI_Init in 3.283s.  There are 128 tasks
0: completed first barrier in 0.002s
0: completed MPI_Finalize in 0.335s

Victory!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 20, 2018

Is every MPI job a spectrum MPI job on this platform? Maybe we could do better with a platform check or something so the -o mpi=spectrum isn't required? (Not a valid concern if there are a mix of MPIs)

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 20, 2018

@grondo: I think everything right now is Spectrum, but I don't know if that will always be the case.

# herbein1 at sierra4362 in ~/workspace/packages/blueos3/flux-core/spack/etc/wreck/lua.d [10:31:58]
→ find /usr/tcetmp/modulefiles -iname "*mpi*"
/usr/tcetmp/modulefiles/Compiler
/usr/tcetmp/modulefiles/Compiler/clang/3.9.1/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/4.0.0/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.03.15/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.03.29/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.05.03/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.05.19/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.06.08/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.06.12/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/clang/coral-2017.06.14/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/gcc/4.8-redhat/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/gcc/4.9.3/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/16.10/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/17.1/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/17.3/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/17.4/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/17.7/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/pgi/17.9/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/2016.12.02/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.03.28/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.04.11/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.05.08/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.06.07/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.06.29/spectrum-mpi
/usr/tcetmp/modulefiles/Compiler/xl/beta-2017.07.28/spectrum-mpi
/usr/tcetmp/modulefiles/Core/mpifileutils
find: ‘/usr/tcetmp/modulefiles/Core/essl’: Permission denied
/usr/tcetmp/modulefiles/Core/debugCQEmpi.lua
/usr/tcetmp/modulefiles/MPI
/usr/tcetmp/modulefiles/MPI/clang/coral-2017.03.15/spectrum_mpi
/usr/tcetmp/modulefiles/MPI/gcc/4.8-redhat/spectrum_mpi
/usr/tcetmp/modulefiles/MPI/gcc/4.9.3/spectrum_mpi
/usr/tcetmp/modulefiles/MPI/xl/default/spectrum_mpi

@SteVwonder SteVwonder force-pushed the SteVwonder:spectrum-mpi branch from 0153cd4 to 539c46f Jul 20, 2018


OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets the soft stack limit" '
run_timeout 5 flux wreckrun -n ${SIZE} -N ${SIZE} $OPTS bash -c "ulimit -s" \

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

I gave up trying to get the quoting of bash -c "ulimit -s" to play nice with run_program. If anyone has any pointers or suggestions, I'm open to hearing them. I'm also content leaving it as is.

This comment has been minimized.

@dongahn

dongahn Jul 20, 2018

Contributor

""ulimit -s"" doesn't work?

This comment has been minimized.

@garlick

garlick Jul 20, 2018

Member

What you have seems to be working. What's the problem?

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member

Unfortunately not.
"ulimit -s" and ""ulimit -s"" both result in the -s not being passed to ulimit.
If I put an echo in front of the run_timeout ... call in run_program and then test various quotings, I get:

run_timeout 5 flux wreckrun -o mpi=spectrum -n17 -N17 bash -c ulimit -s
run_timeout 5 flux wreckrun -o mpi=spectrum -n17 -N17 bash -c ulimit -s
run_timeout 5 flux wreckrun -o mpi=spectrum -n17 -N17 bash -c 'ulimit -s'
run_timeout 5 flux wreckrun -o mpi=spectrum -n17 -N17 bash -c 'ulimit -s'

from these:

OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets MPI_ROOT" '                                                                                                                                                 
  run_program 5 ${SIZE} ${SIZE} bash -c "ulimit -s"                                                                                                                                             
'

OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets MPI_ROOT" '                                                                                                                                                 
  run_program 5 ${SIZE} ${SIZE} bash -c ""ulimit -s""                                                                                                                                               
'

CMD="bash -c 'ulimit -s'"
OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets MPI_ROOT" '                                                                                                                                                 
  run_program 5 ${SIZE} ${SIZE} ${CMD}                                                                                                                                               
'

CMD="bash -c 'ulimit -s'"
OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets MPI_ROOT" '                                                                                                                                                 
  run_program 5 ${SIZE} ${SIZE} "${CMD}"
'

And none of them succeed. I would have sworn that the last two would work but 🤷‍♂️

This comment has been minimized.

@dongahn

dongahn Jul 20, 2018

Contributor

A complex case for me was https://github.com/flux-framework/flux-scalability/blob/master/t/t0001-sched-scale.t#L53 to pass single quotes to awk. Don't know if this will be helpful.

This comment has been minimized.

@SteVwonder

SteVwonder Jul 20, 2018

Author Member
OPTS="-o mpi=spectrum"
test_expect_success "spectrum mpi sets the soft stack limit" '                                                                                                                                     
  run_program 5 ${SIZE} ${SIZE} bash -c '\''ulimit -s'\'' \                                                                                                                                        
    | tee spectrum-mpi.ulimit \                                                                                                                                                                    
    && test "$(uniq spectrum-mpi.ulimit)" = "10240"                                                                                                                                                
'

results in

run_timeout 5 flux wreckrun -o mpi=spectrum -n17 -N17 bash -c ulimit -s

and the incorrect result.

My bash-fu isn't good enough to know what is really going on but I think the $* in run_program is what is changing the quoting.

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

Using double quote inside of single-quoted test script works in other sharness tests in the testsuite, see the wreck and aggregator tests, and I think some kvs tests too.

I wonder what is going wrong here.

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

If I put an echo in front of the run_timeout ... call in run_program and then test various quotings, I get:

But the echo doesn't necessarily show the arguments (which may contain whitespace) passed down to the command.

Sometimes I resort to putting a set -x before the code and set +x after see what is actually being run.

This comment has been minimized.

@grondo

grondo Jul 20, 2018

Contributor

Oops, sorry for the noise, somehow my version of this issue wasn't updated with the latest comments.

This comment has been minimized.

@SteVwonder

SteVwonder Jul 21, 2018

Author Member

Sometimes I resort to putting a set -x before the code and set +x after see what is actually being run.

Thanks @grondo. I'll give that a shot.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 20, 2018

LGTM (unless there is still a problem with the ulimit test).

Just needs a rebase on current master.

@SteVwonder SteVwonder force-pushed the SteVwonder:spectrum-mpi branch 2 times, most recently from 837a28c to 0515bb2 Jul 20, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 20, 2018

Ok. I just forced push a fix for an error on Travis in the tests: test "string" = "string" rather than test "string" == "string", and then I rebased.

Have a great weekend guys!

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 20, 2018

Was that the problem that you thought was a quoting problem?

Is this ready to merge once travis passes?

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 21, 2018

No that was a problem that caused some of Travis tests to fail. Whatever shell on Sierra was running the tests was content with the == but the shell on Travis wasn't a fan. So all of the non-coverage tests failed. Looks like one of the tests is still failing. Lemme take a look. I'll try and push a fix tonight.

@SteVwonder SteVwonder force-pushed the SteVwonder:spectrum-mpi branch from 0515bb2 to d2afa42 Jul 22, 2018

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

Here's a discussion on = vs ==: https://unix.stackexchange.com/questions/382003/what-are-the-differences-between-and-in-conditional-expressions

I can't remember the original reason, but the other sharness scripts use #!/bin/sh (and do not assume it's linked to bash as it is on some systems). At this point it's probably best to continue that convention for consistency.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jul 23, 2018

I can't remember the original reason, but the other sharness scripts use #!/bin/sh (and do not assume it's linked to bash as it is on some systems).

The original git testsuite instructed all tests to begin with #!/bin/sh, and this is carried forward in sharness. In upstream sharness PRs, bashisms in scripts are rejected.

https://github.com/chriscool/sharness/blob/master/README.git#L221

SteVwonder added some commits Jul 20, 2018

wreck: add wreck plugin for spectrum MPI
Add mpi "personality" for IBM spectrum MPI, enabled by user with
wreckrun -o mpi-spectrum.  Note that this plugin assumes MPI is
installed in /opt/ibm/spectrum_mpi.  It also disables PAMI, the
spectrum enhanced collectives due to their dependency on the RM
providing a PMIx server. See #1382 for further details.  It also sets
the soft stack limit to a value the MPI runtime seems to require.

See #1382 for more details.

Fixes #1584
wreck: add tests for several lua mpi plugins
Specifically tests the intel and spectrum mpi "personalities".

For intel mpi, test that when the I_MPI_PMI_LIBRARY environment
variable is set, it is overwritten with the path to Flux's library.

For spectrum mpi, test that the MPI_ROOT environment variable is set
to the expected path and that the soft stack limit is set to 10240.

@SteVwonder SteVwonder force-pushed the SteVwonder:spectrum-mpi branch from d2afa42 to 1c784c3 Jul 23, 2018

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 23, 2018

@grondo: thanks for pointing out the chain_lint flag. That seems to have been the problem 😄
Just rebased on master. Hopefully Travis turns all green now.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

Looks good, hash bang fixed, t3001-mpi-personalities.t --chain-lint now passing for me.

Ready to merge once travis passes?

@SteVwonder

This comment has been minimized.

Copy link
Member Author

SteVwonder commented Jul 23, 2018

Yep. 😁 I'm happy with this once Travis is happy.

@garlick

This comment has been minimized.

Copy link
Member

garlick commented Jul 23, 2018

Nice work @SteVwonder.

@garlick garlick merged commit 3760c0b into flux-framework:master Jul 23, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@SteVwonder SteVwonder deleted the SteVwonder:spectrum-mpi branch Feb 16, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.