@rhc54
Contributor

@rhc54 rhc54 commented May 21, 2016

No description provided.

@hjelmn
Member

hjelmn commented May 21, 2016

There is one other issue I don't know how to solve. The command line parser stops on an unknown switch. So if the command line has anything that isn't -mca, -am, or -tune, it's possible that not all the options will be parsed. I have a fix, but it will potentially parse options to the app. I think that is probably fine. I will put together a patch and we can discuss it.

@rhc54
Contributor Author

rhc54 commented May 21, 2016

@jjhursey How in the world do I find the error being reported by your tests? All I see is a mass of output, and I can't find the error.

@rhc54
Contributor Author

rhc54 commented May 21, 2016

@hjelmn I took a couple of simple shots at it, but ran into infinite loops. So I'll have to defer until you have your patch available. Can you please set it up as a PR against this PR?

@hjelmn
Member

hjelmn commented May 21, 2016

Sure. I ran into the same issue. I will create a PR later today.

@jjhursey
Member

@rhc54 Looking at the few tests that failed, the bottom of the file looks like:

#################
Run Examples
#################
+ cd /gpfs/gpfs_stage1/jhursey/jenkins/workspace/ompi_public_pr_master/ompi-src/examples
+ timeout --preserve-status -k 22s 20s mpirun -np 2 -mca btl tcp,vader,sm,self hello_c

#################
# Summary
#################
# Tests Passed: 5 of 7
#---------------------
IBM_CI_SUCCESS : Autogen
IBM_CI_SUCCESS : Configure
IBM_CI_SUCCESS : Make
IBM_CI_SUCCESS : Make Install
IBM_CI_SUCCESS : Make examples
#################

When a CI test fails I give you all of the output generated from the whole build process, which is a lot. I put a summary at the bottom of what passed to help a bit with that. These seem to have failed in the test run; see the line below. (The + in front of the line is bash showing you the command that it is about to execute next, so whoever looks at the logs can see all of the commands used by the script in addition to the output.)

+ timeout --preserve-status -k 22s 20s mpirun -np 2 -mca btl tcp,vader,sm,self hello_c

That must have timed out. I'll make a note to see if I can get it to print an error message when that happens before it fails the build.
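
As a rough illustration of the kind of explicit timeout message described above, here is a minimal Python sketch. It is not the CI script, which is a bash script using the `timeout` utility; the function name `run_with_timeout` and the message format are hypothetical, and Python's `subprocess` timeout is swapped in for `timeout` purely for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, limit_s):
    """Run cmd (a list of strings) with a time limit.

    Returns the command's exit code, or None if the limit was hit.
    Hypothetical helper; not part of the actual CI scripts.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the limit is exceeded.
        result = subprocess.run(cmd, timeout=limit_s)
    except subprocess.TimeoutExpired:
        # Say so explicitly instead of leaving a silent gap in the log.
        print(f"ERROR: timed out after {limit_s}s: {' '.join(cmd)}",
              file=sys.stderr)
        return None
    return result.returncode
```

A caller could treat a `None` return as a failed test and still have a readable reason in the log.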

Maintain the python bindings

Only call cmd_line_setup once as those params will persist

Temporarily disable the AMCA option

@rhc54
Contributor Author

rhc54 commented May 23, 2016

@hjelmn I think we've made a good-faith attempt, but it's time to surrender pending identification of a maintainer for this code. I've added a check-and-abort so we don't silently ignore the AMCA option. In the meantime, I'd suggest we consider this code "non-operational pending identification of a maintainer", à la what we did for checkpoint-restart.

@rhc54
Contributor Author

rhc54 commented May 23, 2016

@hjelmn GitHub problems are preventing the code on the web page from updating. When it does, you'll see that I squashed it down to a single commit and added the check-and-abort logic to mca_base_var.c.

hjelmn and others added 3 commits May 23, 2016 11:40
This commit changes the command line parsing behavior when ignoring
both unknown switches and tokens. In this case we are probably trying
to parse part of the command line (-mca, -am, -tune) before the
rest. Before this change the command line parser stopped on the first
unknown switch or token. Now it will continue to parse out
options. This includes options to the app being run by mpirun (unless
-- is specified) but since only -mca, -tune, and -am are affected this
probably is not a big deal. This change only affects the parsing of
command-line MCA parameters in orte-submit, all other parsers are
unchanged.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
opal/cmd_line: enable complete command line parsing
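
The behavior change described in this commit message can be sketched roughly as follows. This is a toy Python model, not the `opal_cmd_line_parse` code: the function `preparse`, the `keep_going` flag, and the argument counts assumed for each switch are all illustrative assumptions.

```python
# Switch -> number of arguments it takes (assumed for illustration).
MCA_SWITCHES = {"-mca": 2, "-am": 1, "-tune": 1}


def preparse(argv, keep_going):
    """Collect -mca/-am/-tune options from argv.

    keep_going=False models the old behavior: stop at the first
    unknown switch or token.
    keep_going=True models the new behavior: skip unknown items and
    keep scanning, stopping only at a literal "--".
    """
    found = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            break  # everything after "--" is left alone
        nargs = MCA_SWITCHES.get(arg)
        if nargs is not None and i + nargs < len(argv):
            found.append((arg, tuple(argv[i + 1:i + 1 + nargs])))
            i += 1 + nargs
            continue
        if not keep_going:
            break  # old behavior: give up at the first unknown item
        i += 1  # always advance, so the scan cannot loop forever
    return found
```

With `["-mca", "btl", "tcp", "-np", "2", "-mca", "a", "b"]`, the `keep_going=False` call returns only the first `-mca` pair, while `keep_going=True` returns both. Note that the second mode would also pick up a `-mca` that appears after the application name unless `--` precedes it, which is the trade-off the commit message accepts.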
rhc54 referenced this pull request May 24, 2016
… - put them in with the rest of the OPAL MCA param registrations

Take another shot at untangling the spaghetti

orterun: fix for command line parsing

orte-submit calls opal_init_util () before parsing out MCA command line
options (-mca, -am, etc). This prevents mpirun from setting opal MCA
variables for some frameworks as well as the MCA base. This is because
when a framework is opened all of its variables are set to read-only.
Eventually we want to lift this restriction on some MCA variables but
since -mca is affected we must parse out the MCA command line options
before opal_init_util(). This commit fixes the bug by adding a new
option to opal_cmd_line_parse (ignore unknown option) so orte-submit
can pre-parse the command line for MCA options.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
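
The ordering problem in this commit message can be illustrated with a toy model. The class `VarRegistry` and its methods are hypothetical stand-ins, not the OPAL MCA variable API; the sketch only shows why a value applied after a framework is opened is lost, assuming variables become read-only at open time as the message states.

```python
class VarRegistry:
    """Toy model of variables that lock when their framework opens."""

    def __init__(self):
        self.values = {}
        self.opened = set()

    def open_framework(self, framework):
        # Once opened, the framework's variables are read-only.
        self.opened.add(framework)

    def set_from_cmdline(self, framework, name, value):
        """Apply a command-line value; return False if it was rejected."""
        if framework in self.opened:
            return False  # too late: variable is already read-only
        self.values[(framework, name)] = value
        return True


def init_then_parse():
    # Order before the fix: initialization opens the framework first.
    reg = VarRegistry()
    reg.open_framework("btl")
    return reg.set_from_cmdline("btl", "include", "tcp,self")


def parse_then_init():
    # Order after the fix: pre-parse MCA options, then initialize.
    reg = VarRegistry()
    accepted = reg.set_from_cmdline("btl", "include", "tcp,self")
    reg.open_framework("btl")
    return accepted
```

In this model `init_then_parse()` returns `False` and `parse_then_init()` returns `True`, which mirrors the reason for pre-parsing the MCA options before `opal_init_util()`.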

Minor cleanups to avoid releasing/recreating the cmd line

@rhc54
Contributor Author

rhc54 commented May 24, 2016

See #1692

@rhc54
Contributor Author

rhc54 commented May 24, 2016

Closed in favor of other fix

@rhc54 rhc54 closed this May 24, 2016
@rhc54 rhc54 deleted the topic/enval branch May 24, 2016 05:19
3 participants