changed mindist mapping policy specifier #494

elenash · 2015-03-25T05:14:35Z

Сhanged mindist mapping policy specifier from map-bt dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device

mellanox-github · 2015-03-25T05:37:28Z

Coverity found 912 errors for all: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_376/output/errors/index.html
Coverity found 5 errors for oshmem: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_376/output/errors/index.html
Coverity found 2 errors for fca: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_376/output/errors/index.html

Sent from GH.

mellanox-github · 2015-03-25T05:39:54Z

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/376/
Test PASSed.

rhc54 · 2015-03-25T12:09:20Z

May one ask why? In general, I'd prefer not to expand the command line yet again

mike-dubman · 2015-03-25T12:22:13Z

currently, the same command line specifies device and map in one sentence
it is better two have two separate knobs to select device and mapping policy

in scripts, usually one have device selector variable and mapper selector (with mca params)
In the current form it is not friendly

mike-dubman · 2015-03-25T14:57:17Z

+1

rhc54 · 2015-03-25T15:16:10Z

Personally, I'd rather this not come into the code base. We continually here complaints about the large and constantly growing number of MCA params in this system. This new one doesn't do anything new at all - it just duplicates existing syntax. The argument that it is easier in a script doesn't make sense to me, as scripts have no issue with putting $vars in lists.

So a 👎 from me barring some more convincing argument.

mike-dubman · 2015-03-25T16:07:40Z

from usability perspective - having one parameter to specify many different things is too complicated.

-mca btl_tcp_if_include eth0,span

is not a user-friendly. Split into two (and keep the current syntax as well) is more user-friendly
The fix is not removing old behave, it just adds additional way to specify every component separately. can we agree on that?

rhc54 · 2015-03-25T18:00:08Z

I'm confused - why would you put span on a BTL include param??? It's a modifier to the mapping directive.

mike-dubman · 2015-03-25T18:06:42Z

it is just an example of mixing unrelated configurables in one parameter.
bad copy paste , but the point still stands.

rhc54 · 2015-03-25T20:46:23Z

I'm still not fully understanding your point. The current parameter specifies all mapping-related directives in one place. We went to that because people were unhappy with having to specify so many separate things. How is this not a step backward?

I'd really prefer to have only one way to specify things, not several ways of specifying the same thing.

jsquyres · 2015-03-26T15:01:01Z

I've been following this, and I confess that I'm a bit confused as well.

Here's my questions:

We likely already have too many MCA params. I am usually not hesitant to add more MCA params (especially when a user asks for a specific tunable knob to turn), but I don't see the new functionality that this param adds. What is new here?
I'm also usually not adverse to long names :-), but it looks like --map-by dist:mlx4_0,span is a whole lot fewer characters to type than -map-by dist:mlx4_0 -mca rmaps_base_map_modifiers span. I'm therefore not understanding the "user-friendly" argument for this change. Can you explain further?
Even though we do have (too) many MCA params, I think few (if any?) offer multiple different ways to get to the same functionality (aliases are the obvious exception -- but that's the same functionality available via multiple names). If there are different ways to get to the same functionality, then you have precedence issues to consider: if multiple ways are specified, which one wins? NOTE: This confusion is somewhat strengthened by the problem we're having with MCA param aliases right now -- the (slow) race condition that makes even multiple paths to the same functionality problematic.

mike-dubman · 2015-03-26T18:02:51Z

the only reason for this PR is to simplify current form. At many customers and our internal scripts we see common usage pattern for running MPI:

function/logic to select communication device
function/logic to select mapper

These are two different unrelated flows, using mix of MCA or SHELL params

-mca btl_openib_if_include $hca -x MXM_IB_DEVICE $hca ....

and different flow for selecting mapping method:

--map-by $map

There is not correlation between two flows.
The existing form adds this correlation and makes scripting and visual readability complicated.

Also, in mtt, we have a well-known param "alloc=..." which is device independent and should stay this way.

Mixing device and mapping selection in one sentence is not user friendly.

Now that we have 10k MCA params, we can add another two w/o hitting the overflow.
EndUser uses 5-10 params at max, all others are internal for special tuning and it is always good to have many knobs for rainy day :)

rhc54 · 2015-03-26T18:34:51Z

But there never will be a correlation between the two flows. Even in your proposed change, there is no correlation. You still specify the mapping policy separate from the selection of a particular interface.

The current system doesn't force any correlation. You specify your interfaces using the appropriate param. You then specify a mapping policy. If you want it to be "dist", then you have to tell us "distance from what?". If you then modify the policy, well you include that modification in the param.

This isn't a case of adding just one or two params. We'd have to add a bunch: n_per_node, n_per_socket, n_per_numa, etc. Like I said, I refuse to have multiple ways to do the same thing. So if you are going to move one modifier to a separate param, then you have to move all of them.

I just don't think that makes any sense.

hppritcha · 2015-03-30T15:03:50Z

Mike have there been specific customer requests for this change to the way procs are mapped to the resource being specified?

I understand your reasoning for suggesting a separation in the way the mapping resource is specified, from the way the procs are mapped to them, but given that currently there are 93 (with one deprecated) mca parameters for the rmaps framework, it might be better to consider
something more comprehensive than just adding to the list. Also, as Jeff comments, the proposed new mca parameter has quite a long name.

rhc54 · 2015-03-30T15:12:24Z

Where did you get this 93 number??? Running ompi_info --all thru "grep maps | wc -l" yields 51, which includes a bunch of priority, verbosity, and other non-germane things that are specific to seldom-used components..

There are exactly two MCA params today that are used by the RMAPS framework: map-by and rank-by

mike-dubman · 2015-03-30T15:16:26Z

look at the mtt configuration example below. see alloc= part and suggest a nice way to exercise mindist alg w/o entering ugly zone of carrying device name (selected in different part of mtt) as part of specifying allocation policy?

Same for customer scripts.

We keep the old format as well, having two more mca params will not break any flow. Having two new params will let mtt to exercise mindist gracefully and will make many customer scripts happy and logical.

The param name can be shorter.

[MPI Details: OMPI]
# Check &test_alloc() for byslot or bynode
exec = mpirun @alloc@ -np &test_np() @mca@ &test_executable() &test_argv()
parameters = &MPI::OMPI::find_mpirun_params(&test_command_line(), \
                                            &test_executable())
network = &MPI::OMPI::find_network(&test_command_line(), &test_executable())

alloc = &if(&eq(&test_alloc(), "node"), "--bynode", "--byslot")

rhc54 · 2015-03-30T15:24:03Z

Sounds to me like this is an MTT issue you are having, not an OMPI issue. Perhaps we should get together with Jeff and discuss the MTT issue separately?

mike-dubman · 2015-03-30T15:30:03Z

mtt suffers from the same problem. it uses same rationale as a customer scripts:

select communication device flow is unrelated to the process mapping and happening in different parts of ini/scripts.

rhc54 · 2015-03-30T15:31:47Z

Well, I'll reiterate - let's talk to Jeff about the MTT issue and see where that leads us. I still don't understand what is so hard about substituting a script variable into the MCA param (on the cmd line or in a setenv script cmd). I do it all the time.

rhc54 · 2015-03-30T16:53:55Z

I've got a few minutes between meetings, so let me try again to explain the problem I see with this proposal. If we set this up as a separate MCA param, and leave the existing params alone, then we have two ways of specifying mapping modifiers: in the map-by param, and in the "modifier" param.

The problem now is: how do I rationalize conflicts and precedence between those two params? The MCA param system cannot help me because they are two different params, so I now have to create an entirely separate logic path for dealing with it. Okay, let's try.

What do I do if the modifier param is set (e.g., to "span"), but the map-by param given in the cmd line has no modifier in it? Do I assume that the user didn't want this particular mpirun to use the modifier, and thus left it out? Or do I assume that the lack of a modifier means that the modifier param is to be used? Will the user be expecting that behavior, say if the modifier param is set out of his immediate view in the default MCA param file? How do they then indicate that they don't want the modifier for this specific run?

We would have to define every one of these conflict use-cases and write logic for dealing with it. This is a non-trivial problem, and your PR doesn't deal with it.

This is why we don't have MCA params that contain overlapping functions - resolving the precedence rules becomes really hard.

HTH
Ralph

hppritcha · 2015-03-30T17:14:33Z

The 93 came from the output of orte_info --mca rmaps all. The point was just to show there are indeed already a lot of parameters already for that framework. So hence a reluctance to add more too arbitrarily.

jsquyres · 2015-03-30T19:59:39Z

I guess I still don't understand the use case for this PR, either. @miked-mellanox you've cited MTT (and user scripts) -- but I think I'm failing to grok the issue: it seems like you need to specify the $device in multiple places on the command line. Is that the problem?

mike-dubman · 2015-03-30T20:02:25Z

@rhc54 - thanks for detailed info, it seems you are right on this. missed that angle.

@jsquyres - yes, the problem is that $device needs to appear multiple times in unrelated mca params.

jsquyres · 2015-03-30T20:06:50Z

@miked-mellanox Ah, ok. Is there a reason you can't assign the value to a variable and use that variable multiple times on the command line?

mike-dubman · 2015-03-30T20:23:12Z

looking for user-friendly, elegant and visually simple solution which will not lead to re-write many existing scripts and mtt.ini files to adopt. (select device and mapper are unrelated flows, where some of them are loaded from outside like modules or functions.sh files or mtt.ini)

Also, --map-by dist:nic,modifier is too overloaded way to express mapping.

Do you think it can have simple form, like;

--map-by-dev $hca --map-by node

anyway, Ralph is right in his argument, will close it.

rhc54 · 2015-03-30T20:33:08Z

What if you split the device out into a separate MCA param for that component? In other words, something like this:

--map-by dist:modifier -mca rmaps_dist_device $hca

mike-dubman · 2015-03-31T05:37:53Z

this could work too! thanks! we will go for it.

mike-dubman · 2015-03-31T05:39:05Z

@elenash - could you please rework this PR to support:
--map-by dist:modifier -mca rmaps_dist_device $hca

elenash · 2015-03-31T08:52:01Z

Done. But this change should be documented somehow, I guess. Users used to set device after the colon, now modifiers are being set there, it can lead to misunderstanding what is going wrong. However, there would be an error message if user tries to set device name into modifier field as previously like:
--map-by dist:mlx4_0

The mapping request contains an unrecognized modifier:

Request: dist:mlx4_0

Please check your request and try again.

elenash · 2015-03-31T08:55:56Z

PR for jenkins to cope with this change
mellanox-hpc/jenkins_scripts#4

mike-dubman · 2015-03-31T09:10:07Z

can we support both? detect if well-known modifier goes after dist or treat it as HCA if unknown?

mellanox-github · 2015-03-31T09:16:27Z

Coverity found 917 errors for all: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_402/output/errors/index.html
Coverity found 5 errors for oshmem: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_402/output/errors/index.html
Coverity found 2 errors for fca: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_402/output/errors/index.html

Sent from GH.

mellanox-github · 2015-03-31T09:18:48Z

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/402/
Test PASSed.

elenash · 2015-03-31T09:27:14Z

I don't think it's a good idea because when a user just put a wrong specifier name, it will be treated as hca device and user will never know what was the reason of the error.

mike-dubman · 2015-03-31T09:30:17Z

yep, sounds right

mike-dubman · 2015-03-31T16:51:25Z

updated jenkins with test for mindist new flags
bot:retest

elenash · 2015-04-01T08:38:38Z

bot:retest

mellanox-github · 2015-04-01T09:02:28Z

Coverity found 923 errors for all: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_403/output/errors/index.html
Coverity found 5 errors for oshmem: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_403/output/errors/index.html
Coverity found 2 errors for fca: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_403/output/errors/index.html

Sent from GH.

mellanox-github · 2015-04-01T09:04:55Z

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/403/
Test PASSed.

…fiers to --map-by dist:modifiers -mca rmaps_dist_device device

mellanox-github · 2015-04-01T12:36:57Z

Coverity found 923 errors for all: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_405/output/errors/index.html
Coverity found 5 errors for oshmem: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_405/output/errors/index.html
Coverity found 2 errors for fca: http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_405/output/errors/index.html

Sent from GH.

mellanox-github · 2015-04-01T12:39:24Z

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/405/
Test PASSed.

mike-dubman · 2015-04-01T13:31:14Z

👍

changed mindist mapping policy specifier

elenash · 2015-04-02T09:10:59Z

@rhc54 @jsquyres How do you think this change should be documented? There is not enough information about how to use --map-by dist in man pages. As I remember, there was just a post about it in Cisco blog.

…59b9 Silence a warning by converting the bitmap to a string prior to print…

elenash force-pushed the modifiers branch from cdb5e55 to 13df959 Compare March 31, 2015 08:48

elenash changed the title ~~added rmaps_base_map_modifiers mca parameter to allow setting mapping mo...~~ changed mindist mapping policy specifier Mar 31, 2015

changed mindist mapping policy specifier from map-bt dist:device,modi…

1e913c7

…fiers to --map-by dist:modifiers -mca rmaps_dist_device device

elenash force-pushed the modifiers branch from 13df959 to 1e913c7 Compare April 1, 2015 12:08

mike-dubman added a commit that referenced this pull request Apr 1, 2015

Merge pull request #494 from elenash/modifiers

8914a9c

changed mindist mapping policy specifier

mike-dubman merged commit 8914a9c into open-mpi:master Apr 1, 2015

jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015

Merge pull request open-mpi#494 from hppritcha/topic/cherry_pick_db3c…

37fc38f

…59b9 Silence a warning by converting the bitmap to a string prior to print…

changed mindist mapping policy specifier #494

changed mindist mapping policy specifier #494

Uh oh!

Conversation

elenash commented Mar 25, 2015

Uh oh!

mellanox-github commented Mar 25, 2015

Uh oh!

mellanox-github commented Mar 25, 2015

Uh oh!

rhc54 commented Mar 25, 2015

Uh oh!

mike-dubman commented Mar 25, 2015

Uh oh!

mike-dubman commented Mar 25, 2015

Uh oh!

rhc54 commented Mar 25, 2015

Uh oh!

mike-dubman commented Mar 25, 2015

Uh oh!

rhc54 commented Mar 25, 2015

Uh oh!

mike-dubman commented Mar 25, 2015

Uh oh!

rhc54 commented Mar 25, 2015

Uh oh!

jsquyres commented Mar 26, 2015

Uh oh!

mike-dubman commented Mar 26, 2015

Uh oh!

rhc54 commented Mar 26, 2015

Uh oh!

hppritcha commented Mar 30, 2015

Uh oh!

rhc54 commented Mar 30, 2015

Uh oh!

mike-dubman commented Mar 30, 2015

Uh oh!

rhc54 commented Mar 30, 2015

Uh oh!

mike-dubman commented Mar 30, 2015

Uh oh!

rhc54 commented Mar 30, 2015

Uh oh!

rhc54 commented Mar 30, 2015

Uh oh!

hppritcha commented Mar 30, 2015

Uh oh!

jsquyres commented Mar 30, 2015

Uh oh!

mike-dubman commented Mar 30, 2015

Uh oh!

jsquyres commented Mar 30, 2015

Uh oh!

mike-dubman commented Mar 30, 2015

Uh oh!

rhc54 commented Mar 30, 2015

Uh oh!

mike-dubman commented Mar 31, 2015

Uh oh!

mike-dubman commented Mar 31, 2015

Uh oh!

elenash commented Mar 31, 2015

Uh oh!

elenash commented Mar 31, 2015

Uh oh!

mike-dubman commented Mar 31, 2015

Uh oh!

mellanox-github commented Mar 31, 2015

Uh oh!

mellanox-github commented Mar 31, 2015

Uh oh!

elenash commented Mar 31, 2015

Uh oh!

mike-dubman commented Mar 31, 2015

Uh oh!

mike-dubman commented Mar 31, 2015

Uh oh!

elenash commented Apr 1, 2015

Uh oh!