Skip to content

Conversation

@elenash
Copy link
Contributor

@elenash elenash commented Mar 25, 2015

Сhanged mindist mapping policy specifier from map-bt dist:device,modifiers to --map-by dist:modifiers -mca rmaps_dist_device device

@mellanox-github
Copy link

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/376/
Test PASSed.

@rhc54
Copy link
Contributor

rhc54 commented Mar 25, 2015

May one ask why? In general, I'd prefer not to expand the command line yet again

@mike-dubman
Copy link
Member

currently, the same command line specifies device and map in one sentence
it is better two have two separate knobs to select device and mapping policy

in scripts, usually one have device selector variable and mapper selector (with mca params)
In the current form it is not friendly

@mike-dubman
Copy link
Member

+1

@rhc54
Copy link
Contributor

rhc54 commented Mar 25, 2015

Personally, I'd rather this not come into the code base. We continually here complaints about the large and constantly growing number of MCA params in this system. This new one doesn't do anything new at all - it just duplicates existing syntax. The argument that it is easier in a script doesn't make sense to me, as scripts have no issue with putting $vars in lists.

So a 👎 from me barring some more convincing argument.

@mike-dubman
Copy link
Member

from usability perspective - having one parameter to specify many different things is too complicated.

-mca btl_tcp_if_include eth0,span

is not a user-friendly. Split into two (and keep the current syntax as well) is more user-friendly
The fix is not removing old behave, it just adds additional way to specify every component separately. can we agree on that?

@rhc54
Copy link
Contributor

rhc54 commented Mar 25, 2015

I'm confused - why would you put span on a BTL include param??? It's a modifier to the mapping directive.

@mike-dubman
Copy link
Member

it is just an example of mixing unrelated configurables in one parameter.
bad copy paste , but the point still stands.

@rhc54
Copy link
Contributor

rhc54 commented Mar 25, 2015

I'm still not fully understanding your point. The current parameter specifies all mapping-related directives in one place. We went to that because people were unhappy with having to specify so many separate things. How is this not a step backward?

I'd really prefer to have only one way to specify things, not several ways of specifying the same thing.

@jsquyres
Copy link
Member

I've been following this, and I confess that I'm a bit confused as well.

Here's my questions:

  1. We likely already have too many MCA params. I am usually not hesitant to add more MCA params (especially when a user asks for a specific tunable knob to turn), but I don't see the new functionality that this param adds. What is new here?
  2. I'm also usually not adverse to long names :-), but it looks like --map-by dist:mlx4_0,span is a whole lot fewer characters to type than -map-by dist:mlx4_0 -mca rmaps_base_map_modifiers span. I'm therefore not understanding the "user-friendly" argument for this change. Can you explain further?
  3. Even though we do have (too) many MCA params, I think few (if any?) offer multiple different ways to get to the same functionality (aliases are the obvious exception -- but that's the same functionality available via multiple names). If there are different ways to get to the same functionality, then you have precedence issues to consider: if multiple ways are specified, which one wins? NOTE: This confusion is somewhat strengthened by the problem we're having with MCA param aliases right now -- the (slow) race condition that makes even multiple paths to the same functionality problematic.

@mike-dubman
Copy link
Member

the only reason for this PR is to simplify current form. At many customers and our internal scripts we see common usage pattern for running MPI:

  • function/logic to select communication device
  • function/logic to select mapper

These are two different unrelated flows, using mix of MCA or SHELL params

-mca btl_openib_if_include $hca -x MXM_IB_DEVICE $hca ....

and different flow for selecting mapping method:

--map-by $map

There is not correlation between two flows.
The existing form adds this correlation and makes scripting and visual readability complicated.

Also, in mtt, we have a well-known param "alloc=..." which is device independent and should stay this way.

Mixing device and mapping selection in one sentence is not user friendly.

Now that we have 10k MCA params, we can add another two w/o hitting the overflow.
EndUser uses 5-10 params at max, all others are internal for special tuning and it is always good to have many knobs for rainy day :)

@rhc54
Copy link
Contributor

rhc54 commented Mar 26, 2015

But there never will be a correlation between the two flows. Even in your proposed change, there is no correlation. You still specify the mapping policy separate from the selection of a particular interface.

The current system doesn't force any correlation. You specify your interfaces using the appropriate param. You then specify a mapping policy. If you want it to be "dist", then you have to tell us "distance from what?". If you then modify the policy, well you include that modification in the param.

This isn't a case of adding just one or two params. We'd have to add a bunch: n_per_node, n_per_socket, n_per_numa, etc. Like I said, I refuse to have multiple ways to do the same thing. So if you are going to move one modifier to a separate param, then you have to move all of them.

I just don't think that makes any sense.

@hppritcha
Copy link
Member

Mike have there been specific customer requests for this change to the way procs are mapped to the resource being specified?

I understand your reasoning for suggesting a separation in the way the mapping resource is specified, from the way the procs are mapped to them, but given that currently there are 93 (with one deprecated) mca parameters for the rmaps framework, it might be better to consider
something more comprehensive than just adding to the list. Also, as Jeff comments, the proposed new mca parameter has quite a long name.

@rhc54
Copy link
Contributor

rhc54 commented Mar 30, 2015

Where did you get this 93 number??? Running ompi_info --all thru "grep maps | wc -l" yields 51, which includes a bunch of priority, verbosity, and other non-germane things that are specific to seldom-used components..

There are exactly two MCA params today that are used by the RMAPS framework: map-by and rank-by

@mike-dubman
Copy link
Member

look at the mtt configuration example below. see alloc= part and suggest a nice way to exercise mindist alg w/o entering ugly zone of carrying device name (selected in different part of mtt) as part of specifying allocation policy?

Same for customer scripts.

We keep the old format as well, having two more mca params will not break any flow. Having two new params will let mtt to exercise mindist gracefully and will make many customer scripts happy and logical.

The param name can be shorter.

[MPI Details: OMPI]
# Check &test_alloc() for byslot or bynode
exec = mpirun @alloc@ -np &test_np() @mca@ &test_executable() &test_argv()
parameters = &MPI::OMPI::find_mpirun_params(&test_command_line(), \
                                            &test_executable())
network = &MPI::OMPI::find_network(&test_command_line(), &test_executable())

alloc = &if(&eq(&test_alloc(), "node"), "--bynode", "--byslot")

@rhc54
Copy link
Contributor

rhc54 commented Mar 30, 2015

Sounds to me like this is an MTT issue you are having, not an OMPI issue. Perhaps we should get together with Jeff and discuss the MTT issue separately?

@mike-dubman
Copy link
Member

mtt suffers from the same problem. it uses same rationale as a customer scripts:

select communication device flow is unrelated to the process mapping and happening in different parts of ini/scripts.

@rhc54
Copy link
Contributor

rhc54 commented Mar 30, 2015

Well, I'll reiterate - let's talk to Jeff about the MTT issue and see where that leads us. I still don't understand what is so hard about substituting a script variable into the MCA param (on the cmd line or in a setenv script cmd). I do it all the time.

@rhc54
Copy link
Contributor

rhc54 commented Mar 30, 2015

I've got a few minutes between meetings, so let me try again to explain the problem I see with this proposal. If we set this up as a separate MCA param, and leave the existing params alone, then we have two ways of specifying mapping modifiers: in the map-by param, and in the "modifier" param.

The problem now is: how do I rationalize conflicts and precedence between those two params? The MCA param system cannot help me because they are two different params, so I now have to create an entirely separate logic path for dealing with it. Okay, let's try.

What do I do if the modifier param is set (e.g., to "span"), but the map-by param given in the cmd line has no modifier in it? Do I assume that the user didn't want this particular mpirun to use the modifier, and thus left it out? Or do I assume that the lack of a modifier means that the modifier param is to be used? Will the user be expecting that behavior, say if the modifier param is set out of his immediate view in the default MCA param file? How do they then indicate that they don't want the modifier for this specific run?

We would have to define every one of these conflict use-cases and write logic for dealing with it. This is a non-trivial problem, and your PR doesn't deal with it.

This is why we don't have MCA params that contain overlapping functions - resolving the precedence rules becomes really hard.

HTH
Ralph

@hppritcha
Copy link
Member

The 93 came from the output of orte_info --mca rmaps all. The point was just to show there are indeed already a lot of parameters already for that framework. So hence a reluctance to add more too arbitrarily.

@jsquyres
Copy link
Member

I guess I still don't understand the use case for this PR, either. @miked-mellanox you've cited MTT (and user scripts) -- but I think I'm failing to grok the issue: it seems like you need to specify the $device in multiple places on the command line. Is that the problem?

@mike-dubman
Copy link
Member

@rhc54 - thanks for detailed info, it seems you are right on this. missed that angle.

@jsquyres - yes, the problem is that $device needs to appear multiple times in unrelated mca params.

@jsquyres
Copy link
Member

@miked-mellanox Ah, ok. Is there a reason you can't assign the value to a variable and use that variable multiple times on the command line?

@mike-dubman
Copy link
Member

looking for user-friendly, elegant and visually simple solution which will not lead to re-write many existing scripts and mtt.ini files to adopt. (select device and mapper are unrelated flows, where some of them are loaded from outside like modules or functions.sh files or mtt.ini)

Also, --map-by dist:nic,modifier is too overloaded way to express mapping.

Do you think it can have simple form, like;

--map-by-dev $hca --map-by node

anyway, Ralph is right in his argument, will close it.

@rhc54
Copy link
Contributor

rhc54 commented Mar 30, 2015

What if you split the device out into a separate MCA param for that component? In other words, something like this:

--map-by dist:modifier -mca rmaps_dist_device $hca

@mike-dubman
Copy link
Member

this could work too! thanks! we will go for it.

@mike-dubman
Copy link
Member

@elenash - could you please rework this PR to support:
--map-by dist:modifier -mca rmaps_dist_device $hca

@elenash elenash changed the title added rmaps_base_map_modifiers mca parameter to allow setting mapping mo... changed mindist mapping policy specifier Mar 31, 2015
@elenash
Copy link
Contributor Author

elenash commented Mar 31, 2015

Done. But this change should be documented somehow, I guess. Users used to set device after the colon, now modifiers are being set there, it can lead to misunderstanding what is going wrong. However, there would be an error message if user tries to set device name into modifier field as previously like:
--map-by dist:mlx4_0

The mapping request contains an unrecognized modifier:

Request: dist:mlx4_0

Please check your request and try again.

@elenash
Copy link
Contributor Author

elenash commented Mar 31, 2015

PR for jenkins to cope with this change
mellanox-hpc/jenkins_scripts#4

@mike-dubman
Copy link
Member

can we support both? detect if well-known modifier goes after dist or treat it as HCA if unknown?

@mellanox-github
Copy link

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/402/
Test PASSed.

@elenash
Copy link
Contributor Author

elenash commented Mar 31, 2015

I don't think it's a good idea because when a user just put a wrong specifier name, it will be treated as hca device and user will never know what was the reason of the error.

@mike-dubman
Copy link
Member

yep, sounds right

@mike-dubman
Copy link
Member

updated jenkins with test for mindist new flags
bot:retest

@elenash
Copy link
Contributor Author

elenash commented Apr 1, 2015

bot:retest

@mellanox-github
Copy link

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/403/
Test PASSed.

…fiers to --map-by dist:modifiers -mca rmaps_dist_device device
@mellanox-github
Copy link

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/405/
Test PASSed.

@mike-dubman
Copy link
Member

👍

mike-dubman added a commit that referenced this pull request Apr 1, 2015
changed mindist mapping policy specifier
@mike-dubman mike-dubman merged commit 8914a9c into open-mpi:master Apr 1, 2015
@elenash
Copy link
Contributor Author

elenash commented Apr 2, 2015

@rhc54 @jsquyres How do you think this change should be documented? There is not enough information about how to use --map-by dist in man pages. As I remember, there was just a post about it in Cisco blog.

jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015
…59b9

Silence a warning by converting the bitmap to a string prior to print…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants