Orted prolog and epilog hooks #35

ompiteam · 2014-10-01T15:59:19Z

Terry and I were talking about the possibility of having per-job prolog and epilog steps in the orted. That is, an MCA parameter that identifies an argv to run before the first local proc of a job is launched on the node, and another to run after the last local proc of that job has completed. A typical argv would be a local script (perhaps to perform site-specific administrative tasks). If the argv for the prolog/epilog is blank (which would be the default), then nothing would be launched for that step. Hence, these would be hooks available to sysadmins if they want to use them.

I'm guessing/assuming that this would not be difficult to do -- it's mainly a matter of:

Finding the right place in the orted to run the prolog and epilog
Deciding what information to give to the prolog and epilog (e.g., passing a pile of relevant info in environment variables, such as the job ID, the session directory, the argv of the job, the exit conditions of the job, etc. -- anything that the prolog and epilog might want to know. Just about every resource manager has prolog/epilog functionality -- we might look to them for inspiration on what kind of information could be useful).
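As a rough illustration of the two points above, here is a minimal Python sketch (not orted code) of how per-job hooks could behave: a blank hook is skipped, the prolog runs before the first local proc is launched, and the epilog runs after the last local proc has exited. The names `run_hook`, `LocalJobHooks`, and the `HOOK_*` environment variables are all hypothetical and only stand in for whatever the orted would actually use.

```python
import os
import shlex
import subprocess


def run_hook(hook_cmd, info):
    """Run a prolog/epilog command line, or do nothing if it is blank.

    Returns the hook's exit status, or None when no hook is configured.
    """
    argv = shlex.split(hook_cmd or "")
    if not argv:
        return None  # default case: blank argv, nothing is launched
    env = dict(os.environ)
    for key, value in info.items():
        # HOOK_* names are invented for this sketch, not Open MPI names.
        env["HOOK_" + key.upper()] = str(value)
    return subprocess.run(argv, env=env, check=False).returncode


class LocalJobHooks:
    """Tracks the local procs of one job on one node (hypothetical helper)."""

    def __init__(self, jobid, session_dir, num_local_procs, prolog="", epilog=""):
        self.info = {"jobid": jobid, "session_dir": session_dir}
        self.num_local_procs = num_local_procs
        self.prolog = prolog
        self.epilog = epilog
        self.launched = 0
        self.exit_codes = []

    def before_launch(self):
        # Call just before fork/exec of each local proc.
        if self.launched == 0:
            run_hook(self.prolog, self.info)
        self.launched += 1

    def after_exit(self, exit_code):
        # Call when a local proc has terminated.
        self.exit_codes.append(exit_code)
        if len(self.exit_codes) == self.num_local_procs:
            info = dict(self.info)
            info["exit_codes"] = ",".join(str(c) for c in self.exit_codes)
            run_hook(self.epilog, info)
```

Counting exits against the expected number of local procs (rather than a "live procs" counter) avoids firing the epilog early when one proc exits before the next one has been launched.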

It ''might'' be useful to also have the same prolog/epilog hooks for each process in a job on the host as well. [shrug]

I'm initially marking this as a 1.3 milestone, but have no real requirement for it in v1.3 -- it seems like an easy / neat / useful idea, but there is no ''need'' to have it in v1.3. It could be pushed forward.

ompiteam · 2014-10-01T15:59:20Z

Imported from trac issue 1269. Created by jsquyres on 2008-04-11T11:23:25, last modified: 2011-01-11T07:45:51

ompiteam · 2014-10-01T15:59:20Z

Trac comment by jsquyres on 2008-06-23 13:32:57:

Yo Ralph -- I'm assuming there are no plans for this kind of feature in v1.3 (Terry and I were talking "pie in the sky" kinds of ideas when we came up with this one). Should we shift it to "Future"?

ompiteam · 2014-10-01T15:59:20Z

Trac comment by rhc on 2008-06-23 13:58:28:

As noted, it would be easy to implement, so I guess I don't care - could throw it into 1.3 or not.

Kinda up to you guys as to how badly you want it.

Ralph

ompiteam · 2014-10-01T15:59:21Z

Trac comment by tdd on 2008-06-24 10:25:59:

This feature is not absolutely necessary for 1.3 but I would like it in 1.3.1. I've discussed this with the RMs (Brad and George) and they are fine with this feature being added to 1.3.1.

ompiteam · 2014-10-01T15:59:21Z

Trac comment by rhc on 2010-01-27 22:29:02:

Damien has a somewhat related issue - what he needs is basically a "spawn agent" similar to our "launch agent". If provided, this would be a cmd that executes each app when spawned.

In other words, you take the argv that is going to be fork/exec'd and prepend the spawn agent to it. Thus, the spawn agent is what actually executes the app.

I'm not sure if Damien is doing this work or not - perhaps he could confirm? Otherwise, I'll implement it over the next week or two (actually rather trivial to do).
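The argv manipulation described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not the eventual implementation; `apply_spawn_agent` is a hypothetical name, and `valgrind -q` is used purely as an example agent.

```python
import shlex


def apply_spawn_agent(app_argv, spawn_agent=""):
    """Return the argv to exec: the spawn agent's words, then the app's argv.

    With a blank spawn agent the app's argv is returned unchanged.
    """
    agent_argv = shlex.split(spawn_agent or "")
    return agent_argv + list(app_argv)


# Example: the agent ends up as argv[0], so it is what actually gets exec'd.
wrapped = apply_spawn_agent(["./a.out", "-n", "4"], "valgrind -q")
```

Because the agent is prepended rather than run separately, it becomes the process that is fork/exec'd and is responsible for starting the application itself.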

ompiteam · 2014-10-01T15:59:21Z

Trac comment by jsquyres on 2010-02-02 18:27:17:

Damien was having problems posting; he mailed his reply to me directly (see below). Damien: note that you can sign up for an account on our Trac and therefore be able to comment on tickets directly (our Trac does not currently accept emails as input).

I have received a query from the Openmpi tracker. The ticket https://svn.open-mpi.org/trac/ompi/ticket/1269 is an
enhancement to a "spawn agent".

This feature is not an issue for me.

My problem is to have an "orted local to mpirun"; this is the only
solution to fix these problems:

have no launch difference between mpirun node and other node
have no ulimit difference between mpirun node and other node
have no cpuset difference between mpirun node and other node
have no bash difference between mpirun node and other node
have no environment difference between mpirun node and other node

Please refer to Ralph for history and sorry for confusion.

ompiteam · 2014-10-01T15:59:22Z

Trac comment by jsquyres on 2011-01-11 07:45:51:

I think it's pretty safe to say that this won't happen any time soon unless someone can free up some cycles to implement it.

The IPMI plugin tries to read the bmc credentials in an endless loop if the read fails for some reason and no other compute node sends the bmc credential data. Fixes open-mpi#35

The nodepower plugin uses the ipmi_cmdraw command to retrieve data from the bmc. It needs to pass the length of the response buffer to the API so that the library knows how much memory is allocated to it and can pack the data accordingly. The current implementation does pass the response length, but it doesn't initialize the length, which could lead to nondeterministic results. Initialized the length to 1024 to reflect the size of the data buffer. Fixes open-mpi#33

If the ipmi_cmdraw call in the nodepower plugin fails for some reason, the failure is currently ignored and the responseData is copied out and passed to the application. Added code to handle a failed return from the ipmi_cmdraw call.

When get_bmc_cred fails in an aggregator, and no other compute node sends the ipmi credential data, the aggregator tries to read its bmc credential in a loop. Implemented a timeout as well as printing out a debug message indicating the issue. Refs open-mpi#35
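The fix described here (bounding the retry loop with a timeout and logging when it gives up) follows a common pattern, sketched below in Python. This is not the plugin's code: `read_with_timeout` and the `read_credentials` callable are hypothetical stand-ins, with `read_credentials` assumed to return None on failure.

```python
import time


def read_with_timeout(read_credentials, timeout_s=5.0, interval_s=0.1, log=print):
    """Retry read_credentials() until it succeeds or timeout_s has elapsed.

    Returns the credentials, or None after logging a debug message on timeout.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        creds = read_credentials()
        if creds is not None:
            return creds
        if time.monotonic() >= deadline:
            log("giving up on bmc credentials after %d attempts" % attempts)
            return None
        time.sleep(interval_s)  # avoid spinning between attempts
```

Using a monotonic clock for the deadline keeps the timeout unaffected by wall-clock adjustments on the node.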

ompiteam assigned rhc54 Oct 1, 2014

ompiteam added this to the Future milestone Oct 1, 2014

ompiteam added the enhancement label Oct 1, 2014

rhc54 pushed a commit that referenced this issue Oct 31, 2014

Merge pull request #35 from alex-mikheev/topic/oshmem_spml_ikrit_race…

5226ee0

…-v1.8 OSHMEM: spml ikrit: complete puts b4 memheap destruction

yosefe pushed a commit to yosefe/ompi that referenced this issue Mar 5, 2015

Merge pull request open-mpi#35 from miked-mellanox/topic/sync_with_re…

9f9d757

…lease sync with ompi-release/v1.8

rhc54 closed this as completed Jan 27, 2017

devreal added a commit to devreal/ompi that referenced this issue Sep 15, 2020

ADAPT: allocate nbc request type in ibcast (open-mpi#35)

fb0568d

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
