docs: fix outdated descriptions of -output-filename
#11032
base: main
Can one of the admins verify this patch?
The behaviour of Open MPI 4.1.4 is the same as reported in #7095.
$ mpirun --version
mpirun (Open MPI) 4.1.4
Report bugs to http://www.open-mpi.org/community/help/
$ mpiexec -n 1 --output-filename out.txt echo "Hi"
Hi
$ find out.txt
out.txt
out.txt/1
out.txt/1/rank.0
out.txt/1/rank.0/stdout
out.txt/1/rank.0/stderr
I haven't tried the latest Open MPI; feel free to close the PR if this is already fixed (in any case, the man page is wrong in 4.1.4).
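For reference, the layout observed above can be sketched as a small helper. This is only an illustration of the `{filename}/{job}/rank.{rank}/std[out,err]` pattern seen in the 4.1.4 transcript; the function name `v4_output_paths` and the `width` padding parameter are hypothetical, and the actual zero-padding width used by mpirun is not confirmed here.

```python
import posixpath


def v4_output_paths(filename, job, rank, width=1):
    """Build the per-rank stdout/stderr paths seen with Open MPI 4.1.x.

    `width` is an assumed zero-padding width for the rank; the transcript
    above only shows a single unpadded rank ("rank.0").
    """
    rank_dir = posixpath.join(filename, str(job), f"rank.{rank:0{width}d}")
    return {stream: posixpath.join(rank_dir, stream) for stream in ("stdout", "stderr")}


paths = v4_output_paths("out.txt", job=1, rank=0)
print(paths["stdout"])
print(paths["stderr"])
```

The point is that, in 4.1.x, the argument to --output-filename ends up as a directory rather than a file.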
ok to test
docs/man-openmpi/man1/mpirun.1.rst
Outdated
into `{filename}/{job}/rank.{rank}/std[out,err,diag]`, where `{rank}` is the
processes' rank in MPI_COMM_WORLD, left-filled with zero's for correct
ordering in listings. Any directories in the filename will automatically be
created. A relative path value will be converted to an absolute path based on
the cwd where mpirun is executed. Note that this will not work on
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
into ``{filename}/{job}/rank.{rank}/std[out,err,diag]``, where ``{rank}`` is the
processes' rank in ``MPI_COMM_WORLD``, left-filled with zero's for correct
ordering in file listings. Any intermediate directories in the resulting output files will automatically be
created. If ``filename`` is a relative path, it will be converted to an absolute path based on
the directory where :ref:`mpirun(1) <man1-mpirun>` is executed. Note that this will not work in
environments where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
Is the comment "will not work" accurate? In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?
In an environment where <mpirun cwd> does not exist on the remote nodes, doesn't the filename/directory hierarchy get created relative to $HOME on the remote nodes?
I tried on a cluster, and the hierarchy is created relative to the current working directory:
$ ompi_info --version
Open MPI v4.1.1
http://www.open-mpi.org/community/help/
$ mpirun -map-by ppr:1:node rm -rf /tmp/foo # make sure the directory does not exist
$ mkdir /tmp/foo # /tmp is not shared among the nodes
$ cd /tmp/foo
$ mpirun -map-by ppr:1:node hostname
node1
node2
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/home/me
At this point, the directory is not created; it is created when mpirun with -output-filename is finished
$ mpirun -map-by ppr:1:node -output-filename bar pwd
/tmp/foo
/home/me
$ mpirun -map-by ppr:1:node pwd
/tmp/foo
/tmp/foo
$ mpirun -map-by ppr:1:node find $PWD
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.0
/tmp/foo/bar/1/rank.0/stdout
/tmp/foo/bar/1/rank.0/stderr
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/1
/tmp/foo/bar/1/rank.1
/tmp/foo/bar/1/rank.1/stdout
/tmp/foo/bar/1/rank.1/stderr
$ mpirun -map-by ppr:1:node find $PWD -type f -name stdout -exec cat {} +
/tmp/foo
/home/me
I’m not sure if mpirun always behaves like this, though.
it is created when mpirun with -output-filename is finished
Wrong: the directory is created when mpirun is invoked.
@rhc54 Gotcha.
@e-kwsm Keep in mind that the docs on main are effectively the docs for v5.0 -- these are not the docs for v4.1.x (the ReadTheDocs / Sphinx docs are new for main / v5.0.x and were not back-ported to v4.1.x or earlier). Hence, for main:docs/, we want to document what is happening in the main / v5.0.x mpirun.
My question about the <mpirun cwd> comment was specifically asking about the case where the CWD of mpirun does not exist on a node. In that case, I have a dim recollection that the output tree for that node will be created in $HOME (since the CWD of mpirun does not exist on that node). Is that no longer the case?
That remains the case. Assuming you don't give us an absolute path (which you can do - the directory then must exist everywhere), then the path is relative to the local PRRTE daemon. mpirun will use its CWD, and the default CWD of a remote daemon (if the CWD of mpirun doesn't exist there) will be $HOME of the user.
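The rule described in this comment can be sketched roughly as follows. This is a minimal model of the description above, not the PRRTE implementation; the function names `daemon_cwd` and `output_base` and their parameters are hypothetical, and whether a directory exists on a node is passed in as a plain flag rather than checked.

```python
import posixpath


def daemon_cwd(mpirun_cwd, exists_on_node, home):
    """Working directory of the daemon on one node, as described above:
    the CWD of mpirun if it exists on that node, otherwise the user's $HOME."""
    return mpirun_cwd if exists_on_node else home


def output_base(filename, mpirun_cwd, exists_on_node, home):
    """Where the output for one node would be rooted under that rule."""
    if posixpath.isabs(filename):
        # An absolute path is used as given (it must exist everywhere).
        return filename
    # A relative path is resolved against the local daemon's CWD.
    return posixpath.join(daemon_cwd(mpirun_cwd, exists_on_node, home), filename)


# Node where /tmp/foo exists, and node where it does not:
print(output_base("bar", "/tmp/foo", True, "/home/me"))
print(output_base("bar", "/tmp/foo", False, "/home/me"))
```

Under this model the output is still written on a node that lacks the CWD of mpirun; it just lands under $HOME there, which is the point the documentation wording needs to capture.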
I built 8113e7c myself and found that the pattern has changed:
$ ompi_info --version
Open MPI v5.1.0a1
https://www.open-mpi.org/community/help/
$ mpirun --output-filename foo -n 2 sh -c 'echo $$' : -n 2 sh -c 'hostname >&2'
334108
334109
localhost
localhost
$ ls foo.*
foo.prterun-localhost-334102@1.0.out
foo.prterun-localhost-334102@1.1.out
foo.prterun-localhost-334102@1.2.err
foo.prterun-localhost-334102@1.3.err
A filename seems to be {arg}.prterun-{hostname}-{PID of mpirun}@1.{rank}.(out|err).
Yes, that is correct - we tag it to avoid collisions with other mpirun instances that were given the same output-filename option and for shared filesystems. Just one clarification for cases where the application calls MPI_Comm_spawn - the "@1" represents the local jobid of the application. So if the app called spawn, there would be another set of files with an "@2" for the new spawned job. Continued for every spawn.
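To make the naming scheme concrete, here is a small sketch that splits such a file name into its parts. It is based only on the pattern inferred above and the clarification about the local jobid; the regular expression, the field names, and the helper `parse_output_name` are assumptions for illustration, not something taken from the PRRTE sources.

```python
import re

# Assumed shape: {arg}.prterun-{hostname}-{PID of mpirun}@{jobid}.{rank}.(out|err)
OUTPUT_NAME = re.compile(
    r"^(?P<arg>.+)\.prterun-(?P<host>.+)-(?P<pid>\d+)"
    r"@(?P<jobid>\d+)\.(?P<rank>\d+)\.(?P<stream>out|err)$"
)


def parse_output_name(name):
    """Return the fields of an output file name, or None if it does not match."""
    match = OUTPUT_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "jobid", "rank"):
        fields[key] = int(fields[key])
    return fields


print(parse_output_name("foo.prterun-localhost-334102@1.2.err"))
```

With this reading, files from a job started via MPI_Comm_spawn would differ only in the `jobid` field (2, 3, and so on), while `arg`, `host` and `pid` stay the same for one mpirun invocation.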
Hello! The Git Commit Checker CI bot found a few problems with this PR: acf0ab4: Update docs/man-openmpi/man1/mpirun.1.rst
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
acf0ab4 to efd5fba
see open-mpi#7095 Signed-off-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
efd5fba to 262830c
<man1-mpirun>` is executed. Note that this will not work in environments
where the file system on compute nodes differs from that where
:ref:`mpirun(1) <man1-mpirun>` is executed.
I thought, per the discussion on the PR, that the resulting files will be created, but they may end up elsewhere...?
see #7095
Signed-off-by: Eisuke Kawashima e-kwsm@users.noreply.github.com