
ompio and MPI_MODE_SEQUENTIAL and NFS #4991

@dshrader

Description

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

2.1.2 and 3.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

release tarball

Please describe the system on which you are running

  • Operating system/version: RHEL 7
  • Computer hardware: Intel Xeon CPUs
  • Network type: Intel Omnipath

Details of the problem

When using MPI_MODE_SEQUENTIAL in MPI_FILE_OPEN with an NFS target and using ompio, we get the following error:

mca_sharedfp_lockedfile_file_open: Error during file open

Here is a reproducer written in Fortran:

PROGRAM main
    use mpi

    integer ierr, myrank, thefile

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

    call MPI_FILE_OPEN(MPI_COMM_WORLD, 'testfile', &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE + MPI_MODE_SEQUENTIAL, &
                       MPI_INFO_NULL, thefile, ierr)

    call MPI_FILE_CLOSE(thefile, ierr)
    call MPI_FINALIZE(ierr)
END PROGRAM main

If the above program is compiled as a.out, here is the output from trying to run it in an NFS directory:

shell$ mpirun --map-by ppr:1:node ./a.out
[001.localdomain:151509] mca_sharedfp_lockedfile_file_open: Error during file open
[002.localdomain:36249] mca_sharedfp_lockedfile_file_open: Error during file open
shell$

If romio is used instead, the program runs fine:

shell$ mpirun --map-by ppr:1:node -mca io romio314 ./a.out
shell$

If MPI_MODE_SEQUENTIAL is removed from the source, the program runs without error with both ompio and romio. Running out of a Lustre directory also works for both ompio and romio.
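For anyone hitting the same error, the romio workaround above can also be applied without changing the mpirun command line, via the usual MCA parameter mechanisms. A sketch (the component name romio314 matches the 2.1.x/3.0.x releases tested here; other releases may ship a differently named romio component):

```shell
# Check which I/O components this Open MPI build actually provides
ompi_info | grep " io:"

# Equivalent ways to select romio instead of ompio:
mpirun --map-by ppr:1:node -mca io romio314 ./a.out       # command-line flag
OMPI_MCA_io=romio314 mpirun --map-by ppr:1:node ./a.out   # environment variable
echo "io = romio314" >> ~/.openmpi/mca-params.conf        # per-user default
```

The environment-variable and parameter-file forms are convenient when the job launcher is wrapped by a batch script that is awkward to modify.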
