This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@edgargabriel
Member

@hppritcha would you mind giving this PR a quick test? I tested it on the Stuttgart Cray, where it did what it was supposed to do (namely, OMPIO reduced its priority), and on my machine, where OMPIO kept its priority.
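For readers unfamiliar with priority-based component selection, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Open MPI implementation: the priority values, the function names, and the use of a `/proc/mounts`-style listing to find the file system type are all assumptions made for the example.

```python
import posixpath

# Hypothetical priorities, chosen only for illustration.
# They are NOT the values Open MPI actually uses.
OMPIO_DEFAULT_PRIORITY = 30
OMPIO_LUSTRE_PRIORITY = 1
ROMIO_PRIORITY = 10


def fs_type_for_path(path, mounts_text):
    """Return the file system type of the longest mount point containing path.

    mounts_text follows the /proc/mounts layout: device, mount point, type, ...
    Returns None when no mount point matches.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount, fstype = fields[1], fields[2]
        prefix = mount.rstrip("/") + "/"
        if path == mount or path.startswith(prefix):
            if len(mount) > len(best_mount):
                best_mount, best_type = mount, fstype
    return best_type


def ompio_priority(fstype):
    """Lower the (hypothetical) OMPIO priority when the file system is Lustre."""
    if fstype == "lustre":
        return OMPIO_LUSTRE_PRIORITY
    return OMPIO_DEFAULT_PRIORITY


def select_io_component(fstype):
    """Pick the component with the highest priority for this file system."""
    candidates = {
        "ompio": ompio_priority(fstype),
        "romio314": ROMIO_PRIORITY,
    }
    return max(candidates, key=candidates.get)
```

With these assumed values, a path on a Lustre mount selects `romio314`, while a path on any other file system keeps `ompio`, which matches the two outcomes reported in the comment above.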

@mellanox-github

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1225/ for details.

@hppritcha hppritcha self-assigned this Jan 17, 2016
@hppritcha
Member

hmm... what's going on with mlnx jenkins?
bot:retest

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1231/ for details.

@hppritcha hppritcha added this to the v2.0.0 milestone Jan 19, 2016
@hppritcha
Member

@edgargabriel unfortunately with mpi2basic_tests/file/filetest I'm getting a segfault using romio314:

Checking for MPI_File_set_view_:
    setting view using indexed type........working
    setting view using vector type.........working
    using file view w/ displacement........working
    using the file view repetitivly........working
    write less than the file view..........working
    writing a file view with gaps..........working
    reading a file view with gaps..........working
[nid00011:33335] *** Process received signal ***
[nid00011:33336] *** Process received signal ***
[nid00011:33336] Signal: Segmentation fault (11)
[nid00011:33336] Signal code: Address not mapped (1)
[nid00011:33336] Failing at address: (nil)
[nid00011:33337] *** Process received signal ***
[nid00011:33337] Signal: Segmentation fault (11)
[nid00011:33337] Signal code: Address not mapped (1)
[nid00011:33337] Failing at address: (nil)
[nid00011:33338] *** Process received signal ***
[nid00011:33338] Signal: Segmentation fault (11)
[nid00011:33338] Signal code: Address not mapped (1)
[nid00011:33338] Failing at address: (nil)
[nid00011:33339] *** Process received signal ***
[nid00011:33339] Signal: Segmentation fault (11)
[nid00011:33339] Signal code: Address not mapped (1)
[nid00011:33339] Failing at address: (nil)
[nid00011:33335] Signal: Segmentation fault (11)
[nid00011:33335] Signal code: Address not mapped (1)
[nid00011:33335] Failing at address: (nil)
[nid00011:33335] [ 0] /lib64/libpthread.so.0(+0xf810)[0x2aaaab044810]
[nid00011:33335] [ 1] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten+0x159e)[0x2aaab963347e]
[nid00011:33335] [ 2] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten_datatype+0xe3)[0x2aaab9634153]
[nid00011:33335] [ 3] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIO_Set_view+0x1fd)[0x2aaab9629b9d]
[nid00011:33335] [ 4] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(mca_io_romio314_dist_MPI_File_set_view+0x2f6)[0x2aaab960a0d6]
[nid00011:33336] [ 0] /lib64/libpthread.so.0(+0xf810)[0x2aaaab044810]
[nid00011:33336] [ 1] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten+0x159e)[0x2aaab9a1a47e]
[nid00011:33336] [ 2] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten_datatype+0xe3)[0x2aaab9a1b153]
[nid00011:33336] [ 3] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(ADIO_Set_view+0x1fd)[0x2aaab9a10b9d]
[nid00011:33336] [ 4] /global/homes/h/hpp/ompi_release_install/lib/openmpi/mca_io_romio314.so(mca_io_romio314_dist_MPI_File_set_view+0x2f6)[0x2aaab99f10d6]
[nid00011:33336] [ 5] [nid00011:33337] [ 0] /lib64/libpthread.so.0(+0xf810)[0x2aaaab044810]

Should we expect this failure when using romio314? The test works with ompio.

@hppritcha
Member

I should add that this failure is specific to the Lustre file system.

@edgargabriel
Member Author

romio314 does not pass a number of tests from filetest, but on UFS it does not segfault as far as I remember. I have not run filetest with romio314 on Lustre, however.

@hppritcha
Member

IMB-IO seems to run fine with this commit on NERSC Lustre, so I'm giving it a thumbs up. 👍

@hppritcha
Member

@jsquyres this is ready to go

jsquyres added a commit that referenced this pull request Jan 20, 2016
reduce the priority of ompio if lustre file system detected
@jsquyres jsquyres merged commit e905ea1 into open-mpi:v2.x Jan 20, 2016
hppritcha added a commit to hppritcha/ompi-release that referenced this pull request Jan 21, 2016
PR open-mpi#896 broke os-x build. This commit fixes the problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha
Member

@jsquyres
@edgargabriel

Should this be applied to master but without the reduction in lustre priority?

@edgargabriel
Member Author

I would have hoped that it is not necessary. Ideally, we should not need to check the file system type when initializing the IO component; that is really just a workaround for the Lustre problems (which I hope are on the way to being resolved).

@edgargabriel edgargabriel deleted the pr/set-lustre-default-to-romio branch January 21, 2016 16:39
@hppritcha
Member

Okay. Then I think we'll add a label to this PR to revert it when we branch 2.0.X from 2.X.
