Not able to set stripe and stripe size using mpi_info_set #10132
It is possible Are striping parameters used if you
|
Hi, |
With Open MPI 4.0.3, the ROMIO component is
|
Even with mpirun -np 4 --mca io romio321 ./a.out, it still does not set the stripe size and stripe count, which should be 8. Below are the results from Intel MPI: |
Here is my test program in Fortran:
include <pnetcdf.inc> |
I briefly checked the A possible explanation is lustre support was not detected at What if you replace |
I replaced So you are saying that Open MPI is not built/configured with Lustre support? I will write to our support team. Is there any other way to check it? Thanks a lot, Alok |
Right, the message means that the ROMIO component has no lustre support. I am not sure how to verify Lustre support was built from the install tree, so better check the output of |
Just to briefly clarify one item: on Lustre file systems, ompio does recognize the info objects "stripe_size" and "stripe_width". (We might have to adjust the names of the info objects to match the MPI spec; I actually missed this aspect.) However, in the 4.0.x series romio is the default io component on Lustre file systems; that changed starting with the 4.1.x release series. Second, if I recall my tests correctly, setting stripe size and stripe width only works if the file does not exist at the time it is opened; Lustre will not allow changing the settings of an existing, non-empty file. |
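The constraint described in the comment above (striping settings only take effect when the file is created) can be guarded against with a small pre-flight check before the file is opened. The following is a minimal Python sketch, not part of Open MPI or Lustre; the helper name `striping_hints_can_apply` is hypothetical, and it conservatively treats any existing file as one whose layout can no longer be changed.

```python
import os


def striping_hints_can_apply(path):
    """Return True if striping hints could still influence the layout of `path`.

    Hypothetical helper for illustration (not an Open MPI or Lustre API).
    Per the comment above, hints are only honored when the file does not
    yet exist at open time, so any existing file is reported as False.
    """
    return not os.path.exists(path)
```

A caller could run this check on one rank before the collective open and either delete the stale file or warn that the requested striping will probably be ignored.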
Hi, In 2012 we wrote and tested our functions that use MPI I/O to get good performance while doing I/O on a Lustre filesystem. Everything worked fine with the "striping_factor" we passed at file creation. Now I am trying to investigate some performance degradation we observed, and I am surprised because it looks like I am unable to create a new file with a given "striping_factor" with any MPI flavor. I attached a simple example for file creation with hints, and tried it the following way with OpenMPI: OpenMPI-4.0.3:
Not forcing romio321:
First, as you can see, even if I ask for a striping_factor of 2, I only get one! I tried to write some data too, but it changed nothing...
What am I doing wrong? Second, I was expecting that when I re-open the file read-only, "MPI_Info" would contain some information, but it is empty... Is that normal? For example, using mpich-3.2.1 I have the following output:
but still have only a striping_factor of 1 on the created file... while Thanks, Eric |
Oops, I forgot... here is the example I used:
|
Hi, @edgargabriel @ggouaillardet I am really confused about hint names. Please help me understand. Lustre documents "stripe_count" as: "Indicates the number of OSTs that this file will be striped across." OpenMPI calls:
with So far so good. But in commit 1631f38, support for "striping_factor" and "striping_unit" from the MPI standard was introduced. The standard defines "striping_factor" (integer) [SAME]: This hint specifies the number of I/O devices that (see https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf, page 656). So I understand that "striping_factor" in MPI is the "stripe_count" in Lustre. However, the commit 1631f38 links
then Am I confused, or is this a bug? Thanks, Eric |
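The naming correspondence worked out in the comment above can be written down explicitly. The sketch below is illustrative Python, not code from Open MPI: it assumes "striping_factor" maps to the Lustre stripe count (as argued above) and "striping_unit" maps to the Lustre stripe size in bytes (per the MPI standard's definition of that hint). The helper names are hypothetical; the only external tool referenced is `lfs setstripe` with its `--stripe-count` and `--stripe-size` options, and the command is only built here, not executed.

```python
# MPI-standard hint name -> corresponding Lustre layout term.
MPI_HINT_TO_LUSTRE = {
    "striping_factor": "stripe_count",  # number of OSTs the file is striped across
    "striping_unit": "stripe_size",     # bytes placed on one OST before moving on
}


def lustre_layout_from_hints(hints):
    """Translate MPI info hints (string keys and values) into Lustre terms.

    Hypothetical helper for illustration; hints without a Lustre
    counterpart in the table above are ignored.
    """
    layout = {}
    for name, value in hints.items():
        lustre_name = MPI_HINT_TO_LUSTRE.get(name)
        if lustre_name is not None:
            layout[lustre_name] = int(value)  # MPI info values are strings
    return layout


def lfs_setstripe_command(path, layout):
    """Build the `lfs setstripe` argument list matching `layout`."""
    cmd = ["lfs", "setstripe"]
    if "stripe_count" in layout:
        cmd += ["--stripe-count", str(layout["stripe_count"])]
    if "stripe_size" in layout:
        cmd += ["--stripe-size", str(layout["stripe_size"])]
    return cmd + [path]
```

Comparing the layout produced by such a table with what `lfs getstripe` reports for a freshly created file is one way to spot a swapped pair of hints like the one suspected here.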
@ericch1 sorry for the late reply, I was out of office last week. I will have a look in a bit, there is a good chance that you are right and this is a bug. |
No problem @edgargabriel , there is no hurry for me... Thanks for looking into this! :) |
@ericch1 would you have a chance to test https://github.com/edgargabriel/ompi/tree/topic/lustre-info-swap to see whether this fixes the info object assignment on Lustre? I do not currently have access to a Lustre file system to test this out, but I think you are right that I accidentally swapped the meaning of the two MPI info objects on Lustre. |
Ok, thanks @edgargabriel , I am trying to install it right now, but I looked at the commit and I am very confident that it will work since the hints "stripe_size" and "stripe_width" were working for me with OpenMPI 4.1.1... |
I am stuck here:
Is there a quick fix? |
Hm, I didn't get this error. Can you maybe try setting --with-pmix=internal and --with-prrte=internal at configure time? I am wondering whether configure accidentally picks up an external pmix. |
I'm guessing that these are legit warnings in the PMIx build; I notice that @ericch1 has "treat warnings as errors", and the compile therefore fails. |
Still have the errors with I will try with an older compiler than gcc 10.3.0... |
Are those from OMPI main? or OMPI v5? I'm wondering if the submodules are stale as I thought we fixed that quite a while ago. |
Just took a quick peek and those are definitely fixed in HEAD of OMPI main. The "prm/tm" component does not exist in the HEAD of OMPI v5.0.x branch. |
I do not know why I am at that point, but looking at the logs, the version Edgar gave me is not so outdated:
Ok, I got it... I did a :
but forgot to do a:
and submodules are now:
But now that I did it and relaunched "./autogen" and "configure", I have these new errors at make:
It looks like I am unable to configure/compile correctly... :( What am I doing wrong? @edgargabriel did you successfully compile this branch/sha? Eric |
@ericch1 yes, it did work for me. I think this last error is because your configure picked up an 'old' libfabric installation somewhere which does not yet have FI_HMEM_ROCR declared. You can prevent this by adding '--with-ofi=no' to your configure line (assuming you are not using the ofi components), or explicitly point Open MPI to a newer libfabric installation. |
Ok, I have been able to test it! It works:
and when I try to set the striping_factor on an existing file I get a very nice and comprehensible error! :) Thanks for that! Eric |
@ericch1 excellent, thank you very much! I will file a PR to bring this into master and 5.0.x |
This is awesome! Thank you all! @edgargabriel Do you think this fix can also be backported to the 4.x releases? (Or was the swapped name only in the |
We are using PnetCDF in CESM, and it usually sets striping on files automatically using the following calls:
call mpi_info_set(info,"striping_factor",stripestr,ierr)
call mpi_info_set(info,"striping_unit",stripestr2,ierr)
With Cray MPICH and Intel MPI we do not have any problem. We have a Lustre file system, and Open MPI 4.0.3 was compiled with the intel/2020a compiler.
Could you please look into it?
Thanks a lot Alok