Enable PIO with mpiuni #205
Conversation
With these changes, I can successfully build PIO. I tested a standalone PIO build (i.e., outside of an ESMF build) with:

```sh
CC=clang FC=gfortran cmake -DPIO_USE_MPISERIAL=ON -DPIO_ENABLE_TIMING=OFF -DPIO_ENABLE_EXAMPLES=OFF -DPIO_ENABLE_FORTRAN=OFF -DMPISERIAL_PATH=/Users/sacks/esmf/esmf1/src/Infrastructure/stubs/mpiuni -DPIO_ENABLE_TESTS=OFF .
make VERBOSE=1
```
According to MPI documentation, most MPI routines check sendbuf, but some (MPI_Scatter, MPI_Scatterv and maybe others; note that these are not yet implemented in mpiuni) check recvbuf, and some (e.g., MPI_Sendrecv_replace) don't check for MPI_IN_PLACE at all. To make mpiuni respect the MPI standard in this respect, I have added an argument to MPIUNI_Memcpy saying whether to check the source (sendbuf), the dest (recvbuf), or neither for equality with MPI_IN_PLACE. (We could probably get away with keeping things simpler by always checking both a and b for equality with MPI_IN_PLACE, but following the MPI standard in this respect seems marginally safer.)
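For illustration, here is a minimal, self-contained sketch of how the new flag is meant to be selected by the wrapper routines. The enum values match the ones used in the code below, but the sentinel and the simplified copy helper are stand-ins for illustration only, not the actual mpiuni definitions:

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for mpiuni's MPI_IN_PLACE sentinel (illustrative only). */
#define SKETCH_MPI_IN_PLACE ((void *)(-1))

enum CheckForMPIInPlace_Flag {
  CHECK_FOR_MPI_IN_PLACE_NONE,
  CHECK_FOR_MPI_IN_PLACE_SOURCE,
  CHECK_FOR_MPI_IN_PLACE_DEST
};

/* Simplified stand-in for MPIUNI_Memcpy: skip the copy when the buffer that
   the MPI standard allows MPI_IN_PLACE in actually holds the sentinel. */
static int sketch_memcpy(void *a, const void *b, int n,
                         enum CheckForMPIInPlace_Flag check_flag) {
  if (check_flag == CHECK_FOR_MPI_IN_PLACE_SOURCE && b == SKETCH_MPI_IN_PLACE)
    return 0;
  if (check_flag == CHECK_FOR_MPI_IN_PLACE_DEST && a == SKETCH_MPI_IN_PLACE)
    return 0;
  memcpy(a, b, (size_t)n);
  return 0;
}

int main(void) {
  int send[3] = {1, 2, 3};
  int recv[3] = {0, 0, 0};

  /* Allreduce-like routine: the standard allows MPI_IN_PLACE in sendbuf,
     so check the source; no copy happens here. */
  sketch_memcpy(recv, SKETCH_MPI_IN_PLACE, (int)sizeof(send),
                CHECK_FOR_MPI_IN_PLACE_SOURCE);

  /* Scatter-like routine: MPI_IN_PLACE may appear in recvbuf instead,
     so check the destination; the copy happens normally here. */
  sketch_memcpy(recv, send, (int)sizeof(send), CHECK_FOR_MPI_IN_PLACE_DEST);

  /* Sendrecv_replace-like routine: no MPI_IN_PLACE in the standard,
     so never check and always copy. */
  sketch_memcpy(recv, send, (int)sizeof(send), CHECK_FOR_MPI_IN_PLACE_NONE);

  printf("recv = {%d, %d, %d}\n", recv[0], recv[1], recv[2]);
  return 0;
}
```

The point of the three-way flag is that each wrapper declares which buffer (if any) the MPI standard allows MPI_IN_PLACE in, rather than having MPIUNI_Memcpy guess by checking both.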
This is needed for the PIO build. I'm using the same definition as is used in mpi-serial.
This no longer depends on whether we're using mpiuni. I thought that we might need to add some logic to handle the case where the user explicitly sets ESMF_PIO=OFF, since I expected that setting to appear in the esmf.mk file. But it turns out that ESMF_PIO doesn't appear in the esmf.mk file in that case, so the logic here seems correct.
Implement MPI_Alltoallw, MPI_Type_create_indexed_block and MPI_Type_size. These are needed for a srcfield.read call in ESMPy to work. With these changes, the following python program appears to work correctly:

```python
import os

import esmpy
from esmpy.util.cache_data import DATA_DIR
from esmpy.util.exceptions import DataMissing

datafile = os.path.join(DATA_DIR, "so_Omon_GISS-E2.nc")
meshfile = os.path.join(DATA_DIR, "mpas_uniform_10242_dual_counterclockwise.nc")

grid = esmpy.Grid(filename=os.path.join(DATA_DIR, datafile),
                  filetype=esmpy.FileFormat.GRIDSPEC)
srcfield = esmpy.Field(grid, staggerloc=esmpy.StaggerLoc.CENTER, ndbounds=[33, 2])
srcfield.read(filename=os.path.join(DATA_DIR, datafile), variable="so", timeslice=2)
print("Worked!")
```
- Implement MPI_Scatterv; this is needed for PIO's subset rearranger, which is used to read a Mesh (and is the only rearranger allowed by PIOc_InitDecomp_ReadOnly)
- Generalize the implementation of MPI_Type_create_indexed_block to allow count > 1

With these changes, the example in https://earthsystemmodeling.org/esmpy_doc/nightly/develop/html/examples.html#grid-mesh-and-field-created-from-file runs to completion
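For context, a minimal sketch (not the mpiuni source) of what MPI_Scatterv reduces to in a single-process MPI: copy the one block destined for rank 0 out of sendbuf into recvbuf, unless recvbuf is MPI_IN_PLACE (per the recvbuf check discussed above). Using MPI_Type_size to scale the displacement assumes a simple sendtype whose size equals its extent:

```c
#include <string.h>
#include <mpi.h>

int sketch_Scatterv(const void *sendbuf, const int sendcounts[],
                    const int displs[], MPI_Datatype sendtype,
                    void *recvbuf, int recvcount, MPI_Datatype recvtype,
                    int root, MPI_Comm comm) {
  (void)recvcount; (void)recvtype; (void)root; (void)comm;
  if (recvbuf == MPI_IN_PLACE)
    return MPI_SUCCESS;  /* data is already where it belongs on the one rank */
  int tsize;
  MPI_Type_size(sendtype, &tsize);
  /* displs[] is in units of sendtype; size == extent is assumed here */
  memcpy(recvbuf,
         (const char *)sendbuf + (size_t)displs[0] * (size_t)tsize,
         (size_t)sendcounts[0] * (size_t)tsize);
  return MPI_SUCCESS;
}
```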
I had a check that the displacements go in increments of blocklength, but from some experimentation with openmpi (checking the MPI_Type_size of the new type returned from this function), it seems that the resulting type's size is independent of the displacements. So I don't think we need this check.
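The kind of experiment referred to here could look like the following (an assumed reconstruction, not code from the PR). Against a full MPI such as openmpi, both types report the same MPI_Type_size regardless of how the blocks are laid out:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  MPI_Datatype t_even, t_irregular;
  int evenly_spaced[] = {0, 4, 8};
  int irregular[]     = {0, 5, 17};
  int size_even, size_irregular;

  /* Same count and blocklength, different displacements. */
  MPI_Type_create_indexed_block(3, 4, evenly_spaced, MPI_INT, &t_even);
  MPI_Type_create_indexed_block(3, 4, irregular, MPI_INT, &t_irregular);
  MPI_Type_commit(&t_even);
  MPI_Type_commit(&t_irregular);

  MPI_Type_size(t_even, &size_even);
  MPI_Type_size(t_irregular, &size_irregular);
  /* Both print 48 for 4-byte ints: count * blocklength * sizeof(int). */
  printf("size_even = %d, size_irregular = %d\n", size_even, size_irregular);

  MPI_Type_free(&t_even);
  MPI_Type_free(&t_irregular);
  MPI_Finalize();
  return 0;
}
```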
Implement MPI_Type_create_hvector and MPI_Type_hvector. These are needed in PIO. I am implementing both MPI_Type_create_hvector and MPI_Type_hvector because PIO potentially uses the latter when it thinks it's using mpi-serial.
mpiuni's mpirun was always prepending './' to the command, even for a command in your PATH. This prevented running python3 via mpirun (needed in the ESMPy testing), for example. This change only prepends './' if the given command isn't in your PATH. This seems to match the behavior of openmpi's mpirun.
```c
int MPIUNI_Memcpy(void *a, const void *b, int n, enum CheckForMPIInPlace_Flag check_flag) {
  switch (check_flag) {
  case CHECK_FOR_MPI_IN_PLACE_NONE:
    // No pre-check in this case; proceed to the actual memcpy
    break;
  case CHECK_FOR_MPI_IN_PLACE_SOURCE:
    if (b == MPI_IN_PLACE) {
      // If the source is MPI_IN_PLACE, do nothing
      return 0;
    }
    break;
  case CHECK_FOR_MPI_IN_PLACE_DEST:
    if (a == MPI_IN_PLACE) {
      // If the dest is MPI_IN_PLACE, do nothing
      return 0;
    }
    break;
  }
```
Most of the changes I made to mpiuni are additions, but this one is a behavior change. I have done a fair amount of testing of this change and it feels pretty safe to me, but I'd be happy to have a second set of eyes on it.
```make
PIO_CMAKE_OPTS += -DPIO_USE_MPISERIAL=ON -DMPISERIAL_PATH=$(ESMF_DIR)/src/Infrastructure/stubs/mpiuni

# There are problems building PIO's tests with mpiuni; for now, just disable this internal testing
PIO_CMAKE_OPTS += -DPIO_ENABLE_TESTS=OFF
```
Note that I am setting PIO_ENABLE_TESTS=OFF.
This means that you can't run 'make tests', 'ctest' or 'make check' on the PIO installation. Is that a problem?
This was needed for the mpiuni build. I am leaving this at its default (ON) for the non-mpiuni build, though if people feel it's never used with the internal PIO build, we could just set it to OFF always.
I didn't spend much time digging into the problems I had building the tests. If people feel it's important to keep these tests enabled, I can dig into this more.
```c
}

int ESMC_MPI_Type_create_indexed_block(int count, int blocklength,
                                       const int array_of_displacements[],
```
Note that I ignore array_of_displacements here: it seems like, for mpiuni, the important thing about type creation is to end up with a value that reflects the size of the resulting type, and from some experimentation with this MPI function in openmpi, the MPI_Type_size of the result appears to be independent of the displacements. (I originally had a check that these displacements were evenly spaced, but removed this check in commit 59a996d because it seems unnecessary if what we care about is the size of the result.)
But I would be happy to have a second set of eyes on this.
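Under that size-only view of mpiuni datatypes (my reading of the comment above; this is a hypothetical sketch, not the actual ESMF source), the essence of the implementation would be something like:

```c
/* Hypothetical sketch: track only the size of the new type, so the
   displacements have no effect on the result. */
int sketch_Type_create_indexed_block(int count, int blocklength,
                                     const int array_of_displacements[],
                                     int oldtype_size_bytes,
                                     int *newtype_size_bytes) {
  (void)array_of_displacements;  /* irrelevant to MPI_Type_size */
  *newtype_size_bytes = count * blocklength * oldtype_size_bytes;
  return 0;
}
```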
```c
  return MPI_SUCCESS;
}

int ESMC_MPI_Type_create_hvector(int count, int blocklength, MPI_Aint stride,
```
Note that I ignore stride here: it seems like, for mpiuni, the important thing about type creation is to end up with a value that reflects the size of the resulting type, and from some experimentation with this MPI function in openmpi, the MPI_Type_size of the result appears to be independent of the stride.
However, this function is the one that I'm least certain of, so would be happy to have a second set of eyes on this.
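A small experiment against a full MPI such as openmpi (again an assumed reconstruction, not code from the PR) illustrates why ignoring the stride is plausible: the stride changes the extent of the resulting hvector type, but not its MPI_Type_size:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  MPI_Datatype t;
  int size;
  MPI_Aint lb, extent;

  /* 3 blocks of 4 ints, with a 100-byte stride between block starts. */
  MPI_Type_create_hvector(3, 4, 100, MPI_INT, &t);
  MPI_Type_commit(&t);

  MPI_Type_size(t, &size);              /* 48 for 4-byte ints, independent of stride */
  MPI_Type_get_extent(t, &lb, &extent); /* 216 here; this is what the stride affects */
  printf("size = %d, extent = %ld\n", size, (long)extent);

  MPI_Type_free(&t);
  MPI_Finalize();
  return 0;
}
```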
Sorry for not responding sooner. I'll review this soon (hopefully this week), and we can check if Gerhard also wants to take a look.
No rush! As you may see from the commit dates, this has been very on-again-off-again work anyway. And I'm actually thinking of putting this on pause so I can work on prototyping the Julia model coupling next, since that feels high priority in terms of time.
I have run testing on my Mac on the latest version (cc470b7), both with openmpi and mpiuni. All tests pass (using exhaustive testing), and ESMPy tests pass.
Looks good to me. Thanks for all of your hard work going down all the rabbit holes on this one!
I have done enough testing on this that it feels ready to merge, and I'm going to do so in order to get nightly test results.

I still want to add some more unit tests that invoke I/O and are run with mpiuni: I noticed that much of our I/O testing (e.g., ArrayIOUTest and similar) is only set up for multi-processor testing. My thought is to add at least some basic Array I/O tests that are run when testing with mpiuni. I'm thinking that I'll do that by introducing a new test file like ESMF_ArrayIOBasicUTest, where the Array setup will work on single-processor testing. But I want to get this merged to see initial test results, so I'll add that later.

@oehmke let me know if you have any thoughts on this plan of adding a new unit test file for the sake of adding some tests that will work with mpiuni: it feels a little weird to me to add a new file for that purpose, but I don't see a way around it other than doing some potentially significant rework of all of the tests in ESMF_ArrayIOUTest to support single-processor testing - since my understanding is that, if any tests in a file only support multi-processor, then ALL tests in the file need to be for multi-processor only.
This PR enables building and running with the internal PIO when using mpiuni. This is especially relevant for people using ESMPy, since ESMPy is sometimes built without mpi – and this is apparently needed on many HPC systems (see also conda-forge/esmpy-feedstock#70). This resolves #131 .
I still want to do some more testing of this and look into whether there are any tests that are currently disabled with mpiuni that should be enabled. But I have done significant testing with ESMPy and have verified that we can now run ESMPy's test suite with mpiuni and all tests pass in that configuration. So it feels like this is getting close enough that it's worth opening a PR.
@theurich and/or @oehmke - I would welcome a review of this. I am assigning both of you as reviewers but will let you decide if you have the time and interest to review it. I will make some line comments calling out pieces that I think most warrant a careful double-check.