@hppritcha
Member

Testing with mpi4py for MPI 4.1 compliance uncovered
a long-standing problem: Open MPI incorrectly handles
returning an empty status in the cases where the
MPI standard specifies one.

With this fix, mpi4py passes if we declare MPI 4.1 compliance
except for the big count tests.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
but without bigcount tests

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@devreal
Contributor

devreal commented Oct 28, 2025

The fix is not to touch the MPI_ERROR field?

*indx = MPI_UNDEFINED;
if (MPI_STATUS_IGNORE != status) {
OMPI_COPY_STATUS(status, ompi_status_empty, false);
OMPI_COPY_STATUS(status, ompi_status_empty, true);
Member

This is not what the standard mandates. Section 3.2.5 states:

... message-passing calls do not modify the value of the error code field of
status variables. This field may be updated only by the functions in Section 3.7.5 that
return multiple statuses. The field is updated if and only if such function returns with an
error code of MPI_ERR_IN_STATUS.

In most of these cases the return value is MPI_SUCCESS, so the MPI_ERROR field shall not be modified.

Member Author

There are multiple places in the standard where it states that an empty status object is to be returned. See my comments below about that.

@hppritcha
Member Author

Nope, it's to set it to MPI_SUCCESS in the special cases. That's what mpi4py was catching for a request_get_status_any test. Grep for "empty status" in the standard, then see the definition of empty status in Section 3.7.3 of the MPI 1.3 and later standards. For the grepping, use the MPI 4.1 standard. It doesn't look like Open MPI has ever defined the empty status object correctly.

@bosilca
Member

bosilca commented Oct 28, 2025

I don't think we are talking about the same thing. Whatever is in the empty status is one thing, what I was pointing out is that you alter the field MPI_ERROR in the status object on a call that does not return MPI_ERR_IN_STATUS. This is incorrect according to my reading of the standard.

@hppritcha
Member Author

@dalcinl check out this discussion, since it was your tests that showed the problem.

@hppritcha
Member Author

here's an example of wording for MPI_Test in the standard:

One is allowed to call MPI_TEST with a null or inactive request argument. In such a
case the procedure returns with flag = true and empty status.

@hppritcha
Member Author

and here's what the standard says about what an empty status is

An empty status is a status that is set to
return tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and is also
internally configured so that calls to MPI_GET_COUNT and MPI_GET_ELEMENTS return
count = 0 and MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in this way so as to prevent
errors due to accesses of stale information.
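
As a rough illustration of that definition, here is a minimal Python sketch that models an empty status as plain data. It is not Open MPI or mpi4py code: the `Status` class, the `make_empty_status` helper, and the numeric constants are hypothetical stand-ins, and the real values of MPI_ANY_SOURCE, MPI_ANY_TAG, and MPI_SUCCESS are implementation-defined.

```python
from dataclasses import dataclass

# Hypothetical placeholder values; real MPI constants are implementation-defined.
MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1
MPI_SUCCESS = 0


@dataclass
class Status:
    """Toy stand-in for MPI_Status; not the Open MPI structure."""
    source: int
    tag: int
    error: int
    count: int = 0           # what MPI_GET_COUNT / MPI_GET_ELEMENTS would report
    cancelled: bool = False  # what MPI_TEST_CANCELLED would report


def make_empty_status() -> Status:
    """Build a status that matches the quoted 'empty status' definition."""
    return Status(
        source=MPI_ANY_SOURCE,
        tag=MPI_ANY_TAG,
        error=MPI_SUCCESS,
        count=0,
        cancelled=False,
    )


empty = make_empty_status()
print(empty)
```

Every field carries a well-defined neutral value, which is the point of the quoted rationale about stale information.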

@bosilca
Member

bosilca commented Oct 28, 2025

I also pointed to a different section in the standard that states something different. I think the correct solution is to properly define the empty status as you did, but do not set the MPI_ERROR field unless you are returning MPI_ERR_IN_STATUS.

@hppritcha
Member Author

then the mpi4py tests would not pass

@bosilca
Member

bosilca commented Oct 28, 2025

Whatever! mpi4py is not the MPI standard, it is an interpretation that might be incorrect.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

@hppritcha Can you point me to the problematic test?

@hppritcha
Member Author

 85     def testGetStatusAny(self):
 86         with self.catchNotImplementedError(4, 1):
 87             status = self.STATUS
 88             index, flag = MPI.Request.Get_status_any(self.REQUESTS)
 89             self.assertEqual(index, MPI.UNDEFINED)
 90             self.assertTrue(flag)
 91             index, flag = MPI.Request.Get_status_any(self.REQUESTS, None)
 92             self.assertEqual(index, MPI.UNDEFINED)
 93             self.assertTrue(flag)
 94             if unittest.is_mpi("impi(>=2021.14.0)"):
 95                 status.error = MPI.SUCCESS
 96             index, flag = MPI.Request.Get_status_any(self.REQUESTS, status)
 97             self.assertEqual(index, MPI.UNDEFINED)
 98             self.assertTrue(flag)
 99             self.assertEqual(status.source, MPI.ANY_SOURCE)
100             self.assertEqual(status.tag, MPI.ANY_TAG)
101             self.assertEqual(status.error, MPI.SUCCESS)
102         with self.catchNotImplementedError(4, 1):
103             index, flag = MPI.Request.get_status_any(self.REQUESTS)
104             self.assertEqual(index, MPI.UNDEFINED)
105             self.assertTrue(flag)
106 

I think bosilca would say line 101 is incorrect.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

So, let's reduce the lines to the relevant ones

             index, flag = MPI.Request.Get_status_any(self.REQUESTS, status)
             self.assertEqual(index, MPI.UNDEFINED)
             self.assertTrue(flag)
             self.assertEqual(status.source, MPI.ANY_SOURCE)
             self.assertEqual(status.tag, MPI.ANY_TAG)
             self.assertEqual(status.error, MPI.SUCCESS)

The call to MPI.Request.Get_status_any is not failing; otherwise it would have thrown an exception and the line self.assertEqual(status.error, MPI.SUCCESS) would never be reached. As the MPI call is not failing, and thus NOT returning MPI_ERR_IN_STATUS, I would say that the status should indeed be set to the empty status. However, I would have to double-check the exact wording for MPI_Request_get_status_any() in the standard; IIRC there were many rules for these new routines.

@bosilca What am I missing?

@bosilca
Member

bosilca commented Oct 28, 2025

Please read above. According to Section 3.2.5 we should never alter the MPI_ERROR field in the status object unless we are returning MPI_ERR_IN_STATUS. I'm not stating this makes sense, but it is what the standard mandates.

@bosilca
Member

bosilca commented Oct 28, 2025

So basically if I set some value X in the status.MPI_ERROR before any MPI calls that do not return MPI_ERR_IN_STATUS I should be able to retrieve that X after the call.
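
A toy Python model of that sentinel check, contrasting the two readings discussed in this thread. It does not call MPI: `Status`, `get_status_any_no_active`, the `copy_error` flag, and the constant values are all hypothetical names and placeholders, and the call is assumed to return MPI_SUCCESS with no active requests.

```python
from dataclasses import dataclass

# Hypothetical placeholder values; real MPI constants are implementation-defined.
MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1
MPI_SUCCESS = 0
MPI_UNDEFINED = -32766


@dataclass
class Status:
    """Toy stand-in for MPI_Status; not the Open MPI structure."""
    source: int = 0
    tag: int = 0
    error: int = 0


def get_status_any_no_active(status: Status, copy_error: bool):
    """Model a successful call on null/inactive requests.

    copy_error=False models the Section 3.2.5 reading (MPI_ERROR untouched).
    copy_error=True models the empty-status reading (MPI_ERROR = MPI_SUCCESS).
    """
    status.source = MPI_ANY_SOURCE
    status.tag = MPI_ANY_TAG
    if copy_error:
        status.error = MPI_SUCCESS
    return MPI_UNDEFINED, True  # (index, flag)


X = 42  # arbitrary sentinel stored in the error field before the call

s1 = Status(error=X)
get_status_any_no_active(s1, copy_error=False)

s2 = Status(error=X)
get_status_any_no_active(s2, copy_error=True)

print(s1.error, s2.error)  # 42 0
```

Under the first reading the sentinel survives the call; under the second it is replaced by MPI_SUCCESS, which is what an assertion like line 101 of the quoted test expects.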

@hppritcha
Member Author

To me there seem to be conflicting statements in the standard. Where it is stated that an empty status is returned under certain conditions, this implies to me that, if the status were tested using a status query method, the behavior would be as described by the empty status description.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

Well, the section saying not to touch MPI_ERROR is talking about "message passing calls", which is a bit generic. I wonder whether whoever wrote that had test-for-completion calls in mind (most certainly not). In my opinion, the more specific mandate to return an empty status in completion calls should prevail.

@bosilca
Member

bosilca commented Oct 28, 2025

I agree, these are conflicting statements. What I don't agree to is changing OMPI behavior to match a different understanding of a conflicting statement, for the single reason to pass some tests. Instead, change the test to accept a different, but still legitimate, reading of the standard.

@hppritcha
Member Author

In any case, thanks @dalcinl for rooting out that we had an incorrect definition of empty status, irrespective of whether to copy the error field over or not. I'll open an issue about this for the MPI standard.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

@hppritcha Is this the only test failing? What about the test for MPI.Request.Get_status_all()?

@hppritcha
Member Author

that was the only one that failed for me.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

What I don't agree to is changing OMPI behavior to match a different understanding of a conflicting statement,

We are talking about the behavior in a very new MPI routine, that Open MPI has not yet released, right?

for the single reason to pass some tests.

No, that is inaccurate. I fundamentally disagree with your interpretation, as it is based in a part of the standard that was likely written years before the new stuff was added. Also, the behavior that I champion is also the one the other major MPI implementation chose.

I respectfully request for the Open MPI community to reconsider this behavior, and put it to vote if there is such procedure.

@bosilca
Member

bosilca commented Oct 28, 2025

All tests handling requests and status should fail, because we were very careful never to overwrite the MPI_ERROR.

@hppritcha
Member Author

I'll split out the controversial parts of this PR into another one. For sure we didn't define the empty status correctly, and I want that in.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

that was the only one that failed for me.

There you have it, then... Is MPI_Request_get_status_all() working according to my interpretation?

@hppritcha
Member Author

The mpi4py run stopped at the first failure, which was for the any test. It probably would have failed the all test if the test framework had kept going.

@hppritcha
Member Author

if you want to do more testing, modify your testany, wait, and waitall tests to catch the cases where an "empty status" is returned.

@dalcinl
Contributor

dalcinl commented Oct 28, 2025

Indeed, it seems like MPI_Request_get_status_all() is overwriting the MPI_ERROR field to MPI_SUCCESS. This is using a build against the OMPI ABI.

@hppritcha I pushed a branch at mpi4py@update-ompi-main that will automatically build with all the new MPI 5.0 stuff. You can use that branch against the legacy OMPI ABI. There are some additional test failures. Sorry, I cannot do more for now, I'm really busy with other stuff.

Additionally, and despite @bosilca's claims, it seems that MPI_Request_get_status_any/all behave differently regarding the overwriting of the status->MPI_ERROR field to MPI_SUCCESS to become the empty status.

@bosilca
Member

bosilca commented Oct 28, 2025

We are talking about the behavior in a very new MPI routine, that Open MPI has not yet released, right?

This PR does not solely address the behavior in a new MPI routine that Open MPI has yet to release. Instead, it affects all request testing and completion routines.

No, that is inaccurate. I fundamentally disagree with your interpretation, as it is based in a part of the standard that was likely written years before the new stuff was added.

I believe it is incorrect to dismiss portions of the standard that were established before the introduction of new requirements, unless clearly deprecated. It is also possible, even highly plausible, that the individual(s) who added the new requirements were unaware of the existing, well-defined standards related to status handling.

Also, the behavior that I champion is also the one the other major MPI implementation chose.

I am not sure why there is a preference for the behavior championed by another major MPI implementation. Our approach is clearly mandated by the standard. Why would you not champion ours?

I respectfully request for the Open MPI community to reconsider this behavior, and put it to vote if there is such procedure.

From a practical standpoint, I find this specific MPI requirement to be cumbersome and unnecessary, particularly given the rationale in the standard (saving a store on an already loaded cache line at the expense of a branch). We adhered to a different interpretation (similar to yours) for a considerable time until it was brought to our attention four years ago that we were not in compliance with the standard. Consequently, we modified our code to ensure compliance. I see no reason to change this until the MPI standard clarifies the outcome of status handling.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Oct 28, 2025
related to open-mpi#13478 but without the controversial stuff.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
hppritcha added a commit to hppritcha/ompi that referenced this pull request Oct 30, 2025
related to open-mpi#13478 but without the controversial stuff.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 4220027)