fix problems with handling empty status #13478

hppritcha · 2025-10-28T17:38:06Z

Testing with mpi4py for MPI 4.1 compliance uncovered
a long-standing problem: Open MPI does not correctly
return an empty status where the MPI standard
requires one.

With this fix, mpi4py passes if we declare MPI 4.1 compliance,
except for the big count tests.

devreal · 2025-10-28T17:57:44Z

The fix is not to touch the MPI_ERROR field?

bosilca · 2025-10-28T18:02:26Z

ompi/mpi/c/request_get_status_any.c.in

        *indx = MPI_UNDEFINED;
        if (MPI_STATUS_IGNORE != status) {
-            OMPI_COPY_STATUS(status, ompi_status_empty, false);
+            OMPI_COPY_STATUS(status, ompi_status_empty, true);


This is not what the standard mandates. Section 3.2.5 states:

... message-passing calls do not modify the value of the error code field of
status variables. This field may be updated only by the functions in Section 3.7.5 that
return multiple statuses. The field is updated if and only if such function returns with an
error code of MPI_ERR_IN_STATUS.

In most of these cases the return is MPI_SUCCESS, so the MPI_ERROR field shall not be modified.

There are multiple places in the standard where it states that an empty status object is to be returned. See my comments below about that.
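The diff above flips the third argument of OMPI_COPY_STATUS from false to true. The sketch below models the two behaviors being debated in Python. It assumes that the flag decides whether the MPI_ERROR field is copied; that reading comes from this discussion, not from the Open MPI source. `Status`, `copy_status`, and the MPI_ANY_SOURCE/MPI_ANY_TAG values are hypothetical placeholders. Only MPI_SUCCESS = 0 is fixed by the standard.

```python
from dataclasses import dataclass

# Placeholder values; real MPI_ANY_* constants are implementation-specific.
# MPI_SUCCESS is 0 per the MPI standard.
MPI_SUCCESS = 0
MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1


@dataclass
class Status:
    """Hypothetical stand-in for the user-visible MPI_Status fields."""
    MPI_SOURCE: int
    MPI_TAG: int
    MPI_ERROR: int


def copy_status(dst: Status, src: Status, copy_error: bool) -> None:
    """Hypothetical model of OMPI_COPY_STATUS(dst, src, copy_error).

    Assumption: the third argument controls whether MPI_ERROR is copied.
    """
    dst.MPI_SOURCE = src.MPI_SOURCE
    dst.MPI_TAG = src.MPI_TAG
    if copy_error:
        dst.MPI_ERROR = src.MPI_ERROR


EMPTY = Status(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_SUCCESS)
SENTINEL = 12345  # arbitrary value the user stored before the call

# flag = false: the caller's MPI_ERROR survives (Section 3.2.5 reading).
before = Status(7, 42, SENTINEL)
copy_status(before, EMPTY, copy_error=False)

# flag = true: MPI_ERROR becomes MPI_SUCCESS ("empty status" reading).
after = Status(7, 42, SENTINEL)
copy_status(after, EMPTY, copy_error=True)

print(before.MPI_ERROR, after.MPI_ERROR)  # prints: 12345 0
```

Under both settings the source and tag are reset. The only observable difference is whether a value the user placed in MPI_ERROR before the call is still there afterwards.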

hppritcha · 2025-10-28T18:02:54Z

Nope, it's to set it to MPI_SUCCESS in the special cases. That's what mpi4py was catching in a request_get_status_any test. Grep for "empty status" in the standard (use the MPI 4.1 standard for the grepping), then see the definition of empty status in Section 3.7.3 of the MPI 1.3 and later standards. It doesn't look like Open MPI has ever defined the empty status object correctly.

bosilca · 2025-10-28T18:05:13Z

I don't think we are talking about the same thing. Whatever is in the empty status is one thing, what I was pointing out is that you alter the field MPI_ERROR in the status object on a call that does not return MPI_ERR_IN_STATUS. This is incorrect according to my reading of the standard.

hppritcha · 2025-10-28T18:05:17Z

@dalcinl, check out this discussion, since it was your tests that showed the problem.

hppritcha · 2025-10-28T18:07:35Z

Here's an example of the wording for MPI_Test in the standard:

One is allowed to call MPI_TEST with a null or inactive request argument. In such a
case the procedure returns with flag = true and empty status.

hppritcha · 2025-10-28T18:09:46Z

And here's what the standard says an empty status is:

An empty status is a status that is set to
return tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS, and is also
internally configured so that calls to MPI_GET_COUNT and MPI_GET_ELEMENTS return
count = 0 and MPI_TEST_CANCELLED returns false. We set a status variable to empty
when the value returned by it is not significant. Status is set in this way so as to prevent
errors due to accesses of stale information.

bosilca · 2025-10-28T18:10:26Z

I also pointed to a different section in the standard that states something different. I think the correct solution is to properly define the empty status as you did, but do not set the MPI_ERROR field unless you are returning MPI_ERR_IN_STATUS.

hppritcha · 2025-10-28T18:10:48Z

Then the mpi4py tests would not pass.

bosilca · 2025-10-28T18:13:46Z

Whatever! mpi4py is not the MPI standard, it is an interpretation that might be incorrect.

dalcinl · 2025-10-28T18:14:42Z

@hppritcha Can you point me to the problematic test?

hppritcha · 2025-10-28T18:16:14Z

 85     def testGetStatusAny(self):
 86         with self.catchNotImplementedError(4, 1):
 87             status = self.STATUS
 88             index, flag = MPI.Request.Get_status_any(self.REQUESTS)
 89             self.assertEqual(index, MPI.UNDEFINED)
 90             self.assertTrue(flag)
 91             index, flag = MPI.Request.Get_status_any(self.REQUESTS, None)
 92             self.assertEqual(index, MPI.UNDEFINED)
 93             self.assertTrue(flag)
 94             if unittest.is_mpi("impi(>=2021.14.0)"):
 95                 status.error = MPI.SUCCESS
 96             index, flag = MPI.Request.Get_status_any(self.REQUESTS, status)
 97             self.assertEqual(index, MPI.UNDEFINED)
 98             self.assertTrue(flag)
 99             self.assertEqual(status.source, MPI.ANY_SOURCE)
100             self.assertEqual(status.tag, MPI.ANY_TAG)
101             self.assertEqual(status.error, MPI.SUCCESS)
102         with self.catchNotImplementedError(4, 1):
103             index, flag = MPI.Request.get_status_any(self.REQUESTS)
104             self.assertEqual(index, MPI.UNDEFINED)
105             self.assertTrue(flag)
106

I think bosilca would say line 101 is incorrect.

dalcinl · 2025-10-28T18:21:59Z

So, let's reduce the lines to the relevant ones:

             index, flag = MPI.Request.Get_status_any(self.REQUESTS, status)
             self.assertEqual(index, MPI.UNDEFINED)
             self.assertTrue(flag)
             self.assertEqual(status.source, MPI.ANY_SOURCE)
             self.assertEqual(status.tag, MPI.ANY_TAG)
             self.assertEqual(status.error, MPI.SUCCESS)

The call to MPI.Request.Get_status_any is not failing; otherwise it would have raised an exception and the line self.assertEqual(status.error, MPI.SUCCESS) would never be reached. As the MPI call is not failing, and thus NOT returning MPI_ERR_IN_STATUS, I would say that the status should indeed be set to the empty status. However, I would have to double-check the exact wording for MPI_Request_get_status_any() in the standard; IIRC there were many rules with these new routines.

@bosilca What am I missing?

bosilca · 2025-10-28T18:22:48Z

Please read above. According to Section 3.2.5 we should never alter the MPI_ERROR field in the status object unless we are returning MPI_ERR_IN_STATUS. I'm not stating this makes sense, but it is what the standard mandates.

bosilca · 2025-10-28T18:24:11Z

So basically, if I set some value X in status.MPI_ERROR before any MPI call that does not return MPI_ERR_IN_STATUS, I should be able to retrieve that X after the call.

hppritcha · 2025-10-28T18:26:54Z

To me, there seem to be conflicting statements in the standard. Where it states that an empty status is returned under certain conditions, I take that to mean that if the status were inspected with a status query method, the behavior would match the empty status definition.

dalcinl · 2025-10-28T18:32:08Z

Well, the section saying not to touch MPI_ERROR is talking about "message-passing calls", which is a bit generic. I'm wondering if whoever wrote that had calls that test for completion in mind (almost certainly not). To my taste, the more specific mandate to return an empty status in completion calls should take precedence.

bosilca · 2025-10-28T18:34:17Z

I agree, these are conflicting statements. What I don't agree to is changing OMPI behavior to match a different understanding of a conflicting statement, for the single reason to pass some tests. Instead, change the test to accept a different, but still legitimate, reading of the standard.
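One way to follow this suggestion is to have the test accept either reading. Below is a minimal sketch of such a check, written as a hypothetical helper. `is_acceptable_empty_status` is not part of mpi4py's test suite. The caller passes in the implementation's own MPI_ANY_SOURCE and MPI_ANY_TAG values, so no constants are hard-coded beyond MPI_SUCCESS = 0.

```python
MPI_SUCCESS = 0  # the MPI standard fixes MPI_SUCCESS at 0


def is_acceptable_empty_status(source, tag, error, error_before,
                               any_source, any_tag, success=MPI_SUCCESS):
    """Return True if the status fields describe an empty status under
    either reading discussed in this thread.

    - Empty-status definition: error is reset to MPI_SUCCESS.
    - Section 3.2.5 reading: error keeps the value the caller stored
      before the call (error_before).
    """
    if source != any_source or tag != any_tag:
        return False
    return error in (success, error_before)


# Hypothetical use inside mpi4py's testGetStatusAny:
#     error_before = status.error
#     index, flag = MPI.Request.Get_status_any(self.REQUESTS, status)
#     self.assertTrue(is_acceptable_empty_status(
#         status.source, status.tag, status.error, error_before,
#         MPI.ANY_SOURCE, MPI.ANY_TAG, MPI.SUCCESS))
```

Recording `error_before` just before the call is what lets the check stay agnostic. If the caller had already stored MPI_SUCCESS there, both readings produce the same result.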

hppritcha · 2025-10-28T18:42:03Z

In any case, thanks @dalcinl for discovering that we had an incorrect definition of the empty status, irrespective of whether the error field is copied over or not. I'll open an issue about this for the MPI standard.

dalcinl · 2025-10-28T18:47:15Z

@hppritcha, is this the only test failing? What about the test for MPI.Request.Get_status_all()?

hppritcha · 2025-10-28T18:52:42Z

that was the only one that failed for me.

dalcinl · 2025-10-28T18:53:27Z

What I don't agree to is changing OMPI behavior to match a different understanding of a conflicting statement,

We are talking about the behavior in a very new MPI routine, that Open MPI has not yet released, right?

for the single reason to pass some tests.

No, that is inaccurate. I fundamentally disagree with your interpretation, as it is based in a part of the standard that was likely written years before the new stuff was added. Also, the behavior that I champion is also the one the other major MPI implementation chose.

I respectfully request for the Open MPI community to reconsider this behavior, and put it to vote if there is such procedure.

bosilca · 2025-10-28T18:54:02Z

All tests handling requests and status should fail, because we were very careful never to overwrite the MPI_ERROR.

hppritcha · 2025-10-28T18:57:41Z

I'll split out the controversial parts of this PR into another one. We definitely didn't define the empty status correctly, and I want that fix in.

dalcinl · 2025-10-28T18:58:30Z

that was the only one that failed for me.

There you have it, then... Is MPI_Request_get_status_all() working according to my interpretation?

hppritcha · 2025-10-28T19:06:19Z

The mpi4py run stopped at the first failure, which was in the any test. It probably would have failed the all test too if the test framework had kept going.

hppritcha · 2025-10-28T19:07:25Z

If you want to do more testing, modify your testany, wait, and waitall tests to catch the cases where an "empty status" is returned.

dalcinl · 2025-10-28T19:17:28Z

Indeed, it seems like MPI_Request_get_status_all() is overwriting the MPI_ERROR field to MPI_SUCCESS. This is using a build against the OMPI ABI.

@hppritcha, I pushed a branch at mpi4py@update-ompi-main that will automatically build with all the new MPI 5.0 stuff. You can use that branch against the legacy OMPI ABI. There are some additional test failures. Sorry, I cannot do more for now; I'm really busy with other stuff.

Additionally, and despite @bosilca's claims, it seems that MPI_Request_get_status_any/all behave differently regarding the overwriting of the status->MPI_ERROR field to MPI_SUCCESS to become the empty status.

bosilca · 2025-10-28T19:39:29Z

We are talking about the behavior in a very new MPI routine, that Open MPI has not yet released, right?

This PR does not solely address the behavior in a new MPI routine that Open MPI has yet to release. Instead, it affects all request testing and completion routines.

No, that is inaccurate. I fundamentally disagree with your interpretation, as it is based in a part of the standard that was likely written years before the new stuff was added.

I believe it is incorrect to dismiss portions of the standard that were established before the introduction of new requirements, unless clearly deprecated. It could also be possible, even highly plausible, that the individual(s) who added the new requirements was unaware of the existing, well-defined standards related to status handling.

Also, the behavior that I champion is also the one the other major MPI implementation chose.

I am not sure why there is a preference for the behavior chosen by another major MPI implementation. Our approach is clearly mandated by the standard. Why would you not champion ours?

I respectfully request for the Open MPI community to reconsider this behavior, and put it to vote if there is such procedure.

From a practical standpoint, I find this specific MPI requirement to be cumbersome and unnecessary, particularly given the rationale in the standard (saving a store on an already loaded cache line at the expense of a branch). We adhered to a different interpretation (similar to yours) for a considerable time until it was brought to our attention four years ago that we were not in compliance with the standard. Consequently, we modified our code to ensure compliance. I see no reason to change this until the MPI standard clarifies the outcome of status handling.

hppritcha added 3 commits October 28, 2025 11:25

DO NOT MERGE ME see if mpi4py believes us for 4.1 compliance

a69b935

Signed-off-by: Howard Pritchard <howardp@lanl.gov>

DONT MERGE ME: running mpi4py for 4.1

7c2393c

but without bigcount tests Signed-off-by: Howard Pritchard <howardp@lanl.gov>

github-actions bot added the Target: main label Oct 28, 2025

hppritcha requested a review from edgargabriel October 28, 2025 17:38

edgargabriel approved these changes Oct 28, 2025

bosilca reviewed Oct 28, 2025

hppritcha mentioned this pull request Oct 28, 2025

How to handle ERROR field when returning an empty status mpi-forum/mpi-issues#1094

Open

hppritcha added a commit to hppritcha/ompi that referenced this pull request Oct 28, 2025

fix empty status fields

4220027

related to open-mpi#13478 but without the controversial stuff. Signed-off-by: Howard Pritchard <howardp@lanl.gov>

hppritcha mentioned this pull request Oct 28, 2025

fix empty status fields #13479

Merged

hppritcha added a commit to hppritcha/ompi that referenced this pull request Oct 30, 2025

fix empty status fields

06c456e

related to open-mpi#13478 but without the controversial stuff. Signed-off-by: Howard Pritchard <howardp@lanl.gov> (cherry picked from commit 4220027)

hppritcha mentioned this pull request Oct 30, 2025

fix empty status fields #13487

Merged

5 participants