Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.1.x ERROR field of MPI_Status uninitialized after MPI_Recv call #12049

Closed
blattms opened this issue Nov 6, 2023 · 9 comments
Closed

4.1.x ERROR field of MPI_Status uninitialized after MPI_Recv call #12049

blattms opened this issue Nov 6, 2023 · 9 comments

Comments

@blattms
Copy link

blattms commented Nov 6, 2023

Background information

What version of Open MPI are you using?

The issue is there in versions 4.1.6 (Debian sid) and 4.1.4 (Debian bookworm), but not on 4.1.0 (Debian oldstable/bullseye).

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

openmpi is installed from packages either on Debian sid or Debian 12.2

Please describe the system on which you are running

  • Operating system/version: Debian GNU/Linux 12.2 and sid
  • Computer hardware: amd64
  • Network type: only one node without network

Details of the problem

If I compile the following test file with debugging

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
 
/**
 * @brief Display information contained in the MPI_Status.
 **/
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
 
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
 
    if(my_rank == 0)
    {
        // The "master" MPI process issues the MPI_Bsend.
        int buffer_sent = 12345;
        int tag = 67890;
        printf("MPI process %d sends value %d with tag %d.\n", my_rank, buffer_sent, tag);
        MPI_Ssend(&buffer_sent, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    }
    else
    {
        // The "slave" MPI process receives the message.
        int buffer_received;
        MPI_Status status;
        MPI_Recv(&buffer_received, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
	if(status.MPI_ERROR)
          printf("MPI process %d received value %d from rank %d, with tag %d and error code %d.\n", 
               my_rank,
               buffer_received,
               status.MPI_SOURCE,
               status.MPI_TAG,
               status.MPI_ERROR);
    }
 
    MPI_Finalize();
 
    return EXIT_SUCCESS;
}

using

shell$ mpicc -o tester -g -O0 mpi_test.c

and run under valgrind using:

shell$ mpirun -np 2 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp ./tester

Then valgrind shows that the ERROR field is not initialized during the if(status.MPI_ERROR):

MPI process 0 sends value 12345 with tag 67890.
==38889== Thread 1:
==38889== Conditional jump or move depends on uninitialised value(s)
==38889==    at 0x10925D: main (mpi_test.c:29)
==38889== 

Shouldn't MPI_Recv initialize the MPI_Status field completely? We have asserts like assert(!status.MPI_ERROR); that are failing because of this. The workaround is to always default initialize the status field

MPI_Status status = {};

Is this intended or are we doing something wrong?

@devreal
Copy link
Contributor

devreal commented Nov 6, 2023

This is intended behavior. Section 3.2.5 states:

In general, message-passing calls do not modify the value of the error code field of
status variables. This field may be updated only by the functions in Section 3.7.5 that
return multiple statuses.

I agree that this is not ideal since it leads to spurious warnings like this. Feel free to open an issue against the standard: https://github.com/mpi-forum/mpi-issues

blattms added a commit to blattms/opm-models that referenced this issue Nov 7, 2023
…us::ERROR

According to MPI standard the ERROR field of MPI_Status might not be initialized
unless for operations that return multiple statuses, see  Section 3.7.5
of the standard. In older OpenMPI versions (<=4.0.x) we were lucky
that ERROR was initialized to 0 always. This is not the case for 4.1.y
at least. See open-mpi/ompi#12049.

Therefore we use the retun code to determine whether there was an
error. Note that the default error handler usually is to abort the
application if errors occur. In that case the error code will always
return success.
@blattms
Copy link
Author

blattms commented Nov 7, 2023

Thanks a lot. I had a hard time figuring out the issue. I only read the man page. From that this was not clear.

@blattms
Copy link
Author

blattms commented Nov 7, 2023

Closing this as it is answered.

@jsquyres
Copy link
Member

jsquyres commented Nov 7, 2023

Thanks a lot. I had a hard time figuring out the issue. I only read the man page. From that this was not clear.

Fair enough. Got any suggestions for text to add to the man page about this? (we'll likely need to include that same text on a bunch of receive-flavored man pages)

@blattms
Copy link
Author

blattms commented Nov 8, 2023

Thanks a lot. I had a hard time figuring out the issue. I only read the man page. From that this was not clear.

Fair enough. Got any suggestions for text to add to the man page about this? (we'll likely need to include that same text on a bunch of receive-flavored man pages)

I guess a comment would fit after this paragraph about status in the Notes section

If your application does not need to examine the status field, you can save resources by using the
predefined constant  MPI_STATUS_IGNORE as a special value for the status argument.

maybe something along these lines:

Not the the status field is mainly meant to be used to provide the number of elements received via function 
MPI_Get_count, the rank of the sending process and the tag used for sending. Other fields of status might
be uninitialized after the function returns.

blattms added a commit to blattms/opm-models that referenced this issue Nov 9, 2023
…us::ERROR

According to MPI standard the ERROR field of MPI_Status might not be initialized
unless for operations that return multiple statuses, see  Section 3.7.5
of the standard. In older OpenMPI versions (<=4.0.x) we were lucky
that ERROR was initialized to 0 always. This is not the case for 4.1.y
at least. See open-mpi/ompi#12049.

Therefore we use the retun code to determine whether there was an
error. Note that the default error handler usually is to abort the
application if errors occur. In that case the error code will always
return success.
@jsquyres
Copy link
Member

@blattms Well this certainly caused me to go down a rabbit hole. 🐇 (hey, who knew that Github had a Rabbit emoji?)

I took your suggestion and tweaked it, and noticed that we actually had related text about this issue in some other man pages. So I ended up stealing that text and using it. Check out #12058 (which rendered to a temporary RTD location here: https://ompi--12058.org.readthedocs.build/en/12058/)

@blattms
Copy link
Author

blattms commented Nov 12, 2023

Thanks a lot for. That text is really perfect. 👍

@bosilca
Copy link
Member

bosilca commented Nov 12, 2023

Not that long ago we had a very similar discussion that ended up in us making the change. Check #9128 for more info.

@jsquyres
Copy link
Member

@blattms FYI: the docs updates have been merged on the main branch already (and are live under https://docs.open-mpi.org/en/main/). There's a pending v5.0.x PR #12069 that will likely be merged soon (and will then become live under https://docs.open-mpi.org/en/v5.0.x/).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants