Hangs with armel #5321

Closed
amckinstry opened this issue Jun 22, 2018 · 13 comments

@amckinstry

Thank you for taking the time to submit an issue!

Background information

This bug was submitted to Debian for armel (little-endian 32-bit ARM).
Original issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902041

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

3.1.0
For dependency versions, see the full build log here: https://buildd.debian.org/status/fetch.php?pkg=openmpi&arch=armel&ver=3.1.0-7&stamp=1528563191&raw=0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Standard build on Debian; no ARM-specific patches (there is one for aarch64, which is not relevant here).

Please describe the system on which you are running

Debian 10 (sid).
4.9.0-6-armmp-lpae #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) armv7l GNU/Linux

Details of the problem

Random hangs. The program below often stalls at a random iteration.

#include <iostream>

#include <mpi.h>

int main(int argc, char** argv)
{
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int j = 0; j < 1024; ++j) {

                std::cout << "j = " << j << std::endl;

                MPI_Request req[2];

                const char out[4] = {1, 2, 3, 4};
                char in[4];

                int dest = (rank + 1) % size;
                int source = (size + rank - 1) % size;

                MPI_Isend(out, 4, MPI_BYTE, dest, 0, MPI_COMM_WORLD, &req[0]);
                MPI_Irecv(in, 4, MPI_BYTE, source, 0, MPI_COMM_WORLD, &req[1]);

                MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        }

        MPI_Finalize();
        return 0;
}

@PeterGottesman
Contributor

Copied from the original bug report:

The C++ program below hangs often at a random iteration on abel.d.o (in a sid_armel-dchroot).

I built the program using mpic++ -Wall -std=c++14 -o test test.cc and
ran it with two ranks: mpirun -np 2 ./test

FWIW, the same program also hangs on powerpc (partch.d.o).
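
To make the traffic pattern of the reproducer concrete, here is a minimal sketch of its neighbor arithmetic, pulled out into a standalone helper. `RingNeighbors` and `ring_neighbors` are hypothetical names introduced for illustration; they are not part of MPI or of the original program, which computes `dest` and `source` inline.

```cpp
// Hypothetical helper (not part of MPI): the ring-neighbor arithmetic
// used by the reproducer, extracted so it can be checked in isolation.
struct RingNeighbors {
    int dest;    // rank this process sends to
    int source;  // rank this process receives from
};

inline RingNeighbors ring_neighbors(int rank, int size) {
    // Same expressions as in the reproducer: send to the next rank,
    // receive from the previous one, wrapping around at the ends.
    return RingNeighbors{(rank + 1) % size, (size + rank - 1) % size};
}
```

With two ranks, as in the reported mpirun -np 2 run, both `dest` and `source` resolve to the other rank, so every iteration is a symmetric 4-byte exchange between the same pair of processes.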

@jsquyres
Member

@hjelmn @shamisp Are we missing any ARM atomic fixes on the v3.0.x or v3.1.x branches?
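
For context on why atomics come up for a hang seen on armel and powerpc: both ARM and PowerPC have weakly ordered memory models, so a flag-based handoff between threads or processes needs explicit ordering. The thread does not say which atomics were fixed, so the following is only a generic sketch of a release/acquire handoff, not code from Open MPI or PMIx. `Mailbox`, `publish`, `consume`, and `handoff` are hypothetical names.

```cpp
#include <atomic>
#include <thread>

// Generic illustration only; NOT taken from Open MPI or PMIx.
struct Mailbox {
    int payload = 0;                  // plain data written by the producer
    std::atomic<bool> ready{false};   // flag that publishes the payload
};

// Producer: write the payload first, then publish it with a release store,
// so the payload write cannot be reordered after the flag write.
inline void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer: spin until the flag is seen with an acquire load; the payload
// read that follows is then guaranteed to observe the producer's write.
inline int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer/consumer handoff and returns what the consumer read.
inline int handoff(int value) {
    Mailbox box;
    int seen = -1;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { publish(box, value); });
    producer.join();
    consumer.join();
    return seen;
}
```

If the flag were a plain variable, or the ordering were weaker than the code assumes, the result would be undefined behavior that can go unnoticed on x86 yet misbehave on weakly ordered hardware.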

@hjelmn
Member

hjelmn commented Jun 25, 2018

This should be fixed in the latest v3.1.x tarballs and master. Please give one of those a try.

@jsquyres
Member

@amckinstry Nathan is referring to the nightly v3.1.x tarballs here: https://www.open-mpi.org/nightly/v3.1.x/

Also, we're just about to do v3.1.1rc2 (I'm not sure that the fix was included in 3.1.1rc1...?).

@jsquyres jsquyres added this to the v3.1.1 milestone Jun 25, 2018
@jsquyres jsquyres added the bug label Jun 25, 2018
@amckinstry
Author

Ok, will test openmpi-v3.1.x-201806270241-789ce8c

@amckinstry
Author

I'm still seeing hangs in about 50% of runs, unfortunately.
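
When a hang shows up only in some runs, it can help to turn the silent stall into an explicit failure so that repeated runs can be counted automatically. Below is a minimal sketch of a deadline-bounded polling helper. `wait_with_deadline` is a hypothetical name, and the polling interval and timeout values are assumptions, not something used in the original test.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `done` until it returns true or `timeout`
// elapses. Returns true on completion and false on timeout.
inline bool wait_with_deadline(
    const std::function<bool()>& done,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (done()) {
            return true;   // the operation completed in time
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up: the caller can report a hang
        }
        std::this_thread::sleep_for(interval);
    }
}
```

In the reproducer, one option would be to replace the blocking MPI_Waitall with a callable that calls MPI_Testall(2, req, &flag, MPI_STATUSES_IGNORE) and returns flag != 0. On timeout, the program could print the rank and the iteration j and call MPI_Abort, so the run ends with a nonzero status instead of blocking forever.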

@rhc54
Contributor

rhc54 commented Jun 29, 2018

@amckinstry You are certainly missing the PMIx atomics update, which might be the cause of the problem.

@amckinstry
Author

Quite likely, as PMIx is a separate package in Debian. Do you have a pointer to the PMIx update?

@rhc54
Contributor

rhc54 commented Jun 29, 2018

What version of PMIx are you on? I'd have to make sure we backported it to the right place.

@rhc54
Contributor

rhc54 commented Jun 29, 2018

I suspect you are on PMIx v2.1.x, so you'd want this release candidate:

https://github.com/pmix/pmix/releases/download/v2.1.2rc1/pmix-2.1.2rc1.tar.bz2

@rhc54
Contributor

rhc54 commented Jul 1, 2018

@jsquyres
Member

jsquyres commented Jul 9, 2018

@amckinstry Have you had a chance to test with PMIx v2.1.2? We've got a PR to update the internal / embedded PMIx to v2.1.2 (#5386).

@amckinstry
Author

Ok, this works with v2.1.2.
