@gc00 gc00 commented May 19, 2020

Hi @rohgarg,

This is a first attempt at making MANA portable across different MPI implementations.

Look at DMTCP_ROOT/configure-mana for an example of how to use it. After configuring, it will then use the Makefile variables found in DMTCP_ROOT/contrib/mpi-proxy-split/Makefile_config.

I tested this with MPICH. The configure-mana file is set for my MPICH build on login (CentOS).

Mostly, I can do 'make' in contrib/mpi-proxy-split. But in lower-half, 'make' fails to compile the lh_proxy executable, with the error message:
/usr/bin/ld: cannot find -lxml2
In my next debugging session, I'll look for why it needs libxml2.a. Apparently, mpicc translates -lmpi -lrt into -lmpi -lm -lxml2 -lpthread -lrt. I'm not sure why. Looking at sh -x mpicc ..., I don't see an -lxml2 when compiling simple MPI programs in the test subdirectory.

IN THIS UPDATED PR, THE BUG BELOW HAS BEEN FIXED NOW BY THE SECOND COMMIT:
What you'll see is that contrib/mpi-proxy-split/mpi-wrappers/mpi_dummy.c fails to compile, because the mpi.h supplied by MPICH is incompatible with our assumed mpi.h in that file. Maybe you can diagnose this further, @rohgarg. Feel free to extend this branch and merge it, if you can figure out mpi_dummy.c. Developing on a local machine will be faster than developing on Cori, where 'srun' can take a long time to start up.

Also, as a small side note, I saw that test/autotest_config.py was placed under git version control. This seems to be a mistake, since we already have test/autotest_config.py.in. But you know your intentions better than me. Could you also take a look at that?

@gc00 gc00 requested a review from rohgarg May 19, 2020 11:27
@rohgarg rohgarg changed the base branch from master to refactoring May 19, 2020 16:43

rohgarg commented May 19, 2020

> The configure-mana file is set for my MPICH build on login (CentOS). What you'll see is that contrib/mpi-proxy-split/mpi-wrappers/mpi_dummy.c fails to compile, because the mpi.h supplied by MPICH is incompatible with our assumed mpi.h in that file.

I'm aware of the issue, @gc00. The problem occurs because some of the MPI functions have a slightly different prototype in the two implementations, which is strange. :-) I have had to manually edit the wrapper prototypes in the plugin to make it work so far. Anyway, I will try to create a more general fix to make it work.

@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from 81f0af4 to 39f6064 Compare May 20, 2020 10:02

rohgarg commented May 20, 2020

@gc00: I checked my build of mpich-3.3.2 on a CentOS7 machine. I don't have libxml2 installed. I configured and built mpich with the following options:

$ ./configure --enable-g=dbg --prefix=/home/rohgarg/sws

Here's what the config.log says:

configure:38772: checking for LIBXML2
configure:38780: $PKG_CONFIG --exists --silence-errors "libxml-2.0"
configure:38783: $? = 1
configure:38799: $PKG_CONFIG --exists --silence-errors "libxml-2.0"
configure:38802: $? = 1
No package 'libxml-2.0' found
configure:38832: result: no

The next code snippet is from the header file that configure generates (src/include/mpichconf.h):

/* Define to 1 if you have the `libxml2' library. */
/* #undef HWLOC_HAVE_LIBXML2 */

I was looking at the options used by mpicc while trying to link the lower half binary. Here's the full command line:

/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/collect2 --build-id --no-add-needed --hash-style=gnu -m elf_x86_64 -static -o lh_proxy /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbeginT.o -L/home/rohgarg/sws/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../.. -Ttext-segment 0xE000000 --wrap __munmap --wrap shmat --wrap shmget lh_proxy.o libproxy.a -lmpi -lrt -lpthread -lc libproxy.a -rpath /home/rohgarg/sws/lib --enable-new-dtags -lmpi -lm -lpthread -lrt --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/crtn.o


rohgarg commented May 20, 2020

@gc00: I looked at mpich's source more carefully. It's the hwloc library (a dependency of mpich, and shipped along with the mpich source) that requires libxml2. However, hwloc does detect if libxml2 is not available on the system and can be built with/without libxml2 support.

On my CentOS7 system, with no libxml2 installed, it indeed builds without libxml2.

/home/rohgarg/sws/mpich-3.3.2/src/hwloc/hwloc $ ls topology-xml*
topology-xml.c  topology-xml.lo  topology-xml.o  topology-xml-libxml.c  topology-xml-nolibxml.c  topology-xml-nolibxml.lo  topology-xml-nolibxml.o

Note that only the *-nolibxml versions of the source files are compiled. The *-libxml versions are not compiled, since the configure script did not find any libxml2 on the system (see my comment above).

@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from 39f6064 to 50aa209 Compare May 21, 2020 17:59
 * But continue to use normal ./configure for Cori, and other known hosts
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch 5 times, most recently from afd89cd to 16d3c58 Compare May 25, 2020 23:48
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch 3 times, most recently from d367c36 to a2d2087 Compare May 28, 2020 17:51
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch 2 times, most recently from 3d1f4ef to 2ee5d6d Compare May 28, 2020 18:18
@gc00 gc00 force-pushed the refactoring branch 2 times, most recently from d4da85c to 59d8b3f Compare May 28, 2020 18:26
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from cf95513 to f6dc93c Compare May 29, 2020 09:59
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from f6dc93c to 4edb343 Compare May 29, 2020 10:29
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from 4edb343 to 6594de3 Compare May 29, 2020 11:07
@gc00 gc00 force-pushed the refactor-autoconf-mpi-vars branch from 820325d to 80ef34a Compare June 11, 2020 06:20
@xuyao0127 xuyao0127 force-pushed the refactoring branch 2 times, most recently from 11c02c0 to e1fab96 Compare May 5, 2021 14:12
gc00 added a commit that referenced this pull request May 5, 2021
@xuyao0127 xuyao0127 force-pushed the refactoring branch 2 times, most recently from fec3dd8 to 437ed38 Compare May 17, 2021 21:27
@gc00 gc00 force-pushed the refactoring branch 3 times, most recently from c57dfd2 to 197a2cf Compare May 18, 2021 23:27
@xuyao0127 xuyao0127 force-pushed the refactoring branch 4 times, most recently from d4a048b to 94a642a Compare May 22, 2021 16:26
@gc00 gc00 force-pushed the refactoring branch 3 times, most recently from 9fbf45f to 830fb73 Compare June 7, 2021 23:03
@gc00 gc00 force-pushed the refactoring branch 2 times, most recently from ba3c4c3 to a208eaf Compare August 3, 2021 09:55
@gc00 gc00 closed this Aug 4, 2021
@gc00 gc00 deleted the refactor-autoconf-mpi-vars branch August 4, 2021 22:32
dahongli pushed a commit to dahongli/mana that referenced this pull request Apr 27, 2022
There is a race between two-phase algorithm preSuspendBarrier() and
two-phase commit commit_begin().

T1 thread #1: setCkptPending()
T2               thread #2: checks isCkptPending() then stop(comm)
T3               thread #2: stop(comm) sets state to PHASE_1
T4               thread #2: while isCkptPending() loop
T5 thread #1: calls waitForNewStateAfter(PHASE_1)

This change adds a timeout when waiting for a state change. When the
timeout expires, the barrier sends a response message to the coordinator.
The coordinator gives a free phase1 pass to one member if all members
are in phase1, so this member can move to the IN_CS state and continue
the two-phase commit. Once this member is in IN_CS, the coordinator
unblocks the other members.
dahongli pushed a commit to dahongli/mana that referenced this pull request Apr 28, 2022
There is a race between two-phase algorithm preSuspendBarrier() and
two-phase commit commit_begin().

T1 thread #1: setCkptPending()
T2               thread #2: checks isCkptPending() then stop(comm)
T3               thread #2: stop(comm) sets state to PHASE_1
T4               thread #2: while isCkptPending() loop
T5 thread #1: calls waitForNewStateAfter(PHASE_1)

This change adds a timeout when waiting for a state change. When the
preSuspend wait for a PHASE_1 state change times out, it sets the
response to FREE_PASS.
dahongli added a commit that referenced this pull request Apr 28, 2022
There is a race between two-phase algorithm preSuspendBarrier() and
two-phase commit commit_begin().

T1 thread #1: setCkptPending()
T2               thread #2: checks isCkptPending() then stop(comm)
T3               thread #2: stop(comm) sets state to PHASE_1
T4               thread #2: while isCkptPending() loop
T5 thread #1: calls waitForNewStateAfter(PHASE_1)

This change adds a timeout when waiting for a state change. When the
preSuspend wait for a PHASE_1 state change times out, it sets the
response to FREE_PASS.

Co-authored-by: Dahong Li <root@cori.nersc.gov>
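
The timed wait described in this commit message can be sketched roughly as follows. This is a minimal illustration, not MANA's actual code: the StateTracker class, the State and Response enums, and the timeout parameter are assumptions made for the example. Only the names waitForNewStateAfter, PHASE_1, IN_CS and FREE_PASS come from the message above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical enums: only the names PHASE_1, IN_CS and FREE_PASS appear in
// the commit message. The real MANA definitions may differ.
enum class State { READY, PHASE_1, IN_CS };
enum class Response { STATE_CHANGED, FREE_PASS };

// Hypothetical helper that guards a state variable with a mutex and a
// condition variable.
class StateTracker {
public:
  void setState(State s) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      state_ = s;
    }
    cv_.notify_all();
  }

  // Block until the state differs from old_state or the timeout elapses.
  // On timeout, report FREE_PASS instead of blocking forever.
  Response waitForNewStateAfter(State old_state,
                                std::chrono::milliseconds timeout,
                                State *new_state) {
    assert(new_state != nullptr);
    std::unique_lock<std::mutex> lock(mtx_);
    bool changed = cv_.wait_for(lock, timeout,
                                [&] { return state_ != old_state; });
    *new_state = state_;
    return changed ? Response::STATE_CHANGED : Response::FREE_PASS;
  }

private:
  std::mutex mtx_;
  std::condition_variable cv_;
  State state_ = State::READY;
};
```

In this sketch, a waiter that is stuck in PHASE_1 gets FREE_PASS back once the timeout elapses, which is the point at which the caller could report to the coordinator rather than waiting indefinitely. The timeout length is a tuning choice that the commit message does not specify.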
dahongli added a commit that referenced this pull request Apr 28, 2022
* Fix the race between two-phase commit and checkpointing

There is a race between two-phase algorithm preSuspendBarrier() and
two-phase commit commit_begin().

T1 thread #1: setCkptPending()
T2               thread #2: checks isCkptPending() then stop(comm)
T3               thread #2: stop(comm) sets state to PHASE_1
T4               thread #2: while isCkptPending() loop
T5 thread #1: calls waitForNewStateAfter(PHASE_1)

This change adds a timeout when waiting for a state change. When the
timeout expires, the barrier sends a response message to the coordinator.
The coordinator gives a free phase1 pass to one member if all members
are in phase1, so this member can move to the IN_CS state and continue
the two-phase commit. Once this member is in IN_CS, the coordinator
unblocks the other members.
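
The coordinator-side rule in this commit message (grant a free pass to one member only when every member is in phase 1) might look something like the sketch below. It is an assumption-laden illustration rather than the coordinator's real logic: MemberState, pickFreePassMember and grantFreePass are hypothetical names, and choosing the first member is an arbitrary policy picked for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-member state, modeled on the names in the commit message.
enum class MemberState { READY, PHASE_1, IN_CS };

// Return the index of the member that should receive a free pass, or -1 if
// no free pass should be granted. A pass is granted only when all members
// are in PHASE_1; the choice of member 0 is arbitrary.
int pickFreePassMember(const std::vector<MemberState> &members) {
  if (members.empty()) return -1;
  for (MemberState m : members) {
    if (m != MemberState::PHASE_1) return -1;
  }
  return 0;
}

// Simulate the chosen member using its free pass to enter the critical
// section (IN_CS).
void grantFreePass(std::vector<MemberState> &members, int idx) {
  assert(idx >= 0 && static_cast<std::size_t>(idx) < members.size());
  members[idx] = MemberState::IN_CS;
}
```

Once one member has moved to IN_CS, pickFreePassMember returns -1 for the same group, so at most one free pass is handed out per stalled round in this sketch.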