Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session: preliminary implementation 2 #5015

Merged
merged 10 commits into from Jan 25, 2021
Merged

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Jan 21, 2021

Pull Request Description

This is 2nd split from https://github.com/pmodels/mpich/pull/4949/commits to simplify review.

The commits were refactored a bit to keep the tracking of world init state localized as static variables.

[skip warnings]

Expected Impact

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@hzhou hzhou changed the title 2101 session pre 2 session: preliminary implementation 2 Jan 21, 2021
@hzhou hzhou force-pushed the 2101_session_pre_2 branch 3 times, most recently from cb5aaa4 to f33391d Compare January 21, 2021 02:53
@hzhou hzhou requested a review from minsii January 21, 2021 02:58
@hzhou
Copy link
Contributor Author

hzhou commented Jan 21, 2021

@minsii This is a second split from #4949. Could your review it?

@hzhou
Copy link
Contributor Author

hzhou commented Jan 21, 2021

test:mpich/ch3/most
test:mpich/ch4/most

@hzhou hzhou force-pushed the 2101_session_pre_2 branch 2 times, most recently from deb3e7b to 685b1db Compare January 22, 2021 02:58
@hzhou
Copy link
Contributor Author

hzhou commented Jan 22, 2021

test:mpich/ch4/most
test:mpich/ch3/most

@hzhou
Copy link
Contributor Author

hzhou commented Jan 22, 2021

@minsii This PR is ready for your review.

I have addressed your concern of confusing world state by refactoring to bring the definition and usage of world states localized and wrapped. Both init_counter and init_lock have become local static variables. The changes to the original mpich_state now are minimized.

src/mpi/errhan/errutil.c Outdated Show resolved Hide resolved
src/mpi/init/mpir_init.c \
src/mpi/init/mpir_finalize.c \
src/mpi/init/globals.c \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpi/init: move MPIR_Init_thread and MPIR_Finalize together does not more than what the commit describes:

  1. Rename MPIR_Call_finalize_callbacks -> MPII_Call_finalize_callbacks and move it to init_util.c
  2. Merge mpir_finalize.c into mpir_init.c.

Can we separate into two commits? So the second commit does not change any code logic but simply moves code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


static inline void MPII_world_set_initilized(void)
{
MPL_atomic_store_int(&MPIR_world_model_state, MPICH_WORLD_MODEL_INITIALIZED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit init: add atomic MPIR_world_model_state contains two-stage changes IMO:

  1. Define MPIR_Process.mpich_state and its utility functines/macros
  2. Replace error check code in init_api.txt

Do you think we can separate into two commits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

#define MPICH_MPI_STATE__IN_INIT 3 /* can call MPIR_Err_return_comm(...) on error */
#define MPICH_MPI_STATE__UNINITIALIZED 0
#define MPICH_MPI_STATE__MPIR_INITIALIZED 1
#define MPICH_MPI_STATE__INITIALIZED 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I get the full picture. Here is the summary based on my understanding:

  • MPICH_MPI_STATE__PRE_INIT and MPICH_MPI_STATE__POST_FINALIZED ->MPICH_MPI_STATE__UNINITIALIZED
    -- no need to separately define post_finalized, so merge into a single state
  • MPICH_MPI_STATE__IN_INIT ->MPICH_MPI_STATE__MPIR_INITIALIZED
    -- semantic meaning is unchanged, since the reason to have such a state is to use MPIR routines
  • MPICH_MPI_STATE__POST_INIT ->MPICH_MPI_STATE__INITIALIZED
    -- simple renaming, no semantics change
    Can we clearly explain the changes in the commit message?

It looks like you overlooked my previous comment (see above). Just want to make the commit message more straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding is correct. Commit message updated with your comments.

@minsii
Copy link
Contributor

minsii commented Jan 25, 2021

The PR looks good to me. I have only a few comments regarding commit separation/message.

Sometime we need to include custom header files. For example, we will
inlcude "mpi_init.h" in init-only functions.
Move the utility code (just the callback utility for now) to separate
file -- init_util.c.

Rename MPIR_Call_finalize_callbacks to MPII_Call_finalize_callbacks
since it is only used by MPIR_Finalize_impl internally and is
consistenty with other routines in mpi_init.h.
Init and Finalize often need matching code, moving the code into the
same unit makes checking easier. Another motivation is both need manage
init state, init counter and use shared init lock. Having them together
allows use of static variables rather than maintaining these init-only
variables in a global scope.
It is allowed to have MPI_Session_init to run concurrently with
MPI_Init or another MPI_Session_init in a different thread, so we need
lock to control race conditions. We use MPL_initlock to avoid the
initialization issue of normal mutexes.

We also add init_counter so to track multiple init and finalize.
With sessions, we need separately track the states of "world model". It
is no longer sufficient to use a single MPIR_Process.mpich_state.
MPI_Initialized, MPI_Finalized, and init twice error checking only apply
to world model. Replace the checking the mpich_state with checking of
MPIR_world_model_state.
Rather than verbosely compare the mpich_state, which is semantically
obscure, wrap it in a function to make its semantics clear.

Use MPIR_Errutil_is_initialized in MPIR_Err_return_comm.  The usage is
for the same purpose as other usage of MPIR_Errutil_is_initialized,
that if the error utitlities are not ready, we'll have no choice but
to abort here.
Now that we separately track world model init states, the original
mpich_state now tracks initialization states for both the world model
and session model. To better reflect its semantics, we do the following
renaming:

* MPICH_MPI_STATE__PRE_INIT and MPICH_MPI_STATE__POST_FINALIZED
        ->MPICH_MPI_STATE__UNINITIALIZED

  No need to separately define post_finalized, especially now there is a
  possibility to start a new session after finalize, so merge into a single
  state

* MPICH_MPI_STATE__IN_INIT ->MPICH_MPI_STATE__MPIR_INITIALIZED

  Semantic meaning is unchanged, since the reason to have such a state
  is to use MPIR routines during init, especially during MPID_Init.

*MPICH_MPI_STATE__POST_INIT ->MPICH_MPI_STATE__INITIALIZED

  Simple renaming, no semantics change
The old comment is not precise and it refers to MPIR_Err_preOrPostInit
which will be removed.
It is much more helpful to know which function triggered the error.
@hzhou
Copy link
Contributor Author

hzhou commented Jan 25, 2021

test:mpich/ch4/most
test:mpich/ch3/most

@hzhou hzhou merged commit 0e57d7a into pmodels:main Jan 25, 2021
@hzhou hzhou deleted the 2101_session_pre_2 branch January 25, 2021 13:54
@hzhou hzhou mentioned this pull request Jan 25, 2021
10 tasks
@hzhou hzhou mentioned this pull request Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants