[4.1.x] another batch of fixes #6549

Merged
merged 13 commits into from
Jun 6, 2023

Conversation

raffenet
Contributor

@raffenet raffenet commented Jun 5, 2023

Pull Request Description

Fix a handful of user-reported issues.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou and others added 13 commits May 26, 2023 15:13
Because the statuses parameter can accept MPI_STATUSES_IGNORE, we need to
use a pointer rather than an array, or modern compilers may complain.
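As a rough illustration of the pointer-versus-array point above, here is a minimal sketch, assuming the change concerns a C prototype. The commit message does not say which diagnostic was triggered, and `sketch_status`, `SKETCH_STATUSES_IGNORE`, and `sketch_fill_statuses` are hypothetical stand-ins, not MPICH names.

```c
/* Hypothetical stand-in for MPI_Status. */
typedef struct {
    int source;
    int tag;
} sketch_status;

/* Hypothetical sentinel playing the role of MPI_STATUSES_IGNORE. */
#define SKETCH_STATUSES_IGNORE ((sketch_status *) 1)

/* The parameter is declared as a pointer, not as `sketch_status statuses[]`,
 * because the argument may be a sentinel that refers to no array at all.
 * Returns the number of status entries written. */
int sketch_fill_statuses(int count, sketch_status *statuses)
{
    if (statuses == SKETCH_STATUSES_IGNORE)
        return 0;               /* caller asked for statuses to be ignored */
    for (int i = 0; i < count; i++) {
        statuses[i].source = i;
        statuses[i].tag = 0;
    }
    return count;
}
```

The sentinel is checked before any element is touched, so it is never dereferenced.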
Add a call to CFI_is_contiguous, which is needed by the f08
binding. Some compilers provide this prototype, but not the symbol, so
we need to disable f08 if the test fails to link.

Fixes pmodels#6505
Return an accurate error message to the user when they try to cancel an
inactive persistent send or recv request. Closes pmodels#6542.
We should retry fi_recvmsg if it returns -FI_EAGAIN.
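The retry pattern described above could look roughly like the following sketch. It is not the MPICH code: `mock_post_recv` is a hypothetical stand-in for the provider call, errno's `EAGAIN` is used in place of `FI_EAGAIN`, and the attempt limit is an assumption added so the example terminates.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider call: reports "try again"
 * (-EAGAIN) until *busy_calls reaches zero, then succeeds with 0. */
int mock_post_recv(int *busy_calls)
{
    if (*busy_calls > 0) {
        (*busy_calls)--;
        return -EAGAIN;
    }
    return 0;
}

/* Retry the post while it reports -EAGAIN, up to max_attempts tries.
 * Returns 0 on success, otherwise the last error code.
 * The number of calls made is stored in *attempts_out if it is non-NULL. */
int post_recv_with_retry(int *busy_calls, int max_attempts, int *attempts_out)
{
    int ret = -EAGAIN;
    int attempts = 0;

    while (attempts < max_attempts) {
        ret = mock_post_recv(busy_calls);
        attempts++;
        if (ret != -EAGAIN)
            break;              /* success or a non-retryable error */
        /* A real implementation would likely drive network progress here
         * before trying again (assumption, not taken from the commit). */
    }
    if (attempts_out != NULL)
        *attempts_out = attempts;
    return ret;
}
```

Any return value other than -EAGAIN ends the loop, so genuine errors are still reported to the caller rather than retried.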
Add a patch to disable an error message from the psm3 provider in builds
with --disable-shared. Fixes pmodels#6518.
We need to handle the case where a non-zero root uses
MPI_IN_PLACE. Otherwise we could try reading from a bad address and
crash. Fixes pmodels#6540.

NOTE: For a single-node reduce operation with a non-zero root, this
composition incurs an extra copy from rank 0 to the root.
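To make the MPI_IN_PLACE case concrete: at the root of a reduce, passing MPI_IN_PLACE as the send buffer means the root's contribution is read from its receive buffer. The sketch below simulates that rule in a single process for one int per rank. `SKETCH_IN_PLACE` and `sim_reduce_sum` are hypothetical names, and this is not the composition code the commit changes.

```c
/* Hypothetical sentinel playing the role of MPI_IN_PLACE. */
#define SKETCH_IN_PLACE ((const int *) -1)

/* Simulate a sum-reduce of one int per rank inside a single process.
 * sendbufs[r] points at rank r's contribution. The root may pass
 * SKETCH_IN_PLACE, in which case its contribution is read from *recvbuf. */
void sim_reduce_sum(const int *sendbufs[], int nranks, int root, int *recvbuf)
{
    /* Resolve the root's real input first; the sentinel itself must
     * never be dereferenced. */
    int acc = (sendbufs[root] == SKETCH_IN_PLACE) ? *recvbuf : *sendbufs[root];

    for (int r = 0; r < nranks; r++) {
        if (r == root)
            continue;           /* root's value is already in acc */
        acc += *sendbufs[r];
    }
    *recvbuf = acc;
}
```

The key step is resolving the sentinel for whichever rank is the root, not only for rank 0, before any buffer is read.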
This test checks whether MPI_Allreduce produces identical results on all
ranks with floating-point datatypes.
The ranks should be already in order from
MPII_Recexchalgo_get_neighbors.
If the basic datatype is floating point, we need to make sure the
local reduction follows the same associativity on all ranks, or
different ranks will produce non-identical results due to rounding.
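The rounding effect described above can be reproduced without MPI. This is a small sketch with a hypothetical helper, `sum_in_order`, that adds the same three doubles in two different orders, as two ranks might if their local reductions were associated differently.

```c
/* Sum vals[] in the order given by order[], a permutation of 0..n-1. */
double sum_in_order(const double *vals, const int *order, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += vals[order[i]];
    return acc;
}
```

With the values {1e20, 1.0, -1e20}, the order {0, 1, 2} returns 0.0 because the 1.0 is lost when added to 1e20, while the order {0, 2, 1} returns 1.0 because the large terms cancel first.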
@raffenet
Contributor Author

raffenet commented Jun 5, 2023

test:mpich/ch4/most
test:mpich/ch3/most

@raffenet
Contributor Author

raffenet commented Jun 6, 2023

Ignoring multinic failure in testing. Ready for review.

@raffenet raffenet requested a review from hzhou June 6, 2023 14:31
@hzhou
Contributor

hzhou commented Jun 6, 2023

Ignoring multinic failure in testing. Ready for review.

That failure now appears quite frequently. I'll investigate at some point.

Contributor

@hzhou hzhou left a comment

LGTM

@raffenet
Contributor Author

raffenet commented Jun 6, 2023

Ignoring multinic failure in testing. Ready for review.

That failure now appears quite frequently. I'll investigate at some point.

There are some multinic fixes in main that haven't been backported to 4.1.x. Since it's an opt-in feature, I think we can safely ignore the failure for now.

@raffenet raffenet merged commit fba49d6 into pmodels:4.1.x Jun 6, 2023
7 of 8 checks passed
@raffenet raffenet deleted the 4.1.x-fixes branch June 6, 2023 14:38