-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
request: return error code from free_fn of a generalized request #6751
Conversation
b4039da
to
6e67e50
Compare
test:mpich/ch3/tcp ✔️ |
src/binding/c/request_api.txt
Outdated
array_of_requests[idx] = MPI_REQUEST_NULL; | ||
if (rc && array_of_statuses != MPI_STATUSES_IGNORE) { | ||
array_of_statuses[idx].MPI_ERROR = rc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the other status objects guaranteed to have MPI_SUCCESS
in them at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, a typo. array_of_statuses[idx]
should be array_of_statuses[i]
.
are the other status objects guaranteed to have MPI_SUCCESS in them at this point?
I am not sure I understand the question. Why would it matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user is supposed to look in the status array to find out which operation has an error. So presumably all operations that succeeded need to have MPI_SUCCESS
in the status object or else the user may think that they failed? That way this actual error stands out as the one needing handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Ahh, so messy. I am tempted to let MPIR_Request_completion_processing
always set the status->MPI_ERROR
field. If we do that, do you think we can argue that we are not breaking the MPI standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quote from 3.2.5 -
In general, message-passing calls do not modify the value of the error code field of
status variables. This field may be updated only by the functions in Section 3.7.5 which
return multiple statuses. The field is updated if and only if such function returns with an
error code of MPI_ERR_IN_STATUS.Rationale. The error field in status is not needed for calls that return only one status,
such as MPI_WAIT, since that would only duplicate the information returned by the
function itself. The current design avoids the additional overhead of setting it, in such
cases. The field is needed for calls that return multiple statuses, since each request
may have had a difierent failure. (End of rationale.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording such as "in general ... do not modify" and "may be updated only by..." does not exclude that we do set the MPI_ERROR
field. @raffenet What do you think?
The rationale essentially explains to the user the reason that the MPI_ERROR
field may not get updated thus should not be relied on, but it does not say that MPI_ERROR
field is supposed to be left untouched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I think we can always set it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the text again because the commit modifying a bunch of tests gave me pause. I no longer think we can just blindly set the MPI_ERROR
field in status. This sentence seems clear cut:
The field is updated if and only if such function returns with an error code of MPI_ERR_IN_STATUS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll reverse the changes. Still IMO, this is a nonsense clause.
8f06200
to
e0629ce
Compare
The changes to this PR lead to deadlock when running the mpi4py testsuite in singleton init mode. |
test:mpich/ch4/ofi test:mpich/custom |
test:mpich/ch4/ofi test:mpich/custom |
I think it should work now.
Hui
…________________________________
From: Lisandro Dalcin ***@***.***>
Sent: Sunday, October 29, 2023 3:07 AM
To: pmodels/mpich ***@***.***>
Cc: Hui Zhou ***@***.***>; Author ***@***.***>
Subject: Re: [pmodels/mpich] request: return error code from free_fn of a generalized request (PR #6751)
The changes to this PR lead to deadlock when running the mpi4py testsuite in singleton init mode.
https://github.com/mpi4py/mpi4py-testing/actions/runs/6681969488
I cancelled the runs and there are no logs to look at, sorry.
The hydra runs refused to stop upon manual cancellation (I guess the GHA runtime limit will eventually trigger).
—
Reply to this email directly, view it on GitHub<#6751 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AALNM7R7MV7UX2YIHOYXY63YBYFDHAVCNFSM6AAAAAA6RXUU3WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBUGAZDQOBTGI>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Now I get different failures, all related to |
On invalid errorhandler argument, we now return |
test:mpich/ch4/ofi - 2 TIMEOUT due to server overload |
Add MPIR_Request_free_return, which is the same as MPIR_Request_free but will return any error code encountered during free. Right now, it just returns the potential return code from the free_fn for generalized request.
Make MPI_{Test,Wait}{,all,any,some} to return error code from the free_fn of a generalized request.
This is produced by maint/gen_binding_c.py.
The newly added routine mpi_alloc_mem_cptr_ is missing declaration in mpi_fortimpl.h.
Typos from copy pasting resulted in missing pmpi_base.mod rules.
In MPIC_Waitall, presetting the status_array assumes MPIR_Waitall will not modify the status fields for send requests. This is not a good assumption and makes code tricky to maintain. Instead, we let's only call MPIR_Process_status on recv requests by checking request kinds.
The test issues an MPI_Irecv first then an MPI_Irsend, while the partner process will only send the message after receiving from MPI_Irsend. The idea is the MPI_Irsend should complete first. Thus MPI_Waitany should complete MPI_Irsend first. However, it is possible that the progress in MPI_Waitany completes both requests and return the index for MPI_Irecv first -- this is not breaking the standard, strictly speaking. Even though waitany completes the Irsend first most of the time, but some time, especially the ubsan build, seems often will complete both and return the Irecv first, causing CI test failure. Since it is not strictly a standard behavior, we relax the test.
Check the following 3 cases explicitly: * statuses is MPI_STATUSES_IGNORE, just return MPI_ERR_IN_STATUS * mpi_error already set to MPI_ERR_IN_STATUS * mpi_error was MPI_SUCCESS. We need initialize all statuses' MPI_ERROR field to MPI_SUCCESS first.
test:mpich/ch4/ofi |
Pull Request Description
We are supposed to return the error code from the
free_fn
of a generalized request. This was broken in PR #6718Also add to .gitignore the missing entry
src/binding/c/c_binding.c
.Also fix a warning of missing prototypes in mpif_h binding.
Fixes #6747
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your companies PR approval manager.