-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rmaps/ppr: Fix case where oversubscribe is ignored. #1327
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this actually fixes the error you reference - e.g., it doesn't fix the cmd line in the deprecated case (the first case the user reported). I'm also not sure it really fixes the oversubscribed case - it will suppress the error, but I'm not sure it actually maps correctly.
What is the value of node->slots_available
in this scenario? I suspect it is zero, which means the entire loop is getting ignored and we simply drop down to the "overflow" logic.
@rhc54 I verified it produced the same results as the OMPI v4.1 branch with orte. This code/logic is basically the same as that (minus the change referenced in this commit). So I am just restoring old behavior there. Now, maybe that old behavior is incorrect. You are right in that this doesn't address the command line, but that can come in a separate commit/fix. Here is the output from v4.1 on an init/finalize program:
with this patch on ompi main:
if the old behavior is incorrect, I can try and fix the logic to be what it should be. |
I'm not saying the old behavior is incorrect. I'm only pointing out that I didn't change the logic you reference and now modify. The reason it is now generating an error is because In other words, your "fix" just disables the loop without generating an error. This seems kinda weird and likely to cause problems in the future. It also makes me wonder if we have other problems that will show as soon as someone types a slightly different cmd line. The critical question is: is |
@rhc54 no - slots available is 44.
|
I really had to scratch my head over this one - the change I made couldn't possibly have resulted in the change in behavior being reported in the cited OMPI issue. I had to go back and look at the difference between what is currently in OMPI v4.x vs what is in PRRTE - and there it is. The difference is simply that OMPI is wrong - it doesn't check for oversubscribe conditions at all and just puts the procs on the node, regardless of the number of slots. PRRTE had been changed to detect oversubscription and error out if that was found, which is correct. However, it should also have checked if the user permitted oversubscribe. So the referenced change you cite above actually didn't cause the problem - PRRTE had this bug for a long time. This change should fix that problem. You might want to go back and fix the OMPI v4 series as it is incorrect. |
Actually, I correct that statement. ORTE checks the oversubscribe condition later, after all the procs have been placed on the node. In fact, PRRTE does as well, so the correct fix is to simply remove that code block - PRRTE will detect oversubscribe and deal with it down below. There are still two problems with the code. First, IIRC even specifying "oversubscribe allowed" cannot override a provided max-slots value. The check in the code block starting at line 380 doesn't seem quite correct in that regard. We should check the other mappers to see how they interpret the max-slots limit to ensure this code is consistent. Second, it isn't clear to me that we correctly recover the slots if we rule that the job violates oversubscription limits and we terminate the mapping procedure. It looks like "slots_inuse" has been modified, but I don't see any place where it gets reset. Might happen in the state machine when the job gets reported as unable to map and the job object gets cleaned up, but I haven't verified that. Regardless, the correct solution here is to remove the code block starting at line 315, and then check the max-slots interpretation. |
There was a case where prte was returning an error on a failure to map even though the user passed `:OVERSUBSCRIBE` to the command line. Introduced by: 8a3e938 Refs: open-mpi/ompi#10216 Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
71c6488
to
5ca431c
Compare
For what it is worth, the |
Sigh - okay, I'll take a look at them. Guess I'll also look into the |
I took a look and those files are fine - they check the oversubscription condition on-the-fly. The ppr case assigns all the procs to the node and then checks for oversubscription. The extra code block in ppr was causing it to do the check both on-the-fly and at the end, which is unnecessary. It makes sense for seq and rank_file to do it on-the-fly because they don't work on a per-node basis, but rather assign one proc at a time to whatever node was given. |
FWIW: max_slots appears to be treated consistently across the mappers, although I'm not sure I agree with the treatment. Still, it has been that way for quite some time - no point in changing it now. |
There was a case where prte was returning an error
on a failure to map even though the user passed
:OVERSUBSCRIBE
to the command line.Introduced by: 8a3e938
The fix is easy, just check if :OVERSUBSCRIBE is set
in the run command, and not return an error if so.
Refs: open-mpi/ompi#10216
Signed-off-by: Austen Lauria awlauria@us.ibm.com