Prevent GCC optimization on shared variables #625
The *_invoked variables are used in different code paths the compiler does not know about.
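As a rough illustration of the problem class (not the actual patch), the sketch below shows a flag that is written only through an indirectly called hook, and marks it `volatile` so that every read is a real load from memory. The names `hook_invoked`, `my_hook`, `installed_hook`, and `allocate_something` are hypothetical stand-ins, and using `volatile` here is an assumption about one common remedy, not a statement of what this PR changes. In this small self-contained file the compiler can see the whole call chain, so the qualifier only demonstrates the pattern; in the thread's reproducer the write happens inside a hook reached through posix_memalign.

```c
#include <stddef.h>

/* Hypothetical stand-in for one of the *_invoked flags. The volatile
 * qualifier forces the compiler to reload the value on every read instead
 * of reusing a value it believes cannot have changed. */
static volatile int hook_invoked = 0;

/* Hypothetical hook type, standing in for an allocator hook. */
typedef void (*alloc_hook_fn)(size_t size);

/* The only place the flag is set: a code path reached indirectly. */
static void my_hook(size_t size) {
    (void)size;
    hook_invoked = 1;
}

static alloc_hook_fn installed_hook = NULL;

/* Stand-in for a library routine that calls the installed hook internally. */
static void allocate_something(size_t size) {
    if (installed_hook != NULL) {
        installed_hook(size);
    }
}

/* Returns 1 if the hook ran during the allocation, 0 otherwise. */
int hook_was_invoked(void) {
    hook_invoked = 0;
    installed_hook = my_hook;
    allocate_something(16);
    return hook_invoked; /* volatile read: loaded from memory after the call */
}
```

Calling `hook_was_invoked()` installs the hook, triggers it once, and reports whether the flag was observed as set.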
Build started, sha1 is merged
Build finished.
Refer to this link for build results (access rights to CI server needed):
I also ran with GPU Direct RDMA and set --mca mpi_leave_pinned 0 and saw the dreadful results. I will work on disabling GPU Direct RDMA along with printing a warning if we detect mpi_leave_pinned of 0.
Thanks very much for the patch, but it seems like a workaround rather than a solution to the problem.
bot:retest
@oere Per discussion on the mailing list, I have 2 followup questions:
Thanks!
@oere One more question... I'm having difficulty reproducing this behavior with gcc 4.9.2 on RHEL 6.5. I've made a trivial git repo with some code to try to reproduce the error. Are you able to reproduce the behavior with it? See https://github.com/jsquyres/gcc-posix-memalign-side-effects
@jsquyres Here is my reproducer, shown below (built with a recompiled gcc 4.9.2 on RHEL 7). For the time being, a "softer" approach could be to update ptmalloc2.c only
and replace
with
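#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int global;

/* Replacement memalign hook: its only job is to modify `global`. */
void *hook (size_t alignment, size_t size, const void *caller);
void *hook (size_t alignment, size_t size, const void *caller) {
    global = 1;
    return NULL;
}

int main (int argc, char **argv) {
    void *c;
    global = 0;
    printf ("global = %d\n", global);
    /* __memalign_hook is a legacy glibc hook (removed in glibc 2.34). */
    __memalign_hook = hook;
    if (0 == global) {
        /* The hook is expected to run inside posix_memalign() and set global to 1. */
        posix_memalign(&c, 0x1000, 1);
        /* If the compiler assumes the call cannot modify `global`,
         * it may drop this test and print the stale value below. */
        if (0 != global)
            printf ("changed !\n");
        printf ("global = %d\n", global);
    }
    return 0;
}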
I find the "
What I do want to point out is that int global; should be int global = 0; or static int global; otherwise you'll create a new common symbol 😞
Too bad that the followup response you received was nearly useless in terms of clarifying our questions.
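To make the common-symbol remark concrete, here is a small sketch of the three declaration styles mentioned. The variable and function names are made up for illustration. The behavior described in the comments assumes GCC, where `-fcommon` was the default before GCC 10 and `-fno-common` is the default from GCC 10 on.

```c
/* Tentative definition: with -fcommon this is emitted as a common symbol,
 * which the linker may silently merge with same-named tentative
 * definitions in other translation units. */
int counter_tentative;

/* Explicit initializer: an ordinary (strong) definition, so a second
 * definition elsewhere becomes a link-time error rather than a merge. */
int counter_defined = 0;

/* Internal linkage: the symbol is private to this translation unit and
 * never takes part in cross-file symbol resolution. */
static int counter_private;

/* All three objects have static storage duration and start at zero. */
int counters_sum(void) {
    return counter_tentative + counter_defined + counter_private;
}
```

All three variants hold 0 at program start; the difference is only in how the symbol is emitted and resolved at link time.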
@ggouaillardet Yeah, the followup from Andrew Pinski was pretty useless. 😞 Can you ping them again to try to get more info? What do you think of the "softer" approach, which is what I suggested in #625 (comment)?
I pinged the GCC folks again. Both approaches are fine.
bot:retest
Did we ever come to consensus on this one -- i.e., is the approach in this PR the one we want to go with, or the one I proposed, or the one @ggouaillardet proposed? At a minimum, I'd like to see a big comment in the code explaining the issue, because it's subtle and we're likely to forget the details over time (I already have!).
It is still an issue of course. Recently, I tested it with GCC 5.2.0 and Open MPI 1.8.8 on ConnectX-4 EDR:
vs. patch applied:
I think the final fix should be aligned with any other workarounds needed in the Open MPI code that I'm not aware of.
This has been fixed in master, 2.0.0, and 10.0.1 releases. Therefore, closing this bug.
Topic/patch pml ob1 isend
The *_invoked variables are used in different code paths the compiler does not know about.
For detailed description see users mailing list “Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations” http://www.open-mpi.org/community/lists/users/2015/06/27039.php.