
Prevent GCC optimization on shared variables #625

Closed · wants to merge 1 commit

Conversation

@oere commented Jun 4, 2015

The *_invoked variables are used in different code paths that the compiler does not know about. For a detailed description, see the users mailing list thread "Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations": http://www.open-mpi.org/community/lists/users/2015/06/27039.php
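For context, here is a minimal sketch of the workaround this PR proposes. The struct and field names are illustrative, based on the mailing-list discussion, and are not the exact Open MPI declarations:

```c
/* Sketch only: the allocator hooks set these flags from inside
 * posix_memalign()/malloc()/free(), which the compiler cannot see,
 * so without volatile GCC may cache the flags in registers and
 * wrongly conclude the hooks never fired. */
typedef struct {
    volatile int memalign_invoked;
    volatile int malloc_invoked;
    volatile int realloc_invoked;
    volatile int free_invoked;
    volatile int munmap_invoked;
} memory_linux_flags_t;   /* illustrative name */
```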
@mellanox-github commented

Build started, sha1 is merged

@mellanox-github commented

Build finished.

@mellanox-github commented

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/592/

@rolfv (Contributor) commented Jun 4, 2015

I also ran with GPU Direct RDMA and set --mca mpi_leave_pinned 0, and saw the same dreadful results. I will work on disabling GPU Direct RDMA, along with printing a warning, if we detect an mpi_leave_pinned value of 0.

@hppritcha (Member) commented

Thanks very much for the patch, but it seems like a workaround rather than a solution to the problem.

@hppritcha (Member) commented

bot:retest

@lanl-ompi (Contributor) commented

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_distcheck/9/
Test PASSed.

@lanl-ompi (Contributor) commented

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_cle5.2up02/98/
Test PASSed.

@mellanox-github commented

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/593/

@jsquyres (Member) commented Jun 8, 2015

@oere Per discussion on the mailing list, I have two follow-up questions:

  1. Is there a difference between making the individual members volatile vs. making the entire instance volatile?
  2. Can you add a big comment in the code explaining that this is a workaround that shouldn't be necessary but seems to be required on certain platforms, with a URL to this PR and/or the mailing list thread, yadda yadda yadda...? Specifically: when we look back at this code in 6+ months -- when this PR has completely cached out of our brains -- it will be helpful to have a cache-refresh right there in a comment in the code.

Thanks!
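As an illustrative aside for readers following along, the two variants in question 1 look roughly like this; the struct and field names are placeholders, not the real Open MPI declarations:

```c
struct flags {
    int memalign_invoked;
    int malloc_invoked;
};

/* (a) Entire instance volatile: every member access through this
 * object is a volatile access, because the lvalue is
 * volatile-qualified. */
volatile struct flags instance_volatile;

/* (b) Individual members volatile: the qualifier is part of the
 * type, so every instance gets volatile semantics for those fields. */
struct flags_v {
    volatile int memalign_invoked;
    volatile int malloc_invoked;
};
struct flags_v members_volatile;
```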

@jsquyres (Member) commented Jun 8, 2015

@oere One more question... I'm having difficulty reproducing this behavior with gcc 4.9.2 on RHEL 6.5. I've made a trivial git repo with some code to try to reproduce the error. Are you able to reproduce the behavior with it? See https://github.com/jsquyres/gcc-posix-memalign-side-effects

@ggouaillardet (Contributor) commented

@jsquyres here is my reproducer (recompiled gcc 4.9.2 on RHEL 7), also sent to gcc-bugs (at) gcc.gnu.org: https://gcc.gnu.org/ml/gcc-bugs/2015-06/msg00757.html

```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int global;

/* Allocator hook: sets the flag as a side effect that the optimizer
 * cannot see through posix_memalign(). */
void *hook (size_t alignment, size_t size, const void *caller) {
    global = 1;
    return NULL;
}

int main (int argc, char *argv[]) {
    void *c;
    global = 0;
    printf ("global = %d\n", global);
    __memalign_hook = hook;
    if (0 == global) {
        posix_memalign (&c, 0x1000, 1);
        /* With optimization enabled, GCC may assume global is still 0
         * here, even though the hook just stored 1 to it. */
        if (0 != global)
            printf ("changed !\n");
        printf ("global = %d\n", global);
    }
    return 0;
}
```

For the time being, a "softer" approach could be to update ptmalloc2.c only:

```c
int *volatile memalign_invoked = &mca_memory_linux_component.memalign_invoked;
```

and replace

```c
mca_memory_linux_component.memalign_invoked
```

with

```c
*memalign_invoked
```
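A self-contained sketch of how that "softer" approach would read in context; the struct, helper name, and call site here are stand-ins, not the actual ptmalloc2.c code:

```c
#include <stdlib.h>

/* Illustrative stand-in for the real component struct. */
static struct { int memalign_invoked; } mca_memory_linux_component;

static void check_memalign_hook (void)   /* hypothetical helper */
{
    /* The pointer itself is volatile, so the compiler must reload it
     * for every access and cannot cache the flag it points at across
     * the call that fires __memalign_hook. */
    int *volatile memalign_invoked =
        &mca_memory_linux_component.memalign_invoked;

    void *p = NULL;
    *memalign_invoked = 0;
    posix_memalign (&p, 0x1000, 1);   /* may run the hook */
    if (0 != *memalign_invoked) {
        /* hook observed: memory interception is working */
    }
    free (p);
}

int main (void)
{
    check_memalign_hook ();
    return 0;
}
```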

@goodell (Member) commented Jun 9, 2015

I find the "volatile pointer to integer" change to be a bit odd, but I'm not sure that I care very much.

What I do want to point out is that

```c
int global;
```

should be

```c
int global = 0;
```

or

```c
static int global;
```

otherwise you'll create a new common symbol 😞

Too bad that the followup response you received was nearly useless in terms of clarifying our questions.
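Background on the common-symbol point, as an illustrative aside: GCC of that era defaulted to -fcommon, so a tentative definition like `int global;` at file scope becomes a common symbol that the linker silently merges with any other symbol of the same name:

```c
/* file_a.c */
int global;    /* tentative definition -> common symbol */

/* file_b.c */
int global;    /* also a common symbol: instead of reporting a
                * duplicate definition, the linker merges the two,
                * so unrelated code can silently share this variable */
```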

@jsquyres (Member) commented

@ggouaillardet Yeah, the followup from Andrew Pinski was pretty useless. 😞 Can you ping them again to try to get more info?

How does your "softer" approach compare to what I suggested in #625 (comment)?

@ggouaillardet (Contributor) commented

I pinged the gcc folks again.

Both approaches are fine. Mine is "even softer", since mca_memory_linux_component.memalign_invoked and friends are treated as volatile only in opal_memory_linux_ptmalloc2_open. That being said, it might not be the most readable/maintainable option...

@hppritcha (Member) commented

bot:retest

@hppritcha (Member) commented

bot:retest

@hppritcha (Member) commented

bot:retest

@jsquyres (Member) commented Aug 7, 2015

Did we ever come to consensus on this one -- i.e., is the approach in this PR the one we want to go with, or the one I proposed, or the one @ggouaillardet proposed? At a minimum, I'd like to see a big comment in the code explaining the issue, because it's subtle and we're likely to forget the details over time (I already have!).

@oere (Author) commented Aug 11, 2015

It is still an issue, of course. Recently, I tested it with GCC 5.2.0 and Open MPI 1.8.8 on ConnectX-4 EDR:

```
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       4.19
2                       8.29
4                      16.48
8                      33.35
16                     66.62
32                    126.51
64                    249.45
128                   426.67
256                   833.77
512                  1375.08
1024                 2621.38
2048                 4418.90
4096                 6508.30
8192                 8913.39
16384                6366.60
32768                9471.91
65536               11311.48
131072              11447.47
262144               9319.38
524288               8520.66
1048576              9892.52
2097152             11650.68
4194304             11616.53
```

vs. with the patch applied:

```
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       4.01
2                       8.39
4                      17.08
8                      32.88
16                     65.31
32                    119.55
64                    243.24
128                   434.33
256                   842.77
512                  1476.09
1024                 2668.46
2048                 4490.37
4096                 6660.48
8192                 9070.76
16384                9051.34
32768               10908.53
65536               11238.86
131072              11409.24
262144              11495.63
524288              11539.83
1048576             11558.65
2097152             11546.61
4194304             11554.48
```

I think the final fix should be aligned with whatever similar workarounds already exist elsewhere in the Open MPI code, though I'm not aware of them myself.

@rolfv (Contributor) commented Oct 8, 2015

This has been fixed in master and in the v2.0.0 and v1.10.1 releases. Therefore, closing this bug.

@rolfv closed this on Oct 8, 2015

@jsquyres added a commit to jsquyres/ompi referencing this pull request on Nov 10, 2015