
Prevent GCC optimization on shared variables #625

Closed · wants to merge 1 commit

Conversation

@oere commented Jun 4, 2015

The *_invoked variables are used in different code paths that the compiler does not know about. For a detailed description, see the users mailing list thread "Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations": http://www.open-mpi.org/community/lists/users/2015/06/27039.php
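For context, here is a minimal sketch of the workaround this PR proposes. The struct and field names are illustrative, based on the mailing-list discussion, and are not the exact Open MPI declarations:

```c
/* Sketch only: the allocator hooks set these flags from inside
 * posix_memalign()/malloc()/free(), which the compiler cannot see,
 * so without volatile GCC may cache the flags in registers and
 * wrongly conclude the hooks never fired. */
typedef struct {
    volatile int memalign_invoked;
    volatile int malloc_invoked;
    volatile int realloc_invoked;
    volatile int free_invoked;
    volatile int munmap_invoked;
} memory_linux_flags_t;   /* illustrative name */
```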
@mellanox-github commented

Build started, sha1 is merged

@mellanox-github commented

Build finished.

@mellanox-github commented

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/592/

@rolfv (Contributor) commented Jun 4, 2015

I also ran with GPU Direct RDMA and set --mca mpi_leave_pinned 0, and saw the same dreadful results. I will work on disabling GPU Direct RDMA, along with printing a warning, if we detect an mpi_leave_pinned value of 0.

@hppritcha (Member) commented

Thanks very much for the patch, but it seems like a workaround rather than a solution to the problem.

@hppritcha (Member) commented

bot:retest

@lanl-ompi (Contributor) commented

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_distcheck/9/
Test PASSed.

@lanl-ompi (Contributor) commented

Refer to this link for build results (access rights to CI server needed):
http://jenkins.open-mpi.org/job/ompi_master_pr_cle5.2up02/98/
Test PASSed.

@mellanox-github commented

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/593/

@jsquyres (Member) commented Jun 8, 2015

@oere Per discussion on the mailing list, I have two follow-up questions:

  1. Is there a difference between making the individual members volatile vs. making the entire instance volatile?
  2. Can you add a big comment in the code explaining that this is a workaround that shouldn't be necessary but seems to be required on certain platforms, with a URL to this PR and/or the mailing list thread, yadda yadda yadda...? Specifically: when we look back at this code in 6+ months -- when this PR has completely cached out of our brains -- it will be helpful to have a cache-refresh right there in a comment in the code.

Thanks!
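As an illustrative aside for readers following along, the two variants in question 1 look roughly like this; the struct and field names are placeholders, not the real Open MPI declarations:

```c
struct flags {
    int memalign_invoked;
    int malloc_invoked;
};

/* (a) Entire instance volatile: every member access through this
 * object is a volatile access, because the lvalue is
 * volatile-qualified. */
volatile struct flags instance_volatile;

/* (b) Individual members volatile: the qualifier is part of the
 * type, so every instance gets volatile semantics for those fields. */
struct flags_v {
    volatile int memalign_invoked;
    volatile int malloc_invoked;
};
struct flags_v members_volatile;
```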

@jsquyres (Member) commented Jun 8, 2015

@oere One more question... I'm having difficulty reproducing this behavior with gcc 4.9.2 on RHEL 6.5. I've made a trivial git repo with some code to try to reproduce the error. Are you able to reproduce the behavior with it? See https://github.com/jsquyres/gcc-posix-memalign-side-effects

@ggouaillardet (Contributor) commented

@jsquyres here is my reproducer (recompiled gcc 4.9.2 on RHEL 7), also sent to gcc-bugs (at) gcc.gnu.org: https://gcc.gnu.org/ml/gcc-bugs/2015-06/msg00757.html

```c
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int global;

/* Allocator hook: sets the flag as a side effect that the optimizer
 * cannot see through posix_memalign(). */
void *hook (size_t alignment, size_t size, const void *caller) {
    global = 1;
    return NULL;
}

int main (int argc, char *argv[]) {
    void *c;
    global = 0;
    printf ("global = %d\n", global);
    __memalign_hook = hook;
    if (0 == global) {
        posix_memalign (&c, 0x1000, 1);
        /* With optimization enabled, GCC may assume global is still 0
         * here, even though the hook just stored 1 to it. */
        if (0 != global)
            printf ("changed !\n");
        printf ("global = %d\n", global);
    }
    return 0;
}
```

For the time being, a "softer" approach could be to update ptmalloc2.c only:

```c
int *volatile memalign_invoked = &mca_memory_linux_component.memalign_invoked;
```

and replace

```c
mca_memory_linux_component.memalign_invoked
```

with

```c
*memalign_invoked
```
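A self-contained sketch of how that "softer" approach would read in context; the struct, helper name, and call site here are stand-ins, not the actual ptmalloc2.c code:

```c
#include <stdlib.h>

/* Illustrative stand-in for the real component struct. */
static struct { int memalign_invoked; } mca_memory_linux_component;

static void check_memalign_hook (void)   /* hypothetical helper */
{
    /* The pointer itself is volatile, so the compiler must reload it
     * for every access and cannot cache the flag it points at across
     * the call that fires __memalign_hook. */
    int *volatile memalign_invoked =
        &mca_memory_linux_component.memalign_invoked;

    void *p = NULL;
    *memalign_invoked = 0;
    posix_memalign (&p, 0x1000, 1);   /* may run the hook */
    if (0 != *memalign_invoked) {
        /* hook observed: memory interception is working */
    }
    free (p);
}

int main (void)
{
    check_memalign_hook ();
    return 0;
}
```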

@goodell (Member) commented Jun 9, 2015

I find the "volatile pointer to integer" change to be a bit odd, but I'm not sure that I care very much.

What I do want to point out is that

```c
int global;
```

should be

```c
int global = 0;
```

or

```c
static int global;
```

otherwise you'll create a new common symbol 😞

Too bad that the followup response you received was nearly useless in terms of clarifying our questions.
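Background on the common-symbol point, as an illustrative aside: GCC of that era defaulted to -fcommon, so a tentative definition like `int global;` at file scope becomes a common symbol that the linker silently merges with any other symbol of the same name:

```c
/* file_a.c */
int global;    /* tentative definition -> common symbol */

/* file_b.c */
int global;    /* also a common symbol: instead of reporting a
                * duplicate definition, the linker merges the two,
                * so unrelated code can silently share this variable */
```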

@jsquyres (Member) commented

@ggouaillardet Yeah, the followup from Andrew Pinski was pretty useless. 😞 Can you ping them again to try to get more info?

How does your "softer" approach compare to what I suggested in #625 (comment)?

@ggouaillardet (Contributor) commented

I pinged the gcc folks again.

Both approaches are fine. Mine is "even softer", since mca_memory_linux_component.memalign_invoked and friends are treated as volatile only in opal_memory_linux_ptmalloc2_open. That being said, it might not be the most readable/maintainable option...

@hppritcha (Member) commented

bot:retest

@hppritcha (Member) commented

bot:retest

@hppritcha (Member) commented

bot:retest

@jsquyres (Member) commented Aug 7, 2015

Did we ever come to consensus on this one -- i.e., is the approach in this PR the one we want to go with, or the one I proposed, or the one @ggouaillardet proposed? At a minimum, I'd like to see a big comment in the code explaining the issue, because it's subtle and we're likely to forget the details over time (I already have!).

@oere (Author) commented Aug 11, 2015

It is still an issue, of course. Recently, I tested it with GCC 5.2.0 and Open MPI 1.8.8 on ConnectX-4 EDR:

```
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       4.19
2                       8.29
4                      16.48
8                      33.35
16                     66.62
32                    126.51
64                    249.45
128                   426.67
256                   833.77
512                  1375.08
1024                 2621.38
2048                 4418.90
4096                 6508.30
8192                 8913.39
16384                6366.60
32768                9471.91
65536               11311.48
131072              11447.47
262144               9319.38
524288               8520.66
1048576              9892.52
2097152             11650.68
4194304             11616.53
```

vs. with the patch applied:

```
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       4.01
2                       8.39
4                      17.08
8                      32.88
16                     65.31
32                    119.55
64                    243.24
128                   434.33
256                   842.77
512                  1476.09
1024                 2668.46
2048                 4490.37
4096                 6660.48
8192                 9070.76
16384                9051.34
32768               10908.53
65536               11238.86
131072              11409.24
262144              11495.63
524288              11539.83
1048576             11558.65
2097152             11546.61
4194304             11554.48
```

I think the final fix should be aligned with whatever similar workarounds already exist elsewhere in the Open MPI code, though I'm not aware of them myself.

@rolfv (Contributor) commented Oct 8, 2015

This has been fixed in master and in the v2.0.0 and v1.10.1 releases. Therefore, closing this bug.

@rolfv closed this on Oct 8, 2015

@jsquyres added a commit to jsquyres/ompi referencing this pull request on Nov 10, 2015