Fixing dlopen() the MPI shared library with RTLD_LOCAL #3705

Closed
dalcinl opened this issue Jun 15, 2017 · 28 comments

7 participants
@dalcinl

commented Jun 15, 2017

The lack of support for dlopen()ing the MPI shared library within a local namespace has been a recurrent issue for implementors of MPI bindings in dynamic languages like R, Julia, and Python. Even the Java JNI bindings you guys distribute have to resort to the hack of re-dlopening the library with RTLD_GLOBAL before invoking MPI_Init(). I complained about this ages ago, it was never fixed, and eventually I rolled my own re-dlopen hack for mpi4py. However, I really HATE it because it is quite fragile. For example, macOS changed the rules at some point. Also, in Linux distros, if the user does not install the openmpi-devel package, the symlink libmpi.so -> libmpi.so.<version> is not available, so the code implementing the hackery has to be updated every time the library version is bumped. BTW, your Java JNI bindings are broken because of this: you should not dlopen("libmpi.so",...), you have to dlopen("libmpi.so.20",...) in Linux to not depend on the openmpi-devel package at runtime.

As a reproducer, you have this piece of Python code.

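import ctypes

# Adjust to the local Open MPI installation prefix.
libdir = "/home/devel/mpi/openmpi/2.1.1/lib/"

# Load libmpi into a local namespace; this is what triggers the failure.
lib = ctypes.CDLL(libdir + "libmpi.so", ctypes.RTLD_LOCAL)
ierr = lib.MPI_Init(None, None)
assert ierr == 0

# Query the library version string to confirm MPI is usable.
r = ctypes.create_string_buffer(4096)
n = ctypes.c_int()
ierr = lib.MPI_Get_library_version(r, ctypes.byref(n))
assert ierr == 0
print(r[0:n.value])

ierr = lib.MPI_Finalize()
assert ierr == 0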

Running it of course fails:

$ python ompi-dlopen.py 
[kw14821:30046] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /home/devel/mpi/openmpi/2.1.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
... lots of output ...
[kw14821:30046] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
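
For context, the re-dlopen workaround mentioned at the top of this report can be sketched roughly as follows. This is a minimal illustration, not the mpi4py or Java JNI code: the helper names `candidate_names` and `load_first` are hypothetical, and the soversion `20` is an assumption taken from the `libmpi.so.20` example in this report.

```python
import ctypes


def candidate_names(base="libmpi", sovers=(20,)):
    """List versioned sonames first, then the unversioned devel symlink.

    The soversion values are assumptions; they change between releases,
    which is exactly why this workaround is fragile.
    """
    return ["%s.so.%d" % (base, v) for v in sovers] + [base + ".so"]


def load_first(names, mode=ctypes.RTLD_GLOBAL):
    """Try each candidate in order and return the first handle that loads.

    Returns None when no candidate can be opened.
    """
    for name in names:
        try:
            return ctypes.CDLL(name, mode=mode)
        except OSError:
            continue
    return None
```

The intended use would be to call `load_first(candidate_names())` before the first MPI call, so that the plugins loaded later can resolve symbols from a globally visible libmpi.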

The root problem is that the *.so files in $prefix/lib/openmpi/ do not explicitly link to the MPI library. The following script hot-fixes the issue using patchelf (version 0.9 required):

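#!/bin/sh
prefix="/home/devel/mpi/openmpi/2.1.1"
# Add libmpi as a needed library of every plugin and point the plugin's
# rpath at its parent directory ($prefix/lib).
for filename in "$prefix"/lib/openmpi/*.so; do
    patchelf --add-needed libmpi.so.20 "$filename"
    patchelf --set-rpath "\$ORIGIN/.." "$filename"
done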

Please note the special rpath I'm adding, $ORIGIN/..; for macOS it should be @loader_path/.. and the script should be based on install_name_tool.
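
The same hot-fix could be driven from Python, which makes it easy to preview the commands before touching any binaries. The sketch below only builds the patchelf argument lists; the function names are hypothetical, and the soname `libmpi.so.20` is the one used in the shell script, so it would need adjusting for other Open MPI versions.

```python
import glob
import os


def find_plugins(prefix):
    """Return the plugin files under <prefix>/lib/openmpi, sorted by name."""
    return sorted(glob.glob(os.path.join(prefix, "lib", "openmpi", "*.so")))


def patchelf_commands(plugins, soname="libmpi.so.20", rpath="$ORIGIN/.."):
    """Build (but do not run) the two patchelf invocations for each plugin."""
    commands = []
    for plugin in plugins:
        commands.append(["patchelf", "--add-needed", soname, plugin])
        commands.append(["patchelf", "--set-rpath", rpath, plugin])
    return commands
```

Each returned list can be passed to `subprocess.check_call` once the output looks right. Passing the arguments as a list avoids the shell, so `$ORIGIN` needs no escaping here.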

After running the shell script above and hot-fixing the plugins, the Python script runs just fine:

$ ./ompi-fix-libs.sh 
$ python ompi-dlopen.py 
Open MPI v2.1.1, package: Open MPI dalcinl@kw14821 Distribution, ident: 2.1.1, repo rev: v2.1.0-100-ga2fdb5b, May 10, 2017

In short, I think the way to fix the issue once and for all (at least for Linux, macOS, and Solaris) is to link the plugins in $prefix/lib/openmpi with -L$BUILDDIR/lib -lmpi -Wl,-rpath,\$ORIGIN/.. in Linux/Solaris and -L$BUILDDIR/lib -lmpi -Wl,-rpath,@loader_path/.. in macOS.
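
To make the per-platform difference explicit, the proposed flags can be written as a small lookup. This is only an illustration of the proposal in this comment, not anything from the Open MPI build system; `plugin_link_flags` is a hypothetical name, and the system strings are the values that Python's `platform.system()` reports.

```python
def plugin_link_flags(builddir, system):
    """Return the extra linker flags proposed for an MCA plugin.

    system is expected to be "Linux", "SunOS" (Solaris) or "Darwin" (macOS).
    """
    if system == "Darwin":
        rpath = "@loader_path/.."
    elif system in ("Linux", "SunOS"):
        rpath = "$ORIGIN/.."
    else:
        raise ValueError("no rpath convention assumed for %r" % system)
    return ["-L%s/lib" % builddir, "-lmpi", "-Wl,-rpath,%s" % rpath]
```

When these flags are pasted into a shell command or a Makefile, the `$` in `$ORIGIN` has to be escaped so that it reaches the linker literally.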

Unfortunately, I cannot offer a patch, I'm not an expert on autotools and I have no idea how to implement these changes. However, I can offer my help to review changes and test them in Linux and macOS.

@gpaulsen

Contributor

commented Jun 19, 2017

It looks like mca_patcher_overwrite.so is really looking for libopen-pal, so perhaps we should add the dependency to THAT rather than libmpi, which has the side effect of loading libopen-pal.

@dalcinl

Author

commented Jun 20, 2017

Well, yes, of course. Ideally, each plugin should link to the exact libs it uses. But IIUC, some other components do depend on libmpi. As I don't know the plugin dependencies well, I cheated a little and linked libmpi to all plugins. This is not the cleanest way of doing it, but certainly the easiest to maintain as you start adding new plugins. Please note the output I pasted is just the first line. Fixing just mca_patcher_overwrite.so will not solve the problem; you will get an error later, at the point some other plugin is loaded.

If you guys decide to fix this issue in any way (either linking libmpi to all plugins, or linking the exact lib each plugin depends on), I can contribute a pure Python script with no external dependencies to be added to your test suite to prevent regressions in the future.
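
Such a regression check might look roughly like the sketch below. It is not the script offered here, only an assumption about its shape: the helper name `dlopen_local_works` is hypothetical, and the check runs in a child process because a failed MPI_Init() aborts the whole process, as the reproducer output shows.

```python
import subprocess
import sys

# Code run in the child: dlopen with RTLD_LOCAL, then init and finalize MPI.
CHILD = r"""
import ctypes, sys
lib = ctypes.CDLL(sys.argv[1], ctypes.RTLD_LOCAL)
if lib.MPI_Init(None, None) != 0:
    sys.exit(1)
if lib.MPI_Finalize() != 0:
    sys.exit(1)
"""


def dlopen_local_works(libpath, timeout=60):
    """Return True if MPI starts and stops cleanly after an RTLD_LOCAL dlopen."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, libpath],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

A test suite would call it with the full path to the installed, versioned libmpi and fail the test when it returns False.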

@rhc54

Member

commented Jun 27, 2017

@dalcinl I have added this to our discussion topics for our July developer's conference. We seem to recall there was a reason we didn't do the linkage, but maybe we can provide a configure flag to make your life easier.

@dalcinl

Author

commented Jun 29, 2017

@rhc54 This is not about making my life easier, but the life of all users who typically have a pre-installed Open MPI they have no control over. If the configure flag is not on by default, installers of Open MPI will likely forget to turn it on, and this issue will continue harming end users ad aeternum.

In general, I see two possible ways to fix things:

  1. Modify every plugin to call dlopen("libname.so.X", RTLD_LOCAL), where libname.so.X is the specific library the plugin depends on (libmpi.so.X or libopen-pal.so.X, etc). These dlopen() calls should be issued in the init routine of every plugin (which I guess you already have, right?). You should also close the library in every plugin's finalize() routine. This is perhaps the cleanest solution, though it looks harder to implement as you need to add these calls in all the plugins. Also, you would be hardwiring the X version of libname.so.X in the library code, which is perhaps undesirable.

  2. Through the linker, as I explained above, adding libmpi.so.X as a needed library, but with an RPATH entry $ORIGIN/.., so it is relative to the plugin install directory <prefix>/lib/openmpi. This approach is arguably easy to automate and would introduce little maintenance burden.
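
To see why the relative RPATH in option (2) points at the right place, it helps to expand it by hand. The sketch below mimics, in a simplified lexical way, how `$ORIGIN` (ELF) or `@loader_path` (Mach-O) is replaced by the directory of the plugin itself. The function name `expand_rpath` and the install path in the usage note are hypothetical, and a real dynamic loader does more than this (for instance, it deals with symlinks).

```python
import posixpath


def expand_rpath(plugin_path, rpath):
    """Expand $ORIGIN or @loader_path relative to the plugin file location."""
    origin = posixpath.dirname(plugin_path)
    for token in ("$ORIGIN", "@loader_path"):
        rpath = rpath.replace(token, origin)
    # Collapse the ".." component so the result is easy to read.
    return posixpath.normpath(rpath)
```

For a plugin installed as `/opt/openmpi/lib/openmpi/mca_patcher_overwrite.so`, the entry `$ORIGIN/..` expands to `/opt/openmpi/lib`, which is where libmpi.so.X lives, and it stays correct if the whole prefix is relocated.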

I would really focus on option (2) and try to understand under which scenarios it would break. Moreover, experienced users and sysadmins have a chance to hotfix things by using ELF or Mach-O binary editing tools like patchelf or install_name_tool, just as I showed above.

@rhc54

Member

commented Jun 29, 2017

I believe you misunderstood me. I was not minimizing your concern, but only echoing a conversation we had on the weekly telecon where I raised this issue. My point was that there was a reason for not doing the linkage, and so we have to be careful here that we don't break other things. Hence, I added it to the conference agenda to ensure we give adequate consideration to the problem before deciding on a path forward.

@jsquyres

Member

commented Jun 29, 2017

I just did a little spelunking to dig into the history of this a bit (as @rhc54 noted, we talked about this on Tuesday at our weekly webex, where @bwbarrett and I felt sure that we made it so that components do not link against their respective project libraries a while ago -- so I dug into the past to find out why we did this). Here's what I found:

@rhc54

Member

commented Jun 29, 2017

After working thru that page, I'm wondering if we can extract a combination that works. If we are in a Linux environment and are not building static, then it appears (if I read the tables correctly) that linking the components to their base library might work and resolve the issue. As I said, we can ponder this more at the conference.

@dalcinl

Author

commented Jun 29, 2017

@jsquyres From your tables, in the second one, entry 15, result (I): is this "other linkers will not work" just speculation, or do you actually have a concrete example of a platform where things are broken? BTW, I'm almost sure this should also work on macOS, though I did not actually try (do you want me to try it and confirm?)

@jsquyres

Member

commented Jun 29, 2017

@dalcinl Yeah, I noticed that same phrase again this morning when I was re-reading that table. I had two thoughts about it:

  1. Same question as you: did Past Jeff have specific systems in mind? I unfortunately do not remember. 😦
  2. What systems do we care about nowadays, and do they all work in case (I)? I.e., should we solve the problem just for the systems we care about, and just make sure it fails gracefully and/or has a workaround for other systems? That might be a possibility.
@jsquyres

Member

commented Jul 13, 2017

We talked about this in detail yesterday at the face to face meeting in Chicago. The prevailing thought was: we don't remember the system(s) that were a problem back in Oct 2010. Doh!

We updated https://github.com/open-mpi/ompi/wiki/Linkers yesterday to say:

Per that thread, the problem that we fixed in 213b5d5 was that we were inconsistent about linking in libPROJECT to components (i.e., some did and some did not). We resolved the situation by making them all not link against libPROJECT. In that thread, Brian cited that there was some platform -- he unfortunately did not cite which platform -- that did not support the components linking against libPROJECT. 😦

So the thought was that we should (re)write the test(s) that were used to generate the tables on that wiki page (e.g., main() calls a library function in a .so that dlopen's a DSO and then calls something in the DSO, and we do some checks to make sure that the shared library doesn't exist in the process memory more than once). Do this with and without linking the DSO to the .so. And then re-run this on all the platforms that we care about today, and see if it's a problem anywhere.
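
One way to picture the "not in process memory more than once" check on Linux is to look at which files are mapped into the process. The sketch below is not the test from ompi-tests; it is an assumed, Linux-only illustration that parses the text of `/proc/<pid>/maps` and collects the distinct paths whose file name starts with a given prefix. The function name `mapped_paths` is hypothetical.

```python
def mapped_paths(maps_text, basename_prefix):
    """Return the distinct mapped file paths whose file name starts with
    basename_prefix, given the text of a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        if path.rsplit("/", 1)[-1].startswith(basename_prefix):
            paths.add(path)
    return paths
```

More than one path in the result would suggest that two different copies of the library were loaded. A single path is consistent with one shared copy, although this check alone does not prove that all symbols resolve to it.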

I uploaded a first cut of this test to ompi-tests/simple/dlopen. Prelim results:

  • RHEL 6 x86_64: works
  • RHEL 7 x86_64: works
  • MacOS 10.12.5 (i.e., my Mac): works
@gpaulsen

Contributor

commented Aug 1, 2017

Wanted to figure out which platform caused problems in the past. Other platforms, please test and update.

@jsquyres

Member

commented Aug 1, 2017

Per 1 Aug webex: I asked @siegmargross to run on his various flavors of Solaris/Linux on x86/SPARC. @bwbarrett will be testing on all the variety of platforms available at Amazon.

@bwbarrett

Member

commented Aug 1, 2017

I tested on Amazon Linux 17.03 (newish kernel), FreeBSD 11, and NetBSD 7.0, and all three worked.

@jsquyres

Member

commented Aug 1, 2017

@shamisp confirms it works on two different Linux/ARM platforms.

@gpaulsen

Contributor

commented Aug 1, 2017

Works on ppc64le on Red Hat Enterprise 7.3
glibc-common-2.17-157.el7.ppc64le
xlc 13.1
Linux kernel 3.10.0

@kawashima-fj

Member

commented Aug 2, 2017

Works on

  • SPARC64 XIfx + Linux 2.6.32 + glibc 2.12 + gcc 4.4.7 / Fujitsu compiler (Fujitsu PRIMEHPC FX100)
  • Thunder X (AArch64) + Linux 4.5.0 + glibc 2.17 + gcc 4.8.5 (CentOS 7.2)
@jsquyres

Member

commented Aug 2, 2017

@siegmargross confirms:

I was lucky and could get access to an old SPARC machine.

loki dlopen_test-1.0.0 123 ./main
main: var: addr: 0x601040, int value: 3
libfoo: var: addr: 0x601040, int value: 3
dso_foo: var: addr: 0x601040, int value: 3

loki dlopen_test-1.0.0 124 uname -a
Linux loki 4.4.74-92.29-default #1 SMP Thu Jun 29 13:06:32 UTC 2017 (561ddb1) x86_64 x86_64 x86_64 GNU/Linux

gcc-7.1.0
# ./main
main: var: addr: 20c64, int value: 3
libfoo: var: addr: 20c64, int value: 3
dso_foo: var: addr: 20c64, int value: 3

# uname -a
SunOS osz.informatik.hs-fulda.de 5.10 Generic_147440-05 sun4u sparc
SUNW,Sun-Fire-V240

gcc-3.4.2
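
The three addresses in each run above are the point of the test: main, the library, and the dlopen'ed DSO all report the same address for `var`, which suggests a single copy of the library in the process. Assuming the output format shown in this comment, the comparison could be automated with something like the sketch below; `parse_report` and `single_copy` are hypothetical names.

```python
import re

# Matches lines such as "libfoo: var: addr: 0x601040, int value: 3";
# the "0x" prefix is optional because the SPARC run prints "addr: 20c64".
LINE = re.compile(r"^(\w+): var: addr: (?:0x)?([0-9a-fA-F]+), int value: (-?\d+)$")


def parse_report(output):
    """Map each reporter (main, libfoo, dso_foo) to an (address, value) pair."""
    report = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:
            who, addr, value = match.groups()
            report[who] = (int(addr, 16), int(value))
    return report


def single_copy(output):
    """True if every reporter saw the same address and value for var."""
    seen = set(parse_report(output).values())
    return len(seen) == 1
```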
@jsquyres

Member

commented Aug 2, 2017

In off-issue discussion (i.e., email), @kawashima-fj mentions the following is necessary to make it work with their compilers (I just want to capture all this data in one place):

Yes, you need to add a CLI option on configure for the Fujitsu compiler. But this is an issue of Libtool and Fujitsu compiler. Not an issue of Open MPI itself.

By default, Fujitsu compiler does not accept -fPIC option. Instead, use -KPIC option. IIRC, Libtool tries to detect an appropriate option and pass it to a compiler to generate position independent code. But Libtool does not know Fujitsu compiler. So -KPIC option is not passed to Fujitsu compiler and non-PIC code is generated.

Fujitsu compiler has -Xg option, which accepts GCC CLI options and GCC language extensions. So manually passing -KPIC or -Xg resolves the problem. Most Fujitsu users use -Xg option to compile open source software (CC="fcc -Xg"). And this GCC compatibility will be enabled by default in the future Fujitsu compiler.

For the moment, it may be sufficient to document this somewhere. @ggouaillardet didn't think it was worth adding additional logic into our configure to handle this case (but I think we could be open to that, perhaps at least until a fix is available upstream from Libtool).

@jsquyres

Member

commented Aug 3, 2017

@hppritcha Points out that if we link the Java bindings against lib OMPI / ORTE / OPAL (i.e., whichever is necessary), the hinkyness of ompi/mpi/java/c/mpi_MPI.c dlopen()ing libmpi.so can probably go away.

@dalcinl

Author

commented Aug 3, 2017

@jsquyres I mentioned Java when I opened this issue, and indeed the Java issues should be fixed. Linking libmpi.so should be enough.

@gpaulsen

Contributor

commented Aug 18, 2017

How much work would it be for someone well versed in configure to add -lmpi to the MCA components to make this "just work"? That's all that's needed, correct?
I'm eager to test a PR if one is available.

@jjhursey jjhursey self-assigned this Aug 21, 2017

@jjhursey

Member

commented Aug 21, 2017

I have a branch with some changes that is passing the posted test. Let me clean it up a bit this morning and I'll post a PR for review today. I might need help cleaning up the Java bindings (I just haven't looked at what needs to happen there yet).

@dalcinl

Author

commented Aug 21, 2017

@jjhursey I think you just need to fix JNI_OnLoad()/JNI_OnUnload() to remove dlopen()/dlclose() calls and related code.

Can you point me to your branch? I would like to give it a try with mpi4py.

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 21, 2017

mca: Dynamic components link against project lib
 * Resolves open-mpi#3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Different project levels link to different sets of libraries by
   using the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`.
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la \
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la \
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la \
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
```

Note: The changes in this commit were automated. Some components
were not included because they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 22, 2017

mca: Dynamic components link against project lib
 * Resolves open-mpi#3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la
```

Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script added in the commit that precedes
it. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
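
The per-project rule in this commit message amounts to a lookup from the top-level source directory of a component to the library added to its `LIBADD`. The sketch below only illustrates that rule; it is not the `libadd_mca_comp_update.py` script, and the function name `libadd_for` and the example path in the usage note are assumptions.

```python
# Library appended to mca_FRAMEWORK_COMPONENT_la_LIBADD, keyed by project.
PROJECT_LIBS = {
    "ompi": "$(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la",
    "orte": "$(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la",
    "opal": "$(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la",
    "oshmem": "$(top_builddir)/oshmem/liboshmem.la",
}


def libadd_for(makefile_am_path):
    """Pick the project library from the first directory of a source path."""
    parts = [p for p in makefile_am_path.split("/") if p not in ("", ".")]
    project = parts[0] if parts else ""
    if project not in PROJECT_LIBS:
        raise ValueError("not inside a known project: %r" % makefile_am_path)
    return PROJECT_LIBS[project]
```

For example, a component whose `Makefile.am` lives under `opal/mca/...` would get the open-pal library, which matches the earlier observation in this thread that mca_patcher_overwrite.so really needs libopen-pal rather than libmpi.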

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 22, 2017

mpi/java: Remove dlopen() workaround
 * See discussion on Issue open-mpi#3705 regarding why this is no longer needed.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>

@jjhursey

Member

commented Aug 25, 2017

PR #4121 merged into master. Re-opening so we can consider it for v3.0 release

@jjhursey jjhursey reopened this Aug 25, 2017

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 25, 2017

mca: Dynamic components link against project lib
 * Resolves open-mpi#3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la
```

Note: The changes in this commit were automated by the
`libadd_mca_comp_update.py` script added in the commit that precedes
it. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>

Local application of the LIBADD script:
`./contrib/libadd_mca_comp_update.py`
Reference `master` commit e1d0795

jjhursey added a commit to jjhursey/ompi that referenced this issue Aug 25, 2017

mpi/java: Remove dlopen() workaround
 * See discussion on Issue open-mpi#3705 regarding why this is no longer needed.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 49c40f0)
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>

@jjhursey

Member

commented Aug 25, 2017

@jsquyres @hppritcha Is this something that you would consider for the v2.x series? If so then I can create a PR there. Otherwise we can just work on getting it in the v3.0.x series.

@jsquyres

Member

commented Aug 25, 2017

I think going with v3.0.x should be fine.

@jsquyres

Member

commented Sep 19, 2017

This is now done in v3.0.x and forward.

@jsquyres jsquyres closed this Sep 19, 2017

nrnhines added a commit to neuronsimulator/nrn that referenced this issue Jan 11, 2019

ParallelContext.mpi_init() initializes MPI for NEURON
So -mpi not required for nrniv launch (but -nobanner useful)
Can use with python without NEURON_INIT_MPI environment variable.
Dynamic loading of libnrnmpi.so on linux does not need LD_LIBRARY_PATH

Note that on linux, openmpi prior to version 3 but not configured with
--with-paranrn=dynamic will produce an error with pc.mpi_init() when
python was launched that begins with:
mca_base_component_repository_open: unable to open mca_patcher_overwrite:
followed by lots of output. Solution is to get openmpi 3 or above or
follow the patch instructions for openmpi at
open-mpi/ompi#3705