uv_dlopen fails even though it has a valid handle #462

Closed
joeferner opened this Issue Jun 11, 2012 · 9 comments

Comments

Projects
None yet
2 participants
@joeferner

When using node v0.7.9 on OSX built from source when I load my module it fails. When using node v0.7.8 on OSX built from source with the same javascript source it succeeds.

var java = require('java');
module.js:480
  process.dlopen(filename, module.exports);
          ^
Error: dlsym(RTLD_DEFAULT, JLI_MemAlloc): symbol not found
    at Object..node (module.js:480:11)
    at Module.load (module.js:356:32)
    at Function._load (module.js:312:12)
    at Module.require (module.js:362:17)
    at require (module.js:378:17)
    at Object.<anonymous> (/Users/admin/projects/rhesus-scrapp/node_modules/java/lib/nodeJavaBridge.js:5:16)
    at Module._compile (module.js:449:26)
    at Object..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function._load (module.js:312:12)

After running git bisect I tracked it down to a commit made to node that upgraded libuv 039fac633eaf081a96bb1ed8bc7307d03dcbe9d9.

I then played with the code a bit and found that dlopen is returning a valid handle (i.e. lib->handle != NULL) yet dlerror is returning that an error occurred.

@bnoordhuis

This comment has been minimized.

Show comment Hide comment
@bnoordhuis

bnoordhuis Jun 11, 2012

Contributor

Good catch, seems like it's a genuine error after all. Does this patch fix it for you?

diff --git a/src/unix/dl.c b/src/unix/dl.c
index 9eb6600..01796e3 100644
--- a/src/unix/dl.c
+++ b/src/unix/dl.c
@@ -31,6 +31,7 @@ static int uv__dlerror(uv_lib_t* lib);


 int uv_dlopen(const char* filename, uv_lib_t* lib) {
+  dlerror(); /* Reset error status. */
   lib->errmsg = NULL;
   lib->handle = dlopen(filename, RTLD_LAZY);
   return uv__dlerror(lib);
Contributor

bnoordhuis commented Jun 11, 2012

Good catch, seems like it's a genuine error after all. Does this patch fix it for you?

diff --git a/src/unix/dl.c b/src/unix/dl.c
index 9eb6600..01796e3 100644
--- a/src/unix/dl.c
+++ b/src/unix/dl.c
@@ -31,6 +31,7 @@ static int uv__dlerror(uv_lib_t* lib);


 int uv_dlopen(const char* filename, uv_lib_t* lib) {
+  dlerror(); /* Reset error status. */
   lib->errmsg = NULL;
   lib->handle = dlopen(filename, RTLD_LAZY);
   return uv__dlerror(lib);
@joeferner

This comment has been minimized.

Show comment Hide comment
@joeferner

joeferner Jun 12, 2012

At first that's what I thought it was, doesn't seem like resetting the error using dlerror is enough for me. I guess it couldn't hurt though. It could be that the library being loaded (java in my case) is calling dlopen itself and is doing something else with it internally, not really sure.

At first that's what I thought it was, doesn't seem like resetting the error using dlerror is enough for me. I guess it couldn't hurt though. It could be that the library being loaded (java in my case) is calling dlopen itself and is doing something else with it internally, not really sure.

@bnoordhuis

This comment has been minimized.

Show comment Hide comment
@bnoordhuis

bnoordhuis Jun 12, 2012

Contributor

Are you sure it doesn't fix the problem? I was able to reproduce an issue very similar to the one you reported in joyent/node#3392 and 84f0d96 fixed it for me.

Contributor

bnoordhuis commented Jun 12, 2012

Are you sure it doesn't fix the problem? I was able to reproduce an issue very similar to the one you reported in joyent/node#3392 and 84f0d96 fixed it for me.

@joeferner

This comment has been minimized.

Show comment Hide comment
@joeferner

joeferner Jun 13, 2012

Unfortunately I still get the error on OSX when I include that line. On linux it works fine. Checking the result of dlopen seems to be the only reliable way that I can find so far.

Unfortunately I still get the error on OSX when I include that line. On linux it works fine. Checking the result of dlopen seems to be the only reliable way that I can find so far.

@bnoordhuis

This comment has been minimized.

Show comment Hide comment
@bnoordhuis

bnoordhuis Jun 13, 2012

Contributor

I think I know what happens. Am I right when I say that you always get an error when you change RTLD_LAZY to RTLD_NOW?

EDIT: With and without your patch, I mean.

Contributor

bnoordhuis commented Jun 13, 2012

I think I know what happens. Am I right when I say that you always get an error when you change RTLD_LAZY to RTLD_NOW?

EDIT: With and without your patch, I mean.

@joeferner

This comment has been minimized.

Show comment Hide comment
@joeferner

joeferner Jun 26, 2012

If I change RTLD_LAZY to RTLD_NOW I get the same behavior. It requires adding the check for a non-NULL return to get the native module to load.

I also tried the various permutation with node v0.8 and see the same behavior.

If I change RTLD_LAZY to RTLD_NOW I get the same behavior. It requires adding the check for a non-NULL return to get the native module to load.

I also tried the various permutation with node v0.8 and see the same behavior.

@bnoordhuis

This comment has been minimized.

Show comment Hide comment
@bnoordhuis

bnoordhuis Jun 26, 2012

Contributor

The only plausible explanation I can come with up is that dlopen() on OS X sometimes clobbers the dlerror errno, even on success. I haven't been able to reproduce that but maybe it happens when a constructor in lib A (fails to) dlopen another library. Pure speculation, though.

Does this patch address the issue?

diff --git a/src/unix/dl.c b/src/unix/dl.c
index 01796e3..9cc830b 100644
--- a/src/unix/dl.c
+++ b/src/unix/dl.c
@@ -34,7 +34,7 @@ int uv_dlopen(const char* filename, uv_lib_t* lib) {
   dlerror(); /* Reset error status. */
   lib->errmsg = NULL;
   lib->handle = dlopen(filename, RTLD_LAZY);
-  return uv__dlerror(lib);
+  return lib->handle ? 0 : uv__dlerror(lib);
 }

Contributor

bnoordhuis commented Jun 26, 2012

The only plausible explanation I can come with up is that dlopen() on OS X sometimes clobbers the dlerror errno, even on success. I haven't been able to reproduce that but maybe it happens when a constructor in lib A (fails to) dlopen another library. Pure speculation, though.

Does this patch address the issue?

diff --git a/src/unix/dl.c b/src/unix/dl.c
index 01796e3..9cc830b 100644
--- a/src/unix/dl.c
+++ b/src/unix/dl.c
@@ -34,7 +34,7 @@ int uv_dlopen(const char* filename, uv_lib_t* lib) {
   dlerror(); /* Reset error status. */
   lib->errmsg = NULL;
   lib->handle = dlopen(filename, RTLD_LAZY);
-  return uv__dlerror(lib);
+  return lib->handle ? 0 : uv__dlerror(lib);
 }

@joeferner

This comment has been minimized.

Show comment Hide comment
@joeferner

joeferner Jun 27, 2012

Yes that is correct that patch fixes the issue. That's essentially what my pull request does but also cleans up and resets any possible errors in the lib object, similar to what dlerror does.

Yes that is correct that patch fixes the issue. That's essentially what my pull request does but also cleans up and resets any possible errors in the lib object, similar to what dlerror does.

@bnoordhuis

This comment has been minimized.

Show comment Hide comment
@bnoordhuis

bnoordhuis Jun 27, 2012

Contributor

@joeferner Thanks for testing, Joe. Fixed in 1b68434.

Contributor

bnoordhuis commented Jun 27, 2012

@joeferner Thanks for testing, Joe. Fixed in 1b68434.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment