Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dlopen RTLD_DEEPBIND flag #849

Merged
merged 4 commits into from Oct 13, 2016

Conversation

Projects
None yet
4 participants
@garlick
Copy link
Member

garlick commented Oct 12, 2016

This is an attempt to solve flux-framework/flux-sched#211

Not sure if this is the right approach, discussion continues in the issue.

@garlick garlick added the review label Oct 12, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 12, 2016

Probably shouldn't have posted this in fact. I'm getting some test failures, e.g.

jimbo:~/proj/flux-core/src/broker > ./test_shutdown.t
1..11
ok 1 - shutdown_encode works
ok 2 - shutdown_decode works
not ok 3 - opened loop connector
#   Failed test 'opened loop connector'
#   at test/shutdown.c line 59.
Bail out!  can't continue without loop handle
@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 13, 2016

Coverage Status

Coverage decreased (-1.8%) to 73.359% when pulling c1c8bb5 on garlick:deepbind into 286b6e7 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 13, 2016

Does this need to be rebased on current master?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

Yes it does. Wow, what was I doing yesterday.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Oct 13, 2016

Current coverage is 71.48% (diff: 80.00%)

Merging #849 into master will decrease coverage by 0.02%

@@             master       #849   diff @@
==========================================
  Files           157        157          
  Lines         26688      26683     -5   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          19086      19075    -11   
- Misses         7602       7608     +6   
  Partials          0          0          
Diff Coverage File Path
••••••• 71% src/modules/live/live.c
•••••••••• 100% src/broker/module.c
•••••••••• 100% src/common/libflux/module.c
•••••••••• 100% src/common/libflux/handle.c

Powered by Codecov. Last update c65b665...220b5c3

@garlick garlick force-pushed the garlick:deepbind branch from c1c8bb5 to 24f1c4c Oct 13, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

The test failure above was actually due to my running the broker unit tests from within a running instance that had since been removed. The test would try to load the loop module from the other instance's lib directory.

I rebased and fixed a problem in the "live" module that cropped up. This is passing tests on my desktop. Let's see how it does in travis.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 13, 2016

Coverage Status

Coverage increased (+0.008%) to 75.12% when pulling 24f1c4c on garlick:deepbind into 57e3a91 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 13, 2016

This works in my environment as well. Nice work, and sorry about the optparse shortcoming!

This also allows most flux-sched tests to pass after removing .la files after install and setting LD_LIBRARY_PATH appropriately (I wonder if we should do this in the flux command driver, like we do for PATH?)

I'm now only getting a failure here:

expecting success: 
    diff -u ${expected_order} ./actual

--- /home/ubuntu/flux-sched/t/data/emulator-data/easy_expected  2016-08-12 17:05:26.000000000 +0000
+++ ./actual    2016-10-13 17:19:05.533150759 +0000
@@ -1,12 +1,12 @@
 1
 2
 3
+4
+5
 6
 7
 8
 9
 10
 11
-4
-5
 12
not ok 5 - jobs scheduled in correct order
#
#           diff -u ${expected_order} ./actual
#

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

Just submitted flux-framework/flux-sched#212 which adds RTLD_DEEPBIND to sched's module loading code. I wonder if that might have something to do with the above failure?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 13, 2016

Just submitted flux-framework/flux-sched#212 which adds RTLD_DEEPBIND to sched's module loading code. I wonder if that might have something to do with the above failure?

I tried again using your flux-sched deepbind branch and still get the same failure.

@garlick garlick force-pushed the garlick:deepbind branch from 24f1c4c to 85d7fd6 Oct 13, 2016

garlick added some commits Oct 12, 2016

broker: add dlopen RTLD_DEEPBIND for modules
Problem: comms modules that use json-c may bind with
jansson functions of the same name from the broker's
global symbol table, resulting in undefined behavior.

Add the RTLD_DEEPBIND flag to the dlopen call used
to load comms modules.
libflux/module: add dlopen RTLD_DEEPBIND flag.
Problem: comms modules that use json-c may bind with
jansson functions of the same name from flux-module's
global symbol table, resulting in undefined behavior.

Add the RTLD_DEEPBIND flag to the dlopen call used
by flux-module to load comms modules.
libflux/handle: add dlopen RTLD_DEEPBIND flag
Problem: connector modules that use json-c may bind with
jansson functions of the same name in an executable's
global symbol table, resulting in undefined behavior.

Add the RTLD_DEEPBIND flag to the dlopen call used
to load connector modules.
modules/live: avoid using optparse for module opts
When experimenting with dlopen flags for modules, for
some reason t0010-generic-utils.t began failing due
to an option parsing error in the 'live' module.

live was using optparse to parse its arguments, but
optparse, which uses getopt(), is not MT-safe.  Switch
the module over to the simplistic key=val argument
parsing idiom used in other modules and update the test.
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

Rebased and added post-install hook to remove .la files. Just need to find where to place LD_LIBRARY_PATH setting in sched test scripts, and see what's up with the above easy_exected test failure.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

Just pushed a commit to flux-framework/flux-sched#212 which sets LD_LIBRARY_PATH in .travis.yml is that a reasonable place to do it?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 13, 2016

Coverage Status

Coverage increased (+0.008%) to 75.147% when pulling 85d7fd6 on garlick:deepbind into c65b665 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 13, 2016

Just pushed a commit to flux-framework/flux-sched#212 which sets LD_LIBRARY_PATH in .travis.yml is that a reasonable place to do it?

Fine with me!

@garlick garlick force-pushed the garlick:deepbind branch from 85d7fd6 to 220b5c3 Oct 13, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

OK, I backed that last commit out since it broke distcheck.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Oct 13, 2016

Coverage Status

Coverage decreased (-0.03%) to 75.109% when pulling 220b5c3 on garlick:deepbind into c65b665 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

I think with sched's travis.yml setting LD_LIBRARY_PATH and removing the installed flux-core .la files, we are OK, modulo the failing test. Want to merge this and sort out the test separately?

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Oct 13, 2016

Ok, looks good now.

@grondo grondo merged commit 90f1cb3 into flux-framework:master Oct 13, 2016

4 checks passed

codecov/patch 80.00% of diff hit (target 71.51%)
Details
codecov/project Absolute coverage decreased by -0.02% but relative coverage increased by +8.48% compared to c65b665
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.03%) to 75.109%
Details

@grondo grondo removed the review label Oct 13, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Oct 13, 2016

Thanks!

@garlick garlick referenced this pull request Oct 26, 2016

Closed

0.5.0 release notes #879

@garlick garlick deleted the garlick:deepbind branch Oct 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.