Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add valgrind wrap specification for dlclose in sched/plugin.c #331

Merged
merged 3 commits into from May 3, 2018

Conversation

Projects
None yet
4 participants
@grondo
Copy link
Contributor

grondo commented May 1, 2018

@dongahn, can you try these two commits and see if they have any effect on the results you are seeing with t5000-valgrind.t?

IIUC, this should allow flux-sched valgrind test to work with even a flux-core that was not configured with valgrind/valgrind.h. Once sched.so is dlopened by flux-broker, the valgrind spec added to sched/plugin.c should become active and reroute future calls to dlclose() to the noop wrapped version.

You will have to apply these two commits and rerun ./autogen and ./configure. Then ensure that valgrind/valgrind.h was found with

$ grep VALGRIND config.h
#define HAVE_VALGRIND_VALGRIND_H 1

Then retry the valgrind tests and ensure there are no false positives/missing symbols.

Thanks!

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented May 1, 2018

Codecov Report

Merging #331 into master will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
- Coverage   73.68%   73.67%   -0.01%     
==========================================
  Files          56       56              
  Lines        9811     9812       +1     
==========================================
  Hits         7229     7229              
- Misses       2582     2583       +1
Impacted Files Coverage Δ
sched/plugin.c 55.51% <0%> (-0.23%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6c99a5...af2320b. Read the comment docs.

@coveralls

This comment has been minimized.

Copy link

coveralls commented May 1, 2018

Coverage Status

Coverage decreased (-0.008%) to 75.22% when pulling af2320b on grondo:valgrind-wrap-dlclose into a6c99a5 on flux-framework:master.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

Thanks @grondo, i will take a look.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

On quartz, module load doesn't seem to add the valgrind header directory.

quartz188{dahn}41: ml valgrind/3.13.0
quartz188{dahn}49: ./configure --prefix=/g/g0/dahn/workspace/planner_correction/inst

<CUT>

checking valgrind/valgrind.h usability... no
checking valgrind/valgrind.h presence... no

So as a sanity check, I did:

quartz188{dahn}50: ./configure CFLAGS="-I/usr/tce/packages/valgrind/valgrind-3.13.0/include" --prefix=/g/g0/dahn/workspace/planner_correction/inst

<CUT>

checking valgrind/valgrind.h usability... yes
checking valgrind/valgrind.h presence... no
configure: WARNING: valgrind/valgrind.h: accepted by the compiler, rejected by the preprocessor!
configure: WARNING: valgrind/valgrind.h: proceeding with the compiler's result

And valgrind test is okay now.

PASS: t5000-valgrind.t 1 - found executable flux-broker
PASS: t5000-valgrind.t 2 - valgrind reports no new errors on single broker run

But w/ the above method, some regression tests failed on quartz:

FAIL: t1007-exclude.t 1 - excluding a node with no job allocated or reserved works
FAIL: t1007-exclude.t 2 - including an excluded node works
FAIL: t1007-exclude.t 3 - excluding a node does not kill the jobs
FAIL: t1007-exclude.t 4 - -k option kills the jobs using the node
FAIL: t1007-exclude.t 5 - excluding a node with reservations works

This could be a side-effect of CFLAGS="-I/usr/tce/packages/valgrind/valgrind-3.13.0/include", though.

Maybe we should add --with-valgrind-header=<dir> option? I'm wondering if there is a valgrind m4 script out there that has this option + other options...

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 1, 2018

Does module load appropriately set PKG_CONFIG_PATH? I notice there is a pkg-config file for valgrind, so maybe we could just add a PKG_CHECK_MODULES stanza for valgrind.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

It doesn't look like ml is setting it:

quartz188{dahn}68: echo $PKG_CONFIG_PATH
/g/g0/dahn/workspace/planner_correction/inst/lib/pkgconfig

I will manually set it to see if my build will like it.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 1, 2018

Well we don't check for PKG_CHECK_MODULES (valgrind, ...) (yet), but if PKC_CONFIG_PATH was set properly then you could at least do CPPFLAGS=$(pkg-config --cflags valgrind)

I wonder how we open issue to get these packages fixed?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

@lee218llnl: we see an issue with valgrind TCE package on quartz. How do we open an issue for TCE packages?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

@grondo:

quartz188{dahn}174: pkg-config --cflags valgrind
-I/usr/tce/packages/valgrind/valgrind-3.13.0/include/valgrind

So I made mods like this:

--- a/configure.ac
+++ b/configure.ac
@@ -45,6 +45,8 @@ AC_CHECK_LIB([dl], [dlerror],
 PKG_CHECK_MODULES([HWLOC], [hwloc >= 1.11.1], [], [])
 PKG_CHECK_MODULES([JANSSON], [jansson >= 2.6], [], [])
 PKG_CHECK_MODULES([XML2], [libxml-2.0])
+PKG_CHECK_MODULES([VALGRIND], [valgrind])
+CPPFLAGS=$(pkg-config --cflags valgrind)
 AX_PROG_LUA([5.1],[5.3])
 AX_LUA_HEADERS
 AX_LUA_LIBS
@@ -73,7 +75,7 @@ AC_CHECK_HEADERS([\
   unistd.h \
   readline/readline.h \
   readline/history.h \
-  valgrind/valgrind.h \
+  valgrind.h \
 ]) 
--- a/sched/plugin.c
+++ b/sched/plugin.c
@@ -29,8 +29,8 @@
 #include <flux/core.h>
 #include <czmq.h>

-#if HAVE_VALGRIND_VALGRIND_H
-#include <valgrind/valgrind.h>
+#if HAVE_VALGRIND_H
+#include <valgrind.h>

@@ -437,7 +437,7 @@ void sched_plugin_loader_destroy (struct sched_plugin_loader *sploader)
     }
 }

-#if HAVE_VALGRIND_VALGRIND_H
+#if HAVE_VALGRIND_H
 /* Disable dlclose() during valgrind operation
  */
 void I_WRAP_SONAME_FNNAME_ZZ(Za,dlclose)(void *dso) {}
--- a/t/t5000-valgrind.t
+++ b/t/t5000-valgrind.t
@@ -16,7 +16,7 @@ fi
 #  However, allow run to be forced on the cmdline with -d, --debug.
 #
 have_valgrind_h() {
-    grep -q "^#define HAVE_VALGRIND_VALGRIND_H" ${SHARNESS_BUILD_DIRECTORY}/config.h
+    grep -q "^#define HAVE_VALGRIND_H" ${SHARNESS_BUILD_DIRECTORY}/config.h
 }

With this, the new valgrind test succeeds on quartz:

PASS: t5000-valgrind.t 1 - found executable flux-broker
PASS: t5000-valgrind.t 2 - valgrind reports no new errors on single broker run

Do you want me to push this somewhere you can pick?

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 1, 2018

It's here.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 1, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 2, 2018

FYI TCE valgrid package issue has been fixed on Quartz and will be propagated to the rest of TOSS3 system in a month.

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 2, 2018

I also got the valgrind-devel package added to TOSS so at least on TOSS3 valgrind.h will be found by default without mucking with the environment.

grondo added some commits May 1, 2018

sched: add valgrind wrap for dlclose in plugin.c
Work around missing symbols when flux/flux-sched is run under
valgrind by using valgrind's wrap specification to disable
dlclose(), but only when run under valgrind.

This may enable the flux-sched valgrind test to work even under
flux-core which was not configured with valgrind/valgrind.h (i.e.
once sched.so is loaded, all calls to dlclose() should be
rerouted to dlclose() noop wrapper.)
t5000-valgrind.t: check for HAVE_VALGRIND in config.h
In t5000-valgrind.t switch from kludgy check for valgrind wrap
symbol in flux-broker binary to using the same test as flux-core,
i.e. check that valgrind.h was found by configure.

This works now because sched is installing its own valgrind
wrap spec in sched/plugin.c.
m4: add ax_valgrind.m4
Add AX_VALGRIND macro to condense the series of checks for
valgrind.h, resulting in

 1. try pkg-config, look for either valgrind.h or valgrind/valgrind.h
     if found using VALGRIND_CFLAGS
 2. fall back to looking for valgrind.h or valgrind/valgrind.h without
     pkg-config

@grondo grondo force-pushed the grondo:valgrind-wrap-dlclose branch from 9c7761c to af2320b May 2, 2018

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 3, 2018

@dongahn, sorry this took me awhile to fix up.

I ran into a couple problems with pkg-config. First, by default PKG_CHECK_MODULES makes packages a build requirement, so if valgrind.pc was not found the build would fail. Second, on some platforms valgrind.h or valgrind/valgrind.h may be present without a valgrind.pc in PKG_CONFIG_PATH, so valgrind would not be properly detected.

I tried to come up with a solution that would work for most cases, and placed it into its own macro file as AX_VALGRIND. I'll copy this solution back to flux-core also if you can verify it here before merging.

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 3, 2018

@grondo, looks good to me!

In terms of the commit sequence, though maybe the m4 commit should come first?

@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 3, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor

dongahn commented May 3, 2018

Ah. That makes sense. I am ok as is then. Thanks.

@dongahn dongahn merged commit 6ab67ae into flux-framework:master May 3, 2018

2 of 4 checks passed

codecov/patch 0% of diff hit (target 73.68%)
Details
codecov/project 73.67% (-0.01%) compared to a6c99a5
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.008%) to 75.22%
Details
@grondo

This comment has been minimized.

Copy link
Contributor Author

grondo commented May 3, 2018

Thanks!

@grondo grondo deleted the grondo:valgrind-wrap-dlclose branch May 3, 2018

@grondo grondo referenced this pull request May 11, 2018

Closed

Need 0.5.0 Release #340

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.