valgrind reports no new errors on single broker run #1097

Closed
lipari opened this issue Jun 28, 2017 · 5 comments

lipari commented Jun 28, 2017

While building and checking flux-core on ipa15, I got the following test failure:

lipari@ipa15$ ./t5000-valgrind.t --debug --verbose
sharness: loading extensions from /g/g0/lipari/flux-core/t/sharness.d/01-setup.sh
sharness: loading extensions from /g/g0/lipari/flux-core/t/sharness.d/flux-sharness.sh
expecting success: 
	flux ${VALGRIND} \
		--tool=memcheck \
		--leak-check=full \
		--gen-suppressions=all \
		--trace-children=no \
		--child-silent-after-fork=yes \
		--num-callers=30 \
		--leak-resolution=med \
		--error-exitcode=1 \
		--suppressions=$VALGRIND_SUPPRESSIONS \
		${BROKER} --shutdown-grace=4 ${VALGRIND_WORKLOAD} 10

==132203== Memcheck, a memory error detector
==132203== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==132203== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==132203== Command: /g/g0/lipari/flux-core/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /g/g0/lipari/flux-core/t/valgrind/valgrind-workload.sh 10
==132203== 
FLUX_URI=local:///var/tmp/lipari/flux-RzN8zP
==132203== 
==132203== HEAP SUMMARY:
==132203==     in use at exit: 11,368 bytes in 54 blocks
==132203==   total heap usage: 591,957 allocs, 591,903 frees, 67,393,582 bytes allocated
==132203== 
==132203== 1,280 bytes in 1 blocks are definitely lost in loss record 25 of 27
==132203==    at 0x4C28BE3: malloc (vg_replace_malloc.c:299)
==132203==    by 0xD289E0E: ???
==132203==    by 0xD28B672: ???
==132203==    by 0xD288692: ???
==132203==    by 0xD04A4C9: ???
==132203==    by 0x40ABF8: module_thread (module.c:158)
==132203==    by 0x59D2DC4: start_thread (pthread_create.c:308)
==132203==    by 0x63E776C: clone (clone.S:113)
==132203== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   obj:*
   obj:*
   obj:*
   obj:*
   fun:module_thread
   fun:start_thread
   fun:clone
}
==132203== 3,328 (2,816 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 27 of 27
==132203==    at 0x4C28BE3: malloc (vg_replace_malloc.c:299)
==132203==    by 0xD289E0E: ???
==132203==    by 0xD28A3D9: ???
==132203==    by 0xD287DBC: ???
==132203==    by 0xD04A493: ???
==132203==    by 0x40ABF8: module_thread (module.c:158)
==132203==    by 0x59D2DC4: start_thread (pthread_create.c:308)
==132203==    by 0x63E776C: clone (clone.S:113)
==132203== 
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   obj:*
   obj:*
   obj:*
   obj:*
   fun:module_thread
   fun:start_thread
   fun:clone
}
==132203== LEAK SUMMARY:
==132203==    definitely lost: 4,096 bytes in 2 blocks
==132203==    indirectly lost: 512 bytes in 1 blocks
==132203==      possibly lost: 0 bytes in 0 blocks
==132203==    still reachable: 6,760 bytes in 51 blocks
==132203==         suppressed: 0 bytes in 0 blocks
==132203== Reachable blocks (those to which a pointer was found) are not shown.
==132203== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==132203== 
==132203== For counts of detected and suppressed errors, rerun with: -v
==132203== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
not ok 1 - valgrind reports no new errors on single broker run
#	
#		flux ${VALGRIND} \
#			--tool=memcheck \
#			--leak-check=full \
#			--gen-suppressions=all \
#			--trace-children=no \
#			--child-silent-after-fork=yes \
#			--num-callers=30 \
#			--leak-resolution=med \
#			--error-exitcode=1 \
#			--suppressions=$VALGRIND_SUPPRESSIONS \
#			${BROKER} --shutdown-grace=4 ${VALGRIND_WORKLOAD} 10
#	

# failed 1 among 1 test(s)
1..1

lipari commented Jun 28, 2017

For the record, got the same failure on hype2.

grondo commented Jun 28, 2017

Oh, oops, the build must not have found valgrind.h on these systems so modules ended up being dlclosed. I'll rerun with dlclose commented out.

grondo commented Jun 28, 2017

Hm, this might be a false positive. When I build with dlclose enabled in the broker, I reproduce @lipari's result above, but once I comment out the dlclose, valgrind runs clean.

grondo commented Jun 28, 2017

Not sure what to do here. There are a couple of approaches I can think of:

  • Disable valgrind test when valgrind.h wasn't found (with an option to force-run it). Would probably have to grep for #undef HAVE_VALGRIND_VALGRIND_H or use CPP to check config/config.h
  • Add some kind of suppression for this specific case, so that errors from unknown modules are suppressed. I'm not sure this approach is even possible.
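
The first option could be sketched roughly as below. This is only an illustration, not the change that was committed: `have_valgrind_h`, `should_run_valgrind_test`, `FORCE_VALGRIND` and `CONFIG_H` are hypothetical names, and the `config/config.h` path is an assumption about the build tree. It relies on autoconf writing `#define HAVE_VALGRIND_VALGRIND_H 1` when the header is found and a commented-out `#undef` when it is not.

```shell
#!/bin/sh
# Sketch only: decide whether a valgrind test should run, based on config.h.
# All function and variable names here are hypothetical, not flux-core's.

# Succeeds if the given config.h defines HAVE_VALGRIND_VALGRIND_H.
# Autoconf emits "#define NAME 1" when the header was found and
# "/* #undef NAME */" otherwise, so an anchored match on #define is enough.
have_valgrind_h() {
    grep -q '^#define HAVE_VALGRIND_VALGRIND_H ' "$1" 2>/dev/null
}

# Succeeds if the test should run: either the run is forced by hand
# (FORCE_VALGRIND is non-empty) or the header was found at configure time.
should_run_valgrind_test() {
    [ -n "${FORCE_VALGRIND:-}" ] || have_valgrind_h "$1"
}

# Assumed default location of config.h; override with CONFIG_H.
if should_run_valgrind_test "${CONFIG_H:-config/config.h}"; then
    echo "running valgrind test"
else
    echo "skipping valgrind test: valgrind/valgrind.h not found by configure"
fi
```

In a sharness script the skip branch would more likely set `skip_all` and call `test_done` instead of just printing a message, and the force switch could be tied to an existing test flag rather than a new environment variable.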

garlick commented Jun 28, 2017

The first option seems like not a bad quick fix.

grondo added a commit to grondo/flux-core that referenced this issue Jun 28, 2017
Disable t5000-valgrind.t by default when valgrind/valgrind.h was
not found by ./configure. This means that the valgrind hook to
disable dlclose() for modules is not active, and this has been found
to cause false positives for this test.

The test can still be forced by hand with the -d (--debug) flag,
e.g.

 ./t5000-valgrind.t -d

Fixes flux-framework#1097