@hjelmn
Member

@hjelmn hjelmn commented Apr 9, 2015

This commit fixes several valgrind errors. Included:

  • installdirs did not correctly reinitialize all pointers to NULL
    at close. This causes valgrind errors on a subsequent call to
    opal_init_tool.
  • several opal strings were leaked by opal_deregister_params which
    was setting them to NULL instead of letting them be freed by the
    MCA variable system.
  • move opal_net_init to AFTER the variable system is initialized and
    opal's MCA variables have been registered. opal_net_init uses a
    variable registered by opal_register_params!
  • do not leak ompi_mpi_main_thread when it is allocated by
    MPI_T_init_thread.
  • do not overwrite ompi_mpi_main_thread if it is already set (by
    MPI_T_init_thread).
  • mca_base_var: read_files was overwriting mca_base_var_file_list
    even if it was non-NULL.
  • mca_base_var: set all file global variables to initial states on
    finalize.
  • btl/vader: decrement enumerator reference count to ensure that it
    is freed.
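
For illustration, two of the patterns in the list above (not overwriting a global that is already set, and returning file-scope state to its initial value on finalize) can be sketched in plain C. This is not the Open MPI code: `file_list`, `init_count`, `subsystem_init`, `subsystem_finalize`, and `demo_cycle` are hypothetical names chosen for the sketch, with `file_list` loosely standing in for a global such as mca_base_var_file_list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file-scope state (illustration only, not Open MPI symbols). */
static char *file_list = NULL;
static int init_count = 0;

/* Allocate the global only if an earlier initializer has not already set it.
 * Overwriting a non-NULL pointer here would leak the first allocation. */
static int subsystem_init(const char *files)
{
    if (NULL == file_list) {
        size_t len = strlen(files) + 1;
        file_list = malloc(len);
        if (NULL == file_list) {
            return -1;
        }
        memcpy(file_list, files, len);
    }
    ++init_count;
    return 0;
}

/* Free the allocation and reset every global to its initial state, so a
 * later init does not see a stale (dangling) pointer. */
static void subsystem_finalize(void)
{
    free(file_list);
    file_list = NULL;
    init_count = 0;
}

/* Run two init/finalize cycles; returns 0 if the state behaved as expected. */
static int demo_cycle(void)
{
    if (0 != subsystem_init("a.conf") || 0 != subsystem_init("b.conf")) {
        return -1;
    }
    /* The second init must not have replaced the first value. */
    assert(0 == strcmp(file_list, "a.conf"));
    subsystem_finalize();
    assert(NULL == file_list && 0 == init_count);

    /* A fresh cycle starts from a clean slate. */
    if (0 != subsystem_init("c.conf")) {
        return -1;
    }
    assert(0 == strcmp(file_list, "c.conf"));
    subsystem_finalize();
    return 0;
}
```

Calling `demo_cycle()` from a `main` and running the result under `valgrind --leak-check=full` should show no leaks or invalid reads for this sketch. Removing the NULL check in `subsystem_init`, or the reset in `subsystem_finalize`, reproduces in miniature the kind of leak or stale-pointer report described above.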

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

@hjelmn
Member Author

hjelmn commented Apr 9, 2015

Now running valgrind clean for a simple program executing:

MPI_T_init_thread();
MPI_T_finalize();

Probably bound for 1.8 if @rhc54 approves.

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/428/

Build Log
last 50 lines

[...truncated 21911 lines...]
7ffff7df6000-7ffff7ff6000 ---p 000ef000 08:06 263055                     /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-rte.so.0.0.0
7ffff7ff6000-7ffff7ffb000 rw-p 000ef000 08:06 263055                     /scrap/jenkins/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-rte.so.0.0.0
7ffff7ffb000-7ffff7ffe000 rw-p 00000000 00:00 0 
7ffff7ffe000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffff8000000-7ffff8200000 rw-s 00115000 00:05 13822                      /dev/infiniband/uverbs1
7ffff8200000-7ffff8400000 rw-s 00115000 00:05 13822                      /dev/infiniband/uverbs1
7ffffffea000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[jenkins01:27310] *** Process received signal ***
[jenkins01:27310] Signal: Aborted (6)
[jenkins01:27310] Signal code:  (-6)
[jenkins01:27310] [ 0] /lib64/libpthread.so.0[0x3d6980f710]
[jenkins01:27310] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3d69032925]
[jenkins01:27310] [ 2] /lib64/libc.so.6(abort+0x175)[0x3d69034105]
[jenkins01:27310] [ 3] /lib64/libc.so.6[0x3d69070837]
[jenkins01:27310] [ 4] /lib64/libc.so.6[0x3d69076166]
[jenkins01:27310] [ 5] /lib64/libc.so.6[0x3d69078c93]
[jenkins01:27310] [ 6] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-pal.so.0(opal_free+0x1f)[0x7ffff7a4a0e7]
[jenkins01:27310] [ 7] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-pal.so.0(opal_finalize_util+0xc4)[0x7ffff79fb1a8]
[jenkins01:27310] [ 8] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-pal.so.0(opal_finalize+0xf6)[0x7ffff79fb2b3]
[jenkins01:27310] [ 9] /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib/libopen-rte.so.0(orte_finalize+0xd4)[0x7ffff7d1e730]
[jenkins01:27310] [10] mpirun[0x40590c]
[jenkins01:27310] [11] mpirun[0x4037a4]
[jenkins01:27310] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3d6901ed1d]
[jenkins01:27310] [13] mpirun[0x4036c9]
[jenkins01:27310] *** End of error message ***
Build step 'Execute shell' marked build as failure
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Saving reports...
Processing '/var/lib/jenkins/jobs/gh-ompi-master-pr/builds/428/tap-master-files/cov_stat.tap'
Parsing TAP test result [/var/lib/jenkins/jobs/gh-ompi-master-pr/builds/428/tap-master-files/cov_stat.tap].
not ok - coverity detected 911 failures in all_428 # SKIP http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_428/output/errors/index.html
not ok - coverity detected 5 failures in oshmem_428 # TODO http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_428/output/errors/index.html
ok - coverity found no issues for yalla_428
ok - coverity found no issues for mxm_428
not ok - coverity detected 2 failures in fca_428 # TODO http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_428/output/errors/index.html
ok - coverity found no issues for hcoll_428

TAP Reports Processing: FINISH
coverity_for_all    http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/all_428/output/errors/index.html
coverity_for_oshmem http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/oshmem_428/output/errors/index.html
coverity_for_fca    http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr//ws/cov_build/fca_428/output/errors/index.html
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
Setting commit status on GitHub for https://api.github.com/repos/open-mpi/ompi/commit/4f7f890223a8ab9fbefd41b6a9b396f246f7176e
[BFA] Scanning build for known causes...

[BFA] Done. 0s
Setting status of d9c690f52a014356150aa60070c15b214a90714b to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/428/ and message: Merged build finished.

Test FAILed.

@hjelmn hjelmn force-pushed the valgrind_cleanness branch from d9c690f to a7b0c00 Compare April 11, 2015 15:29
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/429/
Test PASSed.

hjelmn added a commit that referenced this pull request Apr 13, 2015
fix memory leaks and valgrind errors
@hjelmn hjelmn merged commit 113c890 into open-mpi:master Apr 13, 2015
@mike-dubman
Member

bot:retest

@mike-dubman
Member

  • we had a copy&paste error in the Jenkins script, so it did not exit with an error for some failed tests
  • fixing the Jenkins script revealed that this commit breaks the "-am" and "-tune" options

@elenash
Contributor

elenash commented Apr 15, 2015

@hjelmn I made a fix for that in #532
Could you please take a look?

jsquyres added a commit to jsquyres/ompi that referenced this pull request Nov 10, 2015
…-map

yalla: fix passing on-demand mapping config to mxm.
@hjelmn hjelmn deleted the valgrind_cleanness branch May 23, 2016 17:44
markalle pushed a commit to markalle/ompi that referenced this pull request Sep 12, 2020