
opal: use after free in MCA component repository in MPI_Finalize #6259

@rtoijala

Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

git master acc2a70 (the latest master commit at the time of reporting)

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From a git clone, with --enable-debug.

Please describe the system on which you are running

  • Operating system/version: Ubuntu 16.04 x86_64
  • Computer hardware: Intel desktop
  • Network type: N/A

Details of the problem

A minimal program that only calls MPI_Init and MPI_Finalize triggers a use-after-free in MPI_Finalize.

Test program:

program a
    use mpi
    implicit none
    integer :: err
    call MPI_Init(err)
    call MPI_Finalize(err)
end program a

Run with valgrind ./a.out

Valgrind error:

Invalid read of size 4
   at 0x5B9FFB4: opal_hash_table_get_value_ptr (opal_hash_table.c:653)
   by 0x5BE61F2: find_component (mca_base_component_repository.c:316)
   by 0x5BE628D: mca_base_component_repository_release (mca_base_component_repository.c:338)
   by 0x5BE786B: mca_base_component_unload (mca_base_components_close.c:46)
   by 0x5BE78FF: mca_base_component_close (mca_base_components_close.c:65)
   by 0x5BE7987: mca_base_components_close (mca_base_components_close.c:91)
   by 0x5BE792E: mca_base_framework_components_close (mca_base_components_close.c:71)
   by 0x5C6ACB8: opal_installdirs_base_close (installdirs_base_components.c:171)
   by 0x5BF78A2: mca_base_framework_close (mca_base_framework.c:252)
   by 0x5BF7C3B: mca_base_framework_close_list (mca_base_framework.c:292)
   by 0x5BAF20B: opal_finalize_cleanup_domain (opal_finalize.c:136)
   by 0x5BAF36B: opal_finalize_util (opal_finalize.c:151)
 Address 0x7605d80 is 2,144 bytes inside a block of size 8,672 free'd
   at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5B9F184: opal_hash_table_destruct (opal_hash_table.c:144)
   by 0x5BE5380: opal_obj_run_destructors (opal_object.h:462)
   by 0x5BE6D64: mca_base_component_repository_finalize (mca_base_component_repository.c:562)
   by 0x5BE3805: mca_base_close (mca_base_close.c:59)
   by 0x5BAF20B: opal_finalize_cleanup_domain (opal_finalize.c:136)
   by 0x5BAF36B: opal_finalize_util (opal_finalize.c:151)
   by 0x57F80E1: ompi_mpi_finalize (ompi_mpi_finalize.c:495)
   by 0x58396B0: PMPI_Finalize (pfinalize.c:54)
   by 0x4E8423E: PMPI_FINALIZE (pfinalize_f.c:71)
   by 0x400A15: MAIN__ (in [...]/a.out)
   by 0x400A4C: main (in [...]/a.out)
 Block was alloc'd at
   at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5B9F21C: opal_hash_table_init2 (opal_hash_table.c:167)
   by 0x5B9F2DB: opal_hash_table_init (opal_hash_table.c:185)
   by 0x5BE5FFE: mca_base_component_repository_init (mca_base_component_repository.c:258)
   by 0x5BE8272: mca_base_open (mca_base_open.c:172)
   by 0x5BAFF34: opal_init_util (opal_init.c:501)
   by 0x57F56EE: ompi_mpi_init (ompi_mpi_init.c:428)
   by 0x584765B: PMPI_Init (pinit.c:67)
   by 0x4E87D3E: mpi_init (pinit_f.c:87)
   by 0x400A09: MAIN__ (in [...]/a.out)
   by 0x400A4C: main (in [...]/a.out)

During opal_finalize_util, opal_finalize_cleanup_domain runs its finalizers in this order:

mca_base_close
opal_dss_close
opal_datatype_finalize
opal_net_finalize
opal_deregister_params
mca_base_var_finalize
opal_util_keyval_parse_finalize
opal_show_help_finalize
opal_malloc_finalize
opal_output_finalize
mca_base_framework_close_list(opal_init_util_frameworks)

The first of these, mca_base_close, calls mca_base_component_repository_finalize, which destroys the repository's hash table.
The last one calls mca_base_framework_close on the installdirs framework. Closing its components calls mca_base_component_repository_release, which looks the component up in the already destroyed hash table mca_base_component_repository.

To confirm this, add the line ht->ht_table = NULL; to the function opal_hash_table_destruct. The use-after-free then turns into a segfault.
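To make the ordering problem concrete, here is a minimal, self-contained C model. It is not Open MPI code: repo_t, repo_init, repo_finalize, repo_release, run_cleanup and the alive flag are all hypothetical stand-ins for the component repository hash table, mca_base_component_repository_finalize and mca_base_component_repository_release. Unlike the real lookup, the model checks whether the repository has already been finalized, so it reports the problem instead of reading freed memory.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the component repository hash table. */
typedef struct {
    int *table;   /* plays the role of ht->ht_table */
    size_t size;
    bool alive;   /* false once repo_finalize() has run */
} repo_t;

/* Models mca_base_component_repository_init: allocates the table. */
static void repo_init(repo_t *r, size_t size) {
    r->table = calloc(size, sizeof *r->table);
    r->size = size;
    r->alive = (r->table != NULL);
}

/* Models mca_base_component_repository_finalize: frees the table and
 * clears the pointer, as in the ht->ht_table = NULL experiment. */
static void repo_finalize(repo_t *r) {
    free(r->table);
    r->table = NULL;
    r->alive = false;
}

/* Models mca_base_component_repository_release: touches the table slot
 * for `key`. Returns 0 on success, -1 if the repository is already gone.
 * The real code has no such check, so it reads freed memory instead. */
static int repo_release(repo_t *r, size_t key) {
    if (!r->alive || r->table == NULL || key >= r->size) {
        return -1;
    }
    r->table[key] = 0; /* clear the slot for the released component */
    return 0;
}

/* Runs the two cleanup steps in the given order and returns the result
 * of the release step. finalize_first == true mirrors the order above:
 * mca_base_close runs before the installdirs framework is closed. */
static int run_cleanup(bool finalize_first) {
    repo_t r;
    int rc;
    repo_init(&r, 8);
    if (finalize_first) {
        repo_finalize(&r);
        rc = repo_release(&r, 3);
    } else {
        rc = repo_release(&r, 3);
        repo_finalize(&r);
    }
    return rc;
}
```

With finalize_first set to true, run_cleanup returns -1 because the release step runs after the table has been freed; with the steps swapped, the release succeeds and returns 0.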

I do not know the component system well enough to suggest a fix. The following comment in opal_installdirs_base_open may be related:

    /* NTH: Is it ok not to close the components? If not we can add a flag
       to mca_base_framework_components_close to indicate not to deregister
       variable groups */
