Parrot may end up calling dlclose() twice with the same handle, ref. t/pmc/threads.t test 14 #363

Open
parrot opened this Issue Nov 30, 2009 · 11 comments

Comments

Projects
None yet
3 participants
@ghost

ghost commented Nov 30, 2009

I have been puzzled by why the last test in t/pmc/threads.t (number 14 at last count) fails on all the NetBSD systems I'm testing on.

It turns out that the reason is that this particular test causes parrot to call dlclose() twice with the same handle value, and ld.elf_so has a check for whether the given handle is "valid", which it isn't in the second instance, causing spurious output from the test, which in turns causes the test to fail, like so:

not ok 14 - CLONE_CODE|CLONE_GLOBALS|CLONE_HLL|CLONE_LIBRARIES - TT \# 1250
#   Failed test 'CLONE_CODE|CLONE_GLOBALS|CLONE_HLL|CLONE_LIBRARIES - TT \# 1250'
#   at t/pmc/threads.t line 831.
#          got: 'in thread:
#0
# ok (equal)
#42
# in main:
#0
# ok (equal)
#42
# Invalid shared object handle 0xeffed600
# '
#     expected: 'in thread:
#0
# ok (equal)
#42
# in main:
#0
# ok (equal)
#42
# '
# Looks like you failed 1 test of 14.

The following GDB session shows that this is the problem:

granny-smith: {130} gdb ./parrot
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc--netbsd"...
(gdb) b dlclose
Function "dlclose" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 1 (dlclose) pending.
(gdb) r t/pmc/threads_14.pir
Starting program: /usr/users/he/parrot/parrot t/pmc/threads_14.pir
Breakpoint 2 at 0xefff594c
Pending breakpoint "dlclose" resolved
in thread:
0
ok (equal)
42

Breakpoint 2, 0xefff594c in dlclose () from /usr/libexec/ld.elf_so
(gdb) where
#0  0xefff594c in dlclose () from /usr/libexec/ld.elf_so
#1  0xefd65f30 in Parrot_dlclose (handle=0xeffed600)
    at config/gen/platform/generic/dl.c:96
#2  0xefe6e274 in Parrot_ParrotLibrary_destroy (interp=0xee8031e0, 
    pmc=0xee7d3e60) at ./src/pmc/parrotlibrary.pmc:65
#3  0xefd6a204 in Parrot_pmc_destroy (interp=0xee8031e0, pmc=0xee7d3e60)
    at src/pmc.c:116
#4  0xefd092d0 in free_pmc_in_pool (interp=0xee8031e0, pool_unused=0xee710150, 
    p=0xee7d3e60) at src/gc/mark_sweep.c:602
#5  0xefd083d8 in Parrot_gc_sweep_pool (interp=0xee8031e0, pool=0xee710150)
    at src/gc/mark_sweep.c:331
#6  0xefd071f0 in gc_ms_finalize (interp=0xee8031e0, mem_pools=0xee807f60)
    at src/gc/gc_ms.c:232
#7  0xefd07490 in gc_ms_mark_and_sweep (interp=0xee8031e0, flags=4)
    at src/gc/gc_ms.c:161
#8  0xefd03e74 in Parrot_gc_mark_and_sweep (interp=0xee8031e0, flags=4)
    at src/gc/api.c:856
#9  0xefd1aab8 in Parrot_really_destroy (interp=0xee8031e0, 
    exit_code_unused=0, arg_unused=0x0) at src/interp/inter_create.c:337
#10 0xefd79e58 in pt_thread_join (parent=0xee803100, tid=1)
    at src/thread.c:1383
#11 0xeff16fc8 in Parrot_ParrotRunningThread_nci_join (interp=0xee803100, 
    pmc=0xee8afbac) at ./src/pmc/parrotrunningthread.pmc:108
#12 0xefe7d4a0 in Parrot_NCI_invoke (interp=0xee803100, pmc=0xee8afbac, 
    next=0xee7193d0) at ./src/pmc/nci.pmc:338
#13 0xefc5719c in Parrot_callmethodcc_p_sc (cur_opcode=0xee7193c4, 
    interp=0xee803100) at src/ops/object.ops:74
#14 0xefd6e3f4 in runops_slow_core (interp=0xee803100, runcore=0xee8ee5c0, 
    pc=0xee7193c4) at src/runcore/cores.c:848
#15 0xefd6bae8 in runops_int (interp=0xee803100, offset=64)
    at src/runcore/main.c:546
#16 0xefd1c8bc in runops (interp=0xee803100, offs=64) at src/call/ops.c:99
#17 0xefd1225c in Parrot_pcc_invoke_from_sig_object (interp=0xee803100, 
    sub_obj=0xee8b1d94, call_object=0xee8b1dbc) at src/call/pcc.c:297
#18 0xefd123c4 in Parrot_pcc_invoke_sub_from_c_args (interp=0xee803100, 
    sub_obj=0xee8b1d94, sig=0xeff83ea0 "P->") at src/call/pcc.c:76
#19 0xefceba2c in Parrot_runcode (interp=0xee803100, argc=1, argv=0xffffd870)
    at src/embed.c:825
#20 0xeff51a58 in imcc_run_pbc (interp=0xee803100, obj_file=0, 
    output_file=0x0, argc=1, argv=0xffffd870) at compilers/imcc/main.c:790
#21 0xeff51e10 in imcc_run (interp=0xee803100, 
    sourcefile=0xffffe998 "t/pmc/threads_14.pir", argc=1, argv=0xffffd870)
    at compilers/imcc/main.c:1073
#22 0x01800be8 in main (argc=1, argv=0xffffd870) at src/main.c:60
(gdb) c
Continuing.
in main:
0
ok (equal)
42

Breakpoint 2, 0xefff594c in dlclose () from /usr/libexec/ld.elf_so
(gdb) where
#0  0xefff594c in dlclose () from /usr/libexec/ld.elf_so
#1  0xefd65f30 in Parrot_dlclose (handle=0xeffed600)
    at config/gen/platform/generic/dl.c:96
#2  0xefe6e274 in Parrot_ParrotLibrary_destroy (interp=0xee803100, 
    pmc=0xee8e5ec4) at ./src/pmc/parrotlibrary.pmc:65
#3  0xefd6a204 in Parrot_pmc_destroy (interp=0xee803100, pmc=0xee8e5ec4)
    at src/pmc.c:116
#4  0xefd092d0 in free_pmc_in_pool (interp=0xee803100, pool_unused=0xee84c150, 
    p=0xee8e5ec4) at src/gc/mark_sweep.c:602
#5  0xefd083d8 in Parrot_gc_sweep_pool (interp=0xee803100, pool=0xee84c150)
    at src/gc/mark_sweep.c:331
#6  0xefd071f0 in gc_ms_finalize (interp=0xee803100, mem_pools=0xee807080)
    at src/gc/gc_ms.c:232
#7  0xefd07490 in gc_ms_mark_and_sweep (interp=0xee803100, flags=4)
    at src/gc/gc_ms.c:161
#8  0xefd03e74 in Parrot_gc_mark_and_sweep (interp=0xee803100, flags=4)
    at src/gc/api.c:856
#9  0xefd1aab8 in Parrot_really_destroy (interp=0xee803100, 
    exit_code_unused=0, arg_unused=0x0) at src/interp/inter_create.c:337
#10 0xefcf1038 in Parrot_exit (interp=0xee803100, status=0) at src/exit.c:90
#11 0x01800c04 in main (argc=1, argv=0xffffd870) at src/main.c:65
(gdb) c
Continuing.
Invalid shared object handle 0xeffed600

Program exited normally.
(gdb) 

Note that the handle= value in the call preceding dlclose() is the same in the two cases.

Parrot should not be calling dlclose() with the same handle value in a row, without the second one being the result of a dlopen(), which isn't the case here. OK, that's not proven by the above, but this one does:

(gdb) b dlopen
Breakpoint 3 at 0xefff5d60
(gdb) r
Starting program: /usr/users/he/parrot/parrot t/pmc/threads_14.pir
...
Breakpoint 3, 0xefff5d60 in dlopen () from /usr/libexec/ld.elf_so
(gdb) c
Continuing.
in thread:
0
ok (equal)
42

Breakpoint 2, 0xefff594c in dlclose () from /usr/libexec/ld.elf_so
(gdb) c
Continuing.
in main:
0
ok (equal)
42

Breakpoint 2, 0xefff594c in dlclose () from /usr/libexec/ld.elf_so
(gdb) c
Continuing.
Invalid shared object handle 0xeffed600

Program exited normally.
(gdb) 

Originally http://trac.parrot.org/parrot/ticket/1340 by heidnes

@ghost

ghost commented Nov 30, 2009

Trac commenter: heidnes

Hmm, this may have to do with the "clone library" functionality, which appears to create two copies of a shared library, which however share the dlhandle(), ref. the clone() function. Things go wrong when the second instance is destroy()ed. This happens in code which comes from src/pmc/parrotlibrary.pmc's destroy() and clone() functions, I suspect.

@ghost

ghost commented Dec 1, 2009

2046 byte attachment from heidnes
at http://trac.parrot.org/parrot/raw-attachment/ticket/1340/diff

```Index: config/gen/platform/generic/dl.c

--- config/gen/platform/generic/dl.c (revision 42823)
+++ config/gen/platform/generic/dl.c (working copy)
@@ -22,11 +22,69 @@
*/

#ifdef PARROT_HAS_HEADER_DLFCN
+# include <stddef.h>
+# include <stdlib.h>

include <dlfcn.h>

#endif

#define PARROT_DLOPEN_FLAGS RTLD_LAZY

+#ifdef PARROT_HAS_HEADER_DLFCN
+
+struct handle_entry {

  • void *handle;
  • struct handle_entry *next;
    +};

+struct handle_entry *handle_list = NULL;
+
+static void
+push_handle_entry(void *handle)
+{

  • struct handle_entry *e;
  • e = malloc(sizeof(struct handle_entry));
  • if (!e) { return; }
  • e->handle = handle;
  • e->next = handle_list;
  • handle_list = e;
    +}

+static void *
+find_handle_entry(void *handle)
+{

  • struct handle_entry *e;
  • for(e = handle_list; e; e = e->next) {
  • if (e->handle == handle)
  •   return handle;
    
  • }
  • return NULL;
    +}

+static void
+remove_handle_entry(void *handle)
+{

  • struct handle_entry *cur, *prev, *p;
  • if (handle_list) {
  • if (handle_list->handle == handle) {
  •   p = handle_list;
    
  •   handle_list = p->next;
    
  •   free(p);
    
  • } else {
  •   for (cur = handle_list; cur; prev = cur, cur = cur->next) {
    
  •   if (cur->handle == handle) {
    
  •       prev->next = cur->next;
    
  •       free(cur);
    
  •   }
    
  •   }
    
  • }
  • }
    +}
    +#endif /* PARROT_HAS_HEADER_DLFCN _/

/_

=item C<void * Parrot_dlopen(const char *filename)>
@@ -39,7 +97,11 @@
Parrot_dlopen(const char *filename)
{
#ifdef PARROT_HAS_HEADER_DLFCN

  • return dlopen(filename, PARROT_DLOPEN_FLAGS);
  • void *h;
  • h = dlopen(filename, PARROT_DLOPEN_FLAGS);
  • push_handle_entry(h);
  • return h;
    #else
    return 0;
    #endif
    @@ -93,7 +155,13 @@
    Parrot_dlclose(void *handle)
    {
    #ifdef PARROT_HAS_HEADER_DLFCN
  • return dlclose(handle);
  • int rv;
  • if (find_handle_entry(handle)) {
  • remove_handle_entry(handle);
  • rv = dlclose(handle);
  • return rv;
  • }
    #else
    return -1;
    #endif
@ghost

ghost commented Dec 1, 2009

Trac commenter: heidnes

    <p>

New version of diff, ensuring to include required headers, and omit helper functions if we don't need them.

@ghost

ghost commented Dec 1, 2009

Replying to heidnes:

Hmm, this may have to do with the "clone library" functionality, which appears to create two copies of a shared library, which however share the dlhandle(), ref. the clone() function. Things go wrong when the second instance is destroy()ed. This happens in code which comes from src/pmc/parrotlibrary.pmc's destroy() and clone() functions, I suspect.

It is permissible to call dlclose() twice on the same handle, if the shared object has been dlopen()-ed twice. The two calls to dlopen() will both return the same handle.

I don't know what, exactly, parrot means by "clone" on a shared object, but if the clone method simply called dlopen() again, the multiple calls to dlclose() would probably work as intended.

@ghost

ghost commented Dec 1, 2009

Trac commenter: heidnes

Replying to doughera:

It is permissible to call dlclose() twice on the same handle, if the shared object has been dlopen()-ed twice. The two calls to dlopen() will both return the same handle.

Maybe. I've not checked if there is a standard defining that part.

I don't know what, exactly, parrot means by "clone" on a shared object, but if the clone method simply called dlopen() again, the multiple calls to dlclose() would probably work as intended.

The clone operation looks like this (from src/pmc/parrotlibrary.pmc):

    VTABLE PMC *clone() {
        PMC \* const dest     = pmc_new(INTERP, SELF->vtable->base_type);
        PMC_oplib_init(dest) = PMC_oplib_init(SELF);
        PMC_dlhandle(dest)   = PMC_dlhandle(SELF);
        if (PMC_metadata(SELF))
            PMC_metadata(dest) = VTABLE_clone(INTERP, PMC_metadata(SELF));
        return dest;
    }

i.e. it just copies the handle returned by dlopen(), it does not dlopen() a second time.
When it comes time to clean up after each of the library instances, dlclose() gets called
twice with the same handle value, returned from a single dlopen() call. I suspect that is
an error.

The patch I created prevents this from happening.

Member

Benabik commented Mar 8, 2012

This is apparently still an issue on NetBSD (see #732). Does anyone have NetBSD to try this? Has anyone made sure the patch works on not-BSD?

Owner

leto commented Mar 8, 2012

We have access to a NetBSD machine on the GCC Compile Farm:

name port disk CPU Notes
gcc70 160G 2x3.2 GHz Intel Xeon 3.2E (Irwindale) / 3 GB RAM / Dell Poweredge SC1425 / NetBSD 5.1 amd64

http://gcc.gnu.org/wiki/CompileFarm

This kind of patch definitely needs to be tested on all our supported platforms before being merged.

Owner

leto commented Mar 10, 2012

@he32 @Benabik it looks like the updated diff was mangled in the conversion from Trac to Github, can you resubmit the diff as a pull request against Parrot 4.1.0 or the master branch?

Owner

leto commented Mar 10, 2012

Ah, I was mistaken. The updated diff is at http://trac.parrot.org/parrot/raw-attachment/ticket/1340/diff

Owner

leto commented Mar 10, 2012

The patch no longer applies, but I am working on a manual merge now.

@leto leto added a commit that referenced this issue Mar 10, 2012

@leto leto Cast a malloc to make g++ happy, #363 6050332
Owner

leto commented Mar 10, 2012

With the commit above, on the gh363/dlclose branch, "make test" passes for me on g++. If anybody can test that branch on other platforms, that would be spiffy.

leto was assigned Mar 28, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment