Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional SEGSIGV with lgi.DBus #88

Closed
Elv13 opened this issue Oct 23, 2014 · 14 comments · Fixed by #144
Closed

Occasional SEGSIGV with lgi.DBus #88

Elv13 opened this issue Oct 23, 2014 · 14 comments · Fixed by #144

Comments

@Elv13
Copy link

Elv13 commented Oct 23, 2014

Hello,

Thanks for answering the question on the other DBus issue. I implemented a DBus wrapper (wor in progress) and it usually work just fine. However, I notice there is occasional segfaults

luajit:

(gdb) thread apply all bt full

Thread 2 (LWP 3892):
#0  0x00007ffff6304d17 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007ffff67c31e7 in g_mutex_lock (mutex=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gthread-posix.c:209
        _g_boolean_var_ = <optimized out>
        status = <optimized out>
#2  0x00007ffff605c6ed in write_message_cb (source_object=<optimized out>, res=0x6c8990, user_data=0x6ced50) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1331
        data = 0x6ced50
        error = 0x6c8990
        __FUNCTION__ = "write_message_cb"
#3  0x00007ffff5fe69b7 in g_simple_async_result_complete (simple=0x6c8990) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gsimpleasyncresult.c:763
        current_source = 0x6db8d0
        current_context = 0x7ffff6304d10 <pthread_mutex_lock>
        __FUNCTION__ = "g_simple_async_result_complete"
#4  0x00007ffff605a9ce in write_message_continue_writing (data=data@entry=0x6ced50) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1095
        ostream = <optimized out>
        simple = 0x6c8990
        fd_list = <optimized out>
        __FUNCTION__ = "write_message_continue_writing"
#5  0x00007ffff605b211 in write_message_async (user_data=0x6ced50, callback=0x7ffff605c6d0 <write_message_cb>, data=0x6ced50, worker=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1131
No locals.
#6  continue_writing (worker=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1531
        old_message = 0x6d2230
        new_blob = <optimized out>
        new_blob_size = 0
        error = 0x6ac140
        data = 0x6ced50
        flush_async_data = 0x0
        __FUNCTION__ = "continue_writing"
#7  0x00007ffff605b309 in continue_writing_in_idle_cb (user_data=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1552
        worker = <optimized out>
#8  0x00007ffff67755e3 in g_main_dispatch (context=0x6c7180) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3064
        dispatch = 0x7ffff676f540 <g_idle_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x6c6840
        callback = 0x7ffff605b2f0 <continue_writing_in_idle_cb>
        cb_funcs = 0x7ffff6a70920 <g_source_callback_funcs>
        cb_data = 0x6d9e40
        need_destroy = <optimized out>
        source = 0x6db8d0
        current = 0x6c66b0
        i = 0
#9  g_main_context_dispatch (context=context@entry=0x6c7180) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3663
No locals.
#10 0x00007ffff6775968 in g_main_context_iterate (context=0x6c7180, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3734
        max_priority = 2147483647
        timeout = -1
        some_ready = 1
        nfds = 2
        allocated_nfds = <optimized out>
        fds = <optimized out>
#11 0x00007ffff6775c2a in g_main_loop_run (loop=0x6c7110) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3928
        __FUNCTION__ = "g_main_loop_run"
#12 0x00007ffff605a076 in gdbus_shared_thread_func (user_data=0x6c7150) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:273
        data = 0x6c7150
#13 0x00007ffff67a2b15 in g_thread_proxy (data=0x6b9680) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gthread.c:764
        thread = 0x6b9680
#14 0x00007ffff6302e03 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#15 0x00007ffff7404dbd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (LWP 3630):
#0  0x00007ffff6ed5deb in g_base_info_unref (info=0x6cd230) at girepository/gibaseinfo.c:258
        rinfo = 0x6cd230
#1  0x00007ffff6ed5e09 in g_base_info_unref (info=0x6b7f70) at girepository/gibaseinfo.c:262
        rinfo = 0x6b7f70
#2  0x00007ffff710f3dd in info_gc (L=<optimized out>) at gi.c:598
        info = <optimized out>
#3  0x000000000042be1b in lj_BC_FUNCC ()
---Type <return> to continue, or q <return> to quit---
No symbol table info available.
#4  0x000000000042fa3c in lj_gc_finalize_udata ()
No symbol table info available.
#5  0x000000000040aca1 in cpfinalize ()
No symbol table info available.
#6  0x000000000042c15c in lj_vm_cpcall ()
No symbol table info available.
#7  0x000000000040b2cd in lua_close ()
No symbol table info available.
#8  0x0000000000404696 in main ()
No symbol table info available.

lua:

(gdb) thread apply all bt full

Thread 2 (LWP 14933):
#0  g_io_stream_get_output_stream (stream=0x6c123000) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/giostream.c:233
        _g_boolean_var_ = <optimized out>
        klass = <error reading variable klass (Cannot access memory at address 0x6c123000)>
#1  0x00007ffff5b8f92f in write_message_continue_writing (data=data@entry=0x641260) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:995
        ostream = <optimized out>
        simple = 0x6c6100
        fd_list = <optimized out>
        __FUNCTION__ = "write_message_continue_writing"
#2  0x00007ffff5b90211 in write_message_async (user_data=0x641260, callback=0x7ffff5b916d0 <write_message_cb>, data=0x641260, worker=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1131
No locals.
#3  continue_writing (worker=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1531
        old_message = 0x6d1280
        new_blob = <optimized out>
        new_blob_size = 0
        error = 0x64e340
        data = 0x641260
        flush_async_data = 0x0
        __FUNCTION__ = "continue_writing"
#4  0x00007ffff5b90309 in continue_writing_in_idle_cb (user_data=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:1552
        worker = <optimized out>
#5  0x00007ffff62aa5e3 in g_main_dispatch (context=0x6c4960) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3064
        dispatch = 0x7ffff62a4540 <g_idle_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x6c4100
        callback = 0x7ffff5b902f0 <continue_writing_in_idle_cb>
        cb_funcs = 0x7ffff65a5920 <g_source_callback_funcs>
        cb_data = 0x6df930
        need_destroy = <optimized out>
        source = 0x65ad50
        current = 0x6c3fb0
        i = 0
#6  g_main_context_dispatch (context=context@entry=0x6c4960) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3663
No locals.
#7  0x00007ffff62aa968 in g_main_context_iterate (context=0x6c4960, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3734
        max_priority = 2147483647
        timeout = -1
        some_ready = 1
        nfds = 2
        allocated_nfds = <optimized out>
        fds = <optimized out>
#8  0x00007ffff62aac2a in g_main_loop_run (loop=0x6871a0) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gmain.c:3928
        __FUNCTION__ = "g_main_loop_run"
#9  0x00007ffff5b8f076 in gdbus_shared_thread_func (user_data=0x6c4930) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/gio/gdbusprivate.c:273
        data = 0x6c4930
#10 0x00007ffff62d7b15 in g_thread_proxy (data=0x6b3320) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gthread.c:764
        thread = 0x6b3320
#11 0x00007ffff5e37e03 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#12 0x00007ffff7698dbd in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (LWP 14923):
#0  0x00007ffff62cd85a in allocator_get_magazine_threshold (allocator=0x7ffff65a64e0 <allocator>, allocator=0x7ffff65a64e0 <allocator>, ix=4) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gslice.c:582
        chunk_size = 80
        threshold = <optimized out>
        contention_counter = <optimized out>
#1  thread_memory_magazine2_is_full (tmem=<optimized out>, ix=4) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gslice.c:834
No locals.
#2  g_slice_free1 (mem_size=<optimized out>, mem_block=0x693b20) at /var/tmp/portage/dev-libs/glib-2.40.0-r1/work/glib-2.40.0/glib/gslice.c:1090
        _g_boolean_var_ = <optimized out>
        tmem = 0x61b500
        ix = 4
        chunk_size = 80
#3  0x00007ffff6a0ae09 in g_base_info_unref (info=0x693b70) at girepository/gibaseinfo.c:262
        rinfo = 0x693b70
#4  0x00007ffff6c418ac in callable_param_destroy (param=0x695178) at callable.c:632
No locals.
#5  callable_gc (L=<optimized out>) at callable.c:649
        i = <optimized out>
---Type <return> to continue, or q <return> to quit---
        callable = 0x695138
#6  0x00007ffff7bb2e40 in luaD_precall () from /usr/lib64/liblua.so.5
No symbol table info available.
#7  0x00007ffff7bb35b4 in luaD_call () from /usr/lib64/liblua.so.5
No symbol table info available.
#8  0x00007ffff7bb57ee in luaC_callGCTM () from /usr/lib64/liblua.so.5
No symbol table info available.
#9  0x00007ffff7bb249c in luaD_rawrunprotected () from /usr/lib64/liblua.so.5
No symbol table info available.
#10 0x00007ffff7bbe372 in lua_close () from /usr/lib64/liblua.so.5
No symbol table info available.
#11 0x0000000000401506 in main ()
No symbol table info available.
(gdb) 

Other luajit error:

WARNING, invalid argument count, expected 1, got 0
luajit: /usr/local/share/lua/5.1/lgi/override/GLib-Variant.lua:141: attempt to index local 'val' (a nil value)
stack traceback:
    /usr/local/share/lua/5.1/lgi/override/GLib-Variant.lua:141: in function 'variant_new'
    /usr/local/share/lua/5.1/lgi/override/GLib-Variant.lua:172: in function 'Variant'
    ./client.lua:51: in function 'format_arguments'
    ./client.lua:95: in function 'call_with_proxy'
    ./client.lua:123: in function 'proxy_ready'
    ./client.lua:172: in function <./client.lua:163>
    [C]: in function 'run'
    demo/mpris.lua:27: in main chunk
    [C]: at 0x004055e0
*** Error in `luajit': corrupted double-linked list: 0x0000000001e36600 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x794de)[0x7ff7da3164de]
/lib64/libc.so.6(+0x7a450)[0x7ff7da317450]
/usr/lib64/libgio-2.0.so.0(+0xec876)[0x7ff7d8fdc876]
/usr/lib64/libgio-2.0.so.0(g_simple_async_result_complete+0x87)[0x7ff7d8f669b7]
/usr/lib64/libgio-2.0.so.0(+0xea9ce)[0x7ff7d8fda9ce]
/usr/lib64/libgio-2.0.so.0(+0xeb211)[0x7ff7d8fdb211]
/usr/lib64/libgio-2.0.so.0(+0xec857)[0x7ff7d8fdc857]
/usr/lib64/libgio-2.0.so.0(g_simple_async_result_complete+0x87)[0x7ff7d8f669b7]
/usr/lib64/libgio-2.0.so.0(+0xea9ce)[0x7ff7d8fda9ce]
/usr/lib64/libgio-2.0.so.0(+0xeb211)[0x7ff7d8fdb211]
/usr/lib64/libgio-2.0.so.0(+0xec857)[0x7ff7d8fdc857]
/usr/lib64/libgio-2.0.so.0(g_simple_async_result_complete+0x87)[0x7ff7d8f669b7]
/usr/lib64/libgio-2.0.so.0(+0xea9ce)[0x7ff7d8fda9ce]
/usr/lib64/libgio-2.0.so.0(+0xeb211)[0x7ff7d8fdb211]
/usr/lib64/libgio-2.0.so.0(+0xeb309)[0x7ff7d8fdb309]

Code:
https://github.com/Elv13/wirefu/blob/05996b995c08e46bf40e56dad528e0d4c9e95942/client.lua

Using the VLC interface as most people have it installed, can be reproduced with other:

wirefu.SESSION.org.mpris.MediaPlayer2.vlc("/org/mpris/MediaPlayer2").org.mpris.MediaPlayer2.TrackList.GetTracksMetadata():get(function (array_of_dict)
    print("I know, the number of argument is wrong",#array_of_dict)
end)

It would not be my first crash in career using various DBus clients API, but I don't think the current code should crash like this (Cannot access memory at address 0x6c123000)> is bad.

Any idea?

@Elv13
Copy link
Author

Elv13 commented Oct 25, 2014

One more:

#0  0x00007ffff7e4d550 in ?? ()
No symbol table info available.
#1  0x00007ffff3581368 in call_in_idle_cb (user_data=0x7fffd40035e0)
    at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/gio/gdbusconnection.c:4875
        invocation = 0x7fffd40035e0
        vtable = <optimized out>
        registration_id = <optimized out>
        subtree_registration_id = <optimized out>
        __FUNCTION__ = "call_in_idle_cb"
#2  0x00007ffff6c275e3 in g_main_dispatch (context=0x6a52c0)
    at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/glib/gmain.c:3064
        dispatch = 0x7ffff6c21540 <g_idle_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x7fffd40035e0
        callback = 0x7ffff3581240 <call_in_idle_cb>
        cb_funcs = 0x7ffff6f22920 <g_source_callback_funcs>
        cb_data = 0x7fffd4005ff0
        need_destroy = <optimized out>
        source = 0x7fffd4005f80
        current = 0x73f160
        i = 0
#3  g_main_context_dispatch (context=context@entry=0x6a52c0)
    at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/glib/gmain.c:3663
No locals.
#4  0x00007ffff6c27968 in g_main_context_iterate (context=0x6a52c0, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<optimized out>) at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/glib/gmain.c:3734
        max_priority = 2147483647
        timeout = 806
        some_ready = 1
        nfds = 4
        allocated_nfds = <optimized out>
        fds = <optimized out>
#5  0x00007ffff6c27c2a in g_main_loop_run (loop=0x82bf40) at /var/tmp/portage/dev-libs/glib-2.40.2/work/glib-2.40.2/glib/gmain.c:3928
        __FUNCTION__ = "g_main_loop_run"
#6  0x000000000040de0a in main (argc=<optimized out>, argv=<optimized out>) at /home/kde-devel/kde/src/awesome/awesome.c:538
        confpath = 0x0
        xfd = <optimized out>
        i = <optimized out>
        opt = <optimized out>
        cmdlen = <optimized out>
        xdg = {reserved = 0x6a0890}
        no_argb = <optimized out>
        run_test = <optimized out>
        tree_c = {sequence = 4}
        long_options = {{name = 0x43f859 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x43f85e "version", has_arg = 0, 
            flag = 0x0, val = 118}, {name = 0x43f866 "config", has_arg = 1, flag = 0x0, val = 99}, {name = 0x43f86d "check", 
            has_arg = 0, flag = 0x0, val = 107}, {name = 0x43f873 "no-argb", has_arg = 0, flag = 0x0, val = 97}, {name = 0x0, 
            has_arg = 0, flag = 0x0, val = 0}}
        __FUNCTION__ = "main"
        sa = {__sigaction_handler = {sa_handler = 0x40d266 <signal_fatal>, sa_sigaction = 0x40d266 <signal_fatal>}, sa_mask = {
            __val = {0 <repeats 16 times>}}, sa_flags = -2147483648, sa_restorer = 0x0}
        channel = <optimized out>
        query = <optimized out>
        xmapping_cookie = <optimized out>
        tmp_win = <optimized out>
(gdb) 

This one happen when invoking a service created with LGI

EDIT:

The valgrind:

This one doesn't cause a crash

attempting to create a server
==3748== Invalid read of size 4
==3748==    at 0x40733E: lj_str_new (in /usr/bin/luajit-2.0.1)
==3748==    by 0x40FD81: lua_pushstring (in /usr/bin/luajit-2.0.1)
==3748==    by 0x5D029EA: lgi_marshal_2lua (marshal.c:1179)
==3748==    by 0x5CFA9E8: callable_param_2lua (callable.c:757)
==3748==    by 0x5CFCF7C: callable_call (callable.c:925)
==3748==    by 0x42BE1A: lj_BC_FUNCC (in /usr/bin/luajit-2.0.1)
==3748==    by 0x5D03FA9: lgi_marshal_access (marshal.c:1445)
==3748==    by 0x42BE1A: lj_BC_FUNCC (in /usr/bin/luajit-2.0.1)
==3748==    by 0x41201F: lua_pcall (in /usr/bin/luajit-2.0.1)
==3748==    by 0x5CFAF51: closure_callback (callable.c:1212)
==3748==    by 0x68F47BE: ffi_closure_unix64_inner (in /usr/lib64/libffi.so.6.0.1)
==3748==    by 0x68F4FC3: ffi_closure_unix64 (in /usr/lib64/libffi.so.6.0.1)
==3748==  Address 0x59a2fec is 28 bytes inside a block of size 30 alloc'd
==3748==    at 0x4C29C4B: malloc (vg_replace_malloc.c:270)
==3748==    by 0x65F80E0: g_malloc (gmem.c:97)
==3748==    by 0x6615A7E: g_strdup (gstrfuncs.c:356)
==3748==    by 0x617E2FF: g_value_set_string (gvaluetypes.c:1038)
==3748==    by 0x6DF74C7: own_with_closures_on_bus_acquired (gdbusnameowning.c:720)
==3748==    by 0x6DF7EFB: connection_get_cb (gdbusnameowning.c:479)
==3748==    by 0x6D8AE76: g_simple_async_result_complete (gsimpleasyncresult.c:763)
==3748==    by 0x6D8AED8: complete_in_idle_cb (gsimpleasyncresult.c:775)
==3748==    by 0x65F15E2: g_main_context_dispatch (gmain.c:3064)
==3748==    by 0x65F1967: g_main_context_iterate.isra.24 (gmain.c:3734)
==3748==    by 0x65F1C29: g_main_loop_run (gmain.c:3928)
==3748==    by 0x68F4E5B: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.1)
==3748== 

But this one does

==7178== Invalid read of size 4
==7178==    at 0x68F4698: ffi_closure_unix64_inner (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x68F4FC3: ffi_closure_unix64 (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x6DEC367: call_in_idle_cb (gdbusconnection.c:4875)
==7178==    by 0x65F15E2: g_main_context_dispatch (gmain.c:3064)
==7178==    by 0x65F1967: g_main_context_iterate.isra.24 (gmain.c:3734)
==7178==    by 0x65F1C29: g_main_loop_run (gmain.c:3928)
==7178==    by 0x68F4E5B: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x68F3FB7: ffi_call (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x5CFCE6C: callable_call (callable.c:909)
==7178==    by 0x42BE1A: lj_BC_FUNCC (in /usr/bin/luajit-2.0.1)
==7178==    by 0x41201F: lua_pcall (in /usr/bin/luajit-2.0.1)
==7178==    by 0x406050: pmain (in /usr/bin/luajit-2.0.1)
==7178==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==7178== 
==7178== 
==7178== Process terminating with default action of signal 11 (SIGSEGV)
==7178==  Access not within mapped region at address 0x4
==7178==    at 0x68F4698: ffi_closure_unix64_inner (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x68F4FC3: ffi_closure_unix64 (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x6DEC367: call_in_idle_cb (gdbusconnection.c:4875)
==7178==    by 0x65F15E2: g_main_context_dispatch (gmain.c:3064)
==7178==    by 0x65F1967: g_main_context_iterate.isra.24 (gmain.c:3734)
==7178==    by 0x65F1C29: g_main_loop_run (gmain.c:3928)
==7178==    by 0x68F4E5B: ffi_call_unix64 (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x68F3FB7: ffi_call (in /usr/lib64/libffi.so.6.0.1)
==7178==    by 0x5CFCE6C: callable_call (callable.c:909)
==7178==    by 0x42BE1A: lj_BC_FUNCC (in /usr/bin/luajit-2.0.1)
==7178==    by 0x41201F: lua_pcall (in /usr/bin/luajit-2.0.1)
==7178==    by 0x406050: pmain (in /usr/bin/luajit-2.0.1)
==7178==  If you believe this happened as a result of a stack
==7178==  overflow in your program's main thread (unlikely but
==7178==  possible), you can try to increase the size of the
==7178==  main thread stack using the --main-stacksize= flag.
==7178==  The main thread stack size used in this run was 8388608.
==7178== 
==7178== HEAP SUMMARY:
==7178==     in use at exit: 247,212 bytes in 3,385 blocks
==7178==   total heap usage: 13,742 allocs, 10,357 frees, 852,168 bytes allocated
==7178== 
==7178== LEAK SUMMARY:
==7178==    definitely lost: 64 bytes in 1 blocks
==7178==    indirectly lost: 104 bytes in 1 blocks
==7178==      possibly lost: 8,068 bytes in 143 blocks
==7178==    still reachable: 230,104 bytes in 3,176 blocks
==7178==         suppressed: 0 bytes in 0 blocks
==7178== Rerun with --leak-check=full to see details of leaked memory
==7178== 

I tried to install FFI with more debug symbols, but given how it work this did not improve the valgrind log. The closure here is declared using

core.marshal.callback(Gio.DBusInterfaceMethodCallFunc ,
    function(conn, sender, path, interface_name,method_name,parameters,invok)
   --...
   Gio.DBusMethodInvocation.return_value(invok,gvar)
   ---...

It work fine maybe 30 times and then it crash. I tested could always reproduce by calling the method using a "while true" from bash

@ff2000
Copy link

ff2000 commented Apr 16, 2015

I tried to reproduce but had no luck :/
Do you build your system from source? I am using Gentoo and there is a general advice to rebuild dbus-glib after a glib-update. Seems it's possible to hit weird issues...
But of course there are still other possible sources: Bad CFLAGS, to name one.

@psychon
Copy link
Collaborator

psychon commented May 9, 2016

I cannot reproduce anything either, but I just noticed that all(?) the backtraces contain lua_close(). I don't think it is safe to try to tear everything down while some operations are still going on. This likely also explains all use-after-frees.
Still, I'd like to be able to reproduce this....

@mlen
Copy link

mlen commented Oct 23, 2016

Hey, I was trying to get a semi-nice API, but looks like I was able to produce a nice example of segfaulting code: https://gist.github.com/mlen/61b474ad7fa32a274187bf52f0bcb6b1

I probably did something wrong, but I can't see anything obvious.

@psychon
Copy link
Collaborator

psychon commented Oct 23, 2016

I can reproduce that crash and "minimised" it to the following code:

local lgi = require 'lgi'
local Gio = lgi.require 'Gio'
local GLib = lgi.require 'GLib'

local name = "org.freedesktop.DBus"
local path = "/"
local iface = "org.freedesktop.DBus"

local bus = Gio.bus_get_sync(Gio.BusType.SESSION)
local result, e = bus:call_sync(name, path, "org.freedesktop.DBus.Introspectable", "Introspect", nil, nil, Gio.DBusSendMessageFlags.NONE, -1, nil)
assert(result, e)

local info = Gio.DBusNodeInfo.new_for_xml(result.value[1]):lookup_interface(iface)
local proxy = Gio.DBusProxy.new_sync(bus, Gio.DBusProxyFlags.NONE, info, name, path, iface, nil)

local info2 = proxy:get_interface_info()
print(info, info2)

Valgrind's output sounds like a double free (g_boxed_free calls g_dbus_node_info_unref which accesses data already freed by g_boxed_free before). Since the last thing done is a call do g_dbus_proxy_get_interface_info and it does not crash without this call, I'm 95% sure that lgi assumes that it has ownership of the returned info while the docs disagree:

Do not unref the returned object, it is owned by proxy .

@pavouk This should tell you all you need for fixing this, right?

@psychon
Copy link
Collaborator

psychon commented Oct 23, 2016

Oh, another thing: @mlen Does the following fix your crash?

--- ldbus.lua.before    2016-10-23 09:35:51.727075728 +0200
+++ ldbus.lua   2016-10-23 09:36:34.382285509 +0200
@@ -7,7 +7,7 @@ ldbus.Bus = {}
 ldbus.Object = {}

 function ldbus.Object:__index(method)
-  local info = self.proxy:get_interface_info()
+  local info = self.info

   local minfo = info:lookup_method(method)
   if minfo then
@@ -86,8 +86,8 @@ function ldbus.Bus:GetObject(name, path,
     return nil
   end

-  local info = Gio.DBusNodeInfo.new_for_xml(result.value[1]):lookup_interface(iface)
-  table.proxy = Gio.DBusProxy.new_sync(self.bus, Gio.DBusProxyFlags.NONE, info, name, path, iface, nil)
+  table.info = Gio.DBusNodeInfo.new_for_xml(result.value[1]):lookup_interface(iface)
+  table.proxy = Gio.DBusProxy.new_sync(self.bus, Gio.DBusProxyFlags.NONE, table.info, name, path, iface, nil)

   if not table.proxy then
     return nil

Edit: @Elv13 wirefu also uses :get_interface_info on a proxy. Perhaps it makes sense to try without it as well?

@mlen
Copy link

mlen commented Oct 23, 2016

@psychon it does fix it. Thanks!

@mlen
Copy link

mlen commented Oct 23, 2016

Actually it turned out passing table.info to Gio.DBusProxy.new_sync is not neccessary

@pavouk
Copy link
Collaborator

pavouk commented Oct 23, 2016

Thanks a lot for report and also exactly pinning down the place where the problem is. As expected, get_interface_info() is the culprit; see the snippet from its definition from Gio-2.0.gir:

<method name="get_interface_info"
              c:identifier="g_dbus_proxy_get_interface_info"
              version="2.26">
        <doc xml:space="preserve">Returns the #GDBusInterfaceInfo, if any, specifying the interface
that @proxy conforms to. See the #GDBusProxy:g-interface-info
property for more details.</doc>
        <return-value transfer-ownership="full">
          <doc xml:space="preserve">A #GDBusInterfaceInfo or %NULL. Do not unref the returned
object, it is owned by @proxy.</doc>
          <type name="DBusInterfaceInfo" c:type="GDBusInterfaceInfo*"/>
        </return-value>

Doc string implies that ownership transfer is none, while the actual attribute does transfer-ownership="full". This is technical error in Gio annotation, so the report to gnome bugzilla would be in order. However, since waiting for the fix in Gio would take forever (a GLib release cycle, plus it would remain broken forever for existing GLib versions), it would be nice to fix it by replacing auto-imported get_interface_info specs with custom ffi-made spec and creating method definition by core.callable.new, as is already the case in a few places in existing overrides. I can do it when I get to it, although it might take some time, I've now even less time for lgi than usual :-(

@pavouk
Copy link
Collaborator

pavouk commented Oct 23, 2016

BTW, this should be probably split into separate issue, because I don't think that this is the original non-deterministic problem for which this issue was created. Bad annotation of get_interface_info will probably get very deterministic crash whenever it is hit.

@pavouk
Copy link
Collaborator

pavouk commented Oct 23, 2016

@mlen - your 'wrapper' is really cool! I was always thinking about built-in dbus support in this style, but I never got to actually implement it. If you are interested in building such support directly into lgi (i.e. mostly by extending existing imported classes instead of building parallel ones like ldbus.Bus), it would be very welcome!

@psychon
Copy link
Collaborator

psychon commented Oct 24, 2016

BTW, this should be probably split into separate issue, because I don't think that this is the original non-deterministic problem for which this issue was created. Bad annotation of get_interface_info will probably get very deterministic crash whenever it is hit.

Why not? Calling get_interface_info should work fine (and I checked: @Elv13's wirefu does use it). Later, when the Lua garbage collector cleans up, we get a double free. So this is not deterministic at all.

@mlen hit this earlier, because the GC never run and so the crash occurred during lua_close. For @Elv13 I think it is very likely that his wirefu code runs long enough so that the GC "hits at a random time". Both his backtraces are during GC activity (first "normal" gc, second during lua_close) and the third error message is a "some corruption"-message from malloc.

All of these can very well be caused by the wrong annotation.

This is technical error in Gio annotation, so the report to gnome bugzilla would be in order.

Since you are "my local G* expert", I would prefer if you were to report this. If I don't hear anything from you in this direction, I'll try to do this next weekend.

it would be nice to fix it by replacing auto-imported get_interface_info specs with custom ffi-made spec and creating method definition by core.callable.new, as is already the case in a few places in existing overrides. I can do it when I get to it, although it might take some time, I've now even less time for lgi than usual.

Sorry to hear that about your time. I'll try to cook up a patch, but in the past I never managed to do that for this kind of issue.

@psychon
Copy link
Collaborator

psychon commented Oct 24, 2016

Sorry, unrelated spam, but: @mlen

--- ldbus.lua.orig  2016-10-24 10:29:41.144193232 +0200
+++ ldbus.lua   2016-10-24 10:41:16.867617506 +0200
@@ -17,7 +17,7 @@ function ldbus.Object:__index(method)

       if #args == #minfo.in_args and #args > 0 then
         for i, t in ipairs(minfo.in_args) do
-          serialized[i] = GLib.Variant(t, args[i])
+          serialized[i] = GLib.Variant(t.signature, args[i])
         end
       end

@@ -32,9 +32,9 @@ function ldbus.Object:__index(method)
   else
     local pinfo = info:lookup_property(method)
     if pinfo then
-      return self.bus:GetObject(self.object_name, self.object_path, "org.freedesktop.DBus.Properties").Get(self.object_iface, method)
+      return self.proxy:get_cached_property(method)
     else
-      return nil
+      error("unknown key: " .. tostring(method))
     end
   end
 end
@@ -95,6 +95,9 @@ function ldbus.Bus:GetObject(name, path,
   return table
 end

-local bus = ldbus.Bus:Open('SYSTEM')
-local ck = bus:GetObject('org.freedesktop.ConsoleKit', '/org/freedesktop/ConsoleKit/Manager', 'org.freedesktop.ConsoleKit.Manager')
-print(ck.CanStop())
+local bus = ldbus.Bus:Open('SESSION')
+local ck = bus:GetObject('org.gnome.evince.Daemon', '/org/gnome/evince/Daemon', 'org.gnome.evince.Daemon')
+print("Call result:", ck.FindDocument("file:///tmp", false))
+local ck = bus:GetObject('org.gnome.evince.Daemon', '/org/gnome/evince/Daemon', 'org.gtk.Application')
+print("Property Busy", ck.Busy:print())
+print("done")

@psychon
Copy link
Collaborator

psychon commented Oct 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants