Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xb-self-test intermittently hang #59

Closed
oSoMoN opened this issue Jun 9, 2020 · 10 comments · Fixed by #83
Closed

xb-self-test intermittently hang #59

oSoMoN opened this issue Jun 9, 2020 · 10 comments · Fixed by #83

Comments

@oSoMoN
Copy link

oSoMoN commented Jun 9, 2020

The autopkgtests for libxmlb in Ubuntu failed a couple of times recently on amd64, and I managed to reproduce the failure by running the test in an infinite loop.

I'm attaching a full log. The test always appears to hang after ok 14 /libxmlb/builder{ensure}, and it prints Executing: libxmlb/libxmlb.test every 5 seconds until it times out after 5 minutes.

This is on Ubuntu groovy (amd64).

@hughsie
Copy link
Owner

hughsie commented Jun 9, 2020

the

Executing: libxmlb/libxmlb.test
Executing: libxmlb/libxmlb.test

is super weird. I wonder if running it in the same environment using valgrind would show anything bad.

@oSoMoN
Copy link
Author

oSoMoN commented Jun 9, 2020

Here is the relevant output of a failed run under valgrind (and here is the full log):

Test timed out after 300 seconds
Executing: libxmlb/libxmlb.test
FAIL: libxmlb/libxmlb.test (Child process killed by signal 9)
SUMMARY: total=1; passed=0; skipped=0; failed=1; user=0.0s; system=0.0s; maxrss=6032
FAIL: libxmlb/libxmlb.test (Child process killed by signal 9)
==123981== 
==123981== HEAP SUMMARY:
==123981==     in use at exit: 1,531 bytes in 44 blocks
==123981==   total heap usage: 46 allocs, 2 frees, 1,563 bytes allocated
==123981== 
==123981== LEAK SUMMARY:
==123981==    definitely lost: 0 bytes in 0 blocks
==123981==    indirectly lost: 0 bytes in 0 blocks
==123981==      possibly lost: 0 bytes in 0 blocks
==123981==    still reachable: 1,531 bytes in 44 blocks
==123981==         suppressed: 0 bytes in 0 blocks
==123981== Rerun with --leak-check=full to see details of leaked memory
==123981== 
==123981== For lists of detected and suppressed errors, rerun with: -s
==123981== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@hughsie
Copy link
Owner

hughsie commented Jun 9, 2020

So meson SIGKILLd the process after 300 seconds. How long does it take on a successful build? It's not getting scheduled on a machine that's 100% loaded or something?

@oSoMoN
Copy link
Author

oSoMoN commented Jun 9, 2020

That's a successful run in a groovy chroot on my laptop, and CPU usage seems irrelevant (I can get successes and failures both at rest and under heavy load − locally building firefox):

# time ./debian/tests/installed-tests > /dev/null

real	0m0.279s
user	0m0.044s
sys	0m0.012s

Note that meson is not involved here, I'm running the tests from the debian package, which simply invoke gnome-desktop-testing-runner libxmlb (with gnome-desktop-testing and libxmlb-tests suitably installed, of course).

@hughsie
Copy link
Owner

hughsie commented Jun 9, 2020

I'm out of ideas I'm afraid.

@hughsie hughsie closed this as completed Jan 21, 2021
@iainlane
Copy link
Contributor

btw I saw this too and got a backtrace (line numbers are from 0.1.15-2 as in Debian/Ubuntu). It appears to be deadlocking on a monitor:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fc9150d57cc in g_mutex_lock_slowpath (mutex=mutex@entry=0x7fc9153a0318 <g.inotify_lock_lock>)
    at ../../../glib/gthread-posix.c:1458
#2  0x00007fc9150d6256 in g_mutex_lock (mutex=mutex@entry=0x7fc9153a0318 <g.inotify_lock_lock>) at ../../../glib/gthread-posix.c:1482
#3  0x00007fc915309de4 in _ih_sub_cancel (sub=0x55a566f5be10) at ../../../gio/inotify/inotify-helper.c:121
#4  0x00007fc91530a06e in g_inotify_file_monitor_cancel (monitor=0x55a566f501a0) at ../../../gio/inotify/ginotifyfilemonitor.c:75
#5  0x00007fc91523680b in g_file_monitor_cancel (monitor=0x55a566f501a0) at ../../../gio/gfilemonitor.c:241
#6  g_file_monitor_cancel (monitor=0x55a566f501a0) at ../../../gio/gfilemonitor.c:235
#7  0x00007fc915236890 in g_file_monitor_dispose (object=0x55a566f501a0) at ../../../gio/gfilemonitor.c:123
#8  0x00007fc9152fe8b3 in g_local_file_monitor_dispose (object=0x55a566f501a0) at ../../../gio/glocalfilemonitor.c:814
#9  0x00007fc915179c93 in g_object_unref (_object=<optimized out>) at ../../../gobject/gobject.c:3461
#10 g_object_unref (_object=0x55a566f501a0) at ../../../gobject/gobject.c:3391
#11 0x000055a5666dce9d in xb_silo_file_monitor_item_free (item=0x55a566f53810) at ../src/xb-silo.c:1220
#12 0x00007fc915075172 in g_hash_table_remove_all_nodes (hash_table=hash_table@entry=0x55a566f62b60, notify=notify@entry=1, 
    destruction=destruction@entry=1) at ../../../glib/ghash.c:706
#13 0x00007fc915076533 in g_hash_table_remove_all_nodes (destruction=1, notify=1, hash_table=0x55a566f62b60)
    at ../../../glib/ghash.c:1461
#14 g_hash_table_unref (hash_table=0x55a566f62b60) at ../../../glib/ghash.c:1461
#15 0x000055a5666dd445 in xb_silo_finalize (obj=0x55a566f47a00) at ../src/xb-silo.c:1320
#16 0x00007fc915179d0e in g_object_unref (_object=<optimized out>) at ../../../gobject/gobject.c:3499
#17 g_object_unref (_object=0x55a566f47a00) at ../../../gobject/gobject.c:3391
#18 0x000055a5666c4864 in xb_builder_finalize (obj=0x55a566f46da0) at ../src/xb-builder.c:1027
#19 0x00007fc915179d0e in g_object_unref (_object=<optimized out>) at ../../../gobject/gobject.c:3499
#20 g_object_unref (_object=0x55a566f46da0) at ../../../gobject/gobject.c:3391
#21 0x000055a5666db999 in glib_autoptr_clear_GObject (_ptr=0x55a566f46da0) at /usr/include/glib-2.0/gobject/gobject-autocleanups.h:27
#22 glib_autoptr_clear_XbBuilder (_ptr=0x55a566f46da0) at ../src/xb-builder.h:20
#23 glib_autoptr_cleanup_XbBuilder (_ptr=<synthetic pointer>) at ../src/xb-builder.h:20
#24 xb_builder_ensure_watch_source_func () at ../src/xb-self-test.c:501
#25 0x00007fc9150b13ce in test_case_run (tc=0x55a566f43b00) at ../../../glib/gtestutils.c:2633
#26 g_test_run_suite_internal (suite=suite@entry=0x55a566f42440, path=path@entry=0x0) at ../../../glib/gtestutils.c:2721
#27 0x00007fc9150b1174 in g_test_run_suite_internal (suite=suite@entry=0x55a566f42420, path=path@entry=0x0)
    at ../../../glib/gtestutils.c:2733
#28 0x00007fc9150b18ba in g_test_run_suite (suite=0x55a566f42420) at ../../../glib/gtestutils.c:2808
#29 0x00007fc9150b18d5 in g_test_run () at ../../../glib/gtestutils.c:2043
#30 0x000055a5666c38bb in main (argc=<optimized out>, argv=<optimized out>) at ../src/xb-self-test.c:2423
(gdb) 

@hughsie
Copy link
Owner

hughsie commented Jan 25, 2021

@pwithnall am i doing anything dumb? g_file_monitor_cancel() shouldn't block for that long, no?

@iainlane
Copy link
Contributor

I think it's probably the same as https://gitlab.gnome.org/GNOME/glib/-/issues/1941 - testing the suggested fix there now

@pwithnall
Copy link
Collaborator

Yeah it’s probably that.

If anyone wants to fix the GLib bug, please be my guest 🙂

iainlane added a commit to iainlane/libxmlb that referenced this issue Jan 25, 2021
GFileMonitors currently need to be explicitly cancelled when they are
unreffed, otherwise they can deadlock due to the GLib-internal worker
thread also holding a strong reference to the monitor.

For full details, see: https://gitlab.gnome.org/GNOME/glib/-/issues/1941

Closes hughsie#59
@iainlane
Copy link
Contributor

Yeah that works, see #83, thanks!

hughsie pushed a commit that referenced this issue Jan 25, 2021
GFileMonitors currently need to be explicitly cancelled when they are
unreffed, otherwise they can deadlock due to the GLib-internal worker
thread also holding a strong reference to the monitor.

For full details, see: https://gitlab.gnome.org/GNOME/glib/-/issues/1941

Closes #59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants