flatpak-update-worker busy-loops at ~46% CPU when parent mintUpdate process exits #1041

@Fossie1973

Description

Summary

flatpak-update-worker processes launched with -u (update-packages) enter an infinite busy loop consuming ~46% CPU each when the parent mintUpdate process dies or closes the stdin pipe. The workers become orphaned (PPID 1) and spin indefinitely until manually killed.

Environment

  • Linux Mint 22.3 (Zena)
  • mintupdate 7.1.4
  • mint-common 2.5.1
  • flatpak 1.14.6-1ubuntu0.1
  • Python 3.12.3

Bug

The bug is in message_from_updater() in flatpak-update-worker.py (line 312). When the parent closes the stdin pipe, read_bytes_async completes immediately with empty bytes (EOF). The if bytes_read: block is skipped, but read_bytes_async is re-scheduled unconditionally at line 312:

```python
def message_from_updater(self, pipe, res):
    if self.cancellable is None or self.cancellable.is_cancelled():
        return

    try:
        bytes_read = pipe.read_bytes_finish(res)
    except GLib.Error as e:
        if e.code != Gio.IOErrorEnum.CANCELLED:
            warn("Error reading from updater: %s" % e.message)
        return

    if bytes_read:
        message = bytes_read.get_data().decode().strip("\n")
        # ... handle message ...

    # BUG: this runs even when bytes_read is empty (EOF)
    pipe.read_bytes_async(4096, GLib.PRIORITY_DEFAULT, self.cancellable, self.message_from_updater)
```

Each EOF read completes immediately, the callback fires, and it re-schedules another read. This creates a tight loop that never blocks.
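The EOF behaviour behind the loop can be reproduced with a plain OS pipe (a minimal sketch using `os.pipe`, not the worker's GIO API): once the write end is closed, every read returns empty bytes immediately instead of blocking, so unconditionally re-issuing the read spins at full speed.

```python
import os

# Create a pipe and immediately close the write end, simulating the
# parent (mintUpdate) exiting and dropping its end of the worker's stdin.
r, w = os.pipe()
os.close(w)

# With the writer gone, read() no longer blocks: it returns b"" (EOF)
# instantly, every time. A callback that re-schedules another read after
# each completion therefore becomes a tight CPU-bound loop.
results = [os.read(r, 4096) for _ in range(5)]
print(results)  # five empty reads, none of them blocked
os.close(r)
```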

Suggested fix

Move the re-schedule inside the if bytes_read: block and handle EOF explicitly:

```python
    if bytes_read:
        message = bytes_read.get_data().decode().strip("\n")
        # ... handle message ...
        pipe.read_bytes_async(4096, GLib.PRIORITY_DEFAULT, self.cancellable, self.message_from_updater)
    else:
        # EOF — parent closed the pipe
        self.quit()
```
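The same stop-on-EOF shape in a plain synchronous loop (a sketch with ordinary file descriptors, not the GIO async API) makes the control flow of the fix easy to see: the next read is issued only inside the "got data" branch, and EOF terminates the loop instead of respinning it.

```python
import os

def drain(fd, chunk=4096):
    """Read until EOF, re-issuing the read only while data keeps arriving.

    Mirrors the suggested fix: re-scheduling lives in the 'got data'
    branch; an empty read (EOF) means the writer closed the pipe, so stop.
    """
    chunks = []
    while True:
        data = os.read(fd, chunk)
        if data:
            chunks.append(data)  # handle the message, then read again
        else:
            break                # EOF: writer closed the pipe; shut down
    return b"".join(chunks)

r, w = os.pipe()
os.write(w, b"update-complete\n")
os.close(w)                      # parent exits / closes its end
print(drain(r))                  # prints b'update-complete\n'
os.close(r)
```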

Evidence

Two orphaned workers were found running for over 24 hours, consuming ~46% CPU each.

I/O stats from /proc/&lt;pid&gt;/io confirm a busy loop — ~1.18 billion read syscalls that returned almost no data:

rchar: 7834362
syscr: 1181281207
read_bytes: 65536

Sampling over 2 seconds showed ~4,600 syscalls/sec with no bytes transferred.
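Dividing the quoted counters shows just how little useful work the loop does (simple arithmetic on the numbers above; `rchar` counts all bytes returned by read-like syscalls over the process lifetime):

```python
# /proc/<pid>/io counters quoted above for one orphaned worker
rchar = 7_834_362        # lifetime bytes returned by read-like syscalls
syscr = 1_181_281_207    # number of read syscalls issued

avg_bytes_per_read = rchar / syscr
print(f"{avg_bytes_per_read:.4f} bytes per read syscall")

# Virtually every read returned 0 bytes: the worker is spinning on EOF,
# and the ~7.8 MB of rchar is early, legitimate I/O before the parent died.
assert avg_bytes_per_read < 0.01
```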

Thread states:

| Thread | State | Activity |
| --- | --- | --- |
| Main (`flatpak-update-`) | R (running) | 100% of process CPU time |
| pool-spawner | S (sleeping) | Idle |
| gmain | S (sleeping) | Idle, poll with timeout=-1 |
| gdbus | S (sleeping) | Idle, poll with timeout=-1 |
| dconf worker | S (sleeping) | Idle |
| `flatpak-update-` (secondary) | S (sleeping) | Blocked on GIL (futex wait) |

GDB backtrace of main thread (both workers show the same pattern):

#0  __stpcpy_sse2_unaligned () at strcpy-sse2-unaligned.S:45
#1  g_strconcat () from libglib-2.0.so.0
#2-#4  libgio internals
#5  g_input_stream_read_async () from libgio-2.0.so.0
#6  g_input_stream_read_bytes_async () from libgio-2.0.so.0
#7-#9  libffi
#10-#12  gi._gi.cpython-312 (Python → GI call)
#13 _PyObject_MakeTpCall ()
#14 _PyEval_EvalFrameDefault ()         ← message_from_updater calling pipe.read_bytes_async()
#15-#16  gi._gi (completion callback from previous read)
#17-#18  libffi
#19-#22  libgio (dispatching read completion)
#23-#24  libgio (input stream internals)
#25  g_main_context_dispatch ()
#26-#27 g_main_loop_run () from libglib-2.0.so.0
#28 gtk_main () from libgtk-3.so.0

The secondary flatpak thread is blocked waiting for the GIL, called from a g_signal_emit in libflatpak.so.0 — it can never acquire the lock because the main thread never yields.

Additional context from ~/.xsession-errors:

Multiple prior warnings suggest a broader issue with concurrent flatpak cache operations that may have contributed to the parent dying:

flatpak-update-worker (WARN): cache not valid, refreshing
flatpak-WARNING: Invalid checksum for indexed summary ...
mint-common (WARN): Could not update appstream for flathub: Error deploying appstream: Error moving file ... File exists
mint-common (WARN): Could not update appstream for flathub: Error deploying appstream: Error moving file ... Directory not empty

These "File exists" / "Directory not empty" errors appear dozens of times, suggesting a race condition when two workers concurrently update the appstream cache.
