android: Fix GC hook installation on Android 16 (CMC + stripped libart)#394

Open
Kolektori wants to merge 1 commit into frida:main from Kolektori:android-16-libart-cmc-gc-hook

@Kolektori

Fixes #387.

On Android 16 (e.g. build BP4A.251205.006, com.android.art@361302280), the process crashes with a NULL dereference in art::CodeInfo::DecodeGcMasksOnly during a GC stack walk whenever a replacement ArtMethod is on some thread's stack when the collector runs. The reporter's backtrace in #387 shows the fault reached via art::gc::collector::MarkCompact::RunPhases → VisitRoots → StackVisitor::WalkStack → DecodeGcMasksOnly.

Two A16 libart.so changes together break the GC synchronization machinery in ensureArtKnowsHowToHandleReplacementMethods and instrumentArtGarbageCollection:

  1. libart.so is now stripped — .symtab is gone but the library retains a .gnu_debugdata section (LZMA-compressed mini-debuginfo). Module.findSymbolByName reads .dynsym + .symtab and so returns null for Heap::CollectGarbageInternal and ConcurrentCopying::CopyingPhase on A16, silently skipping the hook install. Module.enumerateSymbols() does parse the mini-debuginfo, so the symbols are still reachable that way.

  2. A16 defaults to Concurrent Mark Compact (CMC) instead of Concurrent Copying. Even if we resolve ConcurrentCopying::CopyingPhase, it never fires under CMC, so replacement ArtMethods are never re-synchronized after compaction. MarkCompact::RunPhases is the CMC lifecycle-event equivalent.

Changes

  • Add a small resolveDebugdataSymbol(module, name) fallback that lazily caches the result of Module.enumerateSymbols() per module, and plumb it into temporaryApi.find as a last resort after findExportByName / findSymbolByName. The same plumbing transparently restores Heap::CollectGarbageInternal resolution for all ART-symbol callers.
  • Reroute the two raw art.findSymbolByName(...) call sites in instrumentArtGarbageCollection and instrumentArtFixupStaticTrampolines through api.find so they pick up the mini-debuginfo fallback.
  • When CopyingPhase is inlined (also seen on A16+), fall back to ConcurrentCopying::RunPhases as the hook point. This is the phase-driver function one level up from CopyingPhase.
  • Additionally hook MarkCompact::RunPhases with the existing artController.hooks.Gc.copyingPhase callback. The callback is collector-agnostic — it just synchronizes entrypoints at a "world is consistent again" lifecycle point — so reusing it for CMC is correct. Both hooks can coexist; only the active collector dispatches its phase.
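
The lookup chain from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in the diff: resolveDebugdataSymbol is the name used above, but its body, the findArtSymbol wrapper, and the fakeLibart stand-in (which lets the sketch run outside Frida) are assumptions.

```javascript
// Sketch only: the helper name is from the PR description; the body is an assumption.
const symbolCache = new WeakMap(); // module object -> Map(symbol name -> address)

function resolveDebugdataSymbol(module, name) {
  let byName = symbolCache.get(module);
  if (byName === undefined) {
    byName = new Map();
    try {
      for (const sym of module.enumerateSymbols()) {
        // Keep the first address seen for a given name.
        if (!byName.has(sym.name)) {
          byName.set(sym.name, sym.address);
        }
      }
    } catch (e) {
      // Enumeration failed: cache what was collected so lookups do not retry every time.
    }
    symbolCache.set(module, byName);
  }
  return byName.has(name) ? byName.get(name) : null;
}

// Hypothetical chain: exports first, then symbols, then the cached enumeration.
function findArtSymbol(module, name) {
  let address = module.findExportByName(name);
  if (address === null) address = module.findSymbolByName(name);
  if (address === null) address = resolveDebugdataSymbol(module, name);
  return address;
}

// Stand-in for a Frida Module (assumption: both find* calls miss, as the PR
// describes for the stripped libart.so, while enumeration still sees the symbol).
let enumerations = 0;
const fakeLibart = {
  findExportByName: () => null,
  findSymbolByName: () => null,
  enumerateSymbols() {
    enumerations++;
    return [{ name: 'art_internal_symbol', address: '0x1000' }];
  }
};

const first = findArtSymbol(fakeLibart, 'art_internal_symbol');
const second = findArtSymbol(fakeLibart, 'art_internal_symbol');
const missing = findArtSymbol(fakeLibart, 'not_present');
console.log(first, second, missing, enumerations);
```

In this sketch the stand-in module is enumerated only once across the three lookups, which is the point of caching per module. A miss returns null, so callers can treat it the same way as a failed findSymbolByName.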

Net diff: lib/android.js +38 / −4.

Testing

Reproduced #387 on a Cuttlefish x86_64 guest running aosp-android-latest-release (build 15150359, API 36, same libart BuildId class as the Pixel 7 reporter's build). Unpatched 7.0.13: HeapTaskDaemon SIGSEGV within seconds of attaching a .implementation hook to any hot constructor (e.g. java.net.URL.<init>(String) or java.io.File.<init>(String)).

With this patch applied, the following hook set runs to completion simultaneously on the same target for a full 90s analysis window:

  • .implementation swaps on java.net.URL.<init>(String) and java.io.File.<init>(String)
  • .implementation swaps on all 17 java.lang.StringFactory.newString* overloads
  • Interceptor.attach on art::mirror::String::AllocFromModifiedUtf8 (3 overloads) and AllocFromUtf16
  • Interceptor.attach on libc __system_property_get, __system_property_find, open, fopen*, freopen*, stat
  • Interceptor.attach on libdl dynamic-loader exports
  • Interceptor.attach on libart/libdexfile dex-retrieval paths

Observed: zero tombstones, no DecodeGcMasksOnly frames, full MITM / logcat / media / trace artifact set collected, hooks actively firing (600+ __system_property_get rewrites, 121 File.<init> callbacks, 60 MessageDigest callbacks in one run).

Notes

  • No behavior change on pre-A16 builds: the new MarkCompact lookup simply returns null there via api.find, and the mini-debuginfo fallback is a no-op when findSymbolByName already succeeds.
  • The symbol cache is a WeakMap keyed by module, so entries die with the module.
  • Kept Thread::RunFlipFunction hook as-is — still exported, still correct on CC builds.
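
A rough sketch of the hook-point selection described in the Changes and in the first note, under stated assumptions: the mangled names are derived here from the C++ names in this description (assuming no-argument member functions) and have not been checked against a real libart.so, and selectGcHookTargets and lookupIn are hypothetical helpers that stand in for api.find.

```javascript
// Assumed mangled names, derived from the C++ names in the PR text.
const COPYING_PHASE = '_ZN3art2gc9collector17ConcurrentCopying12CopyingPhaseEv';
const CC_RUN_PHASES = '_ZN3art2gc9collector17ConcurrentCopying9RunPhasesEv';
const CMC_RUN_PHASES = '_ZN3art2gc9collector11MarkCompact9RunPhasesEv';

// `find` is any lookup that returns an address, or null when the symbol is absent.
function selectGcHookTargets(find) {
  const targets = [];

  // Concurrent Copying: prefer CopyingPhase, fall back to RunPhases if it is inlined.
  const cc = find(COPYING_PHASE) ?? find(CC_RUN_PHASES);
  if (cc !== null) targets.push(cc);

  // Concurrent Mark Compact: null on builds without the symbol, so nothing is added.
  const cmc = find(CMC_RUN_PHASES);
  if (cmc !== null) targets.push(cmc);

  return targets;
}

// Hypothetical lookup tables standing in for two kinds of build.
const lookupIn = (entries) => {
  const table = new Map(entries);
  return (name) => table.get(name) ?? null;
};

const olderBuild = selectGcHookTargets(
  lookupIn([[COPYING_PHASE, '0xa0'], [CC_RUN_PHASES, '0xb0']]));
const a16Build = selectGcHookTargets(
  lookupIn([[CC_RUN_PHASES, '0xb0'], [CMC_RUN_PHASES, '0xc0']]));

console.log(olderBuild, a16Build);
```

In a real script each returned address would presumably be passed to Interceptor.attach with the same synchronize-on-leave callback. Because missing symbols are skipped, the older-build table yields a single target and the A16-style table yields two.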

Android 16 ships libart.so without .symtab, so findExportByName and
findSymbolByName miss internal ART symbols. Parse .gnu_debugdata via
enumerateSymbols() and cache the result per module.

Also attach the GC synchronize-on-leave hook to MarkCompact::RunPhases
for Android 16's Concurrent Mark Compact collector, and fall back to
ConcurrentCopying::RunPhases when CopyingPhase is inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Kolektori Kolektori marked this pull request as ready for review April 23, 2026 08:55
Comment thread lib/android.js
    if (byName === undefined) {
      byName = new Map();
      try {
        for (const sym of module.enumerateSymbols()) {
@oleavr (Member), May 11, 2026

Thanks for the detailed write-up.

Before merging I want to understand the findSymbolByName vs enumerateSymbols split, because in Gum both go through the same gum_elf_module_enumerate_symbols, which already falls back to .gnu_debugdata when .symtab is missing. So in principle they shouldn't disagree.

A few questions:

  1. Which frida / frida-server version on the A16 device? The mini-debuginfo fallback landed in gum 8ed32c4d (Dec 2024), the dynsym fallback in 01eadbff (Mar 2026).
  2. Does findSymbolByName('libart.so', '_ZN3art2gc4Heap22CollectGarbageInternalENS0_9collector6GcTypeENS0_7GcCauseEbj') start working after an enumerateSymbols() pass on the same module? That would point at a state/ordering bug in Gum.
  3. readelf -S on the device's libart.so — which of .symtab / .dynsym / .gnu_debugdata are present?

If it's a Gum bug I'd rather fix it there.
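
One way to answer question 2 is a small probe along these lines. It is a sketch under assumptions: probeSymbol and makeStub are hypothetical names, and the stub only simulates a module whose findSymbolByName changes its answer after enumeration so the probe can run outside Frida. It says nothing about what a real device returns.

```javascript
// Hypothetical probe: compare findSymbolByName before and after an
// enumerateSymbols() pass on the same module object.
function probeSymbol(module, name) {
  const before = module.findSymbolByName(name);

  let viaEnumeration = null;
  let total = 0;
  for (const sym of module.enumerateSymbols()) {
    total++;
    if (viaEnumeration === null && sym.name === name) {
      viaEnumeration = sym.address;
    }
  }

  const after = module.findSymbolByName(name);
  return { before, viaEnumeration, after, total };
}

// Simulated module (assumption, for illustration only). With stateful = true,
// findSymbolByName starts resolving 'sym' only after enumeration has run.
function makeStub(stateful) {
  let enumerated = false;
  return {
    findSymbolByName: (n) =>
      (stateful && enumerated && n === 'sym') ? '0x2000' : null,
    enumerateSymbols() {
      enumerated = true;
      return [{ name: 'sym', address: '0x2000' }];
    }
  };
}

const statefulResult = probeSymbol(makeStub(true), 'sym');
const statelessResult = probeSymbol(makeStub(false), 'sym');
console.log(statefulResult, statelessResult);

// In a Frida script the call would look something like:
//   probeSymbol(Process.getModuleByName('libart.so'), '<mangled name from question 2>')
```

If before is null and after is not, that matches the state/ordering hypothesis. If both stay null while viaEnumeration resolves, the two paths genuinely disagree for that module.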


Successfully merging this pull request may close these issues.

[Android 16] App crashes during Garbage Collection when Frida is attached (Build BP4A.251205.006)