Skip to content

Fix Android mobile-forge support artifact relocation#5

Merged
FeodorFitsner merged 5 commits into
flet-dev:mainfrom
ndonkoHenri:codex/fix-android-python-support-artifact
Jun 1, 2026
Merged

Fix Android mobile-forge support artifact relocation#5
FeodorFitsner merged 5 commits into
flet-dev:mainfrom
ndonkoHenri:codex/fix-android-python-support-artifact

Conversation

@ndonkoHenri
Copy link
Copy Markdown
Contributor

@ndonkoHenri ndonkoHenri commented May 31, 2026

ndonkoHenri added a commit to ndonkoHenri/mobile-forge that referenced this pull request May 31, 2026
The published python-android-mobile-forge-*.tar.gz bakes
/home/runner/ndk/<letter>/... paths into sysconfigdata and ships
libpython3.so as an ELF stub. We hold the line in two places until
flet-dev/python-build#5 ships a new tarball that fixes both upstream:

1. install_ndk.sh: also symlink the sdkmanager install at
   $HOME/ndk/<letter>/ so the baked sysconfigdata paths resolve to a
   real toolchain. Only fires when input is letter-form (the workflow's
   FORGE_NDK_VERSION=r27d path); component-form callers opt out.

2. build-wheels.yml: keep the inline libpython3.so → linker-script
   loop since the published tarball still ships the ELF stub.

Both blocks are wrapped in `CLEANUP-AFTER: flet-dev/python-build#5`
markers — grep that string post-merge and delete the bracketed
sections in one pass. They're guarded by idempotent checks so they
no-op when the new tarball lands, but the dead code is worth removing
for cleanliness.
CPython's Makefile rule for libpython skips `-Wl,-soname` when
INSTSONAME == LDLIBRARY, which is the case on Android (both are
`libpython3.X.so`). The shipped libpython3.X.so therefore has no
DT_SONAME entry. The visible consequence is that consumer wheels'
DT_NEEDED entries get whatever name the linker was asked for —
notably, any link path that opens `libpython3.so` via a real-lib
symlink writes `libpython3.so` into DT_NEEDED instead of
`libpython3.12.so`, and the device fails to dlopen at runtime.

Setting SONAME via patchelf right after `make install` makes
consumer wheels' DT_NEEDED entries correct regardless of how the
link was reached (stub, symlink, direct -lpython3.12, etc.).

3.13+ uses CPython's official Android tooling, which sets SONAME
natively — this fix only applies to the legacy 3.12 build path.
@FeodorFitsner
Copy link
Copy Markdown
Contributor

Pushed 49d866c to add a complementary fix: set SONAME on libpython3.12.so via patchelf after make install (scoped to the 3.12 build path).

Why

While investigating an actual ImportError: dlopen failed: library "libpython3.so" not found reproduction (in ndonkoHenri/mobile-forge run 26703178220), I noticed the deepest defect is in libpython3.12.so itself:

$ llvm-readelf -d install/android/arm64-v8a/python-3.12.12/lib/libpython3.12.so | grep -E 'SONAME|NEEDED'
  (NEEDED) libm.so
  (NEEDED) libdl.so
  (NEEDED) libc.so
  # no SONAME entry

CPython's Makefile.pre.in rule for libpython only emits -Wl,-h$(INSTSONAME) when INSTSONAME != LDLIBRARY. On Android both are libpython3.12.so, so the link skips -Wl,-soname and the shipped file has no DT_SONAME. (CPython's official 3.13+ Android tooling sets SONAME natively — I verified libpython3.13.so in the v3.13 tarball has SONAME=libpython3.13.so — so this only affects the legacy 3.12 path in android/build.sh.)

When SONAME is missing, ld.lld falls back to whatever name it was asked to open when recording DT_NEEDED in the consumer. That's how the repro chain bottoms out:

  1. pyo3-build-config emits -lpython3 for abi3 builds.
  2. -lpython3 resolves to libpython3.so (whether stub or a real-lib symlink).
  3. Linker reads the resolved file; no SONAME → uses the link-name libpython3.soDT_NEEDED=libpython3.so.
  4. Runtime: device only ships libpython3.12.soImportError.

Interaction with the existing fix in this PR

The replace_libpython_stub function in this PR is still load-bearing — it stops the upstream stub (which does have its own SONAME=libpython3.so) from ever being consulted. Together the two fixes form belt-and-suspenders:

Failure path Fixed by SONAME on libpython3.12.so Fixed by linker-script stub
Anyone opens libpython3.12.so via a wrong-named symlink
Anyone emits -lpython3 and hits the upstream stub
Both at once (the actual repro on the fork's older workflow)

Scope notes

  • android/build.sh: patchelf call is gated on command -v patchelf with a loud warning fallback, so it's safe for local builds without patchelf installed.
  • .github/workflows/build-python-version.yml: sudo apt-get install -y patchelf added before the Android build step.
  • Only the version_int <= 312 branch is touched. 3.13+ unaffected.

@FeodorFitsner FeodorFitsner merged commit 3622069 into flet-dev:main Jun 1, 2026
1 check passed
@ndonkoHenri ndonkoHenri deleted the codex/fix-android-python-support-artifact branch June 1, 2026 18:33
ndonkoHenri added a commit to ndonkoHenri/mobile-forge that referenced this pull request Jun 1, 2026
flet-dev/python-build#5 has landed. The new v3.12 tarball relocates
sysconfigdata's baked NDK paths at runtime AND ships libpython3.so as
a proper linker script (with libpython3.12.so now carrying its
DT_SONAME, per Feodor's follow-up patchelf fix). Both consumer-side
band-aids in this repo become no-ops against that tarball — remove
them.

Removed:

1. `.ci/install_ndk.sh`: the `$HOME/ndk/<letter>` compat-symlink block.
   Existed because sysconfigdata baked `/home/runner/ndk/<letter>/...`
   compiler paths, so crossenv looked for clang there. PR flet-dev#5's
   tarball-side relocation consults `$NDK_HOME` (and a fallback list)
   at sysconfig import time, so the symlink is no longer needed.

2. `.github/workflows/build-wheels.yml`: the `for libpy in ...` loop
   that replaced libpython3.so with a GNU ld linker script
   `INPUT ( -lpython3.12 )`. PR flet-dev#5's `replace_libpython_stub` writes
   the same linker script in-place at tarball-assembly time, and the
   new SONAME on libpython3.12.so closes the link-name-fallback path
   that originally made the workaround necessary. The loop is now a
   no-op against any extracted tarball — delete it.

bash -n + YAML validation both pass.
ndonkoHenri added a commit to ndonkoHenri/mobile-forge that referenced this pull request Jun 1, 2026
CrossVEnv.cross_kwargs() builds the env dict it later passes to
subprocess.run(env=…). It used to start from venv_kwargs.get("env", {})
— i.e. {} when the caller didn't supply env — and add PATH and
VIRTUAL_ENV on top. Because subprocess.run(env=<dict>) REPLACES (not
augments) the child env, the child process saw a near-empty environment
(PATH + VIRTUAL_ENV only). NDK_HOME, HOME, LANG, and everything else
the parent had were silently dropped.

That broke any downstream tool that reads os.environ from a forge
cross subprocess. Most recently: python-build's sysconfigdata
relocator (flet-dev/python-build#5), which crossenv exec_module()'s
when forge spawns it (forge/cross.py:334 → crossenv/__init__.py:217).
The relocator wants NDK_HOME to rewrite the build-time path
'/home/runner/ndk/r27d/…' baked into _sysconfigdata__linux_.py to the
local NDK on the runner. With the wipe in place the lookup returned
None, fell through to ~/ndk/<letter>/ legacy compat (which our
.ci/install_ndk.sh used to symlink), and only worked because of that
symlink. Once we dropped the symlink (e09628a "CLEANUP-AFTER:
flet-dev/python-build#5"), every Android wheel build failed with

    ERROR: Cannot find cross-compiler (
        '/home/runner/ndk/r27d/toolchains/llvm/prebuilt/linux-x86_64/
         bin/i686-linux-android24-clang')!

at crossenv/__init__.py:407 because the relocator's NDK_HOME branch
never fired.

# Options considered

Option A — Revert e09628a, restore the ~/ndk/<letter>/ legacy symlink
  in install_ndk.sh.
    + One-line, immediate.
    - Keeps the band-aid; the relocator's NDK_HOME branch stays dead
      in CI.
    - Doesn't fix any future tool that wants HOME / LANG / etc.
  Rejected: just hides the bug.

Option B — Surgical allowlist in cross_kwargs. Add only the env vars
  downstream tooling needs:

      for var in ("NDK_HOME", "ANDROID_NDK_HOME", "HOME"):
          if var in os.environ:
              env.setdefault(var, os.environ[var])

    + Minimal blast radius — three vars, every existing strip stays.
    + Direct, easy to defend in an upstream PR ("these three are
      required by python-build relocation").
    + setdefault preserves caller-supplied env semantics.
    - Partial: next tool that needs USER / XDG_CACHE_HOME / LANG /
      whatever brings us right back here.
    - Couples mobile-forge to the relocator's contract.
    - Doesn't answer "why is env scrubbed at all?".

Option C — Inherit-by-default (this commit). Change the default branch
  of `venv_kwargs.get("env", …)` from {} to os.environ.copy(), so the
  env-replace becomes opt-in (pass env=<dict> to wipe) rather than the
  accidental default. PATH / VIRTUAL_ENV / PYTHONHOME overrides on top
  unchanged.
    + Fixes the root cause: subprocess sees a sensible env, matching
      ordinary subprocess.run semantics.
    + Future-proofs every tool that reads os.environ — no more
      whack-a-mole.
    + Removes a foot-gun (the {} default was almost certainly an
      oversight: PATH has a justification comment, the wipe doesn't).
    + Net-shorter diff than B (the try/except PYTHONHOME also
      collapses to .pop()).
    - Broader behavior change: every cross subprocess now inherits the
      runner env. A build that accidentally relied on a var being
      ABSENT would now break.
    - Harder to land upstream — reviewers will want a regression
      survey.

# Why C now

Picked C because:
  * It actually fixes the mechanism instead of patching its symptom.
  * The wipe was an accident (the PATH override comment justifies PATH,
    nothing in the code justifies stripping the rest).
  * If a recipe build was relying on accidental env scrubbing, that's
    a latent bug we want surfaced now, not later.
  * If C ever needs to be partially reverted, B's allowlist remains
    the documented migration path — re-add the {} default and
    re-introduce the surgical allowlist.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-apply Android python-build install-tree patches in forge

2 participants