Skip to content

[DRAFT] feat(builder): use python-build-standalone instead of source builds#80

Open
AStaroverov wants to merge 6 commits into
mainfrom
feat/python-build-standalone
Open

[DRAFT] feat(builder): use python-build-standalone instead of source builds#80
AStaroverov wants to merge 6 commits into
mainfrom
feat/python-build-standalone

Conversation

@AStaroverov
Copy link
Copy Markdown
Contributor

@AStaroverov AStaroverov commented May 28, 2026

Stop compiling CPython from source on CI for macOS and Linux. Download prebuilt relocatable tarballs from astral-sh/python-build-standalone (hereafter PBS) — the same artifacts that power uv, rye, mise, and pyenv-install. Windows path is untouched.

Diff: −354 lines, +95 lines. Removed 2 builder modules (linux.ts, macos.ts) and the Docker environment (builder/docker/).

Why

1. Fixes flaky CI on macOS

In PR #79, four macOS jobs failed with the same error:

Creating directory /ppp-marker/3.12.10/lib/python3.12
Creating directory /ppp-marker/3.12.10/lib/python3.12
install: mkdir .../lib/python3.12: File exists
make: *** [libinstall] Error 71
make: *** Waiting for unfinished jobs....

This is a race condition in CPython's libinstall target: parallel sub-targets simultaneously run install -d on the same directory, and BSD install on macOS is not idempotent (unlike GNU). On the same workflow run, macos-14 arm64 h5ad passed while macos-14-large amd64 h5ad failed — a non-deterministic race. Past "successful" main runs were turbo cache hits at 1.6 minutes; a real cold build hadn't happened in a while.

PBS eliminates the source of the problem entirely — we no longer build CPython.

2. Speeds up CI

The CPython compilation phase goes away — roughly 10–15 min per cache-miss × 8 cache-miss jobs on a fresh PR ≈ 1.5 hours of machine time. On macOS/Linux, building a package is now: download tarball + pip download wheels. Fits comfortably in 2–3 minutes.

3. Shrinks our surface area

Removed:

  • builder/src/macos.ts (100 lines) — consolidateLibs copied external dylibs and patched install_name. PBS tarballs are already relocatable (@executable_path/...).
  • builder/src/linux.ts (168 lines) — consolidateLibs + patchelf for RPATH. PBS Linux already uses $ORIGIN-relative RPATH and bundles dependencies inside lib/.
  • builder/docker/ (Dockerfile + build.sh) — the Rocky Linux 8 container existed solely to build Python from source against an old glibc. PBS Linux is built against glibc 2.17 (manylinux2014).
  • buildFromSources and buildInDocker in build.ts.
  • isInBuilderContainer / BUILD_CONTAINER env plumbing in util.ts.

Less code, fewer bugs of our own making. The libinstall race is not our bug and now is not our problem.

4. Uses the industry standard

python-build-standalone is maintained by Astral and is the backend for uv, rye, mise, pyenv-install (optional), Hatch (optional), and Bazel rules_python. These tarballs run through CI with coverage much wider than ours; security/CVE updates land without our involvement; CPython Makefile bugs get patched upstream before they reach us.

What changes

Architecturally

Before

  • macOS: pipx run portable-python build 3.12.10 → unpack → consolidateLibs (otool/install_name_tool) → pip download
  • Linux: spawn Rocky 8 Docker → pipx run portable-python build 3.12.10 → unpack → consolidateLibs (ldd/patchelf) → pip download
  • Windows: download python-X.Y.Z-embed-amd64.zip from python.org → patch pip wheel → install

After

  • macOS / Linux: download cpython-${VERSION}+${PBS_TAG}-${TRIPLE}-install_only.tar.gz → unpack → pip download
  • Windows: unchanged

Key code

builder/src/pbs.ts (new, 75 lines):

const PBS_RELEASE_TAG = '20250529';

export async function getPortablePython(
  version: string,
  osType: util.OS,
  archType: util.Arch,
  installDir: string,
): Promise<void> {
  const triple = targetTriple(osType, archType);
  const fileName = `cpython-${version}+${PBS_RELEASE_TAG}-${triple}-install_only.tar.gz`;
  const url = `https://github.com/astral-sh/python-build-standalone/releases/download/${PBS_RELEASE_TAG}/${fileName}`;
  // download → extract → move python/* into installDir
}

Triples:

Platform PBS triple
macOS x64 x86_64-apple-darwin
macOS arm64 aarch64-apple-darwin
Linux x64 x86_64-unknown-linux-gnu
Linux arm64 aarch64-unknown-linux-gnu

In build.ts, the case 'macosx' and case 'linux' branches are merged; both call pbs.getPortablePython.

What does NOT change

  • CPython version: 3.12.10. PBS release 20250529 ships exactly this patch version.
  • loadPackages / downloadPackages / runNativeImportChecker logic.
  • Package configs (config.json for each python-3.12.10-*).
  • Windows build.
  • Signing/notarization paths in the workflow (pydist/**/bin/python, etc.) remain valid.

Trade-offs

We gain

  • Stable CI on macOS — no more libinstall race.
  • −10–15 min on every cache-miss build on macOS/Linux.
  • −354 LOC.
  • CPython patches inherited "for free" from upstream PBS.
  • Uniform platform baseline for end users (PBS guarantees, e.g., macOS builds work on ≥ macOS 13.0).

We give up

  • Dependency on Astral's GitHub Release CDN. If releases.github.com is down, our CI is down. Mitigation: retry with backoff, or mirror to our S3 (follow-up).
  • Less control over compile flags. Today we used no custom flags, so this is a theoretical loss. If we ever need --with-lto etc., we can switch to pgo+lto-full PBS variants or revert.
  • CPython patch version tied to PBS release. Bumping to 3.12.11 will require a one-line PBS_RELEASE_TAG bump — a recurring maintenance task that didn't exist before.

Risk & rollback

The change is isolated to builder/. If CI reveals incompatibility (unlikely — manylinux2014 / macOS 13.0 baseline is broader than portable-python's), rollback = revert one commit.

Testing plan

  • CI runs green across all 5 platforms × 9 packages = 45 jobs on this branch.
  • macOS build time drops from ~15 min to ~3 min.
  • Linux build time (no Docker) drops from ~25 min to ~5 min.
  • runNativeImportChecker passes for all packages without whitelist changes (the .so set in PBS lib-dynload/ matches portable-python's).
  • Locally verify one of the built runtimes launches and imports heavy packages (scanpy, scipy).

Future work (not in this PR)

  1. Mirror PBS tarballs in our own S3 for resilience against GitHub releases outages.
  2. Bump CPython to 3.12.13+ (latest PBS release 20260510 includes 3.12.13).
  3. Apply the same logic to Windows: PBS publishes *-pc-windows-msvc-install_only.tar.gz, which would let us unify windows.ts with the new path. Skipped here because Windows isn't broken and its pip-patch step is non-trivial.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request switches the CPython provisioning on macOS and Linux from portable-python source builds to prebuilt python-build-standalone (PBS) tarballs. This change removes the need for local compilation, Docker-based builds, and custom library consolidation scripts, significantly simplifying the build pipeline and reducing build times. Feedback on the changes highlights an opportunity to improve error handling in the new PBS downloader by wrapping the download and extraction process in a try...finally block to guarantee cleanup of temporary files and directories in case of failures.

Comment thread builder/src/pbs.ts
Comment thread builder/src/pbs.ts
const tarPath = path.join(stagingDir, fileName);

console.log(`[PBS] Downloading ${url}`);
await util.downloadFile(url, tarPath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 downloadFile does not follow HTTP redirects — PBS downloads will always fail

util.downloadFile uses Node's https.get() and immediately rejects on any status code other than 200. GitHub release asset URLs (github.com/.../releases/download/…) issue a 302 redirect to objects.githubusercontent.com. Every call here will reject with Failed to get '...' (302), so macOS and Linux builds will never download a tarball.

The fix is to handle 3xx responses inside downloadFile (or a wrapper) by following response.headers.location with a recursive call.

Prompt To Fix With AI
This is a comment left during a code review.
Path: builder/src/pbs.ts
Line: 46

Comment:
**`downloadFile` does not follow HTTP redirects — PBS downloads will always fail**

`util.downloadFile` uses Node's `https.get()` and immediately rejects on any status code other than 200. GitHub release asset URLs (`github.com/.../releases/download/…`) issue a 302 redirect to `objects.githubusercontent.com`. Every call here will reject with `Failed to get '...' (302)`, so macOS and Linux builds will never download a tarball.

The fix is to handle 3xx responses inside `downloadFile` (or a wrapper) by following `response.headers.location` with a recursive call.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread builder/src/pbs.ts
Comment on lines +46 to +72
await util.downloadFile(url, tarPath);

// PBS tarballs always extract into a top-level `python/` directory.
// Use a temp dir so we can rename it into place without colliding with
// the staging dir's siblings.
const tempExtractDir = path.join(stagingDir, `pbs-extract-${process.pid}`);
if (fs.existsSync(tempExtractDir)) {
fs.rmSync(tempExtractDir, { recursive: true });
}
fs.mkdirSync(tempExtractDir, { recursive: true });

console.log(`[PBS] Extracting ${tarPath} to ${tempExtractDir}`);
tar.x({ sync: true, file: tarPath, cwd: tempExtractDir });

const extractedRoot = path.join(tempExtractDir, 'python');
if (!fs.existsSync(extractedRoot)) {
throw new Error(`PBS archive layout unexpected: ${extractedRoot} not found`);
}

util.emptyDirSync(installDir);
// Move the extracted python/ contents into installDir
for (const entry of fs.readdirSync(extractedRoot)) {
fs.renameSync(path.join(extractedRoot, entry), path.join(installDir, entry));
}

fs.rmSync(tempExtractDir, { recursive: true });
fs.rmSync(tarPath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Staging artifacts not cleaned up on error

If tar.x(...) throws, or if any renameSync call fails, both tarPath (the downloaded tarball) and tempExtractDir are left on disk. On the next invocation the tarball is re-downloaded and a new temp dir is created alongside the orphaned one, but the stale tarball is silently overwritten. Wrapping the extraction block in a try/finally that calls fs.rmSync(tarPath, { force: true }) and fs.rmSync(tempExtractDir, { recursive: true, force: true }) would keep the staging directory clean across retries.

Prompt To Fix With AI
This is a comment left during a code review.
Path: builder/src/pbs.ts
Line: 46-72

Comment:
**Staging artifacts not cleaned up on error**

If `tar.x(...)` throws, or if any `renameSync` call fails, both `tarPath` (the downloaded tarball) and `tempExtractDir` are left on disk. On the next invocation the tarball is re-downloaded and a new temp dir is created alongside the orphaned one, but the stale tarball is silently overwritten. Wrapping the extraction block in a `try/finally` that calls `fs.rmSync(tarPath, { force: true })` and `fs.rmSync(tempExtractDir, { recursive: true, force: true })` would keep the staging directory clean across retries.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread builder/src/pbs.ts
Comment on lines +39 to +46
const fileName = `cpython-${version}+${PBS_RELEASE_TAG}-${triple}-install_only.tar.gz`;
const url = `https://github.com/astral-sh/python-build-standalone/releases/download/${PBS_RELEASE_TAG}/${fileName}`;

const stagingDir = path.dirname(installDir);
const tarPath = path.join(stagingDir, fileName);

console.log(`[PBS] Downloading ${url}`);
await util.downloadFile(url, tarPath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No integrity check on the downloaded tarball

PBS publishes a .sha256 file alongside every tarball (same URL with .sha256 appended). Without verifying the checksum after download, a corrupted transfer or a compromised CDN response would silently extract a bad interpreter and produce a broken Python distribution. Consider downloading the companion .sha256 and comparing against crypto.createHash('sha256').update(fs.readFileSync(tarPath)).digest('hex') before extracting.

Prompt To Fix With AI
This is a comment left during a code review.
Path: builder/src/pbs.ts
Line: 39-46

Comment:
**No integrity check on the downloaded tarball**

PBS publishes a `.sha256` file alongside every tarball (same URL with `.sha256` appended). Without verifying the checksum after download, a corrupted transfer or a compromised CDN response would silently extract a bad interpreter and produce a broken Python distribution. Consider downloading the companion `.sha256` and comparing against `crypto.createHash('sha256').update(fs.readFileSync(tarPath)).digest('hex')` before extracting.

How can I resolve this? If you propose a fix, please make it concise.

AStaroverov and others added 3 commits May 28, 2026 14:06
Replace portable-python source compilation with prebuilt CPython tarballs from
astral-sh/python-build-standalone on macOS and Linux. Removes the libinstall
parallel-make race that breaks macOS CI intermittently, drops the Linux Docker
build container, and cuts ~10–15 minutes of CPython compile time per cache-miss
build. Windows path is unchanged.

Deleted: builder/src/linux.ts, builder/src/macos.ts, builder/docker/.
The pre-existing lib consolidation and rpath patching were needed only because
portable-python's output lacked relocatable load paths; PBS tarballs ship with
@executable_path / \$ORIGIN-relative paths and bundled deps, so the post-build
fixup steps are no longer required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub release asset URLs return 302 to S3 — the previous downloadFile
rejected any non-200 status, so PBS tarball fetches failed immediately.
Recurse on 3xx with a Location header, capped at 5 hops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PBS install_only tarballs do not include a usable Tcl/Tk runtime in the
context of the bare-import checker. PIL._imagingtk and matplotlib.backends._tkagg
are GUI-only hooks that fail on standalone import even when CPython is fine.
These runtimes are not used for plotting/GUI, so the extensions are safe to skip.

Added pillow-*.whl and matplotlib-*.whl entries to all non-Windows whitelists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AStaroverov AStaroverov force-pushed the feat/python-build-standalone branch from 545d436 to 217d0df Compare May 28, 2026 12:13
AStaroverov and others added 3 commits May 28, 2026 15:38
PBS Linux Python ships with CC=clang in its distutils sysconfig. System
libraries built on Ubuntu (e.g. R) emit `-fopenmp` in their ldflags, intended
for gcc+libgomp. clang interprets `-fopenmp` as `-lomp`, which is not present
on standard CI runners. Source builds (rpy2-rinterface, etc.) fail at link
time with "cannot find -lomp".

Set CC=gcc/CXX=g++ before pip download on Linux. gcc+libgomp is what
portable-python used and what the system R was built against — extensions
built this way match the host toolchain transparently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
defaultExecOpts captured process.env at module load, which discarded any
later mutations like the CC=gcc override on Linux. Switch to a factory so
each spawn sees the current process.env, plus whitelist numba tbbpool on
sccoda linux-x64 (already present on h5ad/rapids).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Git-pinned projects like parapred-pytorch often omit setuptools from their
[build-system].requires, so PEP 517's isolated venv lacks a build backend and
pip download fails with "Cannot import 'setuptools.build_meta'". Switch git
URL specs to --no-build-isolation and pre-install setuptools+wheel on the host
Python so the build backend is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant