
feat(worker): native per-version GPU base images (AE-2827 parity)#94

Open
deanq wants to merge 5 commits into main from deanq/ae-2827-python-version-parity

Conversation


deanq commented Apr 28, 2026

Summary

Stacks on top of #89. Replaces the runpod/pytorch + side-by-side install hack with native per-version base images on nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04. Each published image variant has exactly one Python interpreter at /usr/local/bin/python (3.10 from upstream jammy, 3.11/3.12/3.13 from deadsnakes), with torch installed natively from the cu128 wheel index.

Eliminates the ~7 GB cold-start tax on non-3.12 images and decouples flash-worker from runpod/pytorch's Python release cadence. Adding 3.14/3.15 in the future is a CI matrix entry, not an upstream wait.
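A minimal sketch of the build shape described above, assuming the nvidia/cuda base, deadsnakes PPA, and cu128 wheel index named in this PR; the exact package list, symlink layout, and validation steps in the real Dockerfile may differ:

```dockerfile
FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04
ARG PYTHON_VERSION=3.12

# 3.10 comes from upstream jammy; 3.11/3.12/3.13 come from the deadsnakes PPA
RUN apt-get update && apt-get install -y --no-install-recommends software-properties-common \
 && add-apt-repository -y ppa:deadsnakes/ppa \
 && apt-get update && apt-get install -y --no-install-recommends \
      "python${PYTHON_VERSION}" "python${PYTHON_VERSION}-venv" \
 && ln -sf "$(command -v "python${PYTHON_VERSION}")" /usr/local/bin/python

# Bootstrap pip via get-pip.py: ensurepip is disabled on jammy's system 3.10,
# and urllib is stdlib, so no curl dependency is needed.
RUN python -c "import urllib.request; urllib.request.urlretrieve('https://bootstrap.pypa.io/get-pip.py', '/tmp/get-pip.py')" \
 && python /tmp/get-pip.py && rm /tmp/get-pip.py

# torch installed natively for this interpreter from the cu128 wheel index
RUN python -m pip install --no-cache-dir torch \
      --index-url https://download.pytorch.org/whl/cu128
```

Because each variant carries exactly one interpreter at /usr/local/bin/python, no side-by-side reinstall is needed at cold start.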

Phase

This is Phase 1 of the design at docs/superpowers/specs/2026-04-28-ae-2827-python-version-parity-design.md. Phase 2 lives in the stacked SDK PR on flash. Merge order: this PR after #89 → release-please publishes new image tags → flash SDK PR.

Changes

  • Dockerfile: rewrite — nvidia/cuda base, deadsnakes Python, native torch install, get-pip.py bootstrap.
  • Dockerfile-lb: rewrite — same shape with EXPOSE 80 and uvicorn entrypoint, byte-identical build chain to Dockerfile for diff-clean parity.
  • .github/workflows/ci.yml: add 3.13 to all docker test/release matrices (docker-test, docker-test-lb-cpu, docker-test-gpu, docker-test-lb, docker-prod-gpu, docker-prod-cpu, docker-prod-lb, docker-prod-lb-cpu). is-default: true stays on 3.12 (drives :latest aliases).
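The matrix change might look roughly like the following (the job name docker-test and the is-default flag come from this PR; the matrix key names are illustrative):

```yaml
jobs:
  docker-test:
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        include:
          - python-version: "3.12"
            is-default: true   # drives the :latest aliases
```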

Test plan

  • All 16 docker-test cells green (4 image types × 4 Python versions: 3.10, 3.11, 3.12, 3.13)
  • Smoke handler passes inside each freshly-built image
  • After merge + release-please, all 16 runpod/flash*:py3.X-{tag} variants exist on Docker Hub

Known follow-ups (not blocking)

  • numpy install in both Dockerfiles is unpinned (preserved from prior worker setup). Worth pinning in a separate chore(dockerfile): pin numpy PR.

Base automatically changed from deanq/ae-2827-multi-python-versions to main April 28, 2026 21:56
deanq added 5 commits April 29, 2026 00:21
Add Python 3.10 and 3.11 support to GPU worker images via side-by-side
torch install in the existing runpod/pytorch base. 3.12 keeps the fast
path (torch pre-installed) to avoid the ~7 GB reinstall cost on hot
deployments; 3.10/3.11 images pay that cost once per cold start per DC.

Sibling to flash#322, which landed the SDK-level plumbing. Tags follow
the same ``py${VERSION}-${TAG}`` scheme already in use for CPU images.

- Dockerfile / Dockerfile-lb (GPU): accept PYTHON_VERSION build arg;
  install torch from download.pytorch.org/whl/cu128 and repoint
  /usr/local/bin/python for non-3.12 targets; validate interpreter
  matches the arg during build.
- Dockerfile-cpu / Dockerfile-lb-cpu (CPU): surface PYTHON_VERSION at
  runtime via FLASH_PYTHON_VERSION env so the worker's startup check
  can read it.
- src/version.py: new ``assert_python_version_matches_image`` — raises
  PythonVersionMismatchError at handler boot when ``sys.version_info``
  disagrees with the image's stamped FLASH_PYTHON_VERSION. Caught
  before user code runs; skipped when the env var is unset (local dev).
- src/handler.py / src/lb_handler.py: call the assertion immediately
  after logging setup, before ``maybe_unpack()`` and handler import.
- tests/unit/test_version.py: 4 new cases covering env-unset skip,
  match, mismatch raise, and message contents.
- tests/unit/test_lb_handler.py: extend the mocked ``version`` module
  with ``assert_python_version_matches_image`` so fresh-import tests
  don't break.
- .github/workflows/ci.yml: expand CI to build GPU and LB images
  across {3.10, 3.11, 3.12}; align prod CPU and LB-CPU default to
  3.12 (matches flash's DEFAULT_PYTHON_VERSION).
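The startup check described above can be sketched as follows; this is an assumption-laden reconstruction from the commit message, and the real src/version.py may differ in naming and message text:

```python
import os
import sys


class PythonVersionMismatchError(RuntimeError):
    """Raised when the running interpreter differs from the image's stamp."""


def assert_python_version_matches_image() -> None:
    # Skipped when FLASH_PYTHON_VERSION is unset (local dev, no image stamp).
    stamped = os.environ.get("FLASH_PYTHON_VERSION")
    if not stamped:
        return
    running = f"{sys.version_info.major}.{sys.version_info.minor}"
    if running != stamped:
        raise PythonVersionMismatchError(
            f"image was built for Python {stamped}, "
            f"but the handler is running on Python {running}"
        )
```

Calling this immediately after logging setup, before maybe_unpack() and handler import, means a mismatched deployment fails fast with a clear error instead of crashing inside user code.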

Ubuntu 22.04's system python3.10 has ensurepip disabled by Debian
policy, which broke the side-by-side torch install for 3.10 GPU images
(CI: docker-test-gpu (3.10), docker-test-lb (3.10)). python3.11 is a
separate interpreter without the disable, so only 3.10 was affected.

Use urllib+get-pip.py instead of ensurepip — works for any interpreter
regardless of distro patching, and urllib is stdlib so no curl dep.

Also corrects the outdated deadsnakes comment on both Dockerfiles: the
runpod/pytorch base image layers alt-Python 3.11/3.12 on top of the
system 3.10, not via deadsnakes.

Replace the runpod/pytorch + side-by-side install hack with a native
per-version GPU base built directly on nvidia/cuda. Each image variant
has exactly one Python interpreter at /usr/local/bin/python (3.10 from
upstream jammy, 3.11/3.12/3.13 from deadsnakes), with torch installed
natively for that interpreter from the cu128 wheel index.

Eliminates the ~7 GB cold-start tax on non-3.12 images and decouples
flash-worker from runpod/pytorch's Python release cadence. Adding 3.13
(or future 3.14/3.15) is now a CI matrix entry, not an upstream wait.

Refs AE-2827.

Mirror the GPU worker rewrite for the load-balanced GPU image. Same
nvidia/cuda + deadsnakes pattern, same native-per-version layout, just
with EXPOSE 80 and the uvicorn entrypoint instead of the QB handler.

Refs AE-2827.

Expands docker-test, docker-test-lb-cpu, docker-test-gpu, docker-test-lb,
docker-prod-gpu, docker-prod-cpu, docker-prod-lb, and docker-prod-lb-cpu
to include 3.13. is-default stays on 3.12 (drives :latest aliases).

Refs AE-2827.
deanq force-pushed the deanq/ae-2827-python-version-parity branch from 3c7632c to eec06e2 on April 29, 2026 07:21