feat(worker): native per-version GPU base images (AE-2827 parity) (#94)
Add Python 3.10 and 3.11 support to GPU worker images via side-by-side
torch install in the existing runpod/pytorch base. 3.12 keeps the fast
path (torch pre-installed) to avoid the ~7 GB reinstall cost on hot
deployments; 3.10/3.11 images pay that cost once per cold start per DC.
Sibling to flash#322, which landed the SDK-level plumbing. Tags follow
the same ``py${VERSION}-${TAG}`` scheme already in use for CPU images.
- Dockerfile / Dockerfile-lb (GPU): accept PYTHON_VERSION build arg;
install torch from download.pytorch.org/whl/cu128 and repoint
/usr/local/bin/python for non-3.12 targets; validate interpreter
matches the arg during build.
- Dockerfile-cpu / Dockerfile-lb-cpu (CPU): surface PYTHON_VERSION at
runtime via FLASH_PYTHON_VERSION env so the worker's startup check
can read it.
- src/version.py: new ``assert_python_version_matches_image`` — raises
PythonVersionMismatchError at handler boot when ``sys.version_info``
disagrees with the image's stamped FLASH_PYTHON_VERSION. Caught
before user code runs; skipped when the env var is unset (local dev).
- src/handler.py / src/lb_handler.py: call the assertion immediately
after logging setup, before ``maybe_unpack()`` and handler import.
- tests/unit/test_version.py: 4 new cases covering env-unset skip,
match, mismatch raise, and message contents.
- tests/unit/test_lb_handler.py: extend the mocked ``version`` module
with ``assert_python_version_matches_image`` so fresh-import tests
don't break.
- .github/workflows/ci.yml: expand CI to build GPU and LB images
across {3.10, 3.11, 3.12}; align prod CPU and LB-CPU default to
3.12 (matches flash's DEFAULT_PYTHON_VERSION).
Ubuntu 22.04's system python3.10 ships with ensurepip disabled by Debian policy, which broke the side-by-side torch install for 3.10 GPU images (CI: docker-test-gpu (3.10), docker-test-lb (3.10)). python3.11 is a separate interpreter without that patch, so only 3.10 was affected. Switch from ensurepip to urllib + get-pip.py: it works for any interpreter regardless of distro patching, and urllib is stdlib, so no curl dependency is needed. Also corrects the outdated deadsnakes comment in both Dockerfiles: the runpod/pytorch base image layers the alternate Pythons 3.11/3.12 on top of the system 3.10, not via deadsnakes.
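The distinction can be probed at build time before choosing a pip bootstrap path. This is an illustrative check, not the Dockerfile's actual logic; the fallback it points to is the urllib + get-pip.py route the commit describes:

```python
import importlib.util
import sys


def ensurepip_available() -> bool:
    """True when this interpreter can run `python -m ensurepip`.

    Debian/Ubuntu strip ensurepip from the system Python, so a stock
    Ubuntu 22.04 python3.10 returns False here, while deadsnakes and
    python.org builds return True.
    """
    return importlib.util.find_spec("ensurepip") is not None


if not ensurepip_available():
    # Fall back to the stdlib-only bootstrap: fetch get-pip.py with
    # urllib.request and run it with this same interpreter.
    print(f"{sys.executable}: bootstrap pip via get-pip.py", file=sys.stderr)
```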
Replace the runpod/pytorch + side-by-side install hack with a native per-version GPU base built directly on nvidia/cuda. Each image variant has exactly one Python interpreter at /usr/local/bin/python (3.10 from upstream jammy, 3.11/3.12/3.13 from deadsnakes), with torch installed natively for that interpreter from the cu128 wheel index. Eliminates the ~7 GB cold-start tax on non-3.12 images and decouples flash-worker from runpod/pytorch's Python release cadence. Adding 3.13 (or future 3.14/3.15) is now a CI matrix entry, not an upstream wait. Refs AE-2827.
Mirror the GPU worker rewrite for the load-balanced GPU image. Same nvidia/cuda + deadsnakes pattern, same native-per-version layout, just with EXPOSE 80 and the uvicorn entrypoint instead of the QB handler. Refs AE-2827.
Expands docker-test, docker-test-lb-cpu, docker-test-gpu, docker-test-lb, docker-prod-gpu, docker-prod-cpu, docker-prod-lb, and docker-prod-lb-cpu to include 3.13. `is-default` stays on 3.12 (drives the `:latest` aliases). Refs AE-2827.
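As a sketch, adding 3.13 to one of those jobs is a matrix-entry change of roughly this shape (job keys and the `is-default` spelling here are illustrative, not copied from the actual workflow file):

```yaml
strategy:
  matrix:
    python-version: ["3.10", "3.11", "3.12", "3.13"]
    include:
      - python-version: "3.12"
        is-default: true   # drives the :latest image aliases
```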
Force-pushed from 3c7632c to eec06e2.
Summary
Stacks on top of #89. Replaces the runpod/pytorch + side-by-side install hack with native per-version base images on nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04. Each published image variant has exactly one Python interpreter at `/usr/local/bin/python` (3.10 from upstream jammy, 3.11/3.12/3.13 from deadsnakes), with torch installed natively from the cu128 wheel index. Eliminates the ~7 GB cold-start tax on non-3.12 images and decouples flash-worker from runpod/pytorch's Python release cadence. Adding 3.14/3.15 in the future is a CI matrix entry, not an upstream wait.
Phase
This is Phase 1 of the design at docs/superpowers/specs/2026-04-28-ae-2827-python-version-parity-design.md. Phase 2 lives in the stacked SDK PR on flash. Merge order: this PR after #89 → release-please publishes new image tags → flash SDK PR.
Changes
- Dockerfile: rewrite — nvidia/cuda base, deadsnakes Python, native torch install, get-pip.py bootstrap.
- Dockerfile-lb: rewrite — same shape with EXPOSE 80 and the uvicorn entrypoint; byte-identical build chain to `Dockerfile` for diff-clean parity.
- .github/workflows/ci.yml: add 3.13 to all docker test/release matrices (docker-test, docker-test-lb-cpu, docker-test-gpu, docker-test-lb, docker-prod-gpu, docker-prod-cpu, docker-prod-lb, docker-prod-lb-cpu). `is-default: true` stays on 3.12 (drives the `:latest` aliases).
Test plan
- `runpod/flash*:py3.X-{tag}` variants exist on Docker Hub
Known follow-ups (not blocking)
- `chore(dockerfile): pin numpy` PR.