fix: lock all workers to Python 3.12, remove multi-version support #292

Merged
deanq merged 4 commits into main from zeke/ae-2434-flash-bug-cpu-worker-deploys-on-wrong-python-version on Mar 27, 2026

KAJdev commented Mar 25, 2026

Fixes AE-2434

CPU workers with no explicit python_version were deploying on the developer's local Python version (via local_python_version()) instead of 3.12. Since packages are always compiled for Python 3.12 (the only supported ABI for the one-tarball-serves-all-resources design), this caused C-extension packages to fail at runtime on py3.10/py3.11 images.
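
The mismatch described above can be illustrated with wheel filename tags. This is only a sketch: the wheel filename and both helpers are hypothetical and not part of Flash, and the check covers only the CPython tag rather than the full compatibility rules pip applies.

```python
def wheel_python_tag(wheel_filename: str) -> str:
    """Return the python tag (e.g. 'cp312') from a wheel filename.

    Wheel filenames follow name-version[-build]-python-abi-platform.whl,
    so the python tag is always the third field from the end.
    """
    stem = wheel_filename.removesuffix(".whl")
    return stem.split("-")[-3]


def wheel_matches_interpreter(wheel_filename: str, python_version: str) -> bool:
    """Check whether a CPython-specific wheel targets the given 'X.Y' version."""
    major, minor = python_version.split(".")
    return wheel_python_tag(wheel_filename) == f"cp{major}{minor}"


# hypothetical wheel built for the 3.12 ABI
wheel = "examplepkg-1.0.0-cp312-cp312-manylinux_2_17_x86_64.whl"
print(wheel_matches_interpreter(wheel, "3.12"))  # True
print(wheel_matches_interpreter(wheel, "3.11"))  # False
```

A cp312 wheel unpacked onto a py3.10 or py3.11 image is the second case: the files are present, but the compiled extension cannot be imported.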

promptless bot commented Mar 25, 2026

📝 Documentation updates detected!

New suggestion: Update Flash Python version requirement to 3.12 only


Copilot AI left a comment

Pull request overview

This PR standardizes Flash’s deployment/runtime to a single Python ABI (3.12) to prevent C-extension runtime failures caused by CPU workers selecting images based on the developer’s local interpreter version.

Changes:

  • Lock CPU and GPU image selection to Python 3.12 and add explicit validation errors when unsupported versions are requested for CPU endpoints.
  • Update flash build / manifest generation to target Python 3.12 independent of the local interpreter.
  • Update unit tests and documentation to reflect the single-version (3.12) worker/runtime behavior.
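
The validation described in the first bullet could look roughly like the sketch below. The helper name `resolve_cpu_python_version` and the error wording are assumptions for illustration; the actual check lives in get_image_name, whose signature is not shown in this thread.

```python
from typing import Optional

DEFAULT_PYTHON_VERSION = "3.12"
CPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.12",)


def resolve_cpu_python_version(requested: Optional[str] = None) -> str:
    """Return the Python version for a CPU endpoint.

    Hypothetical helper: defaults to 3.12 when nothing is requested and
    rejects any version outside CPU_PYTHON_VERSIONS with an explicit error.
    """
    if requested is None:
        return DEFAULT_PYTHON_VERSION
    if requested not in CPU_PYTHON_VERSIONS:
        supported = ", ".join(CPU_PYTHON_VERSIONS)
        raise ValueError(
            f"Python {requested} is not supported for CPU endpoints; "
            f"supported versions: {supported}"
        )
    return requested
```

Failing at resolution time surfaces the problem during deploy rather than as an import error on the worker.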

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/runpod_flash/core/resources/constants.py | Pins CPU/GPU allowed versions to 3.12 and adds CPU image-type version enforcement in get_image_name. |
| src/runpod_flash/core/resources/live_serverless.py | Switches live resource image resolution defaults to DEFAULT_PYTHON_VERSION (3.12). |
| src/runpod_flash/cli/commands/build.py | Removes local-interpreter version gating and sets build packaging target to Python 3.12. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Sets manifest python_version default to 3.12 and records per-resource target_python_version. |
| tests/unit/core/resources/test_constants.py | Updates version expectations and adds new CPU-version rejection cases. |
| tests/unit/resources/test_live_serverless.py | Updates CPU live serverless tests to reject 3.10/3.11 and default to 3.12. |
| tests/unit/resources/test_live_load_balancer.py | Updates expected CPU LB image naming to always include py3.12. |
| tests/unit/cli/commands/test_build.py | Updates mixed-version resolution test to align CPU target with 3.12. |
| tests/unit/cli/commands/build_utils/test_manifest.py | Updates manifest python version assertion to DEFAULT_PYTHON_VERSION. |
| src/runpod_flash/cli/docs/flash-build.md | Documents that wheel ABI selection targets Python 3.12 regardless of local interpreter. |
| docs/Deployment_Architecture.md | Updates architecture docs to reflect Python 3.12 ABI targeting. |
| README.md | Updates user-facing description: all workers run Python 3.12 and image tags are py3.12-*. |
| RELEASE_SYSTEM.md | Narrows documented supported Python versions for build/release to 3.10–3.12. |
| CONTRIBUTING.md | Updates contributor prerequisites and CI gates to 3.10–3.12. |
| CLAUDE.md | Updates project overview Python version range to >=3.10 <3.13. |

Comment on lines +22 to 31
# worker runtime Python versions. all flash workers run Python 3.12.
# one tarball serves every resource type (GPU and CPU), so packages,
# images, and the runtime must all target 3.12.
WORKER_PYTHON_VERSION: str = "3.12"
GPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.12",)
-CPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12")
+CPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.12",)

# GPU base image (runpod/pytorch:1.0.3-cu1281-torch291-ubuntu2204) ships Python 3.12.
# This is a fact of the Docker image, not configurable at build time.
GPU_BASE_IMAGE_PYTHON_VERSION: str = "3.12"

# Default must match GPU to avoid ABI mismatch (one tarball serves all resources)
DEFAULT_PYTHON_VERSION: str = "3.12"

Copilot AI Mar 25, 2026

WORKER_PYTHON_VERSION, GPU_BASE_IMAGE_PYTHON_VERSION, and DEFAULT_PYTHON_VERSION are all hard-coded to the same value, but WORKER_PYTHON_VERSION is currently unused. This duplication is prone to drift (e.g., updating one constant but forgetting the others). Consider keeping a single source-of-truth (e.g., define WORKER_PYTHON_VERSION once and set the others equal to it, or remove the redundant constant).

Contributor

seems worth removing, to avoid confusion
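
For comparison, the other option raised in the comment (deriving every constant from one value instead of removing the unused one) might look like the sketch below. It reuses the constant names from the diff; which option the PR finally took is not shown in this thread.

```python
# single source of truth for the worker runtime version
WORKER_PYTHON_VERSION: str = "3.12"

# derived names stay in sync by construction, so bumping the version
# means editing exactly one line
GPU_BASE_IMAGE_PYTHON_VERSION: str = WORKER_PYTHON_VERSION
DEFAULT_PYTHON_VERSION: str = WORKER_PYTHON_VERSION
GPU_PYTHON_VERSIONS: tuple[str, ...] = (WORKER_PYTHON_VERSION,)
CPU_PYTHON_VERSIONS: tuple[str, ...] = (WORKER_PYTHON_VERSION,)
```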

Comment on lines +36 to 40
def local_python_version() -> str:
    """Return the Python version used by flash workers (always 3.12)."""
    return DEFAULT_PYTHON_VERSION

Copilot AI Mar 25, 2026

local_python_version() no longer returns the running interpreter’s version; it returns the worker/runtime version (3.12). With the current name, this is easy to misinterpret and could lead to incorrect usage downstream. Consider renaming it to something like worker_python_version() / runtime_python_version() (or deprecating the old name) and updating call sites accordingly, or remove it and use DEFAULT_PYTHON_VERSION directly.

Suggested change
-def local_python_version() -> str:
-    """Return the Python version used by flash workers (always 3.12)."""
-    return DEFAULT_PYTHON_VERSION
+def worker_python_version() -> str:
+    """Return the Python version used by flash workers (currently 3.12)."""
+    return DEFAULT_PYTHON_VERSION
+
+def local_python_version() -> str:
+    """
+    DEPRECATED: use worker_python_version().
+    Note: This returns the flash worker/runtime Python version, not the
+    version of the interpreter running this code.
+    """
+    return worker_python_version()
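
The suggestion marks the old name as deprecated only in its docstring. A variant that also warns callers at runtime could look like this sketch; emitting a DeprecationWarning is an assumption added here, not something proposed in the thread.

```python
import warnings

DEFAULT_PYTHON_VERSION = "3.12"


def worker_python_version() -> str:
    """Return the Python version used by flash workers (currently 3.12)."""
    return DEFAULT_PYTHON_VERSION


def local_python_version() -> str:
    """DEPRECATED: use worker_python_version().

    Returns the worker/runtime Python version, not the version of the
    interpreter running this code.
    """
    warnings.warn(
        "local_python_version() is deprecated; use worker_python_version()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return worker_python_version()
```

Python hides DeprecationWarning by default outside `__main__` and test runners, so this mainly helps contributors running the test suite.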

Comment on lines +282 to 286
# all packaging and image selection targets 3.12 regardless of local python.
# pip downloads wheels for 3.12 via --python-version, and all worker images
# run 3.12, so the local interpreter version does not affect the build output.
python_version = DEFAULT_PYTHON_VERSION

Copilot AI Mar 25, 2026

python_version is set to DEFAULT_PYTHON_VERSION here, but dependency installation later falls back to the local interpreter when _resolve_pip_python_version() returns None (e.g., when the manifest has no resources). That can reintroduce ABI mismatches on empty/edge manifests. Consider defaulting target_python_version to DEFAULT_PYTHON_VERSION when it can’t be resolved from resources (or make _resolve_pip_python_version() fall back to manifest["python_version"]).
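
The fallback chain suggested here can be sketched as follows. `resolve_pip_python_version` is a hypothetical stand-in for `_resolve_pip_python_version()`, whose real logic is not shown in this thread, and the manifest shape (a `resources` mapping with per-resource `target_python_version`) is assumed from the file summary above.

```python
from typing import Any, Optional

DEFAULT_PYTHON_VERSION = "3.12"


def resolve_pip_python_version(manifest: dict[str, Any]) -> Optional[str]:
    """Hypothetical resolver: return the single target version declared by
    the manifest's resources, or None when no resource declares one."""
    resources = manifest.get("resources") or {}
    versions = {
        res["target_python_version"]
        for res in resources.values()
        if res.get("target_python_version")
    }
    if not versions:
        return None
    if len(versions) > 1:
        raise ValueError(f"conflicting target versions: {sorted(versions)}")
    return versions.pop()


def target_python_version(manifest: dict[str, Any]) -> str:
    """Never fall back to the local interpreter: prefer the resources,
    then manifest['python_version'], then DEFAULT_PYTHON_VERSION."""
    resolved = resolve_pip_python_version(manifest)
    if resolved is not None:
        return resolved
    return manifest.get("python_version") or DEFAULT_PYTHON_VERSION
```

With this ordering an empty manifest still resolves to 3.12, so the pip download step always receives an explicit target.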

runpod-Henrik left a comment

PR #292 — fix: lock all workers to Python 3.12, remove multi-version support

Repo: runpod/flash
Severity: MEDIUM
Key findings: ABI fix is correct. Python 3.12 SDK requirement is a known beta constraint — users need to be warned via release notes. Users with existing 3.11 CPU endpoints will crash on next deploy with no recovery guidance.


1. The underlying bug is real and the fix is correct

CPU workers with no explicit python_version were being deployed on an image matching the developer's local Python version, while packages are always built for 3.12. If the developer was on 3.10 or 3.11, packages with C extensions would fail to import when the worker started on the py3.10/py3.11 image. Users would see an ImportError at runtime with no indication that a version mismatch was the cause. The fix, hardcoding 3.12 as the worker target everywhere, is correct.

2. requires-python = "==3.12.*" — intentional beta constraint, needs release note

The restriction to Python 3.12 is intentional for this beta period. However, a user on 3.10 or 3.11 who runs pip install runpod-flash will get a pip resolver error with no explanation. The release notes should call this out explicitly so users know to switch to 3.12 before upgrading, rather than debugging a confusing install failure.

3. Existing users with 3.11 CPU endpoints will hit a crash on their next deploy

If a user previously deployed a CPU endpoint with python_version="3.11", that value is stored in their persisted resource config. On the first flash deploy after upgrading to this version, the deployment will crash with no indication that the stored config is the cause. There is no migration warning, no auto-migration, and no documented recovery path. The release notes should tell users how to handle this (e.g. delete the stored config and redeploy).
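
One way to turn that crash into an actionable message is a pre-deploy check on the stored config, sketched below. The exception class, the function name, and the shape of the persisted config (a dict with an optional `python_version` key) are all assumptions; the actual storage format is not shown in this thread.

```python
from typing import Any

SUPPORTED_PYTHON_VERSIONS: tuple[str, ...] = ("3.12",)


class UnsupportedStoredPythonVersion(RuntimeError):
    """Raised when a persisted resource config pins a dropped Python version."""


def check_stored_python_version(resource_name: str, stored_config: dict[str, Any]) -> None:
    """Fail early with recovery guidance instead of crashing mid-deploy."""
    stored = stored_config.get("python_version")
    if stored is None or stored in SUPPORTED_PYTHON_VERSIONS:
        return
    raise UnsupportedStoredPythonVersion(
        f"Resource '{resource_name}' was previously deployed with "
        f"python_version={stored!r}, which is no longer supported. "
        "Flash workers now run Python 3.12 only. "
        "Delete the stored config and redeploy."
    )
```

The recovery hint in the message mirrors the one suggested for the release notes.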

4. CONTRIBUTING.md contradicts the new requirement

CONTRIBUTING.md tells contributors the project supports Python 3.10+ and that CI runs quality gates on 3.10, 3.11, and 3.12. With the new constraint in place, those CI runs would fail on install. Worth updating for accuracy.


Verdict

PASS WITH NITS — the ABI fix is correct and the 3.12 constraint is a known beta decision. Before shipping: add a release note warning users on 3.10/3.11 that they need to upgrade, and include guidance for users with existing 3.11 CPU endpoint configs.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

deanq merged commit e1f1dd8 into main on Mar 27, 2026
8 checks passed
deanq deleted the zeke/ae-2434-flash-bug-cpu-worker-deploys-on-wrong-python-version branch on March 27, 2026 00:22