feat: Python version matrix for Docker images + error-masking fix #76

Open
deanq wants to merge 6 commits into main from deanq/ae-2391-python-versions-for-workers

deanq commented Mar 6, 2026

Summary

  • Parameterize all 4 Dockerfiles with PYTHON_VERSION build arg (default 3.11)
  • Add GPU PyTorch base image mapping: 3.11 -> pytorch 2.9.1, 3.12 -> pytorch 2.10.0
  • Add Makefile version matrix: build-*-versioned, build-wip-versioned, smoketest-versioned targets
  • Produces 10 tagged images: GPU (3.11, 3.12) x 2 + CPU (3.10, 3.11, 3.12) x 2
  • Fix error-masking in handler.py: deployed mode now raises RuntimeError instead of silently falling back to Live Serverless
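
As a quick check on the image count above, here is a minimal Python sketch that enumerates the matrix. It assumes the "x 2" factor means the QB and LB image flavors named in the review summary. The `image_tag` format is hypothetical and only for illustration; the real tag scheme is defined by the Makefile in this PR.

```python
from itertools import product

# Python versions per accelerator, as listed in the PR summary.
PYTHON_VERSIONS = {
    "gpu": ("3.11", "3.12"),
    "cpu": ("3.10", "3.11", "3.12"),
}
# Assumption: the "x 2" factor is the QB and LB image flavors.
FLAVORS = ("qb", "lb")


def image_matrix():
    """Return every (accelerator, flavor, python_version) combination."""
    return [
        (accel, flavor, version)
        for accel, versions in PYTHON_VERSIONS.items()
        for flavor, version in product(FLAVORS, versions)
    ]


def image_tag(accel, flavor, version):
    """Build a hypothetical tag; the actual naming scheme may differ."""
    return f"py{version}-{accel}-{flavor}"


matrix = image_matrix()
for accel, flavor, version in matrix:
    print(image_tag(accel, flavor, version))
```

GPU contributes 2 versions x 2 flavors = 4 images and CPU contributes 3 x 2 = 6, which gives the 10 images stated in the summary.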

Context

AE-2391: Python version mismatch between user build environment and worker runtime causes silent failures with binary packages (numpy, etc). This PR adds versioned Docker images so the SDK can select the correct runtime for the user's Python version.
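
To illustrate why a version mismatch can break binary packages, the sketch below checks whether a wheel's Python tag matches a runtime version. This is not code from this PR or the SDK. The helper names and the wheel filenames in the usage note are illustrative assumptions, and the check deliberately ignores `abi3` wheels, which can be valid on newer interpreters.

```python
def python_tags(wheel_filename):
    """Return the set of Python tags encoded in a wheel filename.

    Wheel filenames follow the layout
    name-version[-build]-python-abi-platform.whl, so the Python tag
    is the third field from the end.
    """
    stem = wheel_filename.removesuffix(".whl")
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"not a wheel filename: {wheel_filename}")
    # Compressed tag sets are joined with dots, e.g. "py2.py3".
    return set(parts[-3].split("."))


def wheel_matches_runtime(wheel_filename, runtime_version):
    """Check a wheel's Python tag against a 'major.minor' runtime version.

    Simplification: abi3 wheels built for an older CPython are treated
    as incompatible here, although they may work in practice.
    """
    major, minor = runtime_version.split(".")[:2]
    accepted = {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}
    return bool(python_tags(wheel_filename) & accepted)
```

For example, a wheel named `numpy-2.0.0-cp311-cp311-manylinux_2_17_x86_64.whl` (illustrative filename) matches a 3.11 runtime but not a 3.12 one, while a pure-Python `py3-none-any` wheel matches both.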

Companion PR: runpod/flash (SDK-side version detection and image selection)

Test plan

  • make build-all-versioned builds all 10 images locally
  • make smoketest-versioned verifies correct Python version in each image
  • Existing make test passes unchanged
  • Handler error-masking fix verified: deployed mode raises on handler load failure

deanq added 3 commits March 6, 2026 12:30

Parameterize all 4 Dockerfiles with PYTHON_VERSION build arg.
GPU images also accept PYTORCH_BASE to select the correct PyTorch
base image per Python version.

Add versioned Makefile targets:
- build-all-versioned: builds 10 images (GPU 3.11/3.12, CPU 3.10-3.12)
- build-wip-versioned: multi-platform push with latest alias
- smoketest-versioned: verify Python version in each image

GPU base image mapping:
- Python 3.11: pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
- Python 3.12: pytorch/pytorch:2.10.0-cuda12.8-cudnn9-runtime

Previously, _load_generated_handler() silently returned None on any
failure (missing file, import error, syntax error), causing deployed
endpoints to fall back to the FunctionRequest/Live Serverless handler.
This masked real deployment issues like Python version mismatches.

Now deployed mode (FLASH_RESOURCE_NAME set) treats handler loading
failures as fatal RuntimeError. Live Serverless mode skips the
generated handler entirely since it only uses FunctionRequest protocol.

Tests expected None returns but handler.py now raises RuntimeError in
deployed mode. Updated all 8 TestLoadGeneratedHandler tests to use
pytest.raises(RuntimeError). Also synced uv.lock to pick up latest
runpod-flash version.
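
The fail-fast loading behavior described in these commit messages can be sketched roughly as follows. This is not the code in src/handler.py. The function name, the `app_root` parameter, and the `handler_<resource>.py` file naming are assumptions, with the naming inferred from the test snippet quoted later in the review.

```python
import importlib.util
import os
from pathlib import Path


def load_generated_handler(app_root="/app"):
    """Load the generated handler, failing loudly in deployed mode."""
    resource_name = os.environ.get("FLASH_RESOURCE_NAME")
    if not resource_name:
        # Live Serverless mode: there is no generated handler to load.
        return None

    # Assumed naming convention: handler_<resource name>.py under app_root.
    handler_file = Path(app_root) / f"handler_{resource_name}.py"
    try:
        spec = importlib.util.spec_from_file_location(
            "generated_handler", handler_file
        )
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot build import spec for {handler_file}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.handler
    except Exception as exc:
        # Missing file, import error, syntax error, or missing attribute:
        # all are fatal in deployed mode instead of a silent fallback.
        raise RuntimeError(
            f"Failed to load generated handler {handler_file}: {exc}"
        ) from exc
```

In a pytest suite, the failure path would be asserted with `pytest.raises(RuntimeError)`, matching the updated tests described above.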
Copilot AI left a comment

Pull request overview

Adds a Python-versioned Docker image matrix (CPU/GPU, QB/LB) to prevent runtime Python mismatches with binary deps, and makes deployed endpoints fail fast when the generated handler can’t be loaded (instead of silently falling back to the Live Serverless protocol).

Changes:

  • Parameterizes Docker builds via PYTHON_VERSION / PYTORCH_BASE build args and introduces versioned image tags (e.g., :py3.11-...).
  • Adds Makefile targets to build/push/smoketest the multi-version image matrix.
  • Updates generated-handler loading behavior to raise in deployed mode, with corresponding unit test updates.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Makefile | Adds Python version matrix variables and versioned build/push/smoketest targets. |
| Dockerfile | Switches GPU base image selection to PYTORCH_BASE arg (and adds PYTHON_VERSION arg). |
| Dockerfile-lb | Same as GPU QB Dockerfile, but for LB image. |
| Dockerfile-cpu | Parameterizes CPU base image via python:${PYTHON_VERSION}-slim. |
| Dockerfile-lb-cpu | Parameterizes CPU LB base image via python:${PYTHON_VERSION}-slim. |
| src/handler.py | Makes deployed-mode handler loading failures fatal; keeps Live Serverless path separate. |
| tests/unit/test_handler.py | Updates _load_generated_handler tests to expect RuntimeError instead of None. |
| uv.lock | Updates lockfile with multiple dependency version bumps. |

Include platform.python_version() in the worker boot banner for
runtime version visibility during E2E testing.

- Add build-time Python version validation to GPU Dockerfiles
- Restructure build-all-versioned to run setup once via internal targets
- Add version assertion to smoketest-versioned (fail on mismatch)
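
A rough Python sketch of the two ideas in these commits: reporting the runtime version at boot and failing on a version mismatch. The function names and banner wording are hypothetical. The actual smoketest lives in the Makefile and may be implemented differently.

```python
import platform


def boot_banner(worker_name="worker"):
    """Return a banner line that includes the running Python version."""
    return f"{worker_name} starting on Python {platform.python_version()}"


def assert_python_version(expected, actual=None):
    """Raise if the runtime's major.minor differs from the expected one.

    `expected` is a 'major.minor' string such as '3.11'. `actual`
    defaults to the running interpreter and is injectable for testing.
    """
    actual = platform.python_version() if actual is None else actual
    actual_minor = ".".join(actual.split(".")[:2])
    if actual_minor != expected:
        raise RuntimeError(
            f"Python version mismatch: expected {expected}, got {actual_minor}"
        )
    return actual_minor


print(boot_banner())
```

Comparing only major.minor keeps the check stable across patch releases of the base image.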
Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tests/unit/test_handler.py:179

  • test_loads_generated_handler_from_file patches handler.Path to return a tmp_path file, but _load_generated_handler() enforces handler_file.resolve().is_relative_to(Path('/app').resolve()). Since tmp_path is not under /app, this test will raise the "resolves outside /app" RuntimeError instead of returning a loaded handler. Adjust the test to bypass the /app sandbox check (e.g., mock resolve()/is_relative_to or patch the /app root used by the function) or refactor _load_generated_handler to use a patchable constant for the app root.
    def test_loads_generated_handler_from_file(self, tmp_path):
        """With valid generated handler file, loads and returns handler function."""
        handler_file = tmp_path / "handler_gpu_config.py"
        handler_file.write_text(
            "async def handler(event):\n"
            "    return {'result': event.get('input', {}).get('prompt', 'default')}\n"
        )

        with patch.dict("os.environ", {"FLASH_RESOURCE_NAME": "gpu_config"}):
            with patch("handler.Path", return_value=handler_file):
                result = _load_generated_handler()
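
One way to act on the refactoring option in this comment is to move the sandbox check into a helper with a patchable root. The sketch below is an assumption about how that could look, not the PR's implementation. `APP_ROOT` and `ensure_inside_app_root` are hypothetical names.

```python
from pathlib import Path

# Module-level constant so tests can patch it or pass an override.
APP_ROOT = Path("/app")


def ensure_inside_app_root(handler_file, app_root=None):
    """Return the resolved path, or raise if it escapes the app root."""
    root = (APP_ROOT if app_root is None else Path(app_root)).resolve()
    resolved = Path(handler_file).resolve()
    # Path.is_relative_to is available from Python 3.9 onward.
    if not resolved.is_relative_to(root):
        raise RuntimeError(f"{resolved} resolves outside {root}")
    return resolved
```

A test could then pass `app_root=tmp_path` (or patch `APP_ROOT`) so that a handler file written under `tmp_path` passes the check without mocking `resolve()`.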


- Add default case to pytorch_base() shell function in all Makefile targets
- Guard test_handler.py import against FLASH_RESOURCE_NAME env var
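
For illustration, a Python analog of the version-to-base-image lookup with an explicit default case. The mapping values come from the GPU base image mapping listed in the first commit. It is an assumption that the default case rejects unmapped versions; the Makefile's pytorch_base() shell function may instead fall back to a default image.

```python
# GPU base image per Python version, as listed earlier in this PR.
PYTORCH_BASE_BY_PYTHON = {
    "3.11": "pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime",
    "3.12": "pytorch/pytorch:2.10.0-cuda12.8-cudnn9-runtime",
}


def pytorch_base(python_version):
    """Return the PyTorch base image for a 'major.minor' Python version."""
    try:
        return PYTORCH_BASE_BY_PYTHON[python_version]
    except KeyError:
        # Default case (assumed behavior): fail instead of building
        # against an unintended base image.
        supported = ", ".join(sorted(PYTORCH_BASE_BY_PYTHON))
        raise ValueError(
            f"no PyTorch base image for Python {python_version}; "
            f"supported versions: {supported}"
        ) from None
```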

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated no new comments.


2 participants