…t communication Gap-driven PRD covering four work streams: cloudpickle removal from deployed environments, LB handler terminology cleanup, endpoint URL population post-provisioning, and selective RUNPOD_API_KEY injection.
…t/Response Widen args/kwargs from List[str]/Dict[str,str] to List[Any]/Dict[str,Any], add serialization_format field (default "cloudpickle"), and json_result field on FunctionResponse. Backward compatible -- existing cloudpickle callers work without changes.
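As a rough illustration of the widened schema, the sketch below uses plain dataclasses rather than the project's actual models. The `function_name` and `success` fields are hypothetical placeholders; only `args`, `kwargs`, `serialization_format`, and `json_result` come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FunctionRequest:
    function_name: str  # hypothetical field, for illustration only
    # Widened from List[str] / Dict[str, str] so JSON values pass through as-is.
    args: List[Any] = field(default_factory=list)
    kwargs: Dict[str, Any] = field(default_factory=dict)
    # Default keeps existing cloudpickle callers working unchanged.
    serialization_format: str = "cloudpickle"


@dataclass
class FunctionResponse:
    success: bool  # hypothetical field, for illustration only
    # Populated only when the call used the JSON format.
    json_result: Optional[Any] = None
```

A caller that never sets `serialization_format` still gets `"cloudpickle"`, which is what makes the change backward compatible.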
Provides type safety for the resources_endpoints dict that the deployment pipeline already populates in manifest JSON after provisioning. Includes backward-compatible handling: from_dict defaults to None, to_dict omits the key when None.
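The backward-compatible handling described here could look roughly like the following. This is a trimmed-down sketch, not the real Manifest class: the `version` field is a hypothetical stand-in for the other manifest fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Manifest:
    version: str  # hypothetical stand-in for the remaining manifest fields
    resources_endpoints: Optional[Dict[str, str]] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Manifest":
        # Manifests written before provisioning lack the key, so it reads as None.
        return cls(
            version=data["version"],
            resources_endpoints=data.get("resources_endpoints"),
        )

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {"version": self.version}
        # Omit the key entirely when unset so older readers see no change.
        if self.resources_endpoints is not None:
            out["resources_endpoints"] = self.resources_endpoints
        return out
```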
…SHIP Update create_resource_from_manifest() to set the new FLASH_ENDPOINT_TYPE=lb env var for load-balanced resources, while preserving FLASH_IS_MOTHERSHIP=true for backward compatibility during transition. The condition now accepts both is_mothership (legacy) and is_load_balanced (new) resource data flags.
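A minimal sketch of the transition logic, assuming the resource data is a plain dict. The helper name `endpoint_type_env` is hypothetical and is not the body of `create_resource_from_manifest()`.

```python
from typing import Any, Dict


def endpoint_type_env(resource_data: Dict[str, Any]) -> Dict[str, str]:
    """Return the env vars to inject for a load-balanced resource."""
    env: Dict[str, str] = {}
    # Accept both the legacy flag and the new one during the transition.
    if resource_data.get("is_mothership") or resource_data.get("is_load_balanced"):
        env["FLASH_ENDPOINT_TYPE"] = "lb"
        # Kept so workers that still read the old variable keep working.
        env["FLASH_IS_MOTHERSHIP"] = "true"
    return env
```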
Remove cloudpickle serialize_args/serialize_kwargs from ProductionWrapper _execute_remote. Pass raw args as list and kwargs as dict with serialization_format: json in the payload, matching the FunctionRequest schema added in the prior commit.
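The payload shape after this change might be built along these lines. The helper and the `function_name` key are assumptions for illustration; only the raw `args` list, `kwargs` dict, and `serialization_format: json` are taken from the commit message.

```python
import json
from typing import Any, Dict, Tuple


def build_remote_payload(
    function_name: str, args: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    payload = {
        "function_name": function_name,  # hypothetical key
        "args": list(args),      # raw values, no cloudpickle/base64 step
        "kwargs": dict(kwargs),
        "serialization_format": "json",
    }
    # Fail early: json.dumps raises TypeError for non-JSON-serializable values.
    json.dumps(payload)
    return payload
```

Validating with `json.dumps` at call time is a design choice of this sketch; it surfaces unsupported argument types on the caller's side instead of in the remote worker.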
Verify that reconcile_and_provision_resources() correctly populates resources_endpoints after provisioning and pushes the manifest to State Manager. Also fix unused variable lint error in test_deployment.
…nvelope
Aligns local dev server with RunPod's actual API format: /runsync URLs
and {"input": {...}} request envelope.
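For reference, a request in this format could be assembled as below. The URL layout (base URL, then a resource path, then `/runsync`) and the helper name are assumptions; the commit message only states that URLs end in `/runsync` and that the body is wrapped in an `input` envelope.

```python
import json
from typing import Any, Dict, Tuple


def build_runsync_request(
    base_url: str, resource_path: str, kwargs: Dict[str, Any]
) -> Tuple[str, bytes]:
    # Assumed layout: <base>/<resource>/runsync
    url = f"{base_url.rstrip('/')}/{resource_path.strip('/')}/runsync"
    # Function arguments are wrapped in the "input" envelope.
    body = json.dumps({"input": kwargs}).encode("utf-8")
    return url, body
```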
Root cause of slow flash run: rglob("*.py") walked 8,946 files (nested
.venv dirs). Replace with .gitignore/.flashignore-aware walker that
prunes directories early. Also replace runtime importlib check for
resource config types with a static frozenset.
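The early-pruning idea can be sketched with `os.walk`, which skips any directory removed from `dirnames` in place. This simplified version matches directory names only and does not parse `.gitignore` or `.flashignore` patterns; the function name and the ignore set are assumptions.

```python
import os
from pathlib import Path
from typing import Iterator, Set

# Assumed always-ignored directory names.
ALWAYS_IGNORE: Set[str] = {".venv", "venv", ".runpod"}


def iter_python_files(root: Path, ignored_dirs: Set[str] = ALWAYS_IGNORE) -> Iterator[Path]:
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place slice assignment prunes the walk: os.walk never
        # descends into directories removed from dirnames.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for name in filenames:
            if name.endswith(".py"):
                yield Path(dirpath) / name
```

Unlike `rglob("*.py")` followed by filtering, this never lists the contents of an ignored directory, which is where the time went in the nested `.venv` case.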
ServerlessScalerType was dropped from runpod_flash/__init__.py when lazy loading was introduced, causing ImportError on from runpod_flash import ServerlessScalerType.
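One common way to implement lazy loading is a module-level `__getattr__` (PEP 562) backed by a lookup table; whether `runpod_flash` does exactly this is an assumption. In that pattern, any public name missing from the table raises `AttributeError`, which `from package import Name` reports as `ImportError`. The sketch uses standard-library names as stand-ins for the real exports.

```python
import importlib
from typing import Any, Dict, Tuple

# Hypothetical lazy-export table: public name -> (module, attribute).
_LAZY_EXPORTS: Dict[str, Tuple[str, str]] = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
}


def lazy_getattr(name: str) -> Any:
    """What a package's module-level __getattr__ would do."""
    try:
        module_name, attr = _LAZY_EXPORTS[name]
    except KeyError:
        # A name that was never added to the table ends up here.
        raise AttributeError(f"module has no attribute {name!r}") from None
    # Import happens only on first access to the name.
    return getattr(importlib.import_module(module_name), attr)
```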
Extract first-line docstrings from @Remote functions/classes during AST scan and propagate them through WorkerInfo to both the CLI startup table (renamed "Resource" column to "Description") and generated FastAPI route summaries. Functions without docstrings fall back to the function name.
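The docstring extraction can be approximated with the standard `ast` module. This sketch collects every function and class in a source string and does not filter on the decorator, which the real scanner presumably does; the function name is hypothetical.

```python
import ast
from typing import Dict


def first_line_docstrings(source: str) -> Dict[str, str]:
    """Map each function/class name to the first line of its docstring."""
    summaries: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc and doc.strip():
                summaries[node.name] = doc.strip().splitlines()[0]
            else:
                # No docstring: fall back to the function or class name.
                summaries[node.name] = node.name
    return summaries
```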
- Remove stale FLASH_IS_MOTHERSHIP assertions from test_resource_provisioner
- Update PRD.md to reflect completed mothership-to-peer migration
- Add completion status headers to historical plan documents
- Rename mothership_provisioner.py to resource_provisioner.py
- Delete obsolete mothership-related files and tests
- Update all remaining mothership comments to LB endpoint terminology
Deployed QB endpoints now accept plain JSON input instead of requiring FunctionRequest/cloudpickle serialization. The build pipeline generates handler_<resource_name>.py files for each QB resource, and the manifest includes handler_file so flash-worker can delegate to them.
Runtime changes:
- create_deployed_handler() in generic_handler.py for plain JSON dispatch
- ProductionWrapper QB/LB routing with get_routing_info()
- ServiceRegistry routing info and endpoint URL population
- ResourceConfig.is_load_balanced / is_live_resource flags on models
- Cloudpickle imports scoped to LB handler only
Build pipeline changes:
- HandlerGenerator called in run_build() for QB resources
- Manifest includes handler_file for non-LB resources
- DEPLOYED_HANDLER_TEMPLATE for deployed (non-live) QB endpoints
The generated handler_<name>.py for deployed QB endpoints imported create_deployed_handler from runpod_flash.runtime.generic_handler, which triggered a pydantic import chain. When the bundled pydantic_core binary extension was incompatible with the container, both the generated handler and the FunctionRequest fallback failed. Inline the handler logic directly in the template using only stdlib imports (asyncio, inspect, traceback). This eliminates the runpod_flash dependency from generated deployed handlers entirely.
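A stdlib-only handler in the spirit of this change might look like the sketch below. The factory name `make_handler`, the single-function dispatch, and the error dictionary shape are all assumptions, not the contents of the generated template.

```python
import asyncio
import inspect
import traceback
from typing import Any, Callable, Dict


def make_handler(func: Callable[..., Any]) -> Callable[[Dict[str, Any]], Any]:
    def handler(job: Dict[str, Any]) -> Any:
        # Plain JSON input: the job's "input" dict becomes keyword arguments.
        kwargs = job.get("input") or {}
        try:
            result = func(**kwargs)
            if inspect.isawaitable(result):
                # Assumes no event loop is already running in this thread.
                result = asyncio.run(result)
            return result
        except Exception as exc:
            # Assumed error shape; returned rather than raised.
            return {"error": str(exc), "traceback": traceback.format_exc()}

    return handler
```

Because nothing here imports `runpod_flash` or pydantic, a broken `pydantic_core` binary in the container cannot prevent the handler module from loading.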
_extract_runpod_flash_dependencies() added flash's own dependencies (pydantic, cloudpickle, etc.) to the build tarball. When extracted to /app, these shadowed the working packages from the Docker base image. pydantic_core's native .so compiled for a different platform caused import failures. The base Docker image already includes all flash runtime dependencies. Only the flash source code needs bundling via --use-local-flash.
…rams Deployed LB endpoints returned HTTP 422 because FastAPI treated simple typed parameters (str, int) as query parameters. Adds dynamic Pydantic body model generation at route registration time for POST/PUT/PATCH/DELETE handlers, matching the pattern used by the dev server's make_input_model.
State Manager queries in ServiceRegistry._ensure_manifest_loaded() were passing RUNPOD_ENDPOINT_ID (the serverless endpoint ID) as the flash environment ID, causing "Flash environment not found" errors and falling through to ResourceManager for unnecessary re-provisioning.
- service_registry: read FLASH_ENVIRONMENT_ID instead of RUNPOD_ENDPOINT_ID
- resource_provisioner: inject FLASH_ENVIRONMENT_ID into env for endpoints with makes_remote_calls=True
- client: add _resolve_deployed_endpoint_id() to look up pre-deployed endpoints via ServiceRegistry before falling back to ResourceManager
- models: filter unknown keys in ResourceConfig.from_dict() for forward compatibility with manifest field changes
- manifest: remove per-function fields (is_load_balanced, is_live_resource, config_variable) that belong at resource level only
- build: generate _flash_resource_config.py after bundling local flash
- CLAUDE.md: replace worktree template with auto-generated project docs
- Spell out LB/QB abbreviations (Load-balanced/Queue-based)
- Add RunPod console link alongside doc links
- Demote noisy INFO logs to DEBUG (resource config, env injection, LB deploy)
- Remove attribute-call matching in scanner to prevent false positives
Reorganize deployment output so each endpoint type is a complete section:
- LB: URLs + routes + one curl example (first POST route)
- QB: URLs + one curl example using /runsync
- Console/docs links moved to end as "view all" closer
- Fix curl continuation line indentation
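The forward-compatibility filter for `from_dict()` could be written as follows. This is a dataclass-based sketch rather than the real `ResourceConfig`, and the `name` field is a hypothetical placeholder.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ResourceConfig:
    name: str  # hypothetical field, for illustration only
    is_load_balanced: bool = False
    is_live_resource: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ResourceConfig":
        known = {f.name for f in fields(cls)}
        # Drop keys this version does not know, so a manifest written by a
        # newer build does not raise TypeError on construction.
        return cls(**{k: v for k, v in data.items() if k in known})
```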
Replace exec_module() with ast.parse() in handler validation to avoid ImportErrors from modules that only resolve at runtime inside Docker. Add parent directory to sys.path in _extract_deployment_config() so sibling imports resolve during config extraction.
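Syntax-only validation with `ast.parse()` can be sketched as below; the function name is hypothetical. Parsing never executes the module, so imports that only resolve inside the Docker image cannot fail at build time.

```python
import ast


def is_valid_python(source: str, filename: str = "<handler>") -> bool:
    """Check that generated code parses, without importing or running it."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True
```

The trade-off is that this catches malformed code but not runtime problems such as a misspelled import, which only surface when the handler is actually loaded.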
…or handling, and observability
- Fix shared resource_config mutation in @Remote wrapper using model_copy()
- Add error logging to deployed handler and generated handler template
- Replace deprecated asyncio.get_event_loop() with get_running_loop() fallback
- Raise ValueError at build time for empty function lists in handler generator
- Upgrade log levels for manifest lookup failures and missing endpoint URLs
- Preserve stale endpoint cache on State Manager unavailability
- Add warning logs for LB handler introspection failures
- Escape newlines/carriage returns in generated string literals
- Warn when makes_remote_calls=True but RUNPOD_API_KEY is missing
- Differentiate httpx error types in remote LB execution
- Fix stale run_sync mock name to runsync in conftest
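To illustrate the string-literal escaping item, here is a minimal sketch of why it matters for code generation: a raw newline inside a double-quoted literal makes the generated file a syntax error. The helper name is hypothetical, and escaping backslashes and quotes goes beyond what the commit message states.

```python
def escape_for_literal(value: str) -> str:
    """Escape a value for embedding inside a double-quoted Python literal."""
    return (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
```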
Pull request overview
This PR implements the fully deployed environment feature for Flash, enabling cross-endpoint communication with JSON serialization for deployed calls and a streamlined deployment pipeline.
Changes:
- Cross-endpoint routing via manifest-based ServiceRegistry with JSON serialization for deployed QB/LB calls
- Inline deployed handler template to avoid runpod_flash import at runtime, wrap LB handler params as JSON body
- Ignore-aware file walker for scanner, stop bundling flash deps that shadow base image packages
- Rename terminology from "mothership" to "load_balancer", use FLASH_ENVIRONMENT_ID for State Manager queries
- Add missing ServerlessScalerType to top-level exports, surface docstrings in startup table and Swagger UI
Reviewed changes
Copilot reviewed 63 out of 63 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tests/unit/test_function_request_response_serialization.py` | New comprehensive tests for JSON serialization format support in FunctionRequest/Response |
| `tests/unit/test_client_should_execute_locally.py` | Added tests for _resolve_deployed_endpoint_id and wrapper manifest lookup |
| `tests/unit/runtime/test_service_registry.py` | Added get_routing_info tests, updated env var from RUNPOD_ENDPOINT_ID to FLASH_ENVIRONMENT_ID |
| `tests/unit/runtime/test_resource_provisioner.py` | New tests for create_resource_from_manifest with LB and remote call configurations |
| `tests/unit/runtime/test_production_wrapper.py` | Updated tests for QB/LB dispatch split, renamed run_sync to runsync |
| `tests/unit/runtime/test_models.py` | New tests for resources_endpoints field and is_load_balanced/is_live_resource flags |
| `tests/unit/runtime/test_lb_handler.py` | Added tests for _make_input_model, _wrap_handler_with_body_model body parsing |
| `tests/unit/runtime/test_deployed_handler.py` | New tests for create_deployed_handler (plain JSON QB handler) |
| `tests/unit/resources/test_serverless.py` | Renamed run_sync to runsync throughout |
| `tests/unit/cli/utils/test_deployment.py` | Updated terminology mothership→load_balancer, added API key validation tests |
| `tests/unit/cli/test_run.py` | Updated codegen expectations for wrapped body model and runsync paths |
| `tests/unit/cli/test_deploy.py` | Added comprehensive tests for _display_post_deployment_guidance output |
| `tests/unit/cli/commands/test_run_server_helpers.py` | Added tests for make_wrapped_model |
| `tests/unit/cli/commands/test_run.py` | Updated codegen tests for wrapped Request models and body.input access |
| `tests/unit/cli/commands/test_preview.py` | Renamed is_mothership to is_load_balanced throughout |
| `tests/unit/cli/commands/test_build.py` | Added tests for QB handler generation in build pipeline |
| `tests/unit/cli/commands/build_utils/test_scanner_load_balancer.py` | Removed detect_explicit_mothership tests (function deleted) |
| `tests/unit/cli/commands/build_utils/test_scanner.py` | Added test_exclude_nested_venv_directory, calls_remote_functions tracking tests |
| `tests/unit/cli/commands/build_utils/test_resource_config_generator.py` | Updated terminology and log level expectations |
| `tests/unit/cli/commands/build_utils/test_manifest.py` | Added tests for handler_file, makes_remote_calls, sys.path handling |
| `tests/unit/cli/commands/build_utils/test_handler_generator.py` | Added tests for deployed handler template and ast.parse validation |
| `tests/integration/test_deployment_url_population.py` | New integration tests for endpoint URL population in manifest |
| `tests/integration/test_cross_endpoint_routing.py` | Updated for runsync and FLASH_ENVIRONMENT_ID |
| `src/runpod_flash/stubs/serverless.py` | Renamed run_sync to runsync |
| `src/runpod_flash/stubs/live_serverless.py` | Renamed run_sync to runsync |
| `src/runpod_flash/runtime/state_manager_client.py` | Renamed mothership_id to flash_environment_id throughout |
| `src/runpod_flash/runtime/service_registry.py` | Added get_routing_info method, updated to use FLASH_ENVIRONMENT_ID |
| `src/runpod_flash/runtime/resource_provisioner.py` | New module replacing mothership_provisioner for resource creation |
| `src/runpod_flash/runtime/production_wrapper.py` | Split _execute_remote into _execute_remote_qb and _execute_remote_lb |
| `src/runpod_flash/runtime/models.py` | Added resources_endpoints, is_load_balanced, is_live_resource fields |
| `src/runpod_flash/runtime/lb_handler.py` | Added _make_input_model, _wrap_handler_with_body_model for body parsing |
| `src/runpod_flash/runtime/generic_handler.py` | Added create_deployed_handler for plain JSON endpoints |
| `src/runpod_flash/protos/remote_execution.py` | Changed args/kwargs to List[Any]/Dict[str, Any], added serialization_format and json_result |
| `src/runpod_flash/core/utils/http.py` | Updated comments mothership→load_balancer |
| `src/runpod_flash/core/resources/serverless.py` | Renamed run_sync to runsync, reduced log noise |
| `src/runpod_flash/core/resources/load_balancer_sls_resource.py` | Updated terminology and env var to FLASH_ENDPOINT_TYPE=lb |
| `src/runpod_flash/client.py` | Added _resolve_deployed_endpoint_id for manifest-based endpoint lookup |
| `src/runpod_flash/cli/utils/skeleton_template/README.md` | Updated examples to use runsync |
| `src/runpod_flash/cli/utils/ignore.py` | Added .venv/, venv/, .runpod/ to always_ignore |
| `src/runpod_flash/cli/utils/deployment.py` | Added API key validation, updated terminology, use resource_provisioner |
| `src/runpod_flash/cli/docs/flash-run.md` | Updated run_sync references to runsync |
| `src/runpod_flash/cli/docs/README.md` | Updated curl examples to use runsync with wrapped input |
| `src/runpod_flash/cli/commands/run.py` | Added docstring extraction and display, updated to generate wrapped Request models |
| `src/runpod_flash/cli/commands/preview.py` | Renamed mothership to load_balancer throughout |
| `src/runpod_flash/cli/commands/deploy.py` | Refactored _display_post_deployment_guidance to show LB and QB endpoints separately |
| `src/runpod_flash/cli/commands/build_utils/scanner.py` | Use ignore-aware file walker, extract docstrings, add calls_remote_functions tracking |
| `src/runpod_flash/cli/commands/build_utils/resource_config_generator.py` | Reduced log level to debug |
| `src/runpod_flash/cli/commands/build_utils/manifest.py` | Added sys.path handling for sibling imports, handler_file for QB resources |
| `src/runpod_flash/cli/commands/build_utils/lb_handler_generator.py` | Changed validation to ast.parse instead of import |
| `src/runpod_flash/cli/commands/build_utils/handler_generator.py` | Added DEPLOYED_HANDLER_TEMPLATE for plain JSON, ast.parse validation |
| `src/runpod_flash/cli/commands/build_utils/mothership_handler_generator.py` | Deleted (replaced by generic templates) |
| `src/runpod_flash/cli/commands/build.py` | Added QB handler generation, removed flash dep bundling |
| `src/runpod_flash/cli/commands/_run_server_helpers.py` | Added make_wrapped_model |
| `src/runpod_flash/__init__.py` | Added ServerlessScalerType to exports |
| `README.md` | Updated examples to use runsync with wrapped input |
| `CLAUDE.md` | Complete rewrite with project architecture documentation |
When an LB endpoint only has GET routes (e.g., /images), no curl example was shown after deploy. Fall back to the first GET route when no POST routes exist, omitting the request body and Content-Type header from the curl example.
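The fallback can be sketched as follows. The route dictionaries with `method` and `path` keys, the function name, and the exact curl flags are assumptions; the real output likely carries additional headers.

```python
from typing import Dict, List, Optional


def curl_example(base_url: str, routes: List[Dict[str, str]]) -> Optional[str]:
    post = next((r for r in routes if r["method"] == "POST"), None)
    get = next((r for r in routes if r["method"] == "GET"), None)
    route = post or get  # prefer the first POST, else the first GET
    if route is None:
        return None
    url = base_url.rstrip("/") + route["path"]
    if route["method"] == "POST":
        lines = [
            f"curl -X POST {url}",
            "    -H 'Content-Type: application/json'",
            "    -d '{}'",
        ]
        return " \\\n".join(lines)
    # GET: no request body and no Content-Type header.
    return f"curl {url}"
```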
Execution lifecycle events (API calls, job status, worker timing, route dispatch) were invisible at the default log level. Promote them to INFO so users see request flow without enabling DEBUG. Also add route-label logging to lb_execute and LoadBalancerSlsStub.
Summary
Implements the fully deployed environment feature for Flash, enabling cross-endpoint communication, JSON-based serialization for deployed calls, and a streamlined deployment pipeline.
- … `resources_endpoints` to Manifest, use JSON serialization for deployed cross-endpoint calls, and refactor ServiceRegistry for endpoint URL population
- … `runpod_flash` import at runtime, wrap LB handler params as JSON body
- … `flash run` startup), stop bundling flash deps that shadow base image packages, eliminate noisy debug warnings
- … `FLASH_ENVIRONMENT_ID` for State Manager queries
- … `run_sync` to `runsync`, set `FLASH_ENDPOINT_TYPE=lb` alongside legacy `FLASH_IS_MOTHERSHIP`
- … `ServerlessScalerType` to top-level exports
Changes
63 files changed across 21 commits.
New files
- `src/runpod_flash/runtime/generic_handler.py` -- deployed QB handler for plain JSON endpoints
- `src/runpod_flash/runtime/resource_provisioner.py` -- replaces mothership provisioner
- `tests/integration/test_deployment_url_population.py` -- integration tests for endpoint URL population
- `tests/unit/runtime/test_deployed_handler.py`, `test_resource_provisioner.py`, `test_models.py`
- `tests/unit/cli/commands/build_utils/test_handler_generator.py`
- `tests/unit/cli/commands/test_build.py`, `tests/unit/cli/test_deploy.py`
- `tests/unit/test_function_request_response_serialization.py`
Removed files
- `src/runpod_flash/runtime/mothership_provisioner.py` -- replaced by resource_provisioner
- `src/runpod_flash/runtime/manifest_fetcher.py` -- consolidated into service_registry
- `src/runpod_flash/cli/commands/test_mothership.py` -- stale test command
- `src/runpod_flash/cli/commands/build_utils/mothership_handler_generator.py`
Test plan
- … `make test-unit`)
- `make quality-check` passes (format, lint, typecheck, coverage >= 35%)
- `flash run` starts without slow scanner delay
- `flash build` produces correct handler code (no `runpod_flash` import in deployed handler)
- `flash deploy --preview` works with updated Docker Compose config
Deploy Example