feat: fully deployed environment #215

Open
deanq wants to merge 24 commits into main from deanq/ae-2079-fully-deployed-environment

Conversation

deanq (Member) commented Feb 23, 2026

Summary

Implements the fully deployed environment feature for Flash, enabling cross-endpoint communication, JSON-based serialization for deployed calls, and a streamlined deployment pipeline.

  • Cross-endpoint routing: Add resources_endpoints to Manifest, use JSON serialization for deployed cross-endpoint calls, and refactor ServiceRegistry for endpoint URL population
  • Deployed handlers: Add generic QB handler for plain JSON endpoints, inline deployed handler to avoid runpod_flash import at runtime, wrap LB handler params as JSON body
  • Build pipeline: Ignore-aware file walker for scanner (fixes slow flash run startup), stop bundling flash deps that shadow base image packages, eliminate noisy debug warnings
  • Deployment polish: Self-contained LB/QB curl sections, polished deployment output, use FLASH_ENVIRONMENT_ID for State Manager queries
  • Cleanup: Remove mothership terminology and stale references, rename run_sync to runsync, set FLASH_ENDPOINT_TYPE=lb alongside legacy FLASH_IS_MOTHERSHIP
  • Exports: Add missing ServerlessScalerType to top-level exports
  • Dev UX: Surface docstrings in startup table and Swagger UI

Changes

63 files changed across 21 commits.

New files

  • src/runpod_flash/runtime/generic_handler.py -- deployed QB handler for plain JSON endpoints
  • src/runpod_flash/runtime/resource_provisioner.py -- replaces mothership provisioner
  • tests/integration/test_deployment_url_population.py -- integration tests for endpoint URL population
  • tests/unit/runtime/test_deployed_handler.py, test_resource_provisioner.py, test_models.py
  • tests/unit/cli/commands/build_utils/test_handler_generator.py
  • tests/unit/cli/commands/test_build.py, tests/unit/cli/test_deploy.py
  • tests/unit/test_function_request_response_serialization.py

Removed files

  • src/runpod_flash/runtime/mothership_provisioner.py -- replaced by resource_provisioner
  • src/runpod_flash/runtime/manifest_fetcher.py -- consolidated into service_registry
  • src/runpod_flash/cli/commands/test_mothership.py -- stale test command
  • src/runpod_flash/cli/commands/build_utils/mothership_handler_generator.py
  • Associated test files for removed modules

Test plan

  • All unit tests pass (make test-unit)
  • Integration tests pass for endpoint URL population
  • make quality-check passes (format, lint, typecheck, coverage >= 35%)
  • flash run starts without slow scanner delay
  • flash build produces correct handler code (no runpod_flash import in deployed handler)
  • flash deploy --preview works with updated Docker Compose config
  • Cross-endpoint calls use JSON serialization in deployed environment

Deploy Example

[Image: deploy example]

deanq added 21 commits February 21, 2026 19:37
…t communication

Gap-driven PRD covering four work streams: cloudpickle removal from
deployed environments, LB handler terminology cleanup, endpoint URL
population post-provisioning, and selective RUNPOD_API_KEY injection.

…t/Response

Widen args/kwargs from List[str]/Dict[str,str] to List[Any]/Dict[str,Any],
add serialization_format field (default "cloudpickle"), and json_result
field on FunctionResponse. Backward compatible -- existing cloudpickle
callers work without changes.
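
As a rough illustration of the widened schema described in this commit, the sketch below uses plain dataclasses rather than the project's actual models. The class names, the `function_name` and `success` fields, and the sample values are assumptions made for the example; only `args`, `kwargs`, `serialization_format` (default "cloudpickle"), and `json_result` come from the commit message.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FunctionRequestSketch:
    """Illustrative stand-in, not the real FunctionRequest model."""

    function_name: str  # hypothetical field, added so the example has a target
    args: List[Any] = field(default_factory=list)  # widened from List[str]
    kwargs: Dict[str, Any] = field(default_factory=dict)  # widened from Dict[str, str]
    serialization_format: str = "cloudpickle"  # default keeps legacy callers working


@dataclass
class FunctionResponseSketch:
    """Illustrative stand-in, not the real FunctionResponse model."""

    success: bool  # hypothetical field
    json_result: Optional[Any] = None  # populated only for JSON-format calls


# A legacy caller that never sets the format still gets cloudpickle.
legacy = FunctionRequestSketch(function_name="embed")

# A deployed caller passes raw values and opts in to JSON.
deployed = FunctionRequestSketch(
    function_name="embed",
    args=[[1, 2, 3]],
    kwargs={"normalize": True},
    serialization_format="json",
)
payload = json.dumps({"input": asdict(deployed)})
```

Because the new field has a default, existing call sites construct the request exactly as before, which is the backward-compatibility property the commit claims.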
Provides type safety for the resources_endpoints dict that the
deployment pipeline already populates in manifest JSON after
provisioning. Includes backward-compatible handling: from_dict
defaults to None, to_dict omits the key when None.
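
The backward-compatible handling of `resources_endpoints` can be sketched as follows. This is a minimal dataclass approximation, not the project's Manifest class; the `version` field and the sample URL are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ManifestSketch:
    """Illustrative stand-in for the Manifest model."""

    version: str  # hypothetical field so the manifest has some other content
    resources_endpoints: Optional[Dict[str, str]] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ManifestSketch":
        # Older manifests have no resources_endpoints key, so default to None.
        return cls(
            version=data["version"],
            resources_endpoints=data.get("resources_endpoints"),
        )

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {"version": self.version}
        # Omit the key entirely when unset so older readers see no change.
        if self.resources_endpoints is not None:
            out["resources_endpoints"] = self.resources_endpoints
        return out


old = ManifestSketch.from_dict({"version": "1"})
new = ManifestSketch.from_dict(
    {"version": "1", "resources_endpoints": {"gpu_worker": "https://example.invalid/gpu"}}
)
```

Round-tripping `old` yields a dict without the new key, while `new` keeps its endpoint map.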
…SHIP

Update create_resource_from_manifest() to set the new FLASH_ENDPOINT_TYPE=lb
env var for load-balanced resources, while preserving FLASH_IS_MOTHERSHIP=true
for backward compatibility during transition. The condition now accepts both
is_mothership (legacy) and is_load_balanced (new) resource data flags.

Remove cloudpickle serialize_args/serialize_kwargs from ProductionWrapper
_execute_remote. Pass raw args as list and kwargs as dict with
serialization_format: json in the payload, matching the FunctionRequest
schema added in the prior commit.

Verify that reconcile_and_provision_resources() correctly populates
resources_endpoints after provisioning and pushes the manifest to
State Manager. Also fix unused variable lint error in test_deployment.

…nvelope

Aligns local dev server with RunPod's actual API format: /runsync URLs
and {"input": {...}} request envelope.
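
For readers unfamiliar with the envelope, here is a small sketch of how a client might build such a request with the standard library. The helper name, base URL, and payload are assumptions for illustration; only the `/runsync` path and the `{"input": {...}}` wrapper come from the commit message. The request is constructed but not sent.

```python
import json
from typing import Any, Dict
from urllib.request import Request


def build_runsync_request(base_url: str, payload: Dict[str, Any]) -> Request:
    """Wrap payload in the {"input": ...} envelope and target <base_url>/runsync."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/runsync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local dev server address and worker path.
req = build_runsync_request("http://localhost:8000/worker/", {"x": 1})
```

Passing `req` to `urllib.request.urlopen` would issue the call against a running server.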
Root cause of slow flash run: rglob("*.py") walked 8,946 files (nested
.venv dirs). Replace with .gitignore/.flashignore-aware walker that
prunes directories early. Also replace runtime importlib check for
resource config types with a static frozenset.
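
The early-pruning idea can be sketched with `os.walk`, which allows the directory list to be edited in place so that ignored subtrees are never entered. This simplified version matches a fixed set of directory names only; it does not parse `.gitignore` or `.flashignore` patterns as the real walker is described as doing, and the function name and ignore set are assumptions.

```python
import os
from pathlib import Path
from typing import FrozenSet, Iterator

# Simplified ignore set; .git and __pycache__ are additions for this example.
ALWAYS_IGNORE: FrozenSet[str] = frozenset(
    {".venv", "venv", ".runpod", ".git", "__pycache__"}
)


def iter_python_files(
    root: str, ignored_dirs: FrozenSet[str] = ALWAYS_IGNORE
) -> Iterator[Path]:
    """Yield .py files under root without descending into ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place tells os.walk to skip those subtrees,
        # unlike rglob("*.py"), which visits everything and filters afterwards.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for name in filenames:
            if name.endswith(".py"):
                yield Path(dirpath) / name
```

The saving comes from never listing the contents of nested virtual environments at all, rather than from filtering their files out after the walk.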
ServerlessScalerType was dropped from runpod_flash/__init__.py when
lazy loading was introduced, causing ImportError on
from runpod_flash import ServerlessScalerType.

Extract first-line docstrings from @remote functions/classes during AST
scan and propagate them through WorkerInfo to both the CLI startup table
(renamed "Resource" column to "Description") and generated FastAPI route
summaries. Functions without docstrings fall back to the function name.
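
A minimal sketch of first-line docstring extraction via the `ast` module is shown below. It collects every function and class in a source string and does not check for the decorator, which the real scanner presumably does; the function name is an assumption.

```python
import ast
from typing import Dict


def first_line_docstrings(source: str) -> Dict[str, str]:
    """Map each function/class name to the first line of its docstring.

    Falls back to the name itself when no docstring is present.
    """
    descriptions: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)  # None when there is no docstring
            if doc and doc.strip():
                descriptions[node.name] = doc.strip().splitlines()[0]
            else:
                descriptions[node.name] = node.name
    return descriptions
```

Since the source is only parsed, never imported, this works on files whose dependencies are not installed in the scanning environment.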
- Remove stale FLASH_IS_MOTHERSHIP assertions from test_resource_provisioner
- Update PRD.md to reflect completed mothership-to-peer migration
- Add completion status headers to historical plan documents
- Rename mothership_provisioner.py to resource_provisioner.py
- Delete obsolete mothership-related files and tests
- Update all remaining mothership comments to LB endpoint terminology

Deployed QB endpoints now accept plain JSON input instead of requiring
FunctionRequest/cloudpickle serialization. The build pipeline generates
handler_<resource_name>.py files for each QB resource, and the manifest
includes handler_file so flash-worker can delegate to them.

Runtime changes:
- create_deployed_handler() in generic_handler.py for plain JSON dispatch
- ProductionWrapper QB/LB routing with get_routing_info()
- ServiceRegistry routing info and endpoint URL population
- ResourceConfig.is_load_balanced / is_live_resource flags on models
- Cloudpickle imports scoped to LB handler only

Build pipeline changes:
- HandlerGenerator called in run_build() for QB resources
- Manifest includes handler_file for non-LB resources
- DEPLOYED_HANDLER_TEMPLATE for deployed (non-live) QB endpoints

The generated handler_<name>.py for deployed QB endpoints imported
create_deployed_handler from runpod_flash.runtime.generic_handler,
which triggered a pydantic import chain. When the bundled pydantic_core
binary extension was incompatible with the container, both the generated
handler and the FunctionRequest fallback failed.

Inline the handler logic directly in the template using only stdlib
imports (asyncio, inspect, traceback). This eliminates the runpod_flash
dependency from generated deployed handlers entirely.
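
To make the stdlib-only approach concrete, the following sketch shows one way such an inlined handler could dispatch plain JSON input to a user function. It is not the generated template itself: the factory name, the job shape beyond the `input` key, and the error dict format are assumptions. It also assumes the handler is called from synchronous code with no event loop already running.

```python
import asyncio
import inspect
import traceback
from typing import Any, Callable, Dict


def make_handler(func: Callable[..., Any]) -> Callable[[Dict[str, Any]], Any]:
    """Build a job handler that passes job["input"] to func as keyword arguments."""

    def handler(job: Dict[str, Any]) -> Any:
        job_input = job.get("input") or {}
        try:
            result = func(**job_input)
            # Support async user functions without importing anything beyond stdlib.
            if inspect.isawaitable(result):
                result = asyncio.run(result)
            return result
        except Exception as exc:
            # Hypothetical error shape; the real template may report differently.
            return {"error": str(exc), "traceback": traceback.format_exc()}

    return handler


def add(a: int, b: int) -> int:
    return a + b


async def mul(a: int, b: int) -> int:
    return a * b
```

Calling `make_handler(add)({"input": {"a": 1, "b": 2}})` returns the plain result, with no pydantic or cloudpickle involved.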
_extract_runpod_flash_dependencies() added flash's own dependencies
(pydantic, cloudpickle, etc.) to the build tarball. When extracted to
/app, these shadowed the working packages from the Docker base image.
pydantic_core's native .so compiled for a different platform caused
import failures.

The base Docker image already includes all flash runtime dependencies.
Only the flash source code needs bundling via --use-local-flash.

…rams

Deployed LB endpoints returned HTTP 422 because FastAPI treated simple
typed parameters (str, int) as query parameters. Adds dynamic Pydantic
body model generation at route registration time for POST/PUT/PATCH/DELETE
handlers, matching the pattern used by the dev server's make_input_model.

State Manager queries in ServiceRegistry._ensure_manifest_loaded() were
passing RUNPOD_ENDPOINT_ID (the serverless endpoint ID) as the flash
environment ID, causing "Flash environment not found" errors and
falling through to ResourceManager for unnecessary re-provisioning.

- service_registry: read FLASH_ENVIRONMENT_ID instead of RUNPOD_ENDPOINT_ID
- resource_provisioner: inject FLASH_ENVIRONMENT_ID into env for
  endpoints with makes_remote_calls=True
- client: add _resolve_deployed_endpoint_id() to look up pre-deployed
  endpoints via ServiceRegistry before falling back to ResourceManager
- models: filter unknown keys in ResourceConfig.from_dict() for forward
  compatibility with manifest field changes
- manifest: remove per-function fields (is_load_balanced, is_live_resource,
  config_variable) that belong at resource level only
- build: generate _flash_resource_config.py after bundling local flash
- CLAUDE.md: replace worktree template with auto-generated project docs
- Spell out LB/QB abbreviations (Load-balanced/Queue-based)
- Add RunPod console link alongside doc links
- Demote noisy INFO logs to DEBUG (resource config, env injection, LB deploy)
- Remove attribute-call matching in scanner to prevent false positives
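
The forward-compatibility change to `ResourceConfig.from_dict()` listed above (filtering unknown keys) might look roughly like this. The class below is a dataclass stand-in, not the project's model; the `name` field and the `future_field` key are invented for the example.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ResourceConfigSketch:
    """Illustrative stand-in for ResourceConfig."""

    name: str  # hypothetical field
    is_load_balanced: bool = False
    is_live_resource: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ResourceConfigSketch":
        # Keep only keys this version knows about, so a manifest written by a
        # newer build does not raise TypeError on unexpected fields.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


cfg = ResourceConfigSketch.from_dict(
    {"name": "gpu_worker", "is_load_balanced": True, "future_field": 123}
)
```

The trade-off is that typos in manifest keys are silently dropped, so this kind of filtering suits machine-generated manifests better than hand-edited ones.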
Reorganize deployment output so each endpoint type is a complete section:
- LB: URLs + routes + one curl example (first POST route)
- QB: URLs + one curl example using /runsync
- Console/docs links moved to end as "view all" closer
- Fix curl continuation line indentation

Replace exec_module() with ast.parse() in handler validation to avoid
ImportErrors from modules that only resolve at runtime inside Docker.
Add parent directory to sys.path in _extract_deployment_config() so
sibling imports resolve during config extraction.
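
The difference between import-based and parse-based validation can be illustrated with a short sketch. The function name is an assumption; `ast.parse` checks syntax only and never executes the module, so imports that resolve only inside the container do not cause failures.

```python
import ast


def validate_handler_source(source: str, filename: str = "<handler>") -> bool:
    """Return True if source is syntactically valid Python, without executing it."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


# Valid syntax, even though this module does not exist on the build machine.
ok = validate_handler_source("import only_available_in_docker\n\ndef handler(job):\n    return job\n")

# A genuine syntax error is still caught.
bad = validate_handler_source("def handler(job:\n    return job\n")
```

The cost is that parse-only validation cannot catch missing names or bad imports; those surface at runtime in the container instead.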
deanq changed the title from "feat: fully deployed environment (AE-2079)" to "feat: fully deployed environment" on Feb 23, 2026
deanq requested a review from Copilot on February 23, 2026 04:46
…or handling, and observability

- Fix shared resource_config mutation in @remote wrapper using model_copy()
- Add error logging to deployed handler and generated handler template
- Replace deprecated asyncio.get_event_loop() with get_running_loop() fallback
- Raise ValueError at build time for empty function lists in handler generator
- Upgrade log levels for manifest lookup failures and missing endpoint URLs
- Preserve stale endpoint cache on State Manager unavailability
- Add warning logs for LB handler introspection failures
- Escape newlines/carriage returns in generated string literals
- Warn when makes_remote_calls=True but RUNPOD_API_KEY is missing
- Differentiate httpx error types in remote LB execution
- Fix stale run_sync mock name to runsync in conftest

Copilot AI (Contributor) left a comment

Pull request overview

This PR implements the fully deployed environment feature for Flash, enabling cross-endpoint communication with JSON serialization for deployed calls and a streamlined deployment pipeline.

Changes:

  • Cross-endpoint routing via manifest-based ServiceRegistry with JSON serialization for deployed QB/LB calls
  • Inline deployed handler template to avoid runpod_flash import at runtime, wrap LB handler params as JSON body
  • Ignore-aware file walker for scanner, stop bundling flash deps that shadow base image packages
  • Rename terminology from "mothership" to "load_balancer", use FLASH_ENVIRONMENT_ID for State Manager queries
  • Add missing ServerlessScalerType to top-level exports, surface docstrings in startup table and Swagger UI

Reviewed changes

Copilot reviewed 63 out of 63 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/unit/test_function_request_response_serialization.py | New comprehensive tests for JSON serialization format support in FunctionRequest/Response |
| tests/unit/test_client_should_execute_locally.py | Added tests for _resolve_deployed_endpoint_id and wrapper manifest lookup |
| tests/unit/runtime/test_service_registry.py | Added get_routing_info tests, updated env var from RUNPOD_ENDPOINT_ID to FLASH_ENVIRONMENT_ID |
| tests/unit/runtime/test_resource_provisioner.py | New tests for create_resource_from_manifest with LB and remote call configurations |
| tests/unit/runtime/test_production_wrapper.py | Updated tests for QB/LB dispatch split, renamed run_sync to runsync |
| tests/unit/runtime/test_models.py | New tests for resources_endpoints field and is_load_balanced/is_live_resource flags |
| tests/unit/runtime/test_lb_handler.py | Added tests for _make_input_model, _wrap_handler_with_body_model body parsing |
| tests/unit/runtime/test_deployed_handler.py | New tests for create_deployed_handler (plain JSON QB handler) |
| tests/unit/resources/test_serverless.py | Renamed run_sync to runsync throughout |
| tests/unit/cli/utils/test_deployment.py | Updated terminology mothership→load_balancer, added API key validation tests |
| tests/unit/cli/test_run.py | Updated codegen expectations for wrapped body model and runsync paths |
| tests/unit/cli/test_deploy.py | Added comprehensive tests for _display_post_deployment_guidance output |
| tests/unit/cli/commands/test_run_server_helpers.py | Added tests for make_wrapped_model |
| tests/unit/cli/commands/test_run.py | Updated codegen tests for wrapped Request models and body.input access |
| tests/unit/cli/commands/test_preview.py | Renamed is_mothership to is_load_balanced throughout |
| tests/unit/cli/commands/test_build.py | Added tests for QB handler generation in build pipeline |
| tests/unit/cli/commands/build_utils/test_scanner_load_balancer.py | Removed detect_explicit_mothership tests (function deleted) |
| tests/unit/cli/commands/build_utils/test_scanner.py | Added test_exclude_nested_venv_directory, calls_remote_functions tracking tests |
| tests/unit/cli/commands/build_utils/test_resource_config_generator.py | Updated terminology and log level expectations |
| tests/unit/cli/commands/build_utils/test_manifest.py | Added tests for handler_file, makes_remote_calls, sys.path handling |
| tests/unit/cli/commands/build_utils/test_handler_generator.py | Added tests for deployed handler template and ast.parse validation |
| tests/integration/test_deployment_url_population.py | New integration tests for endpoint URL population in manifest |
| tests/integration/test_cross_endpoint_routing.py | Updated for runsync and FLASH_ENVIRONMENT_ID |
| src/runpod_flash/stubs/serverless.py | Renamed run_sync to runsync |
| src/runpod_flash/stubs/live_serverless.py | Renamed run_sync to runsync |
| src/runpod_flash/runtime/state_manager_client.py | Renamed mothership_id to flash_environment_id throughout |
| src/runpod_flash/runtime/service_registry.py | Added get_routing_info method, updated to use FLASH_ENVIRONMENT_ID |
| src/runpod_flash/runtime/resource_provisioner.py | New module replacing mothership_provisioner for resource creation |
| src/runpod_flash/runtime/production_wrapper.py | Split _execute_remote into _execute_remote_qb and _execute_remote_lb |
| src/runpod_flash/runtime/models.py | Added resources_endpoints, is_load_balanced, is_live_resource fields |
| src/runpod_flash/runtime/lb_handler.py | Added _make_input_model, _wrap_handler_with_body_model for body parsing |
| src/runpod_flash/runtime/generic_handler.py | Added create_deployed_handler for plain JSON endpoints |
| src/runpod_flash/protos/remote_execution.py | Changed args/kwargs to List[Any]/Dict[str, Any], added serialization_format and json_result |
| src/runpod_flash/core/utils/http.py | Updated comments mothership→load_balancer |
| src/runpod_flash/core/resources/serverless.py | Renamed run_sync to runsync, reduced log noise |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Updated terminology and env var to FLASH_ENDPOINT_TYPE=lb |
| src/runpod_flash/client.py | Added _resolve_deployed_endpoint_id for manifest-based endpoint lookup |
| src/runpod_flash/cli/utils/skeleton_template/README.md | Updated examples to use runsync |
| src/runpod_flash/cli/utils/ignore.py | Added .venv/, venv/, .runpod/ to always_ignore |
| src/runpod_flash/cli/utils/deployment.py | Added API key validation, updated terminology, use resource_provisioner |
| src/runpod_flash/cli/docs/flash-run.md | Updated run_sync references to runsync |
| src/runpod_flash/cli/docs/README.md | Updated curl examples to use runsync with wrapped input |
| src/runpod_flash/cli/commands/run.py | Added docstring extraction and display, updated to generate wrapped Request models |
| src/runpod_flash/cli/commands/preview.py | Renamed mothership to load_balancer throughout |
| src/runpod_flash/cli/commands/deploy.py | Refactored _display_post_deployment_guidance to show LB and QB endpoints separately |
| src/runpod_flash/cli/commands/build_utils/scanner.py | Use ignore-aware file walker, extract docstrings, add calls_remote_functions tracking |
| src/runpod_flash/cli/commands/build_utils/resource_config_generator.py | Reduced log level to debug |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Added sys.path handling for sibling imports, handler_file for QB resources |
| src/runpod_flash/cli/commands/build_utils/lb_handler_generator.py | Changed validation to ast.parse instead of import |
| src/runpod_flash/cli/commands/build_utils/handler_generator.py | Added DEPLOYED_HANDLER_TEMPLATE for plain JSON, ast.parse validation |
| src/runpod_flash/cli/commands/build_utils/mothership_handler_generator.py | Deleted (replaced by generic templates) |
| src/runpod_flash/cli/commands/build.py | Added QB handler generation, removed flash dep bundling |
| src/runpod_flash/cli/commands/_run_server_helpers.py | Added make_wrapped_model |
| src/runpod_flash/__init__.py | Added ServerlessScalerType to exports |
| README.md | Updated examples to use runsync with wrapped input |
| CLAUDE.md | Complete rewrite with project architecture documentation |

When an LB endpoint only has GET routes (e.g., /images), no curl
example was shown after deploy. Fall back to the first GET route
when no POST routes exist, omitting the request body and
Content-Type header from the curl example.
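
The route-selection rule described in this commit (prefer the first POST route, otherwise fall back to the first GET route and drop the body and Content-Type header) can be sketched as below. The function name, the route dict shape, and the empty `{}` request body are assumptions for illustration, not the actual deploy command's output format.

```python
from typing import Dict, List, Optional


def build_curl_example(base_url: str, routes: List[Dict[str, str]]) -> Optional[str]:
    """Pick one route for the post-deploy curl example, or None if there are none."""
    post = next((r for r in routes if r["method"].upper() == "POST"), None)
    if post is not None:
        # POST example carries a JSON body, so it needs the Content-Type header.
        return " \\\n".join(
            [
                f"curl -X POST {base_url}{post['path']}",
                "  -H 'Content-Type: application/json'",
                "  -d '{}'",
            ]
        )
    get = next((r for r in routes if r["method"].upper() == "GET"), None)
    if get is not None:
        # GET fallback: no request body and no Content-Type header.
        return f"curl {base_url}{get['path']}"
    return None
```

With only `{"method": "GET", "path": "/images"}` registered, the result is a single-line `curl` command against `/images`.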
Execution lifecycle events (API calls, job status, worker timing,
route dispatch) were invisible at the default log level. Promote
them to INFO so users see request flow without enabling DEBUG.
Also add route-label logging to lb_execute and LoadBalancerSlsStub.