Skip to content

feat: implicit flash endpoint resolution + CLI overhaul#324

Open
KAJdev wants to merge 53 commits intomainfrom
zeke/ae-2741-implicit-flash-endpoint-resolution-archetecture
Open

feat: implicit flash endpoint resolution + CLI overhaul#324
KAJdev wants to merge 53 commits intomainfrom
zeke/ae-2741-implicit-flash-endpoint-resolution-archetecture

Conversation

@KAJdev
Copy link
Copy Markdown
Contributor

@KAJdev KAJdev commented Apr 22, 2026

Two major changes:

Flash now resolves deployed endpoints by app/environment/endpoint name instead of requiring endpoint IDs. Uses FLASH_APP and FLASH_ENV env vars with X-Flash-App, X-Flash-Environment, X-Flash-Endpoint headers routed through ai-api middleware.

Redesigns output for flash dev, flash deploy, flash app, flash env, and flash undeploy to be clean, consistent, and useful for real development workflows.

The main flow is now flash dev
image

and then flash deploy in order to deploy all of your code
image

if you want to call your code programmatically, instead of via ai-api, you can just modify your script to call the function and then run it

e.g.

from runpod_flash import Endpoint, GpuGroup
import asyncio

@Endpoint(name="my-worker", gpu=GpuGroup.AMPERE_24, workers=5)
async def worker():
    print("doing some work!")
    return True

if __name__ == "__main__":
    asyncio.run(worker())

running this script will hit your deployed worker.

the FLASH_APP and FLASH_ENV environment variables override the app and environment assumptions.

KAJdev added 30 commits April 16, 2026 16:45
@KAJdev KAJdev requested review from deanq and jhcipar April 22, 2026 16:47
@promptless
Copy link
Copy Markdown

promptless Bot commented Apr 22, 2026

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash PR #324

Updated CLI documentation for the flash runflash dev rename and documented the new implicit endpoint resolution feature using FLASH_APP and FLASH_ENV environment variables. Updated 20 files across the Flash documentation.

Review: Flash CLI dev rename + endpoint resolution

Copy link
Copy Markdown
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Review — PR 324

Finding 1: get_flash_context() docstring contradicts the code

flash_context.py — the docstring says:

precedence:
1. FLASH_IS_LIVE_PROVISIONING=true forces live (flash dev)
2. FLASH_APP + FLASH_ENV both set -> sentinel
3. anything else -> live flow

The code does not fall through to a "live flow" for case 3. It returns None, which causes the caller (client.py, endpoint.py) to raise RuntimeError. Any developer reading this docstring will expect fallback — there is none.


Finding 2: Breaking change not documented in the PR description

Before this PR, a @remote-decorated function called outside of flash dev would fall through to ResourceManager for dynamic provisioning. After this PR, the same call raises RuntimeError("no flash context for endpoint '...'"). This is a behaviour change for any user who:

  • calls @remote functions programmatically without flash dev
  • has code that uses ResourceManager for ad-hoc provisioning

The PR description mentions the new sentinel flow but does not say the ResourceManager fallback is removed. If this removal is intentional, it needs a migration note.


Finding 3: Partial env var config gives a misleading error

If a user sets FLASH_APP but not FLASH_ENV (or vice versa), get_flash_context() returns None — same as no env vars at all — and the caller raises:

RuntimeError: no flash context for endpoint '...'. either:
  - use 'flash dev' for local development
  - set FLASH_APP and FLASH_ENV to target a deployed environment

The error gives no indication which env var is missing. A user who half-configured their environment sees the same message as a user who configured nothing. Adding "FLASH_APP is set but FLASH_ENV is missing" (or vice versa) to the error would avoid a debugging round-trip.


Finding 4: _handle_sentinel_response silently passes through unexpected response shapes

flash_sentinel.py:

output = data.get("output", data)

if isinstance(output, dict) and "error" in output:
    raise RuntimeError(...)

return output

If the sentinel returns a response with neither a status nor an output key (e.g., an unexpected shape from ai-api), data.get("output", data) returns the full raw dict. If that dict happens not to contain an "error" key, it's returned to the caller as if it were a successful result. The user gets back an unexpected dict with no error raised. The QB path should assert the expected keys are present rather than falling through silently.


Finding 5: LB sentinel path has no error handling for FAILED status

The QB path routes through _handle_sentinel_response() which checks data.get("status") == "FAILED". The LB path (sentinel_lb_request) calls response.raise_for_status() and returns response.json() directly — no status check, no error extraction. A FAILED response from ai-api with HTTP 200 and {"status": "FAILED", "error": "..."} body would be returned as a raw dict to the LB caller with no exception raised.


Question: What does _normalize_resource_name() assume about resource names?

endpoint.py passes _normalize_resource_name(self.name) as the sentinel endpoint name — stripping a live- prefix and -fb suffix if present. Are deployed endpoint names always generated with these affixes by the platform, making user-set resource config names unreliable as sentinel targets? Or does this normalization only apply to a subset of resources? If a user names a resource live-worker, normalization produces worker — potentially routing to the wrong endpoint.


No issues with the core sentinel flow mechanics. The header-based routing design is clear, sentinel_qb_execute + _args_to_kwargs is straightforward, and the new test coverage for sentinel vs live dispatch paths is solid.

raises:
RuntimeError: if remote execution fails
"""
body: Dict[str, Any] = {"method": request.method_name}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't work if decorated classes require constructors, because this only forwards method kwargs, i think those need to get passed as well. this fails for any classes w/ a constructor that has args

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants