feat: implicit flash endpoint resolution + CLI overhaul by KAJdev · Pull Request #324 · runpod/flash

KAJdev · 2026-04-22T16:45:02Z

Two major changes:

Flash now resolves deployed endpoints by app/environment/endpoint name instead of requiring endpoint IDs. It uses the FLASH_APP and FLASH_ENV env vars, together with X-Flash-App, X-Flash-Environment, and X-Flash-Endpoint headers that are routed through the ai-api middleware.

Redesigns output for flash dev, flash deploy, flash app, flash env, and flash undeploy to be clean, consistent, and useful for real development workflows.
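To make the header-based resolution in the first change concrete, here is a minimal sketch of how the three headers could be assembled from the environment. The helper name `build_flash_headers` and its error text are hypothetical and not taken from this PR; only the env var names and header names come from the description above.

```python
import os
from typing import Dict, Mapping, Optional


def build_flash_headers(
    endpoint_name: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical helper: map FLASH_APP / FLASH_ENV and an endpoint name to headers."""
    env = os.environ if env is None else env
    app = env.get("FLASH_APP")
    environment = env.get("FLASH_ENV")
    if not app or not environment:
        # Assumption: both values are required to address a deployed endpoint by name.
        raise RuntimeError("FLASH_APP and FLASH_ENV must both be set")
    return {
        "X-Flash-App": app,
        "X-Flash-Environment": environment,
        "X-Flash-Endpoint": endpoint_name,
    }
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment.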

The main flow is now flash dev for local development, then flash deploy to deploy all of your code.

If you want to call your code programmatically instead of via ai-api, modify your script to call the function and then run it, e.g.:

from runpod_flash import Endpoint, GpuGroup
import asyncio

@Endpoint(name="my-worker", gpu=GpuGroup.AMPERE_24, workers=5)
async def worker():
    print("doing some work!")
    return True

if __name__ == "__main__":
    asyncio.run(worker())

Running this script will hit your deployed worker.

The FLASH_APP and FLASH_ENV environment variables override the app and environment that Flash would otherwise assume.

promptless · 2026-04-22T16:57:02Z

Promptless prepared a documentation update related to this change.

Triggered by runpod/flash PR #324

Updated CLI documentation for the flash run → flash dev rename and documented the new implicit endpoint resolution feature using FLASH_APP and FLASH_ENV environment variables. Updated 20 files across the Flash documentation.

Review: Flash CLI dev rename + endpoint resolution

runpod-Henrik

QA Review — PR 324

Finding 1: get_flash_context() docstring contradicts the code

flash_context.py — the docstring says:

precedence:
1. FLASH_IS_LIVE_PROVISIONING=true forces live (flash dev)
2. FLASH_APP + FLASH_ENV both set -> sentinel
3. anything else -> live flow

The code does not fall through to a "live flow" for case 3. It returns None, which causes the caller (client.py, endpoint.py) to raise RuntimeError. Any developer reading this docstring will expect fallback — there is none.
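A minimal sketch of the behaviour this finding describes, with a docstring that matches it. This is not the code from flash_context.py: the function name `resolve_flash_mode`, the string return values, and the case-insensitive "true" comparison are all assumptions made for illustration.

```python
from typing import Mapping, Optional


def resolve_flash_mode(env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical stand-in for get_flash_context().

    precedence:
    1. FLASH_IS_LIVE_PROVISIONING=true forces live (flash dev)
    2. FLASH_APP + FLASH_ENV both set -> sentinel
    3. anything else -> None (the caller raises RuntimeError; no live fallback)
    """
    # Assumption: the flag is compared case-insensitively against "true".
    if env.get("FLASH_IS_LIVE_PROVISIONING", "").lower() == "true":
        return "live"
    if env.get("FLASH_APP") and env.get("FLASH_ENV"):
        return "sentinel"
    return None
```

With the third case spelled out as `None`, a reader of the docstring no longer expects a fallback that does not exist.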

Finding 2: Breaking change not documented in the PR description

Before this PR, a @remote-decorated function called outside of flash dev would fall through to ResourceManager for dynamic provisioning. After this PR, the same call raises RuntimeError("no flash context for endpoint '...'"). This is a behaviour change for any user who:

- calls @remote functions programmatically without flash dev
- has code that uses ResourceManager for ad-hoc provisioning

The PR description mentions the new sentinel flow but does not say the ResourceManager fallback is removed. If this removal is intentional, it needs a migration note.

Finding 3: Partial env var config gives a misleading error

If a user sets FLASH_APP but not FLASH_ENV (or vice versa), get_flash_context() returns None — same as no env vars at all — and the caller raises:

RuntimeError: no flash context for endpoint '...'. either:
  - use 'flash dev' for local development
  - set FLASH_APP and FLASH_ENV to target a deployed environment

The error gives no indication which env var is missing. A user who half-configured their environment sees the same message as a user who configured nothing. Adding "FLASH_APP is set but FLASH_ENV is missing" (or vice versa) to the error would avoid a debugging round-trip.
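One way the suggested message could be produced is sketched below. The helper name `missing_context_message` is hypothetical; the base wording is copied from the error quoted above, and the parenthesised hint is the proposed addition.

```python
from typing import Mapping


def missing_context_message(endpoint_name: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper: build the error text with a hint about which var is missing."""
    has_app = bool(env.get("FLASH_APP"))
    has_env = bool(env.get("FLASH_ENV"))
    if has_app and not has_env:
        hint = "FLASH_APP is set but FLASH_ENV is missing"
    elif has_env and not has_app:
        hint = "FLASH_ENV is set but FLASH_APP is missing"
    else:
        # Intended to be called only when no context was resolved.
        hint = "neither FLASH_APP nor FLASH_ENV is set"
    return (
        f"no flash context for endpoint '{endpoint_name}' ({hint}). either:\n"
        "  - use 'flash dev' for local development\n"
        "  - set FLASH_APP and FLASH_ENV to target a deployed environment"
    )
```

The caller would pass the result to `RuntimeError(...)` in place of the current fixed string.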

Finding 4: _handle_sentinel_response silently passes through unexpected response shapes

flash_sentinel.py:

output = data.get("output", data)

if isinstance(output, dict) and "error" in output:
    raise RuntimeError(...)

return output

If the sentinel returns a response with neither a status nor an output key (e.g., an unexpected shape from ai-api), data.get("output", data) returns the full raw dict. If that dict happens not to contain an "error" key, it's returned to the caller as if it were a successful result. The user gets back an unexpected dict with no error raised. The QB path should assert the expected keys are present rather than falling through silently.
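A stricter variant of the QB handling might look like the sketch below. It assumes that a well-formed response always carries an "output" key; that requirement, the function name `handle_sentinel_response_strict`, and the error wording are assumptions, not the contract of ai-api.

```python
from typing import Any, Dict


def handle_sentinel_response_strict(data: Dict[str, Any]) -> Any:
    """Hypothetical strict handler: reject shapes that lack the expected keys."""
    if data.get("status") == "FAILED":
        raise RuntimeError(f"remote execution failed: {data.get('error', 'unknown error')}")
    if "output" not in data:
        # Do not return the raw dict as if it were a successful result.
        raise RuntimeError(f"unexpected sentinel response shape, keys: {sorted(data)}")
    output = data["output"]
    if isinstance(output, dict) and "error" in output:
        raise RuntimeError(f"remote execution failed: {output['error']}")
    return output
```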

Finding 5: LB sentinel path has no error handling for FAILED status

The QB path routes through _handle_sentinel_response() which checks data.get("status") == "FAILED". The LB path (sentinel_lb_request) calls response.raise_for_status() and returns response.json() directly — no status check, no error extraction. A FAILED response from ai-api with HTTP 200 and {"status": "FAILED", "error": "..."} body would be returned as a raw dict to the LB caller with no exception raised.
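As an illustration of the missing check, the sketch below validates an already-parsed LB response body. The function name `check_lb_body` is hypothetical; in the LB path it would presumably be applied to the result of `response.json()` after `response.raise_for_status()`.

```python
from typing import Any


def check_lb_body(body: Any) -> Any:
    """Hypothetical check: raise when a parsed LB body reports FAILED despite HTTP 200."""
    if isinstance(body, dict) and body.get("status") == "FAILED":
        raise RuntimeError(f"remote execution failed: {body.get('error', 'unknown error')}")
    return body
```

One caveat with this design: a user handler that legitimately returns a dict containing `"status": "FAILED"` would also be treated as an error, so the real fix may need a more specific signal from ai-api.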

Question: What does _normalize_resource_name() assume about resource names?

endpoint.py passes _normalize_resource_name(self.name) as the sentinel endpoint name — stripping a live- prefix and -fb suffix if present. Are deployed endpoint names always generated with these affixes by the platform, making user-set resource config names unreliable as sentinel targets? Or does this normalization only apply to a subset of resources? If a user names a resource live-worker, normalization produces worker — potentially routing to the wrong endpoint.
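The collision raised in this question can be reproduced with a small stand-in. The function below is written only from the description above (strip a live- prefix and a -fb suffix); it is not the implementation in endpoint.py and may differ from it.

```python
def normalize_resource_name(name: str) -> str:
    """Assumed behaviour: strip a leading 'live-' and a trailing '-fb' if present."""
    if name.startswith("live-"):
        name = name[len("live-"):]
    if name.endswith("-fb"):
        name = name[: -len("-fb")]
    return name


# A user-chosen name that happens to start with "live-" collapses onto another name.
print(normalize_resource_name("live-worker"))  # worker
print(normalize_resource_name("worker"))       # worker
```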

No issues with the core sentinel flow mechanics. The header-based routing design is clear, sentinel_qb_execute + _args_to_kwargs is straightforward, and the new test coverage for sentinel vs live dispatch paths is solid.

jhcipar · 2026-04-23T01:41:42Z

+    raises:
+        RuntimeError: if remote execution fails
+    """
+    body: Dict[str, Any] = {"method": request.method_name}


I think this doesn't work if decorated classes require constructor arguments, because this only forwards method kwargs; I think the constructor args need to be passed as well. This fails for any class with a constructor that takes args.
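A rough sketch of what forwarding constructor arguments could look like follows. Only the "method" key appears in the diff above; the keys "kwargs", "constructor_args", and "constructor_kwargs", and the helper name `build_class_request_body`, are hypothetical, and the worker side would need to read them for this to have any effect.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def build_class_request_body(
    method_name: str,
    method_kwargs: Optional[Mapping[str, Any]] = None,
    constructor_args: Sequence[Any] = (),
    constructor_kwargs: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical body builder that forwards constructor arguments as well."""
    body: Dict[str, Any] = {"method": method_name}
    if method_kwargs:
        body["kwargs"] = dict(method_kwargs)
    if constructor_args:
        # Values are assumed to be JSON-serializable.
        body["constructor_args"] = list(constructor_args)
    if constructor_kwargs:
        body["constructor_kwargs"] = dict(constructor_kwargs)
    return body
```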

KAJdev added 30 commits April 16, 2026 16:45

feat: implicit flash endpoint resolution via sentinel headers

a8b2420

refactor: drop flash.toml support in favor of env vars

f2ceab2

feat: rename flash run to flash dev, require explicit context for rem…

f92db36

…ote calls

merge: resolve conflict in execute_class.py

ad90740

feat: rename flash run to flash dev, require explicit context for rem…

978ba69

…ote calls

refactor: clean up CLI output formatting

fb37042

refactor: establish consistent color palette across CLI

5485e3c

refactor: flatten deploy output, remove nesting

6df78e1

refactor: use tree chars for deploy endpoint listing

d0161aa

fix: handle hyphenated directory names in flash dev codegen

04c6d7b

refactor: route worker logs through print() instead of logging

cec8019

fix: improve worker log filtering and add color to runtime output

40c518f

fix: indent user stdout under request, print before completion line

8015207

fix: worker log filters now handle timezone offsets and JSON-wrapped …

f11911d

…messages

fix: drop Rich Status spinner for pull progress

04744f0

fix: duplicate logs on subsequent requests to warm workers

da65c5d

feat: redesign dev console lifecycle output

409a4cd

fix: strip 'live-' prefix from endpoint names in dev console output

8d7d33c

feat: redesign flash dev startup and shutdown output

a24b9ec

fix: detect duplicate endpoint names across files in manifest builder

bfad91e

fix: clean up flash dev startup route table

939182a

feat: redesign flash deploy output

23486a7

fix: standardize spinner styles and add completion lines

b91ac4d

feat: add upload progress bar to flash deploy

7e7ee8a

feat: redesign flash app and env command output

d3c06f7

feat: add column headers to app and env list/get output

f6f4bd5

feat: simplify app list and env list output

a8baa6a

feat: redesign undeploy command output

0379d5d

feat: G1a log format for flash dev runtime

478068e

fix: align name columns in dev console output

e318a3a

KAJdev added 5 commits April 22, 2026 09:22

fix: use resource_name not name on WorkerInfo

6074f67

fix: set_name_width in generated server.py not parent process

a80033c

fix: catch remote execution errors in dev server route handlers

3074db6

Update pyproject.toml

07cae82

style: run ruff format

781f900

KAJdev requested review from deanq and jhcipar April 22, 2026 16:47

KAJdev added 3 commits April 22, 2026 09:47

fix: lint errors (F541 f-string, F401 unused import)

f5ffdfa

Merge branch 'main' into zeke/ae-2741-implicit-flash-endpoint-resolut…

38f11dc

…ion-archetecture

fix: unused variable lint errors

3deb9ca

KAJdev added 4 commits April 22, 2026 10:10

fix: update tests to match new CLI output format

368fbb7

fix: set FLASH_IS_LIVE_PROVISIONING in integration tests

89cf61e

fix: set .name on mock resources in LB and live serverless tests

6a4e833

fix: set FLASH_IS_LIVE_PROVISIONING in concurrency integration tests

6420b78

runpod-Henrik reviewed Apr 22, 2026

KAJdev added 11 commits April 22, 2026 12:18

fix: pad empty sentinel input to prevent runpod dropping input field

3c7c7ac

style: format

4759db0

fix: remove unused os import

741be70

fix: update handler generator tests for empty input acceptance

dbcf20e

fix: skip sentinel for client-mode endpoints, update empty input tests

528c7d4

fix: keep sentinel for client endpoints, set live provisioning in ima…

5a4080a

…ge-mode test

fix: live provisioning only in flash dev, guard fallback path

562962d

fix: use Live resource classes for all non-deploy contexts

eee09e3

fix: catch sentinel timeout with clear error message, 30s default

1fe3394

fix: sentinel timeout 90s

b2731b4

fix: update _is_live_provisioning tests for new default behavior

844b106

jhcipar reviewed Apr 23, 2026

KAJdev wants to merge 53 commits into main from zeke/ae-2741-implicit-flash-endpoint-resolution-archetecture
