feat: migrate from AST to import + inspect by KAJdev · Pull Request #265 · runpod/flash

KAJdev · 2026-03-11T19:01:29Z

Replace AST-based resource discovery with runtime import scanning

the old scanner (RemoteDecoratorScanner, ResourceDiscovery) used Python's AST module to statically parse source files and find @remote decorated functions and resource configurations. This worked for simple cases but broke on any dynamic Python pattern (computed decorators, conditional definitions, programmatic resource construction, class inheritance, etc.)

this approach imports each user module via importlib and inspects live objects. The @remote decorator and Endpoint class stamp a __remote_config__ dict on decorated functions, so the scanner just walks module members looking for that attribute. this means anything that runs at import time is discovered correctly, regardless of how it was constructed.

How it works

RuntimeScanner in scanner.py:

Collects .py files in the project (respecting .flashignore)
Imports each file as a module via importlib.util
Walks module members for __remote_config__ attributes
Extracts RemoteFunctionMetadata from the live objects (resource config, function signature, dependencies, paths)
Runs a lightweight AST pass on function bodies only for cross-call analysis (detecting when one @remote function calls another)

Import failures are collected and surfaced clearly in both flash run and flash deploy. Syntax errors, missing dependencies, and validation errors (e.g. forgetting name= on a load-balanced endpoint) show the filename and error message. The build stops immediately on fatal errors instead of silently continuing with zero discovered functions.

Fixes AE-2321
Fixes AE-2318

Other fixes bundled in

idleTimeout missing from deploy pipeline: _extract_config_properties() in manifest.py and create_resource_from_manifest() in resource_provisioner.py were not passing idleTimeout through, so all endpoints got the default 60s regardless of what the user specified.
_wrapped_class on RemoteClassWrapper: Added a class-level attribute so the scanner can recover the original class name and method definitions from wrapped classes.

Copilot

Pull request overview

This PR migrates Flash’s project scanning and resource discovery from AST-based parsing to runtime importing + object inspection, enabling detection of dynamically constructed endpoints/resources while also fixing missing idleTimeout propagation.

Changes:

Replace RemoteDecoratorScanner/ResourceDiscovery patterns with RuntimeScanner-driven import+inspect scanning across build/run flows.
Update Endpoint to support nameless QB decorator mode (derive name from decorated target) while requiring name/id only for image= client mode and enforcing name for LB route registration.
Propagate idleTimeout through manifest generation and runtime provisioning; update/trim tests accordingly.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/unit/test_p2_remaining_gaps_2.py	Updates scanner references from `RemoteDecoratorScanner` to `RuntimeScanner` for empty-project behavior.
tests/unit/test_p2_remaining_gaps.py	Updates nested-class/conditional scanning tests to use `RuntimeScanner`.
tests/unit/test_p2_gaps.py	Updates docstring extraction test to use `RuntimeScanner`.
tests/unit/test_endpoint.py	Adjusts validation expectations and adds coverage for nameless QB decorator mode.
tests/unit/test_discovery_endpoint.py	Removes ResourceDiscovery endpoint pattern tests (scanner/discovery migration).
tests/unit/test_discovery.py	Removes ResourceDiscovery unit tests (scanner/discovery migration).
tests/unit/cli/test_run.py	Updates mocks for `_scan_project_workers` new return shape.
tests/unit/cli/commands/test_run_endpoint.py	Updates `_scan_project_workers` call sites for new return shape.
tests/unit/cli/commands/test_run.py	Updates `_scan_project_workers` call sites for new return shape.
tests/unit/cli/commands/test_build.py	Updates test fixtures to construct valid Live* resources under runtime import scanning (adds `name=`).
tests/unit/cli/commands/build_utils/test_scanner_load_balancer.py	Removes AST-scanner FastAPI detection tests (scanner migration).
tests/unit/cli/commands/build_utils/test_scanner_endpoint.py	Removes AST-scanner Endpoint pattern tests (scanner migration).
tests/unit/cli/commands/build_utils/test_scanner.py	Removes AST-scanner test suite (scanner migration).
tests/unit/cli/commands/build_utils/test_path_utilities.py	Updates scanner import to `RuntimeScanner` for LB/QB flags and init exclusion tests.
tests/integration/test_lb_remote_execution.py	Updates integration assertions for `RuntimeScanner` and name normalization (`-fb` suffix).
tests/integration/test_build_pipeline.py	Updates scanner import and expected resource naming (`-fb` suffix); minor docstring normalization.
src/runpod_flash/runtime/resource_provisioner.py	Passes `idleTimeout` from manifest into deployment kwargs.
src/runpod_flash/execute_class.py	Adds `_wrapped_class` to `RemoteClassWrapper` for scanner/introspection to recover original class metadata.
src/runpod_flash/endpoint.py	Allows nameless QB decorator mode (derive `name` in `__call__`), tightens LB route validation, clarifies image-mode requirement.
src/runpod_flash/core/discovery.py	Removes AST-based `ResourceDiscovery` implementation.
src/runpod_flash/cli/commands/run.py	Switches runtime worker scan/discovery to import+inspect; surfaces scanner import errors; updates worker scanning API.
src/runpod_flash/cli/commands/build_utils/scanner.py	Introduces `RuntimeScanner` import+inspect discovery with AST-only cross-call analysis pass.
src/runpod_flash/cli/commands/build_utils/manifest.py	Ensures `idleTimeout` is included in manifest resource config extraction.
src/runpod_flash/cli/commands/build.py	Uses `RuntimeScanner` in build; prints import failures and exits when none are discovered.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/runpod_flash/cli/commands/build_utils/scanner.py

src/runpod_flash/cli/commands/run.py

src/runpod_flash/cli/commands/build_utils/scanner.py

…py files

runpod-Henrik

1. Architecture shift — sound motivation, correct implementation

AST parsing breaking on computed decorators, conditional definitions, and programmatic resource construction is a real limitation. The __remote_config__ stamp pattern is the right fix: @remote and Endpoint both attach a dict at decoration time, and the scanner just walks live module members looking for it. Dynamic patterns that defeated AST parsing work correctly under import inspection.

The idleTimeout fix (manifest.py + resource_provisioner.py) and _wrapped_class on RemoteClassWrapper are clean bundled fixes that correctly close gaps in the manifest pipeline and class introspection.

2. Bug: partial build failure silently produces incomplete manifest

# build.py
if scanner.import_errors:
    console.print("\n[red bold]Failed to load:[/red bold]")
    for filename, err in scanner.import_errors.items():
        console.print(f"  [red]{filename}[/red]: {err}")
    if not remote_functions:      # ← only aborts if ALL modules failed
        raise typer.Exit(1)

If a user has five worker files and two fail to import, the build continues with a three-worker manifest, prints the errors in red, exits 0, and deploys. The user has a silently incomplete app.

The if not remote_functions guard is too permissive. At minimum, import failures should print a warning that the build is incomplete and require --allow-partial or similar to proceed. A hard stop on any import error is the safer default.

3. Issue: no timeout on module imports

_import_module_from_file() calls spec.loader.exec_module(module) with no timeout. Module-level blocking code — a database connection, time.sleep(), or an infinite loop — will hang flash build indefinitely with no output. The AST scanner was immune to this; runtime import is not. A per-file timeout (e.g., concurrent.futures with a 10s limit) would make failure explicit.

4. High: ~2,346 lines of scanner tests deleted, zero unit tests added

Three test files deleted:

test_scanner.py — 1,480 lines
test_scanner_endpoint.py — 600 lines
test_scanner_load_balancer.py — 266 lines

The PR adds no new unit tests for RuntimeScanner. test_build_pipeline.py has 15 line-for-line updates but no new scenarios. What's now untested:

_import_module_from_file() error paths (syntax error, missing dep, import exception)
_find_remote_decorated() / _find_endpoint_instances() introspection edge cases
_analyze_cross_calls_ast() with files that failed to import
Cross-call detection partial-failure degradation (item 2 above)
Module cleanup in finally block
__flash_local__ flag propagation through _metadata_from_remote_config()
LB route stamp correctness (method / path from @ep.get(...))

The old tests covered these exhaustively. The new implementation is simpler, but no behavioral guarantees are locked in.

5. Testing — what remains

Integration tests in test_build_pipeline.py and test_lb_remote_execution.py are updated and cover the happy path. The test name change ("test-gpu" → "test-gpu-fb") correctly reflects that runtime scanning reads the actual resource name (with suffix) rather than the AST string literal — that's a latent correctness improvement.

Verdict

The architectural direction is right and the implementation looks correct for the cases it targets. Two things should be addressed before merge: the partial-build silent-success behavior (item 2 — real user impact) and the unit test gap (item 4 — behavioral guarantees are unverified for error paths). The import timeout (item 3) is lower priority but worth a follow-up.

jhcipar · 2026-03-13T01:23:07Z

src/runpod_flash/cli/commands/build_utils/scanner.py


-            if not isinstance(decorator.func, ast.Attribute):
-                continue
+        remote_objects = _find_remote_decorated(module)


this is outside the scope of this PR, but I this feels like just a couple steps away from resolving things outside of decorated functions and classes and shipping them

like if we are primarily working at the module level, could we just ship the whole module over? and that way we could not have to worry about imports inside of functions

hmmm, possibly, though we run into the same issue when modules import from other modules. Though I think this is easier to fix.

I think thats a good exploration for a fast follow.

…ner tests Addresses PR #265 review feedback: - build.py/run.py: any import error is now a hard stop instead of silently producing an incomplete manifest - scanner.py: per-file SIGALRM timeout (30s) prevents module-level blocking code from hanging flash build indefinitely - scanner.py: _find_remote_decorated wraps hasattr inside try/except so objects with broken __getattr__ are skipped - test_runtime_scanner.py: 48 unit tests covering _import_module_from_file error paths, introspection edge cases, cross-call analysis, module cleanup, __flash_local__ propagation, and LB route stamps

…ort-parse-objects-in-memory

jhcipar · 2026-03-18T13:17:56Z

src/runpod_flash/cli/commands/build_utils/scanner.py

+                    config_to_varname[id(cached)] = member_name
+
+        for attr_name, obj in remote_objects.items():
+            dedup_key = f"{module_path}:{attr_name}"


what is the behavior if you create an endpoint and then alias it, and you create two configs for that? I feel like it's a smell if you create + save two configs for the same object/endpoint/function def that's aliased. would it work to deduplicate on object identity?

hmm yeah that's probably better, edge case would be importing the attribute in another module, things get weird because you'd end up with possibly the "original" function being the imported one? I think?. But honestly we should be pushing Endpoint syntax anyway. I want to cut out all the remote decorators and Serverless class configs eventually. I dont think this as much of an issue with Endpoint

jhcipar

this looks good to me, just one comment/question about potential "duplicate" resource configs if we use module:attr_name, but otherwise 👍

runpod-Henrik

1. Core approach

Replacing AST static analysis with importlib + inspect is the right call architecturally — AST was inherently fragile against dynamic Python patterns and had a hardcoded _RESOURCE_CONFIG_TYPES frozenset that had to be kept in sync with exports. The new approach handles inheritance, computed decorators, and programmatic resource construction correctly. The __remote_config__ attribute stamp is a clean contract.

The bundled fixes are solid: idleTimeout was silently dropped from manifest → provisioner (two-site fix, both correct), __flash_local__ on decorated functions, _wrapped_class on RemoteClassWrapper, and the Endpoint name-not-required relaxation for QB decorator mode are all necessary enablers for the scanner to work.

2. Bug: Local imports will break real projects

This is the highest-risk change in the PR.

The old AST scanner never executed user code — it could scan a file importing torch with no local PyTorch installation. The new scanner runs exec_module on every .py file. Any worker file that does import torch at module level will raise ModuleNotFoundError during flash build, causing an immediate exit(1).

This is a regression for any project that has heavy ML dependencies installed only in the container image (the common case — most GPU workers). Users need to either:

Install all their worker deps locally before running flash build, or
Move all imports inside the function body

Neither is documented. The PR description does not mention this requirement change. Before merging, this needs at minimum a clear error message distinguishing "your import failed locally because the package isn't installed — this is expected if it's only in your container" from a real bug, or a --no-import / --skip-import-errors fallback to AST for unimportable files.

3. Issue: `detect_main_app` / FastAPI auto-detection silently removed

The old scanner had a complete FastAPI app auto-detection system: detect_main_app, _extract_fastapi_routes, _find_included_routers, _resolve_router_import, _normalize_route_path. These detected main.py / app.py / server.py with FastAPI routes and produced LB route metadata automatically. The test files for all of this (test_scanner_load_balancer.py, 266 lines) were deleted rather than migrated. The PR description doesn't mention this removal.

If any users relied on FastAPI app auto-detection for LB routes, they silently get zero routes after upgrading. Worth an explicit call-out in the description and a deprecation notice if it's intentional.

4. Issue: Resource name suffix change is a semantic break

Integration tests explicitly update assertions from "test-gpu" to "test-gpu-fb" because LiveServerless(name="test-gpu") at import time runs the constructor which appends the flashboot suffix -fb. The old AST scanner read the literal "test-gpu" string from source; the new scanner sees the post-construction name.

Any manifest consumer, handler generator, or flash run dev server that expected resource names to match what the user typed in name= will now see names with suffixes. This should be tested across the full build → deploy → run pipeline. The integration test updates confirm it works in those tests, but the behavioral change deserves a note in the PR description so other teams consuming the manifest know to expect it.

5. Question: `_wrapped_class` class vs instance attribute

create_remote_class sets RemoteClassWrapper._wrapped_class = cls as a class-level attribute on a freshly-created local class. The scanner then checks inspect.isclass(obj) and hasattr(obj, "_wrapped_class").

For @remote(cfg)\nclass Foo: to work, obj in the scanner walk must be the RemoteClassWrapper class (not an instance). Can you confirm that @remote applied to a class returns the class object itself (not RemoteClassWrapper()), so that inspect.isclass(obj) is True? If it returns an instance, this branch is dead code and class detection falls through to the wrong path.

6. Issue: Timeout not honoured on Windows

The 30-second SIGALRM guard is skipped on Windows. A blocking module-level call (e.g., waiting on a socket, time.sleep(inf)) hangs flash build forever with no timeout or cancellation. This is documented inline but not in the PR description. Given that Windows support is in-flight, a thread-based timeout fallback or an explicit note that Windows users have no protection would be appropriate.

7. Issue: `sys.modules` sub-imports not cleaned up on timeout

When a module times out mid-exec_module, _restore_module removes the top-level module name from sys.modules via finally. But any sub-imports that the module successfully completed before timing out remain registered. On the next scan iteration a different file that imports those same sub-modules gets stale, potentially-partially-initialized entries. Clearing all newly-registered sys.modules entries (snapshot before / diff after) in the finally block would be safer.

8. Nits

config_to_varname falls back to name-matching (ep.name == getattr(resource_config, "name", None)) if id() lookup fails. Two endpoints with the same name in the same module would silently match the wrong variable. Worth a comment or a duplicate-name guard.
_analyze_cross_calls_ast function-name matching is still string-based — generate calling another generate in an unrelated module would produce a false positive cross-call annotation. Acceptable known limitation but worth a comment.
The except typer.Exit: raise guard in build.py is necessary and correct — good catch. A brief comment explaining why it exists would help future readers.
test_runtime_scanner.py SIGALRM timeout test is skipped on Windows via a decorator — confirm the skip condition matches the actual runtime skip condition in _import_module_from_file.

Verdict: NEEDS WORK

The architecture is correct and the bundled fixes are valuable. The blocker is item 2 — the local import requirement is a silent breaking change for any real ML project with container-only dependencies. Either a fallback path (graceful skip-to-AST on ModuleNotFoundError) or a clear error message that distinguishes "missing local dep" from "real bug" is needed before this ships.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

…ort-parse-objects-in-memory

feat: migrate from AST to import + inspect

a23877f

KAJdev requested review from deanq, jhcipar and runpod-Henrik March 11, 2026 19:01

KAJdev added 2 commits March 11, 2026 12:08

style: format integration test files

6fbc7d0

fix: unpack _scan_project_workers tuple in tests

2c47ce5

deanq requested a review from Copilot March 11, 2026 21:36

Copilot started reviewing on behalf of deanq March 11, 2026 21:36 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

KAJdev added 2 commits March 11, 2026 17:39

fix: address copilot review comments on scanner and client

cf94fb3

fix: use synthetic sys.modules packages instead of creating __init__.…

16a0263

…py files

runpod-Henrik reviewed Mar 12, 2026

View reviewed changes

jhcipar reviewed Mar 13, 2026

View reviewed changes

KAJdev mentioned this pull request Mar 13, 2026

fix(discovery): single-pass scan prevents duplicate resource discovery #271

Closed

4 tasks

KAJdev added 5 commits March 13, 2026 16:03

style: format test_runtime_scanner.py

8fab589

fix: remove unused imports in test_runtime_scanner

aa8edb5

Merge branch 'main' into zeke/ae-2438-migrate-from-ast-walking-to-imp…

a93e501

…ort-parse-objects-in-memory

Merge branch 'main' into zeke/ae-2438-migrate-from-ast-walking-to-imp…

787ef59

…ort-parse-objects-in-memory

jhcipar reviewed Mar 18, 2026

View reviewed changes

runpod-Henrik reviewed Mar 18, 2026

View reviewed changes

jhcipar approved these changes Mar 23, 2026

View reviewed changes

Merge branch 'main' into zeke/ae-2438-migrate-from-ast-walking-to-imp…

e2dbe2b

…ort-parse-objects-in-memory

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: migrate from AST to import + inspect#265

feat: migrate from AST to import + inspect#265
KAJdev wants to merge 11 commits intomainfrom
zeke/ae-2438-migrate-from-ast-walking-to-import-parse-objects-in-memory

KAJdev commented Mar 11, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Uh oh!

jhcipar Mar 13, 2026

Uh oh!

KAJdev Mar 13, 2026

Uh oh!

jhcipar Mar 18, 2026

Uh oh!

KAJdev Mar 21, 2026

Uh oh!

jhcipar left a comment

Uh oh!

runpod-Henrik left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

KAJdev commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Replace AST-based resource discovery with runtime import scanning

How it works

Other fixes bundled in

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. Architecture shift — sound motivation, correct implementation

2. Bug: partial build failure silently produces incomplete manifest

3. Issue: no timeout on module imports

4. High: ~2,346 lines of scanner tests deleted, zero unit tests added

5. Testing — what remains

Verdict

Uh oh!

jhcipar Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

KAJdev Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

jhcipar Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

KAJdev Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

jhcipar left a comment

Choose a reason for hiding this comment

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. Core approach

2. Bug: Local imports will break real projects

3. Issue: detect_main_app / FastAPI auto-detection silently removed

4. Issue: Resource name suffix change is a semantic break

5. Question: _wrapped_class class vs instance attribute

6. Issue: Timeout not honoured on Windows

7. Issue: sys.modules sub-imports not cleaned up on timeout

8. Nits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

KAJdev commented Mar 11, 2026 •

edited

Loading

3. Issue: `detect_main_app` / FastAPI auto-detection silently removed

5. Question: `_wrapped_class` class vs instance attribute

7. Issue: `sys.modules` sub-imports not cleaned up on timeout