
feat: wire up framework route extraction #46

Open: mschreib28 wants to merge 18 commits into `main` from `upstream/feat/framework-extract-wiring`

@mschreib28 (Owner)

## Problem

`FrameworkResolver.extractNodes` is declared in the type at `src/resolution/types.ts` but has zero callers anywhere in the `src/` tree (confirmed via grep). Meanwhile, every framework resolver (Django, Flask, FastAPI, Express, Laravel, Rails, Spring, Go, Rust, C#, Swift, React, Svelte) ships an `extractNodes` implementation that does real work whose output is then discarded.

As a result, the graph contains no `route`-kind nodes in practice. On a real Django codebase, 23 `urls.py` files were indexed, producing 0 route nodes and 0 edges from URL configs to view classes. `codegraph_callers(MyView)` silently misses its most important caller: the URL pattern that binds it.

Separately, the existing Django extractor's regex captures the view name in group 2, but the destructuring discards it, so even if the hook were wired up it would not link routes to views. Similar bugs exist in other frameworks.

## Fix

- Replaces the dead `extractNodes?(filePath, content): Node[]` hook with `extract?(filePath, content): { nodes, references }`.
- Runs `extract()` inside the extraction pipeline for every framework whose declared languages include the current file's language. The orchestrator detects frameworks once per index run via a filesystem-backed `ResolutionContext` and passes their names across the parse-worker boundary as strings rather than function references, because structured clone cannot serialize methods.
- Updates all 13 existing framework resolvers to emit both route nodes AND handler references.
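A minimal TypeScript sketch of the new hook and the per-language dispatch described in the Fix list. The type shapes are simplified assumptions (the real definitions in `src/resolution/types.ts` may differ), and this `getApplicableFrameworks` takes the resolver registry as an explicit argument so the sketch is self-contained, whereas the real helper is called as `getApplicableFrameworks(names, lang)`. `runFrameworkExtractors` and the two toy resolvers are hypothetical.

```typescript
// Simplified, hypothetical shapes; the real types in src/resolution/types.ts may differ.
interface GraphNode {
  id: string; // e.g. "route:<file>:<line>:<url>"
  kind: string; // e.g. "route"
  name: string;
  filePath: string;
  startLine: number;
}

interface UnresolvedRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: string; // e.g. "references"
  line: number;
}

// What a framework's extract() hook returns.
interface FrameworkExtractResult {
  nodes: GraphNode[];
  references: UnresolvedRef[];
}

// What the tree-sitter pass returns (field names from the diagram, types assumed).
interface ParseResult {
  nodes: GraphNode[];
  unresolvedReferences: UnresolvedRef[];
  errors: string[];
}

interface FrameworkResolver {
  name: string;
  languages: string[];
  // Replaces the old, never-called extractNodes?(filePath, content): Node[].
  extract?(filePath: string, content: string): FrameworkExtractResult;
}

// Only plain names cross the worker boundary (structured clone cannot copy
// methods), so the worker maps them back to resolver objects here.
function getApplicableFrameworks(
  registry: FrameworkResolver[],
  names: string[],
  language: string,
): FrameworkResolver[] {
  return registry.filter(
    (fw) => names.includes(fw.name) && fw.languages.includes(language),
  );
}

// Hypothetical merge step run after the tree-sitter pass.
function runFrameworkExtractors(
  registry: FrameworkResolver[],
  names: string[],
  language: string,
  filePath: string,
  content: string,
  parsed: ParseResult,
): ParseResult {
  for (const fw of getApplicableFrameworks(registry, names, language)) {
    if (!fw.extract) continue;
    const extra = fw.extract(filePath, content);
    parsed.nodes.push(...extra.nodes);
    parsed.unresolvedReferences.push(...extra.references);
  }
  return parsed;
}

// Usage: a toy Django resolver runs on a Python file; the toy Express one is skipped.
const routeId = "route:users/urls.py:3:users/";
const django: FrameworkResolver = {
  name: "django",
  languages: ["python"],
  extract: (filePath) => ({
    nodes: [{ id: routeId, kind: "route", name: "users/", filePath, startLine: 3 }],
    references: [
      { fromNodeId: routeId, referenceName: "UserListView", referenceKind: "references", line: 3 },
    ],
  }),
};
const express: FrameworkResolver = {
  name: "express",
  languages: ["javascript", "typescript"],
  extract: () => {
    throw new Error("must not run on a Python file");
  },
};

const merged = runFrameworkExtractors(
  [django, express], ["django", "express"], "python", "users/urls.py", "",
  { nodes: [], unresolvedReferences: [], errors: [] },
);
console.log(merged.nodes.length, merged.unresolvedReferences[0].referenceName); // 1 UserListView
```

Passing names instead of resolver objects keeps the worker message structured-clone-safe; the cost is one filter over the registry per file.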

The handler references flow through the existing resolution pipeline (name matching, import resolution, framework-specific `resolve()`) to produce route -> handler edges of kind `references`.

After this change, `codegraph_callers(UserListView)` on a Django project returns the URL pattern that binds it.

## Frameworks covered

| Framework | Shapes recognized |
|---|---|
| Django | `path()`, `re_path()`, `url()`, `include()` in `urls.py` (CBV `.as_view()`, dotted module paths) |
| Flask | `@app.route('/x', methods=[...])`, blueprint routes |
| FastAPI | `@app.get(...)`, `@router.post(...)`, all standard methods |
| Express | `app.get(...)`, `router.post(...)` with middleware chains (handler = last arg) |
| Laravel | `Route::get()`, `Route::resource()`, `Controller@action`, tuple syntax |
| Rails | `get '/x', to: 'users#index'`, hash-rocket `=>` |
| Spring | `@GetMapping`, `@PostMapping`, `@RequestMapping` on methods |
| Gin / chi / gorilla / mux | `r.GET(...)`, `router.HandleFunc(...)` |
| Axum / actix / Rocket | `.route("/x", get(handler))` |
| ASP.NET | `[HttpGet("/x")]` attributes |
| Vapor | `app.get("x", use: handler)` |
| React Router / SvelteKit | Route component nodes (interface migration only; handler refs are a follow-up) |

## Tests

- Unit tests per framework in `__tests__/frameworks.test.ts` (29 tests). For each framework, a representative route pattern is asserted to produce both the expected route node and a handler reference with the correct `fromNodeId` / `referenceName` / `referenceKind`.
- End-to-end Django test in `__tests__/frameworks-integration.test.ts`. It builds a real temporary Django project on disk (`manage.py`, `requirements.txt`, `users/views.py` with `UserListView`, `users/urls.py` with `path("users/", UserListView.as_view(), ...)`), runs a full `indexAll()`, and asserts that the route node exists, the class node exists, and an edge of kind `references` connects them.

Before this PR the integration test fails (0 route nodes); with it, the test passes. Full suite: 410 tests, 409 pass.
The one pre-existing failure is `FileWatcher > debounced sync > should trigger sync after file change`, an `fs.watch` timing flake that also reproduces on the base commit and is unrelated to this work.

## Architecture

The cleanest hook point turned out to be inside `extractFromSource` itself, because both the main-thread fallback path and the worker-thread parse path go through it. That way the worker does not need to know anything about framework objects, only a `string[]` of detected names.

```
indexAll()
 ├─ detectFrameworks() → string[]   (once per run, filesystem-backed context)
 └─ for each file: postMessage({ ..., frameworkNames })
      worker: extractFromSource(path, content, lang, frameworkNames)
        ├─ tree-sitter pass → {nodes, unresolvedReferences, errors}
        └─ for fw in getApplicableFrameworks(names, lang):
             fw.extract(path, content) → {nodes, references}
             merge into result
```

The references flow through the existing `ReferenceResolver.resolveAll`, so they are linked by the same name-matching / import-resolution / framework `resolve()` machinery that handles every other kind of reference. That means Django's view-class-targeting logic in `djangoResolver.resolve()` is reused automatically for route references, with no new resolution path to maintain.

## Scope notes

- Regex-based extraction throughout. AST-based extraction is a tracked follow-up (the plan doc explicitly scopes it out). The current regexes handle the realistic shapes covered by the test suite; known edge cases (namespaced `include(('api.urls', 'api'))`, comments containing fake `path(...)` calls, DRF `router.register` action expansion) are listed as follow-ups.
- Node IDs embed line numbers (`route:<file>:<line>:<url>`). This matches existing framework precedent, but an edit that adds a route at the top of a file will churn downstream IDs.
  Worth revisiting when incremental indexing lands.
- React Router / SvelteKit only migrate to the new interface without emitting handler refs; `<Route element={<Page/>}/>` -> `Page` wiring is a follow-up.

## Stats

| Category | Lines |
|----------|------:|
| Production code (`src/`) | +760 / -683 |
| Tests (`__tests__/`) | +370 |
| Docs (README + plan) | +1139 |

The bulk of the docs delta is `docs/plans/2026-04-24-framework-resolver-extract.md`, the implementation plan. Happy to drop that commit if you'd prefer the PR without the planning artifact.

## Commit sequence

16 commits, mostly one per framework, each revertable independently:

```
docs: add framework extract wiring plan
feat(resolution): replace extractNodes with extract() returning nodes and references
feat(resolution): add getApplicableFrameworks helper for per-language dispatch
feat(django): emit route nodes and route->view references in extract()
feat(flask,fastapi): emit route nodes and route->handler references
feat(express): emit route nodes and route->handler references
feat(laravel): emit route nodes and route->handler references
feat(rails): emit route nodes and route->handler references
feat(spring): emit route nodes and route->handler references
feat(go): emit route nodes and route->handler references
feat(rust): emit route nodes and route->handler references
feat(aspnet): emit route nodes and route->handler references
feat(swift,vapor): emit route nodes and route->handler references
chore(react,svelte): migrate resolvers to extract() interface
feat(extraction): run framework extractors after tree-sitter parse
docs: document framework route extraction
```


Copied from colbymchenry/codegraph#89

… extractors

Replaces comment characters and string-literal contents with spaces (rather
than deleting them) so source offsets stay valid for the downstream regex
match index -> line number conversion. Handles Python triple-quoted
docstrings, Ruby =begin/=end, Rust nested block comments, and the standard
//, #, /* */ forms across the supported languages.

This is consumed by framework extract() methods in a follow-up commit so
that commented-out / docstring routing examples don't surface as phantom
route nodes in the graph.
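A minimal TypeScript sketch of the offset-preserving idea, for Python-style source only. `maskPythonForRegex` is a hypothetical name, not the project's `stripCommentsForRegex`, and it covers just `#` comments and single-, double- and triple-quoted strings; the helper described above also handles Ruby =begin/=end, Rust nested block comments and C-style comments. Every masked character becomes a space except `\n`, so a match index in the masked text points at the same line in the original.

```typescript
// Hypothetical Python-only sketch of an offset-preserving masker.
// Comment text (including '#') and string-literal contents become spaces;
// quote delimiters and every '\n' are kept, so length and line breaks are unchanged.
function maskPythonForRegex(src: string): string {
  const out = src.split("");
  const blank = (from: number, to: number): void => {
    for (let k = from; k < to; k++) if (out[k] !== "\n") out[k] = " ";
  };
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "#") {
      // Line comment: blank through end of line, keep the newline itself.
      const nl = src.indexOf("\n", i);
      const stop = nl === -1 ? src.length : nl;
      blank(i, stop);
      i = stop;
    } else if (ch === '"' || ch === "'") {
      // String literal: keep the delimiters, blank only the contents.
      const delim = src.startsWith(ch.repeat(3), i) ? ch.repeat(3) : ch;
      const triple = delim.length === 3;
      let j = i + delim.length;
      while (j < src.length && !src.startsWith(delim, j)) {
        if (src[j] === "\\") j++; // skip the escaped character
        else if (!triple && src[j] === "\n") break; // unterminated one-line string
        j++;
      }
      const end = Math.min(j, src.length);
      blank(i + delim.length, end);
      i = src.startsWith(delim, end) ? end + delim.length : end;
    } else {
      i++;
    }
  }
  return out.join("");
}

// The commented-out / docstring example from the follow-up commit.
const src = [
  "# path('/admin/', AdminPanel.as_view())",
  "\"\"\" path('/users/', UserListView.as_view()) \"\"\"",
  "urlpatterns = [path('/real/', RealView.as_view())]",
].join("\n");
const masked = maskPythonForRegex(src);
console.log(masked.split("\n")[2]); // urlpatterns = [path('      ', RealView.as_view())]
```

Because offsets are unchanged, a caller can locate a call site in the masked text and slice the original at the same indices to recover a blanked literal such as the URL.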

…antom routes)

Pipes the per-language stripCommentsForRegex helper into every framework
extract() that scans raw source: django/flask/fastapi (python.ts),
express, laravel, rails, spring, go, rust, aspnet, vapor, plus
swiftui/uikit struct extraction in swift.ts.

Without this, examples like:

    # path('/admin/', AdminPanel.as_view())
    """ path('/users/', UserListView.as_view()) """
    urlpatterns = [path('/real/', RealView.as_view())]

produced 3 route nodes, 2 of them phantoms. Now only the real one is extracted.

Each framework gets a regression test in __tests__/frameworks.test.ts
asserting that line-, block-, docstring- and (where relevant)
heredoc-style commented-out routes do not surface as nodes.
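A hedged sketch of how a Django `path()` / `re_path()` extractor might combine masking with route and handler emission, keeping the view name from regex group 2. To stay self-contained it uses `naiveMask`, a deliberately simplified, hypothetical stand-in for `stripCommentsForRegex` that blanks only triple-quoted strings and `#` comments (it would also blank a `#` inside an ordinary string) and leaves ordinary string literals intact so the URL can be read straight from the match. The node ID follows the `route:<file>:<line>:<url>` format from the PR description; all shapes and function names here are assumptions.

```typescript
// Hypothetical shapes for illustration only.
interface RouteNode {
  id: string;
  kind: "route";
  name: string;
  filePath: string;
  startLine: number;
}

interface RouteRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: "references";
  line: number;
}

// Naive stand-in for the real masking helper: blanks triple-quoted strings,
// then '#' comments, keeping every '\n' so offsets and line numbers survive.
function naiveMask(src: string): string {
  const blank = (m: string) => m.replace(/[^\n]/g, " ");
  return src.replace(/("""|''')[\s\S]*?\1/g, blank).replace(/#[^\n]*/g, blank);
}

// 1-based line of a character offset, counting '\n' before it.
function lineAt(text: string, offset: number): number {
  let line = 1;
  for (let i = 0; i < offset; i++) if (text[i] === "\n") line++;
  return line;
}

// Group 1 = URL, group 2 = view expression (e.g. "UserListView.as_view").
const PATH_CALL = /\b(?:re_)?path\(\s*['"]([^'"]*)['"]\s*,\s*([\w.]+)/;

function extractDjangoRoutes(filePath: string, content: string) {
  const masked = naiveMask(content);
  const nodes: RouteNode[] = [];
  const references: RouteRef[] = [];
  const re = new RegExp(PATH_CALL.source, "g"); // fresh lastIndex per call
  let m: RegExpExecArray | null;
  while ((m = re.exec(masked)) !== null) {
    const [, url, viewExpr] = m; // keep group 2: it is the link to the view
    const view = viewExpr.replace(/\.as_view$/, ""); // CBV -> class name
    const line = lineAt(masked, m.index); // same line as in the original
    const id = `route:${filePath}:${line}:${url}`;
    nodes.push({ id, kind: "route", name: url, filePath, startLine: line });
    references.push({ fromNodeId: id, referenceName: view, referenceKind: "references", line });
  }
  return { nodes, references };
}

// The three-line example from the commit message.
const urlsPy = [
  "# path('/admin/', AdminPanel.as_view())",
  "\"\"\" path('/users/', UserListView.as_view()) \"\"\"",
  "urlpatterns = [path('/real/', RealView.as_view())]",
].join("\n");

const { nodes, references } = extractDjangoRoutes("users/urls.py", urlsPy);
console.log(nodes.map((n) => n.id)); // [ 'route:users/urls.py:3:/real/' ]
console.log(references[0].referenceName); // RealView
```

Matching on masked text keeps the regex simple: commented-out and docstring calls never reach it, and the line numbers embedded in IDs stay correct because masking preserves newlines.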