Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,12 @@

Full release notes with details on each version: [GitHub Releases](https://github.com/safishamsi/graphify/releases)

## Unreleased
## 0.9.3 (2026-06-30)

- Feat: cross-file member-call resolution for C++ and Objective-C (#1547, #1556). A class declared in a header and defined in its `.cpp`/`.m` no longer fragments into two nodes (a decl/def merge pass collapses the sibling header/impl pair, gated to same-directory same-name so unrelated classes never merge), and a member call now resolves across files by the receiver's inferred type: C++ `Foo f; f.bar()` / `Foo::bar()` / `this->bar()` and ObjC `Foo *f = [[Foo alloc] init]; [f doThing]` / `[self render]` link to the owning class's method. Resolution is by receiver type, never bare name, with the single-definition god-node guard — an uninferable or ambiguous receiver produces no edge (high precision over recall, grounded in how compiler-free indexers like ctags/Doxygen mis-resolve by name). Also routes C++ headers to the C++ extractor and ObjC `#import` bridging headers to the ObjC extractor. Reported by @c0dezer019 and @JabberYQ. (Residual cross-file `#include` edge resolution under symlinked roots and ObjC dynamic-dispatch receivers remain follow-ups.)
- Feat: namespace-aware C# cross-file type resolution (#1562, thanks @TheFedaikin). The namespace is folded into the C# node id (so same-named types in different namespaces stay distinct), `using` directives are honored with lexical per-block scope, and qualified references (`Namespace.Type`, `using` aliases) resolve — disambiguating a bare reference to the one in-scope namespace that provides it, and refusing (no edge) when ambiguous. Advances the #1318 shadow-node umbrella for C#.
- Fix: test mocks no longer erase the real cross-file call graph (#1553, thanks @Schweinehund). When a bare callee name had 2+ definitions without unique import evidence, the god-node guard dropped the edge entirely — so a single same-named test mock wiped the real call graph (a 76-stub Pester suite erased everything). The guard now applies tie-breakers — non-test preference (a shared, segment-aware path classifier) then path proximity — and resolves only when exactly one candidate survives, else still bails. A real def plus a test mock resolves to the real def; two genuine non-test defs still bail (no fan-out).
- Fix: hyperedge member lists keyed `members` or `node_ids` are now accepted, not silently dropped (#1561, thanks @askalot-io). Normalized to the canonical `nodes` at ingest (in build_from_json and semantic_cleanup), deduped, with a warning — mirroring the existing from/to edge-endpoint aliasing.
- Feat: work-memory overlay — `graphify reflect` now projects the verdicts it distills (preferred / tentative / contested, recency-weighted) into a `.graphify_learning.json` sidecar next to graph.json, and `graphify explain` / `query` / `GRAPH_REPORT.md` / the HTML viewer surface them where you look (a `Lesson:` hint, a colored node ring). Builds on the idea in #1441/#1542 (thanks @TPAteeq), implemented as a sidecar rather than stamping graph.json: structural truth stays separate (no `learning_*` in graph.json or GraphML exports, no rebuild churn). Each verdict carries the source questions that produced it (provenance) and a content fingerprint of the cited code, so a verdict on a file that has changed since is flagged "code changed — re-verify" instead of shown as still-authoritative. Dead-ends stay query-scoped (a report section, never a node attribute). Letting verdicts influence query traversal is deliberately deferred (it needs propensity correction + exploration to avoid a self-reinforcing feedback loop).
- Feat: type-aware `this.field.method()` resolution for TypeScript/JS (#1316, thanks @guyoron1). A member call through a constructor-injected dependency (`constructor(private db: Database)` then `this.db.query()`) now produces a `calls` edge to the field type's method, resolved by the field's declared type and gated by the single-definition god-node guard (an ambiguous or untyped field produces no edge — no global name-match fan-out). EXTRACTED confidence; constructor parameter-property injection scope.
- Feat: resolve TypeScript wildcard path aliases (#1544, thanks @oleksii-tumanov). A `compilerOptions.paths` pattern like `@app/*` or `@*/interfaces` now captures the matched segment and substitutes it into each target in order, honoring tsc's longest-prefix / exact-wins specificity, baseUrl, and the first-existing-target fallback. Extends the #1531 resolver.
Expand Down
69 changes: 67 additions & 2 deletions graphify/build.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,56 @@
}


# Hyperedge member lists are canonically keyed `nodes` (see graphify/llm.py
# extraction spec), but LLM/subagent drift and externally-supplied graph.json
# sometimes emit `members` or `node_ids`. _normalize_hyperedge_members folds
# those aliases into `nodes` at ingest so every downstream consumer reads one
# canonical key — mirroring the `from`/`to` edge-endpoint tolerance below.
_HE_MEMBER_ALIASES = ("members", "node_ids")


def _normalize_hyperedge_members(he: object) -> None:
"""Canonicalize a hyperedge's member list onto the `nodes` key, in place.

If `nodes` is already a list it wins (canonical), and only stray alias keys
are dropped. Otherwise the first alias (`members`, then `node_ids`) that is a
list is moved to `nodes`, deduped preserving order, with a single stderr
WARNING naming the hyperedge id and alias used. Leftover alias keys are
always removed so downstream code never re-reads them.
"""
if not isinstance(he, dict):
return
if not isinstance(he.get("nodes"), list):
for alias in _HE_MEMBER_ALIASES:
val = he.get(alias)
if isinstance(val, list):
seen: set = set()
deduped: list = []
for ref in val:
try:
is_dupe = ref in seen
except TypeError:
is_dupe = False # unhashable ref: keep it, validator flags it
if is_dupe:
continue
try:
seen.add(ref)
except TypeError:
pass
deduped.append(ref)
he["nodes"] = deduped
print(
f"[graphify] WARNING: hyperedge "
f"'{he.get('id', '?')}' uses field '{alias}' instead of "
f"'nodes'; normalizing.",
file=sys.stderr,
)
break
# Drop any leftover alias keys regardless of which branch ran above.
for alias in _HE_MEMBER_ALIASES:
he.pop(alias, None)


def _norm_source_file(p: str | None, root: str | None = None) -> str | None:
"""Normalize path separators and relativize absolute paths.

Expand Down Expand Up @@ -279,6 +329,14 @@ def build_from_json(extraction: dict, *, directed: bool = False, root: str | Pat
if ft and ft not in {"code", "document", "paper", "image", "rationale", "concept"}:
node["file_type"] = _FILE_TYPE_SYNONYMS.get(ft, "concept")

# Canonicalize hyperedge member lists (#1561): producers sometimes key the
# member list `members`/`node_ids` instead of `nodes`. Fold aliases onto
# `nodes` here — BEFORE validation and the semantic-rekey loop below — so
# every downstream consumer (rekey, source_file relativize, to_json) reads
# one canonical key, the same way edge endpoints alias from/to at build.
for he in extraction.get("hyperedges", []) or []:
_normalize_hyperedge_members(he)

errors = validate_extraction(extraction)
# Dangling edges (stdlib/external imports) are expected - only warn about real schema errors.
real_errors = [e for e in errors if "does not match any node id" not in e]
Expand Down Expand Up @@ -489,8 +547,15 @@ def build_from_json(extraction: dict, *, directed: bool = False, root: str | Pat
".ts": "js", ".tsx": "js",
".go": "go", ".rs": "rs",
".java": "jvm", ".kt": "jvm", ".scala": "jvm", ".groovy": "jvm",
".c": "c", ".h": "c", ".cc": "cpp", ".cpp": "cpp", ".hpp": "cpp",
".cu": "cpp", ".cuh": "cpp", ".metal": "cpp",
# C, C++, and ObjC interoperate within one compilation unit: a method
# declared in a shared `.h` is defined/called from a `.c`/`.cpp`/`.m`
# sibling, so a cross-file INFERRED call from impl to its header decl
# is legitimate, not a phantom name-collision across languages. Treat
# the whole C family as one so the receiver-typed C++/ObjC member-call
# resolvers' header-targeting edges survive build (#1547/#1556).
".c": "c", ".h": "c", ".cc": "c", ".cpp": "c", ".hpp": "c",
".cxx": "c", ".hh": "c", ".hxx": "c",
".cu": "c", ".cuh": "c", ".metal": "c", ".m": "c", ".mm": "c",
".rb": "rb", ".php": "php", ".cs": "cs", ".swift": "swift", ".lua": "lua",
}
src_ext = Path(G.nodes[src].get("source_file") or "").suffix.lower()
Expand Down
2 changes: 1 addition & 1 deletion graphify/export.py
Original file line number Diff line number Diff line change
Expand Up @@ -679,7 +679,7 @@ def to_html(
if raw_hyperedges:
remapped = []
for he in raw_hyperedges:
he_members = he.get("nodes") or he.get("members") or []
he_members = he.get("nodes", [])
comm_ids, seen = [], set()
for nid in he_members:
c = node_to_community.get(nid)
Expand Down
Loading
Loading