Really thoughtful writeup, thanks @abeo. The bug you spotted is the more urgent half, so let me split the two: The staleness bug is real; confirmed it. On the temporal feature: yes, the snapshot + […]
On rungs 3-4 (interval store + rename-stable identity): let's defer. The path-derived node ID is load-bearing (it has to match what semantic extraction emits, see #1033), and moving to signature/body-hash identity touches AST extraction, the semantic prompt, dedup, and cross-file resolution, so it's a big cross-cutting change. Fine for an early […] If you want to take the snapshot-archive + […]
Idea: a temporal knowledge graph — remember when facts were true, not just the current snapshot
Category: Ideas
TL;DR
graphify currently models a codebase as it is right now. I'd love to see it also model when things were true — node and edge lifetimes across commits. The graph would stop being a HEAD snapshot and become a history you can query: "when did this symbol disappear, and in which commit?", "what did the call graph look like at v1.0?", "which functions have been caller-less for the last N commits?". Git already stores the time axis; graphify currently collapses it. The payoff is large, and there's a cheap on-ramp that reuses what graphify already produces.
What prompted this
I hit a subtle staleness bug that I think is actually a temporal problem in disguise.
As I understand the current incremental model (`graphify update` driven by git hooks), an incremental pass prunes nodes for deleted files, but a symbol removed from a surviving file is never pruned: the node lingers and the graph quietly contradicts source. In my case a function was deleted from a file that lived on (a subsystem cutover), and its node + its inbound `calls` edge survived every incremental rebuild for weeks. Only a full clean rebuild (clearing `graph.json` + manifest) removed it.
The deeper issue: the graph asserts "symbol X exists" as a timeless fact, when the truth is "X existed until commit abc123." A model that records when a fact was true can't have this bug: removal is a recorded event (you close the symbol's interval), not a deletion you can forget to apply.
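To make "close the interval instead of deleting the node" concrete, here is a minimal sketch. It is not graphify's code: the `TemporalGraph` class, the `path::symbol` node-ID format, and the use of integer commit positions in a linear history (rather than SHAs) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interval:
    valid_from: int                 # position of the commit that introduced the symbol
    valid_to: Optional[int] = None  # position of the commit that removed it; None = live


@dataclass
class TemporalGraph:
    # node id -> lifetimes, oldest first (a symbol can be removed and re-added)
    nodes: dict[str, list[Interval]] = field(default_factory=dict)

    def _live_ids(self, path: str) -> set[str]:
        prefix = f"{path}::"  # hypothetical path-derived id format
        return {nid for nid, spans in self.nodes.items()
                if nid.startswith(prefix) and spans[-1].valid_to is None}

    def observe_file(self, commit: int, path: str, symbols: set[str]) -> None:
        """Reconcile one re-extracted file against what the graph believes is live."""
        seen = {f"{path}::{name}" for name in symbols}
        live = self._live_ids(path)
        for nid in live - seen:   # symbol vanished from a surviving file
            self.nodes[nid][-1].valid_to = commit
        for nid in seen - live:   # symbol is new, or came back
            self.nodes.setdefault(nid, []).append(Interval(valid_from=commit))

    def exists(self, node_id: str, as_of: int) -> bool:
        """True if the node was live at commit position `as_of`."""
        return any(s.valid_from <= as_of and (s.valid_to is None or as_of < s.valid_to)
                   for s in self.nodes.get(node_id, []))


g = TemporalGraph()
g.observe_file(1, "billing.py", {"charge", "legacy_refund"})
g.observe_file(5, "billing.py", {"charge"})  # legacy_refund deleted, file survives
```

In this sketch a deleted file is just `observe_file(commit, path, set())`, so file deletion and symbol deletion take the same code path, which is the property the current incremental model appears to lack.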
The model
The clean insight is that git is already a temporal store of the code; graphify just throws the time axis away at extraction. Two ways to keep it:
A. Interval-stamped (the "real" version). Every node and edge carries `valid_from_commit → valid_to_commit`. A live symbol has an open interval; a removed one gets its interval closed at the commit that removed it. All queries are "as of commit X" (default = HEAD). Staleness becomes structurally impossible: you never leave a dangling node, you close an interval.
B. Snapshot + diff (the pragmatic on-ramp). Retain `graph.json` keyed by commit SHA and diff on demand (or emit an append-only `node_added / node_removed / edge_added / edge_removed` event stream). Current state is a fold up to T; history is a fold up to any earlier T. This is easy to bolt on, because anyone running the git-hook integration already regenerates `graph.json` every commit, so the raw material is being produced and discarded today.
Why it's worth it: use cases
Several of these are things people currently do by hand:
- "[…] `someSymbol` disappear, and in which PR?" One query instead of an archaeology session.
- […] `get_pr_impact` already gestures at this; a temporal store makes it concrete and historical (any two points, not just HEAD).
On the API side this is mostly additive: the existing MCP tools (`get_node`, `get_neighbors`, `god_nodes`, `shortest_path`, …) would gain an optional `as_of=<commit>` parameter; new tools like `timeline(symbol)` and `diff(commitA, commitB)` fall out naturally.
The genuinely hard part (worth naming)
Node identity across time. If a function is renamed or moved between files, is that one node with a continuous interval or a death + a birth? Get it wrong and every rename reads as delete-then-add, which poisons "when did X change." This needs rename/move detection — git similarity scoring, or identity by signature/body-hash rather than a path-derived id. Everything else (storage — graphs are a few MB; query filtering) is comparatively mechanical. This is the part that separates a trustworthy temporal graph from a noisy one, and it's where the design effort really goes.
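As a rough illustration of identity by body hash, the sketch below pairs nodes that vanished with nodes that appeared between two extractions whenever their bodies hash identically, and reports those pairs as moves/renames instead of delete + add. Everything here is hypothetical: `body_hash`, `classify_changes`, and the assumption that each node's body text (without the signature line that carries its name) is available keyed by its path-derived id.

```python
import hashlib


def body_hash(source: str) -> str:
    """Hash a symbol body with whitespace collapsed, so reindenting does not change identity."""
    normalized = " ".join(source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_changes(before: dict[str, str], after: dict[str, str]):
    """before/after map a path-derived node id to that symbol's body text.

    Returns (moved, deleted, created), where moved is a list of (old_id, new_id).
    """
    removed = {nid: body_hash(src) for nid, src in before.items() if nid not in after}

    added_by_hash: dict[str, list[str]] = {}
    for nid, src in after.items():
        if nid not in before:
            added_by_hash.setdefault(body_hash(src), []).append(nid)

    moved, deleted = [], []
    for old_id, digest in sorted(removed.items()):
        candidates = added_by_hash.get(digest)
        if candidates:
            moved.append((old_id, candidates.pop(0)))  # same body, new id: one continuous node
        else:
            deleted.append(old_id)                     # a real death

    created = sorted(nid for ids in added_by_hash.values() for nid in ids)
    return moved, deleted, created


before = {"a.py::foo": "return x + 1", "a.py::gone": "pass"}
after = {"b.py::foo": "return  x + 1", "b.py::new": "return 2"}
moved, deleted, created = classify_changes(before, after)
```

Exact hashing only catches pure moves and renames. A symbol that is renamed and edited in the same commit still reads as delete-then-add here, which is where similarity scoring of the kind git uses for rename detection would have to take over.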
A pragmatic ladder (cheap → rich)
1. […] `graph.json` keyed by commit SHA + timestamp. Nearly free; it's already regenerated per commit.
2. `graphify diff <A> <B>`: added/removed nodes + edges between two points. Built on (1); covers most "what changed when" needs with no temporal store at all.
Rungs 1–2 are valuable standalone and a low-risk way to validate demand before committing to the identity-tracking work in 3–4.
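Rungs 1–2 could look roughly like the sketch below. The `graph.json` schema is an assumption (a `nodes` list of objects with an `id`, and an `edges` list with `source`, `target`, and `relation`), as is the `<archive_dir>/<sha>.json` layout; `load_snapshot` and `diff_graphs` are hypothetical helpers, not existing graphify functions. The output keys reuse the event names from option B.

```python
import json
from pathlib import Path


def load_snapshot(archive_dir: Path, sha: str) -> dict:
    """Read the archived graph for one commit (assumed layout: <archive_dir>/<sha>.json)."""
    return json.loads((archive_dir / f"{sha}.json").read_text(encoding="utf-8"))


def _node_ids(graph: dict) -> set:
    return {n["id"] for n in graph.get("nodes", [])}


def _edge_keys(graph: dict) -> set:
    return {(e["source"], e["target"], e.get("relation", ""))
            for e in graph.get("edges", [])}


def diff_graphs(old: dict, new: dict) -> dict:
    """Set difference of nodes and edges between two snapshots."""
    old_nodes, new_nodes = _node_ids(old), _node_ids(new)
    old_edges, new_edges = _edge_keys(old), _edge_keys(new)
    return {
        "node_added": sorted(new_nodes - old_nodes),
        "node_removed": sorted(old_nodes - new_nodes),
        "edge_added": sorted(new_edges - old_edges),
        "edge_removed": sorted(old_edges - new_edges),
    }


old = {"nodes": [{"id": "a"}, {"id": "b"}],
       "edges": [{"source": "a", "target": "b", "relation": "calls"}]}
new = {"nodes": [{"id": "a"}, {"id": "c"}],
       "edges": [{"source": "a", "target": "c", "relation": "calls"}]}
changes = diff_graphs(old, new)
```

Because this compares ids only, it inherits the identity problem: a renamed function shows up as one `node_removed` plus one `node_added`. That is acceptable for validating demand, and it is the limitation rungs 3–4 exist to remove.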
Question for the maintainer / community
- […] `get_pr_impact` already diffing two graph states under the hood)?
- […] `diff` rung be a welcome contribution as a first step?
Happy to help prototype the snapshot/diff rung if there's interest. Thanks for graphify, which has become a daily-driver navigation tool for a large multi-app monorepo here.