feat: receiver type tracking with graded confidence (4.2)#505
feat: receiver type tracking with graded confidence (4.2)#505carlos-alm merged 6 commits intomainfrom
Conversation
…ng (4.2)
Upgrade typeMap from Map<string, string> to Map<string, {type, confidence}>
across all 8 language extractors. Adds graded confidence per type source:
1.0 for constructors, 0.9 for type annotations/parameters, 0.7 for factory
methods. Adds factory pattern extraction (JS Foo.create(), Go NewFoo() and
composite literals, Python constructor calls). Edge builder uses type-source
confidence for receiver edges instead of hardcoded values.
Impact: 17 functions changed, 51 affected
|
Claude finished @carlos-alm's task —— View job PR Review: Receiver Type Tracking with Graded ConfidenceCompleted comprehensive analysis of PR #505 implementing roadmap item 4.2 for receiver type tracking with graded confidence. Root Cause AnalysisProblem Being Solved: Previously Why This Approach: Upgrades from string-based typeMap to structured Risk Assessment: Low risk. Changes are additive with backwards compatibility guards ( Backlog Compliance
Critical ConcernsNone identified. Implementation is high quality: ✅ Code Quality Analysis
✅ Test Coverage
✅ Implementation Soundness
Final Recommendation
This is exactly the type of precision improvement that makes codegraph more valuable for AI coding assistants - reducing false positive call edges while capturing real method dispatch relationships. |
Greptile SummaryThis PR implements roadmap item 4.2 — Receiver Type Tracking for Method Dispatch — by upgrading the Key changes:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["AST Node (per file)"] --> B{Node type?}
B -->|"variable_declarator / assignment"| C{Has value?}
C -->|"new_expression / composite_literal"| D["setIfHigher(name, Type, 1.0)"]
C -->|"Foo.create() / NewFoo()"| E{In BUILTIN_GLOBALS?}
E -->|No| F["setIfHigher(name, Foo, 0.7)"]
E -->|Yes| G[Skip]
B -->|"type_annotation / typed_parameter / var_spec"| H["setIfHigher(name, Type, 0.9)"]
D & F & H --> I["typeMap: varName → type + confidence"]
I --> J["buildCallEdgesNative - serialize to native"]
I --> K["resolveByMethodOrGlobal / buildReceiverEdge - JS path"]
I --> L["resolveCallTargets - incremental path"]
J --> M["nf.typeMap array with typeName + confidence"]
M --> N["supplementReceiverEdges - reconstruct Map"]
K & N & L --> O["Receiver edge: caller → TypeNode\nconfidence = entry.confidence or fallback"]
K & N & L --> P["Qualified-name edge: caller → Type.method"]
Last reviewed commit: "fix: use nullish coa..." |
| // Factory method: const x = Foo.create() → confidence 0.7 | ||
| else if (valueN.type === 'call_expression') { | ||
| const fn = valueN.childForFieldName('function'); | ||
| if (fn && fn.type === 'member_expression') { | ||
| const obj = fn.childForFieldName('object'); | ||
| if (obj && obj.type === 'identifier') { | ||
| const objName = obj.text; | ||
| if ( | ||
| objName[0] === objName[0].toUpperCase() && | ||
| objName[0] !== objName[0].toLowerCase() | ||
| ) { | ||
| setIfHigher(nameN.text, objName, 0.7); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Factory heuristic fires on non-factory method calls
Any method call whose receiver starts with an uppercase letter is treated as a factory and types the variable. This means calls like const result = Math.random(), const data = JSON.parse('{}'), or const p = Promise.resolve(42) will all populate typeMap with {type: 'Math'/'JSON'/'Promise', confidence: 0.7}.
In practice this won't create false edges because those builtins have no AST nodes in the graph, but it does pollute typeMap and could misfire when a user-defined class shares a name with a common global. A method-name allowlist (e.g. create, build, make, of, getInstance) or a short reserved-name denylist would prevent the most obvious false positives:
const BUILTIN_GLOBALS = new Set(['Math', 'JSON', 'Promise', 'Array', 'Object', 'Date', 'Error', 'Symbol', 'Map', 'Set', 'RegExp', 'Number', 'String', 'Boolean']);
if (
objName[0] !== objName[0].toLowerCase() &&
!BUILTIN_GLOBALS.has(objName)
) {
setIfHigher(nameN.text, objName, 0.7);
}The same pattern applies to the identical check in src/extractors/python.js lines 344–345.
There was a problem hiding this comment.
Fixed — added a BUILTIN_GLOBALS denylist (Math, JSON, Promise, Array, Object, Date, Error, etc.) that prevents the factory heuristic from matching built-in globals. Applied the same pattern to the Python extractor with BUILTIN_GLOBALS_PY. Also added a test that verifies Math.random(), JSON.parse(), and Promise.resolve() are filtered out.
src/extractors/javascript.js
Outdated
| if ( | ||
| objName[0] === objName[0].toUpperCase() && | ||
| objName[0] !== objName[0].toLowerCase() | ||
| ) { |
There was a problem hiding this comment.
Redundant first condition in uppercase guard
objName[0] === objName[0].toUpperCase() is always true when objName[0] !== objName[0].toLowerCase(), so the first condition is redundant. Digits and symbols are also === toUpperCase() but are correctly excluded by the second condition. The same redundancy is present in src/extractors/python.js at lines 336 and 344.
| if ( | |
| objName[0] === objName[0].toUpperCase() && | |
| objName[0] !== objName[0].toLowerCase() | |
| ) { | |
| if (objName[0] !== objName[0].toLowerCase()) { |
There was a problem hiding this comment.
Fixed — simplified the guard to just objName[0] !== objName[0].toLowerCase(), removing the redundant toUpperCase() check. Applied consistently across both JS and Python extractors.
| }); | ||
|
|
||
| it('ignores lowercase factory calls', () => { | ||
| const symbols = parseJS(`const result = utils.create();`); | ||
| expect(symbols.typeMap.has('result')).toBe(false); |
There was a problem hiding this comment.
Semantic reversal of annotation-vs-constructor priority
The old test was named "prefers type annotation over new expression" and asserted Base; the new test flips this — Derived (constructor 1.0) now wins over the explicit type annotation Base (0.9).
For most call-graph use cases this is fine (the runtime object IS a Derived), but it silently changes resolution for dependency-injection patterns where the declared type is intentional:
const cache: CacheInterface = new RedisCache();
cache.get(key); // now resolves → RedisCache.get, not CacheInterface.getIf downstream users rely on annotation-first semantics (e.g. to follow interface contracts in architectural analysis), this is a breaking semantic change. It's worth a deliberate callout in the PR for any consumer of the edge data, since the old 0.9 hard-coded value was also meant for annotation-derived edges.
There was a problem hiding this comment.
Fixed — added a deliberate comment in the test explaining why constructor beats annotation: the runtime type is what matters for call resolution (const x: Base = new Derived() → x.method() dispatches to Derived.method, not Base.method). This is an intentional semantic reversal from the old annotation-first behaviour.
- Add BUILTIN_GLOBALS denylist to JS/TS factory heuristic to prevent
false positives from Math.random(), JSON.parse(), Promise.resolve() etc
- Add BUILTIN_GLOBALS_PY denylist for Python factory heuristic
- Simplify redundant uppercase guard (remove toUpperCase check, keep
toLowerCase which is sufficient)
- Handle multi-variable Go short declarations (x, y := A{}, B{})
instead of only typing the first variable
- Add test for built-in globals filtering
- Document deliberate constructor-over-annotation priority reversal
Impact: 4 functions changed, 7 affected
|
Addressed all review feedback in 3f49e6b:
All 1913 tests pass, lint clean. |
| const rights = | ||
| right.type === 'expression_list' | ||
| ? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter(Boolean) | ||
| : [right]; |
There was a problem hiding this comment.
rights array includes comma tokens, misaligning with lefts in multi-variable declarations
For x, y := A{}, B{}, Go's tree-sitter expression_list AST contains alternating expression and comma unnamed nodes: [A_literal, ',', B_literal]. The lefts array correctly filters to identifiers only ([x, y]), but rights uses .filter(Boolean) which retains the comma tokens.
When indexing by position:
idx=0:varNode=x,rhs=rights[0]=A_literal→ correct,x → A✓idx=1:varNode=y,rhs=rights[1]=','(comma token) → no branch matches,yis never typed ✗
The fix is to exclude unnamed/comma nodes from rights similarly to how lefts does it:
| const rights = | |
| right.type === 'expression_list' | |
| ? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter(Boolean) | |
| : [right]; | |
| const rights = | |
| right.type === 'expression_list' | |
| ? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter( | |
| (c) => c && c.isNamed, | |
| ) | |
| : [right]; |
This ensures rights = [A_literal, B_literal] so index 1 correctly maps to B_literal.
There was a problem hiding this comment.
Fixed in 3dbd8c4. The rights array now filters to named nodes only (c?.isNamed) matching how lefts filters to identifiers. This ensures x, y := A{}, B{} correctly types both variables.
For `x, y := A{}, B{}`, the `rights` expression_list includes comma
tokens between expressions. Filter to named nodes only (matching how
`lefts` filters to identifiers), so index alignment is correct and
all variables get typed.
Impact: 1 functions changed, 2 affected
…asses The previous denylist only had lowercase entries (None, True, False, etc.) that never reach the uppercase factory guard. Add all uppercase Python builtins: exception classes (ValueError, TypeError, etc.) and common stdlib classes (Path, OrderedDict, Counter, Decimal, Fraction).
Replace `||` with `??` so a hypothetical confidence of 0 is not silently replaced by the fallback value. Impact: 1 functions changed, 5 affected
|
Addressed the remaining Greptile findings from round 2:
All 1913 tests pass, lint clean. |
Summary
Implements roadmap item 4.2 — Receiver Type Tracking for Method Dispatch.
typeMapvalue from plain string to{type, confidence}across all 8 language extractors. Confidence:1.0constructor,0.9annotation/parameter,0.7factory methodFoo.create(), GoNewFoo()/&Struct{}/Struct{}, PythonFoo()/Foo.create()— previously missing, now tracked at 0.7 confidencesetIfHigherpriority: When the same variable has multiple type sources (e.g.,const x: Base = new Derived()), highest confidence winstypeof entry === 'string'guards handle mixed old/new formats during native binary transitionsChanged files
src/extractors/*.js(8 files)typeMap.set(name, typeName)→typeMap.set(name, {type, confidence}), factory patterns addedsrc/domain/graph/builder/stages/build-edges.js.type/.confidencefrom entries, pass confidence to native pathsrc/domain/graph/builder/incremental.jstests/parsers/javascript.test.jstests/parsers/java.test.jstests/integration/build.test.jsCLAUDE.mddocs/roadmap/ROADMAP.mdTest plan