Skip to content

feat: receiver type tracking with graded confidence (4.2)#505

Merged
carlos-alm merged 6 commits intomainfrom
feat/receiver-type-tracking
Mar 19, 2026
Merged

feat: receiver type tracking with graded confidence (4.2)#505
carlos-alm merged 6 commits intomainfrom
feat/receiver-type-tracking

Conversation

@carlos-alm
Copy link
Contributor

Summary

Implements roadmap item 4.2 — Receiver Type Tracking for Method Dispatch.

  • Graded confidence: Upgrades typeMap value from plain string to {type, confidence} across all 8 language extractors. Confidence: 1.0 constructor, 0.9 annotation/parameter, 0.7 factory method
  • Factory pattern extraction: JS/TS Foo.create(), Go NewFoo()/&Struct{}/Struct{}, Python Foo()/Foo.create() — previously missing, now tracked at 0.7 confidence
  • Receiver edges use type confidence: Instead of hardcoded 0.9/0.7, receiver edge confidence now reflects the actual type source precision
  • setIfHigher priority: When the same variable has multiple type sources (e.g., const x: Base = new Derived()), highest confidence wins
  • Backwards-compatible: typeof entry === 'string' guards handle mixed old/new formats during native binary transitions

Changed files

File Change
src/extractors/*.js (8 files) typeMap.set(name, typeName)typeMap.set(name, {type, confidence}), factory patterns added
src/domain/graph/builder/stages/build-edges.js Extract .type/.confidence from entries, pass confidence to native path
src/domain/graph/builder/incremental.js Same entry format update
tests/parsers/javascript.test.js Updated assertions + new factory/confidence tests
tests/parsers/java.test.js Updated assertions
tests/integration/build.test.js Receiver edge confidence 0.9 → 1.0 (constructor)
CLAUDE.md Added rule to sync with origin/main before starting work
docs/roadmap/ROADMAP.md Marked 4.2 complete

Test plan

  • 807 tests pass (50 test files), 0 failures
  • New tests: factory method extraction, graded confidence assertions
  • Lint clean (biome)
  • Build parity test passes (WASM vs native)
  • Query-walk parity test passes

…ng (4.2)

Upgrade typeMap from Map<string, string> to Map<string, {type, confidence}>
across all 8 language extractors. Adds graded confidence per type source:
1.0 for constructors, 0.9 for type annotations/parameters, 0.7 for factory
methods. Adds factory pattern extraction (JS Foo.create(), Go NewFoo() and
composite literals, Python constructor calls). Edge builder uses type-source
confidence for receiver edges instead of hardcoded values.

Impact: 17 functions changed, 51 affected
@claude
Copy link

claude bot commented Mar 18, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Receiver Type Tracking with Graded Confidence

Completed comprehensive analysis of PR #505 implementing roadmap item 4.2 for receiver type tracking with graded confidence.

Root Cause Analysis

Problem Being Solved: Previously obj.method() calls resolved to ANY exported method in scope with no receiver type awareness, missing repository pattern calls, builder chains, and visitor dispatch. This caused imprecise call resolution and false positive edges.

Why This Approach: Upgrades from string-based typeMap to structured {type, confidence} objects with graded confidence scoring (1.0 constructor, 0.9 annotations, 0.7 factory methods). Uses setIfHigher logic so strongest evidence wins when same variable has multiple type sources. This is architecturally sound and backwards-compatible.

Risk Assessment: Low risk. Changes are additive with backwards compatibility guards (typeof entry === 'string'). All extractors consistently implement the same pattern.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Improves core graph accuracy without violating principles
  • Problem-fit: 5 — Directly addresses AI agent accuracy by reducing false call edges and improving method dispatch resolution
  • Breaking: No — Backwards-compatible with mixed old/new format handling
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. Implementation is high quality:

✅ Code Quality Analysis

  • Consistent implementation: All 8 extractors use identical setIfHigher(name, type, confidence) pattern
  • Proper confidence scoring: 1.0 constructors, 0.9 annotations, 0.7 factory methods - well-reasoned hierarchy
  • Factory pattern coverage: JS/TS Foo.create(), Go NewFoo()/&Struct{}/Struct{}, Python factory calls
  • Edge builder integration: Both native (build-edges.js:105-114) and JS fallback paths updated to extract confidence
  • Backwards compatibility: Guards handle mixed string/object formats during transitions

✅ Test Coverage

  • New test assertions: JavaScript extractor tests verify confidence values and factory patterns
  • Integration test update: Receiver edge confidence expectation changed from 0.9 → 1.0 (reflects constructor vs annotation logic)
  • Claims verified: PR states 807 tests pass with 0 failures

✅ Implementation Soundness

  • Type resolution: buildReceiverEdge() in build-edges.js:380-399 uses type confidence when available, falls back to 0.7
  • Priority system: setIfHigher ensures const x: Base = new Derived() → confidence 1.0 from constructor beats 0.9 from annotation
  • Native binary compatibility: Handles format conversion for older native binaries (lines 106-111)

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: Exceptional implementation of a high-impact feature. Directly addresses core AI agent accuracy problems with clean, consistent, backwards-compatible code. Test coverage is strong. No configuration relaxation or technical debt introduced.

This is exactly the type of precision improvement that makes codegraph more valuable for AI coding assistants - reducing false positive call edges while capturing real method dispatch relationships.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR implements roadmap item 4.2 — Receiver Type Tracking for Method Dispatch — by upgrading the typeMap value across all 8 language extractors from a plain string to a {type, confidence} object, and wiring that confidence through the edge-building pipeline so receiver edges carry graded precision (1.0 constructor, 0.9 annotation/parameter, 0.7 factory method) instead of a hardcoded value.

Key changes:

  • Graded confidence extraction: All 8 extractors (javascript, go, python, java, csharp, php, rust, plus TypeScript via the JS extractor) now emit {type, confidence} entries.
  • Factory pattern detection added for JS/TS (Foo.create()), Go (Struct{}, &Struct{}, NewFoo()), and Python (Foo(), Foo.create()), each at 0.7 confidence.
  • setIfHigher priority logic ensures the highest-confidence type source wins per variable — so const x: Base = new Derived() correctly resolves to Derived (1.0) over Base (0.9).
  • BUILTIN_GLOBALS denylist (JS) and BUILTIN_GLOBALS_PY (Python) prevent built-in globals from polluting the type map via the factory heuristic.
  • Full backward compatibility: typeof entry === 'string' guards in build-edges.js and incremental.js handle mixed old/new formats during native binary transitions.
  • Minor inconsistency: Java, C#, PHP, and Rust extractors use direct ctx.typeMap.set() rather than a setIfHigher helper. All their entries are currently at uniform 0.9 confidence so there is no practical priority conflict, but when constructor detection (1.0) is eventually added to these languages the direct set() calls will silently behave as last-write-wins.

Confidence Score: 4/5

  • Safe to merge; no logic bugs found. Three style-level observations but nothing that affects correctness.
  • The implementation is well-structured, backwards-compatible, and thoroughly tested (807 passing tests). All previously-flagged issues from the review thread have been addressed. The only findings are: a redundant console entry in BUILTIN_GLOBALS (already excluded by the lowercase guard), sequential if blocks for mutually-exclusive type checks in the Go extractor (clarity issue only), and the absence of setIfHigher in the Java/C#/PHP/Rust extractors (no practical effect today but a future-proofing gap).
  • No files require special attention. The Go short_var_declaration block and the JS BUILTIN_GLOBALS set have the minor style notes above, but neither affects runtime behaviour.

Important Files Changed

Filename Overview
src/extractors/javascript.js Adds BUILTIN_GLOBALS denylist, setIfHigher closure, and factory-method detection (0.7). Logic is correct; console in the set is redundant (already excluded by the lowercase guard).
src/extractors/go.js Adds setIfHigher and short_var_declaration handling for composite literals, address-of literals, and NewFoo() factory calls. Multi-variable fix (named-node filter on rights) is correct; three sequential if blocks on mutually-exclusive rhs.type should be else-if for clarity.
src/extractors/python.js Adds BUILTIN_GLOBALS_PY, setIfHigherPy, and assignment detection for direct constructor (1.0) and factory attribute calls (0.7). Implementation is consistent with the JS pattern.
src/domain/graph/builder/stages/build-edges.js Correctly extracts .type/.confidence from the new object format in buildCallEdgesNative, supplementReceiverEdges, resolveByMethodOrGlobal, and buildReceiverEdge. Backward-compat string guards are present throughout. Confidence fallback logic (typeConfidence ?? (typeName ? 0.9 : 0.7)) is correct.
src/domain/graph/builder/incremental.js Correctly handles the new {type, confidence} format with a typeof string guard for backward compatibility. Simple and clean update.
src/extractors/csharp.js Updated to emit {type, confidence: 0.9} objects. Uses direct ctx.typeMap.set() rather than setIfHigher; consistent with Java/PHP/Rust but diverges from JS/Go/Python pattern.
src/extractors/java.js Updated to emit {type, confidence: 0.9} for local declarations and parameters. Uses direct typeMap.set() without priority logic; no constructor (1.0) detection added (intentional scope).
tests/parsers/javascript.test.js Updated existing assertions to toEqual({type, confidence}) and added new tests for factory patterns, built-in global filtering, and confidence priority. Good coverage of new paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["AST Node (per file)"] --> B{Node type?}

    B -->|"variable_declarator / assignment"| C{Has value?}
    C -->|"new_expression / composite_literal"| D["setIfHigher(name, Type, 1.0)"]
    C -->|"Foo.create() / NewFoo()"| E{In BUILTIN_GLOBALS?}
    E -->|No| F["setIfHigher(name, Foo, 0.7)"]
    E -->|Yes| G[Skip]

    B -->|"type_annotation / typed_parameter / var_spec"| H["setIfHigher(name, Type, 0.9)"]

    D & F & H --> I["typeMap: varName → type + confidence"]

    I --> J["buildCallEdgesNative - serialize to native"]
    I --> K["resolveByMethodOrGlobal / buildReceiverEdge - JS path"]
    I --> L["resolveCallTargets - incremental path"]

    J --> M["nf.typeMap array with typeName + confidence"]
    M --> N["supplementReceiverEdges - reconstruct Map"]

    K & N & L --> O["Receiver edge: caller → TypeNode\nconfidence = entry.confidence or fallback"]
    K & N & L --> P["Qualified-name edge: caller → Type.method"]
Loading

Last reviewed commit: "fix: use nullish coa..."

Comment on lines +863 to 878
// Factory method: const x = Foo.create() → confidence 0.7
else if (valueN.type === 'call_expression') {
const fn = valueN.childForFieldName('function');
if (fn && fn.type === 'member_expression') {
const obj = fn.childForFieldName('object');
if (obj && obj.type === 'identifier') {
const objName = obj.text;
if (
objName[0] === objName[0].toUpperCase() &&
objName[0] !== objName[0].toLowerCase()
) {
setIfHigher(nameN.text, objName, 0.7);
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Factory heuristic fires on non-factory method calls

Any method call whose receiver starts with an uppercase letter is treated as a factory and types the variable. This means calls like const result = Math.random(), const data = JSON.parse('{}'), or const p = Promise.resolve(42) will all populate typeMap with {type: 'Math'/'JSON'/'Promise', confidence: 0.7}.

In practice this won't create false edges because those builtins have no AST nodes in the graph, but it does pollute typeMap and could misfire when a user-defined class shares a name with a common global. A method-name allowlist (e.g. create, build, make, of, getInstance) or a short reserved-name denylist would prevent the most obvious false positives:

const BUILTIN_GLOBALS = new Set(['Math', 'JSON', 'Promise', 'Array', 'Object', 'Date', 'Error', 'Symbol', 'Map', 'Set', 'RegExp', 'Number', 'String', 'Boolean']);

if (
  objName[0] !== objName[0].toLowerCase() &&
  !BUILTIN_GLOBALS.has(objName)
) {
  setIfHigher(nameN.text, objName, 0.7);
}

The same pattern applies to the identical check in src/extractors/python.js lines 344–345.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a BUILTIN_GLOBALS denylist (Math, JSON, Promise, Array, Object, Date, Error, etc.) that prevents the factory heuristic from matching built-in globals. Applied the same pattern to the Python extractor with BUILTIN_GLOBALS_PY. Also added a test that verifies Math.random(), JSON.parse(), and Promise.resolve() are filtered out.

Comment on lines +870 to +873
if (
objName[0] === objName[0].toUpperCase() &&
objName[0] !== objName[0].toLowerCase()
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant first condition in uppercase guard

objName[0] === objName[0].toUpperCase() is always true when objName[0] !== objName[0].toLowerCase(), so the first condition is redundant. Digits and symbols are also === toUpperCase() but are correctly excluded by the second condition. The same redundancy is present in src/extractors/python.js at lines 336 and 344.

Suggested change
if (
objName[0] === objName[0].toUpperCase() &&
objName[0] !== objName[0].toLowerCase()
) {
if (objName[0] !== objName[0].toLowerCase()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — simplified the guard to just objName[0] !== objName[0].toLowerCase(), removing the redundant toUpperCase() check. Applied consistently across both JS and Python extractors.

Comment on lines +155 to +159
});

it('ignores lowercase factory calls', () => {
const symbols = parseJS(`const result = utils.create();`);
expect(symbols.typeMap.has('result')).toBe(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Semantic reversal of annotation-vs-constructor priority

The old test was named "prefers type annotation over new expression" and asserted Base; the new test flips this — Derived (constructor 1.0) now wins over the explicit type annotation Base (0.9).

For most call-graph use cases this is fine (the runtime object IS a Derived), but it silently changes resolution for dependency-injection patterns where the declared type is intentional:

const cache: CacheInterface = new RedisCache();
cache.get(key);  // now resolves → RedisCache.get, not CacheInterface.get

If downstream users rely on annotation-first semantics (e.g. to follow interface contracts in architectural analysis), this is a breaking semantic change. It's worth a deliberate callout in the PR for any consumer of the edge data, since the old 0.9 hard-coded value was also meant for annotation-derived edges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a deliberate comment in the test explaining why constructor beats annotation: the runtime type is what matters for call resolution (const x: Base = new Derived()x.method() dispatches to Derived.method, not Base.method). This is an intentional semantic reversal from the old annotation-first behaviour.

- Add BUILTIN_GLOBALS denylist to JS/TS factory heuristic to prevent
  false positives from Math.random(), JSON.parse(), Promise.resolve() etc
- Add BUILTIN_GLOBALS_PY denylist for Python factory heuristic
- Simplify redundant uppercase guard (remove toUpperCase check, keep
  toLowerCase which is sufficient)
- Handle multi-variable Go short declarations (x, y := A{}, B{})
  instead of only typing the first variable
- Add test for built-in globals filtering
- Document deliberate constructor-over-annotation priority reversal

Impact: 4 functions changed, 7 affected
@carlos-alm
Copy link
Contributor Author

Addressed all review feedback in 3f49e6b:

  • Factory heuristic on builtins (P1): Added BUILTIN_GLOBALS denylist to JS/TS and BUILTIN_GLOBALS_PY to Python, preventing false positives from Math.random(), JSON.parse(), Promise.resolve(), etc. Added test coverage.
  • Redundant uppercase guard (P2): Simplified to single toLowerCase() check across JS and Python extractors.
  • Annotation-vs-constructor priority (P1): Added deliberate comment documenting the semantic reversal and its rationale.
  • Go multi-variable short declarations (P1, outside diff): Rewrote to iterate all left-hand identifiers in parallel with their corresponding RHS expressions, so x, y := A{}, B{} now types both variables.

All 1913 tests pass, lint clean.

@greptileai

Comment on lines +267 to +270
const rights =
right.type === 'expression_list'
? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter(Boolean)
: [right];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 rights array includes comma tokens, misaligning with lefts in multi-variable declarations

For x, y := A{}, B{}, Go's tree-sitter expression_list AST contains alternating expression and comma unnamed nodes: [A_literal, ',', B_literal]. The lefts array correctly filters to identifiers only ([x, y]), but rights uses .filter(Boolean) which retains the comma tokens.

When indexing by position:

  • idx=0: varNode=x, rhs=rights[0]=A_literal → correct, x → A
  • idx=1: varNode=y, rhs=rights[1]=',' (comma token) → no branch matches, y is never typed

The fix is to exclude unnamed/comma nodes from rights similarly to how lefts does it:

Suggested change
const rights =
right.type === 'expression_list'
? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter(Boolean)
: [right];
const rights =
right.type === 'expression_list'
? Array.from({ length: right.childCount }, (_, i) => right.child(i)).filter(
(c) => c && c.isNamed,
)
: [right];

This ensures rights = [A_literal, B_literal] so index 1 correctly maps to B_literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3dbd8c4. The rights array now filters to named nodes only (c?.isNamed) matching how lefts filters to identifiers. This ensures x, y := A{}, B{} correctly types both variables.

For `x, y := A{}, B{}`, the `rights` expression_list includes comma
tokens between expressions. Filter to named nodes only (matching how
`lefts` filters to identifiers), so index alignment is correct and
all variables get typed.

Impact: 1 functions changed, 2 affected
…asses

The previous denylist only had lowercase entries (None, True, False,
etc.) that never reach the uppercase factory guard. Add all uppercase
Python builtins: exception classes (ValueError, TypeError, etc.) and
common stdlib classes (Path, OrderedDict, Counter, Decimal, Fraction).
Replace `||` with `??` so a hypothetical confidence of 0 is not
silently replaced by the fallback value.

Impact: 1 functions changed, 5 affected
@carlos-alm
Copy link
Contributor Author

Addressed the remaining Greptile findings from round 2:

  1. Go multi-variable comma tokens (3dbd8c4): Filter rights expression_list to named nodes only, fixing index misalignment for x, y := A{}, B{} patterns.
  2. BUILTIN_GLOBALS_PY expansion (5d60587): Replaced the effectively-empty denylist with 30+ uppercase Python builtins (exception hierarchy + common stdlib classes like Path, OrderedDict, Counter).
  3. typeConfidence || to ?? (33be99d): Use nullish coalescing in build-edges.js so a zero-confidence value is not silently replaced.

All 1913 tests pass, lint clean.

@greptileai

@carlos-alm carlos-alm merged commit 4cbb1f7 into main Mar 19, 2026
13 checks passed
@carlos-alm carlos-alm deleted the feat/receiver-type-tracking branch March 19, 2026 04:06
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant