
docs(roadmap): Phase 8.8 — language-specific analysis reference map (34 languages)#1338

Open
carlos-alm wants to merge 10 commits into main from docs/phase8-language-analysis-reference-map


@carlos-alm
Contributor

Summary

  • Adds § 8.8 — Language-Specific Analysis Reference Map to docs/roadmap/ROADMAP.md
  • Covers all 34 languages codegraph supports across 13 families (JVM, Python, JS/TS, Go, Rust, .NET, Ruby/PHP/BEAM, C-family, Swift/Dart/Zig, ML/Scientific, Scripting/HW, HCL/Terraform, multi-language frameworks)
  • For each family: state-of-the-art Jelly-equivalent tools, published P/R figures with citations, codegraph's current gap, concrete adoption candidates, benchmark suites with license annotations
  • Adds a fixture acquisition guide in the 8.8 intro: policy for when fixtures can be committed (MIT/Apache/BSD/CC-BY) vs. reference-only (GPL, unlicensed academic)
  • Cross-references § 8.6 (CI gate) to extend benchmarking to all 34 languages, not just JS/TS vs. Jelly/ACG

Notable reference tools per family

| Family | Jelly-equivalent | Approach |
|---|---|---|
| JVM | Doop, OPAL/Unimocg, Qilin | Datalog points-to, modular CHA/RTA/XTA, context-debloating |
| Python | PyCG, JARVIS, HeaderGen, PoTo, PyAnalyzer | Assignment-graph, stub integration, Andersen-style |
| JS/TS | Jelly, ACG, TAJS | Field-based points-to, approximate interpretation |
| Go | golang.org/x/tools/go/callgraph (VTA/CHA/RTA), callgraph-go | VTA type propagation |
| Rust | MIRAI, cargo-call-stack, Rudra | MIR-level analysis, trait dispatch |
| .NET | Roslyn CG, cclyzer++ port | IL-level + source-level analysis |
| Ruby/PHP | TypeProf/Shopify loupe, TChecker/Artemis | Type inference, static class hierarchy |
| C/C++ | SVF, Andersen/Steensgaard, LLVM CallGraph | Field-sensitive points-to, MLTA |
| Swift | Swift compiler SIL CG | SIL protocol dispatch |
| Haskell/OCaml | Calligraphy, Salto | HIE files, IL normalisation |
| HCL | Checkov graph runner, Pulumi Converter, TerraDS | Interpolation walking, module call resolution |

Fixture sources ready for adoption

  • PyCG micro-benchmark (Apache-2.0, 112 tests) — Python
  • JCG (BSD, annotated Java CG benchmark) — JVM
  • TerraDS (CC-BY-4.0, 279k modules) — HCL
  • Trivy fixture corpus (Apache-2.0) — HCL
  • flowR corpus (MIT, 779 points) — R
  • Pulumi converter tests (Apache-2.0) — HCL cross-module resolution

Test plan

  • Read through the new § 8.8 section in the rendered ROADMAP
  • Confirm all 34 languages appear as subsections
  • Confirm fixture acquisition policy is clear at the top of 8.8
  • Confirm § 8.6 cross-reference to 8.8 reads correctly

…ap (34 languages)

Adds section 8.8 to the Phase 8 roadmap covering every language codegraph
supports. For each of the 34 languages (grouped into 13 families including
HCL/Terraform), documents:

- State-of-the-art call-graph / points-to analysis tools (Jelly-equivalents)
- Published precision/recall figures with paper/benchmark citations
- Codegraph's current gap vs. those tools
- Concrete adoption candidates (techniques to implement)
- Benchmark suites and fixture sources with license annotations

Fixture acquisition guide added to the 8.8 intro: MIT/Apache/BSD/CC-BY
fixtures may be committed directly; GPL fixtures are reference-only; academic
suites with no license should be derived by running the tool, not copied.

Cross-references 8.6 (CI gate) to extend benchmarking beyond Jelly/ACG to
all 34 languages using the reference map.
@claude

claude Bot commented Jun 6, 2026

Claude finished @carlos-alm's task.


PR Review — §8.8 Language-Specific Analysis Reference Map

  • Gather context
  • Read changed file (ROADMAP.md additions)
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Content quality
  • Phase 5: Backlog compliance
  • Final assessment

Root Cause Analysis

Problem Being Solved: Phase 8.x sub-tasks lack a shared reference point for "what does best-in-class look like for this language?" Without it, future implementation work either re-researches the landscape ad hoc or ignores non-JS/TS languages entirely. This section establishes a canonical research anchor for all 34 supported languages.

Why This Approach: A single ROADMAP section avoids scattering the literature survey across individual issues, keeps it version-controlled alongside the features it informs, and provides cross-referencing to §8.6 (CI gate). This is the right place for it.

Risk Assessment: Purely additive documentation. Zero runtime/build risk.


Phase 2 & 3: Configuration and Test Integrity

No configuration, test, or code files touched. Nothing to flag.


Phase 4: Content Quality

The content is genuinely high-quality for a research reference document. That said, several factual concerns require attention before this becomes a lasting canonical reference.

Factual accuracy concerns

Language count inconsistency (minor)
CLAUDE.md still reads Languages supported (23) at the top. The registry actually has 36 entries covering 34 distinct languages (the two extras are ocaml-interface and fsharp-signature). The PR correctly says 34, but the discrepancy with CLAUDE.md will confuse future readers who check both. This is out of scope for this PR, but should be tracked.

Groovy / Clojure section is absent from the per-language subsections
The summary table at the end lists Groovy and Clojure with appropriate notes ("compiles almost entirely to invokedynamic; source-level analysis is the practical ceiling"). However, there are no dedicated subsection headers for Groovy or Clojure like there are for every other language family. This is an inconsistency: all 34 languages appear in the summary table but only 32 have individual subsections. The summary-table entries are good and accurate; they just need corresponding body sections.

Objective-C subsection is also absent
CUDA and Objective-C each have rows in the summary table (| Objective-C | Clang CallGraph | ... |) but neither has a dedicated subsection. CUDA is mentioned in the C/C++/CUDA/Objective-C section header, and the adoption candidates mention <<<...>>> kernel launch. But Objective-C's [receiver selector] message-send semantics (CHA via protocol/superclass graph) are only in the summary table, not in a dedicated or shared body section. The CUDA case is similarly implicit.

Reach (Gleam section) — unverified claim on dispatch precision
The Gleam subsection states: "Gleam's Hindley-Milner type system enables near-precise dispatch." This is a characterization of Reach's capability, not a stated benchmark result. The table correctly marks it as "No published P/R figures." The phrasing "near-precise dispatch" implies a quality claim without a citation, which contradicts the document's own rule ("Entries marked (unverified) could not be confirmed"). Consider softening to "Gleam's static type system provides full dispatch information at the source level" (which is a language property fact, not a tool performance claim).

.NET section references Roslyn APIs that imply compiler dependency
The adoption candidate says: "walk IMethodSymbol.OverriddenMethod… enumerate interface implementors via FindImplementationsAsync." These are Roslyn API calls that require the Roslyn compiler, not tree-sitter. The document is a research reference so naming the API is appropriate, but it should explicitly note this requires the Roslyn SDK — unlike JS/TS where TypeScript compiler API is already in the codegraph pipeline. Without that note, a future implementer may assume this is achievable from the tree-sitter AST alone.

arXiv 2601.22952 for CodeQL C# claim
The Precision/Recall cell says: "Highest F1 on OWASP Benchmark v1.2 across SAST tools (arXiv 2601.22952, 2025)." That paper covers taint-tracking / SAST vulnerability detection, not raw call-graph edge P/R. The document itself acknowledges this: "OWASP Benchmark v1.2 covers taint analysis but not raw CG edge precision/recall." The table cell implies a stronger call-graph precision claim than the source supports — it would be more accurate to say "highest F1 on taint-tracking benchmark" rather than leaving it in the Precision/Recall column for a CG tool comparison table.
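To make the distinction concrete: call-graph precision and recall are computed over edge sets, not over vulnerability findings. The following is a minimal sketch, assuming a benchmark supplies a ground-truth list of caller/callee pairs. The `Edge` shape and `scoreCallGraph` name are illustrative and are not part of codegraph or any cited benchmark.

```typescript
// Illustrative edge shape: a call from one qualified symbol to another.
interface Edge {
  caller: string;
  callee: string;
}

interface Scores {
  precision: number;
  recall: number;
  f1: number;
}

// Serialize an edge so edge sets can be compared by value.
function edgeKey(e: Edge): string {
  return `${e.caller} -> ${e.callee}`;
}

// Score a tool's emitted call-graph edges against a ground-truth edge set.
// precision = TP / |emitted|, recall = TP / |expected|, F1 = harmonic mean.
function scoreCallGraph(emitted: Edge[], expected: Edge[]): Scores {
  const emittedKeys = new Set(emitted.map(edgeKey));
  const expectedKeys = new Set(expected.map(edgeKey));
  const truePositives = Array.from(emittedKeys).filter((k) => expectedKeys.has(k)).length;
  // Convention chosen for this sketch: an empty denominator scores 0.
  const precision = emittedKeys.size === 0 ? 0 : truePositives / emittedKeys.size;
  const recall = expectedKeys.size === 0 ? 0 : truePositives / expectedKeys.size;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}
```

With 3 emitted edges, 2 of which appear in a 4-edge ground truth, this yields precision 2/3, recall 1/2, and F1 4/7. A taint-tracking F1 such as the OWASP Benchmark figure is scored over vulnerability test cases rather than call edges, which is why it does not belong in a CG P/R column.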

ArkAnalyzer listed for TypeScript in the summary table without a body entry
The summary table lists Jelly / ArkAnalyzer as the top reference for TypeScript, but the JavaScript/TypeScript subsection body does not mention ArkAnalyzer at all. Either it should be added to the body section's reference table or removed from the summary.

SWARM-JS (50 npm packages, 163K edges, EMSE 2025) — verify publication date
The benchmark is cited as EMSE 2025 for a paper about JS/TS call graph benchmarking. If this is currently in press or a preprint, it should be cited accordingly. Given the knowledge cutoff this can't be verified here, but the entry should be flagged with (unverified) if the 2025 date cannot be confirmed from a published DOI.

Structure

The section is well-structured. The fixture acquisition policy at the top is clear, practical, and directly actionable. The cross-reference to §8.6 is correct. The summary table is a genuinely useful at-a-glance view.

One readability note: the section header uses `--` (`### 8.8 -- Language-Specific...`) while the rest of the roadmap uses an em dash (U+2014). Minor, but inconsistent with the surrounding document style.


Phase 5: Backlog Compliance

  • Zero-dep: ✓ — Documentation only. No new runtime dependencies.
  • Foundation-aligned: ✓ — No cloud calls, no external toolchain requirements, no agent/generation behavior. Fully local-first documentation.
  • Problem-fit: 3/5 — Useful research infrastructure, but this section's value is realized when future implementers consult it before writing extractors. It does not itself reduce token waste or prevent hallucinated deps. It is the right enabler for work that will score 4–5.
  • Breaking: No — Purely additive. No API, CLI, DB, or MCP contract changes.
  • Tier: 1 — Zero-dep + foundation-aligned. No objection on these axes.

Critical Concerns

  1. Missing subsections for Groovy, Clojure, and Objective-C (body coverage inconsistency — 32 sections vs 34 languages in the summary table). Not a blocker, but reduces the document's completeness claim.
  2. ArkAnalyzer listed in summary table but absent from JS/TS body section. Should be reconciled.
  3. The Roslyn adoption candidate should note the Roslyn SDK dependency to avoid misleading implementers into thinking it's achievable from tree-sitter alone.
  4. OWASP Benchmark / CodeQL P/R cell conflates taint-analysis F1 with call-graph precision — worth correcting for accuracy.

None of these are blockers for merging (this is a research reference document, not a spec), but items 1 and 2 reduce self-consistency and should be fixed.


Final Recommendation

- Rating: ⭐⭐⭐☆☆ (3/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: Strong research content, well-structured, fixture acquisition policy
  is thoughtful and practical, cross-references are correct. Docked from ⭐⭐⭐⭐
  because: (a) Groovy, Clojure, and Objective-C have summary-table entries but no
  body subsections, making the "34 languages" coverage claim partially hollow;
  (b) ArkAnalyzer in the TS summary table without a body entry; (c) the CodeQL
  P/R cell overstates the source. These are accuracy issues in a document whose
  stated purpose is to be the canonical accuracy reference — they set a lower bar
  than the document's own standard. Fix the missing subsections (even brief ones)
  and reconcile the ArkAnalyzer reference before merging.

@greptile-apps
Contributor

greptile-apps Bot commented Jun 6, 2026

Greptile Summary

This PR adds § 8.8 — Language-Specific Analysis Reference Map to docs/roadmap/ROADMAP.md, a 570-line research reference covering state-of-the-art call-graph and points-to tools, precision/recall figures, codegraph gaps, adoption candidates, and fixture acquisition guidance for all 34 supported languages across 13 families.

  • Adds dedicated subsections for every language family (JVM, Python, JS/TS, Go, Rust, .NET, Ruby, PHP, BEAM, C-family, Swift/Dart/Zig, ML/Scientific, Scripting/HW, HCL/Terraform) with reference tools, benchmark suites, and concrete adoption candidates.
  • Introduces a fixture acquisition policy distinguishing which benchmark licenses allow direct commit (MIT/Apache/BSD/CC-BY) vs. reference-only use (GPL, unlicensed academic).
  • Updates the § 8.6 CI gate cross-reference to scope benchmarking to all 34 languages via the new § 8.8 reference map.

Confidence Score: 3/5

Documentation-only change; no runtime behavior affected, but contains two factual inaccuracies that would mislead engineers acting on the reference map.

The Groovy row in the Summary table tells engineers 'source-level analysis is the only tractable path,' which contradicts the dedicated Groovy subsection's explicit statement that no source-level precision improvement is achievable without JVM bytecode access. Separately, DyPyBench — a Python-specific benchmark corpus — is cited in both the Dart subsection and the Summary table as the target benchmark methodology for Dart, sending developers to Python-only tooling. Both errors are internally inconsistent with content elsewhere in the same PR and could cause misdirected engineering effort if the reference map is used as intended.

docs/roadmap/ROADMAP.md — Groovy Summary table row (line 2070) and Dart benchmark citations (lines 1833 and 2056).

Important Files Changed

| Filename | Overview |
|---|---|
| docs/roadmap/ROADMAP.md | Adds § 8.8 (570+ lines) covering 34-language reference map with two internal accuracy issues: Groovy Summary table contradicts the Groovy subsection on source-level viability, and DyPyBench (a Python-only benchmark) is cited as the Dart benchmark methodology in both the Dart subsection and Summary table. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[§ 8.8 Fixture Acquisition] --> B{License?}
    B -->|MIT / Apache-2.0 / BSD / CC-BY| C[Commit to tests/benchmarks/resolution/fixtures/lang/]
    B -->|CC-BY| D[Commit + add attribution in fixture README]
    B -->|GPL| E[Reference-only: run tool, record expected edges — do NOT copy source files]
    B -->|No explicit license / Academic| F[Reference-only: derive expected edges by running tool]
    C --> G[§ 8.6 CI Gate extended to all 34 languages]
    D --> G
    E --> G
    F --> G
    G --> H[Per-language reference tools consulted from § 8.8]
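The license branch of the flowchart can be sketched as a small classifier. This assumes licenses are recorded as SPDX-style identifiers; `fixtureAction` and `PERMISSIVE` are hypothetical names, and a real policy check would need a fuller license list.

```typescript
// Outcome of the fixture acquisition policy for one benchmark source.
type FixtureAction =
  | { kind: "commit"; readmeAttribution: boolean }
  | { kind: "reference-only"; reason: string };

// Permissive identifiers accepted for direct commit (assumed, non-exhaustive).
const PERMISSIVE = new Set(["MIT", "Apache-2.0", "BSD", "BSD-2-Clause", "BSD-3-Clause"]);

function fixtureAction(license?: string): FixtureAction {
  const id = license?.trim() ?? "";
  if (id === "") {
    // No explicit license (typical of academic suites): derive edges by running the tool.
    return { kind: "reference-only", reason: "no license: derive expected edges by running the tool" };
  }
  if (PERMISSIVE.has(id)) return { kind: "commit", readmeAttribution: false };
  // Plain CC-BY (not CC-BY-SA or CC-BY-NC): commit, plus attribution in the fixture README.
  if (/^CC-BY-\d/.test(id)) return { kind: "commit", readmeAttribution: true };
  if (/^GPL-/.test(id)) {
    return { kind: "reference-only", reason: "GPL: record expected edges, do not copy source files" };
  }
  // Anything unrecognised is treated conservatively.
  return { kind: "reference-only", reason: `unrecognised license: ${id}` };
}
```

Applied to the fixture list in the PR description, PyCG (Apache-2.0) and flowR (MIT) would be committable as-is, and TerraDS (CC-BY-4.0) committable with a README attribution.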

Reviews (12): Last reviewed commit: "docs(roadmap): add dedicated subsections..."

@carlos-alm
Contributor Author

Addressed all concerns in commit 9734376:

  1. Section header (`--` → em dash): Replaced `--` with an em dash in the §8.8 heading.

  2. Groovy and Clojure subsection coverage: Added a dedicated "Groovy and Clojure note" paragraph in the JVM body section explaining the invokedynamic precision ceiling, why source-level name matching is the practical ceiling for both, and why no dedicated benchmark exists. The summary table entries remain unchanged.

  3. ArkAnalyzer in summary table but absent from JS/TS body: Added ArkAnalyzer as a fourth row in the JavaScript/TypeScript tool table with its approach and coverage notes.

  4. Roslyn SDK dependency note: Added an explicit Note to the CHA via Roslyn adoption candidate making clear this requires the Roslyn SDK (Microsoft.CodeAnalysis.CSharp) and cannot be driven from tree-sitter output alone.

  5. CodeQL P/R cell overstates source: Changed to "Highest F1 on taint-tracking OWASP Benchmark v1.2" to make clear this is a taint-analysis result, not a raw CG precision figure.

  6. Gleam "near-precise dispatch" unsupported claim: Softened to "Gleam's static type system provides full dispatch information at the source level" in both the tool table and the codegraph gap paragraph.

  7. Stale "as of 2025" assertions: Replaced all three occurrences (.NET, Haskell, Bash benchmark sections) with "as of the time of writing".

Out of scope: CLAUDE.md language count discrepancy (says 23, registry has 34) — filed as #1342.

@carlos-alm
Contributor Author

Addressed all 3 Greptile findings in commit 9734376:

  1. Missing horizontal rule before Summary table (line 500): Added --- separator between the Terraform/HCL section and the Summary table — consistent with every other language subsection.

  2. Stale "as of 2025" assertions (line 155): Replaced all three occurrences (.NET, Haskell, Bash benchmark sections) with "as of the time of writing", removing the ambiguity about whether the 2025 verification applies given the 2026 tool version citations.

  3. Groovy and Clojure no dedicated subsections (line 37): Added a "Groovy and Clojure note" paragraph in the JVM body section explaining why source-level analysis reaches a precision ceiling for both languages (both compile almost entirely to invokedynamic bytecode), satisfying the test plan's "all 34 languages appear as subsections" criterion.

@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

Addressed the two remaining Greptile findings in commit 5c82617:

  1. TSX not acknowledged in JS/TS subsection — Added a dedicated TSX note paragraph at the end of the JavaScript / TypeScript section (after the benchmark suites line) confirming that TSX analysis is identical to TypeScript: same grammar extension, same TypeScript compiler API pipeline, same Jelly/ACG reference tools, and no additional dispatch patterns from JSX syntax. Satisfies the test plan's 'all 34 languages appear as subsections' criterion.

  2. Objective-C has no body coverage despite appearing in the section header — Added a dedicated Objective-C note paragraph at the end of the C / C++ / CUDA / Objective-C section explaining: Clang's built-in CallGraph and SVF both model ObjC method dispatch via CHA over the class/protocol hierarchy; tree-sitter-objc captures receiver/selector pairs syntactically; no standalone ObjC CG benchmark exists; Apple's objc4/Foundation serve as informal ground truth; all C/C++ toolchain references cover ObjC when compiled with Clang. Directly analogous to the JVM section's Groovy and Clojure note.
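The CHA approach attributed above to Clang and SVF can be illustrated at the source level. The sketch below assumes a receiver/selector pair has already been extracted (for example from tree-sitter-objc output) and that a class table with superclass links, protocol conformances, and directly implemented selectors is available. The `ClassInfo` shape and the function names are hypothetical; protocol inheritance and categories are ignored.

```typescript
// Hypothetical class table entry built from parsed @interface/@implementation blocks.
interface ClassInfo {
  name: string;
  superclass?: string;
  protocols: string[]; // protocols this class declares conformance to
  methods: Set<string>; // selectors implemented directly in this class
}

type Hierarchy = Map<string, ClassInfo>;

// Superclass chain of `cls`, starting with `cls` itself (assumes no cycles).
function ancestors(h: Hierarchy, cls: string): ClassInfo[] {
  const chain: ClassInfo[] = [];
  let current = h.get(cls);
  while (current) {
    chain.push(current);
    current = current.superclass ? h.get(current.superclass) : undefined;
  }
  return chain;
}

// Runtime method lookup: the first class in the chain that implements the selector.
function lookup(h: Hierarchy, cls: string, selector: string): string | undefined {
  const owner = ancestors(h, cls).find((c) => c.methods.has(selector));
  return owner ? `-[${owner.name} ${selector}]` : undefined;
}

// CHA for `[receiver selector]`: every class that could be the receiver's
// dynamic type contributes the implementation that lookup would pick.
// `staticType` is a class name, or a protocol name for `id<Protocol>` receivers.
function chaTargets(h: Hierarchy, staticType: string, selector: string, isProtocol = false): string[] {
  const targets = new Set<string>();
  h.forEach((info) => {
    const chain = ancestors(h, info.name);
    const possibleReceiver = isProtocol
      ? chain.some((c) => c.protocols.includes(staticType))
      : chain.some((c) => c.name === staticType);
    if (!possibleReceiver) return;
    const impl = lookup(h, info.name, selector);
    if (impl) targets.add(impl);
  });
  return Array.from(targets).sort();
}
```

For a receiver statically typed as `Shape *`, CHA yields both `-[Shape area]` and any subclass override such as `-[Circle area]`; an `id<Drawable>` receiver fans out to every conforming class, including those that inherit conformance from a superclass.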

@carlos-alm
Contributor Author

@greptileai

carlos-alm and others added 2 commits June 6, 2026 00:23
… §8.8 (#1338)

- Add BSD license annotation to JCG entry in JVM benchmark suites
- Add benchmark suites entry to Erlang subsection (ELP call hierarchy,
  Dialyzer OTP scalability benchmarks, Set-theoretic Types test suite)
- Add benchmark suites entry to Gleam subsection (no dedicated benchmark;
  Reach project BEAM test cases as closest ground truth)
@carlos-alm
Contributor Author

Addressed all 3 Greptile findings in commit d6533bf:

  1. JCG missing BSD license annotation (line 1540): Added (BSD) to the JCG entry — JCG (opalj/JCG, BSD). Per the fixture acquisition policy in §8.8, license information is required for commit eligibility; this omission would have left future implementers unable to determine if JCG fixtures can be committed to tests/benchmarks/resolution/fixtures/jvm/.

  2. Erlang subsection missing Benchmark suites entry (lines 1718-1722): Added **Benchmark suites:** ELP call hierarchy (WhatsApp/erlang-language-platform, Apache-2.0); Dialyzer OTP scalability benchmarks (Jansen et al.); Set-theoretic Types for Erlang test suite (321 tests, arXiv 2302.12783). after the adoption candidates — consistent with every other language subsection in §8.8.

  3. Gleam subsection missing Benchmark suites entry (lines 1731-1735): Added **Benchmark suites:** No dedicated Gleam call-graph precision/recall benchmark exists as of the time of writing. The Reach project's BEAM bytecode test cases (elixir-vibe/reach, MIT) are the closest available ground truth. — matches the pattern used in Bash and Haskell subsections.

@carlos-alm
Contributor Author

@greptileai

Review comment on docs/roadmap/ROADMAP.md (outdated)
- Adopt Psalm-style flow-sensitive receiver narrowing: at `$v->method()`, use the narrowed type of `$v` from preceding `instanceof` guards or assignment context rather than the full class hierarchy.
- Phan's two-phase design: build a global class/method index from all parsed files before resolving any call site. Replicates what codegraph's build pipeline already does for JS/TS but is not yet applied to PHP.

**Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.
Contributor


P1 The "SWC Registry PHP test cases" entry is a misattribution. The SWC Registry (swcregistry.io) is the Smart Contract Weakness Classification registry — a Solidity/EVM vulnerability catalogue. It is correctly cited in the Solidity subsection of this very document. There is no PHP benchmark corpus under that name; linking it here will send engineers looking for a PHP test corpus to an Ethereum security resource.

Suggested change:
Before: **Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025); SWC Registry PHP test cases.
After: **Benchmark suites:** TChecker evaluation corpus (CCS 2022); Artemis corpus (250 PHP web apps, OOPSLA 2025).


Contributor Author


Fixed in commit 744edf7 — removed "SWC Registry PHP test cases" from the PHP benchmark suites line. The SWC Registry (swcregistry.io) is correctly cited in the Solidity subsection only. Applied the suggested replacement text.
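The two PHP adoption candidates quoted in the diff context of this thread (Phan-style global indexing before resolution, and Psalm-style receiver narrowing) compose naturally. A simplified sketch, assuming per-file parse output has already been reduced to class/method declarations and call sites; `ParsedFile`, `buildIndex`, and `resolveCall` are hypothetical names, and inheritance is deliberately ignored.

```typescript
// Hypothetical, simplified per-file parse output.
interface CallSite {
  method: string; // e.g. "save" from `$v->save()`
  narrowedType?: string; // type of `$v` after a preceding `instanceof` guard, if any
}

interface ParsedFile {
  path: string;
  classes: { name: string; methods: string[] }[];
  calls: CallSite[];
}

type ClassIndex = Map<string, Set<string>>;

// Phase 1: index every class and its methods across all files before
// resolving anything, so file order does not affect resolution.
function buildIndex(files: ParsedFile[]): ClassIndex {
  const index: ClassIndex = new Map();
  for (const file of files) {
    for (const cls of file.classes) {
      const methods = index.get(cls.name) ?? new Set<string>();
      cls.methods.forEach((m) => methods.add(m));
      index.set(cls.name, methods);
    }
  }
  return index;
}

// Phase 2: resolve a call site against the complete index.
function resolveCall(index: ClassIndex, call: CallSite): string[] {
  if (call.narrowedType !== undefined) {
    // Narrowed receiver: only the guarded class is considered
    // (subclass overrides are ignored in this sketch).
    const methods = index.get(call.narrowedType);
    return methods?.has(call.method) ? [`${call.narrowedType}::${call.method}`] : [];
  }
  // No narrowing: every indexed class that defines the method is a candidate.
  const targets: string[] = [];
  index.forEach((methods, cls) => {
    if (methods.has(call.method)) targets.push(`${cls}::${call.method}`);
  });
  return targets.sort();
}
```

If `a.php` calls `$v->save()` twice, once after `$v instanceof User` and once unguarded, while `User` and `Order` are both declared later in `b.php`, the guarded call resolves to `User::save` alone and the unguarded one falls back to both `Order::save` and `User::save`. Neither resolves correctly unless `b.php` is indexed first, which is the point of the two-phase split.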

carlos-alm and others added 2 commits June 6, 2026 01:59
…r's own class (#1343)

* fix(edge_builder): restrict same-file this-dispatch fallback to caller's own class

When a file contains multiple unrelated classes all defining a method with the
same name, the broader same-file suffix scan emitted false-positive call edges
(e.g. this.area() in Shape.describe matched Calculator.area and Formatter.area).

The fix: when the scan finds more than one method with the matching suffix,
restrict the result to methods whose qualified name starts with the caller's
own class prefix.  A single unambiguous match is returned as-is (handles the
CHA case of one subclass override).  If multiple classes match and none is the
caller's class, return nothing rather than emitting false edges.

WASM tests are active; native tests marked todo pending next binary release.
Closes #1324
docs check acknowledged

* fix(edge_builder): apply caller-class scope to single-match suffix scan too (#1343)

The 1-match arm of the same-file suffix scan was returning any sole
method regardless of whether it belonged to the caller's class.  A file
with Caller (no area()) and Sibling (area()) would produce a false
Caller.run → Sibling.area edge — the same bug fixed for 2+ matches.

Replace the three-arm match with a unified caller-prefix filter applied
to all non-empty results.  Add fixture single-sibling.ts and a
corresponding WASM assertion to cover the single-match false-positive
path.  Initialize callEdges to [] for cleaner beforeAll failure mode.
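The unified caller-class filter described in these commit messages can be sketched as follows. This illustrates the described behaviour and is not the actual `edge_builder` code: it assumes qualified names of the form `Class.method` and that the same-file candidates are available as a flat list. `resolveThisCall` is a hypothetical name.

```typescript
// Resolve `this.<method>()` inside `callerQualifiedName` against methods
// defined in the same file. Names are assumed to look like "Class.method".
function resolveThisCall(
  callerQualifiedName: string, // e.g. "Shape.describe"
  calledMethod: string, // e.g. "area" from `this.area()`
  sameFileMethods: string[], // qualified names of every method in the file
): string[] {
  const dot = callerQualifiedName.lastIndexOf(".");
  if (dot < 0) return []; // not inside a class, so there is no `this` class to scope to
  const callerClassPrefix = callerQualifiedName.slice(0, dot + 1); // "Shape."
  const suffix = `.${calledMethod}`;
  return sameFileMethods
    // Same-file suffix scan: every method with the called name...
    .filter((name) => name.endsWith(suffix))
    // ...then the caller-class filter, applied whether one or many matched.
    // Returning [] is preferred over emitting an edge to an unrelated class.
    .filter((name) => name.startsWith(callerClassPrefix));
}
```

Calling it with `"Caller.run"`, `"area"`, and `["Caller.run", "Sibling.area"]` returns an empty list; that is the single-sibling case the follow-up commit closes, where the old 1-match arm would have emitted `Caller.run → Sibling.area`.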
- Remove SWC Registry from PHP benchmark suites (it is the Smart Contract
  Weakness Classification registry for Solidity/EVM, not a PHP test corpus)
- Fix Dart TFA speedup notation: "−49.5%" → "49.5% reduction in AOT
  compilation time" (negative speedup is mathematically undefined)
- Clarify Ruby CLBG citation: Ali et al. TSE study covers JVM-hosted languages
  (Groovy/Clojure/Scala/Kotlin), not MRI Ruby; no dedicated Ruby CG P/R
  evaluation exists as of the time of writing
carlos-alm added a commit that referenced this pull request Jun 6, 2026
…on, Dart sign, Ruby citation (#1338)

@carlos-alm
Contributor Author

Addressed 3 remaining factual issues from Greptile's review in commit 744edf7:

  1. SWC Registry PHP misattribution (Greptile P1 inline, line 1681): Removed "SWC Registry PHP test cases" from the PHP benchmark suites line. The Smart Contract Weakness Classification registry is a Solidity/EVM catalogue — correctly cited only in the Solidity subsection.

  2. Dart TFA speedup sign (Greptile, line 1784): Changed "AOT compilation speedup −49.5%" to "49.5% reduction in AOT compilation time". A speedup defined as the ratio T_old/T_new cannot be negative, since both times are positive; the intended meaning is a time reduction.

  3. Ruby CLBG benchmark citation (Greptile, line 1661): Clarified that the Ali et al. IEEE TSE study applied CLBG programs to JVM-hosted languages (Groovy, Clojure, Scala, Kotlin), not MRI Ruby. Added explicit note that no dedicated Ruby CG precision/recall evaluation exists as of the time of writing.
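For item 2, the two notations are related as follows, writing $T_{\text{old}}$ and $T_{\text{new}}$ for compilation times before and after. Assuming the 49.5% figure is a reduction in wall-clock AOT compilation time (an interpretation, not a number taken from the Dart source), the equivalent speedup ratio would be roughly 1.98×.

```math
S = \frac{T_{\text{old}}}{T_{\text{new}}}, \qquad
r = \frac{T_{\text{old}} - T_{\text{new}}}{T_{\text{old}}}
\;\Longrightarrow\;
S = \frac{1}{1 - r}, \qquad
r = 0.495 \;\Rightarrow\; S = \frac{1}{0.505} \approx 1.98
```

Because both times are positive, $S > 0$ always; a time reduction corresponds to $S > 1$, never to a negative value.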

@carlos-alm
Contributor Author

@greptileai

carlos-alm force-pushed the docs/phase8-language-analysis-reference-map branch from 0127e27 to 9d817e7 June 6, 2026 09:39
…ve-C (#1338)

Addresses review feedback: all 34 languages now have individual subsections
in §8.8. Previously Groovy and Clojure were covered only by a shared note
inside the JVM section; Objective-C was covered only by a note inside the
C/C++ section.

Each new subsection follows the standard template: reference tools, codegraph
gap, adoption candidates, and benchmark suites.
@carlos-alm
Contributor Author

Addressed all remaining Claude review concerns:

  1. Missing Groovy and Clojure subsections — Added dedicated #### Groovy and #### Clojure subsections (commit 04e606e), each with the full template: reference tools, codegraph gap, adoption candidates, benchmark suites. The JVM section's shared 'Groovy and Clojure note' is preserved for cross-reference.

  2. Missing Objective-C subsection — Converted the **Objective-C note:** paragraph in the C/C++ section into a proper #### Objective-C subsection (same commit). Includes reference tools, codegraph gap, adoption candidates, and benchmark suites.

  3. ArkAnalyzer reconciliation — Already present in the JS/TS body section at the reference tools table (line 1572 as of the previous commit series). No further change needed.

  4. Roslyn SDK dependency note — Already added in a prior commit. The .NET section explicitly states the Roslyn SDK requirement.

  5. CodeQL P/R cell — Already fixed in a prior commit: the C# CodeQL entry now correctly reads 'Highest F1 on taint-tracking OWASP Benchmark v1.2' and includes 'no standalone C# CG P/R benchmark published'.

  6. CI commitlint failure — Fixed by rewriting the 105-character commit header to 50 characters via git filter-branch + force-push. A history rewrite was needed because the offending commit was not HEAD, so `git commit --amend` alone could not reach it.

All 34 languages now have dedicated subsections in §8.8.

@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

@claude

@claude

claude Bot commented Jun 6, 2026

Claude finished @carlos-alm's task.


Re-Review — §8.8 Language-Specific Analysis Reference Map

  • Gather context
  • Read current ROADMAP.md state (§8.8 section)
  • Verify fixes from previous review rounds
  • Final assessment

Re-Review Summary

All items from the previous Claude review (3/5, "approve with conditions") have been addressed:

| Previous condition | Status |
|---|---|
| Groovy and Clojure subsections missing | ✅ Added (lines 1544, 1558) — full template: reference tools, codegraph gap, adoption candidates, benchmark suites |
| Objective-C subsection missing | ✅ Added (line 1788) — full template |
| ArkAnalyzer in summary table but absent from JS/TS body | ✅ Added to tool table at line 1600 |
| Roslyn SDK dependency note absent | ✅ Added to .NET adoption candidate (line 1667) |
| CodeQL P/R cell overstated taint-analysis as CG precision | ✅ Fixed — now reads "Highest F1 on taint-tracking OWASP Benchmark v1.2" (line 1660) |
| Gleam "near-precise dispatch" unsupported claim | ✅ Softened to "Gleam's static type system provides full dispatch information at the source level" |
| Em dash inconsistency in section header | ✅ Fixed (line 1510: ### 8.8 —) |
| SWC Registry PHP misattribution | ✅ Removed from PHP benchmark suites (line 1709) |
| Dart TFA speedup sign error | ✅ Fixed — "49.5% reduction in AOT compilation time" (line 1824) |
| Ruby CLBG misattribution | ✅ Fixed with accurate caveat (line 1689) |
| JCG missing BSD license annotation | ✅ Added (line 1540) |
| Erlang benchmark suites missing | ✅ Added (line 1748) |
| Gleam benchmark suites missing | ✅ Added (line 1763) |
| "as of 2025" stale assertions | ✅ Replaced with "as of the time of writing" |

One remaining open issue

DyPyBench misattribution for Dart (lines 1833 and 2056) — not yet fixed.

Greptile's P1 finding from the most recent review cycle: arXiv 2403.00539 is "DyPyBench: A Benchmark of Executable Python Software" — a Python-only corpus (50 Python projects, 681k lines of Python code, built on the DynaPyt Python analysis framework). The document's Dart subsection describes it as "portable to Dart via Dart Observatory/VM coverage" but no published work ports DyPyBench to Dart.

Current state at line 1833:

**Benchmark suites:** DyPyBench methodology (arXiv 2403.00539) — executable benchmark comparing static vs dynamic CG — portable to Dart via Dart Observatory/VM coverage.

And at line 2056 in the summary table:

| Dart | ... | DyPyBench methodology (arXiv 2403.00539) |

The correct fix (consistent with how Bash and Haskell are handled — both acknowledge "no public benchmark exists") would be to either:

  • Replace with the actual Dart-specific reference: the Aarhus DLS 2016 paper (cited in the Dart body section as "Heinze, Møller, Strocco") and the Dart VM TFA's Flutter app evaluation
  • Or simply state: "No dedicated Dart CG P/R benchmark exists. Heinze et al. DLS 2016 is the closest reference."

Revised Rating

⭐⭐⭐⭐☆ (4/5) — APPROVE WITH ONE FIX

All four conditions from the previous review are resolved. The section now has full subsection coverage for all 34 languages, every concern from the previous Claude review is addressed, and the content quality is genuinely strong. The single remaining blocker is the DyPyBench misattribution, which is the same class of error (a benchmark built for one language cited for another) that was already corrected for PHP. It should be fixed before merge.
