feat(graph): C parser — function defs, types, decls, calls, includes #670
Merged
shivasurya merged 1 commit into main on May 3, 2026
SafeDep Report Summary: No dependency changes detected. Nothing to scan.
Code Pathfinder Security Scan: No security issues detected.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #670 +/- ##
==========================================
+ Coverage 85.10% 85.13% +0.03%
==========================================
Files 176 177 +1
Lines 25240 25550 +310
==========================================
+ Hits 21480 21752 +272
- Misses 2956 2978 +22
- Partials 804 820 +16
…cludes

Convert tree-sitter C AST nodes into graph.Node objects. After this PR, scanning a C project produces a populated CodeGraph for every node type the C parser is responsible for: function definitions, forward declarations, structs, enums, typedefs, variable declarations, includes, and call expressions.

# parser_c.go (new)

Single file in package graph (matching parser_python.go / parser_golang.go convention). Organised into clearly-marked sections: function definitions, struct/enum/typedef, variable declarations, call expressions, includes, and a small block of internal helpers. Two functions, parseCLikeDeclaration and parseCLikeInclude, accept an isCpp flag so parser_cpp.go (PR-04) can reuse them without duplicating logic.

All AST extraction (function metadata, type strings, parameters, struct fields, call info) goes through graph/clike (PR-02). The parser is essentially a thin layer that turns clike's structured info into graph.Node objects with the right Type, Language, Metadata, and SourceLocation.

Notable design choices:

- Forward declarations: tree-sitter emits `declaration` (not `function_definition`) for prototypes such as `int add(int, int);`. parseCLikeDeclaration detects function_declarator children via isFunctionPrototype and routes them to emitFunctionDeclaration, which produces a function_definition node with Metadata["is_declaration"] = true. This means rule writers can find every callable function under a single Type, with the declaration/definition distinction surfaced as metadata.
- Type-reference vs type-declaration: `struct Buffer*` in a parameter is not a struct declaration. parseCStructSpecifier and parseCEnumSpecifier short-circuit when the body field is nil, leaving the variable_declaration / parameter to carry the type information.
- Multi-declarator declarations: `int a = 1, b = 2, c;` produces three variable_declaration nodes, one per init_declarator child reached via childrenByFieldName (which iterates field-name matches, since ChildByFieldName returns only the first).
- Constants for Node.Type, Node.Language, and Metadata keys are declared at the top of the file so consumers (rules, call-graph builder) can reference them by symbol rather than string literal.

# parser.go (modified)

Two existing cases gained a C branch:

- function_definition: C dispatch first, Python second
- call_expression: C dispatch first, Go second

Five new cases for C/C++ specific node types:

- struct_specifier (C only at top level; C++ uses class_specifier)
- enum_specifier
- type_definition
- declaration
- preproc_include

Java and Python paths are untouched; existing tests pass with zero changes.

# Tests

testdata/c/example.c covers every node type the parser handles, plus a neighbouring buffer.h with two forward declarations. parser_c_test.go runs Initialize() against the directory and asserts:

- function definitions emit correct Name, ReturnType, params, modifiers
- forward declarations carry Metadata["is_declaration"] = true
- struct fields appear in MethodArgumentsType as "name: type" pairs
- enum enumerators appear in Metadata["enumerators"] preserving values
- typedefs capture both the alias name and the underlying type
- multi-declarator declarations emit one node per variable
- function-local variables carry their enclosing function name as Scope
- system vs project includes are tagged correctly via Metadata
- call expressions are linked to their enclosing function via OutgoingEdges

Two focused unit tests cover the call-shape branches (arrow method, qualified call) and the isCpp=true path on parseCLikeDeclaration that the integration fixture cannot exercise yet. Those branches go live when parser_cpp.go (PR-04) starts dispatching from .cpp files.

Co-Authored-By: Claude <noreply@anthropic.com>
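The multi-declarator point in the commit message above can be illustrated with a small sketch. It does not use the tree-sitter bindings: `astNode`, `firstChildByField`, and `childrenByField` are hypothetical stand-ins, and the assumption is that each declarator of `int a = 1, b = 2, c;` sits under the same field name in the parent node. It only shows why a "first match" lookup is not enough and an "all matches" helper is needed.

```go
package main

import "fmt"

// astNode is a hypothetical, simplified syntax-tree node used only for this
// sketch; it is not the tree-sitter binding type.
type astNode struct {
	Field    string // field name this node occupies in its parent
	Text     string
	Children []*astNode
}

// firstChildByField mimics a lookup that returns only the first match.
func firstChildByField(n *astNode, field string) *astNode {
	for _, c := range n.Children {
		if c.Field == field {
			return c
		}
	}
	return nil
}

// childrenByField collects every child whose field name matches, in order.
func childrenByField(n *astNode, field string) []*astNode {
	var out []*astNode
	for _, c := range n.Children {
		if c.Field == field {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Assumed shape for: int a = 1, b = 2, c;
	decl := &astNode{Text: "int a = 1, b = 2, c;", Children: []*astNode{
		{Field: "type", Text: "int"},
		{Field: "declarator", Text: "a = 1"},
		{Field: "declarator", Text: "b = 2"},
		{Field: "declarator", Text: "c"},
	}}

	// A first-match lookup sees only one declarator.
	fmt.Println("first match:", firstChildByField(decl, "declarator").Text)

	// Iterating all matches yields one variable_declaration per declarator.
	for _, d := range childrenByField(decl, "declarator") {
		fmt.Println("variable_declaration:", d.Text)
	}
}
```

With the first-match lookup alone, `b` and `c` would never be emitted as nodes, which is the gap the iterating helper closes.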
shivasurya added a commit that referenced this pull request on May 3, 2026
…low (#671)

## Summary

Stacked on **#670** (C parser). Adds the C++ parser. After this PR a `.cpp` project produces a fully populated `CodeGraph` with `Language="cpp"` on every node, plus the C++-only constructs the security analysis layer needs: classes with inheritance, methods with access modifiers, namespaces (named and anonymous, including nesting), templates, and exception flow (throw/try/catch).

## Files

- **`graph/parser_cpp.go`**: new, ~890 lines. Class / namespace / template / field / throw / try / call / struct / enum / typedef. Uses `currentContext` to detect class membership and propagate namespace PackageName through the AST recursion.
- **`graph/parser_cpp_test.go`**: `TestParseCppEndToEnd` (15 sub-tests covering every gap-analysis point from the tech spec) plus four targeted unit tests for defensive paths.
- **`graph/testdata/cpp/{example.cpp,buffer.hpp}`**: single-project fixture exercising every C++ construct the parser handles.
- **`graph/parser.go`**: modified. Two existing cases gained a C++ branch (`function_definition`, `call_expression`); five new cases for C++-only node types; existing `struct_specifier` / `enum_specifier` / `type_definition` cases now dispatch to the C-flavour or C++-flavour parse function based on file type. **Java-only handlers** (block, if/while/do/for, yield, binary_expression, class_declaration, block_comment) gated by `isJavaSourceFile` to fix cross-language pollution that produced Java-tagged nodes inside C/C++ files.
- **`graph/parser_c.go`**: minor. `parseCLikeDeclaration` now routes destructor-shaped declarations to the C++ helper when in class context; `childrenByFieldName` renamed to `childDeclarators` (linter caught unused generality).
- **`graph/graph_test.go`**: two existing tests updated to reflect the now-correct reality: `.cpp` files are parsed (not ignored), and the Java `BlockStmt` leak that inflated Python-test node counts is fixed.

## Design choices

- **Separate files per language.** `parser_c.go` and `parser_cpp.go` are independent. Where the AST shape is genuinely identical (`declaration`, `preproc_include`), the existing C functions take an `isCpp` flag and emit the right Language tag. Where the shape differs (classes, namespaces, templates, throw/try, methods inside class bodies), each language has its own parse function. Where the shape is similar but C++ adds features (struct inheritance, scoped enums), the C++ flavour is a separate function so future C++ extensions don't ripple into the C path.
- **Method dispatch via `currentContext`.** `parseCppFunctionDefinition` detects class membership via `classFromContext(currentContext)` and emits `method_declaration` instead of `function_definition` when inside a class body. Same primitive disambiguates field_declaration: `int x;` becomes a data member, `void bar();` becomes a method declaration with `is_declaration=true`.
- **Access specifier as side-channel state.** tree-sitter emits `access_specifier` as a sibling preceding the fields/methods it governs. `recordAccessSpecifier` mutates the class node's `Metadata[current_access]`; subsequent handlers read it. This avoids a separate AST pre-pass while keeping the graph nodes context-independent (each field/method carries its own Modifier).
- **Constants over magic strings.** `nodeType*` and `meta*` declared next to the parser that emits them. C++-only constants live in `parser_cpp.go`; shared constants stay in `parser_c.go`.
- **Pre-existing bugs fixed.** Java-only parsers were producing Java-tagged nodes for non-Java files. Each gate is a single-line `if isJavaSourceFile {}` wrap, with no Java parser internals touched.

## Test plan

- [x] `go build ./...`: clean
- [x] `go vet ./...`: clean
- [x] `golangci-lint run ./graph/...`: 0 issues
- [x] `go test ./... -count=1`: all 25 packages pass, zero regressions
- [x] `TestParseCppEndToEnd`: 15 sub-tests covering every gap-analysis point: inheritance, namespace propagation, anonymous namespaces, access + override + virtual + pure virtual, destructors, class fields, templates, throw/try with catch types, dot/arrow/qualified calls, scoped enums, typedefs, C++ structs, header forward decls, and regression check against Java-tagged nodes leaking into C++ files
- [x] Targeted unit tests: forward class declaration, `catch (...)`, nil template list, `recordAccessSpecifier` outside class context
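The "access specifier as side-channel state" design choice above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `parser_cpp.go`: `classNode`, `bodyItem`, `member`, `recordAccess`, and `walkClassBody` are hypothetical names, the class body is modelled as a flat list of siblings, and seeding the state with `private` (the C++ default for `class`) is an assumption about behaviour the commit message does not spell out.

```go
package main

import "fmt"

// metaCurrentAccess mirrors the Metadata key named in the text; the constant
// name itself is a hypothetical choice for this sketch.
const metaCurrentAccess = "current_access"

// classNode, bodyItem and member are simplified stand-ins, not graph.Node.
type classNode struct {
	Name     string
	Metadata map[string]string
}

type bodyItem struct {
	Kind string // "access_specifier" or "field_declaration"
	Text string
}

type member struct {
	Name     string
	Modifier string
}

// recordAccess stores the latest access specifier on the class node.
// Outside a class context (nil class) it does nothing.
func recordAccess(class *classNode, specifier string) {
	if class == nil {
		return
	}
	class.Metadata[metaCurrentAccess] = specifier
}

// walkClassBody visits siblings in source order. Specifiers update the
// side-channel state; each member copies the state into its own Modifier,
// so emitted members do not depend on the class node afterwards.
func walkClassBody(class *classNode, body []bodyItem) []member {
	var members []member
	for _, item := range body {
		switch item.Kind {
		case "access_specifier":
			recordAccess(class, item.Text)
		case "field_declaration":
			members = append(members, member{
				Name:     item.Text,
				Modifier: class.Metadata[metaCurrentAccess],
			})
		}
	}
	return members
}

func main() {
	class := &classNode{
		Name:     "Buffer",
		Metadata: map[string]string{metaCurrentAccess: "private"}, // assumed default
	}
	body := []bodyItem{
		{Kind: "field_declaration", Text: "id"},
		{Kind: "access_specifier", Text: "public"},
		{Kind: "field_declaration", Text: "size"},
		{Kind: "access_specifier", Text: "protected"},
		{Kind: "field_declaration", Text: "impl"},
	}
	for _, m := range walkClassBody(class, body) {
		fmt.Printf("%s: %s\n", m.Name, m.Modifier)
	}
}
```

Because the walk is a single forward pass over siblings, no separate pre-pass over the class body is needed, which matches the trade-off described in the design choice.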
Summary
Stacked on #669 (clike shared helpers).

Converts tree-sitter C AST nodes into `graph.Node` objects. After this PR a C project can be scanned end-to-end via `Initialize()` and the resulting `CodeGraph` contains every category of node the C parser owns: function definitions, forward declarations, structs, enums, typedefs, variable declarations, includes, and call expressions.
Files
- `graph/parser_c.go`: new, ~600 lines, single file in package `graph` matching the existing `parser_python.go` / `parser_golang.go` convention. Organised into labelled sections (functions / types / decls / calls / includes / helpers) with one `Node.Type` constant per produced shape.
- `graph/parser.go`: modified. Two existing cases (`function_definition`, `call_expression`) gained a C branch in front of the existing Python / Go branches; five new cases added for `struct_specifier`, `enum_specifier`, `type_definition`, `declaration`, `preproc_include`. Java / Python paths are untouched.
- `graph/parser_c_test.go`: `TestParseCEndToEnd` parses the new `testdata/c/` fixture via `Initialize()` and validates every node category. Two focused unit tests cover the call-shape branches and the `isCpp=true` path that the integration fixture cannot exercise before PR-04 lands.
- `graph/testdata/c/{example.c,buffer.h}`: single small project exercising every node type.
Design choices
- **Forward declarations.** tree-sitter emits `declaration` (not `function_definition`) for prototypes like `int add(int, int);`. `parseCLikeDeclaration` detects a `function_declarator` child via `isFunctionPrototype` and routes to `emitFunctionDeclaration`, which produces a `function_definition` node with `Metadata["is_declaration"] = true`. Rule writers find every callable function under one `Type`; the prototype/definition split is surfaced as metadata, not as a separate node category.
- **Type reference vs type declaration.** A `struct Buffer*` parameter is not a struct declaration. `parseCStructSpecifier` and `parseCEnumSpecifier` short-circuit when the body field is nil, so these only record actual definitions.
- **Multi-declarator declarations.** `int a = 1, b = 2, c;` becomes three `variable_declaration` nodes via `childrenByFieldName`, which iterates the full child list (tree-sitter's `ChildByFieldName` returns only the first match).
- **`isCpp` flag.** `parseCLikeDeclaration` and `parseCLikeInclude` accept an `isCpp` flag so PR-04 can call them directly without duplicating logic; the only difference is the `Language` tag on produced nodes.
- **Constants over magic strings.** `nodeType*` and `meta*` constants at the top of `parser_c.go` mean rules and the call-graph builder can reference values by symbol rather than re-typing the string.
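To make the forward-declaration choice concrete, here is a minimal sketch of how a rule writer might consume it. The `Node` struct below is a simplified, hypothetical stand-in for `graph.Node` (only `Type`, `Name`, and `Metadata` are modelled), and `callableFunctions` is an illustrative helper, not part of the PR.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for graph.Node; only the
// fields needed for this illustration are included.
type Node struct {
	Type     string
	Name     string
	Metadata map[string]any
}

// callableFunctions returns the names of all function_definition nodes and,
// separately, the subset flagged as prototypes via is_declaration = true.
func callableFunctions(nodes []Node) (all, prototypes []string) {
	for _, n := range nodes {
		if n.Type != "function_definition" {
			continue
		}
		all = append(all, n.Name)
		// Reading a missing key (or a nil map) yields the zero value, so
		// definitions without the flag are simply not prototypes.
		if isDecl, ok := n.Metadata["is_declaration"].(bool); ok && isDecl {
			prototypes = append(prototypes, n.Name)
		}
	}
	return all, prototypes
}

func main() {
	nodes := []Node{
		// int add(int, int);  -> prototype, same Type as a definition
		{Type: "function_definition", Name: "add", Metadata: map[string]any{"is_declaration": true}},
		// int main(void) { ... }
		{Type: "function_definition", Name: "main", Metadata: map[string]any{}},
		// int counter;
		{Type: "variable_declaration", Name: "counter"},
	}
	all, prototypes := callableFunctions(nodes)
	fmt.Println("callable:", all)
	fmt.Println("prototypes:", prototypes)
}
```

A single `Type` check finds both `add` and `main`; the metadata flag is consulted only when the prototype/definition distinction matters.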
Test plan
- `go build ./...`: clean
- `go vet ./...`: clean
- `golangci-lint run ./graph/...`: 0 issues
- `go test ./... -count=1`: all packages pass, zero regressions in Java / Python / Go tests
- `TestParseCEndToEnd`: 9 sub-tests covering every produced node type
- `TestParseCCallExpression_MethodAndQualified`: arrow-method and C++ qualified-call shapes
- `TestParseCLikeDeclaration_IsCppFlag`: `Language="cpp"` branch