
feat(graph): C parser — function defs, types, decls, calls, includes #670

Merged
shivasurya merged 1 commit into main from shiva/c-parser
May 3, 2026


@shivasurya
Owner

Summary

Stacked on #669 (clike shared helpers).

Converts tree-sitter C AST nodes into graph.Node objects. After this PR a
C project can be scanned end-to-end via Initialize() and the resulting
CodeGraph contains every category of node the C parser owns: function
definitions, forward declarations, structs, enums, typedefs, variable
declarations, includes, and call expressions.

Files

  • graph/parser_c.go — new, ~600 lines, single file in package graph
    matching the existing parser_python.go / parser_golang.go convention.
    Organised into labelled sections (functions / types / decls / calls /
    includes / helpers) with one Node.Type constant per produced shape.
  • graph/parser.go — modified. Two existing cases (function_definition,
    call_expression) gained a C branch in front of the existing Python /
    Go branches; five new cases added for struct_specifier,
    enum_specifier, type_definition, declaration, preproc_include.
    Java / Python paths are untouched.
  • graph/parser_c_test.go — TestParseCEndToEnd parses the new
    testdata/c/ fixture via Initialize() and validates every node
    category. Two focused unit tests cover the call-shape branches and
    the isCpp=true path that the integration fixture cannot exercise
    before PR-04 lands.
  • graph/testdata/c/{example.c,buffer.h} — single small project
    exercising every node type.

Design choices

  • Forward declarations. tree-sitter emits declaration (not
    function_definition) for prototypes like int add(int, int);.
    parseCLikeDeclaration detects a function_declarator child via
    isFunctionPrototype and routes to emitFunctionDeclaration, which
    produces a function_definition node with
    Metadata["is_declaration"] = true. Rule writers find every
    callable function under one Type; the prototype/definition split is
    surfaced as metadata, not as a separate node category.
  • Type reference vs type declaration. A struct Buffer* parameter
    is not a struct declaration. parseCStructSpecifier and
    parseCEnumSpecifier short-circuit when the body field is nil, so
    these only record actual definitions.
  • Multi-declarator support. int a = 1, b = 2, c; becomes three
    variable_declaration nodes via childrenByFieldName, which
    iterates the full child list (tree-sitter's ChildByFieldName returns
    only the first match).
  • Shared with C++ via isCpp flag. parseCLikeDeclaration and
    parseCLikeInclude accept an isCpp flag so PR-04 can call them
    directly without duplicating logic — the only difference is the
    Language tag on produced nodes.
  • Constants over magic strings. nodeType* and meta* constants
    at the top of parser_c.go mean rules and the call-graph builder can
    reference values by symbol rather than re-typing the string.
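
A minimal sketch of how a rule writer might use the forward-declaration choice above: every callable sits under one Type, and the prototype/definition split is read from metadata. The Node struct, the constant names, and the splitCallables helper are hypothetical stand-ins for this illustration, not the actual graph package API.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for graph.Node.
type Node struct {
	Type     string
	Name     string
	Metadata map[string]any
}

// Illustrative constants; the real names in parser_c.go may differ.
const (
	nodeTypeFunctionDefinition = "function_definition"
	metaIsDeclaration          = "is_declaration"
)

// isPrototype reports whether a function node carries is_declaration = true.
// A missing key or a non-bool value is treated as "not a prototype".
func isPrototype(n Node) bool {
	v, ok := n.Metadata[metaIsDeclaration].(bool)
	return ok && v
}

// splitCallables scans function_definition nodes only and separates
// definitions from forward declarations using the metadata flag.
func splitCallables(nodes []Node) (defs, protos []string) {
	for _, n := range nodes {
		if n.Type != nodeTypeFunctionDefinition {
			continue
		}
		if isPrototype(n) {
			protos = append(protos, n.Name)
		} else {
			defs = append(defs, n.Name)
		}
	}
	return defs, protos
}

func main() {
	nodes := []Node{
		{Type: nodeTypeFunctionDefinition, Name: "add", Metadata: map[string]any{metaIsDeclaration: true}},
		{Type: nodeTypeFunctionDefinition, Name: "main"},
		{Type: "variable_declaration", Name: "counter"},
	}
	defs, protos := splitCallables(nodes)
	fmt.Println("definitions:", defs)
	fmt.Println("prototypes:", protos)
}
```

With a separate node category for prototypes, the same query would need to match two Types; here one Type check plus an optional metadata filter is enough.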

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run ./graph/... — 0 issues
  • go test ./... -count=1 — all packages pass, zero regressions in
    Java / Python / Go tests
  • TestParseCEndToEnd — 9 sub-tests covering every produced node type:
    • function_definitions (name, return, params, modifiers)
    • forward_declaration_marked (Metadata["is_declaration"])
    • struct_declaration (fields populated)
    • enum_declaration (enumerators preserved with values)
    • type_definition_unsigned_long (DataType = "unsigned long")
    • type_definition_anonymous_struct
    • variable_declarations (globals + multi-declarator + function-local Scope)
    • includes_system_vs_local (Metadata["system_include"])
    • call_expressions_linked_to_caller (OutgoingEdges from function)
  • TestParseCCallExpression_MethodAndQualified — arrow-method and
    C++ qualified-call shapes
  • TestParseCLikeDeclaration_IsCppFlag — Language="cpp" branch

@shivasurya shivasurya self-assigned this May 2, 2026
@safedep

safedep Bot commented May 2, 2026

SafeDep Report Summary


No dependency changes detected. Nothing to scan.


This report is generated by SafeDep Github App

@shivasurya shivasurya added the enhancement and go labels May 2, 2026
@github-actions

github-actions Bot commented May 2, 2026

Code Pathfinder Security Scan

Pass Critical High Medium Low Info

No security issues detected.

Metric Value
Files Scanned 5
Rules 205

Powered by Code Pathfinder

@codecov

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 85.89744% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.13%. Comparing base (c6d2a46) to head (d4aae16).
⚠️ Report is 1 commits behind head on main.

Files with missing lines: sast-engine/graph/parser_c.go (patch coverage 84.77%; 26 missing and 18 partial lines) ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #670      +/-   ##
==========================================
+ Coverage   85.10%   85.13%   +0.03%     
==========================================
  Files         176      177       +1     
  Lines       25240    25550     +310     
==========================================
+ Hits        21480    21752     +272     
- Misses       2956     2978      +22     
- Partials      804      820      +16     


Owner Author

shivasurya commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:19 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 3, 1:20 PM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from shiva/cpp-clike-helpers to graphite-base/670 May 3, 2026 13:16
@shivasurya shivasurya changed the base branch from graphite-base/670 to main May 3, 2026 13:18
…cludes

Convert tree-sitter C AST nodes into graph.Node objects. After this PR,
scanning a C project produces a populated CodeGraph for every node type
the C parser is responsible for: function definitions, forward declarations,
structs, enums, typedefs, variable declarations, includes, and call
expressions.

# parser_c.go (new)

Single file in package graph (matching parser_python.go / parser_golang.go
convention). Organised into clearly-marked sections — function definitions,
struct/enum/typedef, variable declarations, call expressions, includes,
and a small block of internal helpers. Two functions —
parseCLikeDeclaration and parseCLikeInclude — accept an isCpp flag so
parser_cpp.go (PR-04) can reuse them without duplicating logic.

All AST extraction (function metadata, type strings, parameters, struct
fields, call info) goes through graph/clike (PR-02). The parser is
essentially a thin layer that turns clike's structured info into
graph.Node objects with the right Type, Language, Metadata, and
SourceLocation.

Notable design choices:
- Forward declarations: tree-sitter emits `declaration` (not
  `function_definition`) for prototypes such as `int add(int, int);`.
  parseCLikeDeclaration detects function_declarator children via
  isFunctionPrototype and routes them to emitFunctionDeclaration, which
  produces a function_definition node with Metadata["is_declaration"] =
  true. This means rule writers can find every callable function under a
  single Type, with the declaration/definition distinction surfaced as
  metadata.
- Type-reference vs type-declaration: `struct Buffer*` in a parameter is
  not a struct declaration. parseCStructSpecifier and parseCEnumSpecifier
  short-circuit when the body field is nil, leaving the variable_declaration
  / parameter to carry the type information.
- Multi-declarator declarations: `int a = 1, b = 2, c;` produces three
  variable_declaration nodes — one per init_declarator child reached via
  childrenByFieldName (which iterates field-name matches, since
  ChildByFieldName returns only the first).
- Constants for Node.Type, Node.Language, and Metadata keys are declared
  at the top of the file so consumers (rules, call-graph builder) can
  reference them by symbol rather than string literal.
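
The multi-declarator point can be sketched on a toy AST. The astNode type below is a hypothetical stand-in for a tree-sitter node, and both helpers are reimplemented here for illustration only; the sketch just contrasts "first child with this field name" against "every child with this field name".

```go
package main

import "fmt"

// astNode is a toy AST node; the real parser works on tree-sitter nodes.
type astNode struct {
	kind     string
	field    string // field name under which the parent holds this child
	text     string
	children []*astNode
}

// childByFieldName mimics a first-match lookup: it stops at the first child
// registered under the given field name.
func childByFieldName(n *astNode, field string) *astNode {
	for _, c := range n.children {
		if c.field == field {
			return c
		}
	}
	return nil
}

// childrenByFieldName walks the full child list and collects every match,
// which is what a multi-declarator declaration needs.
func childrenByFieldName(n *astNode, field string) []*astNode {
	var out []*astNode
	for _, c := range n.children {
		if c.field == field {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Toy shape for: int a = 1, b = 2, c;
	decl := &astNode{kind: "declaration", children: []*astNode{
		{kind: "primitive_type", field: "type", text: "int"},
		{kind: "init_declarator", field: "declarator", text: "a = 1"},
		{kind: "init_declarator", field: "declarator", text: "b = 2"},
		{kind: "identifier", field: "declarator", text: "c"},
	}}

	fmt.Println("first match:", childByFieldName(decl, "declarator").text)
	for _, d := range childrenByFieldName(decl, "declarator") {
		fmt.Println("declarator:", d.text)
	}
}
```

Emitting one variable_declaration node per collected declarator is then a plain loop over the returned slice.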

# parser.go (modified)

Two existing cases gained a C branch:
- function_definition: C dispatch first, Python second
- call_expression: C dispatch first, Go second

Five new cases for C/C++ specific node types:
- struct_specifier (C only at top level — C++ uses class_specifier)
- enum_specifier
- type_definition
- declaration
- preproc_include

Java and Python paths are untouched; existing tests pass with zero changes.
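
The dispatch order described above can be sketched as a switch with the C branch checked first. This is a simplified illustration, not the code in parser.go: the extension-based language lookup, the treatment of .h as C, and the returned labels are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// languageOf maps a file extension to a language tag. Treating .h as C is an
// assumption made for this sketch.
func languageOf(filename string) string {
	switch filepath.Ext(filename) {
	case ".c", ".h":
		return "c"
	case ".py":
		return "python"
	case ".go":
		return "go"
	}
	return ""
}

// dispatch returns a label for the branch that would handle nodeType in the
// given file, or "" when no branch applies.
func dispatch(nodeType, filename string) string {
	lang := languageOf(filename)
	switch nodeType {
	case "function_definition":
		if lang == "c" { // C branch first
			return "c:function"
		}
		if lang == "python" { // pre-existing branch second
			return "python:function"
		}
	case "call_expression":
		if lang == "c" {
			return "c:call"
		}
		if lang == "go" {
			return "go:call"
		}
	case "struct_specifier", "enum_specifier", "type_definition",
		"declaration", "preproc_include":
		// New cases: only the C path handles these in this sketch.
		if lang == "c" {
			return "c:" + nodeType
		}
	}
	return ""
}

func main() {
	fmt.Println(dispatch("function_definition", "example.c"))
	fmt.Println(dispatch("function_definition", "app.py"))
	fmt.Println(dispatch("call_expression", "main.go"))
	fmt.Println(dispatch("preproc_include", "buffer.h"))
}
```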

# Tests

testdata/c/example.c covers every node type the parser handles, plus a
neighbouring buffer.h with two forward declarations. parser_c_test.go
runs Initialize() against the directory and asserts:
- function definitions emit correct Name, ReturnType, params, modifiers
- forward declarations carry Metadata["is_declaration"] = true
- struct fields appear in MethodArgumentsType as "name: type" pairs
- enum enumerators appear in Metadata["enumerators"] preserving values
- typedefs capture both the alias name and the underlying type
- multi-declarator declarations emit one node per variable
- function-local variables carry their enclosing function name as Scope
- system vs project includes are tagged correctly via Metadata
- call expressions are linked to their enclosing function via OutgoingEdges

Two focused unit tests cover the call-shape branches (arrow method,
qualified call) and the isCpp=true path on parseCLikeDeclaration that
the integration fixture cannot exercise yet — those branches go live
when parser_cpp.go (PR-04) starts dispatching from .cpp files.

Co-Authored-By: Claude <noreply@anthropic.com>
@shivasurya shivasurya merged commit a7d99f7 into main May 3, 2026
6 checks passed
@shivasurya shivasurya deleted the shiva/c-parser branch May 3, 2026 13:20
shivasurya added a commit that referenced this pull request May 3, 2026
…low (#671)

## Summary

Stacked on **#670** (C parser).

Adds the C++ parser. After this PR a `.cpp` project produces a fully
populated `CodeGraph` with `Language="cpp"` on every node, plus the C++-
only constructs the security analysis layer needs: classes with
inheritance, methods with access modifiers, namespaces (named and
anonymous, including nesting), templates, and exception flow
(throw/try/catch).

## Files

- **`graph/parser_cpp.go`** — new, ~890 lines. Class / namespace /
  template / field / throw / try / call / struct / enum / typedef. Uses
  `currentContext` to detect class membership and propagate namespace
  PackageName through the AST recursion.
- **`graph/parser_cpp_test.go`** — `TestParseCppEndToEnd` (15 sub-tests
  covering every gap-analysis point from the tech spec) plus four
  targeted unit tests for defensive paths.
- **`graph/testdata/cpp/{example.cpp,buffer.hpp}`** — single-project
  fixture exercising every C++ construct the parser handles.
- **`graph/parser.go`** — modified. Two existing cases gained a C++ branch
  (`function_definition`, `call_expression`); five new cases for
  C++-only node types; existing `struct_specifier` / `enum_specifier` /
  `type_definition` cases now dispatch to the C-flavour or C++-flavour
  parse function based on file type. **Java-only handlers** (block,
  if/while/do/for, yield, binary_expression, class_declaration,
  block_comment) gated by `isJavaSourceFile` to fix cross-language
  pollution that produced Java-tagged nodes inside C/C++ files.
- **`graph/parser_c.go`** — minor: `parseCLikeDeclaration` now routes
  destructor-shaped declarations to the C++ helper when in class context;
  `childrenByFieldName` renamed to `childDeclarators` (linter caught
  unused generality).
- **`graph/graph_test.go`** — two existing tests updated to reflect the
  now-correct reality: `.cpp` files are parsed (not ignored), and the
  Java `BlockStmt` leak that inflated Python-test node counts is fixed.

## Design choices

- **Separate files per language.** `parser_c.go` and `parser_cpp.go` are
  independent. Where the AST shape is genuinely identical
  (`declaration`, `preproc_include`), the existing C functions take an
  `isCpp` flag and emit the right Language tag. Where the shape differs
  (classes, namespaces, templates, throw/try, methods inside class
  bodies), each language has its own parse function. Where the shape is
  similar but C++ adds features (struct inheritance, scoped enums), the
  C++ flavour is a separate function so future C++ extensions don't
  ripple into the C path.
- **Method dispatch via `currentContext`.** `parseCppFunctionDefinition`
  detects class membership via `classFromContext(currentContext)` and
  emits `method_declaration` instead of `function_definition` when
  inside a class body. Same primitive disambiguates field_declaration:
  `int x;` becomes a data member, `void bar();` becomes a method
  declaration with `is_declaration=true`.
- **Access specifier as side-channel state.** tree-sitter emits
  `access_specifier` as a sibling preceding the fields/methods it
  governs. `recordAccessSpecifier` mutates the class node's
  `Metadata[current_access]`; subsequent handlers read it. This avoids
  a separate AST pre-pass while keeping the graph nodes
  context-independent (each field/method carries its own Modifier).
- **Constants over magic strings.** `nodeType*` and `meta*` declared
  next to the parser that emits them. C++-only constants live in
  `parser_cpp.go`; shared constants stay in `parser_c.go`.
- **Pre-existing bugs fixed.** Java-only parsers were producing
  Java-tagged nodes for non-Java files. Each gate is a single-line
  `if isJavaSourceFile {}` wrap — no Java parser internals touched.
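
The access-specifier design can be sketched as a single pass over class-body siblings. Everything below is a hypothetical simplification: `member`, `classNode`, `walkClassBody`, and the string-valued metadata map are illustrative only, and seeding the state with `private` relies on the C++ rule that `class` members default to private.

```go
package main

import "fmt"

const metaCurrentAccess = "current_access"

// member is a toy class-body child: an access specifier, a field, or a method.
type member struct {
	Kind string // "access_specifier", "field", or "method"
	Text string // specifier keyword or member name
}

// classNode is a simplified stand-in for the class's graph node.
type classNode struct {
	Name     string
	Metadata map[string]string
}

// emitted is what each field/method node carries after the walk.
type emitted struct {
	Name     string
	Modifier string
}

// newClass seeds current_access with the C++ default for `class`.
func newClass(name string) *classNode {
	return &classNode{Name: name, Metadata: map[string]string{metaCurrentAccess: "private"}}
}

// recordAccessSpecifier updates the side-channel state on the class node.
// Outside a class context (nil class) it is a no-op.
func recordAccessSpecifier(cls *classNode, spec string) {
	if cls == nil {
		return
	}
	cls.Metadata[metaCurrentAccess] = spec
}

// walkClassBody visits siblings in order; each member copies the access level
// in force at that point, so the emitted nodes are context-independent.
func walkClassBody(cls *classNode, body []member) []emitted {
	var out []emitted
	for _, m := range body {
		if m.Kind == "access_specifier" {
			recordAccessSpecifier(cls, m.Text)
			continue
		}
		out = append(out, emitted{Name: m.Text, Modifier: cls.Metadata[metaCurrentAccess]})
	}
	return out
}

func main() {
	cls := newClass("Buffer")
	body := []member{
		{Kind: "field", Text: "size_"},
		{Kind: "access_specifier", Text: "public"},
		{Kind: "method", Text: "push"},
		{Kind: "access_specifier", Text: "protected"},
		{Kind: "method", Text: "grow"},
	}
	for _, e := range walkClassBody(cls, body) {
		fmt.Printf("%s: %s\n", e.Name, e.Modifier)
	}
}
```

Because each member copies the value at visit time, later specifier changes do not affect nodes already emitted.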

## Test plan

- [x] `go build ./...` — clean
- [x] `go vet ./...` — clean
- [x] `golangci-lint run ./graph/...` — 0 issues
- [x] `go test ./... -count=1` — all 25 packages pass, zero regressions
- [x] `TestParseCppEndToEnd` — 15 sub-tests covering every gap-analysis
      point: inheritance, namespace propagation, anonymous namespaces,
      access + override + virtual + pure virtual, destructors, class
      fields, templates, throw/try with catch types, dot/arrow/qualified
      calls, scoped enums, typedefs, C++ structs, header forward decls,
      and regression check against Java-tagged nodes leaking into C++
      files
- [x] Targeted unit tests: forward class declaration, `catch (...)`,
      nil template list, `recordAccessSpecifier` outside class context