
feat(python-sdk): @c_rule / @cpp_rule decorators with language scoping #678

Merged
shivasurya merged 5 commits into main from shiva/cpp-python-sdk-decorators
May 3, 2026


@shivasurya
Owner

Summary

Adds C and C++ rule decorators that mirror the existing @go_rule / @python_rule contract. Rules can now be authored with @c_rule(...) / @cpp_rule(...); the decorators inject language="c" / language="cpp" into dataflow IR so DataflowExecutor scopes flows() rules to the right language. Pure calls() matchers stay language-agnostic — same documented contract as @go_rule.

This is the SDK glue half of the C/C++ language support track. Example security rules will land in a follow-up PR.

What's in this PR

  • codepathfinder.{c,cpp}_decorators: @c_rule / @cpp_rule with full metadata (id/name/severity/cwe/cve/owasp/tags/message), atexit auto-output for python3 rule.py smoke runs, and clear_*_rules() helpers for test isolation.
  • codepathfinder.{c,cpp}_ir: compile_c_rules() / compile_cpp_rules() emit JSON IR with "language" in both rule metadata and matcher dict.
  • python-sdk/rules/ shims preserve the existing import path style (from rules.c_decorators import c_rule keeps working).
  • dsl/loader.go decorator detector now recognises @c_rule( and @cpp_rule( so the loader can early-filter rule files like it does for @go_rule(.
  • 33 unit tests covering metadata, language injection contract (dataflow vs call_matcher), JSON serialisation, registry isolation between C and C++ registries, default-message + default-name behavior, and invalid-matcher error paths.
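The language-injection contract described above can be sketched in a few lines. This is a minimal, self-contained model and not the SDK's code: `c_rule_sketch`, `compile_c_rules_sketch`, the `_C_RULES` registry, and the IR field names (`type`, `patterns`, `sources`, `sinks`) are all hypothetical stand-ins chosen for illustration. It assumes the behaviour stated in the summary: rule metadata always carries the language, while only dataflow matchers are tagged.

```python
import json

# Hypothetical registry; the real SDK keeps separate C and C++ registries.
_C_RULES = []

def c_rule_sketch(**meta):
    """Register the decorated function as a C rule (illustrative only)."""
    def wrap(fn):
        _C_RULES.append((meta, fn))
        return fn
    return wrap

def compile_c_rules_sketch():
    """Build IR dicts; tag dataflow matchers with the language, leave call matchers alone."""
    compiled = []
    for meta, fn in _C_RULES:
        matcher = dict(fn())  # copy so the rule's own dict is not mutated
        if matcher.get("type") == "dataflow":
            matcher["language"] = "c"  # scopes flows() rules to C
        compiled.append({"rule": {**meta, "language": "c"}, "matcher": matcher})
    return compiled

@c_rule_sketch(id="C-001", severity="high", cwe="CWE-120")
def unsafe_copy():
    # Pure call matcher: stays language-agnostic.
    return {"type": "call_matcher", "patterns": ["strcpy"]}

@c_rule_sketch(id="C-002", severity="high", cwe="CWE-78")
def tainted_exec():
    # Dataflow matcher: receives language="c" at compile time.
    return {"type": "dataflow", "sources": ["getenv"], "sinks": ["system"]}

ir = compile_c_rules_sketch()
print(json.dumps(ir, indent=2))
```

A C++ variant would differ only in using its own registry and the `"cpp"` tag, which is what keeps the two registries isolated in tests.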

Validation

End-to-end smoke runs (binary + locally-installed pip install -e python-sdk):

| project | C++ functions | call sites | rules loaded | detections |
|---|---|---|---|---|
| tiny C smoke (4 unsafe calls) | | | 1 | 4 unique sites |
| sglang sgl-kernel/csrc/cpu | 337 | 9,594 | 2 | 5 unique sites |
| proxygen full tree | 11,835 | 66,010 | 4 | 35 unique sites |

  • Class-qualified function names extracted correctly on real code: HTTP1xCodec::generateBody, CAresResolver::Query::queryCallback, TestAsyncTransport::WriteEvent::newEvent, H3DatagramAsyncSocket::deliverDatagram.
  • No parse errors against 9.4 MB of production C++ (templates, nested classes, lambdas).
  • Wildcard call-matchers (e.g. *memcpy) correctly match qualified C++ calls (std::memcpy) — same contract as @go_rule.
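To illustrate why a leading wildcard catches namespace-qualified calls, here is a small sketch that uses Python's `fnmatch` glob matching as a stand-in. The engine's real matching rules are not shown in this PR, so treat `matches_call` as a hypothetical helper that only demonstrates the pattern shape.

```python
from fnmatch import fnmatchcase

def matches_call(pattern: str, callee: str) -> bool:
    """Case-sensitive glob match of a callee name against a rule pattern."""
    return fnmatchcase(callee, pattern)

# "*memcpy" matches both the bare and the namespace-qualified name,
# because "*" can absorb the "std::" prefix (or nothing at all).
qualified = matches_call("*memcpy", "std::memcpy")
bare = matches_call("*memcpy", "memcpy")

# Under glob semantics, a pattern without the wildcard does not match the qualified call.
exact_only = matches_call("memcpy", "std::memcpy")

print(qualified, bare, exact_only)
```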

Test plan

  • python -m pytest python-sdk/tests/ — 441 passed
  • go test ./sast-engine/dsl/ — pass
  • golangci-lint run ./dsl/... — 0 issues
  • gradle buildGo — clean
  • Manual scan: tiny C project (@c_rule + calls()) produces detections with correct file/line/function
  • Manual scan: sglang and proxygen — @cpp_rule rules load and detect, function FQNs accurate
  • Forbidden-files gate clean (no docs/plans/research files)

@shivasurya added the enhancement, go, and python labels May 3, 2026
@shivasurya shivasurya self-assigned this May 3, 2026
@safedep

safedep Bot commented May 3, 2026

SafeDep Report Summary


No dependency changes detected. Nothing to scan.


This report is generated by SafeDep Github App

@CLAassistant

CLAassistant commented May 3, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

github-actions Bot commented May 3, 2026

Code Pathfinder Security Scan


No security issues detected.

Metric Value
Files Scanned 14
Rules 205

Powered by Code Pathfinder

@codecov

codecov Bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.43%. Comparing base (afb74f1) to head (20cb8ce).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #678      +/-   ##
==========================================
+ Coverage   85.41%   85.43%   +0.01%     
==========================================
  Files         187      187              
  Lines       27276    27278       +2     
==========================================
+ Hits        23298    23305       +7     
+ Misses       3086     3082       -4     
+ Partials      892      891       -1     


Owner Author

shivasurya commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:35 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 3, 1:36 PM UTC: @shivasurya merged this pull request with Graphite.

@shivasurya shivasurya changed the base branch from shiva/cpp-cfg-scan to graphite-base/678 May 3, 2026 13:33
@shivasurya shivasurya changed the base branch from graphite-base/678 to main May 3, 2026 13:34
shivasurya and others added 4 commits May 3, 2026 13:35
Mirrors the @go_rule contract for C and C++ security rules. The decorators
inject language="c"/"cpp" into dataflow IR so DataflowExecutor scopes
flows() to the right language; pure calls() rules remain language-agnostic
(same Gap 1 / Gap 4 documented contract as @go_rule).

- codepathfinder.{c,cpp}_decorators with metadata dataclasses, atexit
  auto-output, registry helpers, and clear_*_rules for test isolation.
- codepathfinder.{c,cpp}_ir compilers emit JSON IR with the language tag
  in both rule metadata and matcher dict.
- python-sdk/rules/ shims preserve the existing import path style.
- dsl/loader.go decorator detector now recognises @c_rule / @cpp_rule
  alongside @go_rule for early file filtering.
- Unit tests cover registration, metadata, language injection contract
  (dataflow vs call_matcher), JSON serialisation, and registry isolation
  between the C and C++ registries.

Verified end-to-end against tiny C/C++ smoke projects, sglang
(337 C++ functions, 9.6k call sites) and proxygen (11.8k C++ functions,
66k call sites): @c_rule / @cpp_rule rule files load via the DSL loader,
rules execute on parsed C/C++ functions, and detections include correct
file/line/class-qualified function names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The result-collection step in Initialize iterated each worker's local
edges and called codeGraph.AddEdge(edge.From, edge.To). The worker had
already populated edge.From.OutgoingEdges via localGraph.AddEdge, and
node pointers are shared across the local and global graphs, so the
collector's second AddEdge appended a fresh Edge struct onto the same
OutgoingEdges slice — every entry ended up duplicated.

Why only C/C++ rules were affected: Go's buildParentMap and the Python
path both collapse duplicates through map insertion, whereas the C/C++
call-graph builders (PR-07/08) iterate fnNode.OutgoingEdges linearly and
emit one CallSiteInternal per edge. The duplicated edges therefore
surfaced as 2× detections on every @c_rule / @cpp_rule run.

The fix transfers Edge structs from localGraph.Edges to codeGraph.Edges
without re-attaching them to OutgoingEdges. Two regression tests guard
the contract:

- TestInitialize_NoDuplicateOutgoingEdges: a function with two distinct
  calls must end up with exactly two outgoing edges.
- TestInitialize_PreservesDistinctSameLineCalls: same-line distinct
  calls (printf + strdup), nested same-target calls
  (strcpy(c, strcpy(a, b))), and three-call lines must all stay visible.
  This guards against an over-eager dedup that would silently collapse
  legitimately distinct sites.
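
The duplication and the fix can be modelled outside the Go codebase. The sketch below is a Python analogue under stated assumptions: `Node`, `Edge`, `Graph`, `collect_buggy`, and `collect_fixed` are hypothetical names that mimic the described behaviour (shared node objects, an add-edge call that also appends to the source node's outgoing list), not the engine's actual types.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    outgoing: list = field(default_factory=list)

@dataclass(eq=False)
class Edge:
    src: Node
    dst: Node

class Graph:
    def __init__(self):
        self.edges = []

    def add_edge(self, src, dst):
        edge = Edge(src, dst)
        self.edges.append(edge)
        src.outgoing.append(edge)  # mutates the node shared by every graph
        return edge

def collect_buggy(local, merged):
    # Re-adding attaches a second Edge to the same shared node.
    for edge in local.edges:
        merged.add_edge(edge.src, edge.dst)

def collect_fixed(local, merged):
    # Transfer the existing Edge objects; do not touch node.outgoing again.
    merged.edges.extend(local.edges)

def build_worker_graph():
    fn, printf, strdup = Node("main"), Node("printf"), Node("strdup")
    local = Graph()
    local.add_edge(fn, printf)
    local.add_edge(fn, strdup)
    return fn, local

fn, local = build_worker_graph()
collect_buggy(local, Graph())
buggy_count = len(fn.outgoing)

fn, local = build_worker_graph()
collect_fixed(local, Graph())
fixed_count = len(fn.outgoing)

print(buggy_count, fixed_count)  # 4 2
```

A consumer that keys edges into a map would hide the duplicates, while one that walks `outgoing` linearly reports each twice, which matches the asymmetry described above.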

Verified end-to-end against /tmp/cpf-c-smoke (4 sites → 4 detections,
was 8) and proxygen full tree (6 unique findings, was 12).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolution-report previously only fed Go (and whatever Python it picked up
via callgraph.InitializeCallGraph) into the unified call graph. C/C++ call
sites — every function/method/qualified call surfaced by PR-07 and PR-08 —
were absent from the report, so the Top Unresolved tables and Failure
Breakdown showed no C/C++ data even on heavily C/C++ projects.

Mirror scan.go's buildClikeCallGraphs gating: only invoke a builder when
the parsed CodeGraph contains nodes for that language. Renamed the local
'registry' var (line 51) to 'modReg' to free the package-name namespace
for the new BuildCModuleRegistry / BuildCppModuleRegistry calls.
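
As a rough illustration of the gating rule, the following Python sketch invokes a per-language builder only when the parsed graph holds nodes for that language. The function names echo the Go helpers mentioned in these commits, but the signatures, the node dictionaries, and the builder callables are assumptions made for this example.

```python
def has_language_nodes(nodes, language):
    """True if at least one parsed node belongs to the given language."""
    return any(node["language"] == language for node in nodes)

def build_clike_call_graphs(nodes, builders):
    """Run each builder only for languages present; return which ran and the merged call sites."""
    invoked, call_sites = [], []
    for language, build in builders.items():
        if not has_language_nodes(nodes, language):
            continue  # skip builders for absent languages entirely
        invoked.append(language)
        call_sites.extend(build([n for n in nodes if n["language"] == language]))
    return invoked, call_sites

nodes = [
    {"language": "go", "name": "main.run"},
    {"language": "c", "name": "parse_header"},
]
builders = {
    "c": lambda ns: [f"c:{n['name']}" for n in ns],
    "cpp": lambda ns: [f"cpp:{n['name']}" for n in ns],
}

invoked, call_sites = build_clike_call_graphs(nodes, builders)
print(invoked, call_sites)
```

With no C++ nodes in the input, the C++ builder is never called, so projects without C/C++ code pay no extra cost and their output is unchanged.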

Verified regression-free on Python (simple_project, stdlib_chaining,
imports_test) and Go (simple_project, security_flows, type_tracking)
fixtures — bit-for-bit identical resolution-report output before/after.
On proxygen (~30k C/C++ call sites previously invisible) the report now
surfaces the expected long tail of std::move, VLOG/XLOG, gtest macros,
folly helpers, and STL container methods — all correctly classified as
external_or_unresolved (Phase 2 stdlib registry territory).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dedicated C/C++ wiring block was duplicating scan.go's
buildClikeCallGraphs / hasLanguageNodes helpers. Reuse them directly so
both commands stay aligned and the path inherits scan_test.go's coverage
for buildClikeCallGraphs (TestBuildClikeCallGraphs_NoNodes /
CFunctionsMerged / CppFunctionsMerged / MixedProject).

Behaviour-preserving — single helper call replaces 22 lines of inline
gate-and-merge logic. Also drops the now-unused registry package import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shivasurya shivasurya force-pushed the shiva/cpp-python-sdk-decorators branch from cdea585 to 20cb8ce Compare May 3, 2026 13:35
@shivasurya shivasurya merged commit 90305d9 into main May 3, 2026
6 checks passed
@shivasurya shivasurya deleted the shiva/cpp-python-sdk-decorators branch May 3, 2026 13:36
