feat(python-sdk): @c_rule / @cpp_rule decorators with language scoping #678
Merged
shivasurya merged 5 commits into main on May 3, 2026
SafeDep Report Summary
No dependency changes detected. Nothing to scan.
Code Pathfinder Security Scan
No security issues detected.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff            @@
##             main    #678     +/-   ##
=========================================
+ Coverage   85.41%  85.43%  +0.01%
=========================================
  Files         187     187
  Lines       27276   27278      +2
=========================================
+ Hits        23298   23305      +7
+ Misses       3086    3082      -4
+ Partials      892     891      -1
Mirrors the @go_rule contract for C and C++ security rules. The decorators
inject language="c"/"cpp" into dataflow IR so DataflowExecutor scopes
flows() to the right language; pure calls() rules remain language-agnostic
(same Gap 1 / Gap 4 documented contract as @go_rule).
- codepathfinder.{c,cpp}_decorators with metadata dataclasses, atexit
auto-output, registry helpers, and clear_*_rules for test isolation.
- codepathfinder.{c,cpp}_ir compilers emit JSON IR with the language tag
in both rule metadata and matcher dict.
- python-sdk/rules/ shims preserve the existing import path style.
- dsl/loader.go decorator detector now recognises @c_rule / @cpp_rule
alongside @go_rule for early file filtering.
- Unit tests cover registration, metadata, language injection contract
(dataflow vs call_matcher), JSON serialisation, and registry isolation
between the C and C++ registries.
Verified end-to-end against tiny C/C++ smoke projects, sglang
(337 C++ functions, 9.6k call sites) and proxygen (11.8k C++ functions,
66k call sites): @c_rule / @cpp_rule rule files load via the DSL loader,
rules execute on parsed C/C++ functions, and detections include correct
file/line/class-qualified function names.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The result-collection step in Initialize iterated each worker's local edges and called codeGraph.AddEdge(edge.From, edge.To). The worker had already populated edge.From.OutgoingEdges via localGraph.AddEdge, and node pointers are shared across the local and global graphs, so the collector's second AddEdge appended a fresh Edge struct onto the same OutgoingEdges slice: every entry ended up duplicated.

Why only C/C++ rules were affected: Go's buildParentMap and Python's path collapse duplicates via map insertion, but the C/C++ call-graph builders (PR-07/08) iterate fnNode.OutgoingEdges linearly, emitting one CallSiteInternal per edge. That surfaced as 2× detections on every @c_rule / @cpp_rule run.

The fix transfers Edge structs from localGraph.Edges to codeGraph.Edges without re-attaching them to OutgoingEdges.

Two regression tests guard the contract:
- TestInitialize_NoDuplicateOutgoingEdges: a function with two distinct calls must end up with exactly two outgoing edges.
- TestInitialize_PreservesDistinctSameLineCalls: same-line distinct calls (printf + strdup), nested same-target calls (strcpy(c, strcpy(a, b))), and three-call lines must all stay visible. This guards against an over-eager dedup that would silently collapse legitimately distinct sites.

Verified end-to-end against /tmp/cpf-c-smoke (4 sites → 4 detections, was 8) and proxygen full tree (6 unique findings, was 12).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
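The duplication described in this commit can be reproduced in a small model. The sketch below is a Python stand-in for the Go graph types, not the engine's code: Node, Edge, Graph, merge_buggy and merge_fixed are all hypothetical names chosen for illustration. It assumes only what the commit message states, namely that add_edge attaches a fresh edge to the source node and that nodes are shared between the local and merged graphs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    outgoing: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    src: Node
    dst: Node


@dataclass
class Graph:
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, src: Node, dst: Node) -> None:
        # Creates a fresh Edge and attaches it to the (shared) source node.
        edge = Edge(src, dst)
        self.edges.append(edge)
        src.outgoing.append(edge)


def build_local() -> Tuple[Node, Graph]:
    """Model a worker that records two distinct calls made by one function."""
    fn, printf, strdup = Node("fn"), Node("printf"), Node("strdup")
    local = Graph()
    local.add_edge(fn, printf)
    local.add_edge(fn, strdup)
    return fn, local


def merge_buggy(local: Graph) -> Graph:
    merged = Graph()
    for e in local.edges:
        # Re-attaches to the same shared node, so outgoing doubles.
        merged.add_edge(e.src, e.dst)
    return merged


def merge_fixed(local: Graph) -> Graph:
    merged = Graph()
    # Transfer the existing Edge objects; outgoing lists are left untouched.
    merged.edges.extend(local.edges)
    return merged


fn_a, local_a = build_local()
merge_buggy(local_a)
fn_b, local_b = build_local()
merge_fixed(local_b)
print(len(fn_a.outgoing), len(fn_b.outgoing))  # prints "4 2"
```

A consumer that iterates the outgoing list linearly sees four call sites in the buggy case and two in the fixed case, which matches the 2× detections reported above.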
resolution-report previously only fed Go (and whatever Python it picked up via callgraph.InitializeCallGraph) into the unified call graph. C/C++ call sites (every function/method/qualified call surfaced by PR-07 and PR-08) were absent from the report, so the Top Unresolved tables and Failure Breakdown showed no C/C++ data even on heavily C/C++ projects.

Mirror scan.go's buildClikeCallGraphs gating: only invoke a builder when the parsed CodeGraph contains nodes for that language. Renamed the local 'registry' var (line 51) to 'modReg' to free the package-name namespace for the new BuildCModuleRegistry / BuildCppModuleRegistry calls.

Verified regression-free on Python (simple_project, stdlib_chaining, imports_test) and Go (simple_project, security_flows, type_tracking) fixtures: bit-for-bit identical resolution-report output before/after. On proxygen (~30k C/C++ call sites previously invisible) the report now surfaces the expected long tail of std::move, VLOG/XLOG, gtest macros, folly helpers, and STL container methods, all correctly classified as external_or_unresolved (Phase 2 stdlib registry territory).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
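The gating rule in this commit (run a language's builder only when the parsed graph has nodes for that language) can be sketched as follows. This is a Python illustration, not the Go implementation: has_language_nodes and build_clike_call_graphs are named after the Go helpers mentioned here, but their signatures, the dict-based node shape, and the builder callables are assumptions made for the example.

```python
from typing import Callable, Dict, List

NodeDict = Dict[str, str]
Builder = Callable[[List[NodeDict]], List[str]]


def has_language_nodes(nodes: List[NodeDict], language: str) -> bool:
    """True when at least one parsed node belongs to the given language."""
    return any(n.get("language") == language for n in nodes)


def build_clike_call_graphs(nodes: List[NodeDict],
                            builders: Dict[str, Builder]) -> List[str]:
    """Run each builder only if its language is present, then merge results."""
    merged: List[str] = []
    for language, build in builders.items():
        if not has_language_nodes(nodes, language):
            continue  # skip the builder entirely for absent languages
        merged.extend(build([n for n in nodes if n.get("language") == language]))
    return merged


# Hypothetical parsed nodes: one Go function and one C++ method.
nodes = [
    {"language": "go", "name": "main"},
    {"language": "cpp", "name": "HTTP1xCodec::generateBody"},
]
builders: Dict[str, Builder] = {
    "c": lambda ns: ["c:" + n["name"] for n in ns],
    "cpp": lambda ns: ["cpp:" + n["name"] for n in ns],
}
result = build_clike_call_graphs(nodes, builders)
print(result)  # prints "['cpp:HTTP1xCodec::generateBody']"
```

With this shape a pure Go or Python project never pays for the C/C++ builders, which is consistent with the bit-for-bit identical output reported for those fixtures.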
The dedicated C/C++ wiring block was duplicating scan.go's buildClikeCallGraphs / hasLanguageNodes helpers. Reuse them directly so both commands stay aligned and the path inherits scan_test.go's coverage for buildClikeCallGraphs (TestBuildClikeCallGraphs_NoNodes / CFunctionsMerged / CppFunctionsMerged / MixedProject).

Behaviour-preserving: a single helper call replaces 22 lines of inline gate-and-merge logic. Also drops the now-unused registry package import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cdea585 to 20cb8ce




Summary
Adds C and C++ rule decorators that mirror the existing @go_rule / @python_rule contract. Rules can now be authored with @c_rule(...) / @cpp_rule(...); the decorators inject language="c" / language="cpp" into dataflow IR so DataflowExecutor scopes flows() rules to the right language. Pure calls() matchers stay language-agnostic, the same documented contract as @go_rule.

This is the SDK glue half of the C/C++ language support track. Example security rules will land in a follow-up PR.
What's in this PR
- codepathfinder.{c,cpp}_decorators: @c_rule / @cpp_rule with full metadata (id/name/severity/cwe/cve/owasp/tags/message), atexit auto-output for python3 rule.py smoke runs, and clear_*_rules() helpers for test isolation.
- codepathfinder.{c,cpp}_ir: compile_c_rules() / compile_cpp_rules() emit JSON IR with "language" in both rule metadata and matcher dict.
- python-sdk/rules/ shims preserve the existing import path style (from rules.c_decorators import c_rule keeps working).
- dsl/loader.go decorator detector now recognises @c_rule( and @cpp_rule( so the loader can early-filter rule files like it does for @go_rule(.
Validation
End-to-end smoke runs (binary + locally-installed pip install -e python-sdk):
- sgl-kernel/csrc/cpu
- HTTP1xCodec::generateBody, CAresResolver::Query::queryCallback, TestAsyncTransport::WriteEvent::newEvent, H3DatagramAsyncSocket::deliverDatagram.
- Patterns such as *memcpy correctly match qualified C++ calls (std::memcpy), the same contract as @go_rule.
Test plan
- python -m pytest python-sdk/tests/ (441 passed)
- go test ./sast-engine/dsl/ (pass)
- golangci-lint run ./dsl/... (0 issues)
- gradle buildGo (clean)
- @c_rule + calls(): produces detections with correct file/line/function
- @cpp_rule rules load and detect, function FQNs accurate