path-wildcards: anchored trailing-* + per-endpoint port + R0040 args#323
path-wildcards: anchored trailing-* + per-endpoint port + R0040 args#323entlein wants to merge 2 commits into
Conversation
Signed-off-by: entlein <einentlein@gmail.com>
📝 WalkthroughWalkthroughThis PR refactors the dynamic path detector package to support per-prefix collapse configurations, improves endpoint deduplication with wildcard folding and Internal field protection, optimizes path matching with memoization and pooled buffers, and adds comprehensive post-analysis consolidation of dynamic paths. ChangesDynamic Path Detector Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit review this please together with kubescape/node-agent#807 |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainI'll fetch both PRs to conduct a combined review. 🏁 Script executed: Length of output: 144241 Now let me fetch the companion node-agent PR. 🏁 Script executed: Length of output: 29684 Let me verify the cross-repo dependency before writing the full review. 🏁 Script executed: Length of output: 632 🏁 Script executed: Length of output: 3221 🏁 Script executed: Length of output: 10791
🔴 Compilation Blockers (both PRs together won't build)1.
|
| # | Repo | Severity | File | Finding |
|---|---|---|---|---|
| 1 | storage#323 | 🔴 Blocker | — | CompareExecArgs not defined anywhere in the package |
| 2 | node-agent#807 | 🔴 Blocker | projection_types.go |
ExecsByPath field absent from ProjectedContainerProfile |
| 3 | storage#323 | 🟠 Design | types.go |
DefaultCollapseConfig exported mutable var |
| 4 | storage#323 | 🟠 Design | analyzer.go |
compareSegments exponential worst-case for adversarial * patterns |
| 5 | storage#323 | 🟠 Design | analyze_endpoints.go |
wildcardKey / getEndpointKey format coupling |
| 6 | storage#323 | 🟡 Minor | analyze_opens.go |
consolidateOpens O(n²) unacknowledged |
| 7 | node-agent#807 | 🟡 Minor | open.go |
Pattern false-negative gap not documented |
| 8 | node-agent#807 | 🟡 Minor | exec.go |
absent vs empty ExecsByPath asymmetry not commented |
| 9 | storage#323 | 🟢 Informational | analyze_endpoints.go |
removeEndpoint in-place mutation |
| 10 | storage#323 | 🟢 Informational | analyzer.go |
hasPrefixAtBoundary("") unguarded edge |
The storage PR's core logic (anchored trailing-*, per-endpoint port isolation, CollapseConfig per-prefix, zero-alloc processSegments) is solid and the test coverage is excellent. The main gap is that the R0040 args machinery in node-agent#807 is calling into infrastructure (CompareExecArgs, ExecsByPath) that hasn't been committed yet on either side.
|
@CodeRabbit also take into account kubescape/node-agent#805 and #322 kubescape/node-agent#808 kubescape/node-agent#806 #325 #324 . advise claude how to better split the code up, so all these PRs become more atomic (cough cough) and let it know the merge order, thx |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 19879
Dependency map across all 8 PRsgraph TD
s323["storage `#323`\npath-wildcards core\n(WildcardIdentifier, CollapseConfig API,\nanchored-*, per-port key)"]
s322["storage `#322`\nCompareExecArgs\n(compare_exec_args.go)"]
s324["storage `#324`\nnetworkmatch pkg\n(DNS/IP/CIDR)"]
s325["storage `#325`\nCollapseConfig CRD\n(+generated code, processors)"]
na805["node-agent `#805`\nresolveExecPath\n3-arg CEL parse.get_exec_path()"]
na806["node-agent `#806`\nnetwork wildcards CEL\n(DNS/CIDR/IP)"]
na807["node-agent `#807`\nR0040 args consumer\n(wasExecutedWithArgs,\nCompareExecArgs call,\nExecsByPath read)"]
na808["node-agent `#808`\nBig: ExecsByPath field +\ntamper detection (R1016) +\nnon-blocking fanout +\nNN resource support"]
s323 -->|"WildcardIdentifier"| s322
s323 -->|"FindConfigForPath,\nDefaultCollapseConfigs"| s325
s322 -->|"go.mod bump"| na805
s322 -->|"go.mod bump"| na807
s324 -->|"go.mod bump"| na806
s325 -->|"go.mod bump"| na808
na808 -->|"ExecsByPath on\nProjectedContainerProfile"| na807
What's wrong with atomicity today
Recommended atomic splitstorage sidenode-agent sideMerge order (safe, assuming the splits above)
Quick checklist for Claude
🐇 🗂️ |
…opy) CodeRabbit upstream PR kubescape#323 finding #3. The package-level `DefaultCollapseConfig` was an exported mutable var: any caller doing `dynamicpathdetector.DefaultCollapseConfig.Threshold = 1` would silently corrupt every analyzer constructed thereafter. The threshold surface is security-sensitive (governs how aggressively profile paths collapse into wildcards), so accidental cross-caller leakage is a real risk. Replaces the exported var with an unexported `defaultCollapseConfig` and an accessor `DefaultCollapseConfig()` that returns a value copy. Pattern mirrors the existing `defaultCollapseConfigs` / `DefaultCollapseConfigs()` defensive-copy accessor for the slice form. Updates 6 callsites across the test suite to use the accessor. Adds TestDefaultCollapseConfig_DefensiveCopy pinning the contract: a caller mutating the returned struct does not affect subsequent reads.
…#6, #9, #10) Four small docstring / defensive-guard improvements from CodeRabbit upstream PR kubescape#323: #5 wildcardKey / getEndpointKey format coupling — extract buildEndpointKey as the single source of truth for the lookup key shape. Both call-sites now route through it, removing the "must stay in sync" risk that allowed past drift. New test TestBuildEndpointKey_SharedFormat pins end-to-end behaviour via MergeDuplicateEndpoints: a wildcard-port entry and a specific- port sibling collapse into one row with merged Methods. #6 consolidateOpens O(n²) — acknowledged in the function's docstring along with the threshold (≤ ~10k entries) above which a trie-based replacement would be worthwhile. Behaviour unchanged. #9 removeEndpoint in-place mutation — added a NOTE block in the docstring describing the backing-array shift, why the sole caller is safe today, and the swap-to-copy guidance for any future caller that stores intermediate slice references. #10 hasPrefixAtBoundary("") edge — added an explicit empty-prefix branch returning true. Without it the function fell through to pathPrefix[0] == '/', which is true for any absolute path — effectively treating "" as a root-matching prefix incidentally. The explicit branch makes the invariant load-bearing. New test TestHasPrefixAtBoundary_EmptyPrefix pins the surface via FindConfigForPath. All existing tests still pass. Local run on this arm64 machine: go test ./pkg/registry/file/... ok in 0.93s
…opy, nitpicks) Signed-off-by: entlein <einentlein@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/registry/file/dynamicpathdetector/tests/coverage_test.go (1)
715-723: ⚡ Quick winAdd an explicit “different concrete ports stay distinct” assertion.
This case validates
:0wildcard folding, but it doesn’t pin the per-endpoint-port isolation contract (e.g.:8080vs:8443should not collapse). Please add a sibling assertion/test for two concrete ports to prevent regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/registry/file/dynamicpathdetector/tests/coverage_test.go` around lines 715 - 723, The test TestBuildEndpointKey_SharedFormat currently verifies folding of a specific port into a wildcard but lacks an assertion that two distinct concrete ports remain separate; update this test to include inputs like ":8080/api/v1/users" and ":8443/api/v1/users" (or another different concrete port) passed to dynamicpathdetector.MergeDuplicateEndpoints and assert that the merged result contains two distinct entries for those concrete ports (using assert.Len/other assertions) to ensure concrete ports do not collapse into one another.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/registry/file/dynamicpathdetector/tests/consolidate_opens_test.go`:
- Around line 109-113: Replace the non-fatal assertion with a fatal check so the
test stops before indexing: change the assert.Len(t, result, 1, ...) call to
require.Len(t, result, 1, ...) (keeping the same message) and leave the
subsequent checks (result[0].Path and result[0].Flags assertions) unchanged so
we don't risk panicking when accessing result[0].
In `@pkg/registry/file/dynamicpathdetector/tests/profile_test.go`:
- Around line 68-103: After successfully creating cpuFile and starting the CPU
profile (cpuFile, pprof.StartCPUProfile), add a deferred cleanup that calls
pprof.StopCPUProfile() and closes cpuFile (checking/handling the close error via
t.Logf or t.Fatalf as appropriate) so that any later t.Fatalf does not leak the
profiling state; ensure the defer is placed immediately after the successful
StartCPUProfile call and refers to the same cpuFile and pprof.StopCPUProfile/
cpuFile.Close operations used later in the test.
---
Nitpick comments:
In `@pkg/registry/file/dynamicpathdetector/tests/coverage_test.go`:
- Around line 715-723: The test TestBuildEndpointKey_SharedFormat currently
verifies folding of a specific port into a wildcard but lacks an assertion that
two distinct concrete ports remain separate; update this test to include inputs
like ":8080/api/v1/users" and ":8443/api/v1/users" (or another different
concrete port) passed to dynamicpathdetector.MergeDuplicateEndpoints and assert
that the merged result contains two distinct entries for those concrete ports
(using assert.Len/other assertions) to ensure concrete ports do not collapse
into one another.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53c1c899-007e-4f39-b1e0-c666818f2f9d
📒 Files selected for processing (12)
pkg/registry/file/dynamicpathdetector/analyze_endpoints.gopkg/registry/file/dynamicpathdetector/analyze_endpoints_internal_test.gopkg/registry/file/dynamicpathdetector/analyze_opens.gopkg/registry/file/dynamicpathdetector/analyzer.gopkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.gopkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.gopkg/registry/file/dynamicpathdetector/tests/benchmark_test.gopkg/registry/file/dynamicpathdetector/tests/compare_dynamic_memoise_test.gopkg/registry/file/dynamicpathdetector/tests/consolidate_opens_test.gopkg/registry/file/dynamicpathdetector/tests/coverage_test.gopkg/registry/file/dynamicpathdetector/tests/profile_test.gopkg/registry/file/dynamicpathdetector/types.go
| assert.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern") | ||
| assert.Equal(t, "/tmp/⋯", result[0].Path) | ||
| assert.Contains(t, result[0].Flags, "O_RDONLY") | ||
| assert.Contains(t, result[0].Flags, "O_WRONLY") | ||
| assert.Contains(t, result[0].Flags, "O_RDWR") |
There was a problem hiding this comment.
Use require.Len before indexing result[0].
If consolidation regresses, assert.Len won’t stop execution and result[0] can panic, obscuring the real failure.
Proposed fix
import (
"slices"
"testing"
mapset "github.com/deckarep/golang-set/v2"
types "github.com/kubescape/storage/pkg/apis/softwarecomposition"
"github.com/kubescape/storage/pkg/registry/file/dynamicpathdetector"
"github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
)
@@
- assert.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern")
+ require.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern")
assert.Equal(t, "/tmp/⋯", result[0].Path)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/registry/file/dynamicpathdetector/tests/consolidate_opens_test.go` around
lines 109 - 113, Replace the non-fatal assertion with a fatal check so the test
stops before indexing: change the assert.Len(t, result, 1, ...) call to
require.Len(t, result, 1, ...) (keeping the same message) and leave the
subsequent checks (result[0].Path and result[0].Flags assertions) unchanged so
we don't risk panicking when accessing result[0].
| cpuFile, err := os.Create(cpuPath) | ||
| if err != nil { | ||
| t.Fatalf("create cpu profile: %v", err) | ||
| } | ||
| if err := pprof.StartCPUProfile(cpuFile); err != nil { | ||
| t.Fatalf("start cpu profile: %v", err) | ||
| } | ||
|
|
||
| // Force a clean GC baseline so MemStats numbers reflect only the | ||
| // measured section. | ||
| runtime.GC() | ||
| var before, after runtime.MemStats | ||
| runtime.ReadMemStats(&before) | ||
|
|
||
| for i := 0; i < *profileIters; i++ { | ||
| if _, err := analyzer.AnalyzePath(paths[i%len(paths)], identifier); err != nil { | ||
| pprof.StopCPUProfile() | ||
| if cerr := cpuFile.Close(); cerr != nil { | ||
| t.Logf("close cpu profile after error: %v", cerr) | ||
| } | ||
| t.Fatalf("AnalyzePath iter %d: %v", i, err) | ||
| } | ||
| } | ||
|
|
||
| // Read memstats immediately after the measured loop, BEFORE stopping | ||
| // the CPU profile and closing the output file. Both of those do | ||
| // non-trivial internal allocations (buffer flush, file finalization) | ||
| // that would otherwise land in `after.TotalAlloc` / `after.Mallocs` | ||
| // and inflate the reported per-call numbers — material noise for a | ||
| // zero-alloc target. | ||
| runtime.ReadMemStats(&after) | ||
|
|
||
| pprof.StopCPUProfile() | ||
| if err := cpuFile.Close(); err != nil { | ||
| t.Fatalf("close cpu profile: %v", err) | ||
| } |
There was a problem hiding this comment.
Guard CPU profile cleanup with defer after startup.
If a later t.Fatalf fires, manual cleanup is skipped. Add deferred cleanup immediately after successful StartCPUProfile/file create to avoid leaked profile state during the rest of the test run.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/registry/file/dynamicpathdetector/tests/profile_test.go` around lines 68
- 103, After successfully creating cpuFile and starting the CPU profile
(cpuFile, pprof.StartCPUProfile), add a deferred cleanup that calls
pprof.StopCPUProfile() and closes cpuFile (checking/handling the close error via
t.Logf or t.Fatalf as appropriate) so that any later t.Fatalf does not leak the
profiling state; ensure the defer is placed immediately after the successful
StartCPUProfile call and refers to the same cpuFile and pprof.StopCPUProfile/
cpuFile.Close operations used later in the test.
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 45421
✅ Confirmed Atomic
|
analyze_endpoints: per-endpoint port handling (Internal/External merge key now includes the port so endpoints exposing the same path on different ports stay distinct)analyze_opens: anchored trailing-*only matches when the wildcard is at the end of the pattern (no recursive-*accidents)analyzer:DefaultCollapseConfigsaccessor andFindConfigForPathvalue-return (used by node-agent's CEL helpers)CompareDynamic perf note: the current implementation allocates two []string slices per call. matthyx's earlier review of PR #316 line 358 called this out and indicated he'd own a zero-alloc rewrite; remains untouched
Summary by CodeRabbit
Release Notes
Bug Fixes
Performance
New Features