path-wildcards: anchored trailing-* + per-endpoint port + R0040 args #323

Open
entlein wants to merge 2 commits into kubescape:main from k8sstormcenter:upstream-pr/sbob-path-wildcards

Conversation

@entlein

@entlein entlein commented May 15, 2026

  • analyze_endpoints: per-endpoint port handling (Internal/External merge key now includes the port so endpoints exposing the same path on different ports stay distinct)
  • analyze_opens: anchored trailing-* only matches when the wildcard is at the end of the pattern (no recursive-* accidents)
  • analyzer: DefaultCollapseConfigs accessor and FindConfigForPath value-return (used by node-agent's CEL helpers)

CompareDynamic perf note: the current implementation allocates two []string slices per call. In his earlier review of PR #316 (line 358), matthyx called this out and indicated he would own a zero-alloc rewrite, so this PR leaves that code untouched.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed incorrect merging of unrelated endpoint paths when wildcard ports are present.
    • Improved deduplication logic to respect internal endpoint flags, preventing improper consolidation.
    • Enhanced path consolidation to preserve SBOM-listed paths from incorrect subsumption.
  • Performance

    • Optimized path analysis with byte-slice pooling and memoized pattern matching for faster processing.
  • New Features

    • Added per-prefix dynamic path collapse configuration for improved customization of path analysis behavior.

Signed-off-by: entlein <einentlein@gmail.com>
@coderabbitai

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

This PR refactors the dynamic path detector package to support per-prefix collapse configurations, improves endpoint deduplication with wildcard folding and Internal field protection, optimizes path matching with memoization and pooled buffers, and adds comprehensive post-analysis consolidation of dynamic paths.

Changes

Dynamic Path Detector Refactoring

| Layer / File(s) | Summary |
|---|---|
| **Configuration & Threshold System**<br>pkg/registry/file/dynamicpathdetector/types.go | Introduces CollapseConfig struct and package-level default configs with defensive-copy accessors; adds WildcardIdentifier constant; updates PathAnalyzer struct to store per-prefix override configs and explicit fallback config instead of a single threshold field. |
| **Endpoint Deduplication with Wildcard Folding**<br>pkg/registry/file/dynamicpathdetector/analyze_endpoints.go, pkg/registry/file/dynamicpathdetector/analyze_endpoints_internal_test.go | AnalyzeEndpoints copies range variables before taking addresses to avoid aliasing. MergeDuplicateEndpoints now uses splitEndpointPortAndPath to deterministically parse canonical `:<port><path>` strings, changes deduplication keys to include Internal field, sweeps and folds specific-port entries into wildcard (:0) entries that match on (path, direction, Internal), and replaces stdout error printing with debug-level logging. New test validates defensive parsing of port/path inputs. |
| **Path Analysis Engine Refactor**<br>pkg/registry/file/dynamicpathdetector/analyzer.go | Replaces string builder with pooled byte-slice buffers for efficiency; adds NewPathAnalyzerWithConfigs for configuration-driven threshold selection using longest-prefix matching. processSegments computes dual thresholds (insert vs collapse scope), threads collapse threshold into updateNodeStats, and introduces threshold-1 immediate wildcard emission. CompareDynamic rewritten to use memoized DP matching for multi-wildcard patterns and recursive matching otherwise, with explicit `*` semantics (trailing vs mid-path). New helpers include FindConfigForPath (longest-prefix config lookup), hasPrefixAtBoundary (strict prefix matching), and CollapseAdjacentDynamicIdentifiers (public string wrapper). |
| **Open Path Consolidation**<br>pkg/registry/file/dynamicpathdetector/analyze_opens.go | AnalyzeOpens now sorts results deterministically and applies new consolidateOpens post-processing pass that removes literals covered by wildcard/dynamic-identifier patterns, merges flags into covering patterns, and preserves SBOM-listed paths even if subsumed. |
| **Comprehensive Test Coverage**<br>pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go, analyze_opens_test.go, consolidate_opens_test.go, coverage_test.go, benchmark_test.go, compare_dynamic_memoise_test.go, profile_test.go | Extensive new and refactored tests validating endpoint wildcard-port isolation and Internal field protection across ordering scenarios; open path consolidation subsumption, flag aggregation, and SBOM safeguards; dynamic matching semantics for consecutive wildcards, zero-segment wildcard consumption, and mid-path `*` zero-or-more behavior; threshold boundary correctness and multi-level collapse; defensive-copy contracts for config accessors; and performance via benchmarks (table-driven, warm/cold variants) and profiling with CPU/memory/goroutine sampling. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • matthyx

Poem

🐰 Collapsing paths with careful grace,
Wildcards fold in their rightful place,
Memoized matching speeds the way,
Per-prefix configs save the day—
Dynamic detectors now stand tall! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 64.60% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title specifically references three main changes: anchored trailing-* behavior, per-endpoint port handling, and R0040 args support, which align with the PR objectives and file summaries. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@entlein
Author

entlein commented May 16, 2026

@CodeRabbit review this please together with kubescape/node-agent#807

@coderabbitai

coderabbitai Bot commented May 16, 2026

🧩 Analysis chain

I'll fetch both PRs to conduct a combined review.


🏁 Script executed:

#!/bin/bash
# Fetch PR #323 details and diff
echo "=== PR #323 (kubescape/storage) ==="
gh pr view 323 --repo kubescape/storage --json title,body,files,additions,deletions,commits
echo ""
echo "=== PR #323 diff ==="
gh pr diff 323 --repo kubescape/storage

Length of output: 144241


Now let me fetch the companion node-agent PR.


🏁 Script executed:

#!/bin/bash
echo "=== node-agent PR #807 ==="
gh pr view 807 --repo kubescape/node-agent --json title,body,files,additions,deletions,commits
echo ""
echo "=== node-agent PR #807 diff ==="
gh pr diff 807 --repo kubescape/node-agent

Length of output: 29684


Let me verify the cross-repo dependency before writing the full review.


🏁 Script executed:

#!/bin/bash
echo "=== Check CompareExecArgs in storage branch ==="
gh api repos/kubescape/storage/git/trees/HEAD --jq '.tree[] | select(.path | contains("dynamicpath"))' 2>/dev/null || true

# Check if CompareExecArgs exists anywhere in the storage repo on this branch
gh api repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector --jq '.[].name' 2>/dev/null || true

echo ""
echo "=== Check the analyzer.go on the PR branch for CompareExecArgs ==="
gh api repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector/analyzer.go \
  --jq '.content' | base64 -d 2>/dev/null | grep -n "CompareExec" || echo "CompareExecArgs NOT found in analyzer.go"

echo ""
echo "=== Check all files in dynamicpathdetector for CompareExecArgs ==="
# List all .go files in the package
gh api "repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector?ref=5a80da34af2f45ed2eda1e435470bcb0d1e5715e" \
  --jq '.[].name' 2>/dev/null || true

echo ""
echo "=== Check node-agent for ExecsByPath field definition ==="
gh api repos/kubescape/node-agent/contents/pkg/objectcache/applicationprofile.go \
  --jq '.content' | base64 -d 2>/dev/null | grep -n "ExecsByPath" || echo "ExecsByPath not found in applicationprofile.go"

echo ""
echo "=== Search node-agent for ExecsByPath ==="
gh search code --repo kubescape/node-agent "ExecsByPath" --json path,textMatches 2>/dev/null | head -50 || true

echo ""
echo "=== Search storage for CompareExecArgs ==="
gh search code --repo kubescape/storage "CompareExecArgs" --json path,textMatches 2>/dev/null | head -50 || true

Length of output: 632


🏁 Script executed:

#!/bin/bash
echo "=== utils.go content on PR branch ==="
gh api "repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector/utils.go?ref=5a80da34af2f45ed2eda1e435470bcb0d1e5715e" \
  --jq '.content' | base64 -d 2>/dev/null

echo ""
echo "=== node-agent PR branch: list objectcache files ==="
gh api "repos/kubescape/node-agent/contents/pkg/objectcache?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.[].name' 2>/dev/null || true

echo ""
echo "=== node-agent PR branch objectcache types ==="
# Try to find ExecsByPath definition
for f in applicationprofile.go types.go projected.go projectedcontainerprofile.go containerprofile.go; do
  echo "--- $f ---"
  gh api "repos/kubescape/node-agent/contents/pkg/objectcache/${f}?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
    --jq '.content' 2>/dev/null | base64 -d 2>/dev/null | grep -n "ExecsByPath" || echo "not in $f"
done

echo ""
echo "=== node-agent: full objectcache listing ==="
gh api "repos/kubescape/node-agent/contents/pkg/objectcache?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.[].name' 2>/dev/null || true

echo ""
echo "=== node-agent exec.go full content on branch ==="
gh api "repos/kubescape/node-agent/contents/pkg/rulemanager/cel/libraries/applicationprofile/exec.go?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.content' | base64 -d 2>/dev/null | head -60

Length of output: 3221


🏁 Script executed:

#!/bin/bash
echo "=== node-agent projection_types.go ==="
gh api "repos/kubescape/node-agent/contents/pkg/objectcache/projection_types.go?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.content' | base64 -d 2>/dev/null

echo ""
echo "=== node-agent exec.go FULL ==="
gh api "repos/kubescape/node-agent/contents/pkg/rulemanager/cel/libraries/applicationprofile/exec.go?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.content' | base64 -d 2>/dev/null

echo ""
echo "=== storage: all files in dynamicpathdetector on PR branch ==="
gh api "repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector?ref=5a80da34af2f45ed2eda1e435470bcb0d1e5715e" \
  --jq '.[] | select(.type=="file") | .name' 2>/dev/null

echo ""
echo "=== storage: check for exec-related file ==="
for f in exec.go exec_args.go compare_exec.go compare.go; do
  echo "--- $f ---"
  gh api "repos/kubescape/storage/contents/pkg/registry/file/dynamicpathdetector/${f}?ref=5a80da34af2f45ed2eda1e435470bcb0d1e5715e" \
    --jq '.content' 2>/dev/null | base64 -d 2>/dev/null && echo "" || echo "not found"
done

echo ""
echo "=== node-agent: containerprofilecache interface ==="
gh api "repos/kubescape/node-agent/contents/pkg/objectcache/containerprofilecache_interface.go?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.content' | base64 -d 2>/dev/null | grep -n "ExecsByPath" || echo "Not in containerprofilecache_interface.go"

echo ""
echo "=== node-agent: objectcache common.go ==="
gh api "repos/kubescape/node-agent/contents/pkg/objectcache/common.go?ref=5c0bba0b4fce798049cc61d32d0a337d9d83c078" \
  --jq '.content' | base64 -d 2>/dev/null | grep -n "ExecsByPath" || echo "Not in common.go"

Length of output: 10791


@entlein — combined review across #323 and kubescape/node-agent#807 below.


🔴 Compilation Blockers (both PRs together won't build)

1. CompareExecArgs is missing from storage

exec.go (node-agent#807) calls dynamicpathdetector.CompareExecArgs(profileArgs, runtimeArgs) in two places, but that function does not exist in the storage package. The storage PR branch has exactly six files in dynamicpathdetector/ (analyze_endpoints.go, analyze_opens.go, analyzer.go, types.go, utils.go, and the new internal test), and none of them defines CompareExecArgs. This is a compile-time error (undefined symbol) in node-agent, so node-agent#807 cannot build against this storage branch as it stands.

2. cp.ExecsByPath field is missing from ProjectedContainerProfile

exec.go (node-agent#807) reads cp.ExecsByPath[pathStr] and cp.ExecsByPath[execPath], but ProjectedContainerProfile in projection_types.go has no such field:

type ProjectedContainerProfile struct {
    Opens, Execs, Endpoints, Capabilities, Syscalls,
    EgressDomains, EgressAddresses,
    IngressDomains, IngressAddresses ProjectedField
    SpecHash, SyncChecksum string
    PolicyByRuleId map[string]v1beta1.RulePolicy
    CallStackTree  *callstackcache.CallStackSearchTree
    // ← no ExecsByPath
}

ProjectedField itself is All / Values / Patterns / PrefixHits / SuffixHits — no composite-key carrier for argv. The projection comment says "Composite-key carriers (flags, args, methods, ports) are out of scope for v1," so this is either an intentional slip or two more pieces — a map[string][]string field on ProjectedContainerProfile and a projection step that populates it — are being added in a companion commit that hasn't landed yet.

Recommendation: Add CompareExecArgs to the storage package (probably analyzer.go alongside CompareDynamic), and add + populate ExecsByPath map[string][]string on ProjectedContainerProfile before these PRs can be merged.


🟠 Design / Correctness

3. DefaultCollapseConfig is an exported mutable var (types.go)

defaultCollapseConfigs is correctly unexported with a defensive-copy accessor DefaultCollapseConfigs(). DefaultCollapseConfig breaks that pattern:

var DefaultCollapseConfig = CollapseConfig{Prefix: "/", Threshold: OpenDynamicThreshold}

Any caller can write dynamicpathdetector.DefaultCollapseConfig.Threshold = 1 and silently corrupt every analyzer created afterward. Consider:

var defaultCollapseConfig = CollapseConfig{...}   // unexported

func DefaultCollapseConfig() CollapseConfig {
    return defaultCollapseConfig  // value copy
}

The test suite already uses dynamicpathdetector.DefaultCollapseConfig.Threshold as a sentinel — that one reference would need updating, but the protection is worth it given the security sensitivity of threshold tuning.
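
To make the defensive-copy contract concrete, here is a minimal, self-contained sketch. The two-field CollapseConfig shape follows the snippet above; the threshold values and the slice contents are placeholders I made up for illustration, not the package's real defaults.

```go
package main

import "fmt"

// CollapseConfig mirrors the two fields quoted in the review; any other
// fields the real struct may have are omitted here.
type CollapseConfig struct {
	Prefix    string
	Threshold int
}

// Unexported package state. 50 stands in for OpenDynamicThreshold and the
// slice entries are invented examples (assumptions, not real defaults).
var defaultCollapseConfig = CollapseConfig{Prefix: "/", Threshold: 50}

var defaultCollapseConfigs = []CollapseConfig{
	{Prefix: "/etc", Threshold: 100},
	{Prefix: "/var/run", Threshold: 3},
}

// DefaultCollapseConfig returns the struct by value, so callers get a copy.
func DefaultCollapseConfig() CollapseConfig { return defaultCollapseConfig }

// DefaultCollapseConfigs returns a freshly allocated slice; returning the
// package slice directly would share its backing array with callers.
func DefaultCollapseConfigs() []CollapseConfig {
	out := make([]CollapseConfig, len(defaultCollapseConfigs))
	copy(out, defaultCollapseConfigs)
	return out
}

func main() {
	c := DefaultCollapseConfig()
	c.Threshold = 1 // mutates only the caller's copy
	fmt.Println(DefaultCollapseConfig().Threshold) // 50

	cs := DefaultCollapseConfigs()
	cs[0].Threshold = 1 // mutates only the caller's slice
	fmt.Println(DefaultCollapseConfigs()[0].Threshold) // 100
}
```

The struct accessor is safe by value semantics alone; the slice accessor needs the explicit copy because a slice header still points at shared memory.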

4. CompareDynamic — exponential worst-case for adversarial * patterns (analyzer.go)

The recursive compareSegments for mid-path * tries every offset from 0 to len(regular) and recurses:

for i := 0; i <= len(regular); i++ {
    if compareSegments(dynamic[1:], regular[i:]) {
        return true
    }
}

For a trie-generated path this is safe because collapseAdjacentDynamic squashes consecutive * before they reach the matcher. For a user-authored ApplicationProfile the input is not guaranteed to go through the trie, so /*/*/*/* against a deep path produces O(n^k) calls. Given this runs inside R0002's hot path on every file-open syscall, a crafted profile could cause measurable latency. A memoisation table keyed on (di, ri) index pair would cap it at O(dynamic × regular).
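
A rough sketch of the memoised variant, for discussion only. It is not the package's compareSegments: the function name, the state encoding, and the matching semantics (`*` consumes zero or more segments, `⋯` consumes exactly one) are assumptions chosen to illustrate the (di, ri) table.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	wildcard = "*" // assumed here: matches zero or more path segments
	dynamic  = "⋯" // assumed here: matches exactly one path segment
)

// matchSegments reports whether pattern matches path. Every (pi, ri) state
// is evaluated at most once, so the work is bounded by
// len(pattern) * len(path) regardless of how many wildcards appear.
func matchSegments(pattern, path []string) bool {
	// memo[pi][ri]: 0 = not computed, 1 = match, 2 = no match.
	memo := make([][]int8, len(pattern)+1)
	for i := range memo {
		memo[i] = make([]int8, len(path)+1)
	}
	var rec func(pi, ri int) bool
	rec = func(pi, ri int) bool {
		if memo[pi][ri] != 0 {
			return memo[pi][ri] == 1
		}
		var ok bool
		switch {
		case pi == len(pattern):
			ok = ri == len(path)
		case pattern[pi] == wildcard:
			// Consume nothing, or consume one segment and stay on the wildcard.
			ok = rec(pi+1, ri) || (ri < len(path) && rec(pi, ri+1))
		case ri == len(path):
			ok = false
		default:
			ok = (pattern[pi] == dynamic || pattern[pi] == path[ri]) && rec(pi+1, ri+1)
		}
		if ok {
			memo[pi][ri] = 1
		} else {
			memo[pi][ri] = 2
		}
		return ok
	}
	return rec(0, 0)
}

// split turns "/a/b/c" into ["a", "b", "c"].
func split(p string) []string { return strings.Split(strings.Trim(p, "/"), "/") }

func main() {
	fmt.Println(matchSegments(split("/*/*/*/*"), split("/a/b/c/d/e/f"))) // true
	fmt.Println(matchSegments(split("/var/*/log"), split("/var/log")))   // true
	fmt.Println(matchSegments(split("/var/⋯/log"), split("/var/log")))   // false
}
```

With the table in place, a pattern such as /*/*/*/*/x against a deep path that lacks a trailing x fails after at most one visit per state, instead of retrying the same suffixes repeatedly.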

5. wildcardKey / getEndpointKey format coupling (analyze_endpoints.go)

MergeDuplicateEndpoints constructs the lookup key inline:

wildcardKey := fmt.Sprintf(":0%s|%s|%t", pathPart, endpoint.Direction, endpoint.Internal)

This must stay in sync with getEndpointKey:

return fmt.Sprintf("%s|%s|%t", endpoint.Endpoint, endpoint.Direction, endpoint.Internal)

There is no compile-time guarantee. A small refactor would help:

func makeWildcardKey(pathPart, direction string, internal bool) string {
    return getEndpointKey(&HTTPEndpoint{
        Endpoint:  ":0" + pathPart,
        Direction: direction,
        Internal:  internal,
    })
}

6. consolidateOpens — O(n²) inner loop (analyze_opens.go)

For each non-SBOM literal open, the function iterates patternOrder (sorted by path length). For a result set with many patterns and many literal survivors the cost is O(patterns × literals). In practice the trie keeps both counts small, but a comment noting the bound would help reviewers who see the nested loop.


🟡 node-agent#807 specific

7. wasPathOpenedWithSuffix/Prefix — false-negative gap deepens

Removing the Patterns scan (the correct fix for the false-positive the CodeRabbit note flagged) means that any concrete path that the storage trie collapsed into a pattern (e.g. /var/log/pods/⋯/volumes/…) will no longer contribute to suffix/prefix answers even when the runtime path would have matched. The existing v1-scope comment in wasPathOpenedWithFlags covers flags; a parallel sentence in the suffix/prefix functions — "Patterns are skipped because wildcard text doesn't safely answer suffix/prefix questions; see wasPathOpened for wildcard-aware matching" — would prevent future readers from "restoring" the scan.

8. wasExecutedWithArgs — asymmetry between absent and empty ExecsByPath entry

The logic is:

  • ExecsByPath key absent → back-compat always-match
  • ExecsByPath key present with empty []string → CompareExecArgs([], runtimeArgs), which presumably matches only when runtimeArgs is also empty

That asymmetry is correct (absent means "no argv constraint recorded yet"; empty means "profiled with zero args"). Worth a short inline comment so a future maintainer doesn't collapse the two branches.
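
The two branches can be illustrated with a small sketch. The helper names argsAllowed and argsMatch are hypothetical, and argsMatch is a plain element-wise comparison standing in for CompareExecArgs, whose real wildcard semantics live in the storage package and are not reproduced here.

```go
package main

import "fmt"

// argsMatch is a stand-in for CompareExecArgs (assumption): exact,
// element-wise equality with no wildcard handling.
func argsMatch(profile, observed []string) bool {
	if len(profile) != len(observed) {
		return false
	}
	for i := range profile {
		if profile[i] != observed[i] {
			return false
		}
	}
	return true
}

// argsAllowed shows only the argv decision, not the path lookup that
// precedes it in the real helper.
func argsAllowed(execsByPath map[string][]string, path string, observed []string) bool {
	profileArgs, recorded := execsByPath[path]
	if !recorded {
		// Key absent: no argv constraint was recorded for this path,
		// so stay backward compatible and accept any arguments.
		return true
	}
	// Key present, possibly with an empty slice: the exec was profiled,
	// and an empty slice means "profiled with zero args".
	return argsMatch(profileArgs, observed)
}

func main() {
	profile := map[string][]string{"/bin/ls": {}}
	fmt.Println(argsAllowed(profile, "/bin/cat", []string{"-n"})) // true: absent key
	fmt.Println(argsAllowed(profile, "/bin/ls", []string{"-la"})) // false: empty entry
	fmt.Println(argsAllowed(profile, "/bin/ls", nil))             // true: empty matches empty
}
```

The comma-ok lookup is what keeps the two cases apart; a plain `execsByPath[path]` would return nil for both an absent key and a nil entry and silently merge the branches.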


🟢 Minor / Informational

9. removeEndpoint modifies the backing array (analyze_endpoints.go)

return append(s[:i], s[i+1:]...)

This shifts elements left in the original array, leaving a stale pointer at s[len-1] outside the returned slice. The caller replaces newEndpoints with the return value immediately, so there is no live alias, but a comment noting the in-place mutation would protect future refactors that store intermediate references.
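
A small demonstration of the effect, assuming removeEndpoint takes a slice of pointers and an index (only the return statement is quoted above, so the signature and the one-field HTTPEndpoint are guesses). The removeEndpointCopy variant is a hypothetical alternative for callers that keep a reference to the original slice.

```go
package main

import "fmt"

// HTTPEndpoint is trimmed to one field for the demonstration.
type HTTPEndpoint struct{ Endpoint string }

// removeEndpoint uses the quoted one-liner; the signature is assumed.
func removeEndpoint(s []*HTTPEndpoint, i int) []*HTTPEndpoint {
	return append(s[:i], s[i+1:]...)
}

// removeEndpointCopy (hypothetical) builds a new backing array instead.
func removeEndpointCopy(s []*HTTPEndpoint, i int) []*HTTPEndpoint {
	out := make([]*HTTPEndpoint, 0, len(s)-1)
	out = append(out, s[:i]...)
	return append(out, s[i+1:]...)
}

func main() {
	a := &HTTPEndpoint{Endpoint: ":80/a"}
	b := &HTTPEndpoint{Endpoint: ":80/b"}
	c := &HTTPEndpoint{Endpoint: ":80/c"}

	orig := []*HTTPEndpoint{a, b, c}
	out := removeEndpoint(orig, 0)
	fmt.Println(len(out), out[0].Endpoint, out[1].Endpoint) // 2 :80/b :80/c
	// The original slice header still has length 3, but its contents shifted:
	fmt.Println(orig[0].Endpoint, orig[1].Endpoint, orig[2].Endpoint) // :80/b :80/c :80/c

	safe := []*HTTPEndpoint{a, b, c}
	_ = removeEndpointCopy(safe, 0)
	fmt.Println(safe[0].Endpoint) // :80/a
}
```

The duplicated last element in orig is the stale pointer mentioned above; it is harmless only as long as nothing reads the old slice header after the call.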

10. hasPrefixAtBoundary("") edge (analyzer.go)

If prefix == "" is ever added to configs, the function reaches pathPrefix[len(prefix)] == '/' which evaluates pathPrefix[0] == '/'. For absolute paths that is true, meaning "" would be treated as a root-matching prefix. None of the current configs trigger this, but a guard if prefix == "" { return true } (or a validation step in NewPathAnalyzerWithConfigs) would make the invariant explicit.
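
One way the guard and the longest-prefix lookup could fit together is sketched below. The signatures, the argument order, and the treatment of prefixes ending in "/" are assumptions; the real hasPrefixAtBoundary and FindConfigForPath may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

type CollapseConfig struct {
	Prefix    string
	Threshold int
}

// hasPrefixAtBoundary reports whether prefix covers path at a segment
// boundary, so "/etc" covers "/etc/passwd" but not "/etcd/x".
func hasPrefixAtBoundary(path, prefix string) bool {
	if prefix == "" {
		return true // explicit: the empty prefix matches everything
	}
	if !strings.HasPrefix(path, prefix) {
		return false
	}
	if len(path) == len(prefix) {
		return true
	}
	return strings.HasSuffix(prefix, "/") || path[len(prefix)] == '/'
}

// findConfigForPath returns the config with the longest matching prefix,
// or fallback when none matches. Returned by value, so callers get a copy.
func findConfigForPath(configs []CollapseConfig, fallback CollapseConfig, path string) CollapseConfig {
	best, bestLen := fallback, -1
	for _, c := range configs {
		if hasPrefixAtBoundary(path, c.Prefix) && len(c.Prefix) > bestLen {
			best, bestLen = c, len(c.Prefix)
		}
	}
	return best
}

func main() {
	// Thresholds are made-up numbers for the example.
	configs := []CollapseConfig{{Prefix: "/etc", Threshold: 100}, {Prefix: "/etc/ssl", Threshold: 5}}
	fallback := CollapseConfig{Prefix: "/", Threshold: 50}

	fmt.Println(hasPrefixAtBoundary("/etcd/x", "/etc"))                            // false
	fmt.Println(findConfigForPath(configs, fallback, "/etc/ssl/certs").Threshold) // 5
	fmt.Println(findConfigForPath(configs, fallback, "/usr/bin").Threshold)       // 50
}
```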


Summary table

| # | Repo | Severity | File | Finding |
|---|---|---|---|---|
| 1 | storage#323 | 🔴 Blocker | | CompareExecArgs not defined anywhere in the package |
| 2 | node-agent#807 | 🔴 Blocker | projection_types.go | ExecsByPath field absent from ProjectedContainerProfile |
| 3 | storage#323 | 🟠 Design | types.go | DefaultCollapseConfig exported mutable var |
| 4 | storage#323 | 🟠 Design | analyzer.go | compareSegments exponential worst-case for adversarial * patterns |
| 5 | storage#323 | 🟠 Design | analyze_endpoints.go | wildcardKey / getEndpointKey format coupling |
| 6 | storage#323 | 🟡 Minor | analyze_opens.go | consolidateOpens O(n²) unacknowledged |
| 7 | node-agent#807 | 🟡 Minor | open.go | Pattern false-negative gap not documented |
| 8 | node-agent#807 | 🟡 Minor | exec.go | absent vs empty ExecsByPath asymmetry not commented |
| 9 | storage#323 | 🟢 Informational | analyze_endpoints.go | removeEndpoint in-place mutation |
| 10 | storage#323 | 🟢 Informational | analyzer.go | hasPrefixAtBoundary("") unguarded edge |

The storage PR's core logic (anchored trailing-*, per-endpoint port isolation, CollapseConfig per-prefix, zero-alloc processSegments) is solid and the test coverage is excellent. The main gap is that the R0040 args machinery in node-agent#807 is calling into infrastructure (CompareExecArgs, ExecsByPath) that hasn't been committed yet on either side.

@entlein
Author

entlein commented May 16, 2026

@CodeRabbit also take into account kubescape/node-agent#805 and #322 kubescape/node-agent#808 kubescape/node-agent#806 #325 #324 . advise claude how to better split the code up, so all these PRs become more atomic (cough cough) and let it know the merge order, thx

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
for pr in 805 808 806; do
  echo "=== node-agent PR #${pr} ==="
  gh pr view $pr --repo kubescape/node-agent --json title,body,files,additions,deletions 2>/dev/null
  echo ""
done

for pr in 322 324 325; do
  echo "=== storage PR #${pr} ==="
  gh pr view $pr --repo kubescape/storage --json title,body,files,additions,deletions 2>/dev/null
  echo ""
done

Length of output: 19879


@entlein — here is the full dependency map and atomicity advisory for Claude (and your team).


Dependency map across all 8 PRs

graph TD
    s323["storage `#323`\npath-wildcards core\n(WildcardIdentifier, CollapseConfig API,\nanchored-*, per-port key)"]
    s322["storage `#322`\nCompareExecArgs\n(compare_exec_args.go)"]
    s324["storage `#324`\nnetworkmatch pkg\n(DNS/IP/CIDR)"]
    s325["storage `#325`\nCollapseConfig CRD\n(+generated code, processors)"]
    na805["node-agent `#805`\nresolveExecPath\n3-arg CEL parse.get_exec_path()"]
    na806["node-agent `#806`\nnetwork wildcards CEL\n(DNS/CIDR/IP)"]
    na807["node-agent `#807`\nR0040 args consumer\n(wasExecutedWithArgs,\nCompareExecArgs call,\nExecsByPath read)"]
    na808["node-agent `#808`\nBig: ExecsByPath field +\ntamper detection (R1016) +\nnon-blocking fanout +\nNN resource support"]

    s323 -->|"WildcardIdentifier"| s322
    s323 -->|"FindConfigForPath,\nDefaultCollapseConfigs"| s325
    s322 -->|"go.mod bump"| na805
    s322 -->|"go.mod bump"| na807
    s324 -->|"go.mod bump"| na806
    s325 -->|"go.mod bump"| na808
    na808 -->|"ExecsByPath on\nProjectedContainerProfile"| na807

What's wrong with atomicity today

| PR | Problem |
|---|---|
| storage/#323 | Bundles 3 independent concerns: (a) analyze_opens.go anchored-* bugfix, (b) analyze_endpoints.go per-port bugfix, (c) CollapseConfig API surface (DefaultCollapseConfigs(), FindConfigForPath value-return) needed by #322 and #325 |
| node-agent/#808 | Bundles 4+ independent concerns: (a) ExecsByPath field (7 lines in projection_types.go), the sole thing blocking #807 from compiling, (b) tamper detection R1016, (c) non-blocking notification fan-out, (d) NetworkNeighborhood resource via pod labels |
| node-agent/#807 | Blocked by two unrelated large PRs (#322 for CompareExecArgs, #808 for ExecsByPath), creating a compound dependency that can't merge until both are done |
| storage/#325 | Mostly fine, but the cleanup.go annotation-based user-lifecycle marker is a policy change orthogonal to the CRD plumbing and could be a standalone PR |

Recommended atomic split

storage side

storage/#323a  (new) ─ analyze_opens.go anchored-* fix
                       analyze_endpoints.go per-port key fix
                       ← pure bugfixes, zero deps, merge first

storage/#323b  (= current `#323` minus above) ─ types.go / analyzer.go
                       DefaultCollapseConfigs(), FindConfigForPath value-return,
                       NewPathAnalyzerWithConfigs
                       ← CollapseConfig API, depends on `#323a` (or squash together)

storage/#322   ✓ already atomic ─ compare_exec_args.go (depends on `#323b`)

storage/#324   ✓ already atomic ─ networkmatch pkg (independent)

storage/#325   ─ CollapseConfig CRD (depends on `#323b`)
  optional split: extract cleanup.go annotation lifecycle marker → own PR

node-agent side

node-agent/#808a  (new, tiny) ─ projection_types.go ExecsByPath field
                                 projection step that populates it
                                 ← unblocks `#807` immediately

node-agent/#807   ─ R0040 args consumer (depends on storage/#322 go.mod + `#808a`)

node-agent/#805   ✓ already atomic ─ resolveExecPath + 3-arg CEL
                                     (depends on storage/#322 go.mod bump)

node-agent/#806   ✓ already atomic ─ network wildcards CEL
                                     (depends on storage/#324 go.mod bump)

node-agent/#808b  ─ tamper detection R1016 (independent of exec args)
node-agent/#808c  ─ non-blocking notification fan-out (pure infra)
node-agent/#808d  ─ NetworkNeighborhood resource via pod labels
  (or keep `#808` as-is once `#808a` is extracted — the remaining 3 concerns
   are at least all "profile-compaction" themed)

node-agent/#808   ─ CollapseConfig projection overlay (depends on storage/#325 go.mod)

Merge order (safe, assuming the splits above)

Tier 0 — independent, merge in any order
  storage/#323a   (anchored-* + per-port bugfixes)
  storage/#324    (networkmatch package)

Tier 1 — depends on Tier 0
  storage/#323b   (CollapseConfig API surface)
  node-agent/#806 (network wildcards CEL; needs storage/#324 go.mod bump)

Tier 2 — depends on Tier 1
  storage/#322    (CompareExecArgs; needs `#323b`)
  storage/#325    (CollapseConfig CRD; needs `#323b`)
  node-agent/#808a (ExecsByPath field; needs concepts from `#323b`/#325)

Tier 3 — depends on Tier 2
  node-agent/#805  (resolveExecPath; needs storage/#322 go.mod bump)
  node-agent/#807  (R0040 consumer; needs storage/#322 go.mod + `#808a`)
  node-agent/#808b/c/d (remaining `#808` pieces; need storage/#325 go.mod bump)

Tier 4 — full profile-compaction integration
  node-agent/#808  (CollapseConfig overlay; needs storage/#325 go.mod bump)

The single highest-leverage action: extract ExecsByPath out of node-agent/#808 into a micro-PR (7 lines in projection_types.go + the matching projection step). That one move unblocks node-agent/#807 from waiting on a 1300-line PR.


Quick checklist for Claude

  1. Create storage/#323a — move analyze_opens.go (anchored-*) and analyze_endpoints.go (per-port) changes there; leave only the CollapseConfig API changes in #323b (or keep current #323 as the CollapseConfig API PR and move bugfixes to a new prerequisite).
  2. Create node-agent/#808a — extract pkg/objectcache/projection_types.go ExecsByPath field addition and its population in projection_apply.go; this is all that's needed to let #807 compile.
  3. Update node-agent/#807 to depend only on: (a) the storage/#322 go.mod bump, (b) #808a.
  4. Add CompareExecArgs to storage/#322's exports (already done there ✓), but confirm node-agent's go.mod points to the right storage tag/commit before merging #807.
  5. Document the ExecsByPath absent-vs-empty asymmetry in node-agent/#807's wasExecutedWithArgs (from the earlier review, finding #8).

🐇 🗂️

entlein pushed a commit to k8sstormcenter/storage that referenced this pull request May 16, 2026
…opy)

CodeRabbit upstream PR kubescape#323 finding #3. The package-level
`DefaultCollapseConfig` was an exported mutable var: any caller doing
`dynamicpathdetector.DefaultCollapseConfig.Threshold = 1` would
silently corrupt every analyzer constructed thereafter. The threshold
surface is security-sensitive (governs how aggressively profile paths
collapse into wildcards), so accidental cross-caller leakage is a
real risk.

Replaces the exported var with an unexported `defaultCollapseConfig`
and an accessor `DefaultCollapseConfig()` that returns a value copy.
Pattern mirrors the existing `defaultCollapseConfigs` / `DefaultCollapseConfigs()`
defensive-copy accessor for the slice form.

Updates 6 callsites across the test suite to use the accessor.
Adds TestDefaultCollapseConfig_DefensiveCopy pinning the contract: a
caller mutating the returned struct does not affect subsequent reads.
entlein pushed a commit to k8sstormcenter/storage that referenced this pull request May 16, 2026
…#6, #9, #10)

Four small docstring / defensive-guard improvements from CodeRabbit
upstream PR kubescape#323:

  #5  wildcardKey / getEndpointKey format coupling — extract
      buildEndpointKey as the single source of truth for the lookup
      key shape. Both call-sites now route through it, removing the
      "must stay in sync" risk that allowed past drift. New test
      TestBuildEndpointKey_SharedFormat pins end-to-end behaviour via
      MergeDuplicateEndpoints: a wildcard-port entry and a specific-
      port sibling collapse into one row with merged Methods.

  #6  consolidateOpens O(n²) — acknowledged in the function's
      docstring along with the threshold (≤ ~10k entries) above which
      a trie-based replacement would be worthwhile. Behaviour
      unchanged.

  #9  removeEndpoint in-place mutation — added a NOTE block in the
      docstring describing the backing-array shift, why the sole
      caller is safe today, and the swap-to-copy guidance for any
      future caller that stores intermediate slice references.

  #10 hasPrefixAtBoundary("") edge — added an explicit empty-prefix
      branch returning true. Without it the function fell through to
      pathPrefix[0] == '/', which is true for any absolute path —
      effectively treating "" as a root-matching prefix incidentally.
      The explicit branch makes the invariant load-bearing.
      New test TestHasPrefixAtBoundary_EmptyPrefix pins the surface
      via FindConfigForPath.

All existing tests still pass. Local run on this arm64 machine:
  go test ./pkg/registry/file/... ok in 0.93s
…opy, nitpicks)

Signed-off-by: entlein <einentlein@gmail.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/registry/file/dynamicpathdetector/tests/coverage_test.go (1)

715-723: ⚡ Quick win

Add an explicit “different concrete ports stay distinct” assertion.

This case validates :0 wildcard folding, but it doesn’t pin the per-endpoint-port isolation contract (e.g. :8080 vs :8443 should not collapse). Please add a sibling assertion/test for two concrete ports to prevent regressions.
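
To show why the two cases differ, here is a sketch of the key shapes involved. The `%s|%s|%t` format is the one quoted for getEndpointKey earlier in this thread; the trimmed HTTPEndpoint struct, the plain-string Direction, and the helper names endpointKey, splitPortAndPath, and wildcardKey are illustrative assumptions rather than the package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// HTTPEndpoint is reduced to the three fields that feed the key.
type HTTPEndpoint struct {
	Endpoint  string
	Direction string
	Internal  bool
}

// endpointKey follows the key format quoted for getEndpointKey.
func endpointKey(e HTTPEndpoint) string {
	return fmt.Sprintf("%s|%s|%t", e.Endpoint, e.Direction, e.Internal)
}

// splitPortAndPath parses ":<port><path>" and rejects anything else.
func splitPortAndPath(endpoint string) (port, path string, ok bool) {
	if !strings.HasPrefix(endpoint, ":") {
		return "", "", false
	}
	rest := endpoint[1:]
	slash := strings.IndexByte(rest, '/')
	if slash <= 0 {
		return "", "", false
	}
	return rest[:slash], rest[slash:], true
}

// wildcardKey is the key of the ":0" sibling a concrete-port entry may fold into.
func wildcardKey(e HTTPEndpoint) (string, bool) {
	_, path, ok := splitPortAndPath(e.Endpoint)
	if !ok {
		return "", false
	}
	return endpointKey(HTTPEndpoint{Endpoint: ":0" + path, Direction: e.Direction, Internal: e.Internal}), true
}

func main() {
	a := HTTPEndpoint{Endpoint: ":8080/api/v1/users", Direction: "inbound"}
	b := HTTPEndpoint{Endpoint: ":8443/api/v1/users", Direction: "inbound"}

	fmt.Println(endpointKey(a) == endpointKey(b)) // false: concrete ports stay distinct
	wa, _ := wildcardKey(a)
	wb, _ := wildcardKey(b)
	fmt.Println(wa == wb, wa) // true :0/api/v1/users|inbound|false
}
```

Under these assumptions, two concrete ports only ever meet through a shared :0 entry, which is the behaviour the requested sibling assertion would pin.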

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/registry/file/dynamicpathdetector/tests/coverage_test.go` around lines
715 - 723, The test TestBuildEndpointKey_SharedFormat currently verifies folding
of a specific port into a wildcard but lacks an assertion that two distinct
concrete ports remain separate; update this test to include inputs like
":8080/api/v1/users" and ":8443/api/v1/users" (or another different concrete
port) passed to dynamicpathdetector.MergeDuplicateEndpoints and assert that the
merged result contains two distinct entries for those concrete ports (using
assert.Len/other assertions) to ensure concrete ports do not collapse into one
another.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53c1c899-007e-4f39-b1e0-c666818f2f9d

📥 Commits

Reviewing files that changed from the base of the PR and between 2a689d8 and e13cf58.

📒 Files selected for processing (12)
  • pkg/registry/file/dynamicpathdetector/analyze_endpoints.go
  • pkg/registry/file/dynamicpathdetector/analyze_endpoints_internal_test.go
  • pkg/registry/file/dynamicpathdetector/analyze_opens.go
  • pkg/registry/file/dynamicpathdetector/analyzer.go
  • pkg/registry/file/dynamicpathdetector/tests/analyze_endpoints_test.go
  • pkg/registry/file/dynamicpathdetector/tests/analyze_opens_test.go
  • pkg/registry/file/dynamicpathdetector/tests/benchmark_test.go
  • pkg/registry/file/dynamicpathdetector/tests/compare_dynamic_memoise_test.go
  • pkg/registry/file/dynamicpathdetector/tests/consolidate_opens_test.go
  • pkg/registry/file/dynamicpathdetector/tests/coverage_test.go
  • pkg/registry/file/dynamicpathdetector/tests/profile_test.go
  • pkg/registry/file/dynamicpathdetector/types.go

Comment on lines +109 to +113
assert.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern")
assert.Equal(t, "/tmp/⋯", result[0].Path)
assert.Contains(t, result[0].Flags, "O_RDONLY")
assert.Contains(t, result[0].Flags, "O_WRONLY")
assert.Contains(t, result[0].Flags, "O_RDWR")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use require.Len before indexing result[0].

If consolidation regresses, assert.Len won’t stop execution and result[0] can panic, obscuring the real failure.

Proposed fix
 import (
 	"slices"
 	"testing"
 
 	mapset "github.com/deckarep/golang-set/v2"
 	types "github.com/kubescape/storage/pkg/apis/softwarecomposition"
 	"github.com/kubescape/storage/pkg/registry/file/dynamicpathdetector"
 	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )
@@
-	assert.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern")
+	require.Len(t, result, 1, "all three /tmp paths should consolidate to one pattern")
 	assert.Equal(t, "/tmp/⋯", result[0].Path)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/registry/file/dynamicpathdetector/tests/consolidate_opens_test.go` around
lines 109 - 113, Replace the non-fatal assertion with a fatal check so the test
stops before indexing: change the assert.Len(t, result, 1, ...) call to
require.Len(t, result, 1, ...) (keeping the same message) and leave the
subsequent checks (result[0].Path and result[0].Flags assertions) unchanged so
we don't risk panicking when accessing result[0].

Comment on lines +68 to +103
cpuFile, err := os.Create(cpuPath)
if err != nil {
	t.Fatalf("create cpu profile: %v", err)
}
if err := pprof.StartCPUProfile(cpuFile); err != nil {
	t.Fatalf("start cpu profile: %v", err)
}

// Force a clean GC baseline so MemStats numbers reflect only the
// measured section.
runtime.GC()
var before, after runtime.MemStats
runtime.ReadMemStats(&before)

for i := 0; i < *profileIters; i++ {
	if _, err := analyzer.AnalyzePath(paths[i%len(paths)], identifier); err != nil {
		pprof.StopCPUProfile()
		if cerr := cpuFile.Close(); cerr != nil {
			t.Logf("close cpu profile after error: %v", cerr)
		}
		t.Fatalf("AnalyzePath iter %d: %v", i, err)
	}
}

// Read memstats immediately after the measured loop, BEFORE stopping
// the CPU profile and closing the output file. Both of those do
// non-trivial internal allocations (buffer flush, file finalization)
// that would otherwise land in `after.TotalAlloc` / `after.Mallocs`
// and inflate the reported per-call numbers — material noise for a
// zero-alloc target.
runtime.ReadMemStats(&after)

pprof.StopCPUProfile()
if err := cpuFile.Close(); err != nil {
	t.Fatalf("close cpu profile: %v", err)
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard CPU profile cleanup with defer after startup.

If a later t.Fatalf fires, manual cleanup is skipped. Add deferred cleanup immediately after successful StartCPUProfile/file create to avoid leaked profile state during the rest of the test run.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/registry/file/dynamicpathdetector/tests/profile_test.go` around lines 68
- 103, After successfully creating cpuFile and starting the CPU profile
(cpuFile, pprof.StartCPUProfile), add a deferred cleanup that calls
pprof.StopCPUProfile() and closes cpuFile (checking/handling the close error via
t.Logf or t.Fatalf as appropriate) so that any later t.Fatalf does not leak the
profiling state; ensure the defer is placed immediately after the successful
StartCPUProfile call and refers to the same cpuFile and pprof.StopCPUProfile/
cpuFile.Close operations used later in the test.

@entlein
Author

entlein commented May 16, 2026

@coderabbitai review

Collective re-review request after applying your atomic-split advisory + rabbit-feedback fixes across the full SBOB upstream PR set. Each PR is now scoped to a single tier per your dependency map:

Tier 0 (opens):

Tier 2 (execs):

Tier 3 (network):

Tier 4 (tamper/sign):

Umbrellas:

Please confirm atomicity per your earlier advisory and flag any remaining cross-PR coupling we missed.

@coderabbitai

coderabbitai Bot commented May 16, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch all PR titles/bodies/file lists for the new PRs
for pr in 811 812 809 810; do
  echo "=== node-agent PR #${pr} ==="
  gh pr view $pr --repo kubescape/node-agent --json title,body,files,additions,deletions,baseRefName,headRefName 2>/dev/null
  echo ""
done

echo "=== storage PR #326 ==="
gh pr view 326 --repo kubescape/storage --json title,body,files,additions,deletions,baseRefName,headRefName 2>/dev/null
echo ""

# Also re-check the PRs that should have been updated
echo "=== node-agent PR #807 (updated) ==="
gh pr view 807 --repo kubescape/node-agent --json title,body,files,additions,deletions 2>/dev/null
echo ""

echo "=== node-agent PR #812 diff ==="
gh pr diff 812 --repo kubescape/node-agent 2>/dev/null | head -300
echo ""

echo "=== node-agent PR #811 diff ==="
echo ""

Length of output: 45421


@entlein — atomicity confirmation below, with the two remaining coupling gaps flagged.


✅ Confirmed Atomic

| PR | Tier | Verdict | Notes |
| --- | --- | --- | --- |
| storage #323 | 0 | ✅ Atomic | Squash lands all three concerns together; CollapseConfig API + bugfixes correctly unified per advisory |
| node-agent #811 | 0 | ✅ Atomic | Two-file PR: ap.go (false-negative doc + cost entries) + open.go (Patterns-scan removal). Directly addresses finding #7 from my earlier review; thorough. See coupling note below. |
| storage #322 | 2 | ✅ Atomic | compare_exec_args.go only, no side effects |
| node-agent #812 | 2 | ✅ Exemplary | Exactly the micro-PR I asked for. extractExecsByPath nil→empty-slice convention, clone invariant contract and tests are all correct. Resolves both compilation blockers (#2 from earlier review) cleanly. |
| storage #325 | 2 | ✅ Atomic | CollapseConfig CRD + processors, already confirmed |
| node-agent #805 | 2 | ✅ Atomic | Already confirmed |
| storage #324 | 3 | ✅ Atomic | networkmatch package, already confirmed |
| node-agent #806 | 3 | ✅ Atomic | Already confirmed |
| node-agent #809 | 4 | ⚠️ See below | |
| node-agent #808 | 4 | ⚠️ See below | |
| storage #326 | umbrella | ✅ Correct | Unions #323+#322+#325+#324 exactly |
| node-agent #810 | umbrella | ✅ Correct | Unions all node-agent tiers per advisory |

⚠️ Two Remaining Coupling Gaps

1. open.go overlap between node-agent #811 (Tier 0) and #807 (Tier 2)

Both PRs show open.go +24/-12 — identical line counts — and both include the same removal of the Patterns scan from wasPathOpenedWithSuffix / wasPathOpenedWithPrefix. If #811 merges first (correct order: Tier 0 before Tier 2), #807's open.go hunk will either conflict at merge time or silently double-apply.

Fix: remove the open.go changes from #807 entirely and declare a base-on dependency on #811. #807 should then touch only exec.go + exec_test.go (plus the open_test.go line-count churn that tests the exec-adjacent helpers).

2. tamper_alert.go in both node-agent #809 and #808

pkg/objectcache/containerprofilecache/tamper_alert.go appears in #809's file list and also in the #810 umbrella's representation of #808. If both #808 and #809 land independently, the second to merge will conflict on tamper_alert.go.

Fix: decide ownership. tamper_alert.go is logically part of the detection pipeline (#808's R1016 concern), not the signing binary (#809's cmd/sign-object concern). Move it to #808 exclusively and have #809 import whatever it needs from there; or if it genuinely belongs with the cosign adapter, move it out of #808.
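Both gaps above come down to the same check: two PRs that are meant to merge independently list the same file. A small sketch of that check follows, assuming the file lists have already been fetched (for example from the `files` field of `gh pr view --json`). The `overlap` helper is hypothetical, and the lists are shortened to the base names mentioned in this comment rather than the full PR file lists.

```go
package main

import (
	"fmt"
	"sort"
)

// overlap returns the files that appear in both lists, sorted and
// without duplicates.
func overlap(a, b []string) []string {
	inA := make(map[string]struct{}, len(a))
	for _, f := range a {
		inA[f] = struct{}{}
	}
	seen := map[string]struct{}{}
	var shared []string
	for _, f := range b {
		if _, ok := inA[f]; !ok {
			continue
		}
		if _, dup := seen[f]; dup {
			continue
		}
		seen[f] = struct{}{}
		shared = append(shared, f)
	}
	sort.Strings(shared)
	return shared
}

func main() {
	// Illustrative lists only: base names taken from the comment above,
	// not the complete file lists of either PR.
	pr811 := []string{"ap.go", "open.go"}
	pr807 := []string{"exec.go", "exec_test.go", "open.go", "open_test.go"}
	fmt.Println(overlap(pr811, pr807))
}
```

Any non-empty result between PRs that are not linked by a declared dependency is a candidate coupling gap worth a closer look.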


Updated Merge Order (incorporating #811→#807 dependency)

graph TD
    classDef tier0 fill:rgba(72,187,120,0.5)
    classDef tier2 fill:rgba(66,153,225,0.5)
    classDef tier3 fill:rgba(237,137,54,0.5)
    classDef tier4 fill:rgba(245,101,101,0.5)
    classDef umbrella fill:rgba(159,122,234,0.5)

    s323["storage `#323`\npath-wildcards core"]:::tier0
    na811["node-agent `#811`\nwas_path_opened* CEL"]:::tier0
    s322["storage `#322`\nCompareExecArgs"]:::tier2
    na812["node-agent `#812`\nExecsByPath micro-PR"]:::tier2
    s325["storage `#325`\nCollapseConfig CRD"]:::tier2
    na805["node-agent `#805`\nresolveExecPath"]:::tier2
    na807["node-agent `#807`\nR0040 args consumer"]:::tier2
    s324["storage `#324`\nnetworkmatch"]:::tier3
    na806["node-agent `#806`\nnetwork wildcards CEL"]:::tier3
    na808["node-agent `#808`\nprojection overlay + fanout + NN"]:::tier4
    na809["node-agent `#809`\ncosign + tamper detection"]:::tier4
    s326["storage `#326`\numbrella"]:::umbrella
    na810["node-agent `#810`\numbrella"]:::umbrella

    s323 --> s322
    s323 --> s325
    s322 --> na805
    s322 --> na807
    na812 --> na807
    na811 --> na807
    s324 --> na806
    s325 --> na808
    na807 --> na810
    na806 --> na810
    na808 --> na810
    na809 --> na810
    s322 --> s326
    s325 --> s326
    s324 --> s326
    na811 --> na810
    na812 --> na810
    na805 --> na810

The only change from the previous graph: node-agent #811 → #807 is now an explicit dependency edge.
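For readers who want a linear sequence rather than a graph, the edges can be fed to a topological sort. The sketch below uses Kahn's algorithm with alphabetical tie-breaking; the edge list is copied from the graph, while `mergeOrder` and the short node IDs as Go strings are illustrative choices. It yields one valid order among several, not a prescribed one.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edges lists "merge first -> merge later" pairs copied from the graph.
var edges = [][2]string{
	{"s323", "s322"}, {"s323", "s325"}, {"s322", "na805"}, {"s322", "na807"},
	{"na812", "na807"}, {"na811", "na807"}, {"s324", "na806"}, {"s325", "na808"},
	{"na807", "na810"}, {"na806", "na810"}, {"na808", "na810"}, {"na809", "na810"},
	{"s322", "s326"}, {"s325", "s326"}, {"s324", "s326"},
	{"na811", "na810"}, {"na812", "na810"}, {"na805", "na810"},
}

// mergeOrder returns one valid merge order using Kahn's algorithm.
// Ties are broken alphabetically so the result is deterministic.
func mergeOrder(edges [][2]string) ([]string, error) {
	indeg := map[string]int{}
	next := map[string][]string{}
	for _, e := range edges {
		if _, ok := indeg[e[0]]; !ok {
			indeg[e[0]] = 0
		}
		indeg[e[1]]++
		next[e[0]] = append(next[e[0]], e[1])
	}
	var ready []string
	for n, d := range indeg {
		if d == 0 {
			ready = append(ready, n)
		}
	}
	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		n := ready[0]
		ready = ready[1:]
		order = append(order, n)
		for _, m := range next[n] {
			indeg[m]--
			if indeg[m] == 0 {
				ready = append(ready, m)
			}
		}
	}
	if len(order) != len(indeg) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := mergeOrder(edges)
	if err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

A cycle error from this function would indicate that two PRs each depend on the other, which is the kind of cross-PR coupling the split is meant to rule out.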


Quick Fixes Checklist for Claude

  1. #807: strip open.go hunk; rebase on top of #811; open.go should be #811-only.
  2. #808 vs #809: decide tamper_alert.go ownership; one PR owns it, the other imports or has no dependency on it.
  3. #808 still bundles R1016 + non-blocking fanout + NN resource — these three are independent of each other and of #807 now that #812 extracted ExecsByPath. If timeline pressure is low, splitting them reduces blast radius; if you want to keep them together (all "profile compaction" themed), add a short justification comment in the PR body noting the deliberate bundling.

🐇 ✅
