Skip to content

CRD-driven CEL admission rules with async validator#374

Open
slashben wants to merge 15 commits into
mainfrom
feature/cel-admission-rules
Open

CRD-driven CEL admission rules with async validator#374
slashben wants to merge 15 commits into
mainfrom
feature/cel-admission-rules

Conversation

@slashben
Copy link
Copy Markdown
Contributor

@slashben slashben commented May 20, 2026

Implements designs-and-proposals#3 (CRD-driven CEL admission rules) (merged).

Summary

Replaces the operator's hardcoded admission rules (R2000 Exec, R2001 Port-Forward) with CRD-defined CEL rules. Same Rules CRD schema used by the node-agent for eBPF events, extended with a new k8s-admission event type.

What's in this PR

Core pipeline

  • admission/cel/AdmissionCEL engine + AdmissionCelEvent adapter (exposes event.Kind, event.UserInfo, etc. to CEL; object/oldObject/options as top-level variables due to ext.NativeTypes() map limitation).
  • admission/rules/cel/CelRuleCreator and CelRuleEvaluator replace the legacy Go rule descriptors and factory.
  • admission/ruleswatcher/ — Dynamic informer over Rules CRDs; filters for enabled rules with at least one k8s-admission expression, calls CelRuleCreator.SyncRules.
  • admission/rulebinding/cache/ — Now takes the RuleCreator as a constructor argument and exposes RefreshRules() for the watcher to invoke.

Performance and safety (per the design doc)

  • Self-pod short-circuit — Validator parses the unverified sub claim from the projected ServiceAccount token at startup and drops any admission request whose UserInfo.Name matches. Hard guarantee against feedback loops from the operator's own writes. Falls back gracefully if the token isn't readable.
  • Kind-of-interest pre-filterCelRuleCreator extracts event.Kind == "X" patterns from loaded expressions at SyncRules time. Validator skips events for Kinds no rule could match. Conservative — falls back to wildcard for expressions with || or no Kind pin.
  • Async validator with bounded worker poolValidate() enqueues onto a 1000-slot buffered channel and returns immediately; 10 workers drain it. The API server never waits on CEL. Queue-full → drop with throttled warning log + atomic counter (exposed via DropCount() for Prometheus).

Bug fix

  • admission/rulebinding/cache/ — Both the RBCache and RulesWatcher adaptors share the dynamic watcher event stream. The RBCache now filters non-RuntimeRuleAlertBinding events before parsing, fixing the "cannot convert int64 to string" spam previously seen after adding the RulesWatcher.

Deleted

  • admission/rules/v1/r2000_exec_to_pod.go (+ test)
  • admission/rules/v1/r2001_portforward.go (+ test)
  • admission/rules/v1/factory.go
  • admission/rules/v1/rule.go

Kept: helpers.go (K8s enrichment functions), failureobject.go (GenericRuleFailure), rule_interface.go.

Validation

End-to-end on a kind cluster with the locally built image:

  • kubectl exec → R2000 fires, alert in operator logs with K8s enrichment populated
  • kubectl port-forward → R2001 fires
  • kubectl apply NetworkPolicy → R2002 fires (additional rule from the kind test), event.UserInfo.Username resolves correctly in the message expression
  • Pod CREATE with no matching rule → no alert (negative test)
  • Operator self-generated request → short-circuited
  • 309 unit tests pass

Dependencies

Dep Status
armoapi-go EventTypeK8sAdmission constant ✅ merged, v0.0.714
rulelibrary R2000/R2001 as CEL YAML TODO — separate PR
helm-charts updates (RBAC, webhook config, image bump) TODO — separate PR, marked DO NOT MERGE

Test plan

  • Unit tests: 309 pass on go test -race ./...
  • Kind-cluster e2e: exec, port-forward, NetworkPolicy creates, negative pod-create, self-request short-circuit
  • Reviewer feedback on the design doc and architecture

Summary by CodeRabbit

Release Notes

  • New Features
    • CEL-based admission rules system enabling dynamic rule definition and evaluation via Kubernetes Custom Resources.
    • Asynchronous admission event processing with worker pool pipeline for improved performance and responsiveness.
    • Flexible expression-based rules for detecting pod exec, port-forward, and other admission events.

Review Change Stack

slashben added 12 commits May 18, 2026 23:37
Introduce AdmissionCelEvent and AdmissionCelUserInfo structs that wrap
admission.Attributes into plain Go structs suitable for CEL evaluation.
NewAdmissionCelEvent extracts kind, name, namespace, operation, user
info, and unstructured object/oldObject/options maps from the attributes.

Docs-exempt: new internal package, no user-facing behavior change yet
Signed-off-by: Ben <ben@armosec.io>
CEL evaluation engine for admission webhook events. Uses cel-go native
types to register AdmissionCelEvent/AdmissionCelUserInfo, with
object/oldObject/options as top-level map variables (cel-go NativeTypes
does not support map[string]interface{} struct fields). Includes
RWMutex-protected program cache, expression compilation with nil-caching
for compile failures, and EvaluateRuleWithContext that implements AND
semantics with short-circuit and event-type filtering.

Docs-exempt: new internal package, no user-facing behavior change yet
Signed-off-by: Ben <ben@armosec.io>
Implements watcher.Adaptor that watches the Rules CRD, extracts rules
with k8s-admission expressions, and syncs them to the CelRuleCreator via
the RuleSyncer interface. Optionally calls RBCacheRefresher after sync.

Adds docs/features/cel-admission-rules.md documenting the full CEL
admission rules architecture.
Signed-off-by: Ben <ben@armosec.io>
…only

- NewCache now accepts a rules.RuleCreator parameter instead of hardcoding rulesv1.NewRuleCreator()
- Added RefreshRules() method to RBCache for rebuilding rule mappings when rules change
- Removed the rulesv1 import from cache package (no longer needed)
- Validator now logs all matching rules and sends alerts without blocking requests
- Updated cache_test.go to pass RuleCreatorMock to NewCache
Signed-off-by: Ben <ben@armosec.io>
Wire AdmissionCEL, CelRuleCreator, and RulesWatcher into main.go so the
admission controller now loads rules from CRDs at runtime. Delete the
hardcoded R2000/R2001 rule files and factory; fix CelRuleCreator to
satisfy the rules.RuleCreator interface with correct return types.
Signed-off-by: Ben <ben@armosec.io>
The dynamic watcher dispatches every event from every watched GVR to every
registered adaptor. After RulesWatcher was added alongside RBCache, the
RBCache started receiving Rules CRD events and tried to parse them as
RuntimeAlertRuleBinding, producing 'cannot convert int64 to string' errors
(the Rules CRD has integer severity; the binding has string severity).

Add an isRuleBinding kind check at the entry of each handler so RBCache
ignores cross-talk from other watched resources.

Signed-off-by: Ben <ben@armosec.io>
Before any rule evaluation, the validator now drops requests whose
UserInfo.Name matches the operator's own ServiceAccount subject. This is
a hard guarantee against positive feedback loops: the operator may write
to the API (status updates, leader election, alert exports), and we never
want those writes to re-enter the rule pipeline.

The subject is read at construction time by parsing the unverified 'sub'
claim from the projected ServiceAccount token at the standard mount path.
If reading fails (e.g. running outside a pod), the validator logs a
warning and continues without the short-circuit — degraded but functional.

Signed-off-by: Ben <ben@armosec.io>
Add a KindFilter to CelRuleCreator built from the loaded rule expressions
at SyncRules time. The validator consults the creator (via a small
KindAcceptor interface) before any evaluation work and drops events whose
Kind no rule could possibly match.

The filter is conservative: expressions with || or no event.Kind ==
constraint collapse the filter to wildcard mode (accept everything),
preserving correctness at the cost of skipping the optimization. Only
expressions tagged EventType k8s-admission are inspected; other event
types are ignored entirely by this operator.

Signed-off-by: Ben <ben@armosec.io>
The validator's Validate() now snapshots the request and enqueues onto a
buffered channel, returning nil immediately so the API server never waits
on CEL evaluation. A pool of worker goroutines (10 by default, 1000-slot
queue) drains the channel and runs the full match + alert pipeline
out-of-band.

When the queue is full the request is dropped and an atomic counter is
incremented; drops are logged at exponentially throttled intervals to
avoid flooding the operator log under burst load. The counter is exposed
via DropCount() so it can be surfaced as a Prometheus metric in a follow-up.

Workers are bound to the webhook's serverContext and exit on cancellation.
Tests cover: enqueue+process, drop-on-full, context-cancel shutdown, and
short-circuit drops (self-pod, kind filter, nil object) staying out of
the queue entirely.

Signed-off-by: Ben <ben@armosec.io>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR replaces hardcoded admission rules with a CEL-driven, CRD-backed evaluation system. It introduces a CEL expression engine, dynamic rule loading via CRD watchers, and shifts webhook validation to an async worker pipeline with self-subject detection and kind-based pre-filtering.

Changes

CEL Admission Rules

Layer / File(s) Summary
CEL Engine foundation and event modeling
admission/cel/cel.go, admission/cel/event.go, admission/cel/event_test.go, admission/cel/cel_test.go, admission/cel/integration_test.go
AdmissionCEL encapsulates a CEL environment and thread-safe program cache. AdmissionCelEvent and AdmissionCelUserInfo model admission requests and user identity. NewAdmissionCelEvent translates Kubernetes admission attributes to CEL-evaluable events. Tests validate expression evaluation, caching behavior, and end-to-end rule matching for PodExecOptions and generic Pod events.
Kind filter optimization
admission/rules/cel/kindfilter.go, admission/rules/cel/kindfilter_test.go
KindFilter statically extracts Kubernetes admission event kinds from CEL rule expressions via regex matching and disjunction detection. Enables fast pre-filtering before expensive CEL evaluation. Falls back to wildcard mode when expressions are unresolvable.
Rule creator and per-event evaluator
admission/rules/cel/creator.go, admission/rules/cel/creator_test.go, admission/rules/cel/evaluator.go, admission/rules/cel/evaluator_test.go
CelRuleCreator syncs runtime rules from CRDs, maintains a KindFilter, collects expression strings for cache management, and builds evaluators. CelRuleEvaluator processes admission events, evaluates match/message/unique-id expressions, and optionally enriches alerts with pod/container/node/workload metadata when a Kubernetes cache is available.
Rule binding cache and CRD watcher
admission/rulebinding/cache/cache.go, admission/rulebinding/cache/cache_test.go, admission/rulebinding/cache/helpers.go, admission/rulebinding/cache/helpers_test.go, admission/ruleswatcher/watcher.go, admission/ruleswatcher/watcher_test.go
Cache now accepts injected rule creator. RulesWatcher watches Rules CRDs, lists and filters enabled admission rules, extracts rule specs from CRD objects, and triggers syncs through the rule creator and cache refresh.
Webhook validator async worker pipeline
admission/webhook/validator.go, admission/webhook/validator_test.go
Validate now enqueues evaluation jobs instead of processing synchronously. Workers consume queue, evaluate rules in background, and export alerts via SendAdmissionAlert. Supports self-subject short-circuit for operator requests, kind-based pre-filter, and atomic drop counting with throttled warnings when queue is full.
Self-identity detection and main integration
admission/webhook/selfidentity.go, admission/webhook/selfidentity_test.go, main.go
New helpers extract operator's JWT sub claim from projected service account token for self-request detection. main.go wires CEL engine, rule creator, rules watcher, and async validator together; starts workers and integrates with webhook server lifecycle.
Documentation and dependency updates
docs/features/cel-admission-rules.md, go.mod, admission/exporter/http_exporter_test.go
Feature documentation describes CEL variables, architecture, and CRD format. google/cel-go added as direct dependency. Test simplified to construct GenericRuleFailure directly instead of invoking removed hardcoded rule.
Removal of hardcoded rules
admission/rules/v1/factory.go, admission/rules/v1/r2000_exec_to_pod.go, admission/rules/v1/r2000_exec_to_pod_test.go, admission/rules/v1/r2001_portforward.go, admission/rules/v1/r2001_portforward_test.go, admission/rules/v1/rule.go
Deleted old RuleCreatorImpl, hardcoded R2000/R2001 rule implementations, rule factory, and priority constants; now replaced by dynamic CRD-driven rules.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A CEL-spring blooms where rules once were carved,
CRDs now sing what hardcode had starved.
Workers dance async, filters pre-screen the tide,
Self-awareness guards the operator's pride.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main architectural change: replacing hardcoded admission rules with CRD-driven CEL rules and an asynchronous validator.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/cel-admission-rules

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Summary:

  • License scan: success
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

Copy link
Copy Markdown
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to explain to me how the async pipeline blocks admission, because now I understand that we never block any admission
ok by design we never block anything, this is only for alerting

Comment thread admission/webhook/validator.go
func NewCelRuleCreator(celEngine *admissioncel.AdmissionCEL) *CelRuleCreator {
return &CelRuleCreator{
celEngine: celEngine,
// Empty creator accepts nothing — the kind filter starts non-wildcard
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fine to drop all admissions until we have rules? maybe we should only accept HTTP connections after we have rules?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO - yes, I don't want to hold any line up. What if the user doesn't install rules? The whole design idea is that we treat admission webhooks as best-effort event sources

Comment thread admission/rules/cel/evaluator.go Outdated
}

// Enrich with K8s details when a cache is available.
if access != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enrichK8sDetails calls rulesv1.GetControllerDetails(attrs, clientset) which does a live clientset.CoreV1().Pods(namespace).Get(...), it only makes sense for a Pod or a Pod subresource.

Suggested change
if access != nil {
if access != nil && (attrs.GetResource().Resource == "pods" || attrs.GetKind().Kind == "PodExecOptions") {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!

Comment thread admission/cel/cel.go
// evaluation of the same expression avoids re-compilation.
type AdmissionCEL struct {
env *cel.Env
programCache map[string]cel.Program
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache is never cleared if some rules are dropped

for {
select {
case <-ctx.Done():
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably drain the jobs channel before exit, other jobs enqueued just before context cancellation will never get processed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right

Copy link
Copy Markdown
Contributor

@Bezbran Bezbran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see anything hurting my eyes.
At the beginning I wondered why we are not "blocking" but @matthyx asked and answered.

@matthyx matthyx moved this to Needs Reviewer in KS PRs tracking May 20, 2026
slashben added 3 commits May 24, 2026 09:03
enrichK8sDetails resolves a Pod via clientset.CoreV1().Pods(ns).Get(...)
through GetControllerDetails. That call only makes sense for Pod CRUD
and Pod subresources (exec, portforward, attach) — for other kinds
(NetworkPolicy, RoleBinding, Secret, ...) it either returns NotFound or
fetches an unrelated Pod that happens to share the request name.

Add a kind/resource gate so enrichment only runs when applicable. The
admission alert payload remains complete via buildAdmissionAlert; only
the Pod-derived RuntimeAlertK8sDetails fields are now omitted for
non-Pod events.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
The AdmissionCEL program cache was append-only — once an expression had
been compiled it stayed cached forever, even if no loaded rule still
referenced it. After enough SyncRules cycles (rules added, removed,
edited) the cache would grow without bound.

Add an AdmissionCEL.RetainOnly(activeExpressions) that drops cached
programs whose expression is not in the active set. CelRuleCreator.SyncRules
collects every expression string referenced by the new rule set (each
RuleExpression, plus per-rule Message and UniqueID templates) and calls
RetainOnly after the swap.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
Previously the worker's select picked between ctx.Done() and av.jobs,
returning immediately on cancel. Jobs enqueued just before cancellation
sat in the buffered channel forever.

On context cancellation each worker now switches to a non-blocking drain
pass that processes everything currently in av.jobs before exiting. Drain
uses a fresh context with a bounded timeout (10s) — the worker ctx is
already canceled and would short-circuit downstream API calls. If the
deadline is hit, remaining jobs are abandoned with a warning log
including the count.

Reviewed-by: matthyx (PR #374 line review)

Signed-off-by: Ben <ben@armosec.io>
@slashben
Copy link
Copy Markdown
Contributor Author

Pushed three fixes addressing @matthyx's review:

# Issue Commit
3 enrichK8sDetails only valid for Pod / pod subresources a2db526 — added enrichmentApplicable gate (Pod, PodExecOptions, PodPortForwardOptions, PodAttachOptions). Other kinds skip enrichment instead of fetching an unrelated Pod by name.
4 CEL program cache leaks for removed rules d3af050 — added AdmissionCEL.RetainOnly(activeExpressions). CelRuleCreator.SyncRules now collects every expression string (RuleExpression + Message + UniqueID) and prunes the cache after each swap.
5 Drain queued jobs on shutdown c2920a8 — on ctx.Done() the worker switches to a bounded (10s) non-blocking drain pass before exiting. If the drain deadline trips, remaining jobs are logged with a count and abandoned.

On the "block admissions until rules are loaded" question (creator.go:26) — I'd like to keep the current behavior. The operator is an alert-only audit path, the events are best-effort information, and gating admission on rule readiness would couple API server availability to operator startup. Cheaper to drop the cold-start events than to add a sync+blocking dance.

Tests for all three fixes are in this push. go test -race ./... is green (322 tests across 21 packages).

// compile and cache for the given rules: each RuleExpression, plus the
// per-rule Message and UniqueID templates.
func collectExpressions(rs []armotypes.RuntimeRule) []string {
out := make([]string, 0, len(rs)*3)
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
admission/exporter/http_exporter_test.go (1)

31-55: ⚡ Quick win

Test no longer validates SendAdmissionAlert behavior directly.

This currently re-implements internal assembly logic instead of exercising the production SendAdmissionAlert path, so regressions in that method can slip through. Please route this assertion through a real SendAdmissionAlert invocation (e.g., via an httptest receiver) and verify ClusterName/ClusterUID in the emitted alert.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/exporter/http_exporter_test.go` around lines 31 - 55, The test
currently reconstructs RuntimeAlert internals instead of exercising
SendAdmissionAlert; update the test to call exporter.SendAdmissionAlert through
an httptest receiver (e.g., start a test HTTP server that captures the posted
alert) using the same ruleFailure input, then assert that the emitted
RuntimeAlert contains exporter.ClusterName and exporter.ClusterUID (check fields
ClusterName and ClusterUID on the received RuntimeAlert.RuntimeAlertK8sDetails).
Ensure you remove the manual k8sDetails injection and use the real
SendAdmissionAlert call path and validate the received payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@admission/cel/cel.go`:
- Around line 184-189: The current getOrCreateProgram in AdmissionCEL treats a
cached nil cel.Program as a success (returns nil, nil), which hides prior
compile failures; change the caching to record compile errors explicitly (e.g.,
add a programErrCache map[string]error guarded by cacheMu) and when a lookup
finds an entry, check programErrCache[expression] first and return that error if
present; when compilation fails, store the error into
programErrCache[expression] (and a nil/absent program in programCache), and when
compilation succeeds, store the Program in programCache and clear any entry in
programErrCache so subsequent lookups return the real error or the real program
rather than silently returning nil.

In `@admission/rules/cel/evaluator.go`:
- Around line 47-55: The per-binding parameter map stored via
CelRuleEvaluator.SetParameters/GetParameters (e.parameters) is never added to
the CEL evaluation context; update the code that builds the CEL evaluation
context (the method that currently constructs it from admission attributes) to
merge e.parameters into that context so CEL expressions can reference those
parameters, or if merging conflicts occur namespace the parameters (e.g., under
"params") when injecting into the context; ensure SetParameters/GetParameters
remain consistent with the injected key(s).

In `@admission/ruleswatcher/watcher.go`:
- Around line 41-49: NewRulesWatcher currently allows a nil ruleSyncer which
causes syncAllRules to panic when calling w.ruleSyncer.SyncRules; fix by
guarding that dependency: in NewRulesWatcher check if ruleSyncer == nil and
replace it with a small no-op implementation of RuleSyncer (e.g., noopRuleSyncer
implementing SyncRules returning nil), or alternatively add a nil check at the
start of syncAllRules to return early if w.ruleSyncer is nil; update references
to the RulesWatcher.ruleSyncer field and ensure syncAllRules calls are safe.

In `@admission/webhook/validator.go`:
- Around line 223-229: The loop currently may pick a job even after ctx is
canceled; to fix, check cancellation before attempting to receive from av.jobs:
test ctx.Err() (or do a non-blocking select on <-ctx.Done()) immediately before
the select that reads from av.jobs, call av.drainJobs() and return if canceled,
then proceed to receive and call av.evaluate(ctx, job.attrs); reference
av.drainJobs, av.jobs, av.evaluate and ctx.Done to locate the code to change.

In `@docs/features/cel-admission-rules.md`:
- Around line 12-20: The fenced architecture block starting with "Rules CRD 
──watch──▶  RulesWatcher  ──sync──▶  CelRuleCreator" is unlabeled and triggers
markdownlint MD040; update the opening fence to include a language identifier
(e.g., ```text) so the block becomes "```text" followed by the existing ASCII
diagram and a closing "```", ensuring consistent rendering and lint compliance.

---

Nitpick comments:
In `@admission/exporter/http_exporter_test.go`:
- Around line 31-55: The test currently reconstructs RuntimeAlert internals
instead of exercising SendAdmissionAlert; update the test to call
exporter.SendAdmissionAlert through an httptest receiver (e.g., start a test
HTTP server that captures the posted alert) using the same ruleFailure input,
then assert that the emitted RuntimeAlert contains exporter.ClusterName and
exporter.ClusterUID (check fields ClusterName and ClusterUID on the received
RuntimeAlert.RuntimeAlertK8sDetails). Ensure you remove the manual k8sDetails
injection and use the real SendAdmissionAlert call path and validate the
received payload.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8ca0e1c-dda9-4123-a89c-e49b86efdeb7

📥 Commits

Reviewing files that changed from the base of the PR and between 37ad382 and c2920a8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (31)
  • admission/cel/cel.go
  • admission/cel/cel_test.go
  • admission/cel/event.go
  • admission/cel/event_test.go
  • admission/cel/integration_test.go
  • admission/exporter/http_exporter_test.go
  • admission/rulebinding/cache/cache.go
  • admission/rulebinding/cache/cache_test.go
  • admission/rulebinding/cache/helpers.go
  • admission/rulebinding/cache/helpers_test.go
  • admission/rules/cel/creator.go
  • admission/rules/cel/creator_test.go
  • admission/rules/cel/evaluator.go
  • admission/rules/cel/evaluator_test.go
  • admission/rules/cel/kindfilter.go
  • admission/rules/cel/kindfilter_test.go
  • admission/rules/v1/factory.go
  • admission/rules/v1/r2000_exec_to_pod.go
  • admission/rules/v1/r2000_exec_to_pod_test.go
  • admission/rules/v1/r2001_portforward.go
  • admission/rules/v1/r2001_portforward_test.go
  • admission/rules/v1/rule.go
  • admission/ruleswatcher/watcher.go
  • admission/ruleswatcher/watcher_test.go
  • admission/webhook/selfidentity.go
  • admission/webhook/selfidentity_test.go
  • admission/webhook/validator.go
  • admission/webhook/validator_test.go
  • docs/features/cel-admission-rules.md
  • go.mod
  • main.go
💤 Files with no reviewable changes (6)
  • admission/rules/v1/r2000_exec_to_pod_test.go
  • admission/rules/v1/rule.go
  • admission/rules/v1/factory.go
  • admission/rules/v1/r2001_portforward.go
  • admission/rules/v1/r2001_portforward_test.go
  • admission/rules/v1/r2000_exec_to_pod.go

Comment thread admission/cel/cel.go
Comment on lines +184 to +189
func (c *AdmissionCEL) getOrCreateProgram(expression string) (cel.Program, error) {
c.cacheMu.RLock()
if prog, exists := c.programCache[expression]; exists {
c.cacheMu.RUnlock()
return prog, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not downgrade cached compile failures to success on subsequent evaluations.

nil is used as a cache sentinel for compile failures, and later returned as a non-error path. After the first failure, the same invalid expression becomes silently ignored (false/empty string), which hides rule misconfiguration.

Suggested direction
- programCache map[string]cel.Program
+ type cachedProgram struct {
+   prog cel.Program
+   err  error
+ }
+ programCache map[string]cachedProgram
...
- if prog, exists := c.programCache[expression]; exists {
-   return prog, nil
+ if cp, exists := c.programCache[expression]; exists {
+   return cp.prog, cp.err
  }
...
- c.programCache[expression] = nil
- return nil, fmt.Errorf("compiling expression %q: %w", expression, issues.Err())
+ err := fmt.Errorf("compiling expression %q: %w", expression, issues.Err())
+ c.programCache[expression] = cachedProgram{err: err}
+ return nil, err

Also applies to: 204-213

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/cel/cel.go` around lines 184 - 189, The current getOrCreateProgram
in AdmissionCEL treats a cached nil cel.Program as a success (returns nil, nil),
which hides prior compile failures; change the caching to record compile errors
explicitly (e.g., add a programErrCache map[string]error guarded by cacheMu) and
when a lookup finds an entry, check programErrCache[expression] first and return
that error if present; when compilation fails, store the error into
programErrCache[expression] (and a nil/absent program in programCache), and when
compilation succeeds, store the Program in programCache and clear any entry in
programErrCache so subsequent lookups return the real error or the real program
rather than silently returning nil.

Comment on lines +47 to +55
// SetParameters stores per-binding parameter overrides.
func (e *CelRuleEvaluator) SetParameters(parameters map[string]interface{}) {
e.parameters = parameters
}

// GetParameters returns the per-binding parameter overrides.
func (e *CelRuleEvaluator) GetParameters() map[string]interface{} {
return e.parameters
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-binding parameters are never applied during CEL evaluation.

Line 47-55 stores per-binding overrides, but Line 65-74 builds evaluation context only from admission attributes. As written, binding parameters are inert, so parameterized rules won’t evaluate as intended. Please inject e.parameters into the CEL evaluation context (or remove/defer the parameter API until it is truly supported).

Also applies to: 65-74

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/rules/cel/evaluator.go` around lines 47 - 55, The per-binding
parameter map stored via CelRuleEvaluator.SetParameters/GetParameters
(e.parameters) is never added to the CEL evaluation context; update the code
that builds the CEL evaluation context (the method that currently constructs it
from admission attributes) to merge e.parameters into that context so CEL
expressions can reference those parameters, or if merging conflicts occur
namespace the parameters (e.g., under "params") when injecting into the context;
ensure SetParameters/GetParameters remain consistent with the injected key(s).

Comment on lines +41 to +49
func NewRulesWatcher(k8sClient k8sclient.K8sClientInterface, ruleSyncer RuleSyncer, cacheRefresher RBCacheRefresher) *RulesWatcher {
return &RulesWatcher{
k8sClient: k8sClient,
ruleSyncer: ruleSyncer,
cacheRefresher: cacheRefresher,
watchResources: []watcher.WatchResource{
watcher.NewWatchResource(typesv1.RuleGvr, metav1.ListOptions{}),
},
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil ruleSyncer to avoid startup/runtime panic.

syncAllRules unconditionally calls w.ruleSyncer.SyncRules(...). If wiring passes nil, this panics on first event/list sync.

Suggested fix
 func NewRulesWatcher(k8sClient k8sclient.K8sClientInterface, ruleSyncer RuleSyncer, cacheRefresher RBCacheRefresher) *RulesWatcher {
+	if ruleSyncer == nil {
+		panic("ruleswatcher: ruleSyncer must not be nil")
+	}
 	return &RulesWatcher{
 		k8sClient:      k8sClient,
 		ruleSyncer:     ruleSyncer,
 		cacheRefresher: cacheRefresher,
 		watchResources: []watcher.WatchResource{
 			watcher.NewWatchResource(typesv1.RuleGvr, metav1.ListOptions{}),
 		},
 	}
 }

Also applies to: 94-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/ruleswatcher/watcher.go` around lines 41 - 49, NewRulesWatcher
currently allows a nil ruleSyncer which causes syncAllRules to panic when
calling w.ruleSyncer.SyncRules; fix by guarding that dependency: in
NewRulesWatcher check if ruleSyncer == nil and replace it with a small no-op
implementation of RuleSyncer (e.g., noopRuleSyncer implementing SyncRules
returning nil), or alternatively add a nil check at the start of syncAllRules to
return early if w.ruleSyncer is nil; update references to the
RulesWatcher.ruleSyncer field and ensure syncAllRules calls are safe.

Comment on lines +223 to +229
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prioritize cancellation before consuming more jobs.

Once ctx.Done() is ready, this select may still pick job := <-av.jobs, evaluating with an already-canceled context and dropping work due to canceled downstream calls.

Suggested fix
 func (av *AdmissionValidator) runWorker(ctx context.Context) {
 	defer av.wg.Done()
 	for {
+		// Give cancellation priority so post-cancel jobs are handled by drainCtx.
+		if ctx.Err() != nil {
+			av.drainJobs()
+			return
+		}
 		select {
 		case <-ctx.Done():
 			av.drainJobs()
 			return
 		case job := <-av.jobs:
 			av.evaluate(ctx, job.attrs)
 		}
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}
func (av *AdmissionValidator) runWorker(ctx context.Context) {
defer av.wg.Done()
for {
// Give cancellation priority so post-cancel jobs are handled by drainCtx.
if ctx.Err() != nil {
av.drainJobs()
return
}
select {
case <-ctx.Done():
av.drainJobs()
return
case job := <-av.jobs:
av.evaluate(ctx, job.attrs)
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@admission/webhook/validator.go` around lines 223 - 229, The loop currently
may pick a job even after ctx is canceled; to fix, check cancellation before
attempting to receive from av.jobs: test ctx.Err() (or do a non-blocking select
on <-ctx.Done()) immediately before the select that reads from av.jobs, call
av.drainJobs() and return if canceled, then proceed to receive and call
av.evaluate(ctx, job.attrs); reference av.drainJobs, av.jobs, av.evaluate and
ctx.Done to locate the code to change.

Comment on lines +12 to +20
```
Rules CRD ──watch──▶ RulesWatcher ──sync──▶ CelRuleCreator
Admission Webhook ──▶ RBCache.ListRulesForObject ──▶ CelRuleEvaluator
AdmissionCEL.EvaluateRuleWithContext
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced architecture block.

Line 12 uses an unlabeled fenced block, which triggers markdownlint (MD040) and can reduce renderer/tooling consistency.

📝 Suggested diff
-```
+```text
 Rules CRD  ──watch──▶  RulesWatcher  ──sync──▶  CelRuleCreator
                                                       │
                                                       ▼
 Admission Webhook ──▶ RBCache.ListRulesForObject ──▶ CelRuleEvaluator
                                                       │
                                                       ▼
                                                AdmissionCEL.EvaluateRuleWithContext
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 12-12: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/features/cel-admission-rules.md around lines 12 - 20, The fenced
architecture block starting with "Rules CRD ──watch──▶ RulesWatcher ──sync──▶
CelRuleCreator" is unlabeled and triggers markdownlint MD040; update the opening
fence to include a language identifier (e.g., text) so the block becomes "text" followed by the existing ASCII diagram and a closing "```", ensuring
consistent rendering and lint compliance.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

@github-actions
Copy link
Copy Markdown

Summary:

  • License scan: success
  • Credentials scan: failure
  • Vulnerabilities scan: failure
  • Unit test: success
  • Go linting: failure

@slashben slashben requested a review from matthyx May 24, 2026 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Needs Reviewer

Development

Successfully merging this pull request may close these issues.

4 participants