Skip to content

feat: add kosli evaluate input subcommand#743

Merged
tooky merged 26 commits intomainfrom
evaluate-input
Apr 2, 2026
Merged

feat: add kosli evaluate input subcommand#743
tooky merged 26 commits intomainfrom
evaluate-input

Conversation

@tooky
Copy link
Copy Markdown
Contributor

@tooky tooky commented Mar 28, 2026

Summary

  • Adds kosli evaluate input — evaluate a local JSON file (or stdin) against a Rego policy, with no API dependency
  • Enables local policy development and testing without needing a running Kosli server
  • Input shape is opaque to the command — the policy defines what it expects

Prompted by this discussion where we realised kosli evaluate trail always hits the API, so there's no way to iterate on policies locally. Dan raised conftest as the alternative, but our tooling should support this natively — Rego is our language choice, and local testing should be turnkey.

In response to Alex's comparison of Rego vs pipeline controls — as more controls move into kosli evaluate, a fast local feedback loop becomes essential.

Usage

# evaluate a local JSON file against a policy:
kosli evaluate input \
    --input-file trail-data.json \
    --policy policy.rego

# read from stdin:
cat trail-data.json | kosli evaluate input --policy policy.rego

# inspect what the policy sees:
kosli evaluate input \
    --input-file trail-data.json \
    --policy policy.rego \
    --show-input --output json

Test plan

  • All new tests pass (go test -v -run TestEvaluateInputCommandTestSuite ./cmd/kosli/)
  • loadInput(io.Reader) unit tests pass
  • make lint clean
  • Manual: kosli evaluate input --help shows expected output
  • Manual: evaluate a saved --show-input JSON against a policy

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 28, 2026

Claude finished @tooky's task in 4m 27s —— View job


PR Review: kosli evaluate input subcommand

  • Run static analysis (build, vet, mod tidy)
  • Read changed files and review code
  • Post review feedback

Static Analysis

All three checks pass cleanly: go build ./..., go vet ./..., and go mod tidy with no diff. No issues here.

Code Review

Overall: This is a well-structured PR. The command follows established codebase patterns, the TDD slice approach is clearly documented in TODO.md, and the review feedback from earlier rounds has been addressed thoughtfully. A few observations:

What's good

  • Clean reuse of evaluateAndPrintResult() — no duplication of the evaluation/output pipeline.
  • Embedding commonEvaluateOptions with hidden flow/attestations flags is a pragmatic solution to share flag registration without exposing API-specific flags.
  • cmd.InOrStdin() makes stdin testable via cmd.SetIn(), and the test harness already sets root.SetIn(new(bytes.Buffer)) (testHelpers.go:66), so the empty-stdin test reliably hits loadInput rather than term.IsTerminal.
  • Good test coverage — 11 command-level test cases covering allow, deny, missing flags, invalid input, policy validation errors, JSON output, and show-input. Plus standalone loadInput unit tests.
  • Help text examples include the fixture-capture workflow (jq '.input'), directly addressing Alex's feedback about documenting how to produce the input file.

Issues to address

  1. invalid.rego test case missing — There are test fixtures for no-package-policy.rego, no-allow-rule.rego, and deny-no-violations.rego, but the existing invalid.rego (syntax error fixture) isn't exercised through evaluate input. A test case proving that a malformed Rego file produces a clear error would round out the policy validation coverage.

  2. evaluateTrail.go / evaluateTrails.go changes not shown in diff — The changed files list mentions +6/-1 in each, but the TODO mentions "Remove 'using OPA' from all evaluate command long descriptions". Worth confirming the OPA references were cleaned up consistently — I see evaluate.go is clean, but couldn't verify the trail commands' long descriptions changed correctly since the diff wasn't available against origin/main.

Minor / non-blocking

  1. loadInputFromFile close error handling (evaluateInput.go:105-108) — The named return with deferred close-error capture is correct but slightly unusual for this codebase. It works fine; just noting it's a more careful pattern than most of the surrounding code uses.

  2. Test name accuracy (evaluateInput_test.go:48) — "missing --input-file reads from stdin (empty stdin fails)" — this test actually exercises the loadInput error path (since bytes.Buffer isn't *os.File, the terminal check is skipped). The name could be slightly more precise, e.g., "missing --input-file with empty stdin fails to parse". Very minor.

  3. Alex's --data suggestion — Tracked externally per his comment, not a blocker for this PR. Worth a TODO.md entry if the team wants to pursue it.

Summary

Clean, well-tested addition that fills a real gap in the local policy development workflow. The code follows existing patterns, static analysis is clean, and the iterative review process addressed all prior feedback. The one actionable item is adding a test case for invalid.rego syntax errors through the evaluate input command path.

@tooky
Copy link
Copy Markdown
Contributor Author

tooky commented Mar 28, 2026

Addressed the review feedback in 841836c:

1. TODO.md duplicate — Removed the unchecked duplicate Slice 3 line.

2. Parent evaluate command description — Broadened the Short from "Evaluate Kosli trail data" to "Evaluate data" and rewrote the Long to mention both API-backed trail evaluation and local input evaluation. kosli evaluate --help now reads:

Use evaluate trail or evaluate trails to fetch data from Kosli and evaluate it.
Use evaluate input to evaluate a local JSON file or stdin without any API calls.

3. Flag registration duplication — Left as-is per the review (non-blocking). Three flags registered by hand is simpler than embedding commonEvaluateOptions and hiding flow/attestations.

4. --input-file in long desc — Left as-is. The flag is already documented in the flag help text and the "When --input-file is omitted" paragraph.

@AlexKantor87
Copy link
Copy Markdown
Contributor

From Alex — feedback on kosli evaluate input

This is a strong addition that directly addresses a gap we've been working around in the agentic SDLC demo. We run 10 control gates in CI, each calling kosli evaluate trails against Rego policies — and every policy iteration currently requires a live Kosli org with real trail data. Our kosli-evaluate skill already documents a workaround using raw opa eval, but having this native in the CLI is much better.

How we'd use this immediately:

  1. Local policy development — capture a fixture with evaluate trail --show-input, then iterate with evaluate input instead of needing API access and an OPA install
  2. CI optimization — fetch trail data once and evaluate all 10 control policies against the same JSON instead of 10 independent API round-trips
  3. Pipeline debugging — test policy fixes against captured data before pushing and re-running the full pipeline

A few suggestions:

  • Add a fixture capture example to the help text. The examples show --input-file trail-data.json but don't show how to produce that file. Something like: kosli evaluate trail TRAIL --policy allow-all.rego --flow FLOW --show-input --output json > trail-data.json would make the workflow self-documenting.

  • Clarify the --show-input wrapper. When you capture with --show-input --output json, the result has an outer structure with allow, violations, and input keys. Should evaluate input receive the full output, or just the .input portion? The examples would benefit from being explicit about this.

  • Consider --data as a follow-up slice. Since this is the offline path, it's a natural place to add external config support (budget thresholds, expected counts, etc.). We currently hardcode these values in Rego because kosli evaluate has no --data flag. Not a blocker for this PR, but worth tracking.

Implementation looks clean — good reuse of evaluateAndPrintResult(), solid test coverage, and the stdin piping support is a nice touch.

tooky added a commit that referenced this pull request Apr 1, 2026
OPA is an implementation detail. The other commands say 'Rego policy'
consistently — align all three evaluate subcommands to match.

Addresses Dan's review comment on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tooky added a commit that referenced this pull request Apr 1, 2026
Tests for missing `package policy`, missing `allow` rule, and
`allow = false` without a `violations` rule. Documents the expected
error messages and behaviour through `evaluate input`.

Addresses Dan's review comments on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tooky added a commit that referenced this pull request Apr 1, 2026
Show how to capture trail data with --show-input and extract .input
with jq for local policy iteration. Clarify that the input file should
contain the raw JSON object, not the --show-input wrapper.

Addresses Alex's review suggestions on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tooky tooky enabled auto-merge (squash) April 1, 2026 16:21
tooky and others added 7 commits April 1, 2026 17:29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tooky and others added 13 commits April 1, 2026 17:29
Avoid restating the short description; the second sentence now adds
new information (file or stdin) rather than repeating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
loadInputFromFile now delegates to loadInput after opening the file,
eliminating the duplicated JSON unmarshal logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove duplicate Slice 3 entry in TODO.md
- Broaden parent evaluate command description to cover both
  API-backed trail evaluation and local input evaluation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OPA is an implementation detail. The other commands say 'Rego policy'
consistently — align all three evaluate subcommands to match.

Addresses Dan's review comment on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for missing `package policy`, missing `allow` rule, and
`allow = false` without a `violations` rule. Documents the expected
error messages and behaviour through `evaluate input`.

Addresses Dan's review comments on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show how to capture trail data with --show-input and extract .input
with jq for local policy iteration. Clarify that the input file should
contain the raw JSON object, not the --show-input wrapper.

Addresses Alex's review suggestions on PR #743.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the os.Stdin global with an injected io.Reader parameter,
passed via Cobra's InOrStdin(). This makes the stdin path testable
with controlled input via cmd.SetIn().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove flag registration duplication by embedding the shared struct.
Hide flow and attestations flags that don't apply to evaluate input.
Move RequireFlags out of addFlags so each command specifies its own
required flags independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids the double-allocation from io.ReadAll + json.Unmarshal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
name: "missing --input-file reads from stdin (empty stdin fails)",
cmd: "evaluate input --policy testdata/policies/allow-all.rego",
goldenRegex: `failed to parse input:`,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test at line 48–51 ("missing --input-file reads from stdin (empty stdin fails)") — since the terminal detection was added in Slice 5, does this test still exercise the loadInput error path, or does it now hit the term.IsTerminal check first? In CI the stdin is likely not a terminal so it should still reach loadInput, but the test name and golden regex (failed to parse input) might not match the actual error in all environments.

Consider adding a comment noting this assumption, or explicitly using cmd.SetIn() with an empty bytes.Buffer to guarantee the loadInput path is tested regardless of CI environment.

When --input-file is omitted and stdin is a terminal, return a clear
error instead of silently blocking. Uses golang.org/x/term to detect
the terminal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +48 to +51
name: "missing --input-file reads from stdin (empty stdin fails)",
cmd: "evaluate input --policy testdata/policies/allow-all.rego",
goldenRegex: `failed to parse input:`,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This test implicitly relies on CI stdin not being a terminal. If CI ever allocates a PTY, it would hit the term.IsTerminal check instead of the loadInput error path. Consider explicitly providing an empty reader via a helper that calls cmd.SetIn(&bytes.Buffer{}) to guarantee this tests the loadInput path regardless of environment.

tooky and others added 2 commits April 1, 2026 17:36
go mod tidy resolved conflict markers left from rebase and updated
golang.org/x/term to v0.41.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set root.SetIn(new(bytes.Buffer)) in executeCommandC so tests get an
empty buffer as stdin rather than inheriting the process stdin. This
ensures the empty-stdin test exercises loadInput reliably regardless
of whether CI stdin is a terminal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
name: "deny without violations rule returns DENIED with no violation messages",
cmd: "evaluate input --input-file testdata/evaluate/trail-input.json --policy testdata/policies/deny-no-violations.rego",
goldenRegex: `RESULT:\s+DENIED`,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test case for invalid.rego (the syntax-error fixture that already exists in testdata/policies/). This would round out the policy validation coverage alongside the no-package-policy, no-allow-rule, and deny-no-violations cases:

{
    wantError:   true,
    name:        "syntactically invalid policy returns error",
    cmd:         "evaluate input --input-file testdata/evaluate/trail-input.json --policy testdata/policies/invalid.rego",
    goldenRegex: `failed to evaluate policy`,
},

(Adjust the expected regex to match the actual error from evaluate.Evaluate for malformed Rego.)

@tooky tooky merged commit 8687eaa into main Apr 2, 2026
17 checks passed
@tooky tooky deleted the evaluate-input branch April 2, 2026 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants