feat(doctor): add --watch mode for continuous diagnostics by itssagarK · Pull Request #63 · optiqor/kerno

itssagarK · 2026-05-13T07:07:54Z

This PR adds a --watch (-w) flag to the doctor command, transforming it from a one-shot diagnostic into a continuous monitoring engine with a live, "sticky" terminal UI and real-time finding diffs.

Fixes #28

Finding Fingerprinting: Implemented Fingerprint() in internal/doctor/finding.go to uniquely identify findings across cycles (Rule + Signal + Metric + Process).

Diff Engine: Created internal/doctor/diff.go to categorize findings as New (prefix +), Ongoing (with duration), or Resolved (prefix - with strikethrough). Resolved findings are kept visible for 30s before expiry.

Zero-Dependency UI: Implemented an ANSI-based terminal loop in internal/cli/doctor_watch.go to avoid adding heavy TUI dependencies like bubbletea, keeping the binary portable while still delivering a "sticky" live view.

System Pulse: Added a throughput header showing aggregate events/sec (syscalls, scheduling, disk, etc.) to confirm pipeline health.

Testing

N/A — pure docs/refactor
sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load
./scripts/verify.sh passes (or specific phase: ./scripts/verify.sh quality)

Checklist

PR title follows Conventional Commits (feat(scope): subject)
All commits are DCO-signed (git commit -s)
No unrelated changes pulled in
Documentation updated where user-visible behavior changed
Added/updated tests for new code paths
If a new doctor rule, paired with a chaos scenario in scripts/verify.sh

Signed-off-by: Sagar <sagarkapri321@gmail.com>

btwshivam

the patch doesn't compile. you used // ... unchanged comments as literal code and deleted real fields. rebase on main and re-apply only the new watch-mode additions, then make check.

btwshivam · 2026-05-13T12:21:52Z

-  # Enable AI analysis
-  sudo kerno doctor --ai`,
-		Args: cobra.NoArgs,
+		// ... (Use, Short, Long, Example unchanged)


this // ... (Use, Short, Long, Example unchanged) placeholder is a literal source-level deletion. you actually removed Use, Short, Long, Example, and Args: cobra.NoArgs from the cobra command. same anti-pattern at three more places in this file:

line ~74: // ... (rest of flags unchanged) deleted four flag registrations (--output, --ai, --no-ai, --quiet), leaving the useAI, noAI, quiet vars declared-but-unused (Go compile error).

inside runDoctor: // ... (initial logic unchanged) deleted thresholds, analyzer, opts.duration defaulting, so the next lines reference undefined names.

end of runDoctor: // ... (rest of function unchanged) followed by a } closes the function early, and for { ... } below ends up at file scope.

that's why Lint and Test are red. rebase on main and re-apply only the genuinely new lines: the watch field in the var block, the BoolVarP for --watch, the opts.watch field, and the if opts.watch { ... } branch inside runDoctor. then make check locally before pushing.

btwshivam · 2026-05-13T12:21:52Z

+// the "same" issue. It includes the rule, signal, metric, and process
+// (if any), but excludes the severity and current value.
+func (f *Finding) Fingerprint() string {
+	return fmt.Sprintf("%s|%s|%s|%s|%s", f.Rule, f.Signal, f.Metric, f.Process, f.Title)


including Title in the fingerprint is fragile. doctor rule titles often embed the current metric value, like "syscall p99 is 240ms" in one cycle vs "syscall p99 is 250ms" in the next. that flips an Ongoing finding into Resolved+New every tick, defeating the whole point of --watch.

key on rule + signal + process. drop Title and Metric (Metric also varies for the same reason):

func (f *Finding) Fingerprint() string {
	return fmt.Sprintf("%s|%s|%s", f.Rule, f.Signal, f.Process)
}
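
To make the difference concrete, the sketch below compares the two keyings on a pair of findings that differ only in the embedded metric value. The Finding struct here is a trimmed-down stand-in, and the rule, signal, and process values are made up for the example.

```go
package main

import "fmt"

// Finding is a trimmed-down stand-in for the real type in internal/doctor.
type Finding struct {
	Rule, Signal, Metric, Process, Title string
}

// Fingerprint uses the keying suggested in this comment: rule + signal + process.
func (f *Finding) Fingerprint() string {
	return fmt.Sprintf("%s|%s|%s", f.Rule, f.Signal, f.Process)
}

// fingerprintWithTitle reproduces the keying from the PR, for comparison only.
func fingerprintWithTitle(f *Finding) string {
	return fmt.Sprintf("%s|%s|%s|%s|%s", f.Rule, f.Signal, f.Metric, f.Process, f.Title)
}

func main() {
	// Hypothetical values: the same issue observed in two consecutive cycles.
	a := &Finding{Rule: "syscall_latency", Signal: "syscall", Process: "nginx", Title: "syscall p99 is 240ms"}
	b := &Finding{Rule: "syscall_latency", Signal: "syscall", Process: "nginx", Title: "syscall p99 is 250ms"}

	fmt.Println("stable keying matches:", a.Fingerprint() == b.Fingerprint())
	fmt.Println("title keying matches: ", fingerprintWithTitle(a) == fingerprintWithTitle(b))
}
```

With the stable keying the two cycles compare equal, so the finding stays Ongoing; with Title included they compare unequal.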

btwshivam · 2026-05-13T12:21:52Z

+		renderWatchFrame(os.Stdout, style, cycle, start, diffed, signals, false)
+
+		// 6. Wait for next interval.
+		waitDuration := opts.interval - time.Since(cycleStart)


opts.interval - time.Since(cycleStart) goes negative whenever the collection window exceeds the interval. with the defaults (duration 30s, watch interval 10s), waitDuration is -20s, so the wait is skipped and the next cycle starts immediately. effective cadence becomes duration, not interval.

user's mental model for kerno doctor --watch --interval 10s is "show me a fresh view every 10s". simplest fix: when --watch is set, use a short collection window (say 2s) and let interval drive the wall-clock cadence. or clamp opts.duration = min(opts.duration, opts.interval / 2) at the top of runDoctorWatch.

btwshivam · 2026-05-13T12:21:52Z

+	build collectorBuildResult,
+	opts doctorOpts,
+	logger *slog.Logger,
+) error {


issue #28 explicitly requires --watch --output json to emit NDJSON (one JSON object per cycle) so the watch loop can pipe into Loki / Logstash. this function ignores opts.output and always renders the terminal UI, so kerno doctor --watch --output json | jq just prints ANSI gibberish.

split at the top:

if opts.output == "json" {
	return runDoctorWatchJSON(ctx, engine, build, opts, logger)
}
// existing terminal-UI loop below

the JSON loop is much simpler than the TTY loop. one json.NewEncoder(os.Stdout) and emit the report per cycle.

btwshivam · 2026-05-13T12:21:52Z

+// 3. If finding exists in prev but not in curr, it is StatusResolved.
+// 4. StatusResolved findings are kept for keepResolvedDuration (e.g. 30s)
+//    before being dropped.
+func Diff(prev []DiffedFinding, curr []Finding, keepResolvedDuration time.Duration) []DiffedFinding {


no tests for Diff(). it's the core of the feature and the most subtle piece of logic in the PR. add internal/doctor/diff_test.go with table-driven cases for:

new finding (not in prev)

ongoing finding (same fingerprint two cycles in a row, FirstSeen preserved)

resolved finding within keep window (kept in output with StatusResolved)

resolved finding past keep window (dropped from output)

these are roughly 40 lines of test code and they catch the Title-in-fingerprint issue immediately.

feat(doctor): add --watch mode for continuous diagnostics

d9c9f33

Signed-off-by: Sagar <sagarkapri321@gmail.com>

itssagarK requested a review from btwshivam as a code owner May 13, 2026 07:07

github-actions Bot added the area/doctor Diagnostic engine and rules label May 13, 2026

btwshivam requested changes May 13, 2026

itssagarK wants to merge 1 commit into optiqor:main from itssagarK:feat/watch-mode


2 participants
