
feat: docs/prds/self-improving-feedback.md #101

Merged
jonit-dev merged 9 commits into master from night-watch/97-docs-prds-self-improving-feedback-md on May 5, 2026

Conversation

@jonit-dev
Owner

Closes #97

Night Watch manages this draft PR automatically so progress is preserved across retries and timeouts.

Status labels:

  • nw:partial: implementation is in progress and intentionally incomplete
  • nw:resumable: resume this PR before starting new work
  • nw:ready-review: implementation is complete and ready for review

@jonit-dev jonit-dev added the labels nw:partial (Executor started this PR and work is intentionally incomplete), nw:resumable (Executor should resume this unfinished PR before starting new work), and nw:ready-review (Executor finished implementation and the PR is ready for automated/human review), and removed the labels nw:partial and nw:resumable on May 4, 2026
@jonit-dev jonit-dev marked this pull request as ready for review May 4, 2026 01:33
@jonit-dev
Owner Author

🤖 Implemented by Codex


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop with outcome recording, pattern detection, and prompt augmentation. The implementation is well-structured with strong test coverage, but has some performance considerations and opportunities for improved observability.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording → pattern detection → prompt augmentation with proper database schema, migrations, and API routes.
  • Robust Secret Redaction: Multiple layers of secret detection and redaction across API keys, tokens, passwords, and nested metadata structures.
  • Strong Test Coverage: Well-organized test suites covering outcome parsing, pattern analysis, prompt augmentation, storage, and API routes with proper database cleanup.

⚠️ Areas for Improvement

  • Silent Failure Observability: The try-catch blocks that swallow errors (e.g., applyProjectFeedbackPromptEnv, recordJobOutcome) should log warnings at minimum to aid debugging without affecting exit behavior.
  • Pattern Analysis Query Overhead: analyzeFeedbackOutcome makes multiple sequential queryOutcomes calls to compute streaks; consider consolidating or caching recent outcomes per project/job type.
  • Config Proliferation: The feedback config is manually added to multiple form states (Settings, Scheduling, JobsTab); consider a centralized helper or schema-driven approach.
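
The silent-failure point above can be addressed without changing exit behavior. The sketch below is illustrative only: safeRecordOutcome and the log-message format are assumptions, not names from the codebase.

```typescript
// Minimal sketch: log swallowed errors as warnings instead of discarding
// them, while still never rethrowing so the command's exit code is unchanged.
function safeRecordOutcome(
  record: () => void,
  warn: (msg: string) => void = console.warn,
): void {
  try {
    record();
  } catch (err) {
    // Exit behavior is unchanged: the error is logged, never rethrown.
    const message = err instanceof Error ? err.message : String(err);
    warn(`feedback: outcome recording failed: ${message}`);
  }
}
```

Wrapping each recording call this way makes "feedback silently not working" diagnosable from the logs.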

🐛 Bugs Found

  • None found

📋 Issues Found

  • Silent Error Swallowing (Maintainability, Medium): packages/cli/src/commands/*.ts, packages/core/src/feedback/prompt-augmenter.ts. Errors caught in feedback recording are silently ignored, making it difficult to diagnose why feedback might not be working.
  • Multiple Streak Queries (Performance, Low): packages/core/src/feedback/pattern-analyzer.ts. countRecentStreaks and countSuccessStreak both query outcomes independently; could be optimized with a single query or in-memory caching.
  • Edge Case Coverage (Testing, Low): packages/core/src/__tests__/feedback/pattern-analyzer.test.ts. Missing tests for edge cases like maxActiveAugmentations: 0, pattern expiration, and confidence threshold boundaries.

🔚 Conclusion

This is a solid, well-architected feature addition with proper separation of concerns and comprehensive test coverage. The main follow-ups are adding observability for silent failures and considering query optimization for pattern analysis at scale. No blocking bugs found—merge-ready with minor improvements recommended.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 88/100

This PR implements a comprehensive self-improving feedback loop system with outcome persistence, pattern detection, and prompt augmentation. The implementation is thorough and well-structured with strong test coverage, but has a few minor consistency and maintainability concerns.


✅ Key Strengths

  • Comprehensive Feature Implementation: The feedback loop spans the full stack—outcome recording in all CLI commands, SQLite persistence, pattern analysis, prompt augmentation, server API routes, and a full dashboard UI component.
  • Strong Test Coverage: Includes targeted tests for outcome parsing, pattern detection, prompt augmentation, session outcome storage, feedback API routes, and dashboard UI behavior with realistic fixture data.
  • Security-Conscious Design: Implements robust secret redaction in both outcome text and metadata (API keys, tokens, private keys) before persistence, preventing sensitive data leakage.
  • Graceful Failure Handling: Outcome recording is consistently wrapped in try-catch blocks with explicit comments that persistence failures must not affect command exit behavior.

⚠️ Areas for Improvement

  • Duplicate Function Definition: applyProjectFeedbackPromptEnv is defined identically in both review.ts and run.ts. Extract to shared/feedback.ts alongside recordJobOutcome to avoid duplication and improve maintainability.
  • Missing Feedback Section in Web Settings Forms: toFormState and toAutomationForm include hardcoded feedback defaults, but JobsTab.tsx already has a proper UI for feedback configuration. Consider removing the hardcoded defaults and relying on the loaded config, or add tests that verify the feedback section renders correctly in Settings.
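
The suggested extraction into shared/feedback.ts could look like the sketch below. Only applyProjectFeedbackPromptEnv and recordJobOutcome are names from the review; the parameter shape and the NIGHT_WATCH_FEEDBACK_PROMPT variable are assumptions for illustration.

```typescript
// Sketch of the consolidation: one definition (exported from
// shared/feedback.ts in the real codebase) that both run.ts and
// review.ts would import instead of each keeping a private copy.
interface FeedbackPromptInput {
  promptSection?: string;
}

function applyProjectFeedbackPromptEnv(
  env: Record<string, string | undefined>,
  input: FeedbackPromptInput,
): Record<string, string | undefined> {
  if (!input.promptSection) return env;
  // Return a copy so a process.env-like object is never mutated in place.
  return { ...env, NIGHT_WATCH_FEEDBACK_PROMPT: input.promptSection };
}
```

With a single shared definition, a future change to the prompt-env logic cannot drift between the two commands.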

📋 Issues Found

  • Duplicate Feedback Function (Maintainability, Medium): commands/review.ts, commands/run.ts. applyProjectFeedbackPromptEnv is duplicated across two files; should be centralized.
  • Config Form Coverage Gap (Testing, Low): pages/Settings.tsx. Feedback config fields in JobsTab UI are not covered by existing Settings test fixtures.

🔚 Conclusion

This is a strong, well-designed PR with comprehensive cross-cutting implementation and appropriate safeguards. The feedback loop architecture is sound and the secret redaction is thorough. Addressing the duplicated function is a straightforward refactor that would improve maintainability. Ready for merge with minor cleanup.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has a few correctness and performance concerns to address.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording through pattern detection to prompt augmentation across all job types.
  • Robust Secret Redaction: Multiple regex patterns handle various secret formats (API keys, tokens, private keys) before persisting to database.
  • Well-Structured Test Coverage: Tests for outcome parsing, pattern analysis, repository behavior, API routes, and UI components with proper isolation.
  • Graceful Degradation: Outcome persistence wrapped in try-catch with explicit comment that failures must not change command exit behavior.

⚠️ Areas for Improvement

  • Database State in Test Setup: The test at run.test.ts:496-540 manually calls closeDb() and resetRepositories() at the start of the test case, but the afterEach hook also calls these. Consider whether this double-reset is intentional or could mask initialization issues.
  • Redundant Config Loading: applyProjectFeedbackPromptEnv in run.ts:350 and review.ts:144 calls loadConfig(projectDir) even though the config was already loaded earlier in each command. This could cause issues if config files change mid-execution.
  • Missing Logging for Silent Failures: All outcome recording errors are silently caught with empty catch blocks. Consider adding debug-level logging to aid troubleshooting when outcome persistence fails.

🐛 Bugs Found

  • Missing Defaults for Partial Feedback Config (Confidence: Medium 🟡): packages/cli/src/commands/shared/feedback.ts:36-42. If config.feedback is defined but has missing properties, getFeedbackAnalysisOptions uses config.feedback ?? {...}, which only provides defaults when feedback is entirely undefined, not for partial configs.
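
The suspected fix is to merge defaults field by field rather than replacing the whole object. The option names and default values below are illustrative, not taken from the source.

```typescript
// Sketch: spread-based per-field defaults. `feedback ?? FEEDBACK_DEFAULTS`
// ignores the defaults whenever feedback is defined at all; spreading
// merges them so a partial config still gets the missing fields.
// (Fields explicitly set to undefined would still win; omit them instead.)
interface FeedbackAnalysisOptions {
  minOccurrences?: number;
  confidenceThreshold?: number;
  maxActiveAugmentations?: number;
}

const FEEDBACK_DEFAULTS: Required<FeedbackAnalysisOptions> = {
  minOccurrences: 3,
  confidenceThreshold: 0.6,
  maxActiveAugmentations: 5,
};

function getFeedbackAnalysisOptions(
  feedback?: FeedbackAnalysisOptions,
): Required<FeedbackAnalysisOptions> {
  return { ...FEEDBACK_DEFAULTS, ...feedback };
}
```

A config that sets only one field then inherits the rest of the defaults instead of dropping them.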

📋 Issues Found

  • Multiple Sequential Queries on Outcome (Performance, Medium): packages/core/src/feedback/pattern-analyzer.ts. analyzeFeedbackOutcome executes multiple sequential queries (listPatterns, queryOutcomes for streak counting, listActiveAugmentations) which could increase latency at scale.
  • Test Double-Resets Database (Testing, Low): packages/cli/src/__tests__/commands/run.test.ts:498-500. Test manually resets database state before the test body, but afterEach already does this; could indicate test isolation concerns.
  • Duplicated Outcome Recording Logic (Maintainability, Low): Multiple command files. While recordJobOutcome exists, some commands (review, run) use inline repository calls instead of the shared helper.
  • Potential Timing Attack on Signature Comparison (Security, Low): packages/core/src/feedback/outcome-parser.ts. Failure signatures are compared directly for pattern matching; if signatures ever include sensitive data (unlikely given the redaction), timing attacks could theoretically apply.

🔚 Conclusion

This is a strong implementation of a complex feature with good architecture and comprehensive test coverage. The main concern is the potential bug with partial feedback config handling, which should be verified. The performance concerns are manageable at current scale but worth tracking. Recommend addressing the config defaults issue before merge.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has some code duplication and architectural concerns that should be addressed.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording → pattern detection → prompt augmentation → dashboard visualization, covering all major job types.
  • Strong Test Coverage: Well-structured tests across outcome parsing, pattern analysis, repository operations, API routes, and UI components with clear test scenarios.
  • Security-Conscious Design: Automatic secret redaction for API keys, tokens, and credentials in both outcome parsing and repository layers.

⚠️ Areas for Improvement

  • Duplicate Function Definitions: applyProjectFeedbackPromptEnv is defined identically in both run.ts and review.ts. This should be consolidated into shared/feedback.ts to avoid maintenance drift.
  • Config Reload Overhead: The applyProjectFeedbackPromptEnv function calls loadConfig(projectDir) internally, reloading config from disk instead of accepting it as a parameter. This loses any CLI overrides applied to the config and adds unnecessary I/O.
  • Secret Redaction Duplication: Secret redaction logic exists in both outcome-parser.ts and session-outcome.repository.ts with slightly different implementations. Should consolidate into a shared utility.
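
The redaction consolidation could be a small shared utility that both files import. The patterns below are illustrative stand-ins for the real SECRET_TEXT_PATTERNS list, not copies of it.

```typescript
// Sketch of a single shared redaction module replacing the two slightly
// different implementations in outcome-parser.ts and
// session-outcome.repository.ts.
const SECRET_TEXT_PATTERNS: RegExp[] = [
  /(api[_-]?key\s*[:=]\s*)\S+/gi,
  /((?:access|auth|bearer)?[_-]?token\s*[:=]\s*)\S+/gi,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

function redactSecrets(text: string): string {
  let out = text;
  for (const pattern of SECRET_TEXT_PATTERNS) {
    // Patterns with a capture group keep the "key=" prefix readable;
    // groupless patterns (the private-key block) are replaced wholesale.
    out = out.replace(pattern, (_match, group) =>
      typeof group === "string" ? `${group}[REDACTED]` : "[REDACTED]",
    );
  }
  return out;
}
```

One shared list means a new secret format only ever needs to be added in one place.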

🐛 Bugs Found

  • CLI Overrides Ignored in Feedback Prompt (Confidence: Medium 🟡): packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts. applyProjectFeedbackPromptEnv reloads config via loadConfig(projectDir), discarding any CLI overrides that were applied earlier in the command handler. This could cause feedback prompts to use stale configuration values.
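
The fix suggested for this bug is to thread the already-loaded config (with CLI overrides applied) into the helper instead of re-reading from disk. NightWatchConfig and the feedback.promptHeader field below are assumptions for illustration.

```typescript
// Sketch of the signature change. Before (simplified), the helper
// reloaded config itself, losing overrides:
//   function applyProjectFeedbackPromptEnv(projectDir: string) {
//     const config = loadConfig(projectDir); // stale: no CLI overrides
//     ...
//   }
// After: a pure function over the config the command already holds.
interface NightWatchConfig {
  feedback?: { enabled?: boolean; promptHeader?: string };
}

function feedbackPromptFromConfig(config: NightWatchConfig): string | undefined {
  if (config.feedback?.enabled === false) return undefined;
  return config.feedback?.promptHeader;
}
```

Besides fixing the override bug, this removes a disk read from every command invocation.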

📋 Issues Found

  • Duplicated Prompt Env Function (Maintainability, Medium): packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts. Identical applyProjectFeedbackPromptEnv function defined in two places.
  • Repeated Database Queries (Performance, Low): packages/server/src/routes/feedback.routes.ts. buildWindowSummary queries outcomes multiple times per window; could be consolidated into fewer queries.
  • Hardcoded Magic Numbers (Maintainability, Low): packages/core/src/feedback/pattern-analyzer.ts. Constants like RECENT_WINDOW_MS, STALE_WINDOW_MS are hardcoded without configuration options.

🔚 Conclusion

This is a strong implementation of a complex feedback loop feature. The code is well-structured and thoroughly tested. The main concerns are code duplication (especially the applyProjectFeedbackPromptEnv function) and the config reload issue which could lead to subtle bugs with CLI overrides. These should be addressed before merge, but the overall architecture is solid.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with structured outcome storage, pattern detection, and prompt augmentation. The implementation is thorough and well-architected, with a few areas for improvement around code duplication and performance considerations.


✅ Key Strengths

  • Comprehensive Secret Redaction: Multiple layers of protection covering API keys, tokens, and auth headers in both outcome-parser.ts and session-outcome.repository.ts, ensuring sensitive data never persists.
  • Clean Architecture: Well-organized module structure with clear separation between outcome parsing, pattern analysis, and prompt augmentation layers. Repository pattern with dependency injection is properly implemented.
  • Resilient Error Handling: All feedback recording operations are wrapped in try-catch blocks with explicit comments that outcome persistence must not change command exit behavior.
  • Thorough Test Coverage: Good integration tests covering outcome storage, pattern activation, prompt rendering, and API routes with proper database setup/teardown.

⚠️ Areas for Improvement

  • Code Duplication: applyProjectFeedbackPromptEnv is duplicated between run.ts and review.ts. This should be consolidated into shared/feedback.ts alongside recordJobOutcome.
  • Synchronous Pattern Analysis: Pattern detection runs synchronously after each outcome is recorded via analyzeFeedbackOutcome(), which could add latency to high-frequency job completions. Consider debouncing or making async.
  • Magic Numbers Without Context: Constants like MAX_PATTERN_TEXT_LENGTH = 180 and RECENT_WINDOW_MS lack comments explaining their derivation. These thresholds impact pattern quality and should be documented.
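
Deferring pattern analysis off the job-completion hot path, as suggested above, could be as simple as a serial queue, assuming analyzeFeedbackOutcome can run asynchronously. The class below is a minimal sketch, not the project's implementation.

```typescript
// Sketch: callers enqueue analysis work and return immediately; tasks
// run one at a time in order, and a failed analysis is non-fatal.
type AnalysisTask = () => Promise<void>;

class AnalysisQueue {
  private chain: Promise<void> = Promise.resolve();

  /** Enqueue a task; the returned promise settles once it has run. */
  enqueue(task: AnalysisTask): Promise<void> {
    this.chain = this.chain
      .then(task)
      // Swallow failures so one bad analysis never breaks the chain.
      .catch(() => undefined);
    return this.chain;
  }
}
```

High-frequency job completions then pay only the cost of a push, not of the queries and upserts themselves.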

🐛 Bugs Found

  • None found. No concrete bugs identified; the implementation handles edge cases well with proper try-catch wrapping and secret redaction.

📋 Issues Found

  • Duplicate Feedback Prompt Function (Maintainability, Medium): run.ts, review.ts. applyProjectFeedbackPromptEnv is copy-pasted between both command files instead of being shared.
  • Synchronous Pattern Analysis (Performance, Low): pattern-analyzer.ts, feedback.ts. analyzeFeedbackOutcome runs synchronously, querying recent outcomes and upserting patterns on every job completion.
  • Undocumented Threshold Constants (Maintainability, Low): pattern-analyzer.ts. Magic numbers like RECENT_WINDOW_MS, STALE_WINDOW_MS, and MAX_PATTERN_TEXT_LENGTH lack justification comments.
  • Missing Edge Case Tests (Testing, Low): feedback/*.test.ts. No tests for error scenarios like database locked, circular metadata, or malformed script results.

🔚 Conclusion

This is a strong implementation of a feedback loop system with excellent security practices and clean architecture. The main concern is code duplication that should be refactored before merge. Performance considerations around synchronous pattern analysis are minor but worth tracking for high-frequency deployments. No critical bugs found.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 85/100

This PR introduces a comprehensive self-improving feedback loop system that records job outcomes, detects repeated failure patterns, and generates prompt augmentations to improve future runs. The implementation is thorough with solid test coverage and appropriate database schema, though it has some maintainability and optimization opportunities.


✅ Key Strengths

  • Comprehensive Coverage: Outcome recording is consistently integrated across all 9 job types (executor, reviewer, qa, audit, analytics, planner, pr-resolver, merger, slicer) with proper error isolation.
  • Security-Conscious Design: Thorough secret redaction across multiple patterns (API keys, tokens, private keys) in both outcome-parser.ts and the repository layer, with tests verifying redaction.
  • Database Architecture: Proper SQLite schema with well-designed indexes (idx_session_outcomes_lookup, idx_feedback_patterns_lookup, idx_prompt_augmentations_active) supporting efficient time-based and status queries.
  • Failure Classification: Robust categorization logic in classifyFailure() handling 9 failure categories (typescript, eslint, test, ci, review-score, rate-limit, timeout, conflict, unknown) with signature normalization for pattern matching.

⚠️ Areas for Improvement

  • Duplicated Function Implementation: applyProjectFeedbackPromptEnv() is implemented identically in both run.ts and review.ts. Consolidate into shared/feedback.ts to prevent divergence and reduce code duplication.
  • Query Optimization: buildWindowSummary() in feedback.routes.ts executes separate querySummary() calls for each job type (8+ queries per request). Consider a single aggregate query with GROUP BY job_type to reduce database roundtrips.
  • Secret Pattern Duplication: SECRET_TEXT_PATTERNS regex arrays are defined independently in outcome-parser.ts and session-outcome.repository.ts. Extract to a shared module to ensure consistent redaction behavior.
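
The query consolidation suggested above would replace the per-job-type querySummary() calls with one aggregate query plus a fold back into the per-job-type map the route returns. Table and column names below follow the schema described in this review but are not verified against the source.

```typescript
// Hypothetical single GROUP BY query replacing 8+ sequential roundtrips.
const WINDOW_SUMMARY_SQL = `
  SELECT job_type, status, COUNT(*) AS count
  FROM session_outcomes
  WHERE created_at >= ?
  GROUP BY job_type, status
`;

interface SummaryRow {
  job_type: string;
  status: string;
  count: number;
}

// Fold the flat rows back into { jobType: { status: count } }.
function foldWindowSummary(
  rows: SummaryRow[],
): Record<string, Record<string, number>> {
  const summary: Record<string, Record<string, number>> = {};
  for (const row of rows) {
    (summary[row.job_type] ??= {})[row.status] = row.count;
  }
  return summary;
}
```

One roundtrip per window keeps the route's latency flat as the number of job types grows.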

📋 Issues Found

  • Duplicated Feedback Env Function (Maintainability, Medium): run.ts, review.ts. Same applyProjectFeedbackPromptEnv() implementation exists in both files with identical logic.
  • Sequential Job Type Queries (Performance, Medium): feedback.routes.ts, buildWindowSummary(). Runs querySummary() for each valid job type separately, creating 8+ sequential DB calls per window.
  • Duplicated Secret Patterns (Maintainability, Low): outcome-parser.ts, session-outcome.repository.ts. Same SECRET_TEXT_PATTERNS regex array defined in two locations, risking inconsistent updates.
  • Outcome Parser Edge Cases (Testing, Low): outcome-parser.test.ts. Missing tests for complex ANSI stripping, nested metadata redaction, and path extraction edge cases.

🔚 Conclusion

This is a strong, well-structured implementation of a sophisticated feedback loop feature. The core logic is sound with appropriate safeguards for secret handling and error isolation. The identified issues are maintainability and optimization concerns that should be addressed before long-term maintenance but do not block immediate merge. Consolidating the duplicated function and secret patterns would improve code health significantly.


Analyzed using z-ai/glm-5

Test User and others added 4 commits May 3, 2026 18:58
- Generated by Night Watch QA agent
- UI tests: 1 passing, 0 failing
- API tests: 3 passing, 0 failing
- Artifacts: screenshots, videos

Co-Authored-By: Claude <noreply@anthropic.com>
@jonit-dev
Owner Author

Night Watch QA Report

Changes Classification

  • Type: UI + API
  • Files changed: 52

Test Results

UI Tests (Playwright)

  • Status: All passing
  • Tests: 1 test in 1 file
  • Command: yarn --cwd web playwright test tests/e2e/qa/qa-feedback-dashboard.spec.ts --reporter=list

Screenshots

Feedback dashboard QA

Video Recording

Video artifact committed to qa-artifacts/qa-feedback-dashboard.webm — view in the PR's file changes.

API Tests

  • Status: All passing
  • Tests: 3 tests in 1 file
  • Command: yarn --cwd packages/server vitest run src/__tests__/server/feedback-validation.test.ts --reporter=verbose

Additional Verification

  • Commit hook ran turbo run verify: 5 successful tasks

Night Watch QA Agent

🧪 QA run by Codex

@jonit-dev jonit-dev added the labels e2e-validated (PR acceptance requirements validated by e2e/integration tests) and ready-to-merge (PR is ready to merge) on May 4, 2026
@jonit-dev jonit-dev merged commit 7f28a4f into master May 5, 2026
5 checks passed
@jonit-dev jonit-dev deleted the night-watch/97-docs-prds-self-improving-feedback-md branch May 5, 2026 05:25

Labels

  • e2e-validated: PR acceptance requirements validated by e2e/integration tests
  • nw:ready-review: Executor finished implementation and the PR is ready for automated/human review
  • ready-to-merge: PR is ready to merge


Development

Successfully merging this pull request may close these issues.

docs/prds/self-improving-feedback.md
