
feat: docs/prds/self-improving-feedback.md #101

Merged
jonit-dev merged 9 commits into master from night-watch/97-docs-prds-self-improving-feedback-md on May 5, 2026

Conversation

@jonit-dev
Owner

Closes #97

Night Watch manages this draft PR automatically so progress is preserved across retries and timeouts.

Status labels:

  • nw:partial: implementation is in progress and intentionally incomplete
  • nw:resumable: resume this PR before starting new work
  • nw:ready-review: implementation is complete and ready for review

@jonit-dev jonit-dev added the labels nw:partial (Executor started this PR and work is intentionally incomplete), nw:resumable (Executor should resume this unfinished PR before starting new work), and nw:ready-review (Executor finished implementation and the PR is ready for automated/human review), and removed the labels nw:partial and nw:resumable on May 4, 2026
@jonit-dev jonit-dev marked this pull request as ready for review May 4, 2026 01:33
@jonit-dev
Owner Author

🤖 Implemented by Codex


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop with outcome recording, pattern detection, and prompt augmentation. The implementation is well-structured with strong test coverage, but has some performance considerations and opportunities for improved observability.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording → pattern detection → prompt augmentation with proper database schema, migrations, and API routes.
  • Robust Secret Redaction: Multiple layers of secret detection and redaction across API keys, tokens, passwords, and nested metadata structures.
  • Strong Test Coverage: Well-organized test suites covering outcome parsing, pattern analysis, prompt augmentation, storage, and API routes with proper database cleanup.

⚠️ Areas for Improvement

  • Silent Failure Observability: The try-catch blocks that swallow errors (e.g., applyProjectFeedbackPromptEnv, recordJobOutcome) should log warnings at minimum to aid debugging without affecting exit behavior.
  • Pattern Analysis Query Overhead: analyzeFeedbackOutcome makes multiple sequential queryOutcomes calls to compute streaks; consider consolidating or caching recent outcomes per project/job type.
  • Config Proliferation: The feedback config is manually added to multiple form states (Settings, Scheduling, JobsTab); consider a centralized helper or schema-driven approach.
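
The silent-failure point above can be addressed without changing exit behavior. The sketch below is illustrative only: safeRecordOutcome and the log-message format are assumptions, not names from the codebase.

```typescript
// Minimal sketch: log swallowed errors as warnings instead of discarding
// them, while still never rethrowing so the command's exit code is unchanged.
function safeRecordOutcome(
  record: () => void,
  warn: (msg: string) => void = console.warn,
): void {
  try {
    record();
  } catch (err) {
    // Exit behavior is unchanged: the error is logged, never rethrown.
    const message = err instanceof Error ? err.message : String(err);
    warn(`feedback: outcome recording failed: ${message}`);
  }
}
```

Wrapping each recording call this way makes "feedback silently not working" diagnosable from the logs.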

🐛 Bugs Found

  • None found

📋 Issues Found

  • Silent Error Swallowing (Maintainability, Medium): packages/cli/src/commands/*.ts, packages/core/src/feedback/prompt-augmenter.ts. Errors caught in feedback recording are silently ignored, making it difficult to diagnose why feedback might not be working.
  • Multiple Streak Queries (Performance, Low): packages/core/src/feedback/pattern-analyzer.ts. countRecentStreaks and countSuccessStreak both query outcomes independently; could be optimized with a single query or in-memory caching.
  • Edge Case Coverage (Testing, Low): packages/core/src/__tests__/feedback/pattern-analyzer.test.ts. Missing tests for edge cases like maxActiveAugmentations: 0, pattern expiration, and confidence threshold boundaries.

🔚 Conclusion

This is a solid, well-architected feature addition with proper separation of concerns and comprehensive test coverage. The main follow-ups are adding observability for silent failures and considering query optimization for pattern analysis at scale. No blocking bugs found—merge-ready with minor improvements recommended.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 88/100

This PR implements a comprehensive self-improving feedback loop system with outcome persistence, pattern detection, and prompt augmentation. The implementation is thorough and well-structured with strong test coverage, but has a few minor consistency and maintainability concerns.


✅ Key Strengths

  • Comprehensive Feature Implementation: The feedback loop spans the full stack—outcome recording in all CLI commands, SQLite persistence, pattern analysis, prompt augmentation, server API routes, and a full dashboard UI component.
  • Strong Test Coverage: Includes targeted tests for outcome parsing, pattern detection, prompt augmentation, session outcome storage, feedback API routes, and dashboard UI behavior with realistic fixture data.
  • Security-Conscious Design: Implements robust secret redaction in both outcome text and metadata (API keys, tokens, private keys) before persistence, preventing sensitive data leakage.
  • Graceful Failure Handling: Outcome recording is consistently wrapped in try-catch blocks with explicit comments that persistence failures must not affect command exit behavior.

⚠️ Areas for Improvement

  • Duplicate Function Definition: applyProjectFeedbackPromptEnv is defined identically in both review.ts and run.ts. Extract to shared/feedback.ts alongside recordJobOutcome to avoid duplication and improve maintainability.
  • Missing Feedback Section in Web Settings Forms: toFormState and toAutomationForm include hardcoded feedback defaults, but JobsTab.tsx already has a proper UI for feedback configuration. Consider removing the hardcoded defaults and relying on the loaded config, or add tests that verify the feedback section renders correctly in Settings.
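
The suggested extraction into shared/feedback.ts could look like the sketch below. Only applyProjectFeedbackPromptEnv and recordJobOutcome are names from the review; the parameter shape and the NIGHT_WATCH_FEEDBACK_PROMPT variable are assumptions for illustration.

```typescript
// Sketch of the consolidation: one definition (exported from
// shared/feedback.ts in the real codebase) that both run.ts and
// review.ts would import instead of each keeping a private copy.
interface FeedbackPromptInput {
  promptSection?: string;
}

function applyProjectFeedbackPromptEnv(
  env: Record<string, string | undefined>,
  input: FeedbackPromptInput,
): Record<string, string | undefined> {
  if (!input.promptSection) return env;
  // Return a copy so a process.env-like object is never mutated in place.
  return { ...env, NIGHT_WATCH_FEEDBACK_PROMPT: input.promptSection };
}
```

With a single shared definition, a future change to the prompt-env logic cannot drift between the two commands.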

📋 Issues Found

  • Duplicate Feedback Function (Maintainability, Medium): commands/review.ts, commands/run.ts. applyProjectFeedbackPromptEnv is duplicated across two files; should be centralized.
  • Config Form Coverage Gap (Testing, Low): pages/Settings.tsx. Feedback config fields in JobsTab UI are not covered by existing Settings test fixtures.

🔚 Conclusion

This is a strong, well-designed PR with comprehensive cross-cutting implementation and appropriate safeguards. The feedback loop architecture is sound and the secret redaction is thorough. Addressing the duplicated function is a straightforward refactor that would improve maintainability. Ready for merge with minor cleanup.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has a few correctness and performance concerns to address.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording through pattern detection to prompt augmentation across all job types.
  • Robust Secret Redaction: Multiple regex patterns handle various secret formats (API keys, tokens, private keys) before persisting to database.
  • Well-Structured Test Coverage: Tests for outcome parsing, pattern analysis, repository behavior, API routes, and UI components with proper isolation.
  • Graceful Degradation: Outcome persistence wrapped in try-catch with explicit comment that failures must not change command exit behavior.

⚠️ Areas for Improvement

  • Database State in Test Setup: The test at run.test.ts:496-540 manually calls closeDb() and resetRepositories() at the start of the test case, but the afterEach hook also calls these. Consider whether this double-reset is intentional or could mask initialization issues.
  • Redundant Config Loading: applyProjectFeedbackPromptEnv in run.ts:350 and review.ts:144 calls loadConfig(projectDir) even though the config was already loaded earlier in each command. This could cause issues if config files change mid-execution.
  • Missing Logging for Silent Failures: All outcome recording errors are silently caught with empty catch blocks. Consider adding debug-level logging to aid troubleshooting when outcome persistence fails.

🐛 Bugs Found

  • Missing Defaults for Partial Feedback Config (Confidence: Medium 🟡): packages/cli/src/commands/shared/feedback.ts:36-42. If config.feedback is defined but has missing properties, getFeedbackAnalysisOptions uses config.feedback ?? {...}, which only provides defaults when feedback is entirely undefined, not for partial configs.
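
The suspected fix is to merge defaults field by field rather than replacing the whole object. The option names and default values below are illustrative, not taken from the source.

```typescript
// Sketch: spread-based per-field defaults. `feedback ?? FEEDBACK_DEFAULTS`
// ignores the defaults whenever feedback is defined at all; spreading
// merges them so a partial config still gets the missing fields.
// (Fields explicitly set to undefined would still win; omit them instead.)
interface FeedbackAnalysisOptions {
  minOccurrences?: number;
  confidenceThreshold?: number;
  maxActiveAugmentations?: number;
}

const FEEDBACK_DEFAULTS: Required<FeedbackAnalysisOptions> = {
  minOccurrences: 3,
  confidenceThreshold: 0.6,
  maxActiveAugmentations: 5,
};

function getFeedbackAnalysisOptions(
  feedback?: FeedbackAnalysisOptions,
): Required<FeedbackAnalysisOptions> {
  return { ...FEEDBACK_DEFAULTS, ...feedback };
}
```

A config that sets only one field then inherits the rest of the defaults instead of dropping them.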

📋 Issues Found

  • Multiple Sequential Queries on Outcome (Performance, Medium): packages/core/src/feedback/pattern-analyzer.ts. analyzeFeedbackOutcome executes multiple sequential queries (listPatterns, queryOutcomes for streak counting, listActiveAugmentations) which could increase latency at scale.
  • Test Double-Resets Database (Testing, Low): packages/cli/src/__tests__/commands/run.test.ts:498-500. Test manually resets database state before the test body, but afterEach already does this; could indicate test isolation concerns.
  • Duplicated Outcome Recording Logic (Maintainability, Low): Multiple command files. While recordJobOutcome exists, some commands (review, run) use inline repository calls instead of the shared helper.
  • Potential Timing Attack on Signature Comparison (Security, Low): packages/core/src/feedback/outcome-parser.ts. Failure signatures are compared directly for pattern matching; if signatures ever include sensitive data (unlikely given the redaction), timing attacks could theoretically apply.

🔚 Conclusion

This is a strong implementation of a complex feature with good architecture and comprehensive test coverage. The main concern is the potential bug with partial feedback config handling, which should be verified. The performance concerns are manageable at current scale but worth tracking. Recommend addressing the config defaults issue before merge.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with outcome recording, pattern detection, and prompt augmentation. The implementation is thorough with good test coverage, but has some code duplication and architectural concerns that should be addressed.


✅ Key Strengths

  • Comprehensive Feature Implementation: Complete feedback loop from outcome recording → pattern detection → prompt augmentation → dashboard visualization, covering all major job types.
  • Strong Test Coverage: Well-structured tests across outcome parsing, pattern analysis, repository operations, API routes, and UI components with clear test scenarios.
  • Security-Conscious Design: Automatic secret redaction for API keys, tokens, and credentials in both outcome parsing and repository layers.

⚠️ Areas for Improvement

  • Duplicate Function Definitions: applyProjectFeedbackPromptEnv is defined identically in both run.ts and review.ts. This should be consolidated into shared/feedback.ts to avoid maintenance drift.
  • Config Reload Overhead: The applyProjectFeedbackPromptEnv function calls loadConfig(projectDir) internally, reloading config from disk instead of accepting it as a parameter. This loses any CLI overrides applied to the config and adds unnecessary I/O.
  • Secret Redaction Duplication: Secret redaction logic exists in both outcome-parser.ts and session-outcome.repository.ts with slightly different implementations. Should consolidate into a shared utility.
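
The redaction consolidation could be a small shared utility that both files import. The patterns below are illustrative stand-ins for the real SECRET_TEXT_PATTERNS list, not copies of it.

```typescript
// Sketch of a single shared redaction module replacing the two slightly
// different implementations in outcome-parser.ts and
// session-outcome.repository.ts.
const SECRET_TEXT_PATTERNS: RegExp[] = [
  /(api[_-]?key\s*[:=]\s*)\S+/gi,
  /((?:access|auth|bearer)?[_-]?token\s*[:=]\s*)\S+/gi,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

function redactSecrets(text: string): string {
  let out = text;
  for (const pattern of SECRET_TEXT_PATTERNS) {
    // Patterns with a capture group keep the "key=" prefix readable;
    // groupless patterns (the private-key block) are replaced wholesale.
    out = out.replace(pattern, (_match, group) =>
      typeof group === "string" ? `${group}[REDACTED]` : "[REDACTED]",
    );
  }
  return out;
}
```

One shared list means a new secret format only ever needs to be added in one place.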

🐛 Bugs Found

  • CLI Overrides Ignored in Feedback Prompt (Confidence: Medium 🟡): packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts. applyProjectFeedbackPromptEnv reloads config via loadConfig(projectDir), discarding any CLI overrides that were applied earlier in the command handler. This could cause feedback prompts to use stale configuration values.
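
The fix suggested for this bug is to thread the already-loaded config (with CLI overrides applied) into the helper instead of re-reading from disk. NightWatchConfig and the feedback.promptHeader field below are assumptions for illustration.

```typescript
// Sketch of the signature change. Before (simplified), the helper
// reloaded config itself, losing overrides:
//   function applyProjectFeedbackPromptEnv(projectDir: string) {
//     const config = loadConfig(projectDir); // stale: no CLI overrides
//     ...
//   }
// After: a pure function over the config the command already holds.
interface NightWatchConfig {
  feedback?: { enabled?: boolean; promptHeader?: string };
}

function feedbackPromptFromConfig(config: NightWatchConfig): string | undefined {
  if (config.feedback?.enabled === false) return undefined;
  return config.feedback?.promptHeader;
}
```

Besides fixing the override bug, this removes a disk read from every command invocation.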

📋 Issues Found

  • Duplicated Prompt Env Function (Maintainability, Medium): packages/cli/src/commands/run.ts, packages/cli/src/commands/review.ts. Identical applyProjectFeedbackPromptEnv function defined in two places.
  • Repeated Database Queries (Performance, Low): packages/server/src/routes/feedback.routes.ts. buildWindowSummary queries outcomes multiple times per window; could be consolidated into fewer queries.
  • Hardcoded Magic Numbers (Maintainability, Low): packages/core/src/feedback/pattern-analyzer.ts. Constants like RECENT_WINDOW_MS, STALE_WINDOW_MS are hardcoded without configuration options.

🔚 Conclusion

This is a strong implementation of a complex feedback loop feature. The code is well-structured and thoroughly tested. The main concerns are code duplication (especially the applyProjectFeedbackPromptEnv function) and the config reload issue which could lead to subtle bugs with CLI overrides. These should be addressed before merge, but the overall architecture is solid.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 82/100

This PR implements a comprehensive self-improving feedback loop system with structured outcome storage, pattern detection, and prompt augmentation. The implementation is thorough and well-architected, with a few areas for improvement around code duplication and performance considerations.


✅ Key Strengths

  • Comprehensive Secret Redaction: Multiple layers of protection covering API keys, tokens, and auth headers in both outcome-parser.ts and session-outcome.repository.ts, ensuring sensitive data never persists.
  • Clean Architecture: Well-organized module structure with clear separation between outcome parsing, pattern analysis, and prompt augmentation layers. Repository pattern with dependency injection is properly implemented.
  • Resilient Error Handling: All feedback recording operations are wrapped in try-catch blocks with explicit comments that outcome persistence must not change command exit behavior.
  • Thorough Test Coverage: Good integration tests covering outcome storage, pattern activation, prompt rendering, and API routes with proper database setup/teardown.

⚠️ Areas for Improvement

  • Code Duplication: applyProjectFeedbackPromptEnv is duplicated between run.ts and review.ts. This should be consolidated into shared/feedback.ts alongside recordJobOutcome.
  • Synchronous Pattern Analysis: Pattern detection runs synchronously after each outcome is recorded via analyzeFeedbackOutcome(), which could add latency to high-frequency job completions. Consider debouncing or making async.
  • Magic Numbers Without Context: Constants like MAX_PATTERN_TEXT_LENGTH = 180 and RECENT_WINDOW_MS lack comments explaining their derivation. These thresholds impact pattern quality and should be documented.
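
Deferring pattern analysis off the job-completion hot path, as suggested above, could be as simple as a serial queue, assuming analyzeFeedbackOutcome can run asynchronously. The class below is a minimal sketch, not the project's implementation.

```typescript
// Sketch: callers enqueue analysis work and return immediately; tasks
// run one at a time in order, and a failed analysis is non-fatal.
type AnalysisTask = () => Promise<void>;

class AnalysisQueue {
  private chain: Promise<void> = Promise.resolve();

  /** Enqueue a task; the returned promise settles once it has run. */
  enqueue(task: AnalysisTask): Promise<void> {
    this.chain = this.chain
      .then(task)
      // Swallow failures so one bad analysis never breaks the chain.
      .catch(() => undefined);
    return this.chain;
  }
}
```

High-frequency job completions then pay only the cost of a push, not of the queries and upserts themselves.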

🐛 Bugs Found

  • None found. No concrete bugs identified; the implementation handles edge cases well with proper try-catch wrapping and secret redaction.

📋 Issues Found

  • Duplicate Feedback Prompt Function (Maintainability, Medium): run.ts, review.ts. applyProjectFeedbackPromptEnv is copy-pasted between both command files instead of being shared.
  • Synchronous Pattern Analysis (Performance, Low): pattern-analyzer.ts, feedback.ts. analyzeFeedbackOutcome runs synchronously, querying recent outcomes and upserting patterns on every job completion.
  • Undocumented Threshold Constants (Maintainability, Low): pattern-analyzer.ts. Magic numbers like RECENT_WINDOW_MS, STALE_WINDOW_MS, and MAX_PATTERN_TEXT_LENGTH lack justification comments.
  • Missing Edge Case Tests (Testing, Low): feedback/*.test.ts. No tests for error scenarios like database locked, circular metadata, or malformed script results.

🔚 Conclusion

This is a strong implementation of a feedback loop system with excellent security practices and clean architecture. The main concern is code duplication that should be refactored before merge. Performance considerations around synchronous pattern analysis are minor but worth tracking for high-frequency deployments. No critical bugs found.


Analyzed using z-ai/glm-5


github-actions Bot commented May 4, 2026

DiffGuard AI Analysis

AI Review Summary

🏆 Overall Score: 85/100

This PR introduces a comprehensive self-improving feedback loop system that records job outcomes, detects repeated failure patterns, and generates prompt augmentations to improve future runs. The implementation is thorough with solid test coverage and appropriate database schema, though it has some maintainability and optimization opportunities.


✅ Key Strengths

  • Comprehensive Coverage: Outcome recording is consistently integrated across all 9 job types (executor, reviewer, qa, audit, analytics, planner, pr-resolver, merger, slicer) with proper error isolation.
  • Security-Conscious Design: Thorough secret redaction across multiple patterns (API keys, tokens, private keys) in both outcome-parser.ts and the repository layer, with tests verifying redaction.
  • Database Architecture: Proper SQLite schema with well-designed indexes (idx_session_outcomes_lookup, idx_feedback_patterns_lookup, idx_prompt_augmentations_active) supporting efficient time-based and status queries.
  • Failure Classification: Robust categorization logic in classifyFailure() handling 9 failure categories (typescript, eslint, test, ci, review-score, rate-limit, timeout, conflict, unknown) with signature normalization for pattern matching.

⚠️ Areas for Improvement

  • Duplicated Function Implementation: applyProjectFeedbackPromptEnv() is implemented identically in both run.ts and review.ts. Consolidate into shared/feedback.ts to prevent divergence and reduce code duplication.
  • Query Optimization: buildWindowSummary() in feedback.routes.ts executes separate querySummary() calls for each job type (8+ queries per request). Consider a single aggregate query with GROUP BY job_type to reduce database roundtrips.
  • Secret Pattern Duplication: SECRET_TEXT_PATTERNS regex arrays are defined independently in outcome-parser.ts and session-outcome.repository.ts. Extract to a shared module to ensure consistent redaction behavior.
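
The query consolidation suggested above would replace the per-job-type querySummary() calls with one aggregate query plus a fold back into the per-job-type map the route returns. Table and column names below follow the schema described in this review but are not verified against the source.

```typescript
// Hypothetical single GROUP BY query replacing 8+ sequential roundtrips.
const WINDOW_SUMMARY_SQL = `
  SELECT job_type, status, COUNT(*) AS count
  FROM session_outcomes
  WHERE created_at >= ?
  GROUP BY job_type, status
`;

interface SummaryRow {
  job_type: string;
  status: string;
  count: number;
}

// Fold the flat rows back into { jobType: { status: count } }.
function foldWindowSummary(
  rows: SummaryRow[],
): Record<string, Record<string, number>> {
  const summary: Record<string, Record<string, number>> = {};
  for (const row of rows) {
    (summary[row.job_type] ??= {})[row.status] = row.count;
  }
  return summary;
}
```

One roundtrip per window keeps the route's latency flat as the number of job types grows.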

📋 Issues Found

  • Duplicated Feedback Env Function (Maintainability, Medium): run.ts, review.ts. Same applyProjectFeedbackPromptEnv() implementation exists in both files with identical logic.
  • Sequential Job Type Queries (Performance, Medium): feedback.routes.ts, buildWindowSummary(). Runs querySummary() for each valid job type separately, creating 8+ sequential DB calls per window.
  • Duplicated Secret Patterns (Maintainability, Low): outcome-parser.ts, session-outcome.repository.ts. Same SECRET_TEXT_PATTERNS regex array defined in two locations, risking inconsistent updates.
  • Outcome Parser Edge Cases (Testing, Low): outcome-parser.test.ts. Missing tests for complex ANSI stripping, nested metadata redaction, and path extraction edge cases.

🔚 Conclusion

This is a strong, well-structured implementation of a sophisticated feedback loop feature. The core logic is sound with appropriate safeguards for secret handling and error isolation. The identified issues are maintainability and optimization concerns that should be addressed before long-term maintenance but do not block immediate merge. Consolidating the duplicated function and secret patterns would improve code health significantly.


Analyzed using z-ai/glm-5

Test User and others added 4 commits May 3, 2026 18:58
- Generated by Night Watch QA agent
- UI tests: 1 passing, 0 failing
- API tests: 3 passing, 0 failing
- Artifacts: screenshots, videos

Co-Authored-By: Claude <noreply@anthropic.com>
@jonit-dev
Owner Author

Night Watch QA Report

Changes Classification

  • Type: UI + API
  • Files changed: 52

Test Results

UI Tests (Playwright)

  • Status: All passing
  • Tests: 1 test in 1 file
  • Command: yarn --cwd web playwright test tests/e2e/qa/qa-feedback-dashboard.spec.ts --reporter=list

Screenshots

Feedback dashboard QA

Video Recording

Video artifact committed to qa-artifacts/qa-feedback-dashboard.webm — view in the PR's file changes.

API Tests

  • Status: All passing
  • Tests: 3 tests in 1 file
  • Command: yarn --cwd packages/server vitest run src/__tests__/server/feedback-validation.test.ts --reporter=verbose

Additional Verification

  • Commit hook ran turbo run verify: 5 successful tasks

Night Watch QA Agent

🧪 QA run by Codex

@jonit-dev jonit-dev added the labels e2e-validated (PR acceptance requirements validated by e2e/integration tests) and ready-to-merge (PR is ready to merge) on May 4, 2026
@jonit-dev jonit-dev merged commit 7f28a4f into master May 5, 2026
5 checks passed
@jonit-dev jonit-dev deleted the night-watch/97-docs-prds-self-improving-feedback-md branch May 5, 2026 05:25

Labels

  • e2e-validated: PR acceptance requirements validated by e2e/integration tests
  • nw:ready-review: Executor finished implementation and the PR is ready for automated/human review
  • ready-to-merge: PR is ready to merge


Development

Successfully merging this pull request may close these issues.

docs/prds/self-improving-feedback.md
