-
Notifications
You must be signed in to change notification settings - Fork 14
Critical: Fix CTE Broadcast Bug and Add Comprehensive Realtime Event Testing #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
| Command | Status | Duration | Result |
|---|---|---|---|
nx affected -t lint typecheck test --parallel -... |
❌ Failed | 6m 8s | View ↗ |
nx affected -t test:e2e --parallel --base=45662... |
❌ Failed | 4m 16s | View ↗ |
☁️ Nx Cloud last updated this comment at 2025-11-03 12:52:40 UTC
Merge activity
|
…Testing (#304) ## Summary This PR addresses **critical bugs** in pgflow's realtime broadcasting system and dramatically improves test coverage for broadcast events. Two unreferenced CTEs in `start_ready_steps()` are never executed due to PostgreSQL's query optimizer, causing: 1. **`step:started`** **events never broadcast** - breaks real-time DAG visualization 2. **`step:completed`** **events for empty maps never broadcast** - breaks observability for edge cases Additionally, existing integration tests were too weak to catch these bugs. This PR adds comprehensive test coverage using new event matchers that verify exact event sequences, payloads, and counts. ## Root Cause: PostgreSQL CTE Optimization PostgreSQL's query optimizer **does not execute unreferenced CTEs with SELECT statements** - even when those SELECTs call functions with side effects like `realtime.send()`. **Critical distinction:** - ✅ **CTEs with INSERT/UPDATE/DELETE** - ALWAYS executed (side effects assumed) - ❌ **CTEs with SELECT only** - SKIPPED if unreferenced (no side effects assumed) In `pkgs/core/schemas/0100_function_start_ready_steps.sql`: ```sql -- This CTE is NEVER executed (unreferenced SELECT) broadcast_events AS ( SELECT realtime.send(...) -- ❌ Never runs! FROM started_step_states ), -- This CTE is ALSO never executed (unreferenced SELECT) broadcast_empty_completed AS ( SELECT realtime.send(...) -- ❌ Never runs! FROM completed_empty_steps ), -- Only this INSERT executes INSERT INTO pgflow.step_tasks (...) SELECT ... FROM sent_messages; -- ✅ Runs, but CTEs above don't ``` ## Bugs Discovered ### Bug 1: `step:started` Events Never Broadcast **Impact:** - Clients never receive `step:started` events during flow execution - DAG visualizations only update when steps complete (not when they begin) - Active step tracking (`flowState.activeStep`) never updates - WebSocket inspection shows only `step:*:completed` events **Affected Code:** `pkgs/core/schemas/0100_function_start_ready_steps.sql:144-162` The `broadcast_events` CTE that sends `step:started` messages is unreferenced and never executes. ### Bug 2: Empty Map `step:completed` Events Never Broadcast **Impact:** - Empty map steps (arrays with length 0) complete silently with no broadcast - Clients miss completion events for edge case flows - Breaks observability for map steps with empty input arrays **Affected Code:** `pkgs/core/schemas/0100_function_start_ready_steps.sql:45-64` The `broadcast_empty_completed` CTE is unreferenced and never executes. ## Historical Context This is the **same bug pattern** that affected `step:failed` events, fixed in commit `2b1ea777` (June 19, 2025): > "fix: address bug where step:failed event was not broadcast due to CTE optimization" **Related commits:** - `2b1ea777` - Fixed `step:failed` broadcast with PERFORM statements - `220c8672` - PR #161: "Fix step:failed events not being broadcast" - `ab17a0c5` - Ensured step:failed events are broadcast - `fe18250a` - Addressed broadcasting bug due to CTE optimization The fix for `step:failed` used explicit `PERFORM` statements to force execution. However, this pattern was **never applied** to `step:started` or empty map broadcasts. ## Why Tests Didn't Catch This The existing integration test in `pkgs/client/__tests__/integration/real-flow-execution.test.ts` only verified that **some** events were received: ```typescript // OLD TEST (too weak) let stepEventCount = 0; step.on('*', () => { stepEventCount++; }); // ... expect(stepEventCount).toBeGreaterThan(0); // ❌ Passes with only 1 event! ``` This passes even if only `step:completed` is broadcast (count = 1), without checking for `step:started`. ## Changes in This PR ### 1\. Test Infrastructure Improvements Added comprehensive event matchers to `pkgs/client/__tests__/helpers/test-utils.ts`: - `toHaveReceivedEvent(type, payload?)` - Verify specific event was received with optional payload matching - `toNotHaveReceivedEvent(type)` - Verify event was NOT received - `toHaveReceivedEventCount(type, count)` - Verify exact event count - `toHaveReceivedTotalEvents(count)` - Verify total event count - `toHaveReceivedEventSequence(types[])` - Verify exact event sequence - `toHaveReceivedInOrder(type1, type2)` - Verify ordering of two events These matchers enable precise verification of broadcast behavior. ### 2\. Client Unit Test Coverage (100+ New Test Cases) Enhanced `pkgs/client/__tests__/FlowRun.test.ts` and `FlowStep.test.ts` with: **FlowRun Event Tests:** - ✅ Comprehensive payload validation for `run:started`, `run:completed`, `run:failed` - ✅ Event lifecycle verification (started → completed, started → failed) - ✅ Duplicate event rejection (same status transitions) - ✅ Foreign-run event protection - ✅ `waitForStatus(Failed)` with timeout/abort support **FlowStep Event Tests:** - ✅ Comprehensive payload validation for all step event types - ✅ Event sequence verification (started → completed, started → failed) - ✅ Empty map edge case (completed ONLY, no started) - ✅ Duplicate event rejection - ✅ Foreign-step event protection - ✅ `waitForStatus(Started)` edge cases for empty maps **Key improvements:** - All event assertions now use event matchers (not just count checks) - Payload validation ensures correct data in events - Event sequence verification catches ordering bugs - Edge case coverage for empty maps and error conditions ### 3\. Integration Test Coverage (4 New Tests) Added critical integration tests to `pkgs/client/__tests__/integration/real-flow-execution.test.ts`: #### Test 1: **CRITICAL -** **`step:started`** **Broadcast Verification** (CURRENTLY FAILING) ```typescript it('CRITICAL: broadcasts step:started events (CTE optimization bug check)', ...) ``` This test **WILL FAIL** until Bug #1 is fixed. It specifically verifies: - `step:started` event is broadcast when `start_ready_steps()` executes - Event payload contains correct `run_id`, `step_slug`, `status` - Event sequence is `['step:started', 'step:completed']` - Both events received exactly once **Why it fails:** The `broadcast_events` CTE is unreferenced and never executes. #### Test 2: Empty Map Steps Edge Case ```typescript it('empty map steps: skip step:started and go straight to step:completed', ...) ``` Verifies the **expected behavior** for empty map steps: - NO `step:started` event (empty maps skip started state) - Only `step:completed` event is broadcast - Event payload has correct status and empty array output **Note:** This test will ALSO FAIL due to Bug #2 (`broadcast_empty_completed` unreferenced). #### Test 3: Enhanced Event Verification ```typescript it('receives broadcast events during flow execution', ...) ``` Updated to use event matchers instead of weak count checks: - Verifies exact event types (`run:completed`, `step:completed`) - Validates event payloads (run_id, flow_slug, status, output) - Counts events to ensure no duplicates #### Test 4: `waitForStatus(Started)` Behavior ```typescript it('waitForStatus(Started): waits for step to reach Started status', ...) ``` Verifies that: - Root steps are started immediately by `start_flow()` - `waitForStatus(Started)` resolves immediately if already started - Step has `started_at` timestamp when in Started status ### 4\. Documentation Updates Updated `.claude/skills/pgtap-testing/SKILL.md` description to be more comprehensive: - Added all common phrasings users might use to trigger testing skill - Emphasized realtime event testing patterns - Made skill activation more reliable ## Failing Tests (To Be Fixed in Follow-up Commit) The following tests are **EXPECTED TO FAIL** until the SQL bug is fixed: 1. ❌ `CRITICAL: broadcasts step:started events` - Fails because `broadcast_events` CTE never executes 2. ❌ `empty map steps: skip step:started and go straight to step:completed` - Fails because `broadcast_empty_completed` CTE never executes ## Solution (Not Included in This PR) The fix will use the same pattern as the `step:failed` fix from commit `2b1ea777`: **Option 1: Reference CTEs to force execution** ```sql broadcast_events AS ( SELECT realtime.send(...) as broadcast_result FROM started_step_states ) INSERT INTO pgflow.step_tasks (...) SELECT ... FROM sent_messages -- Force CTE execution by referencing it WHERE EXISTS (SELECT 1 FROM broadcast_events WHERE false); ``` The `WHERE EXISTS (...WHERE false)` ensures: - CTE **must be evaluated** (referenced) - **Zero performance impact** (WHERE false = no filtering) - **No change to INSERT behavior** **Option 2: Use PERFORM statements (like step:failed fix)** ```sql -- Move broadcasts out of CTE PERFORM realtime.send(...) FROM started_step_states; -- Then do INSERT INSERT INTO pgflow.step_tasks ... ``` ## Testing Strategy This PR follows a **test-first approach**: 1. ✅ **Add comprehensive test infrastructure** (event matchers) 2. ✅ **Add failing tests that document expected behavior** 3. ⬜ **Fix SQL bugs** (follow-up commit) 4. ⬜ **Verify all tests pass** ## Impact Assessment ### Before This PR - **Test Coverage:** Weak event counting (any event = pass) - **Bug Detection:** Failed to catch missing broadcasts - **Observability:** Client applications missing critical events - **Real-time UX:** DAG visualizations only update on completion ### After This PR (Tests Only) - **Test Coverage:** 100+ new test cases with event matchers - **Bug Detection:** Failing tests document exact bugs - **Documentation:** Clear understanding of event lifecycles - **Regression Prevention:** Future CTE bugs will be caught ### After SQL Fix (Follow-up) - **All Tests Pass:** Green CI with comprehensive coverage - **Full Observability:** All broadcast events working correctly - **Real-time UX:** DAG visualizations update during execution - **Production Ready:** Robust event system with test coverage ## Files Changed ### Test Infrastructure - `pkgs/client/__tests__/helpers/test-utils.ts` - Event matchers already exist (no changes) ### Unit Tests (Enhanced) - `pkgs/client/__tests__/FlowRun.test.ts` - Added lifecycle, payload, edge case tests - `pkgs/client/__tests__/FlowStep.test.ts` - Added lifecycle, payload, empty map tests ### Integration Tests (New) - `pkgs/client/__tests__/integration/real-flow-execution.test.ts` - Added 4 new tests ### Documentation - `.claude/skills/pgtap-testing/SKILL.md` - Enhanced skill description ### SQL (Bug Location - Not Fixed in This PR) - `pkgs/core/schemas/0100_function_start_ready_steps.sql` - Contains both bugs (lines 45-64, 144-162) ## Next Steps 1. **Merge this PR** - Establishes test coverage and failing tests 2. **Fix SQL bugs** - Apply CTE reference pattern or PERFORM statements 3. **Verify tests pass** - All new tests should turn green 4. **Audit other SQL functions** - Check for similar unreferenced CTE patterns 5. **Consider linting rule** - Detect unreferenced CTEs with function calls ## Checklist - [x] Added comprehensive event matchers to test utils - [x] Enhanced FlowRun unit tests with event verification - [x] Enhanced FlowStep unit tests with event verification - [x] Added failing integration test for `step:started` broadcast - [x] Added failing integration test for empty map broadcast - [x] Updated documentation (skill descriptions) - [x] Documented root cause and historical context - [ ] Fixed SQL bugs (follow-up commit) - [ ] All tests passing (after SQL fix) ## Related Issues This PR addresses the same class of bug as: - Commit `2b1ea777` - `step:failed` broadcast fix - PR #161 - "Fix step:failed events not being broadcast" ## Breaking Changes None. This PR only adds tests and documentation. ## Migration Required None. SQL changes will come in follow-up commit.

Summary
This PR addresses critical bugs in pgflow's realtime broadcasting system and dramatically improves test coverage for broadcast events. Two unreferenced CTEs in
start_ready_steps()are never executed due to PostgreSQL's query optimizer, causing:step:startedevents never broadcast - breaks real-time DAG visualizationstep:completedevents for empty maps never broadcast - breaks observability for edge casesAdditionally, existing integration tests were too weak to catch these bugs. This PR adds comprehensive test coverage using new event matchers that verify exact event sequences, payloads, and counts.
Root Cause: PostgreSQL CTE Optimization
PostgreSQL's query optimizer does not execute unreferenced CTEs with SELECT statements - even when those SELECTs call functions with side effects like
realtime.send().Critical distinction:
In
pkgs/core/schemas/0100_function_start_ready_steps.sql:Bugs Discovered
Bug 1:
step:startedEvents Never BroadcastImpact:
step:startedevents during flow executionflowState.activeStep) never updatesstep:*:completedeventsAffected Code:
pkgs/core/schemas/0100_function_start_ready_steps.sql:144-162The
broadcast_eventsCTE that sendsstep:startedmessages is unreferenced and never executes.Bug 2: Empty Map
step:completedEvents Never BroadcastImpact:
Affected Code:
pkgs/core/schemas/0100_function_start_ready_steps.sql:45-64The
broadcast_empty_completedCTE is unreferenced and never executes.Historical Context
This is the same bug pattern that affected
step:failedevents, fixed in commit2b1ea777(June 19, 2025):Related commits:
2b1ea777- Fixedstep:failedbroadcast with PERFORM statements220c8672- PR Fix step:failed events not being broadcast #161: "Fix step:failed events not being broadcast"ab17a0c5- Ensured step:failed events are broadcastfe18250a- Addressed broadcasting bug due to CTE optimizationThe fix for
step:failedused explicitPERFORMstatements to force execution. However, this pattern was never applied tostep:startedor empty map broadcasts.Why Tests Didn't Catch This
The existing integration test in
pkgs/client/__tests__/integration/real-flow-execution.test.tsonly verified that some events were received:This passes even if only
step:completedis broadcast (count = 1), without checking forstep:started.Changes in This PR
1. Test Infrastructure Improvements
Added comprehensive event matchers to
pkgs/client/__tests__/helpers/test-utils.ts:toHaveReceivedEvent(type, payload?)- Verify specific event was received with optional payload matchingtoNotHaveReceivedEvent(type)- Verify event was NOT receivedtoHaveReceivedEventCount(type, count)- Verify exact event counttoHaveReceivedTotalEvents(count)- Verify total event counttoHaveReceivedEventSequence(types[])- Verify exact event sequencetoHaveReceivedInOrder(type1, type2)- Verify ordering of two eventsThese matchers enable precise verification of broadcast behavior.
2. Client Unit Test Coverage (100+ New Test Cases)
Enhanced
pkgs/client/__tests__/FlowRun.test.tsandFlowStep.test.tswith:FlowRun Event Tests:
run:started,run:completed,run:failedwaitForStatus(Failed)with timeout/abort supportFlowStep Event Tests:
waitForStatus(Started)edge cases for empty mapsKey improvements:
3. Integration Test Coverage (4 New Tests)
Added critical integration tests to
pkgs/client/__tests__/integration/real-flow-execution.test.ts:Test 1: CRITICAL -
step:startedBroadcast Verification (CURRENTLY FAILING)This test WILL FAIL until Bug #1 is fixed. It specifically verifies:
step:startedevent is broadcast whenstart_ready_steps()executesrun_id,step_slug,status['step:started', 'step:completed']Why it fails: The
broadcast_eventsCTE is unreferenced and never executes.Test 2: Empty Map Steps Edge Case
Verifies the expected behavior for empty map steps:
step:startedevent (empty maps skip started state)step:completedevent is broadcastNote: This test will ALSO FAIL due to Bug #2 (
broadcast_empty_completedunreferenced).Test 3: Enhanced Event Verification
Updated to use event matchers instead of weak count checks:
run:completed,step:completed)Test 4:
waitForStatus(Started)BehaviorVerifies that:
start_flow()waitForStatus(Started)resolves immediately if already startedstarted_attimestamp when in Started status4. Documentation Updates
Updated
.claude/skills/pgtap-testing/SKILL.mddescription to be more comprehensive:Failing Tests (To Be Fixed in Follow-up Commit)
The following tests are EXPECTED TO FAIL until the SQL bug is fixed:
CRITICAL: broadcasts step:started events- Fails becausebroadcast_eventsCTE never executesempty map steps: skip step:started and go straight to step:completed- Fails becausebroadcast_empty_completedCTE never executesSolution (Not Included in This PR)
The fix will use the same pattern as the
step:failedfix from commit2b1ea777:Option 1: Reference CTEs to force execution
The
WHERE EXISTS (...WHERE false)ensures:Option 2: Use PERFORM statements (like step:failed fix)
Testing Strategy
This PR follows a test-first approach:
Impact Assessment
Before This PR
After This PR (Tests Only)
After SQL Fix (Follow-up)
Files Changed
Test Infrastructure
pkgs/client/__tests__/helpers/test-utils.ts- Event matchers already exist (no changes)Unit Tests (Enhanced)
pkgs/client/__tests__/FlowRun.test.ts- Added lifecycle, payload, edge case testspkgs/client/__tests__/FlowStep.test.ts- Added lifecycle, payload, empty map testsIntegration Tests (New)
pkgs/client/__tests__/integration/real-flow-execution.test.ts- Added 4 new testsDocumentation
.claude/skills/pgtap-testing/SKILL.md- Enhanced skill descriptionSQL (Bug Location - Not Fixed in This PR)
pkgs/core/schemas/0100_function_start_ready_steps.sql- Contains both bugs (lines 45-64, 144-162)Next Steps
Checklist
step:startedbroadcastRelated Issues
This PR addresses the same class of bug as:
2b1ea777-step:failedbroadcast fixBreaking Changes
None. This PR only adds tests and documentation.
Migration Required
None. SQL changes will come in follow-up commit.