Skip to content

Conversation

@jumski
Copy link
Contributor

@jumski jumski commented Nov 3, 2025

Summary

Fixes critical bug where step:started and step:completed realtime events were not being broadcast to clients due to PostgreSQL query planner optimizing away unused CTEs.

Bugs Fixed

1. Missing step:started broadcasts (Critical)

Problem: Clients never received step:started events when steps transitioned from Created to Started status.

Root Cause: The broadcast_events CTE in start_ready_steps() was not referenced by any subsequent CTEs or the final RETURN statement. PostgreSQL's query optimizer eliminated it as "dead code", preventing realtime.send() from executing.

Solution: Moved realtime.send() call directly into the RETURNING clause of the UPDATE statement that marks steps as Started. This ensures the broadcast executes atomically with the state change and cannot be optimized away.

2. Missing step:completed broadcasts for empty map steps (Critical)

Problem: Clients never received step:completed events for map steps that received empty arrays and completed immediately.

Root Cause: The broadcast_empty_completed CTE in start_ready_steps() suffered the same optimization issue - not referenced by subsequent operations.

Solution: Moved realtime.send() call into the RETURNING clause of the UPDATE that completes empty map steps.

3. Missing step:completed broadcasts in cascade completion (Critical)

Problem: Clients didn't receive events when taskless steps were cascade-completed in cascade_complete_taskless_steps().

Root Cause: Similar CTE optimization issue in the cascade completion logic.

Solution: Added realtime.send() directly in the RETURNING clause of the cascade completion UPDATE.

Technical Changes

SQL Schema Updates

start_ready_steps.sql:

  • Removed broadcast_events CTE (step:started)
  • Removed broadcast_empty_completed CTE (step:completed for empty maps)
  • Added realtime.send() in RETURNING clause of started_step_states CTE
  • Added realtime.send() in RETURNING clause of completed_empty_steps CTE

cascade_complete_taskless_steps.sql:

  • Added realtime.send() in RETURNING clause of cascade completion UPDATE
  • Broadcasts step:completed events atomically with state transition

complete_task.sql:

  • Added PERFORM statements for run:failed and step:failed event broadcasts
  • Ensures error events are reliably broadcast

Client-Side Improvements

New applySnapshot() methods:

  • Added to FlowRun and FlowStep classes
  • Applies database state directly without emitting events
  • Used for initial state hydration from start_flow_with_states() and get_run_with_states()

Why this matters: Previously, we used updateState() for initial state, which internally tried to emit events. This was wrong - the initial state from the database is a snapshot, not a series of events. The new applySnapshot() method makes this distinction clear.

Test Updates

Completely rewrote broadcast tests to understand the actual behavior:

Root steps vs Dependent steps:

  • Root steps start in the same transaction as start_flow() - already Started when the function returns
  • Can't observe their step:started broadcasts (would require listening before the transaction commits)
  • Tests now verify state directly instead of trying to catch impossible broadcasts

Dependent steps:

  • Start AFTER their dependencies complete (separate transaction)
  • CAN observe step:started broadcasts (the actual test case for the bug fix)
  • New test specifically verifies these broadcasts work correctly

Empty map steps:

  • Root empty maps complete in start_flow() transaction - verify state directly
  • Dependent empty maps complete after dependencies - verify broadcasts

Testing

All tests passing:

  • ✅ Root steps verified via state (already Started when startFlow() returns)
  • ✅ Dependent steps receive step:started broadcasts
  • ✅ Dependent empty maps receive step:completed broadcasts (skip step:started)
  • ✅ Cascade completion broadcasts step:completed events
  • ✅ Event sequence validation (started → completed order)

Migration Notes

No breaking changes. This is a pure bug fix that makes the system work as originally intended.

Users who implemented workarounds for missing events can now remove them.

@changeset-bot
Copy link

changeset-bot bot commented Nov 3, 2025

🦋 Changeset detected

Latest commit: 962cd07

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@pgflow/client Patch
@pgflow/core Patch
pgflow Patch
@pgflow/edge-worker Patch
@pgflow/example-flows Patch
@pgflow/dsl Patch
@pgflow/website Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor Author

jumski commented Nov 3, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • merge:queue - adds this PR to the back of the merge queue
  • hotfix:queue - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@jumski jumski changed the title refactor(schemas): remove broadcast empty completed and step started events for simplification Fix missing realtime broadcasts for step:started and step:completed events Nov 3, 2025
@nx-cloud
Copy link

nx-cloud bot commented Nov 3, 2025

View your CI Pipeline Execution ↗ for commit 962cd07

Command Status Duration Result
nx affected -t build --configuration=production... ✅ Succeeded 4s View ↗
nx affected -t lint typecheck test --parallel -... ✅ Succeeded 6m 35s View ↗
nx affected -t test:e2e --parallel --base=6b7ac... ✅ Succeeded 4m 53s View ↗

☁️ Nx Cloud last updated this comment at 2025-11-03 22:44:17 UTC

@jumski jumski force-pushed the fix-cte-broadcast-sql branch 2 times, most recently from 9c48a1e to 560de7c Compare November 3, 2025 22:13
Comment on lines 212 to 227
// DEBUG: Check actual database state
const dbState = await sql`
SELECT step_slug, status, remaining_deps, initial_tasks
FROM pgflow.step_states
WHERE run_id = ${run.run_id}
ORDER BY step_slug
`;
console.log('DB State:', dbState);

// DEBUG: Check what get_run_with_states returns
const apiResponse = await sql`SELECT pgflow.get_run_with_states(${run.run_id})`;
console.log('API Response:', JSON.stringify(apiResponse[0].get_run_with_states, null, 2));

// DEBUG: Check client state
console.log('Client rootStep.status:', rootStep.status);
console.log('Client dependentStep.status:', dependentStep.status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The debug logging statements should be removed before merging. While they're helpful during development for troubleshooting the realtime event issues, they'll clutter the test output in CI and for other developers. Consider either removing them or wrapping them in a conditional debug flag if this diagnostic information might be needed again in the future.

Suggested change
// DEBUG: Check actual database state
const dbState = await sql`
SELECT step_slug, status, remaining_deps, initial_tasks
FROM pgflow.step_states
WHERE run_id = ${run.run_id}
ORDER BY step_slug
`;
console.log('DB State:', dbState);
// DEBUG: Check what get_run_with_states returns
const apiResponse = await sql`SELECT pgflow.get_run_with_states(${run.run_id})`;
console.log('API Response:', JSON.stringify(apiResponse[0].get_run_with_states, null, 2));
// DEBUG: Check client state
console.log('Client rootStep.status:', rootStep.status);
console.log('Client dependentStep.status:', dependentStep.status);
// Check actual database state
const dbState = await sql`
SELECT step_slug, status, remaining_deps, initial_tasks
FROM pgflow.step_states
WHERE run_id = ${run.run_id}
ORDER BY step_slug
`;
// Check what get_run_with_states returns
const apiResponse = await sql`SELECT pgflow.get_run_with_states(${run.run_id})`;

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

@jumski jumski force-pushed the fix-cte-broadcast-sql branch 2 times, most recently from da943e2 to 10490fe Compare November 3, 2025 22:34
…events for simplification

- Eliminated broadcast_empty_completed CTE to streamline completed step event handling
- Removed broadcast_events CTE for step:started events to improve query clarity
- Replaced broadcast event logic with direct PERFORM statements for real-time event broadcasting
- Enhanced real-time event broadcasting for step:started and step:completed events
- Simplified event dispatching logic for better maintainability and performance
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2025

🔍 Preview Deployment: Website

Deployment successful!

🔗 Preview URL: https://pr-305.pgflow.pages.dev

📝 Details:

  • Branch: fix-cte-broadcast-sql
  • Commit: 5450224f46b5e4c329d763bf14c50418475c0c49
  • View Logs

_Last updated: _

@graphite-app
Copy link
Contributor

graphite-app bot commented Nov 3, 2025

Merge activity

  • Nov 3, 11:28 PM UTC: jumski added this pull request to the Graphite merge queue.
  • Nov 3, 11:28 PM UTC: CI is running for this pull request on a draft pull request (#307) due to your merge queue CI optimization settings.
  • Nov 3, 11:29 PM UTC: Merged by the Graphite merge queue via draft PR: #307.

graphite-app bot pushed a commit that referenced this pull request Nov 3, 2025
…vents (#305)

## Summary

Fixes critical bug where `step:started` and `step:completed` realtime events were not being broadcast to clients due to PostgreSQL query planner optimizing away unused CTEs.

## Bugs Fixed

### 1. Missing step:started broadcasts (Critical)
**Problem:** Clients never received `step:started` events when steps transitioned from Created to Started status.

**Root Cause:** The `broadcast_events` CTE in `start_ready_steps()` was not referenced by any subsequent CTEs or the final RETURN statement. PostgreSQL's query optimizer eliminated it as "dead code", preventing `realtime.send()` from executing.

**Solution:** Moved `realtime.send()` call directly into the RETURNING clause of the UPDATE statement that marks steps as Started. This ensures the broadcast executes atomically with the state change and cannot be optimized away.

### 2. Missing step:completed broadcasts for empty map steps (Critical)
**Problem:** Clients never received `step:completed` events for map steps that received empty arrays and completed immediately.

**Root Cause:** The `broadcast_empty_completed` CTE in `start_ready_steps()` suffered the same optimization issue - not referenced by subsequent operations.

**Solution:** Moved `realtime.send()` call into the RETURNING clause of the UPDATE that completes empty map steps.

### 3. Missing step:completed broadcasts in cascade completion (Critical)
**Problem:** Clients didn't receive events when taskless steps were cascade-completed in `cascade_complete_taskless_steps()`.

**Root Cause:** Similar CTE optimization issue in the cascade completion logic.

**Solution:** Added `realtime.send()` directly in the RETURNING clause of the cascade completion UPDATE.

## Technical Changes

### SQL Schema Updates

**`start_ready_steps.sql`:**
- Removed `broadcast_events` CTE (step:started)
- Removed `broadcast_empty_completed` CTE (step:completed for empty maps)
- Added `realtime.send()` in RETURNING clause of `started_step_states` CTE
- Added `realtime.send()` in RETURNING clause of `completed_empty_steps` CTE

**`cascade_complete_taskless_steps.sql`:**
- Added `realtime.send()` in RETURNING clause of cascade completion UPDATE
- Broadcasts step:completed events atomically with state transition

**`complete_task.sql`:**
- Added PERFORM statements for run:failed and step:failed event broadcasts
- Ensures error events are reliably broadcast

### Client-Side Improvements

**New `applySnapshot()` methods:**
- Added to `FlowRun` and `FlowStep` classes
- Applies database state directly without emitting events
- Used for initial state hydration from `start_flow_with_states()` and `get_run_with_states()`

**Why this matters:** Previously, we used `updateState()` for initial state, which internally tried to emit events. This was wrong - the initial state from the database is a snapshot, not a series of events. The new `applySnapshot()` method makes this distinction clear.

### Test Updates

Completely rewrote broadcast tests to understand the actual behavior:

**Root steps vs Dependent steps:**
- Root steps start in the same transaction as `start_flow()` - already Started when the function returns
- Can't observe their step:started broadcasts (would require listening before the transaction commits)
- Tests now verify state directly instead of trying to catch impossible broadcasts

**Dependent steps:**
- Start AFTER their dependencies complete (separate transaction)
- CAN observe step:started broadcasts (the actual test case for the bug fix)
- New test specifically verifies these broadcasts work correctly

**Empty map steps:**
- Root empty maps complete in `start_flow()` transaction - verify state directly
- Dependent empty maps complete after dependencies - verify broadcasts

## Testing

All tests passing:
- ✅ Root steps verified via state (already Started when startFlow() returns)
- ✅ Dependent steps receive step:started broadcasts
- ✅ Dependent empty maps receive step:completed broadcasts (skip step:started)
- ✅ Cascade completion broadcasts step:completed events
- ✅ Event sequence validation (started → completed order)

## Migration Notes

No breaking changes. This is a pure bug fix that makes the system work as originally intended.

Users who implemented workarounds for missing events can now remove them.
@graphite-app graphite-app bot closed this Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants