feat: watch improvements by jason-lynch · Pull Request #310 · pgEdge/control-plane

jason-lynch · 2026-03-20T21:22:51Z

Summary

Makes several substantial improvements to the storage.Watch operations:

Watch now does a synchronous Get operation before starting the watch to get the current value(s) and to get the start revision for the watch. Callers no longer have to do this operation themselves.
Watch now restarts itself automatically using the most recently-fetched revision. We were previously repeating this restart logic in every caller. Restarts are rate-limited to 1 per second.
Watch reports errors over an error channel, matching the convention that we've established everywhere.

We had an unused Until operation, so I opted to remove it rather than update it to reflect these changes.

Testing

From the user's perspective, everything should function the same as it did before. The changes in this PR were more focused on improving the watch's usability and edge-case behavior, such as when a Put occurs between the initial Get and the Watch operations, or the recovery behavior after an outage.
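To illustrate the edge case of a Put landing between the initial Get and the Watch, here is a small sketch with an in-memory stand-in for a revisioned store. Everything in it (`toyStore`, `WatchFrom`, `getThenWatch`) is hypothetical and only models the idea: if the watch starts at the revision after the one returned by the Get, the intervening write is replayed rather than lost.

```go
package main

import (
	"fmt"
	"sync"
)

// change is one entry in the toy store's history.
type change struct {
	Revision int64
	Key      string
	Value    string
}

// toyStore is a hypothetical in-memory stand-in for a revisioned key-value
// store; every Put bumps a store-wide revision counter.
type toyStore struct {
	mu       sync.Mutex
	revision int64
	data     map[string]string
	history  []change
}

func newToyStore() *toyStore {
	return &toyStore{data: make(map[string]string)}
}

func (s *toyStore) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.revision++
	s.data[key] = value
	s.history = append(s.history, change{s.revision, key, value})
}

// Get returns the current value and the store revision it was read at.
func (s *toyStore) Get(key string) (string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key], s.revision
}

// WatchFrom returns the changes to key at or after startRevision. A real
// watch would keep streaming; a slice is enough to show the replay.
func (s *toyStore) WatchFrom(key string, startRevision int64) []change {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []change
	for _, c := range s.history {
		if c.Key == key && c.Revision >= startRevision {
			out = append(out, c)
		}
	}
	return out
}

// getThenWatch reads the current value, lets between run (standing in for a
// concurrent writer), then watches from the revision after the Get.
func getThenWatch(s *toyStore, key string, between func()) (string, []change) {
	value, rev := s.Get(key)
	between()
	return value, s.WatchFrom(key, rev+1)
}

func main() {
	s := newToyStore()
	s.Put("leader", "a")
	initial, replayed := getThenWatch(s, "leader", func() { s.Put("leader", "b") })
	fmt.Println(initial, len(replayed), replayed[0].Value)
}
```

A watch that simply started "from now" after the Get would miss the second Put in this scenario, which is the gap the start revision is meant to close.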

coderabbitai · 2026-03-20T21:23:00Z

📝 Walkthrough

This pull request refactors the watch mechanism across the storage system, removing error-driven watch closure (ErrWatchClosed, ErrWatchUntilTimedOut) and redesigning the WatchOp interface to use error-returning handlers and dedicated error channels. Watch initialization now performs an initial synchronous Get before watching changes. All dependent services (election, migration, scheduler) are updated to use the new API.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Watch Interface & Implementation `server/internal/storage/interface.go`, `server/internal/storage/watch.go`, `server/internal/storage/errors.go` | Removed the `ErrWatchClosed` and `ErrWatchUntilTimedOut` error constants; reworked the `WatchOp` interface to use error-returning handlers, removed the `Until()` method, added an `Error()` channel and a `PropagateErrors()` method; rewrote the watch startup flow with an initial synchronous `Get`, atomic concurrency guards, and error propagation via a dedicated channel. |
| Watch Usage in Services `server/internal/election/candidate.go`, `server/internal/migrate/runner.go`, `server/internal/scheduler/service.go` | Updated watch initialization to call `store.Watch()` upfront, reworked watch callbacks to return errors, removed the `ErrWatchClosed` restart logic, added `EventTypeUnknown` handling, and changed watch errors to route via `PropagateErrors()` instead of direct error pushes. |
| Watch Tests `server/internal/storage/watch_test.go` | Replaced the `Until`-based tests with new tests verifying initial event delivery from pre-existing keys, handler error propagation to both the `Watch()` result and the `Error()` channel, and `Create` event detection; adjusted imports accordingly. |
| Test & Utility Updates `server/internal/election/candidate_test.go`, `server/internal/testutils/logger.go` | Moved logger factory and election service initialization into subtests for per-test lifecycles; enhanced the test logger to include a `test_name` field when verbose mode is enabled. |
| Dependencies `go.mod` | Marked `golang.org/x/time v0.9.0` as a direct dependency by removing the `// indirect` comment. |

Poem

🐰 Watch refactored, errors retired with grace,
Error channels now guide each async race,
Initial Gets anchor the watch stream,
PropagateErrors—a cleaner scheme,
Restarts reborn, no more closure dreams! 🎉

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is well-structured with Summary, Changes context, and Testing sections matching the template. However, the Changes section lacks explicit bulleted items and the Checklist section is completely missing. | Add a bulleted Changes section with specific modifications and complete all checklist items (tests, docs, issue link, changelog, breaking changes). |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: watch improvements' clearly summarizes the main changes (watch function improvements) and follows Conventional Commits style as required by the template. |


jason-lynch · 2026-03-23T23:40:23Z

@coderabbitai review

coderabbitai · 2026-03-23T23:40:29Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

tsivaprasad · 2026-03-31T04:36:54Z

server/internal/election/candidate.go

-	}
-
+	c.watchOp.Close()
 	c.done <- struct{}{}


Instead of c.done <- struct{}{}, I think it should be close(c.done)
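For context on this suggestion, the sketch below shows the general difference in Go between the two forms. It is not taken from `candidate.go`, and `releaseAll` is a hypothetical helper: a send on an unbuffered channel hands one value to one receiver and blocks until that receiver is ready, whereas `close` never blocks and releases every current and future receiver.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// releaseAll starts n goroutines that block on a done channel, closes the
// channel once, and returns how many goroutines woke up.
func releaseAll(n int) int64 {
	done := make(chan struct{})
	var woke atomic.Int64
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // unblocks as soon as done is closed
			woke.Add(1)
		}()
	}

	close(done) // never blocks, even when there are no receivers
	wg.Wait()
	return woke.Load()
}

func main() {
	fmt.Println(releaseAll(3))

	done := make(chan struct{})
	close(done)
	_, ok := <-done // a receive on a closed channel returns immediately
	fmt.Println(ok)
}
```

One caveat with `close`: closing an already-closed channel panics, so if the stop path can run more than once it would need a guard such as `sync.Once`. Whether that applies here depends on code not shown in this thread.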

codacy-production · 2026-03-31T17:28:09Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 4, duplication 0

Makes several substantial improvements to the `storage.Watch` operations:

- `Watch` now does a synchronous `Get` operation before starting the watch to get the current value(s) and to get the start revision for the watch. Callers no longer have to do this operation themselves.
- `Watch` now restarts itself automatically using the most recently fetched revision. We were previously repeating this restart logic in every caller. Restarts are rate-limited to 1 per second.
- `Watch` reports errors over an error channel, matching the convention that we've established everywhere.

We had an unused `Until` operation and I opted to remove it rather than update it for these changes.

coderabbitai

🧹 Nitpick comments (2)

server/internal/storage/watch_test.go (1)

86-102: Consider using unique key for test isolation.

Lines 88 and 97 use a hardcoded "watch-err" key while other tests use uuid.NewString(). For consistency and to prevent potential interference if tests are run with -parallel in the future, consider using a unique key.

♻️ Suggested change for consistency

 	t.Run("delivers error from handler", func(t *testing.T) {
 		ctx := t.Context()
-		watch := storage.NewWatchOp[*TestValue](client, "watch-err")
+		key := uuid.NewString()
+		watch := storage.NewWatchOp[*TestValue](client, key)
 
 		sentinel := assert.AnError
 		handler := func(e *storage.Event[*TestValue]) error {
 			return sentinel
 		}
 		require.NoError(t, watch.Watch(ctx, handler))
 		t.Cleanup(watch.Close)
 
-		err := storage.NewCreateOp(client, "watch-err", &TestValue{SomeField: "v"}).
+		err := storage.NewCreateOp(client, key, &TestValue{SomeField: "v"}).
 			Exec(ctx)
 		require.NoError(t, err)
 
 		require.ErrorIs(t, <-watch.Error(), sentinel)
 	})

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@server/internal/storage/watch_test.go` around lines 86 - 102, The test uses a
hardcoded key "watch-err" for both NewWatchOp and NewCreateOp which can cause
interference when tests run in parallel; replace the literal with a unique key
(e.g., k := uuid.NewString()) and use that variable in
storage.NewWatchOp[*TestValue](client, k) and storage.NewCreateOp(client, k,
...) so NewWatchOp, watch, and NewCreateOp all reference the same per-test
unique key for isolation.

server/internal/storage/watch.go (1)

50-70: Handler invoked while holding mutex during initial load.

The load() method calls handle() at line 62 while holding o.mu. If the handler ever needs to call any watchOp method (e.g., Close()), this would deadlock. The current callers don't do this, but it's worth documenting or restructuring.

Consider releasing the lock before invoking handlers, or document that handlers must not call back into the watch operation.

♻️ Optional: Release lock before calling handlers

 func (o *watchOp[V]) load(ctx context.Context, handle func(e *Event[V]) error) error {
 	o.mu.Lock()
-	defer o.mu.Unlock()
 
 	resp, err := o.client.Get(ctx, o.key, o.options...)
 	if err != nil {
+		o.mu.Unlock()
 		return fmt.Errorf("failed to get initial items for watch: %w", err)
 	}
 
+	o.revision = resp.Header.Revision
+	o.mu.Unlock()
+
 	for _, kv := range resp.Kvs {
 		if err := handle(convertKVToEvent[V](kv)); err != nil {
 			return err
 		}
 	}
 
-	o.revision = resp.Header.Revision
-
 	return nil
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@server/internal/storage/watch.go` around lines 50 - 70, The load() method
currently holds o.mu while calling handle(), risking deadlock if handlers call
back into watchOp; to fix, while holding o.mu build a slice of events by
converting resp.Kvs with convertKVToEvent[V], set o.revision =
resp.Header.Revision, then release o.mu and iterate the collected events calling
handle(event) (returning any handler error). This ensures o.revision is updated
under the lock but handlers run without holding o.mu; update the load()
implementation accordingly (watchOp.load, o.mu, o.revision, convertKVToEvent[V],
handle).
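The restructuring described in this prompt (snapshot under the lock, invoke handlers after releasing it) can be illustrated with a stripped-down model. `toyWatchOp` and its string events are hypothetical simplifications, not the real `watchOp[V]`; the point is only that a handler which calls `Close()` no longer deadlocks, because Go's `sync.Mutex` is not reentrant and the lock is already released by the time the handler runs.

```go
package main

import (
	"fmt"
	"sync"
)

// toyWatchOp is a hypothetical, simplified model of a watch operation.
type toyWatchOp struct {
	mu       sync.Mutex
	revision int64
	closed   bool
}

// Close takes the same mutex that load uses.
func (o *toyWatchOp) Close() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.closed = true
}

// load copies the items and records the revision while holding the lock,
// then releases the lock before invoking the handler on each item.
func (o *toyWatchOp) load(items []string, rev int64, handle func(string) error) error {
	o.mu.Lock()
	events := append([]string(nil), items...)
	o.revision = rev
	o.mu.Unlock()

	for _, e := range events {
		if err := handle(e); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	op := &toyWatchOp{}
	// The handler calls back into the watch operation. If load still held
	// o.mu at this point, Close would block forever.
	err := op.load([]string{"a", "b"}, 7, func(string) error {
		op.Close()
		return nil
	})
	fmt.Println(err, op.closed, op.revision)
}
```

The trade-off is that the revision is recorded before the handlers have run, so a handler error leaves the stored revision already advanced. Whether that matters depends on how the caller retries.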


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e7f1fc90-45d4-468f-93e7-db92ae3c1b2e

📥 Commits

Reviewing files that changed from the base of the PR and between 0b4300c and 8fbc079.

📒 Files selected for processing (10)

go.mod
server/internal/election/candidate.go
server/internal/election/candidate_test.go
server/internal/migrate/runner.go
server/internal/scheduler/service.go
server/internal/storage/errors.go
server/internal/storage/interface.go
server/internal/storage/watch.go
server/internal/storage/watch_test.go
server/internal/testutils/logger.go

💤 Files with no reviewable changes (1)

server/internal/storage/errors.go

jason-lynch force-pushed the feat/watch-improvements branch 2 times, most recently from f3edc39 to 0905973 Compare March 20, 2026 21:40

jason-lynch force-pushed the feat/put-with-updated-version branch from fa785be to 64a9954 Compare March 24, 2026 00:16

jason-lynch force-pushed the feat/watch-improvements branch from 0905973 to fa49e10 Compare March 24, 2026 00:16

jason-lynch requested a review from tsivaprasad March 30, 2026 15:37

tsivaprasad approved these changes Mar 31, 2026


jason-lynch force-pushed the feat/put-with-updated-version branch from 64a9954 to 2db5030 Compare March 31, 2026 17:26

jason-lynch force-pushed the feat/watch-improvements branch from fa49e10 to f7cad02 Compare March 31, 2026 17:26

Base automatically changed from feat/put-with-updated-version to main March 31, 2026 18:03

jason-lynch force-pushed the feat/watch-improvements branch from f7cad02 to 8fbc079 Compare March 31, 2026 18:07

coderabbitai bot reviewed Mar 31, 2026


jason-lynch merged commit f18292a into main Mar 31, 2026
3 checks passed

jason-lynch deleted the feat/watch-improvements branch March 31, 2026 18:47
