dispatcher: Fix issue with dispatching to a contended route#5179
dispatcher: Fix issue with dispatching to a contended route#5179ultrotter merged 6 commits intoprometheus:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdded two Prometheus counters to DispatcherMetrics for aggregation-group creation contention, adjusted group creation retry control flow to continue via the LoadOrStore path after a successful CAS in races, and incremented the new metrics during retry/give-up conditions. Added a concurrency regression test exercising the CAS-recovery path. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
3052269 to
29b8863
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dispatch/dispatch_test.go`:
- Around line 842-846: The comment is wrong because NewDispatcher leaves d.state
at DispatcherStateUnknown; instead of leaving it, set the dispatcher state
explicitly to WaitingToStart after creating it (e.g., call
dispatcher.state.Store(WaitingToStart)) so groupAlert falls into the explicit
no-op branch and avoids the warn logs, and update/remove the inaccurate comment
referencing WaitingToStart vs DispatcherStateUnknown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ed5cb43e-c682-4ce6-9358-775805b0d823
📒 Files selected for processing (4)
config/notifiers.godispatch/dispatch.godispatch/dispatch_test.gonotify/sns/sns.go
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
dispatch/dispatch_test.go (1)
843-845:⚠️ Potential issue | 🟡 MinorSet the dispatcher state explicitly or fix this comment.
NewDispatcherleaves the state asDispatcherStateUnknown; since this test skipsRun,groupAlerthits the default switch branch and logs"unknown state detected"for created groups. Considerdispatcher.state.Store(DispatcherStateWaitingToStart)before callinggroupAlert, or update the comment to describe the actual state.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dispatch/dispatch_test.go` around lines 843 - 845, The test assumes the dispatcher is in WaitingToStart but NewDispatcher leaves dispatcher.state as DispatcherStateUnknown, causing groupAlert to hit the default branch; either set the dispatcher state explicitly by calling dispatcher.state.Store(DispatcherStateWaitingToStart) before invoking groupAlert (or before assigning routeGroupsSlice) so the created groups see the correct state, or change the comment to accurately state that the dispatcher remains in DispatcherStateUnknown and will log "unknown state detected" when groupAlert runs; reference dispatcher.state.Store, DispatcherStateWaitingToStart, NewDispatcher, and groupAlert when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dispatch/dispatch_test.go`:
- Around line 847-890: Change the final assertion so the test is skipped (not
failed) when no contention was observed: after the loop that increments rounds
and before asserting retries, check
testutil.ToFloat64(metrics.aggrGroupCreationRetries) and if it is zero call
t.Skipf with a message mentioning rounds and that the contended CAS path wasn't
exercised (so CI/single-thread runs don't fail); otherwise keep the existing
require.Positive assertion to ensure the contended path is tested when retries >
0. Target the symbols metrics.aggrGroupCreationRetries, rounds, and t (the
*testing.T in dispatch_test.go) where the current require.Positive call is made.
---
Duplicate comments:
In `@dispatch/dispatch_test.go`:
- Around line 843-845: The test assumes the dispatcher is in WaitingToStart but
NewDispatcher leaves dispatcher.state as DispatcherStateUnknown, causing
groupAlert to hit the default branch; either set the dispatcher state explicitly
by calling dispatcher.state.Store(DispatcherStateWaitingToStart) before invoking
groupAlert (or before assigning routeGroupsSlice) so the created groups see the
correct state, or change the comment to accurately state that the dispatcher
remains in DispatcherStateUnknown and will log "unknown state detected" when
groupAlert runs; reference dispatcher.state.Store,
DispatcherStateWaitingToStart, NewDispatcher, and groupAlert when making the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 791bde71-072c-4bc2-a973-977e5a53c264
📒 Files selected for processing (2)
dispatch/dispatch.godispatch/dispatch_test.go
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Guido Trotter <ultrotter@gmail.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
Signed-off-by: Guido Trotter <guido@hudson-trading.com>
* Fix bug in dispatching to contended route * Add metrics for dispatching to contended routes * Add test about dispatching to contended routes * Skip requiring retries > 0 if a single concurrent goroutine is run Signed-off-by: Guido Trotter <ultrotter@gmail.com> Co-authored-by: Guido Trotter <guido@hudson-trading.com> Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
When dispatching to a route by multiple goroutines, we hit an issue where if we can't swap the value in the map because another goroutine beat us to it, we should try to reload from the map, rather than swap again, and thus set loaded = false.
We also add metrics showing how many retries are happening, and whether the "giving up" situation is ever reached.
The test is a little convoluted, to make sure we are able to force that path, and that we never reach the "give up" state. If it becomes flaky we can remove it, or simplify it and also not call require.Positive(t, testutil.ToFloat64(metrics.aggrGroupCreationRetries), "contended CAS path was not exercised in %d rounds — scheduler is unusually serial", rounds), but as is it does show the issue, and the fix.
Pull Request Checklist
Please check all the applicable boxes.
Which user-facing changes does this PR introduce?
Summary by CodeRabbit
New Features
Bug Fixes
Tests