Fix flaky JetStream publish test in clustered environments #130
In clustered NATS environments, there's a brief window after stream creation where metadata hasn't propagated to all nodes yet. This causes intermittent 503 NoResponders errors when publishing immediately. Added 100ms delay in test to allow propagation, matching pattern used in other JetStream tests. Also added TODO comments for implementing proper retry logic per ADR-22.
Walkthrough
Adds TODO comments documenting planned retry logic for 503 NoResponders errors in clustered environments, referencing ADR-22. Introduces a 100ms test delay to allow stream metadata propagation across cluster nodes before message publishing. No logic or control flow modifications.
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~8 minutes
The changes are minimal and involve documentation plus a straightforward timing adjustment. The main consideration is understanding the cluster propagation timing requirement, but there's no complex logic to trace through.
Summary
Fixes a timing race condition in the "JetStream publish basic message" test that was causing intermittent CI failures with `NoStreamResponse` errors.
Root Cause
In NATS clustered environments, when a stream is created, there's a brief propagation window before all cluster nodes are aware of the new stream. If a client publishes immediately after stream creation to a node that hasn't received the metadata yet, it receives a 503 No Responders error.
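For reference, the retry behaviour that the TODO comments point toward could look roughly like the sketch below. It is illustrative only and not part of this PR: `publishWithRetry`, `PublishError`, `FlakyStream` and `noSleep` are hypothetical names, the error set does not mirror the client's real one, and the attempt count and backoff values are assumptions rather than anything taken from ADR-22. The sleep function is passed in so the sketch does not depend on a particular Zig version's sleep API.

```zig
const std = @import("std");

// Hypothetical error set for illustration; the real client's error names may differ.
const PublishError = error{ NoResponders, Timeout };

/// Calls `publishFn(ctx)` up to `max_attempts` times. Only error.NoResponders
/// is retried; the delay doubles (saturating) after each failed attempt.
fn publishWithRetry(
    ctx: anytype,
    comptime publishFn: fn (@TypeOf(ctx)) PublishError!void,
    sleepFn: *const fn (u64) void,
    max_attempts: u32,
    initial_delay_ns: u64,
) PublishError!void {
    var delay_ns = initial_delay_ns;
    var attempt: u32 = 1;
    while (true) : (attempt += 1) {
        publishFn(ctx) catch |err| {
            // Any other error, or an exhausted retry budget, surfaces immediately.
            if (err != error.NoResponders or attempt >= max_attempts) return err;
            sleepFn(delay_ns);
            delay_ns *|= 2;
            continue;
        };
        return;
    }
}

// Stand-in for a node that has not yet seen the stream metadata:
// the first `failures_left` publishes fail with NoResponders.
const FlakyStream = struct {
    failures_left: u32,
    calls: u32 = 0,

    fn publish(self: *FlakyStream) PublishError!void {
        self.calls += 1;
        if (self.failures_left > 0) {
            self.failures_left -= 1;
            return error.NoResponders;
        }
    }
};

// No-op sleep so the test below runs instantly.
fn noSleep(_: u64) void {}

test "retries until the stream becomes visible" {
    var stream = FlakyStream{ .failures_left = 2 };
    try publishWithRetry(&stream, FlakyStream.publish, &noSleep, 5, 10 * std.time.ns_per_ms);
    try std.testing.expectEqual(@as(u32, 3), stream.calls);
}
```

In real use, `sleepFn` would be the standard library's sleep function for the Zig version in use, and `publishFn` would wrap the actual JetStream publish call. Retrying only on the no-responders case keeps genuine failures, such as publishing to a stream that does not exist, from being masked by the backoff loop.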
Changes
- Added a 100ms delay in `jetstream_test.zig` after stream creation to allow cluster metadata propagation
- Added TODO comments in `jetstream.zig` for implementing proper retry logic per ADR-22
Test Plan