
Fix flaky JetStream publish test in clustered environments #130

Merged
lalinsky merged 1 commit into main from fix-jetstream-publish-race-condition
Jan 28, 2026

@lalinsky
Owner

Summary

Fixes a timing race condition in the "JetStream publish basic message" test that was causing intermittent CI failures with NoStreamResponse errors.

Root Cause

In NATS clustered environments, there is a brief propagation window after a stream is created before every cluster node knows about it. If a client publishes immediately after creating the stream while connected to a node that hasn't received the stream metadata yet, nothing responds on the stream's subject and the client receives a 503 No Responders error.
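For context, a no-responders reply in NATS arrives as an empty message whose header block starts with a `NATS/1.0 503` status line. The sketch below shows how a client might detect it, assuming the raw header bytes are available. `isNoResponders` is a hypothetical helper for illustration, not a nats.zig function, and the real header parser may be structured differently.

```zig
const std = @import("std");

/// Hypothetical helper (not part of nats.zig): returns true when a raw NATS
/// header block carries the 503 "no responders" status.
pub fn isNoResponders(headers: []const u8) bool {
    // The status line is the first line of the header block, e.g. "NATS/1.0 503\r\n".
    const line_end = std.mem.indexOf(u8, headers, "\r\n") orelse headers.len;
    const status_line = headers[0..line_end];

    const prefix = "NATS/1.0 ";
    if (!std.mem.startsWith(u8, status_line, prefix)) return false;

    // The three characters after the version token are the status code;
    // anything after them must be a space-separated description.
    const rest = status_line[prefix.len..];
    return rest.len >= 3 and
        std.mem.eql(u8, rest[0..3], "503") and
        (rest.len == 3 or rest[3] == ' ');
}
```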

Changes

  • Added a 100ms delay in jetstream_test.zig after stream creation to allow cluster metadata to propagate
  • Added TODO comments in jetstream.zig for implementing proper retry logic per ADR-22
  • Referenced nats.go's approach (a fixed 250ms wait between attempts, 2 retries by default)
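This PR only leaves TODOs for the retry logic. As a rough sketch of what an ADR-22-style retry could look like in Zig, using the nats.go defaults cited above: `publishWithRetry`, `PublishError`, and `RetryOptions` are hypothetical names invented for this example, not nats.zig APIs, and the sleep function is injected so tests don't have to wait.

```zig
const std = @import("std");

/// Hypothetical error set for this sketch; nats.zig defines its own errors.
pub const PublishError = error{ NoResponders, Timeout };

/// Hypothetical retry policy modeled on the nats.go defaults:
/// a fixed 250ms wait between attempts and at most 2 retries.
pub const RetryOptions = struct {
    retry_wait_ns: u64 = 250 * std.time.ns_per_ms,
    retry_attempts: u32 = 2,
};

/// Calls `publishFn`, retrying only when it fails with NoResponders.
/// Any other error, or a NoResponders after the last retry, is returned.
/// `sleepFn` is injected so callers (and tests) control the actual wait.
pub fn publishWithRetry(
    comptime Ctx: type,
    ctx: Ctx,
    publishFn: *const fn (Ctx) PublishError!u64,
    sleepFn: *const fn (u64) void,
    opts: RetryOptions,
) PublishError!u64 {
    var retries: u32 = 0;
    while (true) : (retries += 1) {
        if (publishFn(ctx)) |seq| {
            return seq; // stream sequence number from the publish ack
        } else |err| {
            if (err != error.NoResponders or retries >= opts.retry_attempts) return err;
            sleepFn(opts.retry_wait_ns);
        }
    }
}
```

With the defaults this makes at most three publish attempts (the initial one plus two retries). Retrying only on NoResponders is a deliberate choice in this sketch: a 503 means no server handled the request, whereas after a timeout the message may already have been stored, so blindly retrying could duplicate it.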

Test Plan

  • Existing test should now pass reliably in CI
  • Delay matches pattern used in other JetStream tests (push, nak, etc.)
  • Full retry logic implementation tracked for future work

In clustered NATS environments, there's a brief window after stream
creation where metadata hasn't propagated to all nodes yet. This causes
intermittent 503 NoResponders errors when publishing immediately.

Added a 100ms delay in the test to allow propagation, matching the
pattern used in other JetStream tests. Also added TODO comments for
implementing proper retry logic per ADR-22.

@coderabbitai
Contributor

coderabbitai bot commented Jan 28, 2026

Walkthrough

Adds TODO comments documenting planned retry logic for 503 NoResponders errors in clustered environments, referencing ADR-22. Introduces a 100ms test delay to allow stream metadata propagation across cluster nodes before message publishing. No logic or control flow modifications.

Changes

  • Documentation & Planning (src/jetstream.zig): Added 6 lines of TODO comments outlining retry strategy for NoResponders errors in cluster scenarios, with ADR-22 reference
  • Test Timing Adjustment (tests/jetstream_test.zig): Inserted 100ms sleep after stream creation in "JetStream publish basic message" test to allow metadata replication; includes TODO referencing ADR-22

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

The changes are minimal and involve documentation plus a straightforward timing adjustment. The main consideration is understanding the cluster propagation timing requirement, but there's no complex logic to trace through.

🚥 Pre-merge checks: ✅ 3 passed

  • Title check (✅ Passed): The title accurately describes the main fix: adding a delay to resolve a flaky test caused by timing issues in clustered NATS environments.
  • Description check (✅ Passed): The description clearly explains the root cause (metadata propagation delay in clustered environments), the fix applied (100ms delay), and references TODO comments for future retry logic work.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@lalinsky lalinsky merged commit 44a8813 into main Jan 28, 2026
3 checks passed
@lalinsky lalinsky deleted the fix-jetstream-publish-race-condition branch January 28, 2026 07:32