Fix flaky JetStream publish test in clustered environments by lalinsky · Pull Request #130 · lalinsky/nats.zig

lalinsky · 2026-01-28T07:25:39Z

Summary

Fixes a race condition in the "JetStream publish basic message" test that was causing intermittent CI failures with NoStreamResponse errors.

Root Cause

In NATS clustered environments, when a stream is created, there's a brief propagation window before all cluster nodes are aware of the new stream. If a client publishes immediately after stream creation to a node that hasn't received the metadata yet, it receives a 503 No Responders error.

Changes

Added 100ms delay in jetstream_test.zig after stream creation to allow cluster metadata propagation
Added TODO comments in jetstream.zig for implementing proper retry logic per ADR-22
Referenced nats.go's approach (250ms backoff, 2 retries by default)
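
For context, the retry behavior that the TODO comments defer could look roughly like the sketch below. It is written in Go only to mirror the nats.go defaults cited above (250ms backoff, 2 retries). It is not the nats.zig API and not nats.go's actual implementation. `ErrNoResponders` and `publishWithRetry` are hypothetical names introduced for illustration, and `publish` stands in for whatever call performs a single JetStream publish.

```go
package main

import (
	"errors"
	"time"
)

// ErrNoResponders stands in for the 503 No Responders status.
// Hypothetical name: it is not taken from nats.zig or nats.go.
var ErrNoResponders = errors.New("no responders (503)")

// publishWithRetry calls publish once and, for as long as it fails with
// ErrNoResponders, retries up to maxRetries more times, sleeping for
// backoff before each retry. Any other error is returned immediately,
// because only the propagation window is expected to clear on its own.
func publishWithRetry(publish func() error, maxRetries int, backoff time.Duration) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
		}
		err = publish()
		if err == nil || !errors.Is(err, ErrNoResponders) {
			return err
		}
	}
	// All attempts hit ErrNoResponders; report the last one.
	return err
}
```

With the defaults mentioned above, a call would be `publishWithRetry(publish, 2, 250*time.Millisecond)`, which makes at most three attempts in total. Unlike the fixed 100ms sleep in the test, a retry of this kind only waits when a publish actually fails.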

Test Plan

The existing test should now pass reliably in CI
The delay matches the pattern used in other JetStream tests (push, nak, etc.)
Full retry logic is tracked as future work

In clustered NATS environments, there's a brief window after stream creation where metadata hasn't propagated to all nodes yet. This causes intermittent 503 NoResponders errors when publishing immediately. Added 100ms delay in test to allow propagation, matching pattern used in other JetStream tests. Also added TODO comments for implementing proper retry logic per ADR-22.

coderabbitai · 2026-01-28T07:25:54Z

Walkthrough

Adds TODO comments documenting planned retry logic for 503 NoResponders errors in clustered environments, referencing ADR-22. Introduces a 100ms test delay to allow stream metadata propagation across cluster nodes before message publishing. No logic or control flow modifications.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Planning `src/jetstream.zig` | Added 6 lines of TODO comments outlining retry strategy for NoResponders errors in cluster scenarios, with ADR-22 reference |
| Test Timing Adjustment `tests/jetstream_test.zig` | Inserted 100ms sleep after stream creation in "JetStream publish basic message" test to allow metadata replication; includes TODO referencing ADR-22 |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

The changes are minimal and involve documentation plus a straightforward timing adjustment. The main consideration is understanding the cluster propagation timing requirement, but there's no complex logic to trace through.

Possibly related PRs

Basic JetStream API support #11: Introduces the JetStream implementation that these TODO comments and test adjustments directly address.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main fix: adding a delay to resolve a flaky test caused by timing issues in clustered NATS environments. |
| Description check | ✅ Passed | The description clearly explains the root cause (metadata propagation delay in clustered environments), the fix applied (100ms delay), and references TODO comments for future retry logic work. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

lalinsky mentioned this pull request Jan 28, 2026

Implement retry logic for JetStream publish in clustered environments #131

Open

coderabbitai bot approved these changes Jan 28, 2026

lalinsky merged commit 44a8813 into main Jan 28, 2026
3 checks passed

lalinsky deleted the fix-jetstream-publish-race-condition branch January 28, 2026 07:32
