feat(loadgen): add room-member add load test (members-sustained, members-capacity)#203
Conversation
Adds the brainstorming spec for two new tools/loadgen subcommands — members-sustained (open-loop throughput) and members-capacity (sequential per-room growth) — covering frontdoor and canonical inject modes, fixture/preset shape, E1/E2 correlation, and testing plan. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Scopes v1 to --shape=users only (per spec amendment); plan covers fixtures, publishers (frontdoor + canonical), collector with (roomID, sortedAccounts) E2 correlation, sustained + capacity generators, summary printers, subcommand wiring, Makefile targets, and integration test. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Implements the deterministic members-workload fixture builder: pure function of (preset, seed, siteID) producing Users, Rooms, Subscriptions, RoomKeys, and a CandidatePools map (accounts not yet seeded per room) for downstream add-member generators. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Implements MemberPublisher interface and canonicalMemberPublisher that publishes AddMembersRequest events directly to JetStream ROOMS stream, bypassing room-service for isolated room-worker benchmarking. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Adds OnBroadcast callback to MemberCollector so per-room goroutines in CapacityMembersGenerator can gate each publish on the previous broadcast arriving, enabling closed-loop sequential room growth to a target size. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Adds --workload=messages|members (default: messages) to the seed and
teardown subcommands, dispatching to per-workload helpers. The messages
path preserves existing behavior except slog messages now include the
workload label ("seed complete (messages)" / "teardown complete (messages)").
The members path uses BuiltinMembersPreset and BuildMembersFixtures and
logs a candidatePoolTotal stat. Also extracts roomIDsOf helper shared by
both teardown variants.
https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Wires up the members-sustained CLI subcommand, plumbing together the NATS/JetStream client, metrics HTTP server, E2 broadcast subscription, MemberPublisher (frontdoor and canonical), ConsumerSampler against the ROOMS stream, SustainedMembersGenerator, and end-of-run finalize with summary printing and optional CSV export. Also adds stub for members-capacity (to be implemented in Task 15). https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Replaces the stub runMembersCapacity with a real implementation that drives each room to --target-size using CapacityMembersGenerator, then reports per-bucket E1/E2 latency via CapacitySummary and computeSizeBuckets. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Verifies the full members-sustained pipeline with simulated room-service and room-worker stages via NATS subscriptions, asserting clean exit and non-zero traffic through both simulated stages. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Replaces fragile error-string comparison with the typed sentinel check per CLAUDE.md's "Never compare errors by string" rule. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
- Deduplicate snapshotLatencies: delete Collector.snapshotLatenciesLocked and route both Collector + MemberCollector through the package-level snapshotLatencies helper. - Extract ParseInjectMode helper; collapse three duplicate inject parse blocks (runRun, runMembersSustained, runMembersCapacity) into a single call site each. - Remove dead SiteID field from SustainedMembersConfig + CapacityMembersConfig (never read inside the generators; subjects are baked into the publisher). - NewMemberCollector now takes InjectMode (typed) instead of raw string. - Single time.Now() per publish in both generators (was called twice, causing skew in E1 latency at high rates). - Cache string(Inject) / string(Shape) labels on generator construction rather than reconverting every publish. - Trim spec-doc path from ErrPoolsExhausted and shape-validation error messages; the path will rot before the messages do. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR adds a complete load-test workload for benchmarking room member-addition operations. It introduces two execution modes (sustained rate-limited publishing and capacity-driven per-room growth), support for two request paths (frontdoor NATS request/reply and canonical JetStream), E1/E2 event correlation, and size-bucketed reporting. ChangesMembers workload benchmark
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/specs/2026-05-19-load-test-room-members-design.md`:
- Around line 283-285: The docs use two different test filenames —
`collector_member_test.go` and `members_collector_test.go` — causing
inconsistency; pick the canonical name (use `members_collector_test.go` to match
the layout) and replace all occurrences of `collector_member_test.go` in this
spec (including the testing section and the other spot around the later mention)
with `members_collector_test.go` so the spec and implementation tasks reference
the same filename.
- Around line 30-34: The doc claims v1 implements only shape=users but later
sections describe multi-shape behavior; update the spec to make the v1 contract
explicit by either (A) adding a clear parse-time rejection for non-users shapes
(referencing the Shape enum and the --shape flag and validation logic) and
calling out that the request builder (which currently only emits user-account
lists) will error on orgs/channels/mixed, or (B) marking the multi-shape
behavior in sections referencing runtime/tests (lines describing multi-shape) as
"post-v1" only; pick one approach and apply it consistently to the Shape enum,
--shape flag/validation, and the request builder description so there are no
unsupported runtime paths implied.
- Around line 12-19: Several fenced code blocks (for example the block that
begins with "client →
chat.user.{account}.request.room.{roomID}.{siteID}.member.add →
room-service.handleAddMembers ..." and the other blocks at ranges 50-55,
238-251, 299-315) are missing language identifiers; edit those fenced code
blocks in docs/superpowers/specs/2026-05-19-load-test-room-members-design.md and
add an appropriate language tag (e.g., `text`, `bash`, or `go`) after the
opening backticks so each block is annotated for markdownlint/MD040 and tooling.
In `@tools/loadgen/main.go`:
- Around line 384-387: The shutdown code currently discards errors from
metricsSrv.Shutdown(shutCtx) and nc.Drain(); instead, capture and handle those
errors explicitly: call metricsSrv.Shutdown(shutCtx) and check its error (log
via the existing logger or return it), and do the same for nc.Drain() (log error
or propagate), and avoid discarding cancelShut() results if it can return an
error; update both occurrences (the Shutdown/Drain calls around
metricsSrv.Shutdown, cancelShut, nc.Drain and the repeated calls at the other
location flagged) to surface failures (use process logger, ctx-aware messages,
and/or return non-nil error) so cleanup failures are not silently ignored.
- Around line 631-637: The code currently computes a single global E1/E2 via
ComputePercentiles(c.E1Samples()) and assigns those same percentiles to every
non-empty bucket in the out slice; instead, compute percentiles per-bucket: for
each bucket index i where out[i].Count > 0, gather or request that bucket's
samples (e.g., use a bucket-aware samples accessor or filter
c.E1Samples()/c.E2Samples() by bucket index) and call ComputePercentiles for
that bucket, then assign those bucket-specific e1/e2 to out[i].E1 and out[i].E2;
do not reuse the global e1/e2 variables. Ensure you only compute percentiles
when out[i].Count > 0 to avoid unnecessary work.
- Around line 376-377: Replace the blind time.Sleep calls (the ones before
collector.DiscardBefore(warmupDeadline) and the similar sleep at the second
location) with an event-driven drain: stop using time.Sleep and instead wait for
a deterministic quiescence signal from the collector or an in-flight counter.
Add or use a method like collector.WaitUntilEmpty(ctx, timeout) or expose an
in-flight counter/channel on collector (e.g., collector.InFlight() + a channel
closed when zero) and block on that (or a context timeout) before calling
collector.DiscardBefore(warmupDeadline); ensure both occurrences (the sleep
before DiscardBefore and the other sleep at the second location) are replaced
with this wait and that a timeout/ctx is used to avoid indefinite blocking.
In `@tools/loadgen/members_integration_test.go`:
- Around line 57-67: The callback registered in rsSub (nc.Subscribe callback) is
silently discarding errors from json.Unmarshal, m.Respond, and js.Publish which
can hide pipeline failures; update the callback to check and handle each error
returned by m.Respond and js.Publish (and the outer nc.Subscribe err) — log or
t.Errorf/fatal the error or increment a failure counter so the test surface
fails when any Respond/Publish fails, and add a short comment only if you
intentionally choose to ignore an error; apply the same treatment to the other
simulation callbacks referenced around lines 80-97 to avoid silently swallowing
errors.
- Around line 121-126: Replace the fixed time.Sleep with a deterministic
wait-based assertion: remove the 500ms Sleep and instead use a polling/assertion
that retries until both counters are > 0 (for example use assert.Eventually or a
small loop with time.After and polling) checking rsCalls.Load() > 0 and
rwCalls.Load() > 0; reference the existing symbols rsCalls.Load() and
rwCalls.Load() and fail the test if the condition does not become true within a
reasonable timeout (e.g. a few seconds) to avoid flakiness.
In `@tools/loadgen/members_publisher_test.go`:
- Around line 20-38: The tests currently start an actual NATS server via
startEmbeddedJetStream which makes them integration tests; either move these
tests into an explicit integration test package/suite (so startEmbeddedJetStream
remains but only used by integration tests) or refactor the publisher
construction to accept an injectable publish client/function (e.g., add a
PublishFunc or JetStreamer interface parameter to the publisher constructor used
by MembersPublisher or NewMembersPublisher) so unit tests can supply a fake/mock
instead of calling startEmbeddedJetStream; update tests in the 40-141 range to
use the fake PublishFunc/JetStreamer for unit tests and keep a separate
integration test that uses startEmbeddedJetStream if end-to-end verification is
needed.
In `@tools/loadgen/members_publisher.go`:
- Around line 79-80: In Close(), don't ignore the result of p.sub.Unsubscribe():
capture the returned error from p.sub.Unsubscribe(), and either return it (or
log it via the publisher's logger) before setting p.sub = nil so teardown
failures are observable; if you intentionally want to discard the error, add a
clear comment stating why. Update the Close method to handle the unsubscribe
error from p.sub.Unsubscribe(), referencing the Close method, the p.sub field
and the Unsubscribe() call.
- Around line 65-68: Guard the p.onReply call inside the nc.Subscribe callback
to avoid panics when no handler is set: in the subscription created around
inboxPrefix+".*" (the callback that computes corr :=
m.Subject[len(inboxPrefix)+1:]) check if p.onReply != nil before calling
p.onReply(corr, m.Data, time.Now()), or alternatively enforce a non-nil handler
in newFrontdoorMemberPublisher by returning an error or setting a no-op handler;
update either the callback or the constructor to ensure replies are handled
safely.
In `@tools/loadgen/README.md`:
- Around line 93-110: The README.md fenced code blocks with the make commands
are missing language identifiers (triggering MD040); update each triple-backtick
fence that wraps the make -C tools/loadgen/deploy commands to use ```bash so all
three blocks (the sustained run block, the capacity-mode block, and the
reset-members block) start with ```bash and retain their existing content,
ensuring the markdown linter no longer flags MD040.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 129bb287-d6c4-4365-8977-af58da83c1e4
📒 Files selected for processing (22)
docs/superpowers/plans/2026-05-19-load-test-room-members.mddocs/superpowers/specs/2026-05-19-load-test-room-members-design.mdpkg/subject/subject.gopkg/subject/subject_test.gotools/loadgen/README.mdtools/loadgen/collector.gotools/loadgen/deploy/Makefiletools/loadgen/generator.gotools/loadgen/main.gotools/loadgen/main_test.gotools/loadgen/members.gotools/loadgen/members_collector.gotools/loadgen/members_collector_test.gotools/loadgen/members_generator.gotools/loadgen/members_generator_test.gotools/loadgen/members_integration_test.gotools/loadgen/members_publisher.gotools/loadgen/members_publisher_test.gotools/loadgen/members_report.gotools/loadgen/members_report_test.gotools/loadgen/members_test.gotools/loadgen/metrics.go
| ``` | ||
| client | ||
| → chat.user.{account}.request.room.{roomID}.{siteID}.member.add | ||
| → room-service.handleAddMembers (auth, capacity, dedup, channel expansion) | ||
| → chat.room.canonical.{siteID}.member.add (ROOMS stream) | ||
| → room-worker (resolve, write subscriptions, emit broadcast + outbox) | ||
| → chat.room.{roomID}.event (member_added RoomEvent) | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks (MD040).
The fences in these sections are missing language hints. Please annotate them (for example text, bash, or go) to satisfy markdownlint and keep docs tooling consistent.
Also applies to: 50-55, 238-251, 299-315
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 12-12: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-05-19-load-test-room-members-design.md` around
lines 12 - 19, Several fenced code blocks (for example the block that begins
with "client → chat.user.{account}.request.room.{roomID}.{siteID}.member.add →
room-service.handleAddMembers ..." and the other blocks at ranges 50-55,
238-251, 299-315) are missing language identifiers; edit those fenced code
blocks in docs/superpowers/specs/2026-05-19-load-test-room-members-design.md and
add an appropriate language tag (e.g., `text`, `bash`, or `go`) after the
opening backticks so each block is annotated for markdownlint/MD040 and tooling.
| The first cut implements `--shape=users` only. The `Shape` enum, flag, | ||
| and validation are in place from the start so adding `orgs` / `channels` / | ||
| `mixed` is a small follow-up plan, but the v1 plan does not seed an org | ||
| pool or a source-channel pool, and the request builder only emits | ||
| user-account lists. |
There was a problem hiding this comment.
Clarify the v1 shape contract to avoid unsupported runtime paths.
Line [30] says v1 is shape=users only, but Lines [140-142] and [279-282] describe active multi-shape runtime behavior/tests. Please make the contract explicit: either reject non-users shapes at parse time in v1, or mark those sections as post-v1 only.
Also applies to: 140-142, 279-282
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-05-19-load-test-room-members-design.md` around
lines 30 - 34, The doc claims v1 implements only shape=users but later sections
describe multi-shape behavior; update the spec to make the v1 contract explicit
by either (A) adding a clear parse-time rejection for non-users shapes
(referencing the Shape enum and the --shape flag and validation logic) and
calling out that the request builder (which currently only emits user-account
lists) will error on orgs/channels/mixed, or (B) marking the multi-shape
behavior in sections referencing runtime/tests (lines describing multi-shape) as
"post-v1" only; pick one approach and apply it consistently to the Shape enum,
--shape flag/validation, and the request builder description so there are no
unsupported runtime paths implied.
| - `collector_member_test.go` — E1/E2 correlation join, size-bucketing | ||
| edges, percentile output, `member_added` filter rejects other | ||
| RoomEvent types. |
There was a problem hiding this comment.
Fix inconsistent test filename in the testing section.
Line [283] uses collector_member_test.go, while the layout uses members_collector_test.go. Keep one canonical name to prevent churn and mislinked implementation tasks.
Also applies to: 303-305
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-05-19-load-test-room-members-design.md` around
lines 283 - 285, The docs use two different test filenames —
`collector_member_test.go` and `members_collector_test.go` — causing
inconsistency; pick the canonical name (use `members_collector_test.go` to match
the layout) and replace all occurrences of `collector_member_test.go` in this
spec (including the testing section and the other spot around the later mention)
with `members_collector_test.go` so the spec and implementation tasks reference
the same filename.
| time.Sleep(2 * time.Second) // drain trailing replies/broadcasts | ||
| collector.DiscardBefore(warmupDeadline) |
There was a problem hiding this comment.
Replace fixed sleeps with event-driven drain completion.
Line 376 and Line 567 use time.Sleep(2 * time.Second) to wait for trailing events, which is nondeterministic under varying load and can skew measured results. Wait on explicit completion criteria (e.g., in-flight counters, collector quiescence, or context-bounded channel signaling) instead.
As per coding guidelines: “Never use time.Sleep() for goroutine synchronization — use proper sync primitives (channels, sync.WaitGroup, sync.Mutex, context cancellation).”
Also applies to: 567-568
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/main.go` around lines 376 - 377, Replace the blind time.Sleep
calls (the ones before collector.DiscardBefore(warmupDeadline) and the similar
sleep at the second location) with an event-driven drain: stop using time.Sleep
and instead wait for a deterministic quiescence signal from the collector or an
in-flight counter. Add or use a method like collector.WaitUntilEmpty(ctx,
timeout) or expose an in-flight counter/channel on collector (e.g.,
collector.InFlight() + a channel closed when zero) and block on that (or a
context timeout) before calling collector.DiscardBefore(warmupDeadline); ensure
both occurrences (the sleep before DiscardBefore and the other sleep at the
second location) are replaced with this wait and that a timeout/ctx is used to
avoid indefinite blocking.
| _ = metricsSrv.Shutdown(shutCtx) | ||
| cancelShut() | ||
| _ = nc.Drain() | ||
|
|
There was a problem hiding this comment.
Handle shutdown/drain errors explicitly instead of discarding them.
Line 384/386 and Line 571/573 ignore errors from server shutdown and NATS drain. This hides cleanup failures and can mask dropped metrics or pending publishes during teardown.
Suggested fix
- _ = metricsSrv.Shutdown(shutCtx)
+ if err := metricsSrv.Shutdown(shutCtx); err != nil {
+ slog.Warn("metrics shutdown", "error", err)
+ }
cancelShut()
- _ = nc.Drain()
+ if err := nc.Drain(); err != nil {
+ slog.Warn("nats drain", "error", err)
+ }As per coding guidelines: “Never ignore errors silently — comment if intentionally discarded.”
Also applies to: 571-573
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/main.go` around lines 384 - 387, The shutdown code currently
discards errors from metricsSrv.Shutdown(shutCtx) and nc.Drain(); instead,
capture and handle those errors explicitly: call metricsSrv.Shutdown(shutCtx)
and check its error (log via the existing logger or return it), and do the same
for nc.Drain() (log error or propagate), and avoid discarding cancelShut()
results if it can return an error; update both occurrences (the Shutdown/Drain
calls around metricsSrv.Shutdown, cancelShut, nc.Drain and the repeated calls at
the other location flagged) to surface failures (use process logger, ctx-aware
messages, and/or return non-nil error) so cleanup failures are not silently
ignored.
| // Allow trailing events to settle before checking counters. | ||
| time.Sleep(500 * time.Millisecond) | ||
|
|
||
| // Both simulated stages must have seen traffic. | ||
| assert.Greater(t, rsCalls.Load(), int64(0), "room-service was never called") | ||
| assert.Greater(t, rwCalls.Load(), int64(0), "room-worker never consumed") |
There was a problem hiding this comment.
Replace fixed sleep with deterministic synchronization.
The fixed delay can cause flakes under slower CI/load and is explicitly discouraged for synchronization in this repo. Use an eventual/wait-based assertion instead.
Suggested fix
- // Allow trailing events to settle before checking counters.
- time.Sleep(500 * time.Millisecond)
-
- // Both simulated stages must have seen traffic.
- assert.Greater(t, rsCalls.Load(), int64(0), "room-service was never called")
- assert.Greater(t, rwCalls.Load(), int64(0), "room-worker never consumed")
+ // Both simulated stages must see traffic.
+ require.Eventually(t, func() bool {
+ return rsCalls.Load() > 0 && rwCalls.Load() > 0
+ }, 5*time.Second, 50*time.Millisecond, "simulated pipeline saw no traffic")As per coding guidelines: “Never use time.Sleep() for goroutine synchronization — use proper sync primitives.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Allow trailing events to settle before checking counters. | |
| time.Sleep(500 * time.Millisecond) | |
| // Both simulated stages must have seen traffic. | |
| assert.Greater(t, rsCalls.Load(), int64(0), "room-service was never called") | |
| assert.Greater(t, rwCalls.Load(), int64(0), "room-worker never consumed") | |
| // Both simulated stages must see traffic. | |
| require.Eventually(t, func() bool { | |
| return rsCalls.Load() > 0 && rwCalls.Load() > 0 | |
| }, 5*time.Second, 50*time.Millisecond, "simulated pipeline saw no traffic") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/members_integration_test.go` around lines 121 - 126, Replace
the fixed time.Sleep with a deterministic wait-based assertion: remove the 500ms
Sleep and instead use a polling/assertion that retries until both counters are >
0 (for example use assert.Eventually or a small loop with time.After and
polling) checking rsCalls.Load() > 0 and rwCalls.Load() > 0; reference the
existing symbols rsCalls.Load() and rwCalls.Load() and fail the test if the
condition does not become true within a reasonable timeout (e.g. a few seconds)
to avoid flakiness.
| func startEmbeddedJetStream(t *testing.T) (*nats.Conn, jetstream.JetStream) { | ||
| t.Helper() | ||
| opts := &natsserver.Options{ | ||
| Port: -1, | ||
| JetStream: true, | ||
| StoreDir: t.TempDir(), | ||
| } | ||
| ns, err := natsserver.NewServer(opts) | ||
| require.NoError(t, err) | ||
| ns.Start() | ||
| require.True(t, ns.ReadyForConnections(5*time.Second), "nats server did not become ready") | ||
| t.Cleanup(ns.Shutdown) | ||
| nc, err := nats.Connect(ns.ClientURL()) | ||
| require.NoError(t, err) | ||
| t.Cleanup(nc.Close) | ||
| js, err := jetstream.New(nc) | ||
| require.NoError(t, err) | ||
| return nc, js | ||
| } |
There was a problem hiding this comment.
These tests rely on a real NATS server instead of unit-level injection/mocks.
Current coverage is integration-style and violates the unit-test constraint for this repo. Please move these to explicit integration test scope or refactor publisher construction so unit tests can inject publish/subscribe fakes without booting NATS.
As per coding guidelines, “Never connect to real databases, NATS, or external services in unit tests. When a handler publishes to JetStream, inject the publish function as a field so tests can capture data without a real NATS connection.”
Also applies to: 40-141
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/members_publisher_test.go` around lines 20 - 38, The tests
currently start an actual NATS server via startEmbeddedJetStream which makes
them integration tests; either move these tests into an explicit integration
test package/suite (so startEmbeddedJetStream remains but only used by
integration tests) or refactor the publisher construction to accept an
injectable publish client/function (e.g., add a PublishFunc or JetStreamer
interface parameter to the publisher constructor used by MembersPublisher or
NewMembersPublisher) so unit tests can supply a fake/mock instead of calling
startEmbeddedJetStream; update tests in the 40-141 range to use the fake
PublishFunc/JetStreamer for unit tests and keep a separate integration test that
uses startEmbeddedJetStream if end-to-end verification is needed.
| sub, err := nc.Subscribe(inboxPrefix+".*", func(m *nats.Msg) { | ||
| corr := m.Subject[len(inboxPrefix)+1:] | ||
| p.onReply(corr, m.Data, time.Now()) | ||
| }) |
There was a problem hiding this comment.
Guard onReply before invoking it in the subscription callback.
If newFrontdoorMemberPublisher is called with a nil handler, this callback path panics on the first reply. Add a nil guard (or enforce non-nil in constructor) to keep reply handling fail-safe.
Suggested fix
sub, err := nc.Subscribe(inboxPrefix+".*", func(m *nats.Msg) {
corr := m.Subject[len(inboxPrefix)+1:]
- p.onReply(corr, m.Data, time.Now())
+ if p.onReply != nil {
+ p.onReply(corr, m.Data, time.Now())
+ }
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sub, err := nc.Subscribe(inboxPrefix+".*", func(m *nats.Msg) { | |
| corr := m.Subject[len(inboxPrefix)+1:] | |
| p.onReply(corr, m.Data, time.Now()) | |
| }) | |
| sub, err := nc.Subscribe(inboxPrefix+".*", func(m *nats.Msg) { | |
| corr := m.Subject[len(inboxPrefix)+1:] | |
| if p.onReply != nil { | |
| p.onReply(corr, m.Data, time.Now()) | |
| } | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/members_publisher.go` around lines 65 - 68, Guard the p.onReply
call inside the nc.Subscribe callback to avoid panics when no handler is set: in
the subscription created around inboxPrefix+".*" (the callback that computes
corr := m.Subject[len(inboxPrefix)+1:]) check if p.onReply != nil before calling
p.onReply(corr, m.Data, time.Now()), or alternatively enforce a non-nil handler
in newFrontdoorMemberPublisher by returning an error or setting a no-op handler;
update either the callback or the constructor to ensure replies are handled
safely.
| _ = p.sub.Unsubscribe() | ||
| p.sub = nil |
There was a problem hiding this comment.
Don’t silently discard Unsubscribe() errors in Close().
Handle/log the error (or at least document intentional discard) so teardown failures are observable.
As per coding guidelines, “Never ignore errors silently — comment if intentionally discarded.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/loadgen/members_publisher.go` around lines 79 - 80, In Close(), don't
ignore the result of p.sub.Unsubscribe(): capture the returned error from
p.sub.Unsubscribe(), and either return it (or log it via the publisher's logger)
before setting p.sub = nil so teardown failures are observable; if you
intentionally want to discard the error, add a clear comment stating why. Update
the Close method to handle the unsubscribe error from p.sub.Unsubscribe(),
referencing the Close method, the p.sub field and the Unsubscribe() call.
| ``` | ||
| make -C tools/loadgen/deploy up | ||
| make -C tools/loadgen/deploy seed-members PRESET=members-medium | ||
| make -C tools/loadgen/deploy run-sustained PRESET=members-medium RATE=100 DURATION=60s | ||
| ``` | ||
|
|
||
| For capacity-mode growth curves: | ||
|
|
||
| ``` | ||
| make -C tools/loadgen/deploy seed-members PRESET=members-capacity | ||
| make -C tools/loadgen/deploy run-capacity PRESET=members-capacity TARGET_SIZE=500 | ||
| ``` | ||
|
|
||
| Between sustained runs, reset state so candidate pools refill: | ||
|
|
||
| ``` | ||
| make -C tools/loadgen/deploy reset-members PRESET=members-medium | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to the new fenced code blocks.
The newly added fences are missing a language tag, which triggers MD040.
Suggested fix
-```
+```bash
make -C tools/loadgen/deploy up
make -C tools/loadgen/deploy seed-members PRESET=members-medium
make -C tools/loadgen/deploy run-sustained PRESET=members-medium RATE=100 DURATION=60s@@
- +bash
make -C tools/loadgen/deploy seed-members PRESET=members-capacity
make -C tools/loadgen/deploy run-capacity PRESET=members-capacity TARGET_SIZE=500
@@
-```
+```bash
make -C tools/loadgen/deploy reset-members PRESET=members-medium
</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 93-93: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 101-101: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 108-108: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @tools/loadgen/README.md around lines 93 - 110, The README.md fenced code
blocks with the make commands are missing language identifiers (triggering
MD040); update each triple-backtick fence that wraps the make -C
tools/loadgen/deploy commands to use ```bash so all three blocks (the sustained
run block, the capacity-mode block, and the reset-members block) start with
longer flags MD040.Round-two simplify pass: - Add presetLabel field to both members generators, set in constructor, used in the metrics WithLabelValues call instead of g.cfg.Preset.Name. Completes the cached-label pattern introduced in the previous pass. - Doc-comment snapshotLatencies to flag that callers must already hold the collector mutex, since the function now serves both Collector and MemberCollector across files. - Add TestParseInjectMode mirroring TestParseShape. https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
…scrape + warn on --allow-concurrent Three small follow-ups from the final ship-readiness review: 1. parseInjectMode (internal, flags.go) consolidated into ParseInjectMode (exported, generator.go). PR #203 added the exported version; this removes the internal duplicate and updates the one caller (dispatch.go:457) + test (flags_test.go:27). The generator_test.go comment that flagged this as a known follow-up is updated to note the consolidation is done. 2. bootstrap_error scrape race: the metric is incremented then Run returns the error and the process exits in <1s. With the default Prometheus scrape interval of 15s, the metric was unobservable in practice. Hold the process open for 20s (ctx-honoring; SIGINT still exits promptly, and tests with short ctx timeouts still pass quickly) so the scraper has a chance to read the counter before exit. 20s > 15s + jitter; well below operator patience for a fast-failing run. Surfaced via slog.Warn so the operator knows why the process is lingering. 3. --allow-concurrent runs share SUT resources (ES index growth, search-service capacity, NATS redelivery budget) even though msgID-keyed correlation prevents cross-credit. Emit a structured slog.Warn when the flag is set so operators don't misread numbers that look like degradation in run B but are actually saturation from run A. msgID isolation comment in CHANGES.md is unchanged (still accurate at the correctness level).
Adds the brainstorming spec for two new tools/loadgen subcommands —
members-sustained (open-loop throughput) and members-capacity
(sequential per-room growth) — covering frontdoor and canonical inject
modes, fixture/preset shape, E1/E2 correlation, and testing plan.
https://claude.ai/code/session_01Q8Fyx576dmfM9yfbEMtB6t
Summary by CodeRabbit
Release Notes
New Features
members-sustainedandmembers-capacitysubcommands to benchmark member-addition operations.Documentation
Tests