perf(tcount): bound thread reply count recount to a constant#370
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
✅ Files skipped from review due to trivial changes (4)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughAdds a shared bounded thread-reply counting helper, switches both writer paths to use it, and updates the related Cassandra, API, design, and implementation-plan documentation to describe the capped ChangesBounded Thread Reply Count
Sequence Diagram(s)sequenceDiagram
participant MessageWorker
participant HistoryService
participant threadcount
participant Cassandra
MessageWorker->>threadcount: Count(ctx, session, threadRoomID)
threadcount->>Cassandra: Query thread_messages_by_thread
Cassandra-->>threadcount: rows up to Cap
threadcount-->>MessageWorker: bounded count
HistoryService->>threadcount: CountAndLatest(ctx, session, threadRoomID)
threadcount->>Cassandra: Query deleted and created_at
Cassandra-->>threadcount: rows up to Cap
threadcount-->>HistoryService: bounded count + latest surviving timestamp
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
769176c to
4fde085
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
pkg/threadcount/count.go (1)
1-6: 🚀 Performance & Scalability | 🔵 TrivialDoc claim of "constant" per-write cost is only true when soft-deletes are sparse.
The early-stop counts non-deleted rows, but with no CQL
LIMITthe iterator must page through every soft-deleted row that clusters ahead of the Cap-th survivor. A thread that accumulates manydeleted = truerows (or fewer thanCaplive replies among a large tombstoned partition) forces a full-partition read on every reply/delete — i.e. cost isO(deleted + Cap), not constant. This is the inherent trade-off of droppingLIMITto stay soft-delete-correct, but the package doc ("per-write cost stays constant regardless of thread size") andCount's "~Cap rows" wording understate it.Consider softening the doc to reflect the
O(survivors-scanned + interspersed deletes)worst case, and confirm soft-deleted rows inthread_messages_by_threadare bounded/compacted so this hot path can't degrade over a thread's lifetime.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/threadcount/count.go` around lines 1 - 6, Update the package comment in threadcount to avoid claiming the per-write cost is constant regardless of thread size, since Count can still scan many tombstoned rows before reaching Cap live replies. Soften the wording around Count’s “~Cap rows” behavior to reflect the worst case when soft-deletes are sparse or clustered, and mention the trade-off introduced by omitting CQL LIMIT for soft-delete correctness.history-service/internal/cassrepo/write_integration_test.go (1)
1679-1695: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winConsider asserting the returned
tlm(currently discarded).This test ignores the second return value (
_), so the over-captlmderivation inCountAndLatestis exercised but never validated. Asserting thattlmequals the newest inserted reply'screated_ateven when row count exceedsCapwould directly cover the clustering-order dependency flagged inwrite.go.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@history-service/internal/cassrepo/write_integration_test.go` around lines 1679 - 1695, The test for countThreadReplies currently drops the second return value, so it never verifies the latest-message timestamp path in CountAndLatest. Update TestRepository_countThreadReplies_CapsAtThreadcountCap to assert the returned tlm from countThreadReplies equals the newest inserted reply’s created_at, using the existing countThreadReplies and threadcount.Cap setup to cover the over-cap clustering-order behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/plans/2026-06-21-bounded-thread-reply-count.md`:
- Around line 417-425: Add explicit language identifiers to the fenced examples
in the affected markdown sections so they satisfy markdownlint MD040; update the
code fences around the example snippets in this document to use the appropriate
fence labels (for example, SQL for the schema snippets and Markdown where
applicable), and apply the same fix to the other referenced example blocks in
the same document.
- Line 7: The plan text is inconsistent for history-service: it currently tells
the delete path to use threadcount.Count, but that path must preserve tlm
updates by using threadcount.CountAndLatest instead. Update Task 3 and every
architecture/reference section that mentions history-service delegation so
history-service explicitly calls CountAndLatest while message-worker can still
use Count, and ensure the documented return shape includes the latest timestamp
where needed.
In `@docs/superpowers/specs/2026-06-21-bounded-thread-reply-count-design.md`:
- Around line 125-135: The shared-helper contract is inconsistent with the
delete path: the history-service delete flow still relies on
threadcount.CountAndLatest to recompute tlm, not just threadcount.Count. Update
the spec wording and example API around the delete-path helper to reference
threadcount.CountAndLatest for the delete flow while keeping threadcount.Count
as the shared display-cap counter, so the contract matches the actual writer
behavior.
- Around line 10-12: The fenced CQL snippet in the design doc is missing a
language identifier, so update that markdown block to use the appropriate SQL
fence for the query. Locate the fenced block containing the SELECT statement and
change it to a properly labeled sql code block to satisfy markdown linting.
In `@pkg/threadcount/integration_test.go`:
- Around line 24-30: The flagged `admin.Query(fmt.Sprintf(...))` in
`threadcount` is a false-positive SQL/CQL injection finding because `keyspace`
comes from `testutil.CassandraKeyspace` and cannot be parameterized as an
identifier. Add the scanner’s inline suppression directive with a brief
justification directly above the `fmt.Sprintf` statement in the test setup so
the SAST gate passes, keeping the suppression scoped only to this `CREATE TABLE`
query.
---
Nitpick comments:
In `@history-service/internal/cassrepo/write_integration_test.go`:
- Around line 1679-1695: The test for countThreadReplies currently drops the
second return value, so it never verifies the latest-message timestamp path in
CountAndLatest. Update TestRepository_countThreadReplies_CapsAtThreadcountCap to
assert the returned tlm from countThreadReplies equals the newest inserted
reply’s created_at, using the existing countThreadReplies and threadcount.Cap
setup to cover the over-cap clustering-order behavior.
In `@pkg/threadcount/count.go`:
- Around line 1-6: Update the package comment in threadcount to avoid claiming
the per-write cost is constant regardless of thread size, since Count can still
scan many tombstoned rows before reaching Cap live replies. Soften the wording
around Count’s “~Cap rows” behavior to reflect the worst case when soft-deletes
are sparse or clustered, and mention the trade-off introduced by omitting CQL
LIMIT for soft-delete correctness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 15182611-7ba4-4617-ba4e-7e61001ab487
📒 Files selected for processing (12)
docs/cassandra_message_model.mddocs/client-api.mddocs/superpowers/plans/2026-06-04-tcount-count-based.mddocs/superpowers/plans/2026-06-21-bounded-thread-reply-count.mddocs/superpowers/specs/2026-06-21-bounded-thread-reply-count-design.mdhistory-service/internal/cassrepo/write.gohistory-service/internal/cassrepo/write_integration_test.gomessage-worker/integration_test.gomessage-worker/store_cassandra.gopkg/threadcount/count.gopkg/threadcount/integration_test.gopkg/threadcount/main_test.go
4fde085 to
e62bb10
Compare
|
Addressed the review feedback in Fixed
Skipped (with reasons)
Separately, the earlier Generated by Claude Code |
e62bb10 to
63b08e5
Compare
Spec and TDD implementation plan for bounding the per-write tcount recount: keep #245's idempotent, soft-delete-aware COUNT+blind-SET but stop the partition scan at a display cap, extracted into a shared pkg/threadcount used by both authoritative writers (message-worker via Count, history-service via CountAndLatest which also recomputes tlm). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
CountAndLatest scans thread_messages_by_thread, early-breaking once the non-deleted tally reaches Cap=99 (PageSize(Cap) + n<Cap guard), so the per-write read is bounded to ~Cap surviving rows (plus any soft-deleted rows clustered ahead) rather than the whole partition. No CQL LIMIT — soft-deleted rows are live, interspersed rows, so a hard LIMIT could undercount. It also returns the latest surviving reply's created_at (tlm) for the delete path; the partition's DESC clustering order surfaces the latest survivor first. Count is a thin delegation to CountAndLatest discarding the timestamp — the add path needs only the count, and since created_at is a clustering key, selecting it adds no meaningful read cost while one shared scan keeps the two writers' counts provably identical. Integration tests cover under/over cap, deleted excluded (incl. over-cap interspersed), empty, and latest-survivor cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
Delegate the add-path countThreadReplies to threadcount.Count, replacing the unbounded partition scan. The blind-SET (now tcount+tlm) and the countAndSetParentTcount caller are unchanged; tlm on the add path stays the new reply's CreatedAt. Adds an integration test (using the existing setupCassandra helper) asserting the count caps at 99. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
Delegate the delete-path countThreadReplies to threadcount.CountAndLatest, which bounds the count at the same Cap as the add path (so a reply and a delete can't write divergent values and flip-flop the badge) while still recomputing tlm (latest surviving reply) from the one bounded scan. The capping integration test (using the existing setupCassandra helper) also asserts the over-cap tlm is the newest reply. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
Note the 99 cap on the tcount column (messages_by_room, messages_by_id) and on the tcount/newTcount client-api fields, and record in the #245 plan that the bounded-cap design supersedes its COUNTER-table + reconciliation-job future-work item (a counter is not idempotent under JetStream redelivery; the cap stays inside the stateless model). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
63b08e5 to
43a3c28
Compare
Summary
On the message hot path, the thread reply-count badge (
tcount) was produced by re-counting the entirethread_messages_by_threadCassandra partition on every reply and every delete — an O(thread-size) scan per write (O(N²) to build a thread), duplicated byte-for-byte inmessage-worker(add path) andhistory-service(delete path), and sitting onmessage-worker's synchronous JetStream-ack path.This bounds that recount to a constant by stopping the scan once the non-deleted tally reaches a display cap of 99, extracted into a shared
pkg/threadcountso the two authoritative writers can never drift.pkg/threadcount— newCount(ctx, *gocql.Session, threadRoomID) (int, error)returningmin(non-deleted replies, Cap)withconst Cap = 99. Early-breaks at the cap (PageSize(Cap)+n < Capguard), so it materializes ~Cap rows instead of the whole partition.countThreadRepliesis now a one-line delegation tothreadcount.Count. The blind-SEToftcountand thecountAndSetParentTcountcallers are unchanged.99(FE renders>= 99as "99+").Why a cap (and not the documented COUNTER table)
tcountwas deliberately moved to COUNT+blind-SET in #245 because it is idempotent under JetStream redelivery and soft-delete-aware — both preserved here (noLIMITis used, since soft-deleted rows are live rows interspersed in the partition; the scan readsdeletedand treats NULL as not-deleted). #245's documented follow-up (a Cassandracountertable + reconciliation job) would re-introduce the non-idempotency it eliminated and add a scheduled job; this design instead stays inside the idempotent, stateless model with no new table, job, or source of truth. The #245 plan's future-work item is updated to point here.No schema/DDL change, no migration, no new dependency.
Changes
pkg/threadcount/— new package + integration tests (under/over cap, deleted excluded, deleted-interspersed-over-cap, empty).message-worker/store_cassandra.go,history-service/internal/cassrepo/write.go— delegate to the helper; capping integration test at each site.cassandra_message_model.md(tcountcolumn),client-api.md(tcount/newTcountfields), and the feat: real-time thread reply fan-out (broadcast-worker) + reply-count badge pipeline #245 plan's superseded-note.Test Plan
make test(full unit suite) — greenmake lint— 0 issuesmake test-integration SERVICE=pkg/threadcount,…SERVICE=message-worker,…SERVICE=history-service. These were written and compile-verified under theintegrationbuild tag but could not be executed in the dev environment (no Docker).govulncheck/semgrepmust run in CI — blocked locally by the network policy (403 to their registries).🤖 Generated with Claude Code
https://claude.ai/code/session_01HvVrpPq7875QKs9JqmgCpn
Generated by Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
tcount/newTcountare capped at 99 (where 99 means “99 or more”).Tests