feat(search-sync-worker): restrictedRooms on user-room + tshow on message index#115
feat(search-sync-worker): restrictedRooms on user-room + tshow on message index#115
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 52 minutes and 37 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (17)
📝 WalkthroughWalkthroughThis PR transforms Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Member Addition
participant RoomWorker as room-worker
participant EventQueue as Event Queue
participant InboxWorker as inbox-worker
participant SearchSync as search-sync-worker
Client->>RoomWorker: Add member request<br/>(with HistoryMode)
RoomWorker->>RoomWorker: Determine hss from HistoryMode<br/>(None → timestamp, All → nil)
RoomWorker->>EventQueue: Publish MemberAddEvent<br/>(HistorySharedSince: *int64 or nil)
EventQueue->>InboxWorker: Consume MemberAddEvent
InboxWorker->>InboxWorker: Check if hss != nil<br/>Convert to UTC time if set
EventQueue->>SearchSync: Consume MemberAddEvent
SearchSync->>SearchSync: Route by nil check:<br/>nil → rooms[],<br/>non-nil → restrictedRooms[]
SearchSync->>SearchSync: Update Elasticsearch<br/>user-room document
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR #115 implements tshow indexing as part of the sync-worker prerequisite work. Update both specs to reflect: - sync-worker-ext spec: add tshow to goals; add pkg/model.Message + MessageSearchIndex field additions; add messages_test.go + model_test.go test cases covering round-trip true and omitempty behavior. - search-service spec: drop tshow from non-goals; restore Clause B1 (tshow=true) in the restricted-room per-room thread-reply branch; remove tshow from MVP parity gaps; update decision log to note the targeted exception. Restricted-user thread-reply parity now matches the old Rocket.Chat semantic. https://claude.ai/code/session_01JbAWPnNedoBwQL3usoPtS3
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/model/message.go`:
- Line 15: The Cassandra INSERT statements in SaveMessage (for messages_by_room
and messages_by_id) and SaveThreadMessage (for messages_by_id) are missing the
tshow column and thus do not bind msg.TShow; update the INSERT column lists to
include tshow and add msg.TShow to the corresponding bound values in both
SaveMessage and SaveThreadMessage so Cassandra persists the TShow field (ensure
the column name tshow matches the model tag and the bound value uses msg.TShow).
In `@room-worker/handler.go`:
- Around line 623-628: The code currently assigns historySharedSincePtr = &v
when req.History.Mode == model.HistoryModeNone which can emit a non-nil pointer
to 0 if req.Timestamp is missing; change the logic so historySharedSincePtr
remains nil for unrestricted history and only set historySharedSincePtr = &v
when req.Timestamp > 0 (i.e., check req.Timestamp before taking its address).
Update the branch that references req.History.Mode, historySharedSincePtr, and
req.Timestamp to guard against emitting &0.
In `@search-sync-worker/spotlight.go`:
- Around line 65-67: The current check in spotlight.go treats any non-nil
payload.HistorySharedSince (including a pointer to 0) as restricted and skips
indexing, which diverges from user_room.go's addRoomScript logic (which treats
hss <= 0 as unrestricted); update the condition that skips spotlight indexing to
explicitly check both non-nil and positive value (e.g., require
payload.HistorySharedSince != nil && *payload.HistorySharedSince > 0) so the
spotlight index aligns with addRoomScript's semantics and avoids mismatches if a
&0 leaks.
In `@search-sync-worker/user_room.go`:
- Around line 116-123: The code silently treats a non-nil HistorySharedSince
pointer that equals 0 (or any non-positive value) as unrestricted by assigning
hss = 0; update the validation in the block handling payload.HistorySharedSince
so that if payload.HistorySharedSince != nil and *payload.HistorySharedSince <=
0 you reject the event (return an error or drop it) instead of assigning hss;
ensure this validation occurs before addRoomScript/any indexing that uses hss so
malformed &0 (or negative) values never map into the unrestricted rooms[] index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d95e87ed-affa-4b4c-9f4d-f5048adec285
📒 Files selected for processing (16)
inbox-worker/handler.goinbox-worker/handler_test.goinbox-worker/integration_test.gopkg/model/event.gopkg/model/message.gopkg/model/model_test.goroom-worker/handler.goroom-worker/handler_test.gosearch-sync-worker/inbox_integration_test.gosearch-sync-worker/inbox_stream.gosearch-sync-worker/messages.gosearch-sync-worker/messages_test.gosearch-sync-worker/spotlight.gosearch-sync-worker/spotlight_test.gosearch-sync-worker/user_room.gosearch-sync-worker/user_room_test.go
…ntract Addresses CodeRabbit review on PR #115: - message-worker/store_cassandra.go: SaveMessage and SaveThreadMessage now include the `tshow` column in their INSERTs for messages_by_room and messages_by_id. Without this, a message with `TShow=true` would survive in Mongo/ES but be silently lost on Cassandra-sourced reads. (`thread_messages_by_room` schema has no `tshow` column — by design, the flag lives on the parent channel row.) - room-worker/handler.go: when HistoryModeNone is set but `req.Timestamp` is missing/zero, leave `HistorySharedSince` as nil instead of emitting `&0`. The user-room painless script treats `hss <= 0` as unrestricted, so `&0` would misroute a restricted room into `rooms[]`. Logs an error because this is an upstream contract violation that we can't recover. - search-sync-worker/spotlight.go: mirror user_room.go's `hss > 0` sentinel so a leaked `&0` pointer is treated as unrestricted by both indices, preventing divergence (where user-room would index as unrestricted but spotlight would skip, making the room unsearchable by name). Skipped: CodeRabbit's suggestion to reject non-nil `<=0` HSS in user-room — the spec explicitly documents `hss <= 0 → unrestricted` as the Go↔painless sentinel, and the publisher contract is guarded by the publisher unit tests already in this PR. https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
PR #115 discovered that message-worker/store_cassandra.go was not binding the tshow column in SaveMessage or SaveThreadMessage, even though the column already existed in the DDL and UDT struct. Without this fix the tshow value is always persisted as NULL, which breaks the downstream ES index population and defeats the Clause B1 restricted-room query branch. Spec updates: - Goal #5 expanded to call out the message-worker binding fix. - New "message-worker/store_cassandra.go — bind tshow in inserts" section with the three concrete INSERT rewrites and integration test coverage. No DDL or UDT struct change — the column was already declared. https://claude.ai/code/session_01JbAWPnNedoBwQL3usoPtS3
Implements the search-service per docs/superpowers/specs/2026-04-21-search-service-design.md. Built on top of PR #115's sync-worker prerequisite (restrictedRooms{} on user-room + tshow on message index). Two endpoints on natsrouter: - chat.user.{account}.request.search.messages — global or room-scoped message search, cross-cluster (CCS) via messages-*,*:messages-* - chat.user.{account}.request.search.rooms — local spotlight search with scope filter (all / channel / dm; app rejected per MVP) Architecture: - main.go wires ES + Valkey + NATS + natsrouter with RequestID / Recovery / Logging middleware. Graceful shutdown via pkg/shutdown. SEARCH_REQUEST_TIMEOUT applied as a per-request context.WithTimeout spanning cache + ES calls; zero disables the cap. - handler.go implements the 2-tier Valkey → ES read with graceful degradation: cache failures log and fall through; only when both cache AND ES prefetch fail does the request surface ErrInternal. On cache miss, the ES-source SET is skipped when the GET already errored (transport is still down, duplicate-failure logs add noise). - query_messages.go builds deterministic ES bodies. Global search uses an ES terms-lookup against the user-room doc so the full rooms[] array never crosses the wire. Scoped search partitions roomIds into unrestricted (bool.filter AND'd with the terms-lookup so callers can't reach rooms they don't belong to by passing arbitrary roomIds) and restricted (Clause A on createdAt >= hss with must_not exists threadParentMessageId, plus Clause B for thread replies via tshow=true OR parent-after-hss). recentWindowToGte emits single-unit ES date-math ("8760h", "48h") — compound forms like Go's Duration.String "8760h0m0s" fail ES parsing. - query_rooms.go scope filters use model.RoomTypeChannel/DM ("channel"/"dm") — the values sync-worker actually writes — not the Rocket.Chat legacy "p"/"d". - store.go defines two narrow consumer interfaces (SearchStore, RestrictedRoomCache) so handler tests can wire fakes without real ES or Valkey. store_es.go wraps pkg/searchengine via a local esEngine interface (Search + GetDoc) to keep store tests purely unit. userRoomIndex falls back to the UserRoomIndex constant via a single resolveUserRoomIndex helper so the default is owned in one place. Error taxonomy (all via natsrouter.RouteError): - empty searchText / negative size|offset / scope=app / unknown scope → ErrBadRequest - cache + ES prefetch both fail → ErrInternal("unable to resolve room access") - ES _search fail → ErrInternal("search backend unavailable") - parse fail → ErrInternal("unexpected search response") - size > MAX silently clamped (no error) Deploy: multi-stage Dockerfile (non-root app user, alpine 3.21 runtime), docker-compose.yml wiring NATS/ES/Valkey, azure-pipelines.yml matching the peer-service pattern (Go 1.25.9 aligned with CI). Spec: docs/superpowers/specs/2026-04-21-search-service-design.md with MD040-compliant fence tokens. Sync-worker: inbox_stream.go and spotlight.go HSS comments clarify the Go↔painless sentinel contract (hss > 0 means restricted; nil / <=0 is unrestricted) — a leaked &0 is treated as unrestricted by both indices. Companion: depends on PR #115 for the restrictedRooms{} shape on the user-room ES doc and tshow on the message index. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
Follow-up to CodeRabbit review on PR #115: - room-worker/handler.go: collapse 5-line WHAT comment on the HSS pointer construction to 3 lines (keep the WHY, drop the narration). - search-sync-worker/user_room.go: the sentinel-contract comment duplicates the docstring already on addRoomScript — collapsed to one line referring to the script's docstring as the source of truth. - search-sync-worker/spotlight.go: trim the 10-line skip comment to 3 lines that state the MVP skip + why we mirror user-room's > 0 check. - search-sync-worker/inbox_integration_test.go: rename hssPtr → int64Ptr and drop the stale doc (the \"never &0\" contract belongs on the model field, not on a test helper that just builds a pointer). No behavior change. https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
Implements the search-service per docs/superpowers/specs/2026-04-21-search-service-design.md. Built on top of PR #115's sync-worker prerequisite (restrictedRooms{} on user-room + tshow on message index). Two endpoints on natsrouter: - chat.user.{account}.request.search.messages — global or room-scoped message search, cross-cluster (CCS) via messages-*,*:messages-* - chat.user.{account}.request.search.rooms — local spotlight search with scope filter (all / channel / dm; app rejected per MVP) Architecture: - main.go wires ES + Valkey + NATS + natsrouter with RequestID / Recovery / Logging middleware. Graceful shutdown via pkg/shutdown. SEARCH_REQUEST_TIMEOUT applied as a per-request context.WithTimeout spanning cache + ES calls; zero disables the cap. - handler.go implements the 2-tier Valkey → ES read with graceful degradation: cache failures log and fall through; only when both cache AND ES prefetch fail does the request surface ErrInternal. On cache miss, the ES-source SET is skipped when the GET already errored (transport is still down, duplicate-failure logs add noise). - query_messages.go builds deterministic ES bodies. Global search uses an ES terms-lookup against the user-room doc so the full rooms[] array never crosses the wire. Scoped search partitions roomIds into unrestricted (bool.filter AND'd with the terms-lookup so callers can't reach rooms they don't belong to by passing arbitrary roomIds) and restricted (Clause A on createdAt >= hss with must_not exists threadParentMessageId, plus Clause B for thread replies via tshow=true OR parent-after-hss). recentWindowToGte emits single-unit ES date-math ("8760h", "48h") — compound forms like Go's Duration.String "8760h0m0s" fail ES parsing. - query_rooms.go scope filters use model.RoomTypeChannel/DM ("channel"/"dm") — the values sync-worker actually writes — not the Rocket.Chat legacy "p"/"d". - store.go defines two narrow consumer interfaces (SearchStore, RestrictedRoomCache) so handler tests can wire fakes without real ES or Valkey. store_es.go wraps pkg/searchengine via a local esEngine interface (Search + GetDoc) to keep store tests purely unit. userRoomIndex falls back to the UserRoomIndex constant via a single resolveUserRoomIndex helper so the default is owned in one place. Error taxonomy (all via natsrouter.RouteError): - empty searchText / negative size|offset / scope=app / unknown scope → ErrBadRequest - cache + ES prefetch both fail → ErrInternal("unable to resolve room access") - ES _search fail → ErrInternal("search backend unavailable") - parse fail → ErrInternal("unexpected search response") - size > MAX silently clamped (no error) Deploy: multi-stage Dockerfile (non-root app user, alpine 3.21 runtime), docker-compose.yml wiring NATS/ES/Valkey, azure-pipelines.yml matching the peer-service pattern (Go 1.25.9 aligned with CI). Spec: docs/superpowers/specs/2026-04-21-search-service-design.md with MD040-compliant fence tokens. Sync-worker: inbox_stream.go and spotlight.go HSS comments clarify the Go↔painless sentinel contract (hss > 0 means restricted; nil / <=0 is unrestricted) — a leaked &0 is treated as unrestricted by both indices. Companion: depends on PR #115 for the restrictedRooms{} shape on the user-room ES doc and tshow on the message index. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
…ictedRooms map
Extends the user-room ES doc with a `restrictedRooms` map (rid → historySharedSince
millis) alongside the existing `rooms[]`, so search-service can enforce
restricted-room history windows directly from ES without a query-time
MongoDB lookup.
- `pkg/model.InboxMemberEvent` / `MemberAddEvent`: `HistorySharedSince`
changes from `int64` to `*int64` with `omitempty` to disambiguate
"unrestricted" (nil) from "restricted-since-timestamp". Publishers must
emit nil for unrestricted rooms — the Go↔painless sentinel treats any
`hss <= 0` as unrestricted.
- `search-sync-worker/user_room.go`: add-script routes by `params.hss`;
remove-script evicts from both `rooms[]` and `restrictedRooms{}`.
Template gains `restrictedRooms` as `flattened`.
- `search-sync-worker/spotlight.go`: MVP skip lifted from `!= 0` to
`!= nil` — spotlight still does not index restricted rooms.
- `room-worker`: publisher builds `*int64` from `req.Timestamp` only when
`History.Mode == None`.
- `inbox-worker`: handler checks `!= nil` before dereferencing.
- Tests: model round-trip covers nil/non-nil wire shape; publisher tests
assert pointer propagation and wire-omission; integration test updated
to expect restricted bulk lands in `restrictedRooms{}` instead of no-op.
Prerequisite for PR #114 (search-service). No behavior change for
unrestricted rooms.
https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
Adds `tshow` to pkg/model.Message and threads it through to MessageSearchIndex so search-service can filter on thread-reply visibility without a separate Cassandra lookup. The ES mapping is `boolean` and the field is omitempty on both the wire and the index doc — unset messages stay out of the inverted index. Closes the MVP parity gap called out in the search-service spec for tshow. hidden/archived/prid remain deferred. https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
…ntract Addresses CodeRabbit review on PR #115: - message-worker/store_cassandra.go: SaveMessage and SaveThreadMessage now include the `tshow` column in their INSERTs for messages_by_room and messages_by_id. Without this, a message with `TShow=true` would survive in Mongo/ES but be silently lost on Cassandra-sourced reads. (`thread_messages_by_room` schema has no `tshow` column — by design, the flag lives on the parent channel row.) - room-worker/handler.go: when HistoryModeNone is set but `req.Timestamp` is missing/zero, leave `HistorySharedSince` as nil instead of emitting `&0`. The user-room painless script treats `hss <= 0` as unrestricted, so `&0` would misroute a restricted room into `rooms[]`. Logs an error because this is an upstream contract violation that we can't recover. - search-sync-worker/spotlight.go: mirror user_room.go's `hss > 0` sentinel so a leaked `&0` pointer is treated as unrestricted by both indices, preventing divergence (where user-room would index as unrestricted but spotlight would skip, making the room unsearchable by name). Skipped: CodeRabbit's suggestion to reject non-nil `<=0` HSS in user-room — the spec explicitly documents `hss <= 0 → unrestricted` as the Go↔painless sentinel, and the publisher contract is guarded by the publisher unit tests already in this PR. https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
Follow-up to CodeRabbit review on PR #115: - room-worker/handler.go: collapse 5-line WHAT comment on the HSS pointer construction to 3 lines (keep the WHY, drop the narration). - search-sync-worker/user_room.go: the sentinel-contract comment duplicates the docstring already on addRoomScript — collapsed to one line referring to the script's docstring as the source of truth. - search-sync-worker/spotlight.go: trim the 10-line skip comment to 3 lines that state the MVP skip + why we mirror user-room's > 0 check. - search-sync-worker/inbox_integration_test.go: rename hssPtr → int64Ptr and drop the stale doc (the \"never &0\" contract belongs on the model field, not on a test helper that just builds a pointer). No behavior change. https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj
38b834c to
82cf5d5
Compare
Implements the search-service per docs/superpowers/specs/2026-04-21-search-service-design.md. Built on top of PR #115's sync-worker prerequisite (restrictedRooms{} on user-room + tshow on message index). Two endpoints on natsrouter: - chat.user.{account}.request.search.messages — global or room-scoped message search, cross-cluster (CCS) via messages-*,*:messages-* - chat.user.{account}.request.search.rooms — local spotlight search with scope filter (all / channel / dm; app rejected per MVP) Architecture: - main.go wires ES + Valkey + NATS + natsrouter with RequestID / Recovery / Logging middleware. Graceful shutdown via pkg/shutdown. SEARCH_REQUEST_TIMEOUT applied as a per-request context.WithTimeout spanning cache + ES calls; zero disables the cap. - handler.go implements the 2-tier Valkey → ES read with graceful degradation: cache failures log and fall through; only when both cache AND ES prefetch fail does the request surface ErrInternal. On cache miss, the ES-source SET is skipped when the GET already errored (transport is still down, duplicate-failure logs add noise). - query_messages.go builds deterministic ES bodies. Global search uses an ES terms-lookup against the user-room doc so the full rooms[] array never crosses the wire. Scoped search partitions roomIds into unrestricted (bool.filter AND'd with the terms-lookup so callers can't reach rooms they don't belong to by passing arbitrary roomIds) and restricted (Clause A on createdAt >= hss with must_not exists threadParentMessageId, plus Clause B for thread replies via tshow=true OR parent-after-hss). recentWindowToGte emits single-unit ES date-math ("8760h", "48h") — compound forms like Go's Duration.String "8760h0m0s" fail ES parsing. - query_rooms.go scope filters use model.RoomTypeChannel/DM ("channel"/"dm") — the values sync-worker actually writes — not the Rocket.Chat legacy "p"/"d". - store.go defines two narrow consumer interfaces (SearchStore, RestrictedRoomCache) so handler tests can wire fakes without real ES or Valkey. store_es.go wraps pkg/searchengine via a local esEngine interface (Search + GetDoc) to keep store tests purely unit. userRoomIndex falls back to the UserRoomIndex constant via a single resolveUserRoomIndex helper so the default is owned in one place. Error taxonomy (all via natsrouter.RouteError): - empty searchText / negative size|offset / scope=app / unknown scope → ErrBadRequest - cache + ES prefetch both fail → ErrInternal("unable to resolve room access") - ES _search fail → ErrInternal("search backend unavailable") - parse fail → ErrInternal("unexpected search response") - size > MAX silently clamped (no error) Deploy: multi-stage Dockerfile (non-root app user, alpine 3.21 runtime), docker-compose.yml wiring NATS/ES/Valkey, azure-pipelines.yml matching the peer-service pattern (Go 1.25.9 aligned with CI). Spec: docs/superpowers/specs/2026-04-21-search-service-design.md with MD040-compliant fence tokens. Sync-worker: inbox_stream.go and spotlight.go HSS comments clarify the Go↔painless sentinel contract (hss > 0 means restricted; nil / <=0 is unrestricted) — a leaked &0 is treated as unrestricted by both indices. Companion: depends on PR #115 for the restrictedRooms{} shape on the user-room ES doc and tshow on the message index. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
Eight in-bounds items from the round-3 review; not pushed yet pending PR #115 rebase on main. search-service/handler.go - searchRooms: sanitize non-RouteError from buildRoomQuery to ErrInternal (parity with searchMessages). RouteError for bad scope / scope=app / unknown still passes through unchanged; a marshal error (unreachable but possible) no longer leaks raw internal text to the client. Adds `errors.As` + the `errors` import. search-service/query_messages.go - termsLookupClause no longer calls resolveUserRoomIndex — buildMessage Query already normalizes at entry and threads the resolved value through the clause graph. Removes the redundant call and notes the contract in the docstring. search-service/integration_test.go - Set HandlerConfig.UserRoomIndex alongside NewESStore's index in the CCS fixture so the integration path exercises the full SEARCH_USER_ROOM_INDEX wiring end-to-end (store + query builder both pick up the configured value). search-service/main.go - Comment on Config struct explaining that ES and Search share the SEARCH_ env prefix; future fields added to either struct must avoid name collisions. search-service/store_es_test.go - stubEngine now captures the Search body argument (copied to avoid caller-buffer aliasing). TestESStore_Search_DelegatesToEngine asserts the body is forwarded unmodified — closes a delegation-gap where a future store wrapper could mutate queries without any test signal. pkg/model/model_test.go - TestSearchRoomsRequestJSON "full" subtest now uses the shared `roundTrip` helper (matches the file-wide pattern). - TestSearchMessagesRequestJSON stays as hand-rolled marshal/ DeepEqual: SearchMessagesRequest contains `[]string` RoomIds, making it non-comparable, so the `roundTrip[T comparable]` generic rejects it. CodeRabbit's suggestion was partially incorrect on that front. docs/superpowers/specs/2026-04-21-search-service-design.md - Room-search restricted-handling row: "spotlight MVP skips `hss != nil`" → "spotlight MVP skips `hss > 0`" + explicit note that `hss <= 0` (nil, &0, negative) is the intentional Go↔painless unrestricted sentinel. Aligns with spotlight.go and inbox_stream.go. .github/workflows/ci.yml - Emit a SINGLE multi-line annotation per integration log (50 lines joined with %0A) instead of 50 individual ::error:: lines. GitHub's Checks UI caps at 10 error annotations per step; the old loop would truncate in-UI even when all 50 land in the annotations API. The single-message form keeps every line visible. - Defensively substitute literal `::` with U+2027 (HYPHENATION POINT) in log lines before embedding so a log line that happens to start with `::cmd::` can't be re-parsed by the runner as a workflow command. Skipped with rationale: - stubEngine / user_room.go / inbox_integration_test.go HSS comment updates — PR #115 territory; that branch owns the files. - Switch startNATS to testcontainers-go/modules/nats — adds go.mod dep for low value; current GenericContainer path works and CI is green. Flag as follow-up when the module is used elsewhere. - Azure-pipelines coverage gate — search-service's 73.7% is below the 80% threshold because main.go is an unreachable-from-unit-tests startup harness. Proper gate needs either a main.go exclusion or cross-service rollout with per-service thresholds; landing it asymmetrically on search-service alone would be noisy. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
8 items from the round-4 review on commits b248eca + 837aee2. Real bugs: - query_messages.go restrictedRoomClauseB: add `createdAt >= hssISO` as an outer gate on the thread-reply branch. Previously, a pre-HSS reply flagged `tshow=true` would leak restricted-room history the user never had access to because the tshow `should` was evaluated regardless of when the reply itself was written. Both Clause A (non-thread) and Clause B (thread-reply) now enforce the HSS boundary. Updated TestBuildMessageQuery_GlobalWithRestricted to pin the new 4-slot `must` (term + exists + createdAt-range + inner-bool) layout. - query_messages.go recentWindowToGte: drop the millisecond branch. ES date-math has no sub-second unit (supported: y/M/w/d/h/H/m/s). Sub-second durations now ceil up to the next whole second so a misconfigured value widens the window rather than collapsing it (which would silently drop matches). Updated the unit test to expect "2s" for 1.5s and "91s" for 90.5s. - pkg/searchengine/adapter.go: propagate io.ReadAll errors on the five `status != 200` branches (Bulk, UpsertTemplate, GetIndexMapping, Search, GetDoc). Previously the discards silently collapsed transport/body read failures into an empty backend-error string; now the read error is wrapped and returned as the primary error so operators can distinguish "ES returned a 500 with no body" from "we couldn't read the body of whatever ES returned". CI hardening: - .github/workflows/ci.yml: swap `grep -v '^$'` for `sed '/^$/d'` in the failure-tail surfacer. Under `bash -eo pipefail`, an all-blank log would make grep exit 1, aborting the assignment before later log files got annotated. sed drops blanks without flagging exit. - Drop the unused `id: int_search_service` step reference. Repo conventions: - search-service/handler.go: unexport Handler → handler, HandlerConfig → handlerConfig, NewHandler → newHandler. Everything in this file is `package main` — no external consumer — so the uppercase names were leaking internals per CLAUDE.md Section 3 ("Export only what other packages consume"). - search-service/deploy/azure-pipelines.yml: raise the service gate back to 80% and exclude main.go from the coverage profile (awk filter drops any line matching `/main.go:`, preserving the `mode:` header). Service coverage with main.go filtered out: 88.6%, well above the 80% repo minimum. The lower 70% threshold was always a workaround for an unreachable startup harness — proper fix is to exclude the harness, not lower the bar. - docs/...search-service-design.md: add SEARCH_USER_ROOM_INDEX and SEARCH_METRICS_ADDR to the SearchConfig snippet + env-var table so deploy docs match the binary. - search-service/store_es_test.go: replace `_, _, _ = s.GetUserRoomDoc` with require.NoError so the default-index assertion can't pass when the call silently starts failing. Skipped (not in this PR's scope): - room-worker/handler.go split-brain + handler_test.go edge-case test — PR #115 territory. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
Implements the search-service per docs/superpowers/specs/2026-04-21-search-service-design.md. Built on top of PR #115's sync-worker prerequisite (restrictedRooms{} on user-room + tshow on message index). Two endpoints on natsrouter: - chat.user.{account}.request.search.messages — global or room-scoped message search, cross-cluster (CCS) via messages-*,*:messages-* - chat.user.{account}.request.search.rooms — local spotlight search with scope filter (all / channel / dm; app rejected per MVP) Architecture: - main.go wires ES + Valkey + NATS + natsrouter with RequestID / Recovery / Logging middleware. Graceful shutdown via pkg/shutdown. SEARCH_REQUEST_TIMEOUT applied as a per-request context.WithTimeout spanning cache + ES calls; zero disables the cap. - handler.go implements the 2-tier Valkey → ES read with graceful degradation: cache failures log and fall through; only when both cache AND ES prefetch fail does the request surface ErrInternal. On cache miss, the ES-source SET is skipped when the GET already errored (transport is still down, duplicate-failure logs add noise). - query_messages.go builds deterministic ES bodies. Global search uses an ES terms-lookup against the user-room doc so the full rooms[] array never crosses the wire. Scoped search partitions roomIds into unrestricted (bool.filter AND'd with the terms-lookup so callers can't reach rooms they don't belong to by passing arbitrary roomIds) and restricted (Clause A on createdAt >= hss with must_not exists threadParentMessageId, plus Clause B for thread replies via tshow=true OR parent-after-hss). recentWindowToGte emits single-unit ES date-math ("8760h", "48h") — compound forms like Go's Duration.String "8760h0m0s" fail ES parsing. - query_rooms.go scope filters use model.RoomTypeChannel/DM ("channel"/"dm") — the values sync-worker actually writes — not the Rocket.Chat legacy "p"/"d". - store.go defines two narrow consumer interfaces (SearchStore, RestrictedRoomCache) so handler tests can wire fakes without real ES or Valkey. store_es.go wraps pkg/searchengine via a local esEngine interface (Search + GetDoc) to keep store tests purely unit. userRoomIndex falls back to the UserRoomIndex constant via a single resolveUserRoomIndex helper so the default is owned in one place. Error taxonomy (all via natsrouter.RouteError): - empty searchText / negative size|offset / scope=app / unknown scope → ErrBadRequest - cache + ES prefetch both fail → ErrInternal("unable to resolve room access") - ES _search fail → ErrInternal("search backend unavailable") - parse fail → ErrInternal("unexpected search response") - size > MAX silently clamped (no error) Deploy: multi-stage Dockerfile (non-root app user, alpine 3.21 runtime), docker-compose.yml wiring NATS/ES/Valkey, azure-pipelines.yml matching the peer-service pattern (Go 1.25.9 aligned with CI). Spec: docs/superpowers/specs/2026-04-21-search-service-design.md with MD040-compliant fence tokens. Sync-worker: inbox_stream.go and spotlight.go HSS comments clarify the Go↔painless sentinel contract (hss > 0 means restricted; nil / <=0 is unrestricted) — a leaked &0 is treated as unrestricted by both indices. Companion: depends on PR #115 for the restrictedRooms{} shape on the user-room ES doc and tshow on the message index. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
Eight in-bounds items from the round-3 review; not pushed yet pending PR #115 rebase on main. search-service/handler.go - searchRooms: sanitize non-RouteError from buildRoomQuery to ErrInternal (parity with searchMessages). RouteError for bad scope / scope=app / unknown still passes through unchanged; a marshal error (unreachable but possible) no longer leaks raw internal text to the client. Adds `errors.As` + the `errors` import. search-service/query_messages.go - termsLookupClause no longer calls resolveUserRoomIndex — buildMessage Query already normalizes at entry and threads the resolved value through the clause graph. Removes the redundant call and notes the contract in the docstring. search-service/integration_test.go - Set HandlerConfig.UserRoomIndex alongside NewESStore's index in the CCS fixture so the integration path exercises the full SEARCH_USER_ROOM_INDEX wiring end-to-end (store + query builder both pick up the configured value). search-service/main.go - Comment on Config struct explaining that ES and Search share the SEARCH_ env prefix; future fields added to either struct must avoid name collisions. search-service/store_es_test.go - stubEngine now captures the Search body argument (copied to avoid caller-buffer aliasing). TestESStore_Search_DelegatesToEngine asserts the body is forwarded unmodified — closes a delegation-gap where a future store wrapper could mutate queries without any test signal. pkg/model/model_test.go - TestSearchRoomsRequestJSON "full" subtest now uses the shared `roundTrip` helper (matches the file-wide pattern). - TestSearchMessagesRequestJSON stays as hand-rolled marshal/ DeepEqual: SearchMessagesRequest contains `[]string` RoomIds, making it non-comparable, so the `roundTrip[T comparable]` generic rejects it. CodeRabbit's suggestion was partially incorrect on that front. docs/superpowers/specs/2026-04-21-search-service-design.md - Room-search restricted-handling row: "spotlight MVP skips `hss != nil`" → "spotlight MVP skips `hss > 0`" + explicit note that `hss <= 0` (nil, &0, negative) is the intentional Go↔painless unrestricted sentinel. Aligns with spotlight.go and inbox_stream.go. .github/workflows/ci.yml - Emit a SINGLE multi-line annotation per integration log (50 lines joined with %0A) instead of 50 individual ::error:: lines. GitHub's Checks UI caps at 10 error annotations per step; the old loop would truncate in-UI even when all 50 land in the annotations API. The single-message form keeps every line visible. - Defensively substitute literal `::` with U+2027 (HYPHENATION POINT) in log lines before embedding so a log line that happens to start with `::cmd::` can't be re-parsed by the runner as a workflow command. Skipped with rationale: - stubEngine / user_room.go / inbox_integration_test.go HSS comment updates — PR #115 territory; that branch owns the files. - Switch startNATS to testcontainers-go/modules/nats — adds go.mod dep for low value; current GenericContainer path works and CI is green. Flag as follow-up when the module is used elsewhere. - Azure-pipelines coverage gate — search-service's 73.7% is below the 80% threshold because main.go is an unreachable-from-unit-tests startup harness. Proper gate needs either a main.go exclusion or cross-service rollout with per-service thresholds; landing it asymmetrically on search-service alone would be noisy. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
8 items from the round-4 review on commits b248eca + 837aee2. Real bugs: - query_messages.go restrictedRoomClauseB: add `createdAt >= hssISO` as an outer gate on the thread-reply branch. Previously, a pre-HSS reply flagged `tshow=true` would leak restricted-room history the user never had access to because the tshow `should` was evaluated regardless of when the reply itself was written. Both Clause A (non-thread) and Clause B (thread-reply) now enforce the HSS boundary. Updated TestBuildMessageQuery_GlobalWithRestricted to pin the new 4-slot `must` (term + exists + createdAt-range + inner-bool) layout. - query_messages.go recentWindowToGte: drop the millisecond branch. ES date-math has no sub-second unit (supported: y/M/w/d/h/H/m/s). Sub-second durations now ceil up to the next whole second so a misconfigured value widens the window rather than collapsing it (which would silently drop matches). Updated the unit test to expect "2s" for 1.5s and "91s" for 90.5s. - pkg/searchengine/adapter.go: propagate io.ReadAll errors on the five `status != 200` branches (Bulk, UpsertTemplate, GetIndexMapping, Search, GetDoc). Previously the discards silently collapsed transport/body read failures into an empty backend-error string; now the read error is wrapped and returned as the primary error so operators can distinguish "ES returned a 500 with no body" from "we couldn't read the body of whatever ES returned". CI hardening: - .github/workflows/ci.yml: swap `grep -v '^$'` for `sed '/^$/d'` in the failure-tail surfacer. Under `bash -eo pipefail`, an all-blank log would make grep exit 1, aborting the assignment before later log files got annotated. sed drops blanks without flagging exit. - Drop the unused `id: int_search_service` step reference. Repo conventions: - search-service/handler.go: unexport Handler → handler, HandlerConfig → handlerConfig, NewHandler → newHandler. Everything in this file is `package main` — no external consumer — so the uppercase names were leaking internals per CLAUDE.md Section 3 ("Export only what other packages consume"). - search-service/deploy/azure-pipelines.yml: raise the service gate back to 80% and exclude main.go from the coverage profile (awk filter drops any line matching `/main.go:`, preserving the `mode:` header). Service coverage with main.go filtered out: 88.6%, well above the 80% repo minimum. The lower 70% threshold was always a workaround for an unreachable startup harness — proper fix is to exclude the harness, not lower the bar. - docs/...search-service-design.md: add SEARCH_USER_ROOM_INDEX and SEARCH_METRICS_ADDR to the SearchConfig snippet + env-var table so deploy docs match the binary. - search-service/store_es_test.go: replace `_, _, _ = s.GetUserRoomDoc` with require.NoError so the default-index assertion can't pass when the call silently starts failing. Skipped (not in this PR's scope): - room-worker/handler.go split-brain + handler_test.go edge-case test — PR #115 territory. https://claude.ai/code/session_01J5V5cbBdzmh8NWaBEtrSjq
Summary
Prerequisite for PR #114 (search-service). Lands the
search-sync-worker+pkg/modelchanges sosearch-servicecan readrestrictedRooms{}directly from the user-room ES doc (no Mongo at query time), brings thetshowMVP parity gap forward, and hardens theHistorySharedSincepublisher contract end-to-end.Spec:
docs/superpowers/specs/2026-04-21-search-service-sync-worker-extension-design.md(added in #114).Changes
1.
*int64onHistorySharedSince+ publisher contract guardspkg/model.InboxMemberEvent/MemberAddEvent:int64→*int64withomitemptyto disambiguate "unrestricted" (nil) from "restricted-since-timestamp" (non-nil).nilfor unrestricted rooms — never&0. The Go↔painless sentinel treats anyhss <= 0as unrestricted.room-workerbuilds the pointer only whenHistory.Mode == None && req.Timestamp > 0; logs an error (leaves pointer nil) if a restricted request arrives with a missing timestamp rather than silently emitting&0.inbox-workerconsumer checks!= nilbefore dereferencing.2.
restrictedRooms{}on user-room ES docrestrictedRoomsasflattened(same approach asroomTimestamps).params.hss:hss > 0→restrictedRooms[rid] = hss+ evict fromrooms[];hss <= 0→rooms[]+ evict fromrestrictedRooms{}.roomTimestampsentry.hss > 0sentinel so a leaked&0is treated consistently as unrestricted by both indices (documented MVP gap for restricted-room typeahead).3.
tshowend-to-end on the message pathpkg/model.MessagegainsTShow boolwithjson:"tshow,omitempty" bson:"tshow,omitempty".message-worker'sSaveMessage(→messages_by_room,messages_by_id) andSaveThreadMessage(→messages_by_id) now bindmsg.TShow.thread_messages_by_roomis unchanged — its schema has notshowcolumn by design (the flag lives on the parent channel row).MessageSearchIndexgainsTShowwithes:"boolean"; the mapping auto-picks it up viaesPropertiesFromStruct.omitemptyon the wire, Mongo bson, and ES doc so unset messages don't bloat the inverted index.Tests
HistorySharedSinceand wire omission.room-worker.processAddMembersassert pointer propagation (restricted) and wire omission (unrestricted), both for the localRoomMemberEventand the batched cross-site outbox payload.TestUserRoomSync_Integrationexpects the restricted bulk to land inrestrictedRooms{};TestUserRoomSync_BulkInvitegains an eviction-on-remove subtest for the restricted path.tshowround-trip (true) and wire omission (false), plus the template mapping assertion.Test plan
make lint— 0 issuesmake test— all packages passrestrictedRooms{}from the doc shape this PR establishes)Review notes
CodeRabbit's suggestion to reject non-nil
<=0HistorySharedSinceat the user-room consumer was declined — the spec explicitly documentshss <= 0 → unrestrictedas the Go↔painless sentinel, with publisher-side guards (theroom-workerTimestamp > 0check + publisher unit tests) as the source of truth. Full rationale in the resolved review thread.https://claude.ai/code/session_018uJCyUSaHiPnmnyfUDeQUj