explorer: search-results "50+" → real "N of M" count (closes #232) by rdhyee · Pull Request #236 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-05-23T03:00:09Z

Summary

Roadmap step 2 of #234 (#232 "50+" → real count, quick-win).

The in-map search-results line used to read 50+ results for "pottery" whenever the SELECT hit LIMIT 50, hiding whether the true result set is 51 or 50,000. For pottery Cyprus (world scope) the real count is ~7,124; users saw 50+ and couldn't gauge whether to refine.

Before: 50+ results for "pottery"
After: 50 of 7,124 results for "pottery"

Mirrors the pattern used to close #206 (real-count follow-up in loadViewportSamples()).

Approach

When the SELECT cap is hit, fire a follow-up COUNT(*) with the same WHERE clause and replace the label.
Below the cap, results.length IS the truth — no second scan.
Initial render keeps 50+ while COUNT is in flight, so users see "more exist" signaling immediately.

Key implementation details

Effective query shape. The area-scope path can fall back to the world-mode SELECT when viewerBboxSQL() returns null. Track which shape actually ran (effectiveQueryShape / effectiveBboxSQL) so the COUNT uses the matching WHERE/JOIN. Keying off effectiveScope === 'area' would have generated a malformed COUNT in the fallback case.
Cancellation. A second search bumps _searchSeq. The COUNT guards against searchId !== _searchSeq so a stale COUNT can't overwrite a fresher search's text or pollute its structured log.
Failure handling. A COUNT error leaves the initial 50+ label in place — under-disclosure beats clearing the line. Logged via console.warn; count_ms is still recorded.
Structured log (Explorer FTS Track 1a: Browser perf-smoke baseline #167) additions. total_count (real count, or null when not computed) and count_ms (COUNT scan duration). elapsed_ms continues to mark end-of-handler so it remains an end-to-end "search complete" latency including the COUNT.

Test plan

New spec tests/playwright/search-real-count.spec.js:
- cap-hit case (pottery): initial 50+ then 50 of N (N>50), structured log carries total_count and count_ms > 0.
- no-results case (xyzzyqqqplugh): still short-circuits to No results; COUNT not fired; total_count: null, count_ms: 0.
Fault-injection: both new tests FAIL against pre-fix code, PASS against the fix.
Local regression: search-real-count + url-roundtrip + facetnote-url-load → 9 passed (1.7m).
Manual: verify against the cold-cache deploy that the COUNT scan time is acceptable for pottery / pottery Cyprus.

What's NOT in this PR

Search-as-global-filter (A1) — that's step 5 of explorer: architectural direction — make filter semantics coherent across all surfaces #234, gated on FTS landing first.
Pagination past 50 — out of scope per explorer: search-results '50+' label should show the real count #232.
Real BM25/FTS substrate — Explorer FTS Track 2: search_index_v1 contract doc #169-Explorer FTS Track 5: GO/NO-GO decision gate #172 line of work.

Acceptance criteria (from #232)

Search line reads 50 of N results for "term" when the cap is hit.
Search line reads 12 results for "term" (unchanged) when below the cap.
No regression in cancellation behavior when the user types quickly (guarded via _searchSeq invariant; covered conceptually rather than via a flaky two-SELECT race in Playwright).

🤖 Generated with Claude Code

…org#232) The search-results line under the in-map search box used to read `50+ results for "pottery"` whenever the SELECT hit `LIMIT 50`, hiding whether the true result set was 51 or 50,000. For `pottery Cyprus` (world scope) the real count is ~7,124; users saw `50+` and had no way to gauge whether to refine, scroll, or look elsewhere. Fix follows the same pattern that closed isamplesorg#206 in `loadViewportSamples()`: when the SELECT cap is hit, fire a follow-up `COUNT(*)` with the same WHERE clause and replace the label. Below the cap, `results.length` IS the truth and we skip the second scan. Before: `50+ results for "pottery"` After: `50 of 7,124 results for "pottery"` Implementation notes: - The area-scope path can fall back to the world-mode SELECT when `viewerBboxSQL` returns null. Track which shape actually ran (`effectiveQueryShape` / `effectiveBboxSQL`) so the COUNT uses the matching WHERE/JOIN. A previous draft that keyed off `effectiveScope === 'area'` would have generated a malformed COUNT in the fallback case. - Cancellation: a second search bumps `_searchSeq`. The COUNT guards against `searchId !== _searchSeq` so a stale COUNT can't overwrite a fresher search's text or pollute its structured log. - Failure handling: a COUNT error leaves the initial `50+` label in place — under-disclosure beats clearing the line. Logged via console.warn; `count_ms` is still recorded for isamplesorg#167 analysis. - Initial render keeps `50+` while the COUNT scan is in flight, so cold-cache users see "more exist" signaling immediately rather than a bare 50 with no qualifier. - Structured log (isamplesorg#167) gains `total_count` and `count_ms` fields. `elapsed_ms` continues to mark end-of-handler so it remains an honest end-to-end "search complete" latency including the COUNT. Validation: - New spec `tests/playwright/search-real-count.spec.js`: - cap-hit: "pottery" produces `50+` then `50 of N (N>50)`, and the structured log carries `total_count` and `count_ms`. - no-results: "xyzzyqqqplugh" still short-circuits to "No results" without firing COUNT (`total_count: null`, `count_ms: 0`). - Fault-injection: confirmed both new tests FAIL against pre-fix code and PASS against the fix. - Full local suite (`search-real-count` + `url-roundtrip` + `facetnote-url-load`): 9 passed (1.7m). No regression. Roadmap step 2 of isamplesorg#234. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two blockers raised in Codex review (PR thread): 1. **Stale-search guard placement.** `_searchSeq` was checked only after the COUNT awaited. A superseded search could still mutate `searchResults.textContent`, repaint `samplesSection`, trigger `camera.flyTo`, AND launch a wasted COUNT scan before the existing guard fired. Fix: check `searchId !== _searchSeq` immediately after the SELECT returns and before any UI mutation or COUNT launch. The later COUNT-specific guard stays as a second line of defense. 2. **Predicate snapshot.** The COUNT recomputed `sourceFilterSQL('f.source')` and `facetFilterSQL()` at the moment it built its SQL, rather than reusing the strings the SELECT used. If a user toggled a source checkbox or material facet while the SELECT was in flight, the COUNT would scan a different filter state than the rendered 50 rows, producing a label like "50 of M" where M is incoherent with the visible result set. Fix: snapshot `sourceSQL` / `facetSQL` once at the top of the try block and reuse for both SELECT (all shapes) and COUNT. Also: - Add `superseded` to the `isamples.search` structured log (isamplesorg#167) so downstream perf analysis can distinguish abandoned searches from live ones. - Add a filtered cap-hit test (Codex coverage nit). The test ticks the first material facet then searches; asserts the structured-log invariant `total_count present ⟺ cap was hit` and `superseded === false`. The test tolerates all three terminal states (cap hit / sub-cap / no-results) because the first material URI is data-dependent and may not overlap with the search term — what we're proving is the invariant, not a specific count. - Add `superseded === false` assertion to the existing happy-path cap-hit test for coverage on the new field. Validation: 8/8 local Playwright tests pass (search-real-count [3] + url-roundtrip [5] + facetnote-url-load [2]). Fault-injection: confirmed the filtered-cap test still catches the bug if the snapshot is removed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Codex round-2 found two remaining issues: 1. **Blocker — stale error UI write.** The success-path freshness guard was correctly placed, but the outer `catch` still unconditionally wrote `searchResults.textContent = "Search error: ..."`. A superseded search whose SELECT later rejected would overwrite a newer search's UI with its stale error. Fix: gate the catch's UI write on the same `searchId !== _searchSeq` predicate so superseded errors are logged + recorded in the structured log (with `superseded: true`) but never repaint the live search line. 2. **Telemetry caveat — DOM-live booleans in `finally`.** The `has_source_filter` / `has_facet_filter` log fields were computed inside `finally` from the live DOM. If the user toggled a checkbox during the SELECT/COUNT await chain, the log would report post-toggle state instead of search-fire-time state. Fix: snapshot both booleans alongside the SQL predicates at search-fire time (declared at function scope so they remain in scope through `finally`). Also strengthened the filtered-cap test (Codex round-2 coverage gap): The previous test ticked a facet BEFORE searching, so SELECT and COUNT saw the same DOM state regardless of whether the snapshot existed — the test passed even without the round-1 snapshot fix. The new "telemetry snapshot survives a facet toggle during search await" test exercises the race directly: fires search with no facet, waits for the visible `50+` (proves SELECT done), THEN ticks the material facet, then waits for the final `50 of N` label. Asserts `has_facet_filter === false` in the structured log — which only holds if the snapshot survives the toggle. Fault-injection confirmed: the test fails if the snapshot is reverted to live-DOM reads. Validation: 8/8 local Playwright tests pass (search-real-count [3] + url-roundtrip [5] + facetnote-url-load [2]). Telemetry-snapshot test verified to fail against pre-fix code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Codex round 3 found: 1. **Blocker — invalid-submit stale overwrite.** A valid search in flight could still have its catch-path error overwrite the "Type at least 2 characters" message: the early-return for empty / short terms did NOT bump `_searchSeq`, so when the prior search's SELECT later rejected, its catch saw `searchId === _searchSeq` and wrote "Search error: …" over the live UI. Fix: bump `_searchSeq` at the top of `doSearch`, before the length guard. Every submission — valid or invalid — now invalidates in-flight work. 2. **Coverage — race test was still timing-dependent.** The previous test accepted either `50+` or the final `50 of N` label before toggling. If the COUNT happened to land first, the toggle was a no-op and the test passed regardless of whether the snapshot existed. Fix: monkey-patch the page's `db.query` to introduce a 1.5s delay on COUNT(*) queries (scoped to the page context via the OJS reactive-value accessor, same pattern url-roundtrip.spec.js uses for `viewer`). The toggle is now GUARANTEED to land between SELECT completion and COUNT completion. Wait for EXACT `50+` state before toggling — no more "or final label" loophole. Source-filter snapshot coverage is documented as a TODO: Quarto's see-also rendering duplicates `#sourceFilter` checkboxes, making a toggle-and-count assertion ambiguous in the preview environment. That's a fixture problem orthogonal to isamplesorg#236; leaving it for a follow-up rather than blocking this PR. Validation: 8/8 local Playwright tests pass. Fault-injection (revert `hadFacetFilter` snapshot to live `hasFacetFilters()` call in `finally`) confirmed: the new test fails deterministically under the held-open-COUNT timing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rdhyee · 2026-05-23T05:41:38Z

Codex review iteration log (4 rounds → LGTM)

Round	Codex finding	Resolution	Commit
1	Stale `_searchSeq` guard placement: UI mutations + COUNT launch could fire for superseded searches before guard	Added early `searchId !== _searchSeq` check immediately after SELECT, before any UI write / panel render / fly-to / COUNT	`27aa70f`
1	Predicate not snapshotted: filter toggle mid-flight produces "50 of M" where M is for a different filter than the rendered 50	Snapshot `sourceSQL` / `facetSQL` once at search-fire time, reuse in SELECT (both shapes) and COUNT (both shapes)	`27aa70f`
2	Stale error UI write: `catch` block unconditionally overwrites live UI with `Search error: …` if a superseded SELECT later rejects	Guard the catch's `searchResults.textContent` write on `searchId !== _searchSeq` (sets `wasSuperseded = true` and logs the original error without UI mutation)	`764ed63`
2	DOM-live telemetry in `finally`: `has_source_filter` / `has_facet_filter` could reflect post-toggle state	Snapshot `hadSourceFilter` / `hadFacetFilter` at function scope at search-fire time, reused in `finally`	`764ed63`
3	Invalid-submit stale overwrite: empty/short term early-return didn't bump `_searchSeq`, so a prior search's rejection could overwrite "Type at least 2 characters"	Bump `_searchSeq` at the TOP of `doSearch`, before any guard — every submission supersedes in-flight work	`5de9136`
3	Race test timing-dependent: wait accepted either `50+` or final `50 of N`, so the toggle could miss the window	Monkey-patch `db.query` with a 1.5s delay on `COUNT(*)` queries; wait for EXACT `50+` before toggling	`5de9136`

Round 4: LGTM (Codex)

The COUNT-delay test closes the round-3 coverage hole in practice: the patch is installed before the search, the test waits for exact 50+, and COUNT is held open before the final log. […] Round 4 verdict: LGTM.

Known follow-up filed as a TODO comment in the test:

Independent has_source_filter snapshot coverage requires a non-Quarto fixture — Quarto's see-also rendering duplicates #sourceFilter checkboxes, so toggle-and-count assertions are ambiguous in preview. The production snapshot uses the same pre-await pattern as has_facet_filter, which IS covered.

Final test count: 8 passed (3 search-real-count + 5 url-roundtrip + 2 facetnote-url-load, total ~2 min) plus the fault-injection sanity check on every round.

Thanks codex for the four-round audit — the catch-path supersession (round 2) and invalid-submit window (round 3) were both real bugs I'd have missed.

rdhyee and others added 4 commits May 22, 2026 19:59

rdhyee merged commit 04ba01c into isamplesorg:main May 23, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

explorer: search-results "50+" → real "N of M" count (closes #232)#236

explorer: search-results "50+" → real "N of M" count (closes #232)#236
rdhyee merged 4 commits into
isamplesorg:mainfrom
rdhyee:fix/search-real-count

rdhyee commented May 23, 2026

Uh oh!

rdhyee commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rdhyee commented May 23, 2026

Summary

Approach

Key implementation details

Test plan

What's NOT in this PR

Acceptance criteria (from #232)

Uh oh!

rdhyee commented May 23, 2026

Codex review iteration log (4 rounds → LGTM)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant