Originally filed as Part 1 of #201. Splitting out as a dedicated issue since #201 was closed by #203 / #205 (which fixed Part 2 only).
Symptom
The "Samples in View" stat box reads exactly 5,000 in dense regions, which is the value of DEFAULT_POINT_BUDGET at explorer.qmd:418. In Cyprus (lat ≈ 34.99, lng ≈ 33.70), a direct DuckDB query against data.isamples.org/isamples_202601_samples_map_lite.parquet returns 23,421 samples in a ±0.1° box, so the counter underreports by ~4.7x there (23,421 / 5,000). The cluster is one dense site (almost certainly Polis Excavations, OPENCONTEXT source).
Root cause
explorer.qmd:1530-1538 — the point-mode viewport query:
SELECT pid, label, source, latitude, longitude, place_name, result_time
FROM read_parquet('${lite_url}')
WHERE latitude BETWEEN ${padded.south} AND ${padded.north}
AND longitude BETWEEN ${padded.west} AND ${padded.east}
${sourceFilterSQL('source')}
${facetFilterSQL()}
LIMIT 5000
explorer.qmd:1557:
updateStats('Samples', cachedData.length, cachedData.length, ..., 'Samples in View', 'Samples in View');
cachedData.length IS the row count of the LIMIT 5000 result. The counter therefore tops out at 5000 by construction.
Secondary smells:
- No ORDER BY before LIMIT → which 5000 rows return is undefined (probably stable in DuckDB-on-parquet but not contractual).
- Label says "in View" but fetch uses a padded (30%) viewport (explorer.qmd:1514-1522). Even ignoring the cap, the count meaning is loose.
- renderSamplePoints plots all of cachedData including rows outside the actual viewport.
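To make the padded-versus-actual viewport distinction concrete, here is a minimal sketch. The helper names `padBounds` and `rowsInViewport` are hypothetical and do not exist in explorer.qmd, and the padding rule (a fraction of the box span added on every side) is an assumption about what "padded (30%)" means. It also ignores antimeridian wrap-around.

```javascript
// Hypothetical helpers for illustration; not taken from explorer.qmd.

// Expand a bounding box by `fraction` of its span on every side.
// ASSUMPTION: this is how the 30% padding is applied.
function padBounds(b, fraction = 0.3) {
  const dLat = (b.north - b.south) * fraction;
  const dLng = (b.east - b.west) * fraction;
  return {
    south: b.south - dLat,
    north: b.north + dLat,
    west: b.west - dLng,
    east: b.east + dLng,
  };
}

// Keep only the rows that fall inside the given (unpadded) viewport.
function rowsInViewport(rows, b) {
  return rows.filter(
    (r) =>
      r.latitude >= b.south &&
      r.latitude <= b.north &&
      r.longitude >= b.west &&
      r.longitude <= b.east
  );
}
```

With helpers like these, a count labelled "in View" could be taken from `rowsInViewport(cachedData, actualBounds).length` instead of `cachedData.length`, while the padded fetch continues to serve panning.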
Fix directions (from Codex retrospective on #203)
In rough order of effort:
1. Honest relabel (cheapest): change the label to "Samples Loaded (max N)" and wire the budget value into the label. Counter stops lying.
2. Compute real count alongside: a fast SELECT count(*) against the same WHERE (no LIMIT) is cheap on the lite parquet via DuckDB-WASM range reads. Display "X loaded / Y total in view", with explicit signaling when Y > X.
3. Adaptive aggregation: if real count > budget, fall back to a cluster-style representation or surface a "too dense to render individually - Y samples here" affordance.
4. Add ORDER BY pid to the point query so the 5000 subset is at least deterministic across browsers and sessions.
Direction 2 (real-count alongside) is probably the right user-visible answer; direction 4 is independent and could ship with any of the others.
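A rough sketch of how the real-count and ORDER BY directions could fit together, assuming both queries are built from one shared WHERE clause. The names `buildViewportQueries` and `formatSampleStat` are hypothetical and not part of explorer.qmd; `extraFilters` stands in for whatever `sourceFilterSQL('source')` and `facetFilterSQL()` emit. Since count(*) is a BIGINT in DuckDB, the value may arrive in JavaScript as a BigInt, so the sketch coerces it with `Number()`.

```javascript
// Hypothetical helpers for illustration; not taken from explorer.qmd.

// Build the point query and a matching count query from ONE shared
// WHERE clause so the two cannot drift apart.
function buildViewportQueries({ liteUrl, bounds, extraFilters = "", budget = 5000 }) {
  const where = [
    `WHERE latitude BETWEEN ${bounds.south} AND ${bounds.north}`,
    `AND longitude BETWEEN ${bounds.west} AND ${bounds.east}`,
    extraFilters, // assumed: source/facet filter SQL, possibly empty
  ]
    .filter(Boolean)
    .join("\n");
  const from = `FROM read_parquet('${liteUrl}')`;

  return {
    pointsSQL: [
      "SELECT pid, label, source, latitude, longitude, place_name, result_time",
      from,
      where,
      "ORDER BY pid", // makes the capped subset deterministic
      `LIMIT ${budget}`,
    ].join("\n"),
    // Same FROM and WHERE, but no LIMIT: the real density.
    countSQL: ["SELECT count(*) AS n", from, where].join("\n"),
  };
}

// Format the stat so truncation is visible rather than silent.
function formatSampleStat(loaded, total) {
  const fmt = (n) => String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  const l = Number(loaded);
  const t = Number(total); // count(*) may come back as a BigInt
  return t > l ? `${fmt(l)} loaded / ${fmt(t)} total in view` : `${fmt(t)} in view`;
}
```

For the Cyprus numbers above, `formatSampleStat(5000, 23421)` yields "5,000 loaded / 23,421 total in view". One trade-off to weigh: ORDER BY before LIMIT forces the engine to consider every matching row rather than stopping at the first 5000, so the deterministic subset may cost more than the current query.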
Acceptance
#v=1&lat=34.9957&lng=33.6798&alt=15212&mode=point) shows a number that does not silently understate the real density.