explorer: 'Samples in View' shows the 5000 fetch budget, not real count + URL round-trip doesn't reproduce view #201

@rdhyee

Investigated 2026-05-11 starting from this URL:
https://isamples.org/explorer.html#v=1&lat=34.9957&lng=33.6798&alt=15212&heading=360.0&mode=point

User-visible symptoms:

  1. The "Samples in View" stat box reads exactly 5000 — a suspiciously round number.
  2. Pan and zoom from that starting state, copy the URL, paste into a different browser → the view that comes back is not the view that was captured.

Both symptoms have concrete root causes. Treating as a single issue because they were investigated together and overlap; happy to split if a maintainer prefers.


Part 1 — "Samples in View" is the fetch budget, not the real in-view count

Root cause

explorer.qmd:418

DEFAULT_POINT_BUDGET = 5000

explorer.qmd:1530-1538 — the point-mode viewport query:

SELECT pid, label, source, latitude, longitude, place_name, result_time
FROM read_parquet('${lite_url}')
WHERE latitude BETWEEN ${padded.south} AND ${padded.north}
  AND longitude BETWEEN ${padded.west} AND ${padded.east}
  ${sourceFilterSQL('source')}
  ${facetFilterSQL()}
LIMIT 5000

explorer.qmd:1557 — the UI counter:

updateStats('Samples', cachedData.length, cachedData.length, ..., 'Samples in View', 'Samples in View');

cachedData is the result of the LIMIT 5000 query. The counter therefore tops out at 5000 by construction. In dense regions it does not represent "samples in view" — it represents "samples we chose to load from a slightly-padded box around the view."

Ground-truth numbers for the Cyprus URL

Direct DuckDB query against https://data.isamples.org/isamples_202601_samples_map_lite.parquet, centered on the URL's lat=34.9957, lng=33.6798:

Viewport half-extent (degrees)    Actual samples
±0.10°                            23,421
±0.20°                            23,803
±0.50°                            24,305
±1.00°                            37,869

So when the UI says "5,000 Samples in View" at Cyprus, the true count is 23,000+ even within a viewport tighter than the explorer's 30%-padded fetch box. The counter undercounts by roughly 4.7x in this region (23,421 / 5,000).
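To put a number on the undercount for each box size, the counts in the table above can be divided by the 5000 budget. A minimal sketch; the counts are copied from the table and nothing is fetched:

```python
# Ratio of the real in-box sample count to the 5000-row fetch budget.
BUDGET = 5000  # DEFAULT_POINT_BUDGET from explorer.qmd:418

# (half-extent in degrees, actual samples), copied from the table above
ground_truth = [(0.10, 23421), (0.20, 23803), (0.50, 24305), (1.00, 37869)]

ratios = {half: count / BUDGET for half, count in ground_truth}
for half, ratio in ratios.items():
    print(f"±{half:.2f}°: real count is {ratio:.1f}x the displayed 5,000")
```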

Secondary smells in the same query path

  • No ORDER BY before LIMIT 5000 → which 5000 rows are returned is undefined. DuckDB-on-parquet is probably stable file-order in practice, but it's not a contract.
  • Label says "in View" but fetch uses a padded box (30% larger; explorer.qmd:1514-1522). Even if we set aside the cap, the count is loosely defined.
  • The displayed count never shrinks to "true visible" as the user pans inside the cached padded box — renderSamplePoints plots all of cachedData, including rows outside the actual viewport.
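The third point could be addressed by counting only the cached rows that fall inside the unpadded viewport. A rough sketch in Python of that filter; `count_in_view` and the sample rows are hypothetical, and it assumes the viewport does not cross the antimeridian:

```python
from typing import Iterable, Mapping


def count_in_view(rows: Iterable[Mapping[str, float]],
                  south: float, north: float,
                  west: float, east: float) -> int:
    """Count cached rows whose coordinates fall inside the unpadded viewport."""
    return sum(
        1 for r in rows
        if south <= r["latitude"] <= north and west <= r["longitude"] <= east
    )


# Hypothetical cached rows: two inside a ±0.10° viewport, one just north of it.
cached = [
    {"latitude": 34.98, "longitude": 33.71},
    {"latitude": 35.00, "longitude": 33.70},
    {"latitude": 35.12, "longitude": 33.70},  # loaded, but outside the viewport
]
visible = count_in_view(cached, south=34.8957, north=35.0957,
                        west=33.5798, east=33.7798)
print(f"{visible} of {len(cached)} loaded rows are in view")
```

The same filter would need to be re-run on every camera change inside the cached box for the counter to track the true visible set.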

What's at Cyprus (for context)

The 23K samples in the box form one dense cluster around lat 34.98, lng 33.71, all from OPENCONTEXT. That is probably the Polis excavations project ("Excavations at Polis" had 52,762 OC records per the Open Context facet API). So the cap is hiding a single very dense site, not a diffuse distribution.

Suggested fix directions (for discussion, not prescriptive)

  1. Cheapest: change the label from "Samples in View" to "Samples Loaded (max N)" so the counter no longer lies. Wire the budget value into the label.
  2. Show the real count separately: a fast SELECT count(*) against the same WHERE clause (no LIMIT) is cheap on the lite parquet via DuckDB-WASM range reads. Display two numbers: real count and rendered count.
  3. Adaptive budget / aggregation: if real count > budget, fall back to a server-side aggregation or show "23,421 samples — too dense to render individually" with a UI affordance to drill in.
  4. Add ORDER BY pid (or similar) so the LIMIT 5000 subset is at least deterministic across browsers and sessions.
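Directions 1, 2 and 4 can be sketched together. The example below uses Python's built-in sqlite3 as a stand-in for DuckDB-WASM so it runs without network access, with a synthetic table sized to match the Cyprus count. The table layout and label wording are illustrative assumptions, not the explorer's actual code:

```python
import sqlite3

BUDGET = 5000  # mirrors DEFAULT_POINT_BUDGET

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (pid TEXT, latitude REAL, longitude REAL)")
# Synthetic dense cluster: 23,421 rows inside the box, matching the Cyprus count.
con.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(f"pid-{i:06d}", 34.98, 33.71) for i in range(23421)],
)

where = "latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ?"
box = (34.8957, 35.0957, 33.5798, 33.7798)  # south, north, west, east

# Direction 2: real count, same WHERE clause, no LIMIT.
(real_count,) = con.execute(
    f"SELECT count(*) FROM samples WHERE {where}", box
).fetchone()

# Direction 4: ORDER BY makes the capped subset deterministic.
loaded = con.execute(
    f"SELECT pid FROM samples WHERE {where} ORDER BY pid LIMIT ?",
    box + (BUDGET,),
).fetchall()

# Direction 1: the label says what was loaded and flags truncation.
if real_count > len(loaded):
    label = f"Samples Loaded: {len(loaded):,} of {real_count:,} in view (max {BUDGET:,})"
else:
    label = f"Samples in View: {real_count:,}"
print(label)
```

Whether the extra count(*) is cheap enough in DuckDB-WASM on the real lite parquet would still need measuring.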

Part 2 — URL state round-trip doesn't reproduce the view

What we know about the write path

Camera-change handler (explorer.qmd:1965-2030):

viewer.camera.changed.addEventListener(() => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(async () => {
        // ... mode/resolution decisions ...
        if (!viewer._suppressHashWrite) {
            history.replaceState(null, '', buildHash(viewer));
        }
    }, 600);
});

buildHash (explorer.qmd:651-671) encodes only: v=1, lat, lng, alt, optional heading (only if abs(heading % 360) > 1), optional pitch (only if not nadir), optional mode=point, optional pid or h3. The query-string state (?search=, ?sources=, ?material=, etc.) is written by a separate writeQueryState() function (explorer.qmd:494-526) on filter changes, not by the camera handler.
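The encoding rules above can be modelled in a few lines of Python to see what they do to the Cyprus deep link. This is a sketch of the rules as described, not a port of buildHash: the number formatting is inferred from the example URL, "nadir" is assumed to mean a pitch of -90° within a 1° tolerance, and pid/h3 are omitted.

```python
import math


def build_hash(lat, lng, alt, heading=0.0, pitch=-90.0, point_mode=False):
    """Rebuild the hash per the rules described above (formatting is assumed)."""
    parts = ["v=1", f"lat={lat:.4f}", f"lng={lng:.4f}", f"alt={alt:.0f}"]
    # math.fmod keeps the sign of the dividend, like JavaScript's % operator.
    if abs(math.fmod(heading, 360)) > 1:
        parts.append(f"heading={heading:.1f}")
    # Assumption: nadir is pitch == -90 degrees; the 1-degree tolerance is a guess.
    if abs(pitch + 90.0) > 1:
        parts.append(f"pitch={pitch:.1f}")
    if point_mode:
        parts.append("mode=point")
    return "#" + "&".join(parts)


# The deep link carried heading=360.0; the rebuilt hash drops it.
rewritten = build_hash(34.9957, 33.6798, 15212, heading=360.0, point_mode=True)
print(rewritten)

# A heading clearly away from north survives.
rotated = build_hash(34.9957, 33.6798, 15212, heading=45.0, point_mode=True)
print(rotated)
```

Under the rule as stated, a heading of 359.5 would be written while 0.5 would not, since only the raw remainder is compared against 1.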

Hypotheses (need empirical confirmation per the EXPLORER_STATE.md state-contract framing, #164)

  1. Copy mid-debounce. 600ms debounce means a user who pans/zooms and immediately copies the URL gets stale state. Easy to confirm: pan, wait 2s, then check the address bar.
  2. Heading normalization drops 360.0. buildHash only writes heading if abs(heading % 360) > 1 (line 661). The deep-link URL Raymond used has heading=360.0, but after one camera-handler tick that value is normalized to 0 and the param is dropped. If the user's view depends on heading != 0, the next URL write silently discards it.
  3. Cold-cache point-mode latency dominates. A deep link to mode=point triggers the res8 + samples_map_lite fetch path, which takes 60–90s on a cold cache. This is the path reported in #190 ("explorer: 60–90 s 'no dots' window on cold-cache deep-link to point mode (DuckDB-WASM 1.24.0 falls back to full HTTP read)") and worked around in PR #191 ("explorer: surface 'Fetching sample index…' during cold-cache boot→point-mode wait (#190 fix 2)"). In a fresh browser, "the view looks different" might simply mean "samples haven't loaded yet."
  4. 5000-cap non-determinism (Part 1). Even when both browsers finish loading, the displayed 5000-sample subset is undefined without ORDER BY. The two browsers might render different 5000 dots.
  5. _suppressHashWrite could stay stuck. Hashchange handler sets it true (line 2127) and clears it after a 2000ms timeout (lines 2140-2145). If a user chains hashchanges (back/forward repeatedly) faster than 2000ms, the flag may stay set across writes. Edge case; less likely in Raymond's flow but worth ruling out.
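Hypothesis 1 can be reasoned about with a small timeline model. The sketch below assumes a plain trailing-edge debounce (each camera change restarts the 600ms timer, and the write fires only if no newer change arrives first); `address_bar_state` and the event timeline are hypothetical and ignore _suppressHashWrite entirely:

```python
def address_bar_state(initial, events, copy_at_ms, debounce_ms=600):
    """Return the state in the address bar at copy_at_ms.

    events: list of (time_ms, state) camera changes, sorted by time.
    Models a trailing-edge debounce: each event restarts the timer, and a
    write happens debounce_ms after an event only if no newer event arrived.
    """
    written = initial
    for i, (t, state) in enumerate(events):
        fires_at = t + debounce_ms
        has_next = i + 1 < len(events)
        superseded = has_next and events[i + 1][0] < fires_at
        if not superseded and fires_at <= copy_at_ms:
            written = state
    return written


# Hypothetical timeline: the user pans at 0 ms, 300 ms and 1000 ms.
events = [(0, "view A"), (300, "view B"), (1000, "view C")]
print(address_bar_state("start", events, copy_at_ms=1200))  # view B (stale)
print(address_bar_state("start", events, copy_at_ms=3000))  # view C (settled)
```

In this model a copy at 1200ms captures the second pan rather than the third, which is the stale-state symptom; waiting past the debounce window removes it.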

Cleanest discriminating test

After pan/zoom and a 2-second pause:

  • If the address bar contains the current camera state → problem is on the load side. Suspects: (3) cold-cache latency, (4) 5000-cap subset roulette, (5) stuck suppress flag.
  • If the address bar still has stale state → problem is on the write side. Suspects: (1) the write is being held off for longer than the 600ms debounce, e.g. by _suppressHashWrite staying true (see hypothesis 5), or (2) heading normalization dropping a param the user needed.
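To make "contains the current camera state" checkable, the captured and reloaded URLs can be compared parameter by parameter rather than as strings. A sketch using only the standard library; `hash_state`, `same_view` and the tolerances are assumptions chosen for illustration:

```python
from urllib.parse import parse_qs, urlsplit


def hash_state(url):
    """Parse the explorer's #k=v&k=v fragment into a flat dict of strings."""
    fragment = urlsplit(url).fragment
    return {k: v[-1] for k, v in parse_qs(fragment).items()}


def same_view(url_a, url_b, tol_deg=1e-4, tol_alt=1.0, tol_heading=1.0):
    """True if two explorer URLs describe the same camera position and mode."""
    a, b = hash_state(url_a), hash_state(url_b)

    def close(key, tol):
        return abs(float(a.get(key, 0)) - float(b.get(key, 0))) <= tol

    # Compare headings on the circle so 360.0 and a missing (0) heading match.
    gap = abs(float(a.get("heading", 0)) - float(b.get("heading", 0))) % 360
    heading_ok = min(gap, 360 - gap) <= tol_heading
    return (close("lat", tol_deg) and close("lng", tol_deg)
            and close("alt", tol_alt) and heading_ok
            and a.get("mode") == b.get("mode"))


captured = ("https://isamples.org/explorer.html#v=1&lat=34.9957&lng=33.6798"
            "&alt=15212&heading=360.0&mode=point")
rewritten = ("https://isamples.org/explorer.html#v=1&lat=34.9957&lng=33.6798"
             "&alt=15212&mode=point")
print(same_view(captured, rewritten))
```

Treating heading=360.0 and a missing heading as equal keeps the dropped param from registering as a false mismatch during the test.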

Relationship to other work


Repro (Cyprus, 2026-05-11)

URL: https://isamples.org/explorer.html#v=1&lat=34.9957&lng=33.6798&alt=15212&heading=360.0&mode=point
Observed: "Samples in View: 5,000"
Real count in tight ±0.1° box: 23,421 (one dense OPENCONTEXT cluster, probably Polis)

DuckDB reproduction (no auth needed):

import duckdb

# Count rows in a ±0.1° box around the deep-link center (lat=34.9957, lng=33.6798).
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # needed to read parquet over HTTPS
url = 'https://data.isamples.org/isamples_202601_samples_map_lite.parquet'
row = con.execute(f"""
    SELECT count(*) FROM read_parquet('{url}')
    WHERE latitude BETWEEN 34.8957 AND 35.0957
      AND longitude BETWEEN 33.5798 AND 33.7798
""").fetchone()
print(row)  # (23421,)
