Add H3 spatial optimization and facet summaries #6
- geoparquet.ipynb: Add H3-accelerated bbox filtering benchmark, H3 cell distribution stats, and Lonboard visualization using H3-indexed parquet
- isamples_explorer.ipynb: Load 2KB facet summaries at startup for instant widget population; add context and object_type dropdown filters
- h3_clustering.ipynb: New notebook demonstrating H3 clustering with Lonboard visualization, multi-resolution comparison, performance benchmarks, and hierarchical drill-down
- pqg_demo.ipynb: Add wide format shortcut section comparing graph traversal vs H3 spatial queries with performance timing

Closes #2, closes #3, closes #4, closes #5

https://claude.ai/code/session_01ADUWKdT6dM7gqmauf6TqWB
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c911a9e5c
"# Get the set of res4 cells that cover the bounding box corners + center\n",
"covering_cells = con_h3.sql(f\"\"\"\n",
" WITH corners AS (\n",
" SELECT h3_latlng_to_cell(lat, lng, 4) AS cell FROM (\n",
" VALUES\n",
Cover bbox with all intersecting H3 cells
This pre-filter only collects H3 cells from the bbox corners/center, which is not a complete cover of the rectangle at resolution 4. Any intersecting cell that does not contain one of those sampled points is excluded before the exact lat/lon check, so valid rows can be dropped and the benchmarked row-count parity/speedup can become incorrect for many bounding boxes.
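The failure mode can be reproduced without H3. The sketch below uses a plain square grid as a stand-in for resolution-4 cells, an assumption made only to keep the example dependency-free; `CELL`, `cell_of`, `sampled_cover`, and `full_cover` are hypothetical names, not code from the notebook. It compares the corners+center sample with a complete cover of the same box.

```python
import math

CELL = 1.0  # hypothetical cell size in degrees; stands in for an H3 res-4 cell


def cell_of(lat, lng):
    """Map a point to the (row, col) id of the square cell containing it."""
    return (math.floor(lat / CELL), math.floor(lng / CELL))


def sampled_cover(min_lat, min_lng, max_lat, max_lng):
    """Cells containing the four bbox corners plus the center (the flagged approach)."""
    mid_lat = (min_lat + max_lat) / 2
    mid_lng = (min_lng + max_lng) / 2
    points = [
        (min_lat, min_lng), (min_lat, max_lng),
        (max_lat, min_lng), (max_lat, max_lng),
        (mid_lat, mid_lng),
    ]
    return {cell_of(lat, lng) for lat, lng in points}


def full_cover(min_lat, min_lng, max_lat, max_lng):
    """Every grid cell that intersects the bbox."""
    r0, c0 = cell_of(min_lat, min_lng)
    r1, c1 = cell_of(max_lat, max_lng)
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


bbox = (0.5, 0.5, 4.5, 4.5)
missed = full_cover(*bbox) - sampled_cover(*bbox)
print(len(sampled_cover(*bbox)), len(full_cover(*bbox)), len(missed))  # 5 25 20
```

A point at (1.5, 3.5) lies inside the box but in a cell the sample never visits, so a pre-filter built from `sampled_cover` would drop it before the exact lat/lon check runs.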
" COUNT(*) AS total_points,\n",
" ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT {col}), 1) AS avg_points_per_cluster,\n",
Compute point totals from aggregated cell counts
In this query, rows are already grouped by H3 cell in the subquery (ct), so COUNT(*) in the outer select counts clusters, not points. That makes total_points equal num_clusters and forces avg_points_per_cluster to about 1, which corrupts the multi-resolution comparison output.
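A small reproduction of the difference, as a sketch only: it uses Python's built-in `sqlite3` instead of DuckDB so that it runs without extra dependencies, and the `samples` table and `h3_cell` column are hypothetical. The inner query mirrors the per-cell aggregation (`ct`) described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (h3_cell TEXT)")  # hypothetical table and column
con.executemany(
    "INSERT INTO samples VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],  # 6 points in 3 cells
)

PER_CELL = "SELECT h3_cell, COUNT(*) AS ct FROM samples GROUP BY h3_cell"

# Flagged pattern: COUNT(*) over the grouped rows counts cells, not points.
buggy = con.execute(
    f"SELECT COUNT(*) AS num_clusters, COUNT(*) AS total_points FROM ({PER_CELL})"
).fetchone()

# Fix: sum the per-cell counts to recover the point total.
fixed = con.execute(
    f"""
    SELECT COUNT(*) AS num_clusters,
           SUM(ct) AS total_points,
           ROUND(SUM(ct) * 1.0 / COUNT(*), 1) AS avg_points_per_cluster
    FROM ({PER_CELL})
    """
).fetchone()

print(buggy)  # (3, 3)
print(fixed)  # (3, 6, 2.0)
```

Summing `ct` keeps the outer query cheap, since it never has to rescan the raw point table.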
"context_dropdown = widgets.Dropdown(\n",
" options=context_options,\n",
" value='',\n",
" description='',\n",
" layout=widgets.Layout(width='100%')\n",
Wire new facet dropdowns into filtering logic
The new Context/Object Type widgets are rendered in the facet accordion, but they are not connected to explorer state updates or query parameters, so selecting a value does not change the loaded sample set. This creates a silent mismatch where users think those facets are active filters when results remain unfiltered.
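One way to close that gap is to route every dropdown change through a single function that rebuilds the query filter. The sketch below is an assumption about how that could look, not the notebook's code: `build_facet_filter`, `ExplorerState`, and the `context` / `object_type` column names are hypothetical. In the notebook, `on_facet_change` would typically be registered with the ipywidgets `observe(..., names='value')` hook; here it is called directly so the example has no dependencies.

```python
def build_facet_filter(context="", object_type=""):
    """Return (where_sql, params) for the selected facet values.

    An empty string means "no selection", matching the dropdown's value=''.
    Column names are hypothetical.
    """
    clauses, params = [], []
    if context:
        clauses.append("context = ?")
        params.append(context)
    if object_type:
        clauses.append("object_type = ?")
        params.append(object_type)
    where_sql = " AND ".join(clauses) if clauses else "TRUE"
    return where_sql, params


class ExplorerState:
    """Minimal stand-in for the explorer's state; rebuilds the filter on change."""

    def __init__(self):
        self.context = ""
        self.object_type = ""
        self.where_sql, self.params = build_facet_filter()

    def on_facet_change(self, name, new_value):
        # In the notebook this would be invoked from a widget observer.
        setattr(self, name, new_value)
        self.where_sql, self.params = build_facet_filter(
            self.context, self.object_type
        )


state = ExplorerState()
state.on_facet_change("context", "marine")
state.on_facet_change("object_type", "core")
print(state.where_sql, state.params)
```

Passing the values as parameters rather than interpolating them into the SQL string also avoids quoting problems with facet labels.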
Multi-AI Code Review
Codex Review
Codex Recommendations:
Gemini Review
Summary
Must Fix Before Merge:
Should Fix:
Nice to Have:
Reviews generated by OpenAI Codex and Google Gemini via Claude Code
1. H3 bbox coverage (Critical): Replace corners+center sampling with full data query to find all res4 cells within bbox. The old approach missed edge cells for large bounding boxes, causing false negatives. Fixed in geoparquet.ipynb and pqg_demo.ipynb.
2. Empty covering list guard (High): Add check for empty cell list before building IN () SQL clause, which would be invalid SQL. Fixed in geoparquet.ipynb and pqg_demo.ipynb.
3. Material URI suffix collision (Critical): get_all_material_counts() now keys by (scheme, suffix) internally before collapsing to suffix, keeping the highest-count entry when different vocabularies share a suffix (e.g., "rock" in isample/vocabulary vs isample/opencontext). Fixed in isamples_explorer.ipynb.
4. N+1 rollup queries (Medium): Replaced per-suffix COUNT(DISTINCT) loop with single batch UNION ALL query in compute_accurate_rollup_counts(). Fixed in isamples_explorer.ipynb.
5. Lazy-load heavy queries (High): Deferred get_all_material_counts() and get_year_range_stats() to first use instead of eager startup, preserving the "instant" facet experience from pre-computed summaries. Fixed in isamples_explorer.ipynb.
6. Graph traversal no-op (Low): Replaced bare `pass` with explanatory comment noting it's pseudocode. Fixed in pqg_demo.ipynb.

Addresses review: PR #6 comment #3882037312
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
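The empty-list guard in item 2 can be sketched as a small helper. This is an illustration under assumptions, not the notebook's implementation: `h3_in_clause` and the default column name `h3_res4` are hypothetical.

```python
def h3_in_clause(cells, column="h3_res4"):
    """Build a SQL predicate restricting `column` to the given integer cell ids.

    Returns "FALSE" for an empty list, because `col IN ()` is not valid SQL
    and an empty cover should match no rows.
    """
    cells = [int(c) for c in cells]  # also rejects non-numeric input
    if not cells:
        return "FALSE"
    return f"{column} IN ({', '.join(str(c) for c in cells)})"


print(h3_in_clause([]))        # FALSE
print(h3_in_clause([10, 20]))  # h3_res4 IN (10, 20)
```

Returning a constant false predicate keeps the surrounding query shape unchanged, so callers do not need a separate code path for the empty case.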
Review Fixes Applied (4ba023e)
Addressed all 3 must-fix and 3 should-fix items from the multi-AI review:
Must Fix ✅
Should Fix ✅
Files Changed
1. H3 bbox coverage: Use h3 Python library (h3.geo_to_cells) to compute
covering cells mathematically instead of scanning the data. This makes
the benchmark meaningful — cell computation is O(1) relative to data
size, pure geometry with no I/O. (Codex Critical + Gemini Performance)
2. Filter parity: All H3 queries now include otype='MaterialSampleRecord'
to match baseline queries exactly. (Codex Medium)
3. SQL quoting: Fixed batch rollup query — was producing escaped quotes
(\'rock\') instead of proper SQL quotes ('rock'). (Codex High)
4. Actually lazy startup: Deferred get_all_material_counts() and
get_year_range_stats() for real — they now run on first accordion
open, not at module import. Startup only loads 2KB summary parquet.
(Codex High)
5. Scheme-aware rollup: expand_material_filters_with_rollup() now
matches children within the same vocabulary scheme prefix, preventing
cross-vocabulary suffix collisions during expansion. (Codex Critical)
6. Notebook JSON format: Restored list-of-strings source format in
explorer notebook for clean git diffs. (Gemini Style)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
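The deferred loading in item 4 can be illustrated with `functools.cached_property`, which runs a loader on first access and reuses the result afterwards. This is a sketch under assumptions: `FacetData` is a hypothetical class, and the two loader callables stand in for the notebook's get_all_material_counts() and get_year_range_stats(); the actual accordion hook is not shown.

```python
from functools import cached_property


class FacetData:
    """Holds expensive facet statistics and computes each one on first access."""

    def __init__(self, load_material_counts, load_year_stats):
        # Hypothetical callables standing in for the notebook's heavy queries.
        self._load_material_counts = load_material_counts
        self._load_year_stats = load_year_stats

    @cached_property
    def material_counts(self):
        return self._load_material_counts()

    @cached_property
    def year_stats(self):
        return self._load_year_stats()


calls = []  # records which loaders actually ran
data = FacetData(
    load_material_counts=lambda: calls.append("material") or {"rock": 10},
    load_year_stats=lambda: calls.append("years") or (1900, 2020),
)
assert calls == []            # nothing runs at construction time
data.material_counts          # first access triggers the loader
data.material_counts          # second access reuses the cached value
assert calls == ["material"]
```

An accordion handler would then only need to read `data.year_stats` when the Time section is first opened.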
Round-2 Multi-AI Review (commit eaec245)
Codex Round-2 Findings
What's Fixed ✅
Remaining for Round 3
Gemini Round-1 Findings (for reference)
Reviews by OpenAI Codex (gpt-5.1-codex-max) and Google Gemini (gemini-2.5-pro) via Claude Code
1. Time accordion index: Changed idx==2 to idx==4 to match actual
accordion children order (Sources=0, Material=1, Context=2,
Object Type=3, Time=4). Also rebuilds decade checkboxes and
slider bounds after lazy loading year stats. (Codex High)
2. Scheme-keyed material map: get_all_material_counts() now stores
all_uris dict mapping {scheme_prefix: uri} per suffix. Rollup
expansion looks up children by same scheme first, falls back to
primary URI. Prevents cross-vocabulary collisions. (Codex High)
3. Signed BIGINT normalization: h3_to_signed() converts h3.str_to_int()
output to signed int64 (val - 2**64 if val >= 2**63) to match
DuckDB BIGINT storage. All current data is positive, but this
guards against future cells with high bit set. (Codex Medium)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
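The conversion in item 3 follows directly from the formula quoted there. A minimal sketch: the range check and the inverse helper `signed_to_h3` are additions for illustration and are not taken from the commit; the sample value is a hex cell id parsed with `int(..., 16)` so the example does not need the h3 package.

```python
def h3_to_signed(val: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed int64 (two's complement)."""
    if not 0 <= val < 2**64:
        raise ValueError("expected an unsigned 64-bit value")
    return val - 2**64 if val >= 2**63 else val


def signed_to_h3(val: int) -> int:
    """Hypothetical inverse: map a signed int64 back to its unsigned form."""
    return val + 2**64 if val < 0 else val


cell = int("8428309ffffffff", 16)      # high bit clear, so the value is unchanged
assert h3_to_signed(cell) == cell
assert h3_to_signed(2**63) == -(2**63)  # smallest value with the high bit set
assert h3_to_signed(2**64 - 1) == -1
assert signed_to_h3(h3_to_signed(2**64 - 1)) == 2**64 - 1
```

Normalizing on the Python side means the generated IN list always matches what a signed 64-bit column can store.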
Round-3 Codex Review (commit 4112e7d) - Converged ✅
All critical and high issues from prior rounds are resolved. Remaining findings are low/cosmetic:
No blocking issues. PR ready for merge.
Fix Summary Across All Rounds
Review by OpenAI Codex (gpt-5.1-codex-max) via Claude Code, 3 rounds to convergence
Summary
Test plan
- examples/basic/geoparquet.ipynb end-to-end: verify H3 benchmark and map render
- examples/basic/isamples_explorer.ipynb: verify facet summaries load in <1s, context/object_type dropdowns appear
- examples/basic/h3_clustering.ipynb: verify cluster map, multi-res table, drill-down output
- examples/basic/pqg_demo.ipynb: verify wide format shortcut and timing comparison

Closes #2, closes #3, closes #4, closes #5
https://claude.ai/code/session_01ADUWKdT6dM7gqmauf6TqWB