[Optimization 1/4] H3-accelerated bbox filtering in geoparquet.ipynb #2

@rdhyee

Description

Priority: 2

Context

H3 geospatial indexing gives ~5x speedup for bounding box queries (170ms → 35ms). We published an H3-indexed wide parquet to R2. This notebook should demonstrate the speedup.
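As a quick sanity check, the quoted timings are consistent with the "~5x" claim:

```python
# Arithmetic check of the quoted timings (170 ms -> 35 ms).
baseline_ms = 170
h3_ms = 35
speedup = baseline_ms / h3_ms
print(f"{speedup:.1f}x")  # ~4.9x, i.e. roughly the claimed ~5x
```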

Data Files on R2

| File | URL | Size |
| --- | --- | --- |
| Wide + H3 | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet | 292 MB |
| Original wide | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet | 278 MB |

The H3 file has 3 extra columns: h3_res4 (BIGINT), h3_res6 (BIGINT), h3_res8 (BIGINT). 11.96M of 20.7M rows have H3 values (rows with coordinates).
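For background, an H3 cell ID is a 64-bit integer, so BIGINT storage is lossless; the familiar H3 string form is just that integer rendered as lowercase hex. A minimal sketch (the cell value here is hypothetical, not taken from the dataset):

```python
# H3 cell IDs are 64-bit integers; the BIGINT columns store them directly.
# The usual H3 string form is the same value in lowercase hexadecimal.
cell_int = 0x8428309FFFFFFFF   # hypothetical res-4 cell, for illustration only
cell_str = format(cell_int, "x")
print(cell_str)                # "8428309ffffffff"
assert int(cell_str, 16) == cell_int
```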

File to Modify

examples/basic/geoparquet.ipynb

Current Behavior

All spatial queries use raw lat/lon range scans:

```python
# ~170 ms per query
filtered = table.filter(
    (ibis._.latitude.between(32, 49)) & (ibis._.longitude.between(-125, -104))
)
```

Desired Changes

1. Add H3 bbox filtering section

After the existing spatial query section, add a new section demonstrating H3 acceleration:

```python
import duckdb
import time

con = duckdb.connect()
con.execute("INSTALL h3 FROM community; LOAD h3;")

wide_h3_url = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet"

# Compute H3 cells covering the Western US bbox.
# h3.geo_to_cells requires the h3 Python package;
# alternatively, use DuckDB itself to find cells within the bbox (as below).

# Baseline: raw lat/lon range scan
start = time.time()
baseline = con.sql(f"""
    SELECT COUNT(*) FROM read_parquet('{wide_h3_url}')
    WHERE otype = 'MaterialSampleRecord'
    AND latitude BETWEEN 32 AND 49 AND longitude BETWEEN -125 AND -104
""").fetchone()
baseline_ms = (time.time() - start) * 1000

# H3: pre-filter by res4 cells, then refine
start = time.time()
h3_result = con.sql(f"""
    WITH bbox_cells AS (
        SELECT DISTINCT h3_res4
        FROM read_parquet('{wide_h3_url}')
        WHERE otype = 'MaterialSampleRecord'
        AND latitude BETWEEN 32 AND 49 AND longitude BETWEEN -125 AND -104
    )
    SELECT COUNT(*) FROM read_parquet('{wide_h3_url}')
    WHERE h3_res4 IN (SELECT h3_res4 FROM bbox_cells)
    AND otype = 'MaterialSampleRecord'
""").fetchone()
h3_ms = (time.time() - start) * 1000

print(f"Baseline: {baseline_ms:.0f}ms ({baseline[0]:,} samples)")
print(f"H3 res4: {h3_ms:.0f}ms ({h3_result[0]:,} samples)")
print(f"Speedup: {baseline_ms/h3_ms:.1f}x")
```

2. Add H3 stats section

Show the H3 column distribution:

```python
stats = con.sql(f"""
    SELECT
        COUNT(*) as total_rows,
        COUNT(h3_res4) as with_h3,
        COUNT(DISTINCT h3_res4) as unique_res4,
        COUNT(DISTINCT h3_res6) as unique_res6,
        COUNT(DISTINCT h3_res8) as unique_res8
    FROM read_parquet('{wide_h3_url}')
    WHERE otype = 'MaterialSampleRecord'
""").df()
```

3. Update Lonboard visualization

Use the H3 file URL instead of the original wide file, and show how H3 filtering speeds up loading the data for the map.

Acceptance Criteria

  • Side-by-side benchmark: baseline vs H3 query timing
  • H3 stats showing cell distribution
  • Lonboard map still works with H3 file
  • Clear markdown explanation of what H3 is and why it helps
  • Notebook runs end-to-end without errors
