Priority: 6
Context
No clustering visualization exists in the notebooks. Lonboard renders every point individually, which overwhelms the map at 6M points. The precomputed H3 columns (res4, res6, res8) make zoom-adaptive clustering possible.
Data Files on R2
| File | URL | Size |
| --- | --- | --- |
| Wide + H3 | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet | 292 MB |
| Facet summaries | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_summaries.parquet | 2 KB |
H3 columns: h3_res4 (BIGINT), h3_res6 (BIGINT), h3_res8 (BIGINT). 11.96M rows have H3 values.
File to Create
examples/basic/h3_clustering.ipynb
Notebook Structure
Cell 1: Introduction (markdown)
Explain H3 hierarchical hexagonal indexing, why it's useful for geospatial clustering, link to h3geo.org.
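For the introduction cell, it may help to show what the BIGINT values in the h3_res* columns actually encode. The sketch below decodes a 64-bit H3 cell index with plain bit operations, following the bit layout described in the H3 documentation. The helper names are illustrative, and the sample index is a commonly cited H3 example value, not one taken from the iSamples data.

```python
# Decode fields of a 64-bit H3 cell index with plain bit operations.
# Layout (high to low): 1 reserved bit, 4 mode bits, 3 reserved bits,
# 4 resolution bits, 7 base-cell bits, then 15 three-bit digits.

def h3_mode(index: int) -> int:
    """Return the index mode stored in bits 59-62 (1 means 'cell')."""
    return (index >> 59) & 0xF

def h3_resolution(index: int) -> int:
    """Return the resolution (0-15) stored in bits 52-55."""
    return (index >> 52) & 0xF

def h3_base_cell(index: int) -> int:
    """Return the base cell number stored in bits 45-51."""
    return (index >> 45) & 0x7F

def h3_to_string(index: int) -> str:
    """Return the lowercase hex form used by most H3 tooling."""
    return format(index, "x")

# Illustrative value only; in the notebook this would come from h3_res6 etc.
sample = 0x8928308280FFFFF
print(h3_to_string(sample), h3_mode(sample), h3_resolution(sample), h3_base_cell(sample))
```

Because the resolution is stored inside the index itself, a quick check such as `h3_resolution(value) == 6` on a few h3_res6 values is a cheap way to confirm the columns hold what their names suggest.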
Cell 2: Setup and H3 stats
import duckdb

con = duckdb.connect()
wide_h3_url = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet"

# Show H3 column distribution
stats = con.sql(f"""
    SELECT
        COUNT(*) AS total,
        COUNT(h3_res4) AS with_h3,
        COUNT(DISTINCT h3_res4) AS cells_res4,
        COUNT(DISTINCT h3_res6) AS cells_res6,
        COUNT(DISTINCT h3_res8) AS cells_res8
    FROM read_parquet('{wide_h3_url}')
    WHERE otype = 'MaterialSampleRecord'
""").df()
stats
Cell 3: Cluster at res6 (~3.2km hexagons)
# Aggregate samples into one row per res6 cell
clusters = con.sql(f"""
    SELECT
        h3_res6,
        COUNT(*) AS n,
        AVG(latitude) AS lat,
        AVG(longitude) AS lon,
        -- src.n is the table's own n column (assumed to hold the source collection);
        -- qualifying it avoids a clash with the COUNT(*) alias above
        MODE(src.n) AS dominant_source
    FROM read_parquet('{wide_h3_url}') AS src
    WHERE otype = 'MaterialSampleRecord' AND h3_res6 IS NOT NULL
    GROUP BY h3_res6
""").df()

print(f"{len(clusters):,} clusters from {clusters.n.sum():,} samples")
print(f"Cluster sizes: min={clusters.n.min()}, median={clusters.n.median():.0f}, max={clusters.n.max():,}")
Cell 4: Lonboard clustered visualization
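import geopandas as gpd
import numpy as np
from lonboard import Map, ScatterplotLayer

# Scale radius (meters) by log of count
clusters['radius'] = np.clip(np.log2(clusters['n']) * 500, 500, 50000)

# Color by dominant source; unknown sources fall back to gray
source_colors = {
    'SESAR': [0, 100, 255],
    'OpenContext': [0, 200, 100],
    'GEOME': [255, 165, 0],
    'Smithsonian': [148, 0, 211],
}
# Lonboard expects colors as a uint8 array of shape (N, 3)
colors = np.array(
    [source_colors.get(s, [128, 128, 128]) for s in clusters['dominant_source']],
    dtype=np.uint8,
)

# Lonboard layers are built from GeoDataFrames, so wrap the centroids as points
gdf = gpd.GeoDataFrame(
    clusters,
    geometry=gpd.points_from_xy(clusters['lon'], clusters['lat']),
    crs='EPSG:4326',
)
layer = ScatterplotLayer.from_geopandas(
    gdf,
    get_radius=clusters['radius'].to_numpy(),
    get_fill_color=colors,
    opacity=0.6,
    pickable=True,
)
Map(layer)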
Cell 5: Compare resolutions side-by-side
Show how res4 vs res6 vs res8 produce different clustering granularity. Include a table comparing cluster counts and a note about when to use each.
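One possible way to produce the comparison table is a single query that counts occupied cells per resolution. The sketch below only builds the SQL string; the function name and the placeholder file path are hypothetical, and the column names assume the h3_res4 / h3_res6 / h3_res8 columns listed under Data Files.

```python
# Build one DuckDB query that summarizes cluster counts per H3 resolution.
# Assumes columns named h3_res<N> and the otype filter used in Cells 2 and 3.

def resolution_summary_sql(url: str, resolutions=(4, 6, 8)) -> str:
    parts = []
    for res in resolutions:
        col = f"h3_res{res}"
        parts.append(
            f"SELECT {res} AS resolution, "
            f"COUNT(DISTINCT {col}) AS clusters, "
            f"COUNT({col}) AS samples "
            f"FROM read_parquet('{url}') "
            f"WHERE otype = 'MaterialSampleRecord' AND {col} IS NOT NULL"
        )
    # ORDER BY after the last branch sorts the whole UNION ALL result.
    return "\nUNION ALL\n".join(parts) + "\nORDER BY resolution"

# Placeholder path for illustration; the notebook would pass wide_h3_url.
sql = resolution_summary_sql("wide_h3.parquet")
print(sql)
```

In the notebook this could be run as `con.sql(resolution_summary_sql(wide_h3_url)).df()`. As a rough guide for the "when to use each" note, the H3 resolution table gives average cell areas of about 1,770 km² at res4, 36 km² at res6, and 0.74 km² at res8.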
Cell 6: Benchmark — clustering vs full points
Time comparison: loading 112K clusters vs 6M individual points into Lonboard.
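A small timing helper keeps the benchmark cell readable. This is a minimal sketch using only the standard library; the `timed` helper and the `timings` dict are illustrative names, and the summation inside the `with` block stands in for the real work (building the cluster layer or the full point layer).

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations, keyed by label.
timings = {}

@contextmanager
def timed(label: str):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Stand-in workload; replace with the query plus layer construction.
with timed("demo"):
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Note that timing on the Python side captures the query and layer construction only. Rendering happens in the browser, so the notebook should say which part of the pipeline each number covers.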
Cell 7: Regional drill-down demo
Select a res4 cell, then show its res6 children, then res8. Demonstrate hierarchical zoom.
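The drill-down relies on each res6 cell having exactly one res4 parent, and each res8 cell one res6 parent. The sketch below derives a parent index with bit operations, which can serve as a sanity check that the three H3 columns are mutually consistent. The function name is illustrative and the sample index is a commonly cited H3 example value, not iSamples data.

```python
def h3_parent(index: int, parent_res: int) -> int:
    """Return the ancestor of an H3 cell index at a coarser resolution."""
    res = (index >> 52) & 0xF
    if not 0 <= parent_res <= res:
        raise ValueError(f"parent_res must be between 0 and {res}")
    # Overwrite the 4 resolution bits (bits 52-55).
    out = (index & ~(0xF << 52)) | (parent_res << 52)
    # Digits finer than the parent resolution are marked unused (value 7).
    for digit in range(parent_res + 1, 16):
        out |= 0b111 << ((15 - digit) * 3)
    return out

child = 0x8928308280FFFFF  # resolution 9
print(format(h3_parent(child, 8), "x"))
print(format(h3_parent(child, 7), "x"))
```

In the notebook itself the precomputed columns make this unnecessary for filtering: selecting rows where `h3_res4` equals the chosen cell and grouping by `h3_res6` gives the children directly.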
Acceptance Criteria
examples/basic/h3_clustering.ipynb