Priority: 8 (lowest — tutorial enhancement)
Context
The PQG demo notebook shows multi-hop graph traversals for spatial queries. With H3 columns pre-computed in the wide format, we can demonstrate spatial aggregation without traversal — a significant simplification.
Data Files on R2
| File |
URL |
Size |
| Wide + H3 |
https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet |
292 MB |
| Facet summaries |
https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_summaries.parquet |
2 KB |
File to Modify
examples/basic/pqg_demo.ipynb
Current Behavior
Demonstrates multi-hop traversals for spatial queries:
Sample → produced_by → SamplingEvent → sample_location → GeospatialCoordLocation
This is correct for the narrow (graph) format but verbose.
Desired Changes
1. Add "Wide format shortcut" section
After the existing graph traversal section, add a new section showing the equivalent query using the wide format with H3:
import duckdb
con = duckdb.connect()
wide_h3_url = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet"
# "What materials are found in the Bay Area?"
# With H3: single query, no traversal needed
bay_area_samples = con.sql(f"""
WITH bay_area AS (
SELECT DISTINCT h3_res6
FROM read_parquet('{wide_h3_url}')
WHERE otype = 'MaterialSampleRecord'
AND latitude BETWEEN 37.0 AND 38.5 AND longitude BETWEEN -123.0 AND -121.5
)
SELECT
c.label as material,
COUNT(*) as count
FROM read_parquet('{wide_h3_url}') w
JOIN read_parquet('{wide_h3_url}') c
ON c.row_id = ANY(w.p__has_material_category)
WHERE w.h3_res6 IN (SELECT h3_res6 FROM bay_area)
AND w.otype = 'MaterialSampleRecord'
AND c.otype = 'IdentifiedConcept'
GROUP BY c.label
ORDER BY count DESC
""").df()
2. Add comparison table
Show when to use graph traversal (narrow) vs H3 shortcut (wide):
| Use Case |
Best Format |
Why |
| "Show me the full provenance chain" |
Narrow (graph) |
Need explicit edge traversal |
| "What materials are in this area?" |
Wide + H3 |
Single query, no joins |
| "How many samples per source per region?" |
Wide + H3 |
Aggregation with spatial filter |
3. Performance comparison
Time the graph traversal vs the H3 shortcut for the same spatial query.
Acceptance Criteria
Priority: 8 (lowest — tutorial enhancement)
Context
The PQG demo notebook shows multi-hop graph traversals for spatial queries. With H3 columns pre-computed in the wide format, we can demonstrate spatial aggregation without traversal — a significant simplification.
Data Files on R2
https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquethttps://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_summaries.parquetFile to Modify
examples/basic/pqg_demo.ipynbCurrent Behavior
Demonstrates multi-hop traversals for spatial queries:
This is correct for the narrow (graph) format but verbose.
Desired Changes
1. Add "Wide format shortcut" section
After the existing graph traversal section, add a new section showing the equivalent query using the wide format with H3:
2. Add comparison table
Show when to use graph traversal (narrow) vs H3 shortcut (wide):
3. Performance comparison
Time the graph traversal vs the H3 shortcut for the same spatial query.
Acceptance Criteria