Priority: 5
Context
The interactive Jupyter explorer re-scans all 6.7M rows on every widget interaction, which takes 3-8 s each time. A pre-computed summary file (about 2 KB) can provide facet counts instantly.
Data Files on R2
| File | URL | Size |
|------|-----|------|
| Facet summaries | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_summaries.parquet | 2 KB |
| Facet cross-tab | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_cross.parquet | 1 KB |
| Wide + H3 | https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide_h3.parquet | 292 MB |
Facet summaries schema: `facet_type` (one of source, material, context, object_type), `facet_value`, `scheme`, `count`
File to Modify
examples/basic/isamples_explorer.ipynb
Current Behavior
Uses ipywidgets + Lonboard. Every facet/filter interaction re-queries the full parquet:
```python
# Slow: scans the full file on every interaction
results = con.sql(f"SELECT ... FROM read_parquet('{url}') WHERE ...").df()
```
Desired Changes
1. Load summaries at startup
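```python
import duckdb

con = duckdb.connect()
summaries_url = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_facet_summaries.parquet"

# Instant: ~2 KB download, read once at startup
facets = con.sql(f"SELECT * FROM read_parquet('{summaries_url}')").df()

# One list of {'facet_value', 'count'} records per facet type
source_options = facets[facets.facet_type == 'source'][['facet_value', 'count']].to_dict('records')
material_options = facets[facets.facet_type == 'material'][['facet_value', 'count']].to_dict('records')
context_options = facets[facets.facet_type == 'context'][['facet_value', 'count']].to_dict('records')
object_type_options = facets[facets.facet_type == 'object_type'][['facet_value', 'count']].to_dict('records')
```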
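If the notebook needs all four facet types, the repeated boolean filtering above can also be done in a single pass over the records, with each facet's options sorted so the most common values come first. A minimal sketch, assuming the rows arrive as dicts following the summaries schema (for example from `facets.to_dict('records')`); `group_facets` and the sample rows are hypothetical, not real iSamples values:

```python
from collections import defaultdict

# Facet types listed in the summaries schema.
FACET_TYPES = ("source", "material", "context", "object_type")


def group_facets(rows):
    """Group summary rows by facet_type, largest count first.

    `rows` is an iterable of dicts with keys facet_type, facet_value,
    scheme and count (e.g. facets.to_dict('records')).
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["facet_type"]].append(
            {"facet_value": row["facet_value"], "count": row["count"]}
        )
    # Most common values first, so they appear at the top of each dropdown.
    for options in grouped.values():
        options.sort(key=lambda r: r["count"], reverse=True)
    # Return only the four known facet types; missing ones map to [].
    return {ft: grouped.get(ft, []) for ft in FACET_TYPES}


# Hypothetical sample rows (illustrative values only).
sample_rows = [
    {"facet_type": "source", "facet_value": "SOURCE_A", "scheme": None, "count": 10},
    {"facet_type": "source", "facet_value": "SOURCE_B", "scheme": None, "count": 250},
    {"facet_type": "material", "facet_value": "rock", "scheme": "example", "count": 42},
]
options = group_facets(sample_rows)
```

With this, `options["source"]` would replace `source_options`, and so on for the other facet types.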
2. Populate widgets from summaries
```python
import ipywidgets as widgets

source_dropdown = widgets.Dropdown(
    options=[(f"{r['facet_value']} ({r['count']:,})", r['facet_value']) for r in source_options],
    description='Source:'
)
material_dropdown = widgets.Dropdown(
    options=[('All', None)] + [(f"{r['facet_value']} ({r['count']:,})", r['facet_value']) for r in material_options],
    description='Material:'
)
```
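Both dropdowns above build the same `(label, value)` pairs, so a small helper could keep the label format and the optional "All" entry consistent across all four facets. A sketch only; `make_options` is a hypothetical name, and whether the source dropdown should also offer "All" is left open:

```python
def make_options(records, include_all=True):
    """Build ipywidgets-style (label, value) options from summary records.

    Each record is a dict with 'facet_value' and 'count'. Labels show the
    count with thousands separators, matching the dropdowns above.
    """
    options = [
        (f"{r['facet_value']} ({r['count']:,})", r["facet_value"]) for r in records
    ]
    if include_all:
        # None is the "no filter" sentinel, as in material_dropdown.
        options.insert(0, ("All", None))
    return options


records = [
    {"facet_value": "rock", "count": 1234567},
    {"facet_value": "soil", "count": 89},
]
print(make_options(records))
# [('All', None), ('rock (1,234,567)', 'rock'), ('soil (89)', 'soil')]
```

In the notebook this would be used as, for example, `widgets.Dropdown(options=make_options(material_options), description='Material:')`.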
3. Only query full parquet on explicit "Search" action
Keep full parquet queries for the actual sample results table and map, but facet counts should come from summaries.
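One way to keep the expensive query behind the Search action is to build it only from the current dropdown values when the button is clicked, treating `None` ("All") as no filter and passing selected values as bound parameters rather than formatting them into the SQL. A minimal sketch; `build_search_query`, the `selections` keys and the `LIMIT` default are hypothetical, and the real column names in the wide parquet may differ:

```python
def build_search_query(url, selections, limit=1000):
    """Return (sql, params) for the full-parquet search.

    `selections` maps a column name to the selected value; None means
    "All" and adds no filter. Column names are fixed in code (hypothetical
    placeholders here); only the values are bound as ? parameters.
    """
    clauses, params = [], []
    for column, value in selections.items():
        if value is None:
            continue  # "All" selected: no filter on this facet
        clauses.append(f"{column} = ?")
        params.append(value)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT * FROM read_parquet('{url}'){where} LIMIT {int(limit)}"
    return sql, params


sql, params = build_search_query(
    "isamples_202601_wide_h3.parquet", {"source": "SOURCE_A", "material": None}
)
```

A Search button's handler (e.g. registered with `widgets.Button.on_click`) would then run `con.execute(sql, params).df()` and refresh the results table and Lonboard map, while dropdown changes only read from the pre-loaded summaries.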
Acceptance Criteria