docs(pubs): expand GitHub Repositories with pipeline diagram + name reconciliation note by rdhyee · Pull Request #144 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-04-24T14:46:44Z

Summary

Rewrites the `## GitHub Repositories` section of `pubs.qmd` (rendered at https://isamples.org/pubs.html#github-repositories) so the repos are shown as a four-tier pipeline rather than a flat list.

Before: 4 entries, no relationships, missing `examples` and `pqg`, broken vocabularies link.

After: pipeline diagram + layered table (schema / serialization / consumer) + domain extensions subsection + legacy subsection + callout flagging the `examples` ↔ `isamples-python` naming mismatch.

The pipeline framing

```
metadata + vocabularies       ← canonical data model & SKOS terms
          │
          ▼
        pqg                   ← property-graph parquet format + tooling
          │
          ▼
 data.isamples.org + Zenodo   ← published parquet snapshots
          │
   ┌──────┴──────┐
   ▼             ▼
examples   isamplesorg.github.io
(Python)   (Web + DuckDB-WASM)
```
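One way to make the tier ordering in the diagram checkable is to encode it as a dependency graph. The sketch below is hypothetical: the node names are copied from the diagram, but the `PIPELINE` dictionary and helper functions are illustrative and not part of any iSamples repo. It uses Python's standard-library `graphlib` to derive an upstream-first order and each consumer's transitive upstreams.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the four-tier pipeline diagram:
# each key maps to the set of upstream nodes it depends on.
PIPELINE = {
    "metadata": set(),
    "vocabularies": set(),
    "pqg": {"metadata", "vocabularies"},
    "data.isamples.org + Zenodo": {"pqg"},
    "examples": {"data.isamples.org + Zenodo"},
    "isamplesorg.github.io": {"data.isamples.org + Zenodo"},
}


def build_order(graph):
    """Return one valid upstream-first ordering of the pipeline nodes."""
    return list(TopologicalSorter(graph).static_order())


def upstream_of(graph, node):
    """Collect every node that `node` transitively depends on."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


if __name__ == "__main__":
    print(build_order(PIPELINE))
    print(sorted(upstream_of(PIPELINE, "examples")))
```

Any order `static_order()` returns places `metadata` and `vocabularies` before `pqg`, and `pqg` before both consumers; the relative order of siblings (e.g. the two consumers) is not fixed.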

Things this PR does NOT do (discussed, out of scope)

- Doesn't rename the `examples` repo to `isamples-python`. That's an org-admin action with ecosystem implications (local forks, bookmarks, Zenodo metadata), so it is flagged in a callout as a pending decision.
- Doesn't add a user-facing data-files page. The serialization catalog (#143, `SERIALIZATIONS.md`, cataloging the ~11 parquet files in flight) is the internal version; a user-facing `data.qmd` page listing what's at data.isamples.org and Zenodo may come next.

The current listing has four entries and no framing of how they relate. In practice iSamples is a four-tier pipeline (metadata + vocabularies → pqg → data.isamples.org/Zenodo → consumers), but the previous table didn't show this and was missing two of the five core repos (`examples` and `pqg`). Specifically:

- Added `examples` (the Python client + notebooks) and `pqg` (the property-graph parquet framework), both core consumer/serialization repos the previous table omitted.
- Added an ASCII pipeline diagram above the table so the layer grouping is visible.
- Fixed the `vocabularies` link, which previously pointed at a subdirectory of `metadata`; the actual repo is `isamplesorg/vocabularies`.
- Grouped domain extensions (`metadata_profile_*`) into their own subsection so core vs. extension is clear.
- Split `isamples_inabox` into a "Legacy / infrastructure" subsection with a note about the API going offline in Aug 2025 and the Solr schema as a query-dimension precedent.
- Added cross-links to `query-spec.qmd` and `SERIALIZATIONS.md` as the companion docs that document the substrate itself.
- Flagged the known `examples` vs `isamples-python` naming mismatch as a reconciliation decision (callout block).

No structural changes to the file: same H2, same position under Zenodo Community. It just replaces the inner table with layered listings and a diagram.

rdhyee · 2026-04-24T15:04:11Z

Review notes from Codex:

`pubs.qmd` links to `query-spec.qmd`, but `query-spec.qmd` is not present on main or in #144's head commit. If #144 merges as-is, both the inline “Query Specification” link in the legacy/infrastructure bullet and the related-docs link will render as 404s until the query-spec work lands. Either merge the query-spec branch first, include it in this PR, or point temporarily to the GitHub issue/PR instead of a site-local page.
`pubs.qmd` links to `SERIALIZATIONS.md`, but that file exists only in open PR #143 (docs: SERIALIZATIONS.md, cataloging the ~11 parquet files in flight), not in main or #144. This creates a merge-order dependency: if #144 lands before #143, the public page gets a broken link. Mark #144 as dependent on #143, merge #143 first, or remove/replace the link until the catalog exists on the deployed site.
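Both findings reduce to one check: does every site-local link in `pubs.qmd` resolve to a file that exists on the branch being merged? A rough stdlib-only sketch follows. It assumes inline Markdown links and a hypothetical convention that a rendered `foo.html` link maps back to a `foo.qmd`, `foo.md` or `foo.html` source; it is not Quarto's own link checker and ignores reference-style links and shortcodes. `missing_targets` and `local_link_targets` are illustrative names.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Deliberately simple; no nested
# brackets, no reference-style links, no Quarto shortcodes.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def local_link_targets(markdown_text):
    """Yield site-local link targets, skipping URLs (https:, mailto:) and pure anchors."""
    for target in LINK_RE.findall(markdown_text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        path_part = target.split("#", 1)[0]  # drop any #fragment
        if path_part:
            yield path_part


def missing_targets(page, site_root):
    """Return local targets in `page` with no matching file under `site_root`."""
    text = Path(page).read_text(encoding="utf-8")
    root = Path(site_root)
    missing = []
    for target in local_link_targets(text):
        path = Path(target)
        candidates = [root / path]
        if path.name:  # e.g. "query-spec.html" -> also try query-spec.qmd / .md
            candidates += [root / path.with_suffix(s) for s in (".qmd", ".md", ".html")]
        if not any(c.exists() for c in candidates):
            missing.append(target)
    return missing
```

Running something like this on the PR branch before merge is one way to surface the merge-order dependency described above; it cannot tell you which sibling PR will eventually supply the missing file.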

I verified the new vocabularies and examples GitHub repos exist. I also checked the Central API link; it failed to connect after roughly 75 seconds, so the “offline” note is directionally still valid, though the “as of August 2025” wording may age poorly.

No tests/build run; this was a diff and link-target review.

— Codex

rdhyee · 2026-04-24T15:33:13Z

All 6 Codex findings addressed

| Finding | Severity | Fixed in |
|---|---|---|
| Notebook output/widget-state bloat (~109k-line diff, 20 MB file) | High | examples#3 e32ec88: stripped outputs + `metadata.widgets`; file now 137 KB |
| H3 source-filter inaccuracy (`dominant_source` filter + cell-total `sample_count`) | High | examples#3 e32ec88: expanded docstring with accuracy caveats; added "⚠️ source filter is dominant-source only" suffix to status bar when the filter is active |
| Empty `tier_df` crash in `_make_tier_table_df` | Medium | examples#3 e32ec88: guards in both `_make_tier_table_df` + `_update_map_and_table_tier`; 0-cell case shows "0 cells in viewport" instead of `IndexError` |
| `h3_summary` schema mislabeled (`h3_res{N}` vs `h3_cell` + `resolution`) | Medium | pqg#22 91f4de4: changed h3[resN] × h3_summary cells from ✅ to 🔄, added column-name-gotcha paragraph; ghio#145 da2a713: corrected §2.4 callout + §5.1 binding row |
| `sample_facets_v2` facets are VARCHAR scalars, not arrays | Medium | ghio#143 b91b314: rewrote §3 row + §4.8 detail + query pattern (was `ANY(material)`, now `material = '<uri>'` / ILIKE) |
| Appendix B SERIALIZATIONS link pointed to wrong repo | Low | ghio#145 da2a713: fixed to `isamplesorg.github.io#143` |
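The notebook-bloat row does not say which tool did the stripping (`nbstripout` and `nbformat` are common choices). As a dependency-free illustration, the sketch below edits the raw nbformat v4 JSON directly: it clears each code cell's `outputs` and `execution_count` and drops the notebook-level `metadata.widgets` entry, where ipywidgets saves widget state. `strip_notebook` and `strip_notebook_file` are hypothetical names.

```python
import json
from pathlib import Path


def strip_notebook(nb):
    """Clear code-cell outputs/execution counts and drop saved widget state.

    Mutates and returns the notebook dict (nbformat v4 layout assumed).
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    nb.get("metadata", {}).pop("widgets", None)
    return nb


def strip_notebook_file(path):
    """Rewrite an .ipynb file in place with outputs and widget state removed."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    strip_notebook(nb)
    # indent=1 keeps the rewritten JSON compact and diff-friendly.
    p.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

Markdown cells and other metadata (e.g. `kernelspec`) are left untouched, so the notebook still opens and re-executes normally.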

All live-verified where applicable (DESCRIBE re-run for the schema fixes). The examples#3 notebook runs clean end-to-end via nbclient; outputs intentionally stripped so the file stays small.
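The `sample_facets_v2` row in the table above matters for query shape: with one VARCHAR per row, the filter is a plain comparison (`material = '<uri>'`) or a pattern match, not an array predicate such as `ANY(material)`. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies; the `facets` table, its rows, and the `example.org` URIs are made up. In DuckDB the case-insensitive operator is `ILIKE`; SQLite's `LIKE` already ignores ASCII case by default.

```python
import sqlite3


def demo_connection():
    """In-memory toy table with ONE scalar `material` URI per row (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facets (pid TEXT, source TEXT, material TEXT)")
    conn.executemany(
        "INSERT INTO facets VALUES (?, ?, ?)",
        [
            ("s1", "SESAR", "https://example.org/material/Rock"),
            ("s2", "SESAR", "https://example.org/material/sediment"),
            ("s3", "GEOME", "https://example.org/material/rock"),
        ],
    )
    return conn


def facet_counts(conn, material_uri=None, pattern=None):
    """Count rows per source, filtering the scalar `material` column by exact
    URI equality or, if no URI is given, by a case-insensitive substring."""
    sql = "SELECT source, COUNT(*) FROM facets"
    params = []
    if material_uri is not None:
        sql += " WHERE material = ?"  # exact, case-sensitive comparison
        params.append(material_uri)
    elif pattern is not None:
        # SQLite's LIKE ignores ASCII case; DuckDB would use ILIKE here.
        sql += " WHERE material LIKE ?"
        params.append(f"%{pattern}%")
    sql += " GROUP BY source ORDER BY source"
    return conn.execute(sql, params).fetchall()
```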

cc @rdhyee — ready for Codex to re-check (or a human to merge).

rdhyee · 2026-04-24T15:55:26Z

Codex round-2 findings addressed

All 3 file claims fixed:

| Finding | Fixed in |
|---|---|
| ghio#145 §5.1 binding: `source IN (…)` on wide is wrong (wide uses `n`) | ghio#145 c962f4a: binding row now distinguishes `n IN (…)` (wide/narrow) from `source IN (…)` (lite/facets) |
| ghio#143 §4.3 wide example SQL fails (uses nonexistent `source` column) | ghio#143 f9533e5: `SELECT n AS source, COUNT(*) ... GROUP BY n`; verified returns SESAR=4.69M, OC=1.06M, GEOME=606K, SMITHSONIAN=322K |
| ghio#143 §4.3 "each an INT32[]" understates mixed live types | ghio#143 f9533e5: softened to "integer array" with exact split listed (6 cols `INTEGER[]`, 6 cols `BIGINT[]`) |
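The round-2 binding fix comes down to picking the right column name per file layout before building the filter: `n` on wide/narrow, `source` on lite/facets. A minimal sketch, again with `sqlite3` standing in for DuckDB and a few toy rows (not the real counts quoted in the table); the `SOURCE_COLUMN` mapping and helper names are hypothetical, and the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3

# Hypothetical format labels -> column holding the source name.
SOURCE_COLUMN = {"wide": "n", "narrow": "n", "lite": "source", "facets": "source"}


def source_filter(fmt, sources):
    """Build a parameterized `<col> IN (?, ...)` predicate for a file layout."""
    if not sources:
        raise ValueError("need at least one source")
    col = SOURCE_COLUMN[fmt]
    placeholders = ", ".join("?" for _ in sources)
    return f"{col} IN ({placeholders})", list(sources)


def counts_by_source(conn, table, fmt, sources):
    """Per-source row counts, aliasing the layout's column to `source`."""
    col = SOURCE_COLUMN[fmt]
    where, params = source_filter(fmt, sources)
    sql = (
        f"SELECT {col} AS source, COUNT(*) FROM {table} "
        f"WHERE {where} GROUP BY {col} ORDER BY {col}"
    )
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wide (pid TEXT, n TEXT)")  # toy wide-style table
    conn.executemany(
        "INSERT INTO wide VALUES (?, ?)",
        [("a", "SESAR"), ("b", "SESAR"), ("c", "OC"), ("d", "GEOME")],
    )
    print(counts_by_source(conn, "wide", "wide", ["SESAR", "OC"]))
```

Source values always go through `?` placeholders; only the column and table names are interpolated, which is why the column comes from a fixed mapping.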

Non-blocking PR-body cleanup also done:

- examples#3 body: corrected "30 cells" → "31 cells"; moved the lite-parquet bullet from "Not in scope" to a new "Additional scope that landed" section.
- ghio#145 body: fixed isamplesorg/pqg#143 → isamplesorg/isamplesorg.github.io#143 in amendment row 9.

Ready for Codex round-3 or merge.

rdhyee merged commit 3f14104 into isamplesorg:main Apr 24, 2026
1 check passed

rdhyee merged 1 commit into isamplesorg:main from rdhyee:docs/pubs-repositories-expanded
