QUERY_SPEC.md v0.1 + v0.2 amendments (informed by PQG conformance matrix) #145

Merged
rdhyee merged 4 commits into isamplesorg:main from
rdhyee:docs/query-spec-v0.2
Apr 24, 2026

rdhyee commented Apr 24, 2026

Summary

This PR (a) introduces query-spec.qmd as the v0.1 baseline (written in a previous session but never committed) and (b) applies v0.2 amendments informed by the newly-landed PQG conformance matrix — the audit of which shipped parquet files actually carry which QUERY_SPEC dimensions.

The two changes are split across two commits for reviewer sanity:

  1. Add QUERY_SPEC.md v0.1 (draft) — the baseline (context).
  2. Apply QUERY_SPEC v0.2 amendments from PQG conformance matrix — the substantive changes.

Reviewers should focus on commit 2; commit 1 is context.

v0.2 amendments (9)

All amendments trace to conformance_matrix.md §5:

| # | Amendment | Source row |
|---|---|---|
| 1 | Rename `specimen` → `objectType` (§2.2). Every shipped parquet uses `object_type` / `hasSampleObjectType`; adopt data-side name, keep `hasSpecimenCategory` as Solr alias. | §5.1, §4 naming-drift observation, §3.2 |
| 2 | Drop ghosts `informalClassification` (§2.2) and `resultTimeRange` (§2.3): Solr-era remnants, never in any shipped parquet. Also drop `time_range OVERLAPS` from §3.1 grammar and §5.3 Solr binding. | §5.2, §4 "ghosts in the spec", §3.2, §3.3 |
| 3 | Add `thumbnailURL` to §2.1 (optional). Ships in `wide` for OpenContext today; moving to per-source sidecars (issue #131). | §5.3, §4 "ghosts in the data" |
| 4 | Update §5.1 `time BETWEEN` binding from "TBD" to `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2`. `result_time` IS in lite (as VARCHAR). | §5.4, §3.3 |
| 5 | Document H3 column availability in §2.4: `wide_h3` has direct `h3_res4/6/8` columns; `h3_summary_res{4,6,8}` tier files ship `h3_cell` + `resolution` (NOT `h3_res{N}` columns); `lite` has `h3_res8` only; plain `wide` / `narrow` carry no H3 columns. | §5.6, §3.4 |
| 6 | Pick `tmodified` (INTEGER epoch) over `last_modified_time` (VARCHAR) for `sourceUpdatedTime` in §2.1; alias VARCHAR as deprecated. | §5.5, §3.1 note |
| 7 | Bump draft callout version 0.1 → 0.2. | §5 preamble |
| 8 | §7 open questions: close Q2 (time filter in lite, now resolved); reframe Q1 around new `objectType` naming. | §5.1, §5.4 |
| 9 | Appendix B: add references to conformance_matrix.md and SERIALIZATIONS.md (#143). | (new) |
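
To make amendment 4 concrete, here is a minimal sketch of a helper that renders the DuckDB predicate as a string. It is not taken from the spec: the function name `time_between_clause` and the choice of ISO-8601 string inputs are assumptions made for illustration. Only the `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2` form comes from the amendment.

```python
from datetime import datetime


def time_between_clause(t1: str, t2: str, column: str = "result_time") -> str:
    """Render a DuckDB predicate for the closed interval [t1, t2].

    Hypothetical helper. `result_time` ships as VARCHAR in lite, so the
    column is wrapped in TRY_CAST, which yields NULL for unparseable
    values instead of raising; those rows simply fail the BETWEEN test.
    """
    # Validate the bounds up front; fromisoformat raises ValueError on junk.
    start = datetime.fromisoformat(t1)
    end = datetime.fromisoformat(t2)
    if start > end:
        raise ValueError("t1 must not be later than t2")
    return (
        f"TRY_CAST({column} AS TIMESTAMP) "
        f"BETWEEN TIMESTAMP '{start.isoformat(sep=' ')}' "
        f"AND TIMESTAMP '{end.isoformat(sep=' ')}'"
    )


if __name__ == "__main__":
    print("WHERE " + time_between_clause("2020-01-01", "2020-12-31"))
```

Parsing the bounds before interpolating them keeps arbitrary strings out of the SQL text; a real binding layer would more likely use prepared-statement parameters.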

Links

Test plan

  • quarto render query-spec.qmd renders cleanly (tables, callouts, cross-refs)
  • Verify v0.2 callout text and cross-references to conformance_matrix.md resolve
  • Spot-check §2.2 naming note, §2.4 H3 availability callout, §5.1 time binding row
  • Confirm §7 "Questions resolved in v0.2" block renders correctly

🤖 Generated with Claude Code

rdhyee added 4 commits April 24, 2026 07:59
Substrate-neutral query contract spanning DuckDB-WASM (web), DuckDB/Ibis
(Python), and Apache Solr (legacy). Names mirror the Solr schema
vocabulary (authoritative precedent) with substrate-specific aliases
provided in §5.

Scope:
- Canonical facet / filter dimensions (§2)
- Abstract filter grammar (§3)
- Full-text search semantics (§3.2, the 16-field Solr searchText target)
- Sample-card projection (§4.2)
- Substrate binding tables (§5)
- Open questions for v0.2 (§7)

Out of scope: PQG graph traversal (see QUERY_COMPARISON.md), bulk
export, ingestion.

Refs isamplesorg.github.io#138.

Amendments informed by isamplesorg/pqg#22 (conformance_matrix.md §4-§5),
which audited which shipped parquet files actually carry which spec
dimensions:

1. Rename `specimen` → `objectType` (§2.2). Every shipped parquet uses
   `object_type` / `hasSampleObjectType`; adopt the data-side name as
   canonical, keep `hasSpecimenCategory` as Solr alias.
2. Drop ghosts: `informalClassification` (§2.2) and `resultTimeRange`
   (§2.3) — both were in Solr but never migrated to any parquet. Also
   drop `time_range OVERLAPS` from §3.1 grammar and §5.3 Solr binding.
3. Add `thumbnailURL` to §2.1 as optional (ships in `wide` today for
   OpenContext only; moving to per-source sidecars — issue isamplesorg#131).
4. Update §5.1 `time BETWEEN` binding from "TBD" to real DuckDB cast:
   `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2`. `result_time`
   IS in lite (as VARCHAR).
5. Document H3 column availability in §2.4: `wide_h3` and
   `h3_summary_res{4,6,8}` carry res 4/6/8; `lite` has res 8 only;
   plain `wide` / `narrow` carry no H3 columns.
6. Pick `tmodified` (INTEGER epoch) over `last_modified_time` (VARCHAR)
   for `sourceUpdatedTime` in §2.1; alias the VARCHAR as deprecated.
7. Bump version callout to v0.2.
8. §7 open questions: close Q2 (time filter in lite — now resolved);
   reframe Q1 around the new `objectType` naming.
9. Appendix B: reference conformance_matrix.md and SERIALIZATIONS.md
   (pqg#143) as companion documents.

Refs isamplesorg/pqg#22, isamplesorg.github.io#138.
…NS link

Two issues from Codex review:

1. **§2.4 callout wrong about h3_summary schema**: the previous text
   said the summary tier files carry `h3_res4`, `h3_res6`, `h3_res8`.
   They don't — they ship `h3_cell` (UBIGINT) + `resolution` (INTEGER)
   and filter by resolution. Corrected the callout and the §5.1
   DuckDB binding row to show the actual form
   (`h3_cell IN (...) AND resolution = 6`).

2. **Appendix B wrong link target**: the SERIALIZATIONS.md reference
   pointed at `isamplesorg/pqg/pull/143`, but the catalog PR is
   `isamplesorg#143`. Fixed.
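
The H3 correction in item 1 can be sketched as a small dispatcher that picks the predicate shape per file kind. This is an illustrative sketch under assumptions: the function name, the `file_kind` labels, and the integer cell values are hypothetical; the column names (`h3_res4/6/8`, `h3_cell`, `resolution`) and their availability per file come from the commit messages above.

```python
from typing import Sequence

# Resolutions each file kind can filter on, per the §2.4 availability notes.
# The dictionary keys are hypothetical labels, not names from the spec.
H3_RESOLUTIONS = {
    "wide_h3": (4, 6, 8),     # direct h3_res4 / h3_res6 / h3_res8 columns
    "h3_summary": (4, 6, 8),  # h3_cell (UBIGINT) + resolution (INTEGER)
    "lite": (8,),             # h3_res8 only
}


def h3_filter_clause(file_kind: str, cells: Sequence[int], resolution: int) -> str:
    """Render a DuckDB predicate selecting rows in the given H3 cells."""
    if file_kind not in H3_RESOLUTIONS:
        # Plain wide / narrow carry no H3 columns at all.
        raise ValueError(f"{file_kind!r} carries no H3 columns")
    if resolution not in H3_RESOLUTIONS[file_kind]:
        raise ValueError(f"{file_kind!r} has no resolution {resolution}")
    if not cells:
        raise ValueError("at least one H3 cell is required")
    # int() rejects anything that is not a plain integer cell id.
    cell_list = ", ".join(str(int(c)) for c in cells)
    if file_kind == "h3_summary":
        return f"h3_cell IN ({cell_list}) AND resolution = {resolution}"
    return f"h3_res{resolution} IN ({cell_list})"
```

For example, `h3_filter_clause("h3_summary", [1, 2], 6)` renders `h3_cell IN (1, 2) AND resolution = 6`, while the same call against `"wide_h3"` renders `h3_res6 IN (1, 2)`.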

Codex round-2: the §5.1 DuckDB binding row claimed that `source IN (…)`
binds to `source IN (…)` on wide / lite parquet. That is wrong for wide,
which uses `n` (PQG convention), not `source`. The query as written
fails with "Referenced column source not found".

Updated the binding row to distinguish:
  wide / narrow: WHERE n IN (…)
  lite / sample_facets_v2: WHERE source IN (…) — alias already exposed
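
The distinction above can be sketched as a lookup from serialization to column name. This is a sketch under assumptions: the function and dictionary names are hypothetical, and the source values in the usage line are placeholders rather than values confirmed by the spec; only the `n` versus `source` mapping comes from the commit message.

```python
from typing import Sequence

# Column that carries the source dimension in each serialization.
SOURCE_COLUMN = {
    "wide": "n",                 # PQG convention
    "narrow": "n",
    "lite": "source",            # alias already exposed
    "sample_facets_v2": "source",
}


def source_filter_clause(serialization: str, sources: Sequence[str]) -> str:
    """Render `<column> IN (...)` using the right column for the file."""
    try:
        column = SOURCE_COLUMN[serialization]
    except KeyError:
        raise ValueError(f"unknown serialization: {serialization!r}") from None
    if not sources:
        raise ValueError("at least one source is required")
    # Double embedded single quotes so values stay inside the SQL literal.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in sources)
    return f"{column} IN ({quoted})"


if __name__ == "__main__":
    # Placeholder source values, for illustration only.
    print("WHERE " + source_filter_clause("wide", ["SOURCE_A", "SOURCE_B"]))
```

Keeping the mapping in one table means a query written against the canonical `source` dimension cannot silently reference a column that a given file does not have.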
rdhyee merged commit 1e364d7 into isamplesorg:main Apr 24, 2026
1 check passed