Problem
The Explorer serves material/object-type categories from the frozen iSamples Zenodo export, which carries stale or incorrect concept values for a class of OpenContext samples. This surfaced in two reports:
- `ark:/28722/k2p55x96j` (a ceramic) shows "Anthropogenic metal material" in the popup. Verified against live data: the export's list for this PID is `[anthropogenicmetal, biogenicnonorganicmaterial, rock]`, and the correct value "Other anthropogenic material" (`otheranthropogenicmaterial`) is not present at all. It exists only in Eric's OpenContext PQG, which is more current than the frozen export.
- `material` leaking into the facet list. This was mitigated build-side in #271 ("build: stop SKOS root 'Material' leaking into the material facet (#265)"), but that can't recover values absent from the export.
No index/selection change on our side can fix #260, because the correct concept isn't in the data we publish. The fix has to bring OpenContext's corrected concept values into the published `wide` parquet.
Proposed fix: a material/object-type sidecar
Reuse the existing enrichment pattern we already run for OC thumbnails (`scripts/enrich_wide_with_oc_thumbnails.py`):
- Source of truth = Eric's OpenContext PQG (narrow/wide), which has the corrected `has_material_category` / `has_sample_object_type` edges.
- Build a `pid → {material_uri, object_type_uri}` sidecar for OC PIDs.
- Overlay it onto the published `wide` parquet (and downstream `sample_facets_v2` / `facet_summaries` via `build_frontend_derived.py`) before publish, with OC rows taking precedence over the frozen export.
This fixes the popup and the underlying facet values in one pass.
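Below is a minimal sketch of the sidecar build and overlay, written in plain Python so the precedence rule is easy to see. It is not the enrichment script, and it makes several assumptions:
- The PQG edges are already loaded as `(subject_pid, predicate, object_uri)` triples.
- Each wide row is a dict with hypothetical `pid`, `material_uris` and `object_type_uris` fields. These are lists because the export already stores a list of concepts per PID; the proposal names them as single `material_uri` / `object_type_uri` values.
- Concept URIs are shortened to the tokens quoted in the problem description.
- The PIDs and edges are toy data.

Parquet I/O and regenerating `sample_facets_v2` / `facet_summaries` are left out.

```python
from collections import defaultdict
from collections.abc import Iterable

# Edge names from the proposal; the triple layout is an assumption,
# not the actual PQG schema.
MATERIAL_PRED = "has_material_category"
OBJECT_TYPE_PRED = "has_sample_object_type"
FIELD_FOR_PRED = {MATERIAL_PRED: "material_uris", OBJECT_TYPE_PRED: "object_type_uris"}

Sidecar = dict[str, dict[str, list[str]]]


def build_sidecar(edges: Iterable[tuple[str, str, str]]) -> Sidecar:
    """Collect pid -> {"material_uris": [...], "object_type_uris": [...]}
    from (subject_pid, predicate, object_uri) triples, skipping other edges."""
    sidecar: Sidecar = defaultdict(lambda: {"material_uris": [], "object_type_uris": []})
    for pid, pred, obj in edges:
        field = FIELD_FOR_PRED.get(pred)
        if field is None:
            continue  # not a material/object-type edge
        if obj not in sidecar[pid][field]:
            sidecar[pid][field].append(obj)  # keep order, drop duplicates
    return dict(sidecar)


def overlay(wide_rows: list[dict], sidecar: Sidecar) -> list[dict]:
    """Return rows where sidecar values replace the frozen export's values
    for matching PIDs (OC precedence). Input rows are not mutated."""
    patched = []
    for row in wide_rows:
        patch = sidecar.get(row["pid"])
        if patch is None:
            patched.append(row)  # no OC correction: keep the export's values
            continue
        new_row = dict(row)
        for field, uris in patch.items():
            if uris:  # only override fields the PQG actually asserts
                new_row[field] = list(uris)
        patched.append(new_row)
    return patched


# Toy data with hypothetical PIDs, shaped like the #260 case.
edges = [("example-oc-pid", MATERIAL_PRED, "otheranthropogenicmaterial")]
wide = [
    {"pid": "example-oc-pid",
     "material_uris": ["anthropogenicmetal", "biogenicnonorganicmaterial", "rock"],
     "object_type_uris": []},
    {"pid": "non-oc-pid", "material_uris": ["rock"], "object_type_uris": []},
]
patched = overlay(wide, build_sidecar(edges))
print(patched[0]["material_uris"])  # ['otheranthropogenicmaterial']
```

The sketch replaces each field wholesale instead of merging lists, because that is what the precedence rule calls for: a merge would keep the stale `anthropogenicmetal` concept behind #260. Fields the PQG does not assert for a PID fall back to the export's values. Whether that fallback is wanted is a judgment call for the real script.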
Open questions
- … `build_frontend_derived.py`.
Related
- `scripts/enrich_wide_with_oc_thumbnails.py`, `DATA_PROVENANCE.md` (#264, "docs+scripts: data provenance map + build scripts for the 6 unscripted derived parquet").
— 🤖 rbotyee (RY's bot). Spun up from the #260 triage; RY skimmed. Ping @rdhyee.