The same logical entity can appear in the combined triple store under three distinct IRI forms — full-path, synthesized basename-only (synth-A), and bare literal — causing JOIN failures, incorrect DISTINCT counts, and broken materialisation in cross-vault queries.
User Story / JTBD
As a developer running cross-vault SPARQL queries
I want all references to the same entity to use a single canonical IRI form after loading
So that JOINs succeed, DISTINCT counts are correct, and prototype-chain materialisation works reliably
The prototype's own subject IRI (from vault-2025) is the full-path form. PrototypeChainMaterializer matches on the full-path subject. The 96 synth-A references from the archive never match → those 96 tasks' prototype chain is never materialised → queries return 19% recall.
```typescript
// When target file not found in current vault during index:
const synthesized = `obsidian://vault/${path.basename(linkpath)}`; // synth-A form
```
This produces obsidian://vault/fb3d12b2-...md (no subdirectory path) instead of the full obsidian://vault/assetspaces/shared-identities/fb3d12b2-...md.
exo:Asset_prototype already emits both IRI form and UUID bare literal, but this is insufficient for cross-vault JOINs because the IRI forms still differ between vaults.
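To make the failure mode concrete, here is a minimal sketch using a hypothetical in-memory `Triple` shape and made-up task names (`task-1` to `task-3`); it is not the project's store API. It shows one UID stored under the three forms described above, and why an exact-match join against the prototype's full-path subject only reaches the task that used the full-path form.

```typescript
// Hypothetical triple shape for illustration; the real ITripleStore API is not shown in this issue.
type Triple = { s: string; p: string; o: string };

const UID = "fb3d12b2-9552-4866-a31e-2b5f65ea433c";
const fullPath = `obsidian://vault/assetspaces/shared-identities/${UID}.md`;
const synthA = `obsidian://vault/${UID}.md`;
const PROTO = "exo:Asset_prototype";

const triples: Triple[] = [
  { s: "task-1", p: PROTO, o: fullPath }, // full-path form
  { s: "task-2", p: PROTO, o: synthA },   // synth-A form (basename only)
  { s: "task-3", p: PROTO, o: UID },      // bare literal form
  { s: fullPath, p: "rdfs:label", o: "Prototype" }, // the prototype's own triples
];

// DISTINCT over the prototype reference: three values for one logical entity.
const distinctForms = new Set(
  triples.filter((t) => t.p === PROTO).map((t) => t.o),
);

// Exact-match join: keep tasks whose prototype reference is also a subject in the store.
const subjects = new Set(triples.map((t) => t.s));
const joined = triples
  .filter((t) => t.p === PROTO && subjects.has(t.o))
  .map((t) => t.s);

console.log(distinctForms.size, joined);
```

Only the task that references the full-path form survives the join, which mirrors the recall loss reported in the audit.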
Pros: non-destructive, preserves original IRI forms
Cons: requires queries to use owl:sameAs aware patterns or the reasoner to materialise equivalences
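As a rough illustration of the owl:sameAs route, the sketch below turns a synth-A to full-path mapping into equivalence triples instead of rewriting the store. The `Triple` shape and the `emitSameAs` helper are assumptions made for this example; only the owl:sameAs predicate IRI is standard.

```typescript
// Hypothetical triple shape; not the project's store type.
type Triple = { s: string; p: string; o: string };

const OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

// Emits one `synth-A-IRI owl:sameAs full-path-IRI` triple per mapping entry.
// The original IRIs are left untouched, which is what makes this route non-destructive.
function emitSameAs(remapTable: Map<string, string>): Triple[] {
  const out: Triple[] = [];
  for (const [synthA, fullPath] of remapTable) {
    if (synthA !== fullPath) {
      out.push({ s: synthA, p: OWL_SAME_AS, o: fullPath });
    }
  }
  return out;
}

const sameAsTriples = emitSameAs(
  new Map([
    ["obsidian://vault/U.md", "obsidian://vault/assetspaces/shared-identities/U.md"],
  ]),
);
console.log(sameAsTriples);
```

The emitted triples only help if queries or the reasoner actually follow owl:sameAs, which is the cost named in the Cons line above.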
Sub-task C: STR-based join bridge utility
Utility function bridgeIRIForms(sparqlQuery, store) that rewrites JOIN patterns to include UNION over known IRI forms
Lower priority; primarily useful as workaround before full canonicalization is stable
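One small piece of such a bridge could look like the sketch below: given the known forms of one entity, it builds a SPARQL UNION group for a single triple pattern. `unionOverForms` is a hypothetical helper, not the proposed `bridgeIRIForms`, and it does not parse or rewrite a full query.

```typescript
// Builds `{ ?s p <iri1> } UNION { ?s p <iri2> } UNION { ?s p "literal" }`.
// Assumption: IRIs contain no characters that need escaping inside <...>,
// and the literal is a plain string such as a UUID.
function unionOverForms(
  subjectVar: string,
  predicate: string,
  iris: string[],
  literal?: string,
): string {
  const objects = iris.map((iri) => `<${iri}>`);
  if (literal !== undefined) {
    // JSON.stringify yields a double-quoted string, adequate for simple values like UUIDs.
    objects.push(JSON.stringify(literal));
  }
  return objects
    .map((o) => `{ ${subjectVar} ${predicate} ${o} }`)
    .join(" UNION ");
}

const pattern = unionOverForms(
  "?task",
  "exo:Asset_prototype",
  ["obsidian://vault/assetspaces/shared-identities/U.md", "obsidian://vault/U.md"],
  "U",
);
console.log(pattern);
```

A query-level rewriter would still need to locate the patterns to expand, so this only covers the pattern-generation step.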
Code Example
```typescript
// packages/exocortex/src/services/IRICanonicalizer.ts (new)
export class IRICanonicalizer {
  /**
   * Remaps synth-A IRIs (obsidian://vault/<uid>.md without subdirectory)
   * to canonical full-path IRIs when the target is found in any loaded vault.
   * Operates in-memory only; no mutation of source files.
   */
  async canonicalize(store: ITripleStore, vaultFileIndex: VaultFileIndex): Promise<void> {
    const synthAPattern = /^obsidian:\/\/vault\/([0-9a-f-]{36})\.md$/;
    const remapTable = new Map<string, string>();

    // Note: only subjects are scanned here. Synth-A IRIs that occur solely as
    // objects (e.g. of exo:Asset_prototype) would need an equivalent object
    // scan, if ITripleStore offers one.
    for (const iri of store.allSubjects()) {
      const match = synthAPattern.exec(iri);
      if (!match) continue;
      const uid = match[1];
      // lookupByUID is assumed to return the canonical full-path IRI, or nothing if unknown.
      const canonicalPath = vaultFileIndex.lookupByUID(uid);
      if (canonicalPath) remapTable.set(iri, canonicalPath);
    }

    if (remapTable.size > 0) {
      store.remapIRIs(remapTable); // atomic remap: subject + object occurrences
    }
  }
}
```
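The remap step itself can be exercised without the real store. The sketch below is a self-contained approximation under stated assumptions: a plain `Triple[]` stands in for `ITripleStore`, a `Map` from UID to full-path IRI stands in for `VaultFileIndex`, and both subject and object positions are scanned. The function names are illustrative only.

```typescript
// Hypothetical stand-ins for the store and the vault file index.
type Triple = { s: string; p: string; o: string };

const SYNTH_A = /^obsidian:\/\/vault\/([0-9a-f-]{36})\.md$/;

// Collects synth-A IRIs (subject or object position) whose UID resolves in the index.
function buildRemapTable(triples: Triple[], uidIndex: Map<string, string>): Map<string, string> {
  const table = new Map<string, string>();
  for (const t of triples) {
    for (const term of [t.s, t.o]) {
      const match = SYNTH_A.exec(term);
      if (!match) continue;
      const uid = match[1];
      if (uid === undefined) continue;
      const canonical = uidIndex.get(uid);
      if (canonical) table.set(term, canonical); // unknown UIDs are left alone (no-op)
    }
  }
  return table;
}

// Returns a new array; the input triples are not mutated.
function remapIRIs(triples: Triple[], table: Map<string, string>): Triple[] {
  return triples.map((t) => ({
    s: table.get(t.s) ?? t.s,
    p: t.p,
    o: table.get(t.o) ?? t.o,
  }));
}

const known = "fb3d12b2-9552-4866-a31e-2b5f65ea433c";
const missing = "00000000-0000-0000-0000-000000000000";
const canonicalIri = `obsidian://vault/assetspaces/shared-identities/${known}.md`;

const before: Triple[] = [
  { s: "task-1", p: "exo:Asset_prototype", o: `obsidian://vault/${known}.md` },
  { s: "task-2", p: "exo:Asset_prototype", o: `obsidian://vault/${missing}.md` },
];
const after = remapIRIs(before, buildRemapTable(before, new Map([[known, canonicalIri]])));
console.log(after);
```

The second triple keeps its synth-A IRI because its UID is not in the index, matching the "no remap when target not found" scenario.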
Techniques Applied
Post-load normalisation: canonicalization runs after union store construction, before materialisation — cleans input before inference
In-memory-only: no mutation of source vault files (non-destructive)
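The ordering and the opt-in behaviour can be summarised in a few lines. This is a sketch only: the step names and `planPipeline` are hypothetical, and the environment is passed in as a plain object rather than read from the process, so the example stays self-contained. The flag name `EXOCORTEX_IRI_CANONICALIZE` comes from this issue.

```typescript
type Env = Record<string, string | undefined>;
type Step = "buildUnionStore" | "canonicalize" | "materialise";

// Off unless the flag is exactly "true", matching the default-false rollout for v1.
function isCanonicalizationEnabled(env: Env): boolean {
  return env["EXOCORTEX_IRI_CANONICALIZE"] === "true";
}

// Canonicalization sits after union store construction and before materialisation.
function planPipeline(env: Env): Step[] {
  const steps: Step[] = ["buildUnionStore"];
  if (isCanonicalizationEnabled(env)) steps.push("canonicalize");
  steps.push("materialise");
  return steps;
}

console.log(planPipeline({}));
console.log(planPipeline({ EXOCORTEX_IRI_CANONICALIZE: "true" }));
```

Keeping the step optional means that turning the flag off restores the pre-canonicalization pipeline without code changes.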
```gherkin
Feature: IRI canonicalization for cross-vault entities

  Scenario: synth-A IRI remapped to full-path when target accessible
    Given asset with UID U accessible at full path in vault-A
    And task in vault-B references U via synth-A IRI "obsidian://vault/U.md"
    When both vaults loaded with --also and canonicalization enabled
    Then all synth-A occurrences of U remapped to full-path canonical IRI
    And SELECT DISTINCT on that subject returns one row, not two

  Scenario: JOIN succeeds after canonicalization
    Given same setup as above
    When running cross-vault query with property path "Effort_area/Area_parent*"
    Then tasks with synth-A prototype refs join correctly with prototype's triples
    And result count is within 95% of Python-verified baseline (488)

  Scenario: No remap when synth-A target not found
    Given task with synth-A reference to non-existent UID
    When canonicalization pass runs
    Then synth-A IRI preserved as-is (no-op)
    And no error thrown
```
Background
Empirical evidence (2026-05-27 cross-vault audit, ~220K triples):
The same UID fb3d12b2-9552-4866-a31e-2b5f65ea433c appears as object of exo:Asset_prototype in three distinct IRI forms in the combined store:
obsidian://vault/assetspaces/shared-identities/fb3d12b2-...md (full-path)
obsidian://vault/fb3d12b2-...md (synth-A; NoteToRDFConverter.ts:1089-1090)
fb3d12b2-... (bare literal)
The prototype's own subject IRI (from vault-2025) is the full-path form. PrototypeChainMaterializer matches on the full-path subject. The 96 synth-A references from the archive never match → those 96 tasks' prototype chain is never materialised → queries return 19% recall.
NoteToRDFConverter.ts:1089-1090 synthesis fallback (codegraph-verified):
This produces obsidian://vault/fb3d12b2-...md (no subdirectory path) instead of the full obsidian://vault/assetspaces/shared-identities/fb3d12b2-...md.
Existing dual-storage (NoteToRDFConverter.ts:1033-1037): exo:Asset_prototype already emits both IRI form and UUID bare literal, but this is insufficient for cross-vault JOINs because the IRI forms still differ between vaults.
Reproducer:
Related Issues
--also load)
BoK References
Technical Approach
Architecture Context
Implementation Steps
Sub-task A: Post-load canonicalization pass
IRICanonicalizer.ts
After loading --also vaults (post-feat(sparql): cross-vault index + runtime materialization #3281 buildUnionStore()):
For each obsidian://vault/<uid>.md in the store (no subdirectory):
Look up <uid> in the file index of all loaded vaults
If found at obsidian://vault/<path>/<uid>.md → this is the canonical full-path form
No mutation of .md files
EXOCORTEX_IRI_CANONICALIZE=true (default: false for v1, enable post-validation)
Sub-task B: Optional owl:sameAs emission for IRI synonyms
owl:sameAs triples: synth-A-IRI owl:sameAs full-path-IRI
RDFSInferenceEngine (post-feat(reasoner): expand RDFSInferenceEngine beyond rdfs:subClassOf #3283 sameAs support)
Cons: requires queries to use owl:sameAs aware patterns or the reasoner to materialise equivalences
Sub-task C: STR-based join bridge utility
Utility function bridgeIRIForms(sparqlQuery, store) that rewrites JOIN patterns to include UNION over known IRI forms
Code Example
Techniques Applied
EXOCORTEX_IRI_CANONICALIZE allows gradual rollout
Test Plan
Unit Tests
IRICanonicalizer remaps synth-A IRI to full-path when target found in vault index
DISTINCT count after canonicalization: one row per entity (not two)
Integration Tests
BDD Scenarios
Deliverables
IRICanonicalizer.ts: post-load synth-A → full-path remapping
buildUnionStore() pipeline (post-feat(sparql): cross-vault index + runtime materialization #3281, pre-materialisation)
owl:sameAs emission option (Sub-task B), behind EXOCORTEX_IRI_SAMAS=true
Quality Criteria
DISTINCT count per entity: 1 (not 2 or 3)
.md files
Acceptance Criteria
IRICanonicalizer remaps synth-A → full-path in combined store
owl:sameAs emission available (Sub-task B)
EXOCORTEX_IRI_CANONICALIZE documented
Definition of Done
RACI
Risks
owl:sameAs route increases reasoner complexity (Sub-task B)
Rollback Plan
EXOCORTEX_IRI_CANONICALIZE=false (default) skips all remapping
IRICanonicalizer.ts is a post-load step; removing it from pipeline reverts behaviour
owl:sameAs emission independently toggleable
Dependencies
Estimates
IRICanonicalizer.ts: Sub-task A (synth-A remap)
buildUnionStore() pipeline
owl:sameAs emission
Labels
refactoring, sparql, package:cli, priority:P1, tech-debt, size:large
Best Practices Checklist
.md files
EXOCORTEX_IRI_CANONICALIZE (off by default for v1)
PrototypeChainMaterializer (clean input for inference)
Review Checklist
EXOCORTEX_IRI_CANONICALIZE defaults to true in future