Summary
Cross-vault analytical queries return ~19% recall for three reasons: sparql index has no --also flag, materialisation runs per vault only, and query --use-cache --also loads independent per-vault caches without cross-vault materialisation.
User Story / JTBD
As a knowledge worker running analytical SPARQL over a primary vault + archive vault
I want a single combined index that materialises prototype-chain inheritance across vault boundaries
So that queries like "all ems__Effort in Q1-26 under TBank area chain" return complete results instead of 19% recall
Background
Empirical evidence (2026-05-27 cross-vault audit, ~220K triples):
Running the following reproducer on a live vault returns 91 results where Python-verified ground truth is 488:
cd /Users/kitelev/vault-2025 && time npx @kitelev/exocortex-cli query \
  --vault /Users/kitelev/vault-2025 \
  --also /Users/kitelev/vault-2025-archive \
  --use-cache --format json /tmp/q1-26-tbank-final.sparql
# Returns 91, expected 488 (19% recall; 81% of the expected results are missing)
Root cause trace (codegraph-verified):
Archive tasks contain frontmatter exo__Asset_prototype: "[[fb3d12b2-...]]", where the prototype file lives in vault-2025/assetspaces/shared-identities/. When the archive cache is built (sparql index --vault vault-2025-archive), NoteToRDFConverter.ts:1089-1090 cannot find the target file in the archive vault and falls back to a synthesized basename-only IRI, obsidian://vault/fb3d12b2-...md. After the --also merge, the prototype's own subject IRI is obsidian://vault/assetspaces/shared-identities/fb3d12b2-...md (real path). The two IRI forms never match, so the JOIN fails silently.
The same UID fb3d12b2-9552-4866-a31e-2b5f65ea433c appears as the object of Asset_prototype in 3 distinct forms in the combined store:
- obsidian://vault/assetspaces/shared-identities/<uid>.md: 6 refs (from vault-2025 assets)
- obsidian://vault/<uid>.md: 96 refs (synth-A fallback from archive assets)
- bare literal <uid>: 6 refs
PrototypeChainMaterializer.ts correctly supports combined stores via store: ITripleStore interface, but it is never invoked on the combined store because materialisation happens only per-vault at index time.
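A quick way to see the mismatch described above is to group prototype references by their trailing UID and count the distinct surface forms. The following is only a diagnostic sketch: `uidOf` and `formsByUid` are hypothetical helpers, not part of the exocortex codebase, and the sample references are the three forms listed above.

```typescript
// Hypothetical diagnostic helpers (not part of the exocortex codebase).
// Matches a UUID at the end of a reference, with an optional ".md" suffix.
const UID_RE =
  /([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\.md)?$/i;

function uidOf(ref: string): string | null {
  return UID_RE.exec(ref)?.[1]?.toLowerCase() ?? null;
}

// Group references by UID and collect the distinct forms seen for each UID.
function formsByUid(refs: string[]): Map<string, Set<string>> {
  const out = new Map<string, Set<string>>();
  for (const ref of refs) {
    const uid = uidOf(ref);
    if (uid === null) continue; // not a UID-based reference
    let forms = out.get(uid);
    if (!forms) {
      forms = new Set<string>();
      out.set(uid, forms);
    }
    forms.add(ref);
  }
  return out;
}

const protoUid = "fb3d12b2-9552-4866-a31e-2b5f65ea433c";
const sampleRefs = [
  `obsidian://vault/assetspaces/shared-identities/${protoUid}.md`, // real path
  `obsidian://vault/${protoUid}.md`, // basename-only fallback
  protoUid, // bare literal
];

const groupedForms = formsByUid(sampleRefs);
console.log(groupedForms.get(protoUid)?.size); // 3 forms for a single UID
```

Any UID whose set holds more than one form is a candidate for a silently failing join.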
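Related Issues
- #3219: query --also <path> strips path prefix; wikilinks stored as string literal not IRI (adjacent parser bug in the query --also path-prefix strip)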
BoK References
| Body of Knowledge | Chapter/Section | Relevance |
| --- | --- | --- |
| SWEBOK v3 | Ch. 2 Software Design | Federated store architecture, IRI resolution layer |
| SWEBOK v3 | Ch. 3 Software Construction | CLI flag design, backward-compat constraints |
| DMBOK v2 | Ch. 8 Data Integration | Multi-source data loading, IRI identity resolution |
| PMBOK v7 | Project Work | Regression baseline required before ship |
Technical Approach
Architecture Context
Current flow:
sparql index --vault A → cache-A (only vault-A triples + materialisation)
sparql index --vault B → cache-B (only vault-B triples + materialisation)
query --vault A --also B → load cache-A + cache-B independently → store-union
(no cross-vault materialisation on union)
Target flow:
sparql index --vault A --also B → combined-cache (union triples + cross-vault materialisation)
query --vault A --also B → load combined-cache OR run runtime materialisation
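The difference between the two flows can be illustrated with a toy model. The sketch below is not PrototypeChainMaterializer: the `Triple` shape, the `materialise` function, and the predicate names `prototype` and `area` are simplifications invented for illustration, and it assumes both vaults already use the same IRI for the prototype.

```typescript
type Triple = { s: string; p: string; o: string };

// Toy stand-in for prototype-chain materialisation: every asset inherits the
// "area" triples of its prototype. Runs to a fixpoint so chains also resolve.
function materialise(store: Triple[]): Triple[] {
  const out = [...store];
  let changed = true;
  while (changed) {
    changed = false;
    for (const link of out.filter((t) => t.p === "prototype")) {
      for (const inherited of out.filter((t) => t.s === link.o && t.p === "area")) {
        const exists = out.some(
          (t) => t.s === link.s && t.p === "area" && t.o === inherited.o,
        );
        if (!exists) {
          out.push({ s: link.s, p: "area", o: inherited.o });
          changed = true;
        }
      }
    }
  }
  return out;
}

// Prototype lives in vault A, the task referencing it lives in vault B.
const vaultA: Triple[] = [{ s: "proto", p: "area", o: "TBank" }];
const vaultB: Triple[] = [{ s: "task", p: "prototype", o: "proto" }];

const taskHasArea = (store: Triple[]): boolean =>
  store.some((t) => t.s === "task" && t.p === "area" && t.o === "TBank");

// Current flow: materialise each vault separately, then take the union.
const currentFlow = [...materialise(vaultA), ...materialise(vaultB)];
// Target flow: take the union first, then materialise once.
const targetFlow = materialise([...vaultA, ...vaultB]);

console.log(taskHasArea(currentFlow), taskHasArea(targetFlow)); // false true
```

In the current flow the task never sees the prototype's area because the two triples are never in the same store while materialisation runs.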
Implementation Steps
- Sub-task A: index --also <path> repeatable flag
  - Add --also option to sparql-index.ts (mirrors existing sparql-query.ts pattern)
  - Collect all vault paths, build union triple store before passing to PrototypeChainMaterializer
  - Write combined cache to <primary-vault>/.exocortex/cache/triples-combined.json (or hash-keyed filename per --also set)
  - PrototypeChainMaterializer already accepts store: ITripleStore, so no changes are needed there
- Sub-task B: query --inference flag → runtime materialisation on combined store
  - Add --inference flag to sparql-query.ts
  - When --also is provided and --inference is set, run PrototypeChainMaterializer on the union store before the query
  - Cache key includes --also paths so a combined-cache hit avoids re-materialisation
- Sub-task C: Regression test
  - Integration test: query Q1-26 ems__Effort in TBank area chain → assert result count ≥ 464 (95% of 488 baseline)
  - Test fixture: minimal vault pair with prototype in vault-A, task in vault-B referencing it via [[<uid>]]
Code Example
// packages/cli/src/commands/sparql-index.ts: add --also support
program
  .option(
    '--also <path>',
    'Additional vault to include',
    // Repeatable flag: accumulate every occurrence into an array.
    (value: string, previous: string[]) => [...(previous || []), value],
    [],
  )
  .action(async (options) => {
    // Primary vault first, then every --also vault in the order given.
    const vaultPaths = [options.vault, ...(options.also || [])];
    const store = await buildUnionStore(vaultPaths); // new helper
    await PrototypeChainMaterializer.materialize(store); // existing, no changes
    await writeCache(options.vault, store, { alsoVaults: options.also }); // hash-keyed
  });
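The index-side snippet above has a query-side counterpart in Sub-task B. One possible shape for that decision is sketched below. `QueryOptions`, `QueryPlan`, and `planQuery` are hypothetical names, and the precedence (a combined-cache hit wins over runtime materialisation) is an assumption drawn from the cache-key note in Sub-task B, not confirmed behaviour.

```typescript
// Hypothetical option shape; field names mirror the CLI flags in this issue.
interface QueryOptions {
  vault: string;
  also: string[];
  useCache: boolean;
  inference: boolean;
}

type QueryPlan = "combined-cache" | "runtime-materialisation" | "per-vault";

// Decide how the store for a query is prepared.
function planQuery(opts: QueryOptions, combinedCacheExists: boolean): QueryPlan {
  const crossVault = opts.also.length > 0;
  // Assumption: a combined-cache hit avoids re-materialisation.
  if (crossVault && opts.useCache && combinedCacheExists) return "combined-cache";
  // No usable combined cache: materialise on the union store at query time.
  if (crossVault && opts.inference) return "runtime-materialisation";
  // Existing behaviour: per-vault caches, no cross-vault materialisation.
  return "per-vault";
}

const exampleOpts: QueryOptions = {
  vault: "A",
  also: ["B"],
  useCache: false,
  inference: true,
};
console.log(planQuery(exampleOpts, false)); // "runtime-materialisation"
```

Keeping the decision in one pure function makes the three BDD scenarios easy to cover with unit tests.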
Techniques Applied
- Federated triple store: union of N per-vault stores before materialisation
- Content-addressed cache: cache filename keyed on hash(primary + sorted(also)) to support multiple --also combinations
- Flag parity: --also already exists in sparql-query.ts; reuse the same semantics in sparql-index.ts
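The hash(primary + sorted(also)) idea could look roughly like the sketch below. `combinedCacheKey` and `combinedCacheFile` are hypothetical names, SHA-256 truncated to 16 hex characters is an arbitrary choice, and the `triples-combined-<key>.json` filename pattern is an assumption; the issue only asks for some hash-keyed filename.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Key depends on the primary vault and the SET of --also vaults:
// the primary stays first, the --also paths are sorted so order is irrelevant.
function combinedCacheKey(primary: string, also: string[]): string {
  const normalised = [
    path.resolve(primary),
    ...also.map((p) => path.resolve(p)).sort(),
  ];
  return createHash("sha256")
    .update(JSON.stringify(normalised))
    .digest("hex")
    .slice(0, 16);
}

// Hypothetical filename pattern under the primary vault's cache directory.
function combinedCacheFile(primary: string, also: string[]): string {
  const key = combinedCacheKey(primary, also);
  return path.join(primary, ".exocortex", "cache", `triples-combined-${key}.json`);
}

console.log(combinedCacheFile("/vaults/a", ["/vaults/c", "/vaults/b"]));
```

Because the primary path is kept out of the sorted part, swapping primary and secondary vaults yields a different key, which matches the fact that the cache is written under the primary vault.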
Test Plan
Unit Tests
- buildUnionStore([pathA, pathB]) returns store containing triples from both vaults
- Cache key differs for different --also sets
- PrototypeChainMaterializer resolves prototype chain when prototype is in secondary vault
Integration Tests
- Vault-pair fixture: task in vault-B references prototype in vault-A → after index --also, property path ems:Effort_area/ems:Area_parent* resolves correctly
- Regression: result count for TBank Q1-26 query ≥ 464 (95% recall baseline)
BDD Scenarios
Feature: Cross-vault SPARQL index

  Scenario: Combined index resolves cross-vault prototype references
    Given vault-2025 (primary) and vault-2025-archive (secondary)
    When running: exocortex-cli index --vault vault-2025 --also vault-2025-archive
    Then single combined cache is built with materialisation on the union store
    And prototype refs to cross-vault targets resolve to consistent IRI form

  Scenario: Cross-vault property path query returns correct recall
    Given combined cache built for vault-2025 + vault-2025-archive
    When running cross-vault query with property path "Effort_area/Area_parent*"
    Then tasks with prototype-inherited area residing in another vault are matched
    And result count is within 95% of Python-verified baseline (488)

  Scenario: Runtime inference flag substitutes for pre-built combined cache
    Given no combined cache exists
    When running: exocortex-cli query --vault A --also B --inference
    Then PrototypeChainMaterializer runs on combined triple store at query time
    And results match pre-built combined-cache results
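Deliverables
- --also flag added to sparql-index command
- --inference flag added to sparql-query command (runtime materialisation)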
Quality Criteria
- Cross-vault query recall ≥ 95% of Python-verified baseline (488 tasks → ≥ 464)
- Combined cache build time ≤ 2× single-vault index time
- Backward-compat: sparql index --vault A (no --also) behaviour unchanged
- No regressions in existing sparql-query tests
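The recall figures used throughout this issue follow from simple arithmetic, shown here as a small worked example. The helper names `recallOf` and `minPassingCount` are illustrative only; the numbers 91 and 488 come from the reproducer in the Background section.

```typescript
const baselineCount = 488; // Python-verified ground truth
const observedCount = 91; // result count from the reproducer

// Recall: share of the expected results that the query actually returned.
const recallOf = (found: number, expected: number): number => found / expected;

// Smallest result count that still meets a recall target.
const minPassingCount = (expected: number, target: number): number =>
  Math.ceil(expected * target);

console.log((recallOf(observedCount, baselineCount) * 100).toFixed(1)); // "18.6"
console.log(minPassingCount(baselineCount, 0.95)); // 464
```

The regression test threshold of 464 is therefore the 95% target rounded up to a whole result.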
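Acceptance Criteria
- index --vault A --also B produces combined cache without error
- query --vault A --also B --use-cache hits combined cache when available
- query --vault A --also B --inference runs materialisation at query time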
Definition of Done
RACI
| Activity | Responsible | Accountable | Consulted | Informed |
| --- | --- | --- | --- | --- |
| Implementation | AI Agent | Tech Lead | — | Team |
| Testing | AI Agent | QA | — | Team |
| Documentation | AI Agent | Tech Lead | — | Stakeholders |
Risks
| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| Combined cache doubles disk usage (~43MB + ~16MB → ~60MB) | High | Low | Content-addressed naming; document in CLI help |
| Cache invalidation logic breaks when --also set changes | Medium | Medium | Hash-keyed cache files; stale detection via mtime |
| PrototypeChainMaterializer performance on 2× triples | Low | Medium | Benchmark before ship; add --no-inference escape hatch |
Rollback Plan
- --also flag is additive: removing it restores per-vault-only behaviour
- Combined cache is a separate file: deleting it forces fallback to per-vault caches
- Feature flag EXOCORTEX_COMBINED_INDEX=0 as escape hatch if needed
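The escape hatch could be read with a check like the one below. This is a sketch under assumptions: the function name `combinedIndexEnabled` is hypothetical, and the semantics (enabled by default, disabled only by the exact value 0) are a guess, since the issue only names the variable and the value 0.

```typescript
// Hypothetical check for the EXOCORTEX_COMBINED_INDEX escape hatch.
// Takes the environment as a parameter so it can be tested without
// touching the real process environment.
function combinedIndexEnabled(env: Record<string, string | undefined>): boolean {
  // Assumption: the feature is on unless the variable is set to exactly "0".
  return env["EXOCORTEX_COMBINED_INDEX"] !== "0";
}

console.log(combinedIndexEnabled({})); // true
console.log(combinedIndexEnabled({ EXOCORTEX_COMBINED_INDEX: "0" })); // false
```

In the CLI this would be called with `process.env`; when it returns false, the commands would ignore any combined cache and keep the per-vault path.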
Dependencies
Estimates
| Task | Effort |
| --- | --- |
| sparql-index.ts: add --also flag + union store builder | 3h |
| sparql-query.ts: add --inference flag + runtime materialisation | 2h |
| Cache naming / invalidation logic | 2h |
| Integration tests + regression test | 3h |
| Total | 10h |
Labels
enhancement, sparql, cli, package:cli, priority:P0, epic:sparql-engine, size:large
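Best Practices Checklist
- --also flag semantics match existing sparql-query.ts implementation
- […] triples.json)
- --help updated for both new flags
Review Checklist
- No --also = same behaviour as before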