Skip to content

fix(locorda_core): per-property vclk stamps for LWW concurrent-merge (closes #50)#61

Open
kkalass wants to merge 7 commits into
mainfrom
fix/issue-50-property-statements-vclk
Open

fix(locorda_core): per-property vclk stamps for LWW concurrent-merge (closes #50)#61
kkalass wants to merge 7 commits into
mainfrom
fix/issue-50-property-statements-vclk

Conversation

@kkalass

@kkalass kkalass commented May 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes #50LwwRegister concurrent merge used document-level clock, causing silent data loss on uncontested LWW properties.

Root Cause

Without per-property causality information, LwwRegister.remoteMerge() fell back to comparing document-level HLC physical timestamps when two installations had the same logical time (ClockComparison.concurrent). If installation B had a higher maxPhysicalTime than A, it would win all LWW properties — including ones it never touched — silently discarding A's uncontested writes.

Fix (proposal 028)

On every LWW property write, emit a sync:PropertyStatement keyed on (subject, predicate) carrying a crdt:VersionedClock (vclk) IRI. During concurrent merge, each property is now resolved against its own per-(s,p) vclk rather than the whole document clock, so uncontested writes win unconditionally.

Changes

  • LwwRegister.generateMetadata() emits PropertyStatement + vclk node; suppresses the statement when the property's vclk equals the document-wide implicit default (no overhead for the common uncontested case). GC removes stale vclk subgraphs on re-save.
  • RemoteCrdtMergeContext now receives MetadataGenerator so the merge path can resolve vclks from the remote graph's PropertyStatements.
  • RemoteDocumentMerger._extractStatements() recognises the new sync:PropertyStatement shape (sync:resource / sync:property) alongside the existing rdf:Statement reification shape.
  • OrganizedGraph carries documentIri to distinguish framework-data subjects (which must NOT get PropertyStatements) from app-data.
  • CrdtDocumentManager opts out of per-property stamping for the framework-data pass to avoid recursive self-stamping of the PropertyStatement / VersionedClock nodes.
  • New test case 43_concurrent_lww_different_props exercises the exact scenario from the bug report: A and B diverge at logicalTime=3, A changes both name+description, B changes only name; after cross-sync B wins name (higher physical time) but A retains description.
  • All existing expected_stored_graph.ttl fixtures updated to include the new PropertyStatement metadata.

Preparatory commits

  • spec: introduce sync:PropertyStatement and crdt:VersionedClock vocabulary
  • docs(proposed-changes): 028 — property statements and versioned clocks
  • spec: use crdt:forClockEntry as VersionedClock identity anchor

kkalass added 7 commits May 28, 2026 11:58
… truth)

The vocabulary spec/vocabularies/crdt-mechanics.ttl defines
crdt:installationIri (see proposed-changes/007), but spec/mappings/core-v1.ttl
still referenced the old name crdt:installationId. Bring the mapping in line
with the vocabulary and regenerate mapping_bootstrap.g.dart accordingly.

No semantic change.
…lary

Adds the vocabulary and core merge-contract scaffolding needed to scope LWW
(and future per-property/per-value) CRDT bookkeeping at the right granularity,
fixing issue #50 conceptually.

Vocabulary changes (spec/vocabularies):
* sync.ttl
  - New class sync:PropertyStatement: framework metadata about a (resource,
    property) pair, distinct from rdf:Statement (which identifies a specific
    (s,p,o) triple).
  - New property sync:property (domain PropertyStatement, range rdf:Property).
  - sync:hasStatement.range widened to a union of rdf:Statement,
    sync:PropertyStatement and sync:ResourceStatement (one mechanism, three
    statement granularities).
  - sync:resource.domain widened to union of ResourceStatement and
    PropertyStatement.
* crdt-mechanics.ttl
  - New class crdt:VersionedClock: immutable, content-addressed HLC snapshot.
    Hash is over (installationIri, logicalTime) pairs sorted by
    installationIri; physicalTime is a tie-break annotation, NOT part of the
    hash (consistent with crdt:clockHash).
  - New property crdt:vclk: attaches a VersionedClock snapshot to a framework
    metadata node (PropertyStatement, rdf:Statement, ...). Merge semantics:
    vector-clock domination on the referenced vclks; ties broken by
    max(physicalTime) within the vclk on 'concurrent'.

Mapping changes (spec/mappings/core-v1.ttl):
* New <#property-statement> ClassMapping: (sync:resource, sync:property)
  identifying + Immutable; crdt:vclk LWW (placeholder; merger applies
  vclk-domination semantics).
* New <#versioned-clock> ClassMapping: crdt:hasClockEntry Immutable with
  mc:disableBlankNodePathIdentification true (vclk is content-addressed; no
  path identification for embedded clock entries).
* Traversal boundaries extended with sync:resource, sync:property, crdt:vclk,
  crdt:hasClockEntry to keep framework-side metadata cleanly separated.

Regenerates packages/locorda_core/lib/src/generated/mapping_bootstrap.g.dart.

No implementation change yet; this commit is vocabulary + contract only.
Concept reference will follow as proposed-changes/028.
Concept note for the issue #50 fix, replacing the PR #60 per-property
clock attempt with a sync:PropertyStatement + crdt:VersionedClock design.

Covers problem reproduction, three statement granularities, vclk
content-addressing, per-property merge semantics, GC strategy and a
worked example showing both properties surviving the fix.
Change the VersionedClock entry identity from crdt:installationIri to
crdt:forClockEntry — a reference to the document's existing crdt:ClockEntry
IRI. Rationale:

- The framework currently does not write crdt:installationIri onto its
  ClockEntry nodes; installation identity is conveyed purely through the
  deterministic lcrd-clk-md5-<hash(installationLocalId)> fragment.
- The ClockEntry IRI is content-addressed, stable across local and remote
  storage, and identical on every installation observing the entry — so
  it is semantically equivalent to an installation-keyed vector clock as
  the per-entry key, without requiring additional triples on existing
  ClockEntry nodes.
- crdt:installationIri remains in the vocabulary as OPTIONAL (annotated)
  for future cross-document discovery use cases.

Each VersionedClock entry carries its own crdt:logicalTime / crdt:physicalTime
as frozen snapshot values; the referenced ClockEntry may have advanced since
the snapshot. The vclk content hash is over (forClockEntry, logicalTime)
pairs sorted by forClockEntry; physicalTime is annotation only.

Mapping additions:
* crdt:forClockEntry → Immutable in <#clock-mappings>.
* Traversal boundary extended for crdt:forClockEntry.

Updates proposed-changes/028 and regenerates vocab + mapping bootstrap.
Without per-property causality information, LwwRegister.remoteMerge()
fell back to comparing document-level HLC physical timestamps when two
installations had the same logical time (ClockComparison.concurrent).
If installation B had a higher maxPhysicalTime than A, it would win ALL
LWW properties — including ones it never touched — silently discarding
A's uncontested writes.

Fix (proposal 028): on every LWW property write, emit a
sync:PropertyStatement keyed on (subject, predicate) carrying a
crdt:VersionedClock (vclk) IRI. During concurrent merge, each property
is now resolved against its own per-(s,p) vclk rather than the whole
document clock, so uncontested writes win unconditionally.

Details:
- LwwRegister.generateMetadata() emits PropertyStatement + vclk node;
  suppresses the statement when the property's vclk equals the document-
  wide implicit default (no overhead for the common uncontested case).
  GC removes stale vclk subgraphs on re-save of the same property.
- RemoteCrdtMergeContext now receives MetadataGenerator so the merge
  path can resolve vclks from the remote graph's PropertyStatements.
- RemoteDocumentMerger._extractStatements() recognises the new
  sync:PropertyStatement shape (sync:resource / sync:property) alongside
  the existing rdf:Statement reification shape.
- OrganizedGraph carries documentIri to distinguish framework-data
  subjects (which must NOT get PropertyStatements) from app-data.
- CrdtDocumentManager opts out of per-property stamping for the
  framework-data pass to avoid recursive self-stamping of the
  PropertyStatement / VersionedClock nodes themselves.
- New test case 43_concurrent_lww_different_props exercises the exact
  scenario from the bug report: A and B diverge at logicalTime=3,
  A changes both name+description, B changes only name; after cross-sync
  B wins name (higher physical time) but A retains description.
- All existing expected_stored_graph.ttl fixtures updated to include the
  new PropertyStatement metadata.

Closes #50
On Windows/Linux expect DesktopGDriveAuth; on all other platforms
(macOS, iOS, Android) expect GoogleSignInGDriveAuth, matching the
runtime dispatch in gdrive_auth_io.dart.
// Remove all old triples that the new emission will re-introduce,
// plus the inbound hasStatement edge. The new emission re-adds them
// via `addNodes(documentIri, sync:hasStatement, ...)`.
triplesToRemove.addAll(oldFrameworkGraph.findTriples(subject: stmtIri));

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is dangerous - If I understand correctly, this would also remove triples of the same statement not managed by us. We must not touch anything that was possibly added by a future version (or a different program).

// plus the inbound hasStatement edge. The new emission re-adds them
// via `addNodes(documentIri, sync:hasStatement, ...)`.
triplesToRemove.addAll(oldFrameworkGraph.findTriples(subject: stmtIri));
triplesToRemove.addAll(oldFrameworkGraph.findTriples(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, find without subject needs to iterate all triples - can hasStatement be on any subject iri? or on the documentIri?


// Vclk subgraph GC: if the OLD vclk this stmt pointed to is now
// unreferenced by any OTHER PropertyStatement, drop its subgraph too.
final staleVclks = oldFrameworkGraph

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.findTriples is quite optimized for subject/predicate, but I wonder if garbage collection really belongs here in localValueChange - isn't this a separate thing we need to do after all individual property value changes where handled? Also because the idea of property statements is not so very special for LWW - I think that assigning metadata to a subject/property pair is a pretty general idea. And regarding vclk: this is a bit more specific, yes - but not necessarily bound to LWW either, right? So cleaning up unused vclk references IMHO really should be separate - what do you think? And about the semantics above that a vclk needs to be referenced only if it is not equivalent to the base vclk - maybe there should be a general helper? But beware: please discuss this issue only, do not change anything for it without my explicit consent.

// the writer's HLC at save time). When PropertyStatements exist on
// BOTH sides, compare their vclks as proper vector clocks and resolve
// independently from other properties.
// 2. Fall back to the document-wide clock comparison when per-property

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this really be an all-or-nothing kind of thing? My intention in the concept was, that the vclk is taken from the property if possible, else the base clock, else the document clock. This is done per side in the merge, so a remote could have the property vclk and local the document wide one. It must not fall back to comparing document wide on both sides. Or did I misunderstand something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LwwRegister concurrent merge uses document-level clock, causing silent data loss on uncontested LWW properties

1 participant