Snapshot auditability #1539

@jumaffre

Follow up from #1302

Snapshots are currently generated at regular intervals from a state that is globally committed. However, the snapshot evidence (the hash of the snapshot) is only committed after the snapshot has been generated, whereas the snapshot itself is written to disk as soon as it is generated.

With this first implementation, a snapshot can be available for new joiners to resume from (i.e. an operator can copy the snapshot file and start a new joiner from it straight away) while its evidence can still be rolled back. In that case, the snapshot would be blameless, as there would be no evidence for it in the ledger.

What we should do instead is:

  • Only write the snapshot to disk once the snapshot evidence is globally committed. This means we would have to keep the serialised snapshot around (along with the version of its evidence) until that version is globally committed.
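
As a rough illustration of the deferred write described above, here is a minimal Python sketch. It is not the existing implementation: `PendingSnapshots`, `on_commit`, `on_rollback` and the `write_to_disk` callback are hypothetical names, and it assumes the node is notified of both global commit and rollback by version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PendingSnapshot:
    snapshot_version: int  # version of the state captured by the snapshot
    evidence_version: int  # version at which the snapshot hash was recorded
    data: bytes            # serialised snapshot, held until it is safe to write


class PendingSnapshots:
    """Hypothetical store that holds snapshots until their evidence commits."""

    def __init__(self, write_to_disk: Callable[[int, bytes], None]):
        self._write = write_to_disk
        self._pending: Dict[int, PendingSnapshot] = {}

    def add(self, snapshot_version: int, evidence_version: int, data: bytes) -> None:
        self._pending[snapshot_version] = PendingSnapshot(
            snapshot_version, evidence_version, data
        )

    def on_commit(self, committed_version: int) -> List[int]:
        # Write every snapshot whose evidence is now globally committed.
        ready = sorted(
            v
            for v, s in self._pending.items()
            if s.evidence_version <= committed_version
        )
        for v in ready:
            snapshot = self._pending.pop(v)
            self._write(v, snapshot.data)
        return ready

    def on_rollback(self, version: int) -> None:
        # Evidence recorded after `version` no longer exists in the ledger,
        # so the matching snapshots are dropped and never reach disk.
        self._pending = {
            v: s for v, s in self._pending.items() if s.evidence_version <= version
        }
```

With this shape, a snapshot file only ever appears on disk once its evidence can no longer be rolled back, at the cost of holding the serialised snapshot until then.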

However, this may not be enough to guarantee that a joiner that resumed from a snapshot can join the consensus:

  • From its perspective, it should not become part of the network (e.g. process app requests) until it has seen the globally committed evidence of the snapshot it has resumed from.
  • From the existing network's perspective, this new node shouldn't count as part of the Raft quorum until it has confirmed that it has resumed from a trustworthy snapshot.
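
The two conditions above could be sketched as follows. This is only an illustration under assumptions: `Joiner`, `on_committed_evidence` and `quorum_size` are hypothetical names, SHA-256 stands in for whatever hash the evidence actually uses, and how a node reports its confirmation to the rest of the network is left open.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Joiner:
    snapshot_digest: bytes       # hash of the snapshot this node resumed from
    evidence_seen: bool = False  # set once matching committed evidence is seen

    @classmethod
    def from_snapshot(cls, data: bytes) -> "Joiner":
        # Assumption: evidence is a SHA-256 digest of the serialised snapshot.
        return cls(hashlib.sha256(data).digest())

    def on_committed_evidence(self, digest: bytes) -> None:
        # Called for each globally committed snapshot evidence entry.
        if digest == self.snapshot_digest:
            self.evidence_seen = True

    def can_serve_requests(self) -> bool:
        # The joiner stays passive until its snapshot is vouched for.
        return self.evidence_seen


def quorum_size(confirmed: Dict[str, bool]) -> int:
    # Only nodes that have confirmed their snapshot count towards the quorum.
    voters = sum(1 for ok in confirmed.values() if ok)
    return voters // 2 + 1
```

For example, with three confirmed nodes and one joiner that has not yet confirmed its snapshot, `quorum_size` still returns 2, the majority of the three confirmed nodes.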
