Formal models of core Elasticsearch algorithms
This repository contains formal models of core Elasticsearch algorithms and is directly related to ongoing implementation efforts around data replication and cluster consensus. We consider this work-in-progress: Models as well as implementations are still evolving and might differ in substantial ways. The formal models mainly serve to illustrate some of the high-level concepts and help to validate resiliency-related aspects.
Models
Data replication
The data replication model consists of two files:
Cluster consensus
The cluster consensus TLA+ model consists of two files:
The cluster consensus Isabelle model consists of the following theories:
- Basic definitions
- An implementation in functional style
- An implementation in monadic style, along with a proof it's equivalent to the previous
- The proof that each slot is consistent, based on Lamport's Synod algorithm
- The proof that the implementation ensures consistency
Replica engine
A TLA+ model of how the engine handles replication requests.
Zen with terms
An alternative idea for improving the consistency of cluster state updates, effectively by adding the notion of a term to the existing Zen algorithm.
How to edit/run TLA+:
- Install the TLA Toolbox
- If on Mac OS, move the downloaded app to the Applications folder first
- Read some documentation
How to run the model checker in headless mode:
- Download tla2tools.jar
- Run the model checker once in TLA+ Toolbox on desktop (can be aborted once started). This generates the folder
elasticsearch.toolbox/model/that contains all model files that are required to run the model checker in headless mode. - Copy the above folder and
tla2tools.jarto the server running in headless mode. cdto the folder and runjava -Xmx30G -cp ../tla2tools.jar tlc2.TLC MC -deadlock -workers 12. The setting-Xmx30Gdenotes the amount of memory to allocate to the model checker and-workers 12the number of worker threads (should be equal to the number of cores on machine). The setting-deadlockensures that TLC explores the full reachable state space, not searching for deadlocks.