Mesos can run in fault-tolerant mode, in which multiple Mesos masters run simultaneously: one of them is the active master, and the others act as stand-bys, ready to take over if the active master fails. Mesos uses Apache ZooKeeper to elect a new active master.
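To make the election idea concrete, the sketch below simulates the standard ZooKeeper leader-election recipe: every candidate creates an ephemeral, sequential znode, and the candidate owning the lowest sequence number is the leader. When a leader's session dies, ZooKeeper deletes its ephemeral node and the next-lowest candidate takes over. This is a minimal in-memory illustration under stated assumptions, not Mesos's actual implementation; the FakeZooKeeper class and its methods are hypothetical stand-ins for a real ZooKeeper client, and the master names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble.

    Models only what the election recipe needs: ephemeral, sequential
    children of one znode, removed when their owning session ends.
    """
    counter: int = 0
    # Maps sequential node name -> owning session (master) id.
    children: Dict[str, str] = field(default_factory=dict)

    def create_ephemeral_sequential(self, prefix: str, owner: str) -> str:
        # ZooKeeper appends a monotonically increasing, zero-padded counter,
        # so names with the same prefix sort in creation order.
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.children[name] = owner
        return name

    def expire_session(self, owner: str) -> None:
        # When a session dies, all of its ephemeral nodes are deleted.
        self.children = {n: o for n, o in self.children.items() if o != owner}

    def leader(self) -> Optional[str]:
        # The candidate owning the lowest sequence number is the leader.
        if not self.children:
            return None
        return self.children[min(self.children)]


# Usage: three masters join; the first becomes active, and when it
# fails the next candidate in sequence order takes over.
zk = FakeZooKeeper()
for master in ["master-a", "master-b", "master-c"]:
    zk.create_ephemeral_sequential("candidate_", master)
print(zk.leader())
zk.expire_session("master-a")  # the active master fails
print(zk.leader())
```

In a real deployment each candidate would typically watch only its immediate predecessor's node rather than polling, so that a failure wakes up a single stand-by instead of all of them.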
Fault-tolerant mode requires Mesos to be built with ZooKeeper. This can be done with the configure option --with-included-zookeeper, which ensures that ZooKeeper (which resides in the third_party directory) gets compiled. It is also possible to run Mesos with an external ZooKeeper by using the corresponding configure option, with DIR set to the directory of the external ZooKeeper.
To run Mesos in fault-tolerant mode, ZooKeeper has to be up and running. The script third_party/zookeeper-*/bin/zkServer.sh can be used to launch ZooKeeper (see the ZooKeeper documentation for more information). Once ZooKeeper is running, the master daemon, the slave daemon(s), and the framework schedulers must each be passed a URL pointing to the running ZooKeeper instance. The URL is of the form zoo://host1:port1,host2:port2/znode, where the host:port pairs are ZooKeeper servers and znode is a path to a znode (ZooKeeper’s equivalent of a directory) for use by Mesos. It is also possible to use the URL zoofile://filename/znode, in which case filename should contain one host:port pair per line.

This URL replaces the Mesos master URL (i.e. mesos://) that is passed when Mesos is not running in fault-tolerant mode. Multiple Mesos masters can be started this way; Mesos ensures, through ZooKeeper, that only one of them is the active master at any given time.
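The two URL forms can be illustrated with a small parser. These helpers are hypothetical, written only for this explanation, and are not part of Mesos. They assume that the znode path is everything after the first "/" following the host list, that the filename in a zoofile:// URL contains no "/", and that blank lines in the host file are ignored; the host names zk1 and zk2 are made up.

```python
from typing import List, Tuple

HostPort = Tuple[str, int]


def parse_host_port(pair: str) -> HostPort:
    # "host:port" -> ("host", port); rpartition splits on the last ':'.
    host, sep, port = pair.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {pair!r}")
    return host, int(port)


def parse_zoo_url(url: str) -> Tuple[List[HostPort], str]:
    """Split zoo://host1:port1,host2:port2/znode into servers and znode path."""
    prefix = "zoo://"
    if not url.startswith(prefix):
        raise ValueError(f"not a zoo:// URL: {url!r}")
    hosts, slash, znode = url[len(prefix):].partition("/")
    if not slash or not znode:
        raise ValueError("missing znode path")
    servers = [parse_host_port(p) for p in hosts.split(",")]
    return servers, "/" + znode


def parse_zoofile_url(url: str) -> Tuple[str, str]:
    """Split zoofile://filename/znode into (filename, znode).

    Assumption: filename contains no '/', so the znode starts at the first '/'.
    """
    prefix = "zoofile://"
    if not url.startswith(prefix):
        raise ValueError(f"not a zoofile:// URL: {url!r}")
    filename, slash, znode = url[len(prefix):].partition("/")
    if not filename or not slash or not znode:
        raise ValueError("expected zoofile://filename/znode")
    return filename, "/" + znode


def parse_zoofile_contents(text: str) -> List[HostPort]:
    # One host:port pair per line; blank lines are skipped (an assumption).
    return [parse_host_port(line) for line in text.splitlines() if line.strip()]


servers, znode = parse_zoo_url("zoo://zk1:2181,zk2:2181/mesos")
print(servers)  # [('zk1', 2181), ('zk2', 2181)]
print(znode)    # /mesos
```

Returning the znode with a leading "/" matches how ZooKeeper paths are written, so nested paths such as /mesos/cluster1 are preserved as-is.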