This repository has been archived by the owner on Sep 6, 2018. It is now read-only.
A restarted server can learn the commit index from the first heartbeat it receives from the current leader, but replaying the entire log only after the server has started is slow.
Maybe we could keep track of the commit index and flush it to disk every few seconds?
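A minimal sketch of the periodic flush proposed above, in Python rather than this project's own code. `CommitIndexStore` and `PeriodicFlusher` are hypothetical names, and the 5-second default interval is an assumption. A stale value on disk should still be a safe lower bound after a restart, provided the commit index only ever increases and the log entries it covers were made durable first.

```python
import os
import tempfile


class CommitIndexStore:
    """Hypothetical helper that persists a single commit index atomically."""

    def __init__(self, path):
        self.path = path

    def save(self, commit_index):
        # Write to a temp file in the same directory, fsync it, then rename
        # over the old file so a crash never leaves a half-written value.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as f:
                f.write(str(commit_index))
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, self.path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise

    def load(self):
        # A missing file means nothing was persisted yet; 0 means "nothing
        # known to be committed".
        try:
            with open(self.path) as f:
                return int(f.read().strip())
        except FileNotFoundError:
            return 0


class PeriodicFlusher:
    """Flushes the commit index at most once per interval (assumed policy)."""

    def __init__(self, store, interval_s=5.0):
        self.store = store
        self.interval_s = interval_s
        self.last_flush = None
        self.last_saved = store.load()

    def maybe_flush(self, commit_index, now):
        # Flush only when the interval has elapsed and the index has advanced.
        due = self.last_flush is None or now - self.last_flush >= self.interval_s
        if due and commit_index > self.last_saved:
            self.store.save(commit_index)
            self.last_saved = commit_index
            self.last_flush = now
            return True
        return False
```

A server loop could call `maybe_flush(commit_index, time.monotonic())` whenever the commit index advances, and call `load()` on startup to decide how far it can apply the log before hearing from a leader.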
@ongardie Should a server wait until receiving an AppendEntries RPC (to receive the commitIndex) from the current leader before applying commands in the log? Or should the committed index be saved to disk?
To make sure everyone's clear, persisting commitIndex isn't needed for safety, since a new leader can always figure this out again with help from a quorum, and then tell everyone else through AppendEntries RPCs. I assume the question is when/whether it's beneficial to persist the commitIndex to disk. You can certainly do it, and as xiangli points out, you can do so asynchronously or periodically. But here are a couple of reasons why you may not want to:
It's extra code.
If you're booting a server, hopefully it's in the minority of your cluster that's not needed for availability.
If you're booting a server, it's probably already experienced significant downtime. If you delay applying log commands even by a few minutes, it wouldn't necessarily add significantly more downtime.
If you need to read the log from a magnetic disk, you're lucky if you can get 100MB/s. Applying the commands should be much faster (a few hundred MB/s probably), so Amdahl's law says you can't expect much gain by overlapping the two. A good SSD might change this, though.
So I guess the most super-optimized implementations would do this, but I probably wouldn't bother.
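The disk-throughput argument above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 10 GB log size is a made-up figure, 300 MB/s stands in for "a few hundred MB/s", and `replay_times` is a hypothetical helper that models perfect overlap as being limited by the slower stage.

```python
def replay_times(log_mb, read_mb_s, apply_mb_s):
    """Return (sequential_s, overlapped_s) for reading and applying a log."""
    read_s = log_mb / read_mb_s
    apply_s = log_mb / apply_mb_s
    # Sequential: read the whole log first, then apply it.
    sequential_s = read_s + apply_s
    # Perfect overlap: total time is bounded by the slower of the two stages.
    overlapped_s = max(read_s, apply_s)
    return sequential_s, overlapped_s


# Assumed figures: 10 GB log, 100 MB/s disk reads, 300 MB/s apply rate.
sequential_s, overlapped_s = replay_times(10_000, 100, 300)
speedup = sequential_s / overlapped_s
print(f"sequential: {sequential_s:.1f} s, overlapped: {overlapped_s:.1f} s, "
      f"speedup: {speedup:.2f}x")
```

With these assumed numbers, overlapping reading and applying saves roughly a quarter of the replay time (about 133 s versus 100 s), because the disk read dominates either way. A faster disk shifts the balance, as noted above.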
@ongardie I understand that it's not needed for safety. My biggest concern is the case where the entire cluster goes down and then reboots: every node would be waiting for an AppendEntries RPC. I suppose a node could wait for an election timeout before trying to replay the log. That way you wouldn't get delayed by replaying first if it's not needed.
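One way to read the suggestion above is sketched below: on startup, wait up to one election timeout for a commit index learned from the leader, and fall back to whatever was persisted if none arrives. Everything here is an assumption for illustration. `replay_bound_on_startup` is a hypothetical function, and the queue is assumed to carry commit indexes the server has already derived from successful AppendEntries RPCs, so they are safe for its own log.

```python
import queue


def replay_bound_on_startup(leader_commits, election_timeout_s, persisted_commit):
    """Return the highest log index that is safe to apply right after boot.

    leader_commits: queue of commit indexes derived from successful
        AppendEntries RPCs (assumed to be validated against the local log).
    persisted_commit: commit index loaded from disk, or 0 if none was saved.
    """
    try:
        # Wait at most one election timeout for word from a leader.
        from_leader = leader_commits.get(timeout=election_timeout_s)
    except queue.Empty:
        # No leader yet, e.g. the whole cluster is rebooting.
        from_leader = None

    if from_leader is None:
        return persisted_commit
    # Both values are lower bounds on what is committed, so take the larger.
    return max(persisted_commit, from_leader)
```

If the whole cluster rebooted, every node falls through to its persisted value after the timeout and can start applying entries while an election proceeds. If a leader is already up, the node uses the fresher index without extra delay.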