Replies: 5 comments 13 replies
-
Thanks for starting this discussion. The in-band and out-of-band classifications are informative.
This direction sounds very interesting.
For the raft-based journal, do you think we can store all data in cloud object storage? If so,
-
A useful reference is KIP-500 (Replace ZooKeeper with a Self-Managed Metadata Quorum), which describes how Apache Kafka implements its quorum mechanism. Sorry, I haven't taken a closer look at our use case yet, but election built on an append-only data structure shares many of the same properties.
-
I’m happy to talk about the question "Do we really need the raft protocol?".
-
I have submitted an RFC to illustrate the design ideas: #280.
-
@w41ter-l Since the development has moved to w41ter-l/shared-journal, I think this RFC, as well as the leader-based journal one, can be moved to discussions. Right now we have 2 RFCs and 2 discussion threads:
Our discussions are quite fragmented. You could continue this thread, update the description, and post updates on the work in w41ter-l/shared-journal so that all developers can follow the progress. WDYT?
-
In Engula, the Journal provides abstractions and implementations for storing data streams, for example, transaction logs. The Journal is mainly used by the Engine, but also by the Kernel, which needs to persist metadata. Unlike the previous journal discussion (#70), this discussion focuses only on the design and implementation of a leader-based Journal service.
I want to discuss design from two aspects:
Reconfiguration
There are two kinds of reconfiguration for the Journal: in-band and out-of-band.
Out-of-band reconfiguration journal systems use sealing and bridge records to switch configurations, which gives them several advantages [1].
Of course, an out-of-band reconfiguration journal system requires another store to save metadata, such as the epoch number and copyset. An in-band reconfiguration journal system has no such requirement, so it can be used independently.
In Engula, the Kernel might use an in-band reconfiguration journal system to persist metadata, while the Engine uses an out-of-band reconfiguration journal system to persist logs. So I plan to design and implement a raft-based journal system to serve the Kernel, and a journal system using sealing, bridge records, and flexible paxos to serve the Engine. The latter also uses the former to persist its out-of-band metadata.
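To make the sealing idea concrete, here is a minimal sketch (not the actual Engula API; the `Segment` and `Record` types are hypothetical) of how sealing plus a bridge record could switch configurations: the old segment is marked read-only, and a bridge record tells readers which epoch continues the stream.

```rust
#[derive(Debug, PartialEq)]
enum Record {
    Data(Vec<u8>),
    /// Marks the end of a sealed segment and points at the epoch
    /// of the successor configuration.
    Bridge { next_epoch: u64 },
}

struct Segment {
    epoch: u64,
    sealed: bool,
    records: Vec<Record>,
}

impl Segment {
    fn new(epoch: u64) -> Self {
        Segment { epoch, sealed: false, records: Vec::new() }
    }

    /// Appends are only accepted while the segment is open.
    fn append(&mut self, data: Vec<u8>) -> Result<(), &'static str> {
        if self.sealed {
            return Err("segment is sealed");
        }
        self.records.push(Record::Data(data));
        Ok(())
    }

    /// Seal this segment and emit a bridge record so readers know
    /// where the next configuration (epoch) begins.
    fn seal(&mut self, next_epoch: u64) {
        self.sealed = true;
        self.records.push(Record::Bridge { next_epoch });
    }
}

fn main() {
    let mut old = Segment::new(1);
    old.append(b"txn-log-entry".to_vec()).unwrap();
    // Reconfiguration: seal the old segment, open a new one under epoch 2.
    old.seal(2);
    let mut next = Segment::new(2);
    // Stragglers writing to the old configuration are rejected.
    assert!(old.append(b"late write".to_vec()).is_err());
    next.append(b"next entry".to_vec()).unwrap();
    assert_eq!(old.records.last(), Some(&Record::Bridge { next_epoch: 2 }));
}
```

The key property is that once the bridge record is written, no further appends can land in the old configuration, so the switch point is unambiguous.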
Storage medium
The simplest approach is to store all data on local disk, as some raft implementations do. But local capacity is limited, and once a reconfiguration occurs, the entire data set needs to be replicated to the new nodes.
A cloud object store could be used to persist logs, with the underlying storage providing availability. But not all large-scale storage systems have an atomic put-if-absent operation, so a separate lightweight coordination service needs to be used to ensure that only one client can append data to a log [2]. At the same time, having each user operation put a separate piece of data into object storage is inefficient, while batching increases latency.
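One way to sketch that coordination service, assuming hypothetical `Coordinator` and `ObjectStore` types (this is illustrative, not a real store's API): the coordinator hands out monotonically increasing writer tokens, newer tokens fence older ones, and a plain `put` is only issued by the holder of the current token.

```rust
use std::collections::HashMap;

/// Toy coordination service: grants an exclusive writer token per log.
/// Acquiring a new token fences (invalidates) all older ones.
struct Coordinator {
    current_token: u64,
}

impl Coordinator {
    fn new() -> Self {
        Coordinator { current_token: 0 }
    }

    fn acquire(&mut self) -> u64 {
        self.current_token += 1;
        self.current_token
    }
}

/// Toy object store with only an unconditional `put` (no put-if-absent),
/// modeled as a plain map from object key to bytes.
struct ObjectStore {
    objects: HashMap<String, Vec<u8>>,
}

/// Append is gated on the fencing token rather than on a
/// put-if-absent primitive in the store itself.
fn append(
    coord: &Coordinator,
    store: &mut ObjectStore,
    token: u64,
    key: String,
    data: Vec<u8>,
) -> Result<(), &'static str> {
    if token != coord.current_token {
        return Err("fenced: a newer writer holds the log");
    }
    store.objects.insert(key, data);
    Ok(())
}

fn main() {
    let mut coord = Coordinator::new();
    let mut store = ObjectStore { objects: HashMap::new() };
    let token_a = coord.acquire(); // writer A becomes the appender
    let token_b = coord.acquire(); // writer B takes over, fencing A
    assert!(append(&coord, &mut store, token_a, "log/0001".into(), vec![1]).is_err());
    assert!(append(&coord, &mut store, token_b, "log/0001".into(), vec![2]).is_ok());
}
```

In a real deployment the token check would happen on the write path before issuing the put (or be validated by the storage layer), but the shape is the same: the coordinator, not the object store, enforces single-writer semantics.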
To achieve both low latency and zero data movement, we could keep recent data on local disk and move sealed (read-only) segments to the cloud object store.
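A rough sketch of that tiering, with a `Vec` standing in for the local disk and a `HashMap` standing in for the object store (the `TieredLog` type is hypothetical): appends always hit the local tier, and sealing moves the whole segment to cold storage in one step, so no per-entry data movement happens later.

```rust
use std::collections::HashMap;

/// Toy tiered log: recent entries live on "local disk" (a Vec here);
/// sealed segments are uploaded wholesale to the "object store"
/// (a HashMap here) and become read-only.
struct TieredLog {
    active: Vec<Vec<u8>>,                      // stand-in for local disk
    object_store: HashMap<u64, Vec<Vec<u8>>>,  // stand-in for cloud storage
    next_segment_id: u64,
}

impl TieredLog {
    fn new() -> Self {
        TieredLog {
            active: Vec::new(),
            object_store: HashMap::new(),
            next_segment_id: 0,
        }
    }

    /// New writes always land on the low-latency local tier.
    fn append(&mut self, entry: Vec<u8>) {
        self.active.push(entry);
    }

    /// Seal the active segment: move it to the object store in one
    /// shot and start a fresh local segment.
    fn seal_active(&mut self) -> u64 {
        let id = self.next_segment_id;
        self.next_segment_id += 1;
        let segment = std::mem::take(&mut self.active);
        self.object_store.insert(id, segment);
        id
    }
}

fn main() {
    let mut log = TieredLog::new();
    log.append(vec![1]);
    log.append(vec![2]);
    let id = log.seal_active();
    assert!(log.active.is_empty());             // local tier is freed
    assert_eq!(log.object_store[&id].len(), 2); // segment archived whole
    log.append(vec![3]);                        // new writes continue locally
}
```

Reads of recent data stay on the fast local tier, while sealed segments inherit the object store's durability and effectively unbounded capacity.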
If you have a better design, you are welcome to discuss it here.