Timely worker synchronization for new dataflow creation #6

Closed
rjnn opened this issue Mar 13, 2019 · 4 comments

@rjnn
Contributor

rjnn commented Mar 13, 2019

All timely workers need to create new dataflows in the exact same order. This is arguably an implementation flaw: timely workers could instead have a namespace that new dataflows register against. But today, there is a single global counter from which channels are assigned for dataflows. Thus, every timely worker must create new dataflows in the same order.

In order to facilitate this ordering, the interactive crate has a meta-dataflow in which workers insert their intents to create new dataflows. This meta-dataflow assigns an ordering, which workers use to coordinate.
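To make the ordering problem concrete, here is a minimal toy sketch in plain Rust. It does not use timely's real API: `Worker`, `create_dataflow`, and `sequence` are hypothetical names, and the "sequencer" is just a sort over (timestamp, worker index, name) intents. It only illustrates why a per-worker counter forces an agreed creation order.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a worker: one counter hands out channel ids
/// to each new dataflow, in creation order.
#[derive(Default)]
struct Worker {
    next_channel: usize,
    channels: BTreeMap<String, usize>,
}

impl Worker {
    /// Create a dataflow that needs a single channel, taken from the counter.
    fn create_dataflow(&mut self, name: &str) {
        self.channels.insert(name.to_string(), self.next_channel);
        self.next_channel += 1;
    }
}

/// Hypothetical sequencer: merges intents from all workers into one
/// agreed order by sorting on (timestamp, worker index, name).
fn sequence(mut intents: Vec<(u64, usize, String)>) -> Vec<String> {
    intents.sort();
    intents.into_iter().map(|(_, _, name)| name).collect()
}

/// Two workers can exchange data only if they assigned the same channel
/// id to the same dataflow.
fn channels_agree(a: &Worker, b: &Worker) -> bool {
    a.channels == b.channels
}
```

If one worker creates `view_a` then `view_b` and another creates them in the opposite order, `channels_agree` returns false. If both instead apply the output of `sequence`, the assignments match.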

One disadvantage of this hack is that dataflows are not persistent. If we want to be able to recover from crashes, we need to be able to see the history of dataflows we created (i.e. the views). Thus, replacing this meta-dataflow framework with a consensus-replicated durable log fulfills this other goal, as well as satisfying the original desideratum of having a consistent shared ordering.

We should rip out the meta-dataflow, and instead put all dataflow creation commands through a shared durable log (e.g. on top of ZooKeeper/etcd).

@frankmcsherry
Contributor

Btw, it is probably worth spec'ing out which parts of Materialize exist in which offerings. My sense is:

  1. There is a very reasonable, non-durable, non-autoscaling, non-clustermanaged thing that each user could try out on their laptop / in a browser / etc., which is a simple binary download without GBs of Apache cruft, and which gets them up and running in minutes.
  2. There is also a very reasonable durable, autoscaling, cluster managed thing that wraps around this that is worth paying for, and that serious people would agree that they need, even though the above did not crash during their demo runs.

That's one hypothetical partitioning of value, but with something like that in mind we can more clearly determine which features need to go where. At the moment interactive is the closest thing to (1.) above, but it isn't obvious (to me) that we want to add all of the features into it, both because they may make the initial experience more awkward, and because it may be "useful" to hold them back.

@rjnn
Contributor Author

rjnn commented Mar 13, 2019

Currently, @benesch is working on an intermediate API, 'metastore'. The intention is for metastore to be backed by ZooKeeper for (2). But it's equally possible to have a shim 'ZooKeeper' that's a linked list, which satisfies (1) with no additional Apache cruft.

The reason ZooKeeper is such an attractive option is that if the primary streaming data ingest layer is Kafka, then Kafka users typically have ZooKeeper running anyhow.
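As a rough sketch of what such a shim could look like (the actual metastore API is not shown in this thread, so the `Metastore` trait, its methods, and `InMemoryStore` below are all assumptions made for illustration), an append-only log behind a small trait would let the laptop build use an in-memory backend while a durable backend implements the same interface. A `Vec` stands in for the linked list here.

```rust
/// Hypothetical interface: an append-only log of dataflow creation commands.
trait Metastore {
    /// Append a command and return its position in the log.
    fn append(&mut self, command: String) -> u64;
    /// Read every (position, command) pair at or after `offset`.
    fn read_from(&self, offset: u64) -> Vec<(u64, String)>;
}

/// In-memory shim for the single-binary case. Nothing survives a restart.
#[derive(Default)]
struct InMemoryStore {
    log: Vec<String>,
}

impl Metastore for InMemoryStore {
    fn append(&mut self, command: String) -> u64 {
        self.log.push(command);
        (self.log.len() - 1) as u64
    }

    fn read_from(&self, offset: u64) -> Vec<(u64, String)> {
        self.log
            .iter()
            .enumerate()
            .skip(offset as usize)
            .map(|(i, c)| (i as u64, c.clone()))
            .collect()
    }
}

/// Replay the log from the start to recover the list of created views.
/// Written against the trait, so it works with any backend.
fn rebuild_views(store: &dyn Metastore) -> Vec<String> {
    store.read_from(0).into_iter().map(|(_, c)| c).collect()
}
```

A durable backend would implement the same two methods against its own storage, and `rebuild_views` would then double as the crash-recovery path.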

@benesch
Member

benesch commented Apr 1, 2019

Closing this out. I'm pretty happy with the end result, which uses ZooKeeper to store the metadata, but then the Chosen Node (worker node 0) pushes them through a timely sequencer for a consistent ordering.

@benesch benesch closed this as completed Apr 1, 2019
@benesch
Member

benesch commented Apr 1, 2019

For posterity: getting ZooKeeper to expose a consistent stream of events is literally impossible. It's not built for that. I think it might be possible with etcd, but even if it is, it's definitely awkward. If we want that, we'd need to bundle a Raft implementation, and that seems like serious overkill. The solution of using ZooKeeper for persistence and timely's sequencer for ordering actually seems like the best option.
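The split between persistence and ordering can be sketched roughly as follows. This is not timely's sequencer API: plain `std::sync::mpsc` channels stand in for it, and `sequence_and_broadcast`, `make_workers`, and `drain` are hypothetical helpers. The persisted commands are assumed to have already been read back from the metadata store by worker 0.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

type Sequenced = (u64, String);

/// Create `n` worker inboxes, returning the sending and receiving halves.
fn make_workers(n: usize) -> (Vec<Sender<Sequenced>>, Vec<Receiver<Sequenced>>) {
    (0..n).map(|_| channel()).unzip()
}

/// Worker 0's role in this sketch: stamp each persisted command with a
/// sequence number and send it to every worker in that same order.
fn sequence_and_broadcast(persisted: &[String], workers: &[Sender<Sequenced>]) {
    for (seq, command) in persisted.iter().enumerate() {
        for worker in workers {
            // A send fails only if that worker has shut down; ignored here.
            let _ = worker.send((seq as u64, command.clone()));
        }
    }
}

/// Collect everything currently waiting in a worker's inbox.
fn drain(inbox: &Receiver<Sequenced>) -> Vec<Sequenced> {
    inbox.try_iter().collect()
}
```

Because only one node assigns sequence numbers, every worker sees the commands in the same order, while durability stays the store's job.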
