The clustering megathread #1911
A lot of these suggestions involve adding features to the cluster administration code. We should consider making it possible for external programs to bypass/replace the C++ cluster administration code. I made a separate issue to keep the discussion organized. See #1913.
So I went through, identified the biggest problems, prioritized them, and put together a rough outline of a proposal in six phases:
Phase 1: Directory and Blueprint optimization
Phase 2: Auto-Failover
Process to elect a new master:
On peer reconnect
Phase 3: ReQL Cluster API
Phase 4: Minimal downtime on blueprint change
Phase 5: Blueprint generation without full connectivity
Phase 6: More Directory and Blueprint optimization
As you can see, there are a number of open questions. The biggest are:
Phases 1 and 6 are optimizations, but Phase 1 comes first because it should be relatively easy and addresses one of the biggest bottlenecks in the current system. Phases 2 and 3 are necessary for a production-ready product. I wouldn't say Phases 4 and 5 are necessary for production-readiness, but I would strongly recommend them.
@Tryneus This sounds like a great plan. Depending on how difficult Phase 5 is, we should probably do it earlier because it impacts users a lot.
Phase 1 sounds like a really good optimization. When removing "nothing" from the directory, there is a small complication to watch out for. In the current system, a server that is switching between two roles temporarily has no directory entry. For example, when it switches from "primary" to "secondary", it first removes the "primary" entry and then adds the "secondary" entry, so it briefly has no entry at all. It's important to distinguish this case from a true "nothing". This should be easy to fix by inserting a placeholder during the switch-over, or perhaps by leaving the old entry in place until the new entry is ready.
Your description of Phase 2 is not very clear, and a lot of details are missing. Would you mind writing it up in more detail? Perhaps it should be a separate issue, to keep the discussion organized.
You propose moving the web UI into a separate process (presumably Python) in Phase 3. I suggest you consider moving the suggester and auto-failover logic into that separate process as well. Then, if you do Phase 3 before Phase 2, the consensus system could be integrated in Python rather than C++, which might be nicer.
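The Phase 1 pitfall described in this comment (a role switch briefly looking like "nothing" once explicit "nothing" entries are dropped) can be illustrated with a minimal single-process sketch. It assumes a toy directory that maps region names to roles and encodes "nothing" as a missing key; `Directory`, `Role`, `publish`, `lookup`, and the `switch_*` methods are hypothetical names for illustration, not RethinkDB's actual (C++) directory code, and the sketch ignores the asynchronous propagation of directory updates between peers.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    # Hypothetical placeholder published while a server moves between roles,
    # so observers can tell "in transition" apart from "nothing".
    SWITCHING = "switching"


class Directory:
    """Toy per-server directory; a missing key means the role "nothing"."""

    def __init__(self) -> None:
        self._entries: Dict[str, Role] = {}

    def publish(self, region: str, role: Role) -> None:
        self._entries[region] = role

    def lookup(self, region: str) -> Optional[Role]:
        # None plays the part of "nothing" once explicit entries are dropped.
        return self._entries.get(region)

    def switch_naive(self, region: str, new_role: Role,
                     setup: Callable[[], None]) -> None:
        # Remove the old entry first and add the new one after setup.
        # While setup() runs, observers see no entry: a false "nothing".
        self._entries.pop(region, None)
        setup()
        self._entries[region] = new_role

    def switch_with_placeholder(self, region: str, new_role: Role,
                                setup: Callable[[], None]) -> None:
        # Fix 1: publish a placeholder for the duration of the switch-over.
        self._entries[region] = Role.SWITCHING
        setup()
        self._entries[region] = new_role

    def switch_keep_old(self, region: str, new_role: Role,
                        setup: Callable[[], None]) -> None:
        # Fix 2: leave the old entry in place until the new role is ready.
        setup()
        self._entries[region] = new_role


# Record what an observer sees in the middle of each kind of switch.
d = Directory()
d.publish("shard-1", Role.PRIMARY)
seen: List[Optional[Role]] = []


def observe() -> None:
    seen.append(d.lookup("shard-1"))


d.switch_naive("shard-1", Role.SECONDARY, observe)
d.switch_with_placeholder("shard-1", Role.PRIMARY, observe)
d.switch_keep_old("shard-1", Role.SECONDARY, observe)
print(seen)  # [None, <Role.SWITCHING: 'switching'>, <Role.PRIMARY: 'primary'>]
```

Only the naive switch exposes a window that is indistinguishable from a true "nothing". Of the two fixes, keeping the old entry avoids publishing an extra state, but observers may briefly act on a stale role; the placeholder makes the transition explicit at the cost of one extra directory update.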
I don't think @Tryneus proposed that. I think he meant implementing a ReQL API to conveniently manage the cluster via client drivers, and then switching the WebUI and the Admin UI to use this API instead of writing to semilattices directly. That doesn't require moving the WebUI out.
Oh! I thought he was proposing to partially implement the idea from #1913. Oops.
I think #1913 is a good idea, and it could even fit into one of the phases I proposed above, but it is also orthogonal to a lot of these features. It would be really nice if we could use some higher-level constructs for dealing with the goals, blueprints, and even consensus (the libraries available for C/C++ are pretty sparse), but at the same time, it wouldn't necessarily affect the user experience for some time.
I agree. The advantage of doing it soon is that the longer we wait, the more code we have to write in C++ and later port to Python. The disadvantage is that it will take a lot of time, and there is no direct payoff in terms of user experience or making any particular issue easier to solve. So the correct answer depends on the development schedule, which I don't know very much about. I just wanted to suggest that you keep it in mind. 😃
I'm extremely skeptical of outside Paxos libraries. libpaxos3 looks like it was written by some dude (http://atelier.inf.usi.ch/~sciascid/), has like 4 simple tests (https://bitbucket.org/sciascid/libpaxos/src/20414d195443e9fe82973f0a0be8c5a3bd24e954/unit/?at=master), doesn't appear to have a bug tracker, had a super-low-traffic mailing list (http://sourceforge.net/mailarchive/forum.php?forum_name=libpaxos-general), and has bug reports on that mailing list that don't all seem to be resolved.
If we are forced to choose an external library, we should also look at https://github.com/logcabin/logcabin. It claims it isn't ready for production use, which I honestly take as a positive signal in this case, and it's written by Diego Ongaro, who's one of the authors of the Raft paper: https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
@mlucy I would also vote for LogCabin. It's the reference implementation of Raft.
Also note that #2083 is part of this.
I talked a little with @timmaxw last Monday and asked him a few questions. I don't remember all of them, but here are a few (feel free to open an issue if they are relevant):
I'm pretty sure there was other stuff, but I somehow can't remember it. I'll add another comment if something comes up.
The ideas in here have pretty much been translated into the ReQL admin interface that we shipped with 1.16 and the Raft rework that @timmaxw has been working on for the past few months. There are some separate issues (query routing, hash sharding, etc.) that are already tracked elsewhere. I think this thread has outlived its usefulness.
Our clustering implementation has a lot of limitations, both in performance/scalability and in operations. We have lots of open issues about various problems. Many of them are related at the product level, many are related at the code/architecture level, and many are unrelated. We'll have to do significant reworking, and I'd like to start a discussion about the overall refactor/rearchitecture here.
Here are the issues we'll need to fix:
EDIT:
EDIT2:
We could make this a bazaar or a cathedral or anything in between. Pinging @Tryneus and @timmaxw. I'd like your thoughts and a proposal on how to go about this.
EDIT3: