Advise on building a fail-over solution on top of NFSdb (with two+ nodes) #29
Comments
Hi Venkatt,

The bad news is that at present NFSdb can only run in-process, so its failover is that of the parent process's failover mechanism. There is an ongoing effort to have NFSdb run out-of-process, in which case it will have its own client with failover built in. The good news is that getting a client to act as a server at the same time can be done pretty easily. I'll write up an example very shortly.

Server recovery after failover is relatively easy. Because updates are incremental, it is possible to wrap a journal in a client instance and have it replicate from the former client, now server.

Multicast is not supported for data, not yet anyway. This is partly because the NFSdb protocol allows each client to have a different state, and replication is tailored to the state of each client. But in BAU over a dedicated network link I guess multicast would have an advantage. Maybe one day :)
Hi Venkatt, I had a recap on replication and failover, and it isn't possible to fail over the writer automatically. The client can reconnect if the server goes down, but that is all that is automatic in the current version. Making automated failover for your scenario is not difficult, and there is a plan to do it now! Here is sample logic:
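(The sample logic referred to above is not preserved in this thread. As a rough illustration of the failover decision being described, here is a minimal sketch; `FailoverSketch` and `pickNewMaster` are invented names, and the "lowest surviving instance id wins" rule is an assumption, not NFSdb's actual code.)

```java
import java.util.Set;

// Hypothetical sketch: when the master is lost, every surviving node
// applies the same deterministic rule, so all nodes agree on the new
// master without extra messaging. Assumes unique integer instance ids.
public final class FailoverSketch {

    // Assumed rule (not NFSdb API): the live node with the lowest
    // instance id, excluding the failed master, becomes the new master.
    public static int pickNewMaster(Set<Integer> liveInstanceIds, int failedMaster) {
        int best = Integer.MAX_VALUE;
        for (int id : liveInstanceIds) {
            if (id != failedMaster && id < best) {
                best = id;
            }
        }
        if (best == Integer.MAX_VALUE) {
            throw new IllegalStateException("no surviving nodes");
        }
        return best;
    }
}
```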
Let me know if this works for you. Vlad
Vlad: I believe that the last model you described with "ClusterNode" will work. So is "ClusterNode" code/a class that you will be adding in an upcoming release, or is this something I can develop with your guidance and/or examples? Please advise. Venkatt
Implementing the cluster will require changes in both server and client code, so I'll do that. The changes are not very complex, so it won't take long. It should be possible to announce the cluster winner to the other nodes. After voting for a master, all remaining nodes will have to connect their clients there, and this information can be published to the app code. I'll post more details on the usage model very soon; I need to prove that all the parts work first.
Hi Venkatt,

I have an example of creating a cluster of producers for you: ClusteredProducerMain.java. Although it is for two producers, you can extend it to three or more as you need. An important thing to be aware of is that each cluster node must have its own unique integer instance id. It is used in logging and also for tie-break voting in case two nodes start up at the same time. As things stand, it is safe to have nodes started by either monitoring tools or schedulers; if they come up at the same time, they will resolve their roles automatically.

The shutdown procedure is as graceful as possible and will wait for all in-flight network transmissions before cutting the wire. I will expose a timeout API, though, in case waiting is not an option. In that case, in-flight transactions may be lost.

There is more work needed to make readers fail over between cluster nodes and automatically error-correct. But that should not take long at all.

Let me know if you think the current API can be improved in some way or if anything doesn't work for you.

Regards,
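(The role-resolution behaviour described above, with unique instance ids used as the tie-break, might be sketched roughly as follows. `ClusteredProducerSketch`, `Role`, and `resolveRole` are invented names for illustration, and "lower id wins the tie-break" is an assumption; the real entry point is ClusteredProducerMain.java.)

```java
// Hypothetical sketch of start-up role resolution for a two-node
// producer cluster, per the behaviour described in the comment above.
public final class ClusteredProducerSketch {

    enum Role { MASTER, SLAVE }

    // Each node is configured with a unique integer instance id, used
    // for logging and for tie-break voting on simultaneous start-up.
    // Assumption: the lower instance id wins the tie-break.
    static Role resolveRole(int myInstanceId, int peerInstanceId, boolean peerIsUp) {
        if (!peerIsUp) {
            return Role.MASTER;  // no live peer: take the writer role
        }
        return myInstanceId < peerInstanceId ? Role.MASTER : Role.SLAVE;
    }
}
```

With this rule, both nodes reach the same conclusion independently, so a scheduler can start them in any order, or at the same time.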
Hi Vlad, Thank you very much for devising this solution. I will try this out. Best Regards,

On Thu, Jan 22, 2015 at 12:07 PM, Vlad Ilyushchenko <
This feature is complete; let's open another issue should we discover defects with it.
…d Chang and Roberts algorithm (http://en.wikipedia.org/wiki/Chang_and_Roberts_algorithm). Modifications include:
- dead node detection
- acks
- hop counting to prevent infinite loops
- leader reassertion to prevent the current leader from being demoted
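(For reference, the basic Chang and Roberts ring election that the commit above builds on can be illustrated with a small simulation. This is a textbook sketch only: it omits NFSdb's listed modifications such as dead node detection, acks, hop counting, and leader reassertion, and it assumes unique node ids.)

```java
// Minimal simulation of the basic Chang and Roberts ring election.
// Nodes sit in a ring; an election message carries a candidate id
// clockwise. Each node swallows smaller ids and substitutes its own,
// so the surviving id is the ring maximum, which becomes the leader.
public final class ChangRoberts {

    // ringIds: node ids in clockwise ring order (assumed unique).
    // Returns the elected leader's id (the maximum id in the ring).
    public static int electLeader(int[] ringIds) {
        int pos = 0;
        int candidate = ringIds[pos];  // node 0 starts the election
        while (true) {
            pos = (pos + 1) % ringIds.length;
            if (ringIds[pos] == candidate) {
                return candidate;       // message returned to its originator
            }
            if (ringIds[pos] > candidate) {
                candidate = ringIds[pos]; // this node takes over the election
            }
            // else: forward the message unchanged
        }
    }
}
```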
Hello:
I've been testing the replication options available in NFSdb.
In order to use NFSdb in a production environment, fail-over and data redundancy become a question.
Are there any plans to implement fail-over, or is there sample code/pseudo-code that can be shared to help me build a fail-over solution on top of NFSdb?
Scenario:
Is this possible using the (multicast) foundation in NFSdb?
Any advice on how this can be achieved?
Thank you and looking forward to your feedback.
Venkatt