
Advice on building a fail-over solution on top of NFSdb (with two+ nodes) #29

Closed
vguhesan opened this issue Jan 12, 2015 · 7 comments
Labels
New feature Feature requests

Comments

@vguhesan

Hello:
I've been testing the replication options available in NFSdb.

  • I like the benchmark numbers I see on NFSdb
  • I like the fact that I can set up a client to replicate the data onto one or many nodes

In order to use NFSdb in a production environment, fail-over and data redundancy become a concern.

Are there any plans to implement fail-over, or is there sample code/pseudo-code you can share that would help me build a fail-over solution on top of NFSdb?

Scenario:

  • {Node1} has a server running NFSdb
  • {Node2} is running a client replicating the database
  • Node1 goes down, I need Node2 to switch from a client mode to a server mode.
  • More data is appended into Node2, and at some later point in time Node1 is started
  • Node1 needs to be aware that there is already a master running on Node2 and join it as a client (instead of as a server)

Is this possible using the (multicast) foundation in NFSdb?

Any advice on how this can be achieved?

Thank you and looking forward to your feedback.

Venkatt

@bluestreak01
Member

Hi Venkatt,

The bad news is that at present NFSdb can only run in-process, so its fail-over is that of the parent process. There is an on-going effort to have NFSdb run out-of-process, in which case it will have its own client with fail-over built in.

The good news is that getting a client to be a server at the same time can be done pretty easily. I'll write up an example very shortly.

Server recovery after fail over is relatively easy. Because updates are incremental it is possible to wrap a journal in a client instance and have it replicate from former client-now-server.
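The incremental catch-up described above can be sketched as follows. This is a hypothetical model, not the NFSdb API: the recovering former master replays only the transactions it missed from the former-client-now-server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of incremental catch-up after fail-over.
// None of these names come from the actual NFSdb API.
public class CatchUpSketch {
    // Updates are incremental: only entries past the local
    // high-water mark need to travel over the wire.
    static List<String> catchUp(List<String> serverLog, int lastAppliedTxn) {
        return new ArrayList<>(serverLog.subList(lastAppliedTxn, serverLog.size()));
    }

    public static void main(String[] args) {
        List<String> serverLog = List.of("txn1", "txn2", "txn3", "txn4");
        // Node 1 went down after txn2; node 2 (now the server) appended txn3 and txn4.
        System.out.println("to replay: " + catchUp(serverLog, 2));
    }
}
```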

Multicast is not supported for data, not yet anyway. This is partly because the NFSdb protocol allows each client to have a different state, and replication is tailored to the state of each client. But in business-as-usual operation over a dedicated network link, I guess multicast would have an advantage. Maybe one day :)

@bluestreak01
Member

Hi Venkatt,

I had a recap of replication and fail-over, and it isn't possible to fail over the writer automatically. A client can reconnect if the server goes down, but that is all that is automatic in the current version.

Making automated fail-over for your scenario is not difficult, and there is a plan to do it now! Here is the sample logic:

  1. Node 1 and Node 2 are identical; they both run "ClusterNode", which is both server and client.
  2. On startup, the nodes automatically decide who the master is; this will depend on startup order.
  3. "ClusterNode" will signal the application that it is a master and provide you with a way to get JournalWriter(s).
  4. On the client, "ClusterNode" will signal your code that it is in standby mode.
  5. Both server and client maintain a heartbeat, and once it is lost the "client" will become the server and notify your code of the state change.
  6. When the old server is restarted it will assume the role of client automatically, due to the other node being present.
  7. The client will automatically recover itself and start replicating from the server node.
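The lifecycle above could be sketched roughly like this. All names here are hypothetical (the real ClusterNode API had not been published at this point); the sketch only simulates the role transitions described in the steps.

```java
// Hypothetical sketch of the ClusterNode lifecycle described above.
// None of these names come from the actual NFSdb API.
public class ClusterNodeSketch {
    enum Role { MASTER, STANDBY }

    interface RoleListener { void onRoleChange(Role newRole); }

    private Role role;
    private final RoleListener listener;

    ClusterNodeSketch(boolean peerAlreadyMaster, RoleListener listener) {
        this.listener = listener;
        // Steps 2 and 6: startup order decides the role; if another node
        // is already master, this node joins as a standby client.
        this.role = peerAlreadyMaster ? Role.STANDBY : Role.MASTER;
        listener.onRoleChange(role);
    }

    // Step 5: heartbeat loss promotes a standby to master.
    void onHeartbeatLost() {
        if (role == Role.STANDBY) {
            role = Role.MASTER;
            listener.onRoleChange(role);
        }
    }

    Role role() { return role; }

    public static void main(String[] args) {
        ClusterNodeSketch node1 =
                new ClusterNodeSketch(false, r -> System.out.println("node1 is " + r));
        ClusterNodeSketch node2 =
                new ClusterNodeSketch(true, r -> System.out.println("node2 is " + r));
        // Node 1 dies; node 2 loses the heartbeat and takes over.
        node2.onHeartbeatLost();
        System.out.println("node2 is now " + node2.role());
    }
}
```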

Let me know if this works for you.

Vlad

@vguhesan
Author

Vlad:

I believe that the last model you have described with "ClusterNode" will work. Is "ClusterNode" a class that you will be adding in an upcoming release, or is this something I can develop with your guidance and/or examples? Please advise.

What I can do on my application side is programmatically determine whether the underlying instance is running in master or standby mode. If it is in standby mode, I can have my web application send an HTTP 302 redirect for the REST API to the master server, which will consume the POST data normally.

Question: in the example you described, is there a way I could get a list of all the other nodes participating in the group? For example, if I POST to the client, can it determine the master's IP and send a redirect URL with the correct master IP?
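The standby-side redirect could be sketched like this. The `isMaster`/`masterHost` inputs are assumptions here; in practice they would come from the cluster API being discussed.

```java
// Hypothetical sketch of the standby-side redirect: a standby node
// answers REST POSTs with a redirect pointing at the current master.
public class RedirectSketch {
    static String routePost(boolean isMaster, String masterHost, String path) {
        if (isMaster) {
            return "200 handle locally";
        }
        // Standby: tell the REST client to retry against the master.
        return "302 Location: http://" + masterHost + path;
    }

    public static void main(String[] args) {
        System.out.println(routePost(true, null, "/journal/append"));
        System.out.println(routePost(false, "10.0.0.2:8080", "/journal/append"));
    }
}
```

One caveat worth noting: many HTTP clients turn a POST that receives a 302 into a GET on the new location; HTTP 307 preserves the method and body, so it may be the safer status code for this use case.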
Please advise on how I can proceed forward with the "ClusterNode" implementation.
Thanks in advance.

Venkatt

bluestreak01 added a commit that referenced this issue Jan 16, 2015
@bluestreak01
Member

Implementing the cluster will require changes in both server and client code, so I'll do that. The changes are not very complex, so it won't be long.

It should be possible to announce the cluster winner to the other nodes. After voting for a master, all remaining nodes will have to connect their clients to it, and this information can be published to the app code.

I'll post more details on the usage model very soon; I need to prove that all the parts work first.

@bluestreak01
Member

Hi Venkatt,

I have an example of creating a cluster of producers for you: ClusteredProducerMain.java

Although it is for two producers, you can extend it to three or more as you need. An important thing to be aware of is that each cluster node must have a unique integer instance id. It is used in logging and also for tie-break voting in case two nodes start up at the same time.
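The tie-break by instance id might look something like this. Whether NFSdb favours the lower or the higher id is an assumption on my part; the sketch below picks the lower.

```java
// Hypothetical tie-break sketch: when two nodes start simultaneously,
// the unique integer instance id decides the winner. Lower-id-wins is
// an assumption here, not confirmed NFSdb behaviour.
public class TieBreakSketch {
    static int electMaster(int idA, int idB) {
        return Math.min(idA, idB);
    }

    public static void main(String[] args) {
        System.out.println("master: instance " + electMaster(1, 2));
    }
}
```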

As things stand, it is safe to have nodes started by either monitoring tools or schedulers; if they come up at the same time they will resolve their roles automatically.

The shutdown procedure is as graceful as possible and will wait for all in-flight network transmissions before cutting the wire. I will expose a timeout API though, in case waiting is not an option; in that case in-flight transactions may be lost.

There is more work needed to make readers fail over between cluster nodes and automatically error-correct. But that should not take long at all.

Let me know if you think the current API can be improved in some way, or if anything doesn't work for you.

Regards,
Vlad

@vguhesan
Author

Hi Vlad,

Thank you very much for devising this solution. I will try this out either tonight or in the next few days and get back to you.

Best Regards,
Venkatt Guhesan


@bluestreak01
Member

This feature is complete; let's open another issue should we discover defects with it.

bluestreak01 added a commit that referenced this issue Mar 25, 2015
…d Chang and Roberts algorithm (http://en.wikipedia.org/wiki/Chang_and_Roberts_algorithm). Modifications include:

- dead node detection
- acks
- hop counting to prevent infinite loops
- leader reassertion to prevent the current leader from being demoted.
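For reference, the basic Chang and Roberts ring election that the commit builds on can be sketched as below. The NFSdb modifications listed above (dead-node detection, acks, hop counting, leader reassertion) are not reproduced; this is only the textbook core, with hypothetical names.

```java
import java.util.List;

// Sketch of the basic Chang-Roberts ring election: an id token
// circulates around a unidirectional ring, each node forwards the
// larger of the token and its own id, and a node that sees its own
// id come back knows it is the leader.
public class ChangRoberts {
    static int elect(List<Integer> ring) {
        int n = ring.size();
        int token = ring.get(0); // node 0 starts the election
        // Two laps are enough for the max id to return to its owner.
        for (int hop = 1; hop <= 2 * n; hop++) {
            int node = ring.get(hop % n);
            if (token == node) {
                return node; // own id returned: this node is the leader
            }
            token = Math.max(token, node);
        }
        throw new IllegalStateException("no leader elected");
    }

    public static void main(String[] args) {
        // Ring of four nodes with distinct ids; the max id should win.
        System.out.println("elected: " + elect(List.of(3, 7, 2, 5)));
    }
}
```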