
High Availability Cluster

This page contains material copied from the neo4j wiki. It has been adapted for use with Neo4j.rb.


Introduction

This feature is only available in the neo4j-enterprise edition.
Please add a dependency on the neo4j-enterprise gem and require it (available in the upcoming 2.0.0 release).
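
For example, in a Gemfile (a sketch; the version constraint assumes the 2.0.0 release mentioned above):

# Gemfile
source 'https://rubygems.org'
gem 'neo4j-enterprise', '~> 2.0.0'

# in your application, before starting the database:
require 'neo4j-enterprise'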

The Neo4j High Availability (HA) project has the following two goals:

  1. Provide a fault-tolerant database architecture, where several Neo4j slave databases can be configured to be exact replicas of a single Neo4j master database. This keeps the end-user system fully functional, with both read and write access to the database, in the event of hardware failure.
  2. Provide a horizontally scaling read-mostly architecture that enables the system to handle much more read load than a single Neo4j database.

Neo4j HA uses a single master and multiple slaves. Both the master and the slaves can accept write requests. A slave handles a write by synchronizing with the master to preserve consistency. Updates to slaves are asynchronous so a write from one slave is not immediately visible on all other slaves. This is the only difference between HA and single node operation (all other ACID characteristics are the same).

Installation of ZooKeeper

The example/ha-cluster example contains
a complete configuration and setup for running ZooKeeper.

You can also set up ZooKeeper yourself using the following instructions:

Go to the Apache ZooKeeper download page, select a mirror, and grab the 3.3.2 release.

Unpack somewhere and create three config files called server1.cfg, server2.cfg and server3.cfg in the conf directory:

#server1.cfg
tickTime=2000
initLimit=10
syncLimit=5
 
dataDir=data/zookeeper1
clientPort=2181
 
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

The other two config files use a different dataDir and clientPort, but otherwise their parameters are identical to the first:

#server2.cfg
#...
dataDir=data/zookeeper2
clientPort=2182
#...
 
#server3.cfg
dataDir=data/zookeeper3
clientPort=2183

Create the data dirs:

zookeeper-3.3.2$ mkdir -p data/zookeeper1 data/zookeeper2 data/zookeeper3

Next we need to create a file called “myid” in each data directory, containing that server's id, equal to the number in “server.1”, “server.2”, and “server.3” from the configuration files.

zookeeper-3.3.2$ echo '1' > data/zookeeper1/myid
zookeeper-3.3.2$ echo '2' > data/zookeeper2/myid
zookeeper-3.3.2$ echo '3' > data/zookeeper3/myid

We are now ready to start the ZooKeeper instances:

zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server1.cfg &
zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server2.cfg &
zookeeper-3.3.2$ java -cp lib/log4j-1.2.15.jar:zookeeper-3.3.2.jar org.apache.zookeeper.server.quorum.QuorumPeerMain conf/server3.cfg &
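
A quick way to verify that all three instances are running is ZooKeeper's built-in “ruok” health command, which a healthy server answers with “imok”. A small Ruby check against the client ports configured above:

require 'socket'

[2181, 2182, 2183].each do |port|
  socket = TCPSocket.new('localhost', port)
  socket.write('ruok')            # ZooKeeper four-letter health command
  puts "#{port}: #{socket.read}"  # a healthy server replies "imok" and closes
  socket.close
end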

For more information, see the ZooKeeper documentation.

Configure Neo4j.rb

You must set Neo4j::Config['ha.db'] = true in order to start an HA clustered database (HighlyAvailableGraphDatabase) instead of a local graph database.

If the 'ha.db' configuration value is set to true, the following configuration properties are also used:

ha.db: true
ha.machine_id: 2
ha.server: 'localhost:6002'
ha.zoo_keeper_servers: 'localhost:2181,localhost:2182,localhost:2183'
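
In Ruby these can be set programmatically before the database starts, for example (the values follow the sample configuration above; each cluster member needs its own ha.machine_id and ha.server):

require 'neo4j-enterprise'

Neo4j::Config['ha.db'] = true
Neo4j::Config['ha.machine_id'] = 2                # unique per cluster member
Neo4j::Config['ha.server'] = 'localhost:6002'     # this member's HA endpoint
Neo4j::Config['ha.zoo_keeper_servers'] =
  'localhost:2181,localhost:2182,localhost:2183'  # the ZooKeeper ensemble above

Neo4j.start                                       # boots a HighlyAvailableGraphDatabase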

The default configuration can be found here

Chef and Vagrant Scripts

Check this cookbook

Gotchas

You should only write to slave nodes. A write through a slave is synchronized with the master (see the introduction above), so the transaction ends up on more than one machine.
You can check whether a node is a slave or the master with:

Neo4j.management(org.neo4j.management.HighAvailability).is_master
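
For example, a write path could check this flag before opening a transaction (a sketch; Neo4j::Transaction.run and Neo4j::Node.new are the usual Neo4j.rb primitives, and the guard logic is illustrative):

ha_info = Neo4j.management(org.neo4j.management.HighAvailability)

unless ha_info.is_master
  # this instance is a slave, so per the gotcha above it is safe to write here
  Neo4j::Transaction.run do
    Neo4j::Node.new(:name => 'written via a slave')
  end
end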

You can also get this information from jconsole, or from the neo4j-shell with the hainfo command; see the monitoring page.

For more information, check the neo4j wiki
