Skip to content
Andreas Ronge edited this page Feb 26, 2012 · 2 revisions

Batch Insert

endprologue.

Introduction

Neo4j has a batch insert mode that drops support for transactions and concurrency in favor of insertion speed. This is useful when you have a big dataset that needs to be loaded once.
In our experience, the batch inserter will typically inject data around five times faster than running in normal transactional mode.

TIP: Be aware that the BatchInserter is intended use is for initial import of data. It is non thread safe, non transactional and failure to successfully invoke shutdown (properly) results in corrupt database files.

Nodes/Properties and Relationships

The Neo4j::Batch::Inserter has a simple
API for creating nodes,properties,relationships and lucene indexes.

Example

inserter = Neo4j::Batch::Inserter.new(storage, config)
node_a = inserter.create_node('name' => 'andreas')
node_c = inserter.create_node('name' => 'craig')'
inserter.create_rel(:friends, node_a, node_c, :since => '2009')

node_a and node_b are simply Fixnum objects – the node id.
The property hash keys should be strings.

NodeMixin/RelationshipMixin and Model

The Inserter#create_node takes an extra parameter, the class you want to index. This can be both a relationship or a node class.

Example:

class Person
  include Neo4j::NodeMixin
end

inserter = Neo4j::Batch::Inserter.new # use the default Neo4j::Config and storage location
node_a = inserter.create_node({'name' => 'andreas'}, Person)
node_c = inserter.create_node({'name' => 'craig', Person)

This will add the Neo4j.rb internal property _classname which is needed to map nodes to Ruby classes.

Creating relationships using the RelationshipMixin works in a similar way.

my_relationship = inserter.create_rel(rel_type, from_node, to_node, props_hash, MyRelationshipClass)

Relationships, has_n

class Person
  include Neo4j::NodeMixin
  has_n(:friends).to(Person)
end

inserter = Neo4j::Batch::Inserter.new
node_a = inserter.create_node({'name' => 'andreas'}, Person)
node_c = inserter.create_node({'name' => 'craig'}, Person)'

inserter.create_rel(Person.friends, node_a, node_b, :since => '2009')

This create a relationship of type ‘Person#friend’ from node_a to node_b.
Notice the Person.friends class method was generated because of the has_n(:friends).to(Person) above.
Using a declared has_n(x).to(y) relationship will add a prefix on the relationships (‘Person#friends’)

Index, Neo4j::Node

The Neo4j::Batch::Inserter does automatically
create lucene index if you have declared them.

Example:

# declare an (exact by default) index on property name on Neo4j::Node (the java Neo4j node objects)
Neo4j::Node.index :name

#  You can now add lucene index using the batch inserter, example:
inserter = Neo4j::Batch::Inserter.new(storage, config)
# create_node will now index property name
node_a = inserter.create_node('name' => 'andreas')

Index on NodeMixin or Model

Adding an index on your Neo4j::Rails::Model or your Neo4j::NodeMixin class works
in a similar manner to indexing a Neo4j::Node

Example:

class Person
  include Neo4j::NodeMixin
  index :desc => :fulltext
end

inserter = Neo4j::Batch::Inserter.new(storage, config)
 # the next line will add a lucene index on field desc
node_a = inserter.create_node(:desc => 'bla bla', Person)

Using Index

After the #index_flush has been call one can use the index to find nodes.
There are two methods for searching, index_get use a simply key value,
and index_query uses the full lucene syntax.

Example:

inserter = Neo4j::Batch::Inserter.new
inserter.index_flush
node_a = inserter.index_get('name', 'andreas').first
node_b = inserter.index_query('name: craig').first

Shutdown

It is important to call the Neo4j::Batch::Inserter#shutdown after using the inserter.
Failing to invoke the shutdown method may corrupt the store !

Example

inserter = Neo4j::Batch::Inserter.new

# lots of operation using the inserter
inserter.shutdown

The shutdown method will also shutdown all index inserters.

Clone this wiki locally