Neo4j::Core Lucene

andreasronge edited this page Aug 16, 2012 · 5 revisions

Neo4j ships with the Lucene document database included.
A common use case for Lucene is to search for a single node and then, starting from that node, traverse the graph or run a Cypher query.

Define an Index

The Lucene integration uses the beforeCommit hook, see http://docs.neo4j.org/chunked/1.7/transactions-events.html. The method Neo4j::Node.trigger_on is used to tell neo4j.rb which nodes or relationships it will index.
The Neo4j::Node.index method is used to declare which properties are going to be indexed and which type of index should be used.

Example:

Neo4j::Node.trigger_on(:typex => 'MyTypeX')
Neo4j::Node.index :name
Neo4j::Node.index :description, :type => :fulltext
Neo4j::Node.index :age, :field_type => Fixnum

Notice: only properties that have been set are added to the index. Thus a property with no set value and no default value will NOT be matched by a wildcard query (*)!
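The effect of this rule can be pictured with a tiny in-memory index written in plain Ruby. This is only a sketch: it does not use neo4j.rb or Lucene, and index_node is a hypothetical helper invented for the illustration.

```ruby
# Hypothetical in-memory index (not neo4j.rb): maps "field:value" terms
# to the ids of the nodes that carry them.
def index_node(index, id, props, fields)
  fields.each do |field|
    value = props[field]
    next if value.nil?                  # unset or nil properties add no entry
    (index["#{field}:#{value}"] ||= []) << id
  end
  index
end

index = {}
index_node(index, 1, { :name => 'andreas', :age => 42 }, [:name, :age])
index_node(index, 2, { :name => 'kalle' },               [:name, :age])

# Emulates a wildcard query on age (age: *): node 2 has no age entry at all.
with_age = index.select { |term, _ids| term.start_with?('age:') }.values.flatten
# => [1]
```

Node 2 never reaches the index for age, so no query on that field, wildcard or otherwise, can return it.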

:trigger_on

The declaration above tells neo4j.rb that only nodes whose typex property has the value MyTypeX will be indexed. When the before-commit hook finds a node with that property, it indexes each property declared with the Neo4j::Node.index method. You can declare several trigger properties, and each property can take several values. This is used, for example, when the same property (_classname) can be triggered by different values (the name of each subclass); see for example the use of trigger_on in the Neo4j::NodeMixin implementation.
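The matching rule can be sketched in plain Ruby. The names TRIGGERS and triggered? are hypothetical and exist only in this illustration; the actual matching code inside neo4j.rb may look different.

```ruby
# Hypothetical illustration only, not the neo4j.rb implementation.
# Each trigger property maps to the list of values that switch indexing on.
TRIGGERS = {
  :typex      => ['MyTypeX'],
  :_classname => ['Person', 'Employee']
}

# A node (modelled here as a plain Hash of properties) is indexed when
# any trigger property holds one of its declared values.
def triggered?(props, triggers = TRIGGERS)
  triggers.any? { |key, values| values.include?(props[key]) }
end

triggered?({ :typex => 'MyTypeX', :name => 'andreas' })  # => true
triggered?({ :_classname => 'Employee' })                # => true
triggered?({ :typex => 'Other' })                        # => false
```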

:type

The default index type is exact. The example above declares indexes on the properties name and description with different index types (exact and fulltext).

:field_type

All values are indexed as Strings in Lucene by default. If you want to index a value as a number, set :field_type to Fixnum or Float. Doing so allows you to use Lucene range searches.
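Why the numeric type matters for range searches can be seen without Lucene at all. The following plain-Ruby sketch assumes only that String terms are compared character by character, as Ruby Strings are; the sample ages are made up.

```ruby
# Plain Ruby, no neo4j.rb involved: String ordering versus numeric ordering.
ages = [4, 9, 10, 25, 100]

# Compared as Strings, the values are ordered character by character.
as_strings = ages.map(&:to_s).sort
# => ["10", "100", "25", "4", "9"]

# A String range from "2" to "5" therefore picks up the wrong values ...
string_hits = as_strings.select { |s| s >= '2' && s <= '5' }
# => ["25", "4"]

# ... while a numeric range behaves as expected.
numeric_hits = ages.select { |n| (2..5).cover?(n) }
# => [4]
```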

Notice: if you are using Neo4j::NodeMixin or Neo4j::Rails::Model, you define an index using the property method instead.

Notice: You can't search with a String query on a field whose field_type is not String. Instead you must use a hash query (e.g. Neo4j::Node.find(:age => 4)).

Custom Index

Instead of using the Neo4j::Node.index or Neo4j::Relationship.index methods you can define your own class for indexing. This allows you to have different lucene index files and configurations.

Example:

class MyIndex
  extend Neo4j::Core::Index::ClassMethods
  include Neo4j::Core::Index

  self.node_indexer do
    index_names :exact => 'myindex_exact', :fulltext => 'myindex_fulltext'
    trigger_on :myindex => true # trigger on all nodes having property myindex == true
  end

  index :name
end

This class is also used when searching, e.g. MyIndex.find(:name => 'andreas').first

To index relationships, use the rel_indexer method instead of the node_indexer method.

See also neo4j-creating-custom-index blog

Search

Lucene Query Language

You can use the Lucene query language.

Let's say you have defined the index on the Neo4j::Node class like this:

Neo4j::Node.trigger_on(:typex => 'MyTypeX')
Neo4j::Node.index(:name)

And you have created a node like this:

a = Neo4j::Node.new(:name => 'andreas', :typex => 'MyTypeX')

Then you can find it using the lucene query syntax like this:

 Neo4j::Node.find('name: andreas') {|result| puts result.first[:name] }

 # or the same thing, if you want to close the lucene connection yourself
 result = Neo4j::Node.find('name: andreas')
 puts result.first[:name]
 result.close

Notice: It is important that you close the Lucene connection. Either use the block form or call the close method on the query result.
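The two calling styles follow a common Ruby resource pattern, sketched here in plain Ruby. FakeResult and find_with_block are hypothetical stand-ins written for this illustration; they only mirror the contract described above and say nothing about how neo4j.rb implements it.

```ruby
# Hypothetical stand-in for a query result that holds an open resource.
class FakeResult
  attr_reader :closed

  def initialize(items)
    @items  = items
    @closed = false
  end

  def first
    @items.first
  end

  def close
    @closed = true
  end
end

# With a block: yield the result, then always close it, even on errors.
# Without a block: hand the open result to the caller, who must close it.
def find_with_block(items)
  result = FakeResult.new(items)
  return result unless block_given?
  begin
    yield result
  ensure
    result.close
  end
end

find_with_block(['andreas']) { |r| puts r.first }  # closed automatically

result = find_with_block(['andreas'])              # still open here
puts result.first
result.close                                       # caller closes it
```

The ensure clause is the reason the block form is the safer default: the result is closed even if the block raises.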

Notice: The Neo4j::Rails::Model.find and Neo4j::Rails::Relationship.find methods close the Lucene connection automatically using Rack middleware.

Search in Property Arrays

A property in Neo4j can be an array of (primitive) values.
Lucene also supports searching in arrays of values.

Example:

# don't forget to declare an index on things: Neo4j::Node.index :things
Neo4j::Transaction.run do
  Neo4j::Node.new(:things => ['aaa', 'bbb', 'ccc'])
end

Neo4j::Node.find("things: bbb"){|r| puts r.first}

You can also search with an array of values. For example, to find a node whose :things property has the value "bbb" or "ccc":

Neo4j::Node.find(:things => ["bbb", "ccc"]){|r| puts r.first}

The search will result in an OR lucene search for that property.

Fulltext and Exact

By default, indexes are of type :exact, which is well suited to indexing keywords and similar values.
To index each word in a text you should use a fulltext index. The fulltext analyzer uses a whitespace tokenizer. Specify the type :fulltext (:exact is the default) both when you declare the index and when you call the find method.

Example:

MyIndex.index :name, :type => :fulltext
MyIndex.find('name: andreas', :type => :fulltext).first #=> andreas

Notice: You must specify the :type in the find method unless you are using an :exact index.
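The difference between the two index types can be pictured in plain Ruby. This sketch is not the Lucene implementation: exact_terms and fulltext_terms are hypothetical helpers, the sample text is made up, and any analysis step other than whitespace splitting (such as case folding) is ignored.

```ruby
# Plain-Ruby illustration, not the Lucene implementation.
# An exact index stores the whole value as a single term.
def exact_terms(value)
  [value]
end

# A fulltext index stores one term per whitespace-separated word.
def fulltext_terms(value)
  value.split(/\s+/)
end

description = 'andreas likes graph databases'

exact_terms(description).include?('graph')      # => false
fulltext_terms(description).include?('graph')   # => true
exact_terms(description).include?(description)  # => true
```

A single word therefore only matches in the fulltext index, while the exact index matches the complete value.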

Hash Queries

By passing a hash instead of a String to the find method you can search on several properties at once (a compound AND Lucene query).

MyIndex.find(:name => 'asd', :wheels => 8).first
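To make the AND and OR semantics concrete, here is a plain-Ruby sketch that turns a hash query into the textual Lucene query it corresponds to. The helpers lucene_term and to_lucene are hypothetical and not part of neo4j.rb, which may build its queries differently; the sketch only covers String-valued fields.

```ruby
# Hypothetical helpers, not part of neo4j.rb.
# Builds a single "field:value" term, quoting values that contain spaces.
def lucene_term(field, value)
  text = value.to_s
  text = "\"#{text}\"" if text =~ /\s/
  "#{field}:#{text}"
end

# Array values become an OR group; separate keys are joined with AND.
def to_lucene(query)
  clauses = query.map do |field, value|
    if value.is_a?(Array)
      '(' + value.map { |v| lucene_term(field, v) }.join(' OR ') + ')'
    else
      lucene_term(field, value)
    end
  end
  clauses.join(' AND ')
end

to_lucene({ :things => ['bbb', 'ccc'] })
# => "(things:bbb OR things:ccc)"
to_lucene({ :name => 'asd', :things => ['bbb', 'ccc'] })
# => "name:asd AND (things:bbb OR things:ccc)"
to_lucene({ :name => 'Kalle Kula' })
# => "name:\"Kalle Kula\""
```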

Compound Queries

You can make compound queries using the and, or, and not methods on the query result.

Example:

MyIndex.find(:name => 'asd').or(:wheels => 8).first.should == thing3

Range Search

If you want to do a numerical range search you must declare the index field_type as Fixnum or Float.
For Neo4j::Rails::Model and Neo4j::NodeMixin you do that with the :type config on the property instead of using the field_type option on the index method.

There are two ways of doing range search – using the between method or using the Ruby Range class.

Example, using between

MyIndex.find(:age).between(2, 5)

Example, using the Ruby Range class:

MyIndex.find(:name => 'thing').and(:wheels => (9..15)).should be_empty

Notice: Range queries on non-String fields (e.g. :field_type => Fixnum) are not possible using a String Lucene query. Instead you must use a hash query, as shown above.

Sorting

Use the asc or desc method on the query result, for example:

MyIndex.find('name: *@gmail.com').asc(:name).desc(:age)

Notice you must have an index on the sorted fields.
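The ordering requested by chaining asc(:name).desc(:age) can be illustrated on plain Ruby hashes. The sketch assumes that the sort keys apply in the order they are chained (name first, age as tie-breaker), which is an assumption and not confirmed here; the sample data is made up.

```ruby
# Plain Ruby, no neo4j.rb involved: ascending by name, then descending by age.
people = [
  { :name => 'bob@gmail.com',   :age => 30 },
  { :name => 'alice@gmail.com', :age => 25 },
  { :name => 'alice@gmail.com', :age => 40 }
]

# Negating the numeric key reverses its order inside the composite sort key.
sorted = people.sort_by { |person| [person[:name], -person[:age]] }

sorted.map { |person| [person[:name], person[:age]] }
# => [["alice@gmail.com", 40], ["alice@gmail.com", 25], ["bob@gmail.com", 30]]
```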

Manually Indexing

Instead of waiting for the transaction to finish, you can index a node or relationship manually.

Example (from RSpec):

new_node = Neo4j::Node.new
new_node[:name] = 'Kalle Kula'
new_node.add_index(:name)
new_node.rm_index(:name)
new_node[:name] = 'lala'
new_node.add_index(:name)
Neo4j::Node.find('name: lala').first.should == new_node
Neo4j::Node.find('name: "Kalle Kula"').first.should_not == new_node

Optimization

If you are looping through a lot of nodes you might get better performance by not loading the Ruby wrappers around the Java nodes.

MyIndex.find('name: andreas', :type => :fulltext, :wrapped => false)

When using the :wrapped => false parameter, the find method returns a Java org.neo4j.graphdb.index.IndexHits instance (which works like a Ruby Enumerable, so you can use the normal each, collect, etc. methods).

Lucene Configuration

It is possible to create your own Lucene configuration.
For an example, see the configuration for fulltext and exact indexing in Neo4j::Config[:lucene].
You can add your own Lucene indexing configuration to Neo4j::Config and use it with the index method.

Neo4j::Config[:lucene][:my_index_type] = ...
  
class Person
   index :name, :type => :my_index_type
end

I have not tested this :-)

Gotchas

Nil values will never be indexed!