A nicer Ruby interface for the Xapian full text indexer

Xapian Fu

XapianFu is a Ruby library for working with Xapian databases. It builds on the GPL-licensed Xapian Ruby bindings but provides an interface more in line with “The Ruby Way”(tm) and is considerably easier to use.

For example, you can work almost entirely with Hash objects - XapianFu will handle converting the Hash keys into Xapian term prefixes when indexing and when parsing queries.

It also handles storing and retrieving hash entries as Xapian::Document values. XapianFu basically gives you a persistent Hash with full text indexing (and ACID transactions).
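
As a rough illustration of what "converting Hash keys into term prefixes" means, here is a plain-Ruby sketch with no Xapian dependency. The prefixed_terms helper and the "X" plus upcased field name prefix scheme are assumptions made for this example; they are not taken from XapianFu's source.

```ruby
# Hypothetical helper: turn a Hash into a flat list of prefixed terms.
# The "X<FIELD>" prefix scheme is an assumption for illustration only.
def prefixed_terms(doc)
  doc.flat_map do |field, value|
    prefix = "X#{field.to_s.upcase}"
    # Split the value into lowercase words and tag each with the field prefix.
    value.to_s.downcase.scan(/\w+/).map { |word| "#{prefix}#{word}" }
  end
end

terms = prefixed_terms(:title => 'Cold Mountain', :year => 2004)
puts terms.inspect
```

A query such as "title:mountain" can then be matched against the prefixed term for that field instead of against every word in the document.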

Installation

sudo gem install xapian-fu

Documentation

XapianFu::XapianDb is the cornerstone of XapianFu. A XapianDb instance handles setting up a XapianFu::XapianDocumentsAccessor for reading documents from and writing documents to a Xapian database. It uses XapianFu::QueryParser to parse and set up queries.

XapianFu::XapianDoc represents a document retrieved from or to be added to a Xapian database.

Basic usage example

Create a database, add 3 documents to it and then search and retrieve them.

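require 'xapian-fu'
include XapianFu   # lets the examples refer to XapianDb without the XapianFu:: prefix

db = XapianDb.new(:dir => 'example.db', :create => true,
                  :store => [:title, :year])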
db << { :title => 'Brokeback Mountain', :year => 2005 }
db << { :title => 'Cold Mountain', :year => 2004 }
db << { :title => 'Yes Man', :year => 2008 }
db.flush
db.search("mountain").each do |match|
  puts match.values[:title]
end

Ordering of results

Create an in-memory database, add 3 documents to it and then search and retrieve them in year order.

db = XapianDb.new(:store => [:title], :sortable => [:year])
db << { :title => 'Brokeback Mountain', :year => 2005 }
db << { :title => 'Cold Mountain', :year => 2004 }
db << { :title => 'Yes Man', :year => 2008 }
db.search("mountain", :order => :year)

will_paginate support

Simple integration with the will_paginate Rails helpers.

@results = db.search("mountain", :page => 1, :per_page => 5)
will_paginate @results
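
The :page and :per_page options follow the usual pagination arithmetic. The sketch below shows that arithmetic on a plain Array; page_window is a hypothetical helper for illustration and is not part of XapianFu or will_paginate.

```ruby
# Hypothetical helper: return the slice of results that a given page covers.
def page_window(results, page, per_page)
  offset = (page - 1) * per_page
  # Array#[] returns nil when the offset is past the end, so fall back to [].
  results[offset, per_page] || []
end

all_results = (1..12).to_a
first_page  = page_window(all_results, 1, 5)
last_page   = page_window(all_results, 3, 5)
beyond_end  = page_window(all_results, 4, 5)
```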

Transactions support

Ensure that a group of documents is either added to the database in its entirety or not added at all: the transaction is aborted if an exception is raised inside the block. The documents only become available to searches at the end of the block, when the transaction is committed.

db = XapianDb.new(:store => [:title, :year], :sortable => [:year])
db.transaction do
  db << { :title => 'Brokeback Mountain', :year => 2005 }
  db << { :title => 'Cold Mountain', :year => 2004 }
  db << { :title => 'Yes Man', :year => 2008 }
end
db.search("mountain")
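
To make the all-or-nothing behaviour concrete, here is a toy in-memory stand-in written in plain Ruby. ToyStore is hypothetical and only mimics the semantics described above (buffer writes, commit at the end of the block, drop everything on an exception); it says nothing about how XapianFu or Xapian implement transactions.

```ruby
# Toy stand-in, not XapianFu: writes made inside a transaction are buffered
# and only become visible if the block finishes without raising.
class ToyStore
  attr_reader :docs

  def initialize
    @docs = []
    @pending = nil
  end

  # Inside a transaction, append to the buffer; otherwise append directly.
  def <<(doc)
    (@pending || @docs) << doc
    self
  end

  def transaction
    @pending = []
    yield
    @docs.concat(@pending) # commit: reached only if the block did not raise
  ensure
    @pending = nil         # on an exception the buffer is simply dropped
  end
end

store = ToyStore.new

begin
  store.transaction do
    store << { :title => 'Brokeback Mountain' }
    raise 'something failed'
  end
rescue RuntimeError
  # The transaction was aborted, so nothing was added.
end
count_after_abort = store.docs.size

store.transaction do
  store << { :title => 'Cold Mountain' }
  store << { :title => 'Yes Man' }
end
count_after_commit = store.docs.size
```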

Complete field definition examples

Fields can be described in more detail using a hash. For example, telling XapianFu that a particular field is a Date, Fixnum or Bignum allows very efficient on-disk storage and ensures the same type of object is instantiated when returning those stored values. For Fixnum and Bignum fields, it also lets you order search results without worrying about leading zeros.

db = XapianDb.new(:fields => { 
                               :title => { :store => true },
                               :released => { :type => Date, :store => true },
                               :votes => { :type => Fixnum, :store => true }
                             })
db << { :title => 'Brokeback Mountain', :released => Date.parse('13th January 2006'), :votes => 105302 }
db << { :title => 'Cold Mountain', :released => Date.parse('2nd January 2004'), :votes => 45895 }
db << { :title => 'Yes Man', :released => Date.parse('26th December 2008'), :votes => 44936 }
db.search("mountain", :order => :votes)
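
The "leading zeros" remark refers to a general property of string ordering, shown here in plain Ruby with the vote counts from the example above. This is only a sketch of the problem, not of XapianFu's storage format.

```ruby
votes = [105302, 45895, 44936]

# Compared as strings, "105302" sorts first because '1' < '4'.
as_strings = votes.map(&:to_s).sort

# Zero-padding to a fixed width is the usual workaround for string ordering.
padded = votes.map { |v| v.to_s.rjust(8, '0') }.sort

# Compared as integers, the order is numeric and no padding is needed.
as_integers = votes.sort
```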

Simple max value queries

Find the document with the highest :year value

db.documents.max(:year)

Search examples

Search on particular fields

db.search("title:mountain year:2005")

Boolean AND (default)

db.search("ruby AND rails")
db.search("ruby rails")

Boolean OR

db.search("rails OR sinatra")
db.search("rails sinatra", :default_op => :or)

Exclude certain terms

db.search("ruby -rails")

Wildcards

db.search("xap*")

Phrase searches

db.search("'someone dropped a steamer in the gene pool'")

And any combinations of the above:

db.search("(ruby OR sinatra) -rails xap*")

ActiveRecord Integration

XapianFu always stores the :id field, so you can easily use it with something like ActiveRecord to index database records:

db = XapianDb.new(:dir => 'posts.db', :create => true)
Post.all.each { |p| db << p.attributes }
docs = db.search("custard")
docs.each_with_index { |doc,i| docs[i] = Post.find(doc.id) }

Combine it with the max value search to do batch delta updates by primary key:

db = XapianDb.new(:dir => 'posts.db')
latest_doc = db.documents.max(:id)
new_posts = Post.find(:all, :conditions => ['id > ?', latest_doc.id])
new_posts.each { |p| db << p.attributes }
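
The delta update logic can be sketched with plain Arrays standing in for the index and the database table. The names indexed and records are hypothetical. The guard for an empty index is an extra precaution added in this sketch; what db.documents.max returns for an empty database is not stated here.

```ruby
# Plain-Ruby stand-ins: `indexed` plays the search index, `records` the table.
indexed = [{ :id => 1 }, { :id => 2 }]
records = [{ :id => 1 }, { :id => 2 }, { :id => 3 }, { :id => 4 }]

# Highest id already indexed (0 if nothing has been indexed yet).
latest_id = indexed.map { |doc| doc[:id] }.max || 0

# Only records with a higher id still need to be indexed.
new_records = records.select { |record| record[:id] > latest_id }
indexed.concat(new_records)

indexed_ids = indexed.map { |doc| doc[:id] }
```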

Or by :updated_at field if you prefer:

db = XapianDb.new(:dir => 'posts.db', :fields => { :updated_at => { :type => Time, :store => true } })
last_updated_doc = db.documents.max(:updated_at)
new_posts = Post.find(:all, :conditions => ['updated_at >= ?', last_updated_doc.updated_at])
new_posts.each { |p| db << p.attributes }

Deleted records won't show up in results but can eventually put your result pagination out of whack. So, you'll need to track deletions yourself, either with a deleted_at field, some kind of delete log or perhaps by reindexing once in a while.

db = XapianDb.new(:dir => 'posts.db')
deleted_posts = Post.find(:all, :conditions => 'deleted_at is not null')
deleted_posts.each do |post| 
  db.documents.delete(post.id)
  post.destroy
end

More Info

Author

John Leach (john@johnleach.co.uk)

Copyright

Copyright © 2009 John Leach

License

MIT (The Xapian library is GPL)

Mailing list

rubyforge.org/mailman/listinfo/xapian-fu-discuss

Web page

johnleach.co.uk/documents/xapian-fu

Github

github.com/johnl/xapian-fu/tree/master
