Rhino is a Ruby object-relational mapping (ORM) for HBase[http://www.hbase.org].
Author: Quinn Slack qslack@qslack.com[mailto:qslack@qslack.com]
Contributors: Dru Jensen
git clone git://github.com/sqs/rhino.git
Since Rhino uses the HBase Thrift API, you must first install both HBase and Thrift. Downloading the latest trunk revisions of each is recommended, but if you encounter problems, try using the latest stable release instead. Here are the basic steps for installing both.
svn co http://svn.apache.org/repos/asf/hadoop/hbase/trunk hbase-core-trunk cd hbase-core-trunk ant
{More installation instructions}[http://wiki.apache.org/hadoop/Hbase/10Minutes] are available on the HBase Wiki.
Thrift[http://developers.facebook.com/thrift/] also requires the Boost C++ libraries; you'll have to get those on your own if your system does not have them.
svn co http://svn.facebook.com/svnroot/thrift/trunk/ thrift-trunk cd thrift-trunk ./bootstrap.sh ./configure && make && sudo make install cd lib/rb sudo ruby setup.rb
Once you have installed HBase and Thrift, start HBase, then start the Thrift server. From the root HBase directory, run these commands:
bin/start-hbase.sh bin/hbase thrift start
Both servers need to be running to use Rhino. Occasionally, the Thrift server will be unable to connect to HBase. In that case, stop the Thrift server (ctrl-C), stop the HBase server (bin/stop-hbase.sh), and then rerun the above commands to restart both. To verify that HBase is running, try running the HBase shell (bin/hbase shell).
Since Rhino is not yet packaged as a gem, you will have to require 'PATH_TO_RHINO/lib/rhino.rb' in your scripts.
The following code points Rhino to the Thrift server you just started (which by default listens on localhost:9090).
Rhino::Model.connect('localhost', 9090)
A class definition like:
class Page < Rhino::Model include Rhino::Constraints
column_family :title
column_family :contents
column_family :links
column_family :meta
column_family :images
alias_attribute :author, 'meta:author'
has_many :links, Link
has_many :images, Image
constraint(:title_required) { |page| page.title and !page.title.empty? }
end
...is mapped to the following HBase table (described in {HBase Query Language}[http://wiki.apache.org/lucene-hadoop/HBase/HBaseShell]
CREATE TABLE pages(title:, contents:, links:, meta:, images:);
Or, in version 0.2's JRuby shell language:
create 'pages', 'title', 'contents', 'links', 'meta', 'images'
page # Page.get('some-page') all_pages # Page.get_all()
page # Page.new('the-row-key', {:title#>"my title"})
page.contents # "
welcome
" page.savepage # Page.get('some-key') puts "the old title is: #{page.title}" page.title # "another title" page.save puts "the new title is: #{page.title}"
You can also read from and write to specific columns in a column family. Since we already defined the meta: column family, Rhino knows we want to set the meta:author column:
page # Page.get('some-key') page.meta_author # "John Doe" page.save puts "the author is: #{page.meta_author}"
In the model definition above, we stated that a Page has_many :links and has_many :images. We can define what a Link and an Image is with greater detail now.
class Link < Rhino::Cell belongs_to :page
def url
url_parts # key.split('/')
backwards_host # url_parts.shift
path # url_parts.join('/')
host # backwards_host.split('.').reverse.join('.')
"http://#{host}/#{path}"
end
end
class Image < Rhino::Cell belongs_to :page end
Now that we've defined Link and Image, we can work with them easily. The following code adds a link to the page com.example, which when written to HBase becomes a cell in the links: column family named links:com.google with the contents search engine.
page # Page.get('com.example') page.links.create('com.google', 'search engine')
You can also iterate over the collection of links. In this example, we use Link#url, a method we defined on the Link class to convert from the common com.example/path URL storage style to example.com/path.
page.links.each do |link| puts "Link to #{link.url} with text: '#{link.contents}'" end
You can also get a specific link.
google_link_text # page.get('com.google').contents
First, let's create some Pages with different timestamps.
a_week_ago # Time.now - 7 * 24 * 3600 a_month_ago # Time.now - 30 * 24 * 3600
newer_page # Page.create('google.com', {:title#>'newer google'}, {:timestamp#>a_week_ago}) older_page # Page.create('google.com', {:title#>'older google'}, {:timestamp#>a_month_ago})
Now you can get by the timestamps you just set.
Page.get('google.com', :timestamp#>a_week_ago).title # #> "newer google" Page.get('google.com', :timestamp#>a_month_ago).title # #> "older google"
If you call get with no arguments, you will get the most recent Page.
Page.get('google.com').title # #> "newer google"
Read the specs in the spec/ directory to see more usage examples. Also look at the spec models in spec/spec_helper.rb.