Ruby interface to Hadoop's HDFS via Thrift
Latest commit c583dfb Nov 7, 2013 @p5k6 p5k6 Update README.rdoc

ganapati – Hadoop HDFS Thrift interface for Ruby

Note: LivingSocial has abandoned this project. Feel free to fork it if you wish to continue development. We have switched to github.com/kzk/webhdfs for our projects.

Ganapati is a Ruby Thrift library for interfacing with HDFS, Hadoop's distributed file system. It also includes a few command line client utilities.

To install:

gem install ganapati

Starting thrift server

Documentation in Hadoop for the thrift interface to HDFS is crap. It can be found here.

A much simpler and safer way to compile and start the thrift interface is to use the provided script:

bin/hdfs_thrift_server <port>

This will start a thrift server on the given port (after compiling the server code provided in the Hadoop distribution).

Basic Usage

require 'rubygems'
require 'ganapati'

# args are host, port, and optional timeout
client = Ganapati::Client.new 'localhost', 1234

# copy a file to hdfs
client.put("/some/file", "/some/hadoop/path")

# get a file from hadoop
client.get("/some/hadoop/path", "/local/path")

# Create a file
f = client.create("/home/someuser/afile.txt")
f.write("this is some text")
# Always, always close the file
f.close 

# Create a file with code block
client.create("/home/someuser/afile.txt") { |f|
  f.write("this is some text")
}

# Open a file for reading and read it
client.open('/home/someuser/afile.txt') { |f| 
  puts f.read 
  # or read for specific length from start
  puts f.read(0, 4)
}

# read a file line by line
client.readlines('/home/someuser/afile.txt') { |line|
  puts line
}

# Open a file for appending and append to it
client.append('/home/someuser/afile.txt') { |f| 
  f.write "new data" 
}	  

## Common file methods are available (chown, chmod, mkdir, stat, etc).  Examples:
# move a file
client.mv "/home/someuser/afile.txt", "/home/someuser/test.txt"

# remove a file
client.rm "/home/someuser/test.txt"

# test for file existence
client.exists? "/home/someuser/test.txt"

# get a list of all files
client.ls "/home"

client.close

# Quick and dirty way to print remote file.  The run class method takes care of closing the client.
puts Ganapati::Client.run('localhost', 1234) { |c| c.open('/home/someuser/afile.txt') { |f| f.read } }
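
The comments above stress closing files, and the block forms appear to take care of that. When the non-block form of create is used, a begin/ensure wrapper gives a similar guarantee. This is a minimal sketch, assuming only that the client responds to create and that the returned handle responds to write and close, as in the examples above. The name write_and_close is a hypothetical helper, not part of ganapati:

```ruby
# Hypothetical helper (not part of ganapati): writes text to a new file and
# closes the handle even if the write raises.
def write_and_close(client, path, text)
  file = client.create(path)
  begin
    file.write(text)
  ensure
    # Runs whether the write succeeded or raised.
    file.close
  end
end
```

With a connected client it could be called as write_and_close(client, "/home/someuser/afile.txt", "this is some text").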

Command Line Utilities

There are a few utility programs included in the bin directory. hls provides a way to see the contents of HDFS (recursively and verbosely with appropriate command line options):

./bin/hls hdfs://host:port/tmp

hcp provides a way to copy to/from/between HDFS servers:

./bin/hcp hdfs://host:port/some/path/to/file ./file
./bin/hcp ./file hdfs://host:port/some/path/to/file
./bin/hcp hdfs://anotherhost:port/some/path/to/file hdfs://host:port/some/path/to/file
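
The utilities above take locations in the form hdfs://host:port/path. How they parse these internally is not shown here, but the same form can be split into the host, port, and path that Ganapati::Client.new and the client methods expect. A rough sketch using Ruby's standard URI library follows; split_hdfs_url is a hypothetical name, and localhost:1234 is only a placeholder taken from the usage examples:

```ruby
require 'uri'

# Hypothetical helper (not part of ganapati): splits an hdfs:// URL into
# [host, port, path]. Raises ArgumentError for anything else, such as a
# local path like ./file.
def split_hdfs_url(url)
  uri = URI.parse(url)
  raise ArgumentError, "not an hdfs:// URL: #{url}" unless uri.scheme == 'hdfs'
  [uri.host, uri.port, uri.path]
end

host, port, path = split_hdfs_url('hdfs://localhost:1234/tmp')
# host and port could then be passed to Ganapati::Client.new,
# and path to a call such as client.ls.
```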