Ruby interface to Hadoop's HDFS via Thrift

ganapati – Hadoop HDFS Thrift interface for Ruby

Note: LivingSocial has abandoned this project. Feel free to fork it if you wish to continue development. We have switched to a different tool for our projects.

Ganapati is a Ruby thrift lib for interfacing with Hadoop's distributed file system, HDFS. It also includes a few command line client utilities.

To install:

gem install ganapati

Starting thrift server

Documentation in Hadoop for the thrift interface to HDFS is crap. It can be found here.

As a much simpler and safer way of auto compiling and then starting the thrift interface, use the provided script:

bin/hdfs_thrift_server <port>

This will start a thrift server on the given port (after compiling the server code provided in the Hadoop distribution).
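Because the script compiles the server before starting it, the port may not accept connections right away. A minimal sketch for checking that something is listening before you create a client is shown here. It uses only Ruby's standard `socket` library; the helper name `thrift_port_open?` and its default timeout are assumptions made for illustration and are not part of ganapati.

```ruby
require 'socket'

# Hypothetical helper (not part of ganapati): returns true if something
# accepts TCP connections on host:port within `timeout` seconds.
def thrift_port_open?(host, port, timeout = 2)
  # Socket.tcp closes the socket automatically when given a block.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Refused, unreachable, unresolvable, or timed out.
  false
end
```

This only confirms that a TCP listener exists on the port; it does not verify that the listener is actually the HDFS thrift server.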

Basic Usage

require 'rubygems'
require 'ganapati'

# args are host, port, and optional timeout
client = Ganapati::Client.new 'localhost', 1234

# copy a file to hdfs
client.put("/some/file", "/some/hadoop/path")

# get a file from hadoop
client.get("/some/hadoop/path", "/local/path")

# Create a file
f = client.create("/home/someuser/afile.txt")
f.write("this is some text")
# Always, always close the file
f.close

# Create a file with code block
client.create("/home/someuser/afile.txt") { |f|
  f.write("this is some text")
}

# Open a file for reading and read it
client.open('/home/someuser/afile.txt') { |f|
  puts f.read
  # or read for specific length from start
  puts f.read(0, 4)
}

# read a file line by line
client.readlines('/home/someuser/afile.txt') { |line|
  puts line
}

# Open a file for appending and append to it
client.append('/home/someuser/afile.txt') { |f| 
  f.write "new data"
}

## Common file methods are available (chown, chmod, mkdir, stat, etc).  Examples:
# move a file
client.mv "/home/someuser/afile.txt", "/home/someuser/test.txt"

# remove a file
client.rm "/home/someuser/test.txt"

# test for file existence
client.exists? "/home/someuser/test.txt"

# get a list of all files
client.ls "/home"

client.close

# Quick and dirty way to print a remote file.  The run class method takes care of closing the client.
puts Ganapati::Client.run('localhost', 1234) { |c| c.open('/home/someuser/afile.txt') { |f| f.read } }
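The block forms above (for files and for the run class method) follow a common Ruby idiom: yield the resource, then close it in an `ensure` clause so cleanup happens even when the block raises. The sketch below illustrates that idiom with a hypothetical stand-in class, `SketchClient`, which only records whether it was closed. It is not ganapati's implementation and talks to no server.

```ruby
# Hypothetical stand-in for a client; it only tracks whether close was called.
class SketchClient
  attr_reader :closed

  def initialize(host, port)
    @host = host
    @port = port
    @closed = false
  end

  def close
    @closed = true
  end

  # Yield a fresh client to the block, return the block's value,
  # and close the client even if the block raises.
  def self.run(host, port)
    client = new(host, port)
    begin
      yield client
    ensure
      client.close
    end
  end
end

seen = nil
result = SketchClient.run('localhost', 1234) { |c| seen = c; :done }
puts result.inspect  # the block's value comes back from run
puts seen.closed     # the client was closed once the block finished
```

The same reasoning is why the block forms of create, append and reading are generally safer than calling close by hand: there is no code path on which the close can be forgotten.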

Command Line Utilities

There are a few utility programs included in the bin directory. hls provides a way to see the contents of HDFS (recursively and verbosely with appropriate command line options):

./bin/hls hdfs://host:port/tmp

hcp provides a way to copy to/from/between HDFS servers:

./bin/hcp hdfs://host:port/some/path/to/file ./file
./bin/hcp ./file hdfs://host:port/some/path/to/file
./bin/hcp hdfs://anotherhost:port/some/path/to/file hdfs://host:port/some/path/to/file
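The arguments above are either plain local paths or `hdfs://host:port/path` locations. As a rough sketch of how such an argument breaks down into the host, port and path a client needs, Ruby's standard `uri` library can do the splitting. The helper name `split_location` and the example host `namenode.example` are assumptions for illustration; this is not necessarily how hls or hcp parse their arguments.

```ruby
require 'uri'

# Hypothetical helper: split a command line location into its parts.
# Anything without an hdfs:// scheme is treated as a local path.
def split_location(location)
  uri = URI.parse(location)
  if uri.scheme == 'hdfs'
    { remote: true, host: uri.host, port: uri.port, path: uri.path }
  else
    { remote: false, path: location }
  end
end

p split_location('hdfs://namenode.example:9000/some/path/to/file')
p split_location('./file')
```

Note that the port must be numeric: passing the literal placeholder `hdfs://host:port/tmp` to `URI.parse` raises `URI::InvalidURIError`.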