Getting Started

Matt edited this page Jan 11, 2015 · 13 revisions

Bootstrapping Local Development with Jading

Prerequisites:

  • Java
  • Ant
  • JRuby
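Before continuing, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, not part of Jading; check_tools is a hypothetical helper name, and the command names java, ant, and jruby are assumed to match your installation.

```shell
# Hypothetical helper (not part of Jading): report which tools are missing
# from PATH. Prints one line per missing tool and returns non-zero if any
# tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_tools java ant jruby prints nothing when all three tools are installed.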

You do not need to install cascading.jruby to use Jading. However, it is helpful to have the gem installed for testing in local mode.

jruby -S gem install cascading.jruby

Once you have cascading.jruby, create a working directory (we've named ours "vanilla"), clone Jading into it, and add the checkout to your PATH:

mkdir vanilla && cd vanilla
git clone git@github.com:etsy/jading.git
export PATH="$PATH:$PWD/jading"

At this point, Jading is ready for use, but running a job in local mode still requires Cascading and Hadoop on the classpath. If you attempt to run a local job now, you'll see the following exception, which indicates that the required Cascading classes cannot be found:

jruby jading/examples/wordcount.rb jading/README.md local
NameError: cannot load Java class cascading.flow.planner.Scope
         for_name at org/jruby/javasupport/JavaClass.java:1204
  get_proxy_class at org/jruby/javasupport/JavaUtilities.java:34
    const_missing at /opt/jruby/lib/ruby/site_ruby/shared/builtin/javasupport/java.rb:45
       flow_scope at /opt/jruby-1.6.5/lib/ruby/gems/1.8/gems/cascading.jruby-0.0.10/lib/cascading/scope.rb:14
       initialize at /opt/jruby-1.6.5/lib/ruby/gems/1.8/gems/cascading.jruby-0.0.10/lib/cascading/flow.rb:25
             flow at /opt/jruby-1.6.5/lib/ruby/gems/1.8/gems/cascading.jruby-0.0.10/lib/cascading/cascade.rb:37
         __file__ at jading/examples/word_count.rb:7
    instance_eval at org/jruby/RubyKernel.java:2062
          cascade at /opt/jruby-1.6.5/lib/ruby/gems/1.8/gems/cascading.jruby-0.0.10/lib/cascading/cascading.rb:36
           (root) at jading/examples/word_count.rb:6

To resolve this issue, use Jading's own dependency resolution capability, then add the resolved libraries to the classpath and rerun the job:

jade jading/examples/wordcount.rb
export CLASSPATH="$CLASSPATH:/tmp/jading/build/lib/*"
jruby jading/examples/wordcount.rb jading/README.md local
cat output/wordcount | sort -n | less
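If the NameError persists after exporting CLASSPATH, it may be worth checking which jar files the classpath actually contributes. The following is a rough sketch, not part of Jading; classpath_jars is a hypothetical helper that expands colon-separated entries, including Java's "dir/*" wildcard form used above. It only lists files ending in .jar, and it does not handle every classpath form (for example, plain directories of classes are skipped).

```shell
# Hypothetical helper (not part of Jading): list the jar files that a
# CLASSPATH-style string would contribute. Handles plain "x.jar" entries
# and "dir/*" wildcard entries; other entry types are ignored.
classpath_jars() {
  old_ifs=$IFS
  IFS=:
  set -f                      # do not glob while splitting on ":"
  for entry in $1; do
    set +f                    # allow globbing for the wildcard case below
    case "$entry" in
      */\*)
        for jar in "${entry%\*}"*.jar; do
          if [ -e "$jar" ]; then echo "$jar"; fi
        done
        ;;
      *.jar)
        if [ -e "$entry" ]; then echo "$entry"; fi
        ;;
    esac
    set -f
  done
  set +f
  IFS=$old_ifs
}
```

Running classpath_jars "$CLASSPATH" should then list the jars under /tmp/jading/build/lib; an empty listing suggests that the jade step did not populate that directory.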

Running Jobs Remotely on a Hadoop Cluster

Prerequisites:

  • Hadoop installed and configured to point to the desired cluster

At this point, we can run jobs in local mode using Jading. As a side effect of running jade above, we've also created our first jade.jar in our working directory. It contains the Cascading and Hadoop dependencies, the cascading.jruby gem, and our example wordcount.rb script:

jar tf jade.jar | sort | less
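The full listing can be long. As a rough sketch, the entries can be grouped by their top-level path component to get an overview of what the jar contains; summarize_entries is a hypothetical helper, and no particular layout of jade.jar is assumed here.

```shell
# Hypothetical helper: count listing entries by top-level directory
# (or by file name for entries at the root). Reads one path per line
# from standard input and prints "<count> <name>" sorted by name.
summarize_entries() {
  awk -F/ '{ count[$1]++ } END { for (name in count) print count[name], name }' | sort -k2
}
```

For example: jar tf jade.jar | summarize_entries | less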

To run our example job on a cluster, we'll need to prepare some input data, use the execution mode of the jade script to invoke hadoop jar, and then check the results. In the commands below, substitute your own user directory for /user/mwalker.

hadoop fs -put jading/README.md /user/mwalker/
jade -e wordcount.rb /user/mwalker/README.md
hadoop fs -cat /user/mwalker/output/wordcount/* | sort -n | less
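To avoid hard-coding the user directory, the path can be derived from the current user name. This is a minimal sketch that assumes your cluster follows the conventional /user/<name> layout for HDFS home directories; hdfs_home is a hypothetical helper, not part of Jading or Hadoop.

```shell
# Hypothetical helper: print the HDFS home directory for a user.
# Assumption: the cluster uses the conventional /user/<name> layout.
# With no argument, the current local user name is used.
hdfs_home() {
  echo "/user/${1:-$(id -un)}"
}
```

With this in place, the upload step above could be written as: hadoop fs -put jading/README.md "$(hdfs_home)/"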
