Throwaway demo of heroku + wukong + emr

Lorkbong: Very stupid example for Wukong / Elastic Map Reduce integration

Lorkbong (named after the staff carried by Sun Wukong) is a very simple example Heroku app that lets you show job status or launch a new job, either by visiting a special URL or by running a rake task.

Setup

  1. Create the app:
heroku create lorkbong-example
  2. Edit the obvious files in config/. You can make life much more dangerous but slightly simpler by adding your keys to the config. If you do this, make sure you DON’T check this app into a public repo or we’ll switch all our infrastructure to run off your account.
  3. You should probably be more responsible and use environment variables for your credentials. Use the heroku command-line tool to run the following commands, filling in your own values after each ‘=’. The keypair is quoted so that the multi-line file contents are passed as a single value:
heroku config:add AWS_ACCESS_KEY_ID=
heroku config:add AWS_SECRET_ACCESS_KEY=
heroku config:add EMR_KEYPAIR="`cat /path/to/your/keypair.pem`"
  4. Now visit http://lorkbong-example.heroku.com (or whatever you called it). Follow the link for ‘list jobs’. You should see a listing of any jobs in your queue.
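
As a rough illustration of the environment-variable approach, the sketch below reads the three config vars set above and writes the keypair to a static file. The helper names emr_credentials and write_keypair_file are hypothetical, made up for this example; the app's real code may do this differently.

```ruby
require 'fileutils'

# Hypothetical helper (not part of the app): collects the credentials set with
# `heroku config:add`. Raises if any is missing or empty, so a misconfigured
# app fails early rather than at job launch.
def emr_credentials(env = ENV)
  keys = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY EMR_KEYPAIR]
  missing = keys.select { |k| env[k].to_s.strip.empty? }
  raise ArgumentError, "Missing config vars: #{missing.join(', ')}" unless missing.empty?
  keys.each_with_object({}) { |k, h| h[k] = env[k] }
end

# Hypothetical helper (not part of the app): writes the keypair to a static
# file that only the owner can read or write.
def write_keypair_file(path, keypair)
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, 'w', 0600) { |f| f.write(keypair) }
  File.chmod(0600, path) # enforce the mode even if the file already existed
  path
end
```

A caller might run write_keypair_file('tmp/emr_keypair.pem', emr_credentials['EMR_KEYPAIR']) once at startup; the path here is only an example.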

Debugging

  1. Right now the job info is hardcoded in the file app/helpers/emr_script.rb (sorry). Open that file and edit the block below so that the first run has ‘alive’ set to true.
  #
  # You need to edit the following things:
  #
  EMR_OPTS = {
    # Path to the runner.
    :emr_runner    => "#{::ROOT_DIR}/vendor/elastic-mapreduce/elastic-mapreduce",
    # Temp storage for the keypair file (the elastic-mapreduce script demands it be a static file).
    :keypair_file  => "#{::ROOT_DIR}/tmp/emr_keypair.pem",
    # If you're debugging:
    # first run with alive set to true, and launch the job.
    :alive         => true,
    # After the job has been created and run for the first time, fill your
    # jobflow ID into the following and set alive back to nil.
    # :jobflow       => "j-18OUFBXJ0Z01W",
  }
  # Path to the input files. Note the 's3n' prefix.
  EMR_INPUT  = "s3n://emr.yourdomain.com/wukong/data/examples/links-simple-sorted-10k.txt"
  # Path to the output files. This directory must not exist. Note the 's3n' prefix.
  EMR_OUTPUT = "s3n://emr.yourdomain.com/wukong/data/examples/wp-link-degree-4"
  2. Make a note of the jobflow ID (it is also shown on the list jobs page at /emr/list) and fill it into the file above for debugging.
  3. Check the AWS console for a closer look at the job progress. You can also find the public hostname of the master node in the console and log in to the machine directly:
  ssh -i /path/to/your/keypair.pem hadoop@ec2-148-37-14-128.compute-1.amazonaws.com
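
The alive / jobflow switch described above can be sketched as follows. This is a minimal illustration, not the code in app/helpers/emr_script.rb: the helper name emr_launch_args is hypothetical, and it assumes the --create, --alive, --jobflow, --stream, --input and --output flags of Amazon's elastic-mapreduce command-line client. A real invocation also needs mapper, reducer and credential arguments, which are omitted here.

```ruby
# Hypothetical helper (not part of the app): turns EMR_OPTS-style settings
# into an argument list for the elastic-mapreduce runner.
def emr_launch_args(opts, input, output)
  # Both paths are expected to carry the 's3n' prefix noted in the config.
  [input, output].each do |path|
    raise ArgumentError, "expected an s3n:// path, got #{path}" unless path.start_with?('s3n://')
  end

  args = [opts[:emr_runner]]
  if opts[:jobflow]
    # Later debugging runs: add the step to the job flow that is still alive.
    args += ['--jobflow', opts[:jobflow]]
  else
    # First run: create a new job flow, kept alive only if :alive is set.
    args << '--create'
    args << '--alive' if opts[:alive]
  end
  args + ['--stream', '--input', input, '--output', output]
end
```

With :alive => true and no :jobflow, the list starts with the runner path followed by --create and --alive. Once :jobflow is filled in, those two flags are replaced by --jobflow and the ID, so the step is added to the existing job flow instead of starting a new cluster.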

Credits

  • The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.