
hadoop-cluster

Set up a Hadoop cluster and run jobs on it. Both the cluster and the jobs are described as data maps.

Introduction

The bin/hadoop script is a command line interface for building clusters and running jobs. The script supports three commands: start starts a cluster, job runs a given job spec on the cluster, and destroy forces the removal of the cluster.
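
Assuming bin/hadoop mirrors the lein invocations shown under Usage below (the exact argument syntax may differ), the three commands look roughly like this:

$ bin/hadoop start              # start the cluster (reads the cluster spec)
$ bin/hadoop job job_spec.clj   # run the given job spec on the cluster
$ bin/hadoop destroy            # force removal of the cluster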

To launch a new Hadoop cluster you first need a description of the cluster to build, provided as a cluster spec file.

The following example cluster spec describes a cluster with one master and two slave nodes, all m1.medium instances running 64-bit Ubuntu 12.04 with the Cloudera distribution of Hadoop.

{:cluster-prefix "hc1"
 :groups {:master {:node-spec {:hardware {:hardware-id "m1.medium"}}
                   :count 1
                   :roles #{:namenode :jobtracker}}
          :slave {:node-spec {:hardware {:hardware-id "m1.medium"}}
                  :count 2
                  :roles #{:datanode :tasktracker}}}
 :node-spec {:image {:os-family :ubuntu
                     :os-version-matches "12.04"
                     :os-64-bit true}}
 :hadoop-settings {:dist :cloudera}}
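
As an illustration only (not part of the project's API), a cluster spec is plain EDN, so it can be loaded and inspected from a REPL. The sketch below assumes cluster_spec.clj sits in the working directory:

(require '[clojure.edn :as edn])

;; Load the spec file and report the node count per group.
(def cluster-spec
  (edn/read-string (slurp "cluster_spec.clj")))

(into {}
      (for [[group spec] (:groups cluster-spec)]
        [group (:count spec)]))
;; => {:master 1, :slave 2}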

Similarly, the job to be run needs to be described. Below is an example of a job spec that runs the Hadoop word count.

{:steps
 [{:jar {:remote-file "hadoop-examples-0.20.2-cdh3u0.jar"}
   :main "wordcount"
   :input "s3n://pallet-play/hadoop-examples"
   :output "s3n://<your-dest-bucket>/<your-dest-directory>"}]
 :on-completion :terminate-cluster}

The jar file to be run can be specified in several ways. A file already on the jobtracker can be specified with :remote-file. A local file can be specified with :local-file. A file at a given url can be specified with :url.
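
For example, the :jar entry of a step can take any of these three forms (the local file name and URL below are placeholders):

:jar {:remote-file "hadoop-examples-0.20.2-cdh3u0.jar"} ; already on the jobtracker
:jar {:local-file "target/my-job.jar"}                  ; uploaded from your machine
:jar {:url "http://example.com/my-job.jar"}             ; fetched from the given URL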

Usage

Check out this project.

Edit the file credentials.clj with your AWS identity and key.
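
The exact keys depend on the template in this repository, but a jclouds-style AWS credential map generally looks something like the sketch below; the :provider, :identity and :credential names here are assumptions, so match whatever the checked-out file actually uses:

{:provider "aws-ec2"                 ; assumption: jclouds provider id
 :identity "YOUR-AWS-ACCESS-KEY-ID"  ; your AWS access key id
 :credential "YOUR-AWS-SECRET-KEY"}  ; your AWS secret access key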

Open the file cluster_spec.clj and decide whether you want to change the nodes' hardware ID or memory, or the number of slaves. The file should work fine without any changes.
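
For example, to run four slaves instead of the default two, change the :count under the :slave group:

:slave {:node-spec {:hardware {:hardware-id "m1.medium"}}
        :count 4   ; was 2
        :roles #{:datanode :tasktracker}}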

Start the cluster:

$ lein with-profiles +jclouds run start

If all works correctly, you should get something like this after a few log lines:

{"107.20.115.138"
 {:roles #{:datanode :namenode :jobtracker},
  :private-ip "10.80.133.215",
  :hostname "hc1-master-c8a4f1b6"},
 "174.129.113.172"
 {:roles #{:datanode :tasktracker},
  :private-ip "10.36.105.170",
  :hostname "hc1-slave-dca4f1a2"},
 "184.72.91.105"
 {:roles #{:datanode :tasktracker},
  :private-ip "10.82.254.132",
  :hostname "hc1-slave-dea4f1a0"}}

Now it is time to run the job. At the shell, run:

$ lein with-profiles +jclouds run job job_spec.clj

The logging should give you an indication of the job's progress. If the job finishes successfully, the results will be in the bucket and directory specified in the job spec, and the cluster will be destroyed automatically, as requested by :on-completion :terminate-cluster.
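
Once the cluster is gone you can fetch the results directly from S3, for example with s3cmd (any S3 client will do; the bucket and directory are the placeholders from your job spec, and Hadoop writes the output as part-* files):

$ s3cmd ls s3://<your-dest-bucket>/<your-dest-directory>/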

If the job fails, the cluster will not be destroyed; you can destroy it manually by running:

$ lein with-profiles +jclouds run destroy

License

Licensed under the Eclipse Public License (EPL).

All rights reserved.
