mattf/condor_hadoop

This project is a set of tools for building and managing Hadoop
clusters on a set of shared Condor managed resources.

Four processes are currently managed: NameNode, DataNode, JobTracker
and TaskTracker. Instances of each are submitted to Condor for
scheduling and execution.

All development and testing has been done with hadoop-1.0.1-bin.tar.gz.

Tools provided -

 hadoop_namenode             : Manage NameNodes
   --list                    :  List all NameNodes
   --ipc <id>                :  Retrieve IPC endpoint for NameNode <id>
   --file=hadoop-bin.tar.gz  :  Required: Specify tarball or set HADOOP_BIN_TARBALL
   --start                   :  Start a new NameNode
   --stop <id>               :  Stop the given NameNode

 hadoop_datanode             : Manage DataNodes
   --list [<namenode id>]    :  List DataNodes, optionally associated
                             :   with a given NameNode
   --file=hadoop-bin.tar.gz  :  Required: Specify tarball or set HADOOP_BIN_TARBALL
   --start <ID>|<URL>        :  Start a new DataNode linked to the NameNode
    [--count <#>]            :   given by ID or hdfs:// URL, optionally several
   --stop <id>               :  Stop the given DataNode

 hadoop_jobtracker           : Manage JobTrackers
   --list                    :  List all JobTrackers
   --ipc <id>                :  Retrieve IPC endpoint for JobTracker <id>
   --file=hadoop-bin.tar.gz  :  Required: Specify tarball or set HADOOP_BIN_TARBALL
   --start <ID>|<URL>        :  Start a new JobTracker linked to the NameNode
                             :   given by ID or hdfs:// URL
   --stop <id>               :  Stop the given JobTracker

 hadoop_tasktracker          : Manage TaskTrackers
   --list [<jobtracker id>]  :  List TaskTrackers, optionally
                             :   associated with a given JobTracker
   --file=hadoop-bin.tar.gz  :  Required: Specify tarball or set HADOOP_BIN_TARBALL
   --start <ID>|<URL>        :  Start a new TaskTracker linked to the JobTracker
    [--count <#>]            :   given by ID or maprfs:// URL, optionally several
   --stop <id>               :  Stop the given TaskTracker
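
The --start options above accept either a numeric ID or a URL. As a
rough sketch only, a wrapper script could tell the two forms apart as
shown below. The function name classify_target is hypothetical and is
not part of these tools, and this is not necessarily how the tools
themselves decide.

```shell
#!/bin/sh
# Classify a --start argument as a URL, a numeric ID, or invalid.
# classify_target is an illustrative helper, not part of condor_hadoop.
classify_target() {
    case "$1" in
        *://*)        echo url ;;      # anything with a scheme, e.g. hdfs://host:port
        ''|*[!0-9]*)  echo invalid ;;  # empty, or contains a non-digit
        *)            echo id ;;       # digits only, e.g. a cluster ID such as 45
    esac
}

classify_target 45
classify_target hdfs://eeyore.local:56346
```

The URL pattern is checked first because a URL also contains non-digit
characters and would otherwise be reported as invalid.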


WARNING: No data stored in the running NameNode will be retained when
the NameNode is shut down. If you expect otherwise, you will be
disappointed.


Example usage -

Setup,

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/hadoop-1.0.1-bin.tar.gz
...
$ export HADOOP_BIN_TARBALL=hadoop-1.0.1-bin.tar.gz 

Start a NameNode,

$ ./hadoop_namenode -s
Submitting job(s).
1 job(s) submitted to cluster 45.

$ ./hadoop_namenode -l
  ID    Submitted   Status       Uptime      Owner  Location
  45  08/11 06:44  Running            9       matt  http://eeyore.local:45172

Start a DataNode, attached to the just-started NameNode,

$ ./hadoop_datanode -s 45 
Found NameNode at hdfs://eeyore.local:56346
Submitting job(s).
1 job(s) submitted to cluster 46.

$ ./hadoop_datanode -s hdfs://eeyore.local:56346
Found NameNode at hdfs://eeyore.local:56346
Submitting job(s).
1 job(s) submitted to cluster 47.

$ ./hadoop_datanode -l   
  ID    Submitted   Status       Uptime      Owner  NameNode
  46  08/11 06:44  Running   0+00:00:20       matt  45 @ http://eeyore.local:45172
  47  08/11 06:44  Pending          N/A       matt  hdfs://eeyore.local:56346
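
In the listing above one DataNode is still Pending. A minimal sketch
for counting rows in a given state follows. It assumes the column
layout shown in this README (Status is the fourth whitespace-separated
field), and count_status is a hypothetical helper name.

```shell
#!/bin/sh
# Count listing rows whose Status column equals the given value.
# Reads a "-l" style table on stdin and skips the header row.
count_status() {
    awk -v want="$1" 'NR > 1 && $4 == want { n++ } END { print n + 0 }'
}

# Sample rows copied from the DataNode listing above.
listing='  ID    Submitted   Status       Uptime      Owner  NameNode
  46  08/11 06:44  Running   0+00:00:20       matt  45 @ http://eeyore.local:45172
  47  08/11 06:44  Pending          N/A       matt  hdfs://eeyore.local:56346'

printf '%s\n' "$listing" | count_status Pending
```

Against a live pool the same function could be fed from
"./hadoop_datanode -l | count_status Pending", provided the output
keeps this layout.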

Start a JobTracker, attached to the NameNode,

$ ./hadoop_jobtracker -s 45
Found NameNode at hdfs://eeyore.local:56346
Submitting job(s).
1 job(s) submitted to cluster 48.

$ ./hadoop_jobtracker -s hdfs://eeyore.local:56346
Found NameNode at hdfs://eeyore.local:56346
Submitting job(s).
1 job(s) submitted to cluster 49.

$ ./hadoop_jobtracker -l
  ID    Submitted   Status       Uptime      Owner  Location
  48  08/11 06:45  Running           46       matt  http://eeyore.local:46792
  49  08/11 06:45  Running           26       matt  http://eeyore.local:48515

Start a TaskTracker, attached to the new JobTracker,

$ ./hadoop_tasktracker -s 49
Found JobTracker at maprfs://eeyore.local:33445
Submitting job(s).
1 job(s) submitted to cluster 50.

$ ./hadoop_tasktracker -s maprfs://eeyore.local:33445
Found JobTracker at maprfs://eeyore.local:33445
Submitting job(s).
1 job(s) submitted to cluster 51.

$ ./hadoop_tasktracker -l
  ID    Submitted   Status       Uptime      Owner  JobTracker
  50  08/11 06:46  Running   0+00:00:31       matt  49 @ http://eeyore.local:48515
  51  08/11 06:46  Running   0+00:00:11       matt  maprfs://eeyore.local:33445

$ ./hadoop_namenode -l
  ID    Submitted   Status       Uptime      Owner  Location
  45  08/11 06:44  Running         7:06       matt  http://eeyore.local:45172
$ ./hadoop_datanode -l
  ID    Submitted   Status       Uptime      Owner  NameNode
  46  08/11 06:44  Running   0+00:06:49       matt  45 @ http://eeyore.local:45172
  47  08/11 06:44  Running   0+00:06:29       matt  hdfs://eeyore.local:56346
$ ./hadoop_jobtracker -l
  ID    Submitted   Status       Uptime      Owner  Location
  48  08/11 06:45  Running         6:13       matt  http://eeyore.local:46792
  49  08/11 06:45  Running         5:53       matt  http://eeyore.local:48515
$ ./hadoop_tasktracker -l
  ID    Submitted   Status       Uptime      Owner  JobTracker
  50  08/11 06:46  Running   0+00:05:13       matt  49 @ http://eeyore.local:48515
  51  08/11 06:46  Running   0+00:04:53       matt  maprfs://eeyore.local:33445
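
The ID column of these listings is what --stop expects. A small
sketch for pulling the IDs out of a listing follows, assuming the ID
is the first field of every data row. The helper name list_ids is
hypothetical.

```shell
#!/bin/sh
# Print the ID column of a "-l" style table read from stdin.
# Rows whose first field is not a number (such as the header) are ignored.
list_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Sample rows copied from the TaskTracker listing above.
listing='  ID    Submitted   Status       Uptime      Owner  JobTracker
  50  08/11 06:46  Running   0+00:05:13       matt  49 @ http://eeyore.local:48515
  51  08/11 06:46  Running   0+00:04:53       matt  maprfs://eeyore.local:33445'

printf '%s\n' "$listing" | list_ids

# A teardown loop could then look like this (assumes --stop accepts the
# IDs exactly as listed):
#   ./hadoop_tasktracker -l | list_ids | while read -r id; do
#       ./hadoop_tasktracker --stop "$id"
#   done
```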

Use the Hadoop command line tools to run a job,

$ ./hadoop_namenode -i 45
hdfs://eeyore.local:56346
$ ./hadoop_jobtracker -i 49
maprfs://eeyore.local:33445

$ tar zxf hadoop-1.0.1-bin.tar.gz
$ cd hadoop-1.0.1
$ cat > conf/mapred-site.xml << EOF
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>maprfs://eeyore.local:33445</value>
     </property>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://eeyore.local:56346</value>
     </property>
</configuration>
EOF

$ export JAVA_HOME=/usr
$ time ./bin/hadoop jar hadoop-test-1.0.1.jar mrbench
...
DataLines	Maps	Reduces	AvgTime (milliseconds)
1		2	1	36759
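
The endpoint lookups and the hand-written mapred-site.xml above can be
combined. The following is only a sketch: mapred_site_xml is a
hypothetical helper, and the endpoints are the ones from this example
session.

```shell
#!/bin/sh
# Print a mapred-site.xml body for the given endpoints.
# $1: JobTracker endpoint, $2: NameNode endpoint.
mapred_site_xml() {
    cat << EOF
<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>$1</value>
     </property>
     <property>
         <name>fs.default.name</name>
         <value>$2</value>
     </property>
</configuration>
EOF
}

# Endpoints copied from the example session. In practice they could come
# from "./hadoop_jobtracker -i <id>" and "./hadoop_namenode -i <id>".
mapred_site_xml maprfs://eeyore.local:33445 hdfs://eeyore.local:56346
```

Redirecting the output to conf/mapred-site.xml gives the same file as
the heredoc in the example.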
