mattf/condor_hadoop
This project is a set of tools for building and managing Hadoop
clusters on a set of shared Condor managed resources.

Four processes are currently managed: NameNode, DataNode, JobTracker
and TaskTracker. Instances of each are submitted to Condor for
scheduling and execution.

All development and testing has been with hadoop-1.0.1-bin.tar.gz

Tools provided -

hadoop_namenode : Manage NameNodes
   --list                     : List all NameNodes
   --ipc <id>                 : Retrieve IPC endpoint for NameNode <id>
   --file=hadoop-bin.tar.gz   : Required: Specify tarball or set
                              : HADOOP_BIN_TARBALL
   --start                    : Start a new NameNode
   --stop <id>                : Stop the given NameNode

hadoop_datanode : Manage DataNodes
   --list [<namenode id>]     : List DataNodes, optionally associated
                              : with a given NameNode
   --file=hadoop-bin.tar.gz   : Required: Specify tarball or set
                              : HADOOP_BIN_TARBALL
   --start <ID>|<URL>         : Start a new DataNode, linked to the
      [--count <#>]           : NameNode given by ID or hdfs:// URL,
                              : potentially numerous
   --stop <id>                : Stop the given DataNode

hadoop_jobtracker : Manage JobTrackers
   --list                     : List all JobTrackers
   --ipc <id>                 : Retrieve IPC endpoint for JobTracker <id>
   --file=hadoop-bin.tar.gz   : Required: Specify tarball or set
                              : HADOOP_BIN_TARBALL
   --start <ID>|<URL>         : Start a new JobTracker, linked to the
                              : NameNode given by ID or hdfs:// URL
   --stop <id>                : Stop the given JobTracker

hadoop_tasktracker : Manage TaskTrackers
   --list [<jobtracker id>]   : List TaskTrackers, optionally
                              : associated with a given JobTracker
   --file=hadoop-bin.tar.gz   : Required: Specify tarball or set
                              : HADOOP_BIN_TARBALL
   --start <ID>|<URL>         : Start a new TaskTracker, linked to the
      [--count <#>]           : JobTracker given by ID or maprfs:// URL,
                              : potentially numerous
   --stop <id>                : Stop the given TaskTracker

WARNING: No data stored in the running NameNode will be retained when
the NameNode is shut down. If you expect otherwise, you will be
disappointed.

Example usage -

Setup,

   $ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/hadoop-1.0.1-bin.tar.gz
   ...
   $ export HADOOP_BIN_TARBALL=hadoop-1.0.1-bin.tar.gz

Start a NameNode,

   $ ./hadoop_namenode -s
   Submitting job(s).
   1 job(s) submitted to cluster 45.

   $ ./hadoop_namenode -l
   ID  Submitted    Status   Uptime  Owner  Location
   45  08/11 06:44  Running  9       matt   http://eeyore.local:45172

Start a DataNode, attached to just started NameNode,

   $ ./hadoop_datanode -s 45
   Found NameNode at hdfs://eeyore.local:56346
   Submitting job(s).
   1 job(s) submitted to cluster 46.

   $ ./hadoop_datanode -s hdfs://eeyore.local:56346
   Found NameNode at hdfs://eeyore.local:56346
   Submitting job(s).
   1 job(s) submitted to cluster 47.

   $ ./hadoop_datanode -l
   ID  Submitted    Status   Uptime      Owner  NameNode
   46  08/11 06:44  Running  0+00:00:20  matt   45 @ http://eeyore.local:45172
   47  08/11 06:44  Pending  N/A         matt   hdfs://eeyore.local:56346

Start a JobTracker, attached to the NameNode,

   $ ./hadoop_jobtracker -s 45
   Found NameNode at hdfs://eeyore.local:56346
   Submitting job(s).
   1 job(s) submitted to cluster 48.

   $ ./hadoop_jobtracker -s hdfs://eeyore.local:56346
   Found NameNode at hdfs://eeyore.local:56346
   Submitting job(s).
   1 job(s) submitted to cluster 49.

   $ ./hadoop_jobtracker -l
   ID  Submitted    Status   Uptime  Owner  Location
   48  08/11 06:45  Running  46      matt   http://eeyore.local:46792
   49  08/11 06:45  Running  26      matt   http://eeyore.local:48515

Start a TaskTracker, attached to the new JobTracker,

   $ ./hadoop_tasktracker -s 49
   Found JobTracker at maprfs://eeyore.local:33445
   Submitting job(s).
   1 job(s) submitted to cluster 50.

   $ ./hadoop_tasktracker -s maprfs://eeyore.local:33445
   Found JobTracker at maprfs://eeyore.local:33445
   Submitting job(s).
   1 job(s) submitted to cluster 51.
   $ ./hadoop_tasktracker -l
   ID  Submitted    Status   Uptime      Owner  JobTracker
   50  08/11 06:46  Running  0+00:00:31  matt   49 @ http://eeyore.local:48515
   51  08/11 06:46  Running  0+00:00:11  matt   maprfs://eeyore.local:33445

   $ ./hadoop_namenode -l
   ID  Submitted    Status   Uptime  Owner  Location
   45  08/11 06:44  Running  7:06    matt   http://eeyore.local:45172

   $ ./hadoop_datanode -l
   ID  Submitted    Status   Uptime      Owner  NameNode
   46  08/11 06:44  Running  0+00:06:49  matt   45 @ http://eeyore.local:45172
   47  08/11 06:44  Running  0+00:06:29  matt   hdfs://eeyore.local:56346

   $ ./hadoop_jobtracker -l
   ID  Submitted    Status   Uptime  Owner  Location
   48  08/11 06:45  Running  6:13    matt   http://eeyore.local:46792
   49  08/11 06:45  Running  5:53    matt   http://eeyore.local:48515

   $ ./hadoop_tasktracker -l
   ID  Submitted    Status   Uptime      Owner  JobTracker
   50  08/11 06:46  Running  0+00:05:13  matt   49 @ http://eeyore.local:48515
   51  08/11 06:46  Running  0+00:04:53  matt   maprfs://eeyore.local:33445

Use the Hadoop command line tools to run a job,

   $ ./hadoop_namenode -i 45
   hdfs://eeyore.local:56346

   $ ./hadoop_jobtracker -i 49
   maprfs://eeyore.local:33445

   $ tar zxf hadoop-1.0.1-bin.tar.gz
   $ cd hadoop-1.0.1
   $ cat > conf/mapred-site.xml << EOF
   <configuration>
     <property>
       <name>mapred.job.tracker</name>
       <value>maprfs://eeyore.local:33445</value>
     </property>
     <property>
       <name>fs.default.name</name>
       <value>hdfs://eeyore.local:56346</value>
     </property>
   </configuration>
   EOF

   $ export JAVA_HOME=/usr
   $ time ./bin/hadoop jar hadoop-test-1.0.1.jar mrbench
   ...
   DataLines  Maps  Reduces  AvgTime (milliseconds)
   1          2     1        36759
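
The hand-written conf/mapred-site.xml in the last step could also be generated from the two endpoints. The following is only a sketch: write_site_conf is a hypothetical helper that is not part of this project, and it assumes the endpoints are passed in exactly as the -i queries print them in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with these tools): print a minimal
# mapred-site.xml on stdout for the two endpoints given as arguments.
#   $1 = NameNode endpoint,   e.g. hdfs://eeyore.local:56346
#   $2 = JobTracker endpoint, e.g. maprfs://eeyore.local:33445
write_site_conf() {
    # Refuse to emit a config with a missing or empty endpoint.
    if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: write_site_conf <namenode-url> <jobtracker-url>" >&2
        return 1
    fi
    # Same two properties as the hand-written file in the example.
    cat << EOF
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$2</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>$1</value>
  </property>
</configuration>
EOF
}
```

Assuming each -i query prints only the endpoint on stdout, as it does in the example, the results could be captured with NN=$(./hadoop_namenode -i 45) and JT=$(./hadoop_jobtracker -i 49) before changing into hadoop-1.0.1, and the file then written with write_site_conf "$NN" "$JT" > conf/mapred-site.xml. Checking the arguments first avoids writing a config with an empty value if one of the queries returns nothing.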
About
Manage Hadoop MapReduce and HDFS clusters with Condor