hadoop: tune max {map,reduce} tasks to num CPUs available #345

Closed
ypwais wants to merge 4 commits

Contributor

@ypwais ypwais commented Dec 9, 2013

This pull request is in response to issue #115

Hadoop defaults to 2 map slots and 1 reduce slot per node, and StarCluster's hadoop plugin uses the default configs. On large AWS instance types, this leaves much of the CPU capacity unused. This change creates a custom mapred-site.xml file on each node that sets the mapred.tasktracker.{map,reduce}.tasks.maximum parameters based on the node's CPU count. In particular, it uses a simple heuristic (similar to the one used in EMR's hadoop configs) that assigns 1 map task per CPU and roughly 1/3 of a reduce task per CPU. The params are exposed as kwargs to the plugin's constructor, so the user can override them in the plugin's config.
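
For readers who want to see the heuristic concretely, here is a minimal sketch of the idea. It is not the plugin code from this pull request. The names `task_slots`, `render_mapred_site`, and `cpus_per_reduce` are hypothetical, and rounding the reduce count down (with a floor of 1) is an assumption chosen because it matches the slot counts reported in the test further down. The two property names are the ones mentioned above; a real mapred-site.xml would normally carry other properties as well.

```python
import multiprocessing

# Only the two task-limit properties discussed in this pull request are shown.
MAPRED_SITE_TEMPLATE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>{max_maps}</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>{max_reduces}</value>
  </property>
</configuration>
"""


def task_slots(num_cpus, maps_per_cpu=1, cpus_per_reduce=3):
    """Return (max_map_tasks, max_reduce_tasks) for a single node.

    Hypothetical helper: 1 map per CPU and about 1/3 reduce per CPU.
    Integer division rounds the reduce count down (an assumption), and
    each count is kept at 1 or more so small nodes can still run jobs.
    """
    max_maps = max(1, num_cpus * maps_per_cpu)
    max_reduces = max(1, num_cpus // cpus_per_reduce)
    return max_maps, max_reduces


def render_mapred_site(num_cpus):
    """Render the per-node XML fragment for the given CPU count."""
    max_maps, max_reduces = task_slots(num_cpus)
    return MAPRED_SITE_TEMPLATE.format(max_maps=max_maps,
                                       max_reduces=max_reduces)


if __name__ == "__main__":
    # Use the local machine's CPU count as a stand-in for the node's.
    print(render_mapred_site(multiprocessing.cpu_count()))
```

Exposing `maps_per_cpu` and `cpus_per_reduce` as keyword arguments mirrors the override mechanism described above, although the actual kwarg names in the plugin may differ.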

I have manually tested this change using the following (key) starcluster config settings:

[cluster microdumbo]
NODE_INSTANCE_TYPE = c1.xlarge
CLUSTER_SIZE = 2
PLUGINS = hadoop
NODE_IMAGE_ID = ami-765b3e1f

and ran this procedure on the master node:

$ wget http://norvig.com/big.txt
$ pip install mrjob
$ export HADOOP_HOME=/usr/lib/hadoop
$ python /usr/local/lib/python2.7/dist-packages/mrjob/examples/mr_word_freq_count.py -r hadoop big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt big.txt

The job tracker showed 2 nodes, 16 maps, and 4 reduces available, and the job ran up to 16 map tasks in parallel, as desired. The output looked correct.
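
As a sanity check on those numbers, the arithmetic can be sketched as follows, assuming each c1.xlarge node reports 8 CPUs and that the reduce count per node is rounded down. `cluster_slots` is a hypothetical helper for illustration, not part of the plugin.

```python
def cluster_slots(cpus_per_node, num_nodes):
    """Return cluster-wide (map_slots, reduce_slots).

    Assumes every node runs a task tracker, gets 1 map slot per CPU,
    and gets floor(CPUs / 3) reduce slots with a minimum of 1.
    """
    maps_per_node = cpus_per_node
    reduces_per_node = max(1, cpus_per_node // 3)
    return num_nodes * maps_per_node, num_nodes * reduces_per_node


# Two c1.xlarge nodes, assumed to expose 8 CPUs each.
total_maps, total_reduces = cluster_slots(cpus_per_node=8, num_nodes=2)
print("%d maps, %d reduces" % (total_maps, total_reduces))
```

Under these assumptions the result is 16 map slots and 4 reduce slots, which agrees with what the job tracker displayed.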

@jtriley
Owner

jtriley commented Dec 10, 2013

Thanks! Merging soon.

@ypwais
Contributor Author

ypwais commented Dec 11, 2013

sweet, thanks so much for adding the docs!! sorry didn't see those at first

On Tue, Dec 10, 2013 at 8:43 AM, Justin Riley notifications@github.com wrote:

Closed #345 via 7b9187c (https://github.com/jtriley/StarCluster/commit/7b9187c9abe3c09b7a0e94b2827bc69febc6e7d4).



@jtriley
Owner

jtriley commented Feb 7, 2014

@ypwais My pleasure. This is now available in the 0.95 release:

http://star.mit.edu/cluster/docs/latest/changelog.html#version-0-95
