Quick fix for CDH4 / MRv2. #58

Merged
merged 2 commits into klbostee:master on Aug 20, 2012

2 participants

@jkleint

This adds a -hadooplib command-line switch to tell dumbo where
hadoop-streaming.jar is stored, along with an addition to the jar search path.
It also uses 'hadoop fs' instead of 'hdfs dfs' since it's easier to find.

This may not be the best long-term solution, but it seems to work for now. I have not tested with MRv1 or anything off the beaten path. Addresses Issue #53.
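
As a rough illustration of the jar lookup described above, here is a minimal Python sketch. It is not dumbo's actual code: the function name `find_streaming_jar`, the `contrib/streaming` fallback directory, and the `hadoop*streaming*.jar` glob pattern are all assumptions made for the example. It only shows the idea of checking an optional extra library directory before the usual locations under the Hadoop home.

```python
import glob
import os


def find_streaming_jar(hadoop_home, hadooplib=None):
    """Return the first streaming jar found, or None if there is none.

    Hypothetical helper, not part of dumbo. hadooplib is optional, so
    setups that keep the jar under hadoop_home still work without it.
    """
    # Assumed search order: extra library directory first, then the Hadoop home.
    search_dirs = []
    if hadooplib:
        search_dirs.append(hadooplib)
    search_dirs.append(hadoop_home)
    search_dirs.append(os.path.join(hadoop_home, "contrib", "streaming"))

    for directory in search_dirs:
        # Assumed naming pattern; real distributions may name the jar differently.
        pattern = os.path.join(directory, "hadoop*streaming*.jar")
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None
```

With the CDH4 paths from this thread, the call would look like `find_streaming_jar("/usr/lib/hadoop", "/usr/lib/hadoop-mapreduce")`.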

@jkleint jkleint Quick fix for CDH4 / MRv2.
6528c77
@jkleint

For people who want to try this out, the paths for CDH4 on Red Hat would look like this:

$ dumbo start examples/wordcount.py -input brian.txt -output wordcounts -hadoop /usr/lib/hadoop -hadooplib /usr/lib/hadoop-mapreduce
@jkleint jkleint Don't require -hadooplib.
This should still work for CDH3 where you don't need a separate -hadooplib.
4cedc5f
@klbostee
Owner

Sounds good. Will have a look at this soonish hopefully.

@klbostee
Owner

Just had a closer look. I ended up generalizing things a bit so that several -hadooplib options can be specified. I'll do some final testing and commit the code on Monday.

With my changes you can then run Dumbo scripts as follows:

dumbo start wordcount.py -input brian.txt -output wc -hadoop /usr -hadooplib /usr/lib/hadoop-0.20-mapreduce

Passing -hadoop /usr rather than -hadoop /usr/lib/hadoop seems to be a slightly better way of doing it, because apparently CDH4 has some JAVA_HOME auto-detection in /usr/bin/hadoop that isn't available in /usr/lib/hadoop/bin/hadoop.
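
For readers wondering what "several -hadooplib options" could mean in practice, the sketch below collects every value of a repeated flag from an argument list. It is only an illustration under assumptions: `collect_option` is a hypothetical helper, not dumbo's option parser, and the second library path in the sample arguments is just for demonstration.

```python
def collect_option(args, name):
    """Return every value that directly follows the flag `name`, in order."""
    values = []
    i = 0
    while i < len(args):
        if args[i] == name and i + 1 < len(args):
            values.append(args[i + 1])
            i += 2  # skip the flag and its value
        else:
            i += 1
    return values


# Illustrative argument list with -hadooplib given twice.
args = [
    "start", "wordcount.py",
    "-input", "brian.txt",
    "-output", "wc",
    "-hadoop", "/usr",
    "-hadooplib", "/usr/lib/hadoop-0.20-mapreduce",
    "-hadooplib", "/usr/lib/hadoop-mapreduce",
]
hadooplibs = collect_option(args, "-hadooplib")
```

Each collected directory could then be searched in turn for the streaming jar, so the first one that contains it wins.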

@klbostee klbostee merged commit 4cedc5f into klbostee:master Aug 20, 2012
@klbostee
Owner

This is in release 0.21.35.
