Dumbo doesn't work with CDH4b2 nightlies #53

Closed
aripollak opened this issue Apr 19, 2012 · 5 comments
Comments

@aripollak

Starting with CDH4b2 (nightlies at http://nightly.cloudera.com/cdh4/), /usr/lib/hadoop has been split into several separate directories, such as hadoop-hdfs and hadoop-mapreduce, so many of the assumptions dumbo makes no longer hold.
A few examples:

  • If I don't set JAVA_HOME=/usr/lib/jvm/default-java and HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec, I can't even run dumbo ls / -hadoop /usr/lib/hadoop, since it complains that JAVA_HOME is not set or that it cannot find libexec.
  • Trying to run dumbo start wordcount.py -input foo -output bar -hadoop /usr/lib/hadoop results in "ERROR: Streaming jar not found" since the streaming jar is now under /usr/lib/hadoop-mapreduce.
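A minimal Python sketch of the first workaround, assuming the Debian/Ubuntu-style paths quoted above. It fills in JAVA_HOME and HADOOP_LIBEXEC_DIR only when they are not already set, producing an environment that can be handed to a dumbo subprocess. The helper name `hadoop_env` and the `DEFAULTS` table are hypothetical and not part of dumbo.

```python
import os

# Default locations taken from the report above (assumed; adjust for your install).
DEFAULTS = {
    "JAVA_HOME": "/usr/lib/jvm/default-java",
    "HADOOP_LIBEXEC_DIR": "/usr/lib/hadoop/libexec",
}


def hadoop_env(base_env=None):
    """Hypothetical helper: return a copy of base_env (default: os.environ)
    with JAVA_HOME and HADOOP_LIBEXEC_DIR filled in when they are missing."""
    env = dict(os.environ if base_env is None else base_env)
    for key, value in DEFAULTS.items():
        # setdefault never overrides a value the user has already exported.
        env.setdefault(key, value)
    return env
```

It could then be used as `subprocess.call(["dumbo", "ls", "/", "-hadoop", "/usr/lib/hadoop"], env=hadoop_env())`. This only papers over the environment problem; it does nothing about the streaming jar location.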

I might be able to fix this if I find some free time, but it might require some structural changes.

@klbostee
Owner

Right, sounds annoying.

Could we fix this by extending the -hadoop option to accept a comma-separated list of directories? You can define aliases in /etc/dumbo.conf, so we could fairly easily avoid having to type that list every time...
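A rough sketch of what a comma-separated -hadoop value might look like on the lookup side. This is not dumbo's actual code; `split_hadoop_dirs`, `find_streaming_jar` and the default `*streaming*.jar` pattern are assumptions made for illustration. Directories are searched in order and only at their top level, so a layout that nests the jar in a subdirectory would need an extra pattern.

```python
import glob
import os


def split_hadoop_dirs(value):
    """Split a comma-separated -hadoop value into a list of directories,
    ignoring surrounding whitespace and empty entries."""
    return [d.strip() for d in value.split(",") if d.strip()]


def find_streaming_jar(value, pattern="*streaming*.jar"):
    """Hypothetical helper: return the first jar matching `pattern` in the
    listed directories (searched in order, non-recursively), or None."""
    for directory in split_hadoop_dirs(value):
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        if matches:
            return matches[0]
    return None
```

With an alias in /etc/dumbo.conf expanding to something like `/usr/lib/hadoop,/usr/lib/hadoop-mapreduce`, the same search would cover both the old and the split CDH4 layout.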

@aripollak
Author

That could be an option, at least for the immediate problem of looking for JARs in the right place. Or it could automatically look in <hadoop_dir>-mapreduce as well as <hadoop_dir>?
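A minimal sketch of the automatic fallback, assuming the CDH4 convention that the MapReduce files live in a sibling directory named `<hadoop_dir>-mapreduce`. The helpers `candidate_dirs` and `existing_candidate_dirs` are hypothetical; other suffixes (for example `-hdfs`) can be passed in if a distribution splits things differently.

```python
import os


def candidate_dirs(hadoop_dir, suffixes=("-mapreduce",)):
    """Hypothetical helper: return hadoop_dir followed by sibling directories
    formed by appending each suffix, e.g. /usr/lib/hadoop -> /usr/lib/hadoop-mapreduce."""
    base = hadoop_dir.rstrip("/")  # treat "/usr/lib/hadoop/" like "/usr/lib/hadoop"
    return [base] + [base + suffix for suffix in suffixes]


def existing_candidate_dirs(hadoop_dir, suffixes=("-mapreduce",)):
    """Keep only the candidate directories that actually exist on this machine."""
    return [d for d in candidate_dirs(hadoop_dir, suffixes) if os.path.isdir(d)]
```

Because the original directory stays first in the list, installs with the old single-directory layout would keep behaving as before.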

@klbostee
Owner

Guess that could work too, yeah. It might even be worth implementing both, so that people with other distributions have enough flexibility to get things working as well...

@mikeinertia

Is there a workaround available for this? I have a fresh installation of the latest Dumbo and Cloudera Hadoop 2.0.0 (CDH4) and am still seeing this error. I also tried linking all the jars in the various folders into one folder and using that, with no luck.

Any hints would be very helpful.

@klbostee
Owner

See comments on the pull request...
