-input format not handled in local mode #42

Closed
jmesnil opened this issue May 26, 2011 · 1 comment

jmesnil commented May 26, 2011

Hi,

I want to run Dumbo with a specific input format (to read from Avro files).
It seems Dumbo does not use the input format specified by '-inputformat' when it is run locally (without specifying '-hadoop'). Instead it uses its default input format.

To check this, I specified an unknown class with '-inputformat foo.bar.UnknownClass'. The job fails on Hadoop but passes in local mode.

Hadoop mode:

$ dumbo start cat.py \
    -input word-count.avro \
    -output tmp \
    -libjar avro-1.4.1.jar \
    -libjar avro-utils-1.5.3-SNAPSHOT.jar \
    -inputformat foo.bar.UnknownClass \
    -python /home/sites/sci-env/0.0.5/bin/python \
    -hadoop /usr/lib/hadoop
...
-inputformat : class not found : foo.bar.UnknownClass
Streaming Command Failed!

Local mode:

$ dumbo start cat.py \
    -input word-count.avro \
    -output tmp \
    -libjar avro-1.4.1.jar \
    -libjar avro-utils-1.5.3-SNAPSHOT.jar \
    -inputformat foo.bar.UnknownClass \
    -python /home/sites/sci-env/0.0.5/bin/python
INFO: buffersize = 168960

=> No error. tmp was created, but it contains the raw content of the binary Avro file, because the file was read as text.

Is it a limitation of Dumbo that the '-inputformat' option works only in Hadoop mode, or is it a bug?

thanks,
jeff

@klbostee
Owner

It's a limitation. Dumbo's local mode relies only on UNIX pipes and doesn't use Hadoop in any way, so specifying a Java class as the input format for a local run simply cannot work: nothing in that pipeline can load or run the class. If you want to test Hadoop helper classes locally, you have to install a Hadoop build on your machine that is configured to run in local mode (which is the default configuration) and run the job against it with '-hadoop'.
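
To make the point about pipes concrete, here is a rough conceptual sketch in Python of what a pipe-style local run amounts to: text lines go into the mapper, the mapper output is sorted, and the sorted pairs go into the reducer. This is not Dumbo's actual code. The names `run_local`, `mapper` and `reducer` are made up for illustration, and the word count job is just an assumed example. The thing to notice is that the input is only ever treated as lines of text, so there is no step where a Java input format class could be plugged in.

```python
import itertools


def run_local(lines, mapper, reducer):
    """Hypothetical stand-in for a local run: map | sort | reduce over text lines."""
    # Map phase: every input line is passed to the mapper as plain text,
    # keyed by its position. No Java input format is involved at any point.
    mapped = []
    for offset, line in enumerate(lines):
        mapped.extend(mapper(offset, line.rstrip("\n")))

    # Sort phase: plays the role of the `sort` step in a UNIX pipeline.
    mapped.sort(key=lambda kv: kv[0])

    # Reduce phase: group consecutive pairs that share a key.
    results = []
    for key, group in itertools.groupby(mapped, key=lambda kv: kv[0]):
        results.extend(reducer(key, (value for _, value in group)))
    return results


def mapper(key, value):
    # Assumed example job: emit (word, 1) for each word on the line.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # Sum the counts collected for one word.
    yield key, sum(values)


counts = run_local(["a b a\n", "b c\n"], mapper, reducer)
print(counts)
```

Feeding the bytes of a binary Avro file through a loop like this would just split them at newline bytes and hand the fragments to the mapper, which matches the garbled tmp output described in the report.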
