It's a limitation. Dumbo's local mode relies only on UNIX pipes and doesn't use Hadoop in any way, so specifying a Java class as the input format for a local run simply cannot work. If you want to test Hadoop helper classes locally, you have to install a Hadoop build locally that is configured to run in local mode (which is the default configuration).
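To make the pipes-only point concrete, here is a minimal sketch of what a pipes-based local run amounts to. It is not Dumbo's actual code: the function name `run_local_pipeline` and its parameters are hypothetical, and it assumes a UNIX system with `sort` on the PATH.

```python
import os
import subprocess


def run_local_pipeline(input_path, output_path, mapper_cmd, reducer_cmd):
    """Emulate `mapper < input | sort | reducer > output` with plain pipes.

    Hypothetical helper for illustration only. No JVM is started here, so
    there is no component that could load a Java input format class.
    """
    env = dict(os.environ, LC_ALL="C")  # byte-wise, reproducible sort order
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        mapper = subprocess.Popen(mapper_cmd, stdin=src, stdout=subprocess.PIPE)
        sorter = subprocess.Popen(["sort"], stdin=mapper.stdout,
                                  stdout=subprocess.PIPE, env=env)
        mapper.stdout.close()  # the sorter now holds the only read end
        reducer = subprocess.Popen(reducer_cmd, stdin=sorter.stdout, stdout=dst)
        sorter.stdout.close()  # the reducer now holds the only read end
        # Exit codes of mapper, sorter, and reducer, in that order.
        return [p.wait() for p in (mapper, sorter, reducer)]
```

In a pipeline like this the input file is fed to the mapper as raw bytes, line by line, with no decoding step in between, so a Java class name passed on the command line has nothing to act on.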
hi,
I want to run Dumbo with a specific input format (to read from Avro files).
It seems Dumbo does not use the input format specified by '-inputformat' when it is run locally (without specifying '-hadoop'). Instead it uses its default input format.
To check that, I specified an unknown class with '-inputformat foo.bar.UnknownClass'. It fails on Hadoop but passes in local mode.
Hadoop mode:
$ dumbo start cat.py \
  -input word-count.avro \
  -output tmp \
  -libjar avro-1.4.1.jar \
  -libjar avro-utils-1.5.3-SNAPSHOT.jar \
  -inputformat foo.bar.UnknownClass \
  -python /home/sites/sci-env/0.0.5/bin/python \
  -hadoop /usr/lib/hadoop
...
-inputformat : class not found : foo.bar.UnknownClass
Streaming Command Failed!
Local mode:
$ dumbo start cat.py \
  -input word-count.avro \
  -output tmp \
  -libjar avro-1.4.1.jar \
  -libjar avro-utils-1.5.3-SNAPSHOT.jar \
  -inputformat foo.bar.UnknownClass \
  -python /home/sites/sci-env/0.0.5/bin/python
INFO: buffersize = 168960
=> No error: tmp was created, but it contains the raw content of the binary Avro file, because the file was read as text.
Is it a limitation of Dumbo that the '-inputformat' option works only in Hadoop mode, or is it a bug?
Thanks,
jeff
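A quick way to confirm that a local run was fed raw Avro bytes is to check the input for the Avro object container file magic, which is the four bytes `O`, `b`, `j`, 0x01 at the start of the file. The sketch below is only an illustration: the helper name `looks_like_avro` is hypothetical and is not part of Dumbo or the Avro library.

```python
AVRO_MAGIC = b"Obj\x01"  # header of an Avro object container file


def looks_like_avro(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

If `looks_like_avro("word-count.avro")` returns True, the file is a binary container, and reading it as text lines will produce the kind of output described above.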