The filenames don't get escaped in output #59

Open
poison opened this issue Sep 6, 2012 · 0 comments


poison commented Sep 6, 2012

I was running a job that wrote its output to 'twoo/flowanalysis/2012/09/*'. This causes problems: when dumbo runs the HDFS (re)move operations (with overwrite="yes", for instance), it doesn't escape the path properly, which results in an error.

See the output below:

12/09/06 14:00:00 INFO streaming.StreamJob: map 100% reduce 100%
12/09/06 14:00:32 INFO streaming.StreamJob: Job complete: job_201208201604_77368
12/09/06 14:00:32 INFO streaming.StreamJob: Output: twoo/flowanalysis/2012/09/__pre1
Moved to trash: hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1
Moved to trash: hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/04
EXEC: HADOOP_CLASSPATH="/home/jeroen/mm.metrics/jars/tusks.jar:$HADOOP_CLASSPATH" /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u4.jar -outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat' -inputformat 'org.apache.hadoop.streaming.AutoInputFormat' -mapper 'python -m base64_users map 1 629145600' -reducer 'python -m base64_users red 1 629145600' -numReduceTasks '60' -file '/home/poison/scripts/base64_users.py' -file '/usr/lib/dumbo/eggs/ctypedbytes-0.1.8-py2.6-linux-x86_64.egg' -file '/usr/lib/dumbo/lib/python2.6/site-packages/dumbo-0.21.34-py2.6.egg' -file '/usr/lib/dumbo/lib/python2.6/site-packages/typedbytes-0.3.8-py2.6.egg' -file '/home/jeroen/mm.metrics/jars/tusks.jar' -output 'twoo/flowanalysis/2012/09/_' -jobconf 'stream.map.input=typedbytes' -jobconf 'stream.reduce.input=typedbytes' -jobconf 'stream.map.output=typedbytes' -jobconf 'stream.reduce.output=typedbytes' -jobconf 'mapred.job.name=base64_users.py (2/2)' -input 'twoo/flowanalysis/2012/09/__pre1' -cmdenv 'dumbo_mrbase_class=dumbo.backends.common.MapRedBase' -cmdenv 'dumbo_jk_class=dumbo.backends.common.JoinKey' -cmdenv 'dumbo_runinfo_class=dumbo.backends.streaming.StreamingRunInfo' -cmdenv 'PYTHON_EGG_CACHE=/tmp/eggcache' -cmdenv 'PYTHONPATH=ctypedbytes-0.1.8-py2.6-linux-x86_64.egg:dumbo-0.21.34-py2.6.egg:typedbytes-0.3.8-py2.6.egg'
12/09/06 14:00:33 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/home/poison/scripts/base64_users.py, /usr/lib/dumbo/eggs/ctypedbytes-0.1.8-py2.6-linux-x86_64.egg, /usr/lib/dumbo/lib/python2.6/site-packages/dumbo-0.21.34-py2.6.egg, /usr/lib/dumbo/lib/python2.6/site-packages/typedbytes-0.3.8-py2.6.egg, /home/jeroen/mm.metrics/jars/tusks.jar, /tmp/hadoop-poison/hadoop-unjar6857496811056060599/] [] /tmp/streamjob4826297620725394446.jar tmpDir=null
12/09/06 14:00:33 INFO mapred.JobClient: Cleaning up the staging area hdfs://hadoopname02/tmp/hadoop-mapred/mapred/staging/poison/.staging/job_201208201604_77492
12/09/06 14:00:33 ERROR security.UserGroupInformation: PriviledgedActionException as:poison (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input Pattern hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1 matches 0 files
12/09/06 14:00:33 ERROR streaming.StreamJob: Error Launching job : Input Pattern hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1 matches 0 files
Streaming Command Failed!
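
For illustration, here is a rough sketch of the kind of escaping I mean. It is not dumbo's actual code: `escape_hdfs_glob` and `build_rm_args` are hypothetical helper names. It assumes the backslash escape (`\c`) that Hadoop documents for `FileSystem.globStatus` path patterns; whether the `hadoop fs` shell in a given Hadoop version honors it would need to be checked.

```python
import re

# Characters that carry special meaning in Hadoop path globs
# (assumption: the set documented for FileSystem.globStatus).
_GLOB_SPECIALS = re.compile(r'([*?\[\]{}\\])')


def escape_hdfs_glob(path):
    """Backslash-escape glob metacharacters so the path is taken literally."""
    return _GLOB_SPECIALS.sub(r'\\\1', path)


def build_rm_args(hadoop_bin, path):
    """Build an argument list for removing a single literal HDFS path.

    Hypothetical helper: returns a list rather than a command string so that
    no local shell gets a chance to expand or mangle the path.
    """
    return [hadoop_bin, 'fs', '-rmr', escape_hdfs_glob(path)]
```

A list like this could be passed to `subprocess.call` without `shell=True`, so only Hadoop's own glob handling is left to deal with.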

By the way, thanks a lot for this great contribution. I use it almost every day and it works like a charm. I really like using Hadoop streaming in Python!

Nicolas
