I was running a job that wrote its output to 'twoo/flowanalysis/2012/09/*', but this causes problems: when dumbo runs the HDFS (re)move operations (with overwrite="yes", for instance), it doesn't escape the glob characters properly, which results in an error.
See the output below:
12/09/06 14:00:00 INFO streaming.StreamJob: map 100% reduce 100%
12/09/06 14:00:32 INFO streaming.StreamJob: Job complete: job_201208201604_77368
12/09/06 14:00:32 INFO streaming.StreamJob: Output: twoo/flowanalysis/2012/09/__pre1
Moved to trash: hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1
Moved to trash: hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/04
EXEC: HADOOP_CLASSPATH="/home/jeroen/mm.metrics/jars/tusks.jar:$HADOOP_CLASSPATH" /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u4.jar -outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat' -inputformat 'org.apache.hadoop.streaming.AutoInputFormat' -mapper 'python -m base64_users map 1 629145600' -reducer 'python -m base64_users red 1 629145600' -numReduceTasks '60' -file '/home/poison/scripts/base64_users.py' -file '/usr/lib/dumbo/eggs/ctypedbytes-0.1.8-py2.6-linux-x86_64.egg' -file '/usr/lib/dumbo/lib/python2.6/site-packages/dumbo-0.21.34-py2.6.egg' -file '/usr/lib/dumbo/lib/python2.6/site-packages/typedbytes-0.3.8-py2.6.egg' -file '/home/jeroen/mm.metrics/jars/tusks.jar' -output 'twoo/flowanalysis/2012/09/_' -jobconf 'stream.map.input=typedbytes' -jobconf 'stream.reduce.input=typedbytes' -jobconf 'stream.map.output=typedbytes' -jobconf 'stream.reduce.output=typedbytes' -jobconf 'mapred.job.name=base64_users.py (2/2)' -input 'twoo/flowanalysis/2012/09/__pre1' -cmdenv 'dumbo_mrbase_class=dumbo.backends.common.MapRedBase' -cmdenv 'dumbo_jk_class=dumbo.backends.common.JoinKey' -cmdenv 'dumbo_runinfo_class=dumbo.backends.streaming.StreamingRunInfo' -cmdenv 'PYTHON_EGG_CACHE=/tmp/eggcache' -cmdenv 'PYTHONPATH=ctypedbytes-0.1.8-py2.6-linux-x86_64.egg:dumbo-0.21.34-py2.6.egg:typedbytes-0.3.8-py2.6.egg'
12/09/06 14:00:33 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
packageJobJar: [/home/poison/scripts/base64_users.py, /usr/lib/dumbo/eggs/ctypedbytes-0.1.8-py2.6-linux-x86_64.egg, /usr/lib/dumbo/lib/python2.6/site-packages/dumbo-0.21.34-py2.6.egg, /usr/lib/dumbo/lib/python2.6/site-packages/typedbytes-0.3.8-py2.6.egg, /home/jeroen/mm.metrics/jars/tusks.jar, /tmp/hadoop-poison/hadoop-unjar6857496811056060599/] [] /tmp/streamjob4826297620725394446.jar tmpDir=null
12/09/06 14:00:33 INFO mapred.JobClient: Cleaning up the staging area hdfs://hadoopname02/tmp/hadoop-mapred/mapred/staging/poison/.staging/job_201208201604_77492
12/09/06 14:00:33 ERROR security.UserGroupInformation: PriviledgedActionException as:poison (auth:SIMPLE) cause:org.apache.hadoop.mapred.InvalidInputException: Input Pattern hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1 matches 0 files
12/09/06 14:00:33 ERROR streaming.StreamJob: Error Launching job : Input Pattern hdfs://hadoopname02/user/poison/twoo/flowanalysis/2012/09/__pre1 matches 0 files
Streaming Command Failed!
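The '*' in the output path is being treated as a glob pattern rather than a literal character, so the pre-existing '__pre1' intermediate directory gets matched and moved to trash along with the real targets. A minimal sketch of a possible workaround (escape_hdfs_glob is a hypothetical helper, not part of dumbo's API) that escapes glob metacharacters before the path is handed to the HDFS move/remove commands:

```python
import re

def escape_hdfs_glob(path):
    """Backslash-escape the glob metacharacters that Hadoop's path
    globbing recognizes (* ? [ ] { }), so a literal '*' in a user-supplied
    path is not expanded as a pattern by `hadoop fs -rmr` / `-mv`."""
    return re.sub(r'([*?\[\]{}])', r'\\\1', path)

escaped = escape_hdfs_glob('twoo/flowanalysis/2012/09/*')
print(escaped)  # → twoo/flowanalysis/2012/09/\*
```

With something like this applied before dumbo issues its overwrite cleanup, only the literal path would be removed, and the '__pre1' intermediate input would survive until the second streaming pass reads it.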
By the way, thanks a lot for this great contribution. I use it almost every day and it works like a charm. I really like using Hadoop streaming in Python!
Nicolas