Wonderdog fails in Pig 0.10? #6

Closed
rjurney opened this Issue Jul 5, 2012 · 1 comment


rjurney commented Jul 5, 2012

---------- Forwarded message ----------
From: Russell Jurney <russell.jurney@gmail.com>
Date: Fri, Jun 22, 2012 at 4:05 PM
Subject: Weird problem in Pig 0.10 with STORE-ing JSON and then LOADing it as PigStorage chararray
To: user@pig.apache.org

This script has worked in the past:

/* Load Avro emails */
emails = load '/me/tmp/emails_big' using AvroStorage();
emails = filter emails by message_id IS NOT NULL;

/* JSONify the emails for ElasticSearch */
store emails into '/tmp/emails.json' using JsonStorage();

/* LOAD JSON as single field for storage in ElasticSearch with Wonderdog */
json_emails = load '/tmp/emails.json' using PigStorage() AS (json_record:chararray);
store json_emails into 'es://email/email?id=message_id&json=true&size=1000' using ElasticSearch();

Now I get this error:

grunt> json_emails = load '/tmp/emails.json' AS (json_record:chararray);
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
2012-06-22 15:45:34,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1031: Incompatable schema: left is "json_record:chararray", right is "message_id:chararray,thread_id:chararray,in_reply_to:chararray,subject:chararray,body:chararray,date:chararray,froms:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},ccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},bccs:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)},reply_tos:bag{ARRAY_ELEM:tuple(real_name:chararray,address:chararray)}"
at org.apache.pig.newplan.logical.relational.LogicalSchema.merge(LogicalSchema.java:760)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:114)
at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:219)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1635)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I tried copying /tmp/emails.json to /tmp/json_emails and loading the copy, but that doesn't work. I also tried calling PigStorage() explicitly, and that doesn't work either.

How am I supposed to pull this off?

I figured it out:

grunt> rm /tmp/emails.json/.pig_header
grunt> rm /tmp/emails.json/.pig_schema

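Then I can load my JSON as chararray. Interesting problem: JsonStorage writes .pig_header and .pig_schema into the output directory next to the part files, and Pig 0.10 apparently reads .pig_schema when loading that directory, then fails trying to merge the stored multi-field schema with the declared single json_record:chararray field. That would also explain why copying the directory didn't help, since the copy carries the same hidden files.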
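The same cleanup can be scripted. The helper below, `strip_pig_metadata`, is a hypothetical sketch, not part of Pig or Wonderdog, and it assumes the store() output directory is on a local filesystem (for example, Pig running in local mode). On HDFS, the grunt `rm` commands above or `hadoop fs -rm` do the same job.

```python
import os

# Side files written next to the part files; the names come from the
# grunt workaround above.
PIG_METADATA_FILES = (".pig_header", ".pig_schema")


def strip_pig_metadata(output_dir):
    """Delete Pig metadata side files from a local store() output directory.

    Hypothetical helper: assumes output_dir is a local path. Part files are
    left untouched. Returns the names of the files that were actually removed.
    """
    removed = []
    for name in PIG_METADATA_FILES:
        path = os.path.join(output_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

For example, `strip_pig_metadata("/tmp/emails.json")` would remove both side files before the PigStorage() load runs; calling it again is harmless and returns an empty list.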

Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com


rjurney commented Jul 7, 2012

Fixed by #8

mrflip closed this Mar 2, 2013
