A Pig to JSON UDF for Pig that converts tuples and bags to JSON strings

branch: master

Update README.md

latest commit 0789613730
Russell Jurney authored September 27, 2012
dist: Bugfix (September 26, 2012)
src: Ship it (September 26, 2012)
.gitignore: Committing for askign Thejas (September 21, 2012)
Pig-to-json.iml: Pushing what I got (September 06, 2012)
README.md: Update README.md (September 27, 2012)
build.xml: Pushing what I got (September 06, 2012)
ivy.xml: Pushing what I got (September 06, 2012)
test.pig: More examples (September 26, 2012)
README.md

pig-to-json

A UDF for Apache Pig that converts tuples and bags to JSON strings. The code is released under the Apache 2.0 license.

This project likely borrows from the following projects (I'm not sure; it's been a while since I started it, and I looked at lots of stuff):

https://github.com/danharvey/pigJsonUtils
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/JsonStorage.java
https://github.com/Ganglion/sounder/blob/master/udf/src/main/java/sounder/pig/json/ToJson.java

Building the Project

ant clean
ant dist

Using the ToJson UDF

The file test.pig is illustrative:

/* Load Avro jars and define shortcut */
register /me/Software/pig/build/ivy/lib/Pig/avro-1.5.3.jar
register /me/Software/pig/build/ivy/lib/Pig/json-simple-1.1.jar
register /me/Software/pig/contrib/piggybank/java/piggybank.jar
define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

register /me/Software/pig-to-json/dist/lib/pig-to-json.jar

-- Available at https://s3.amazonaws.com/rjurney_public_web/hadoop/enron.avro
emails = load '/me/Data/enron.avro' using AvroStorage();
emails = limit emails 10;
json_test = foreach emails generate message_id, com.hortonworks.pig.udf.ToJson(tos) as bag_json;
dump json_test;

emails2 = load '/me/Data/enron.avro' using AvroStorage();
emails2 = limit emails2 10;
json_test2 = foreach emails2 generate message_id, com.hortonworks.pig.udf.ToJson(from) as tuple_json;
dump json_test2;
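
To give a rough idea of the kind of conversion the script above relies on, here is a minimal, dependency-free Java sketch. It is not the code of the ToJson UDF in this repository. It assumes that a tuple with named fields maps to a JSON object and a bag maps to a JSON array of such objects, and it stands in for Pig's tuple and bag types with plain `Map` and `List` values. The class name `ToJsonSketch` and the field names `address` and `name` are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: NOT the repository's ToJson implementation.
// Assumption: tuple -> JSON object (modeled as Map), bag -> JSON array (modeled as List).
public class ToJsonSketch {

    // Wrap a string in double quotes and escape the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Recursively serialize nulls, numbers, booleans, maps, lists, and strings.
    // NaN and infinite numbers are not handled; they have no JSON representation.
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                parts.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(toJson(item));
            }
            return "[" + String.join(",", parts) + "]";
        }
        // Anything else is treated as a string.
        return quote(value.toString());
    }

    public static void main(String[] args) {
        // A hypothetical "from" tuple with two named fields.
        Map<String, Object> from = new LinkedHashMap<>();
        from.put("address", "alice@example.com");
        from.put("name", "Alice");

        // A hypothetical "tos" bag holding two tuples.
        Map<String, Object> to1 = new LinkedHashMap<>();
        to1.put("address", "bob@example.com");
        to1.put("name", "Bob");
        Map<String, Object> to2 = new LinkedHashMap<>();
        to2.put("address", "carol@example.com");
        to2.put("name", null);

        System.out.println(toJson(from));
        System.out.println(toJson(Arrays.asList(to1, to2)));
    }
}
```

Running `main` prints one JSON object for the tuple and one JSON array for the bag. A `LinkedHashMap` is used so that fields come out in insertion order, which mirrors the fixed field order of a tuple schema. The actual UDF may format its output differently.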