Pig on Apache Spark
Latest commit 6972363 Dec 1, 2014
This codebase works with Spark 1.1.0.


Apache Pig

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra style operations such as join, filter, and project; (2) functional-programming style operators such as map and reduce.
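
To make the two flavors of operations concrete, here is a minimal sketch in plain Python. It is not Pig Latin and says nothing about how Pig is implemented; the relations (`users`, `visits`) and helper names (`filter_rows`, `project`, `join`, `add_count`) are hypothetical and exist only for illustration. Each intermediate variable plays the role of one node in the dataflow graph.

```python
from functools import reduce

# Hypothetical in-memory "relations": lists of dicts standing in for tuples.
users = [
    {"id": 1, "name": "ann", "age": 34},
    {"id": 2, "name": "bob", "age": 17},
    {"id": 3, "name": "cy", "age": 52},
]
visits = [
    {"user_id": 1, "url": "/a"},
    {"user_id": 3, "url": "/b"},
    {"user_id": 3, "url": "/c"},
]


# (1) Relational-algebra style operations.
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def project(rows, fields):
    """Keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]


def join(left, right, left_key, right_key):
    """Inner join of two row lists on left[left_key] == right[right_key]."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]


# Each assignment below is one node of the dataflow graph.
adults = filter_rows(users, lambda r: r["age"] >= 18)
joined = join(adults, visits, "id", "user_id")
name_url = project(joined, ["name", "url"])

# (2) Functional-programming style operators: map each row to a
# (name, 1) pair, then reduce the pairs into per-user visit counts.
pairs = list(map(lambda r: (r["name"], 1), name_url))


def add_count(acc, pair):
    """Fold one (name, n) pair into the running count dictionary."""
    name, n = pair
    acc[name] = acc.get(name, 0) + n
    return acc


visits_per_user = reduce(add_count, pairs, {})
print(visits_per_user)
```

In this toy version every step runs eagerly in one process; the point is only the shape of the program, a chain of named transformations feeding one another.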

Pig compiles these dataflow programs into (sequences of) map-reduce or Apache Tez jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.

General Info

For the latest information about Pig, please visit our website at:

http://pig.apache.org/

and our wiki, at:

http://wiki.apache.org/pig/

Getting Started

  1. To learn about Pig, try http://wiki.apache.org/pig/PigTutorial
  2. To build and run Pig, try http://wiki.apache.org/pig/BuildPig and http://wiki.apache.org/pig/RunPig
  3. To check out the function library, try http://wiki.apache.org/pig/PiggyBank

Contributing to the Project

We welcome all contributions. For details, please visit http://wiki.apache.org/pig/HowToContribute.