A Pig Latin script that I used to process 0.5 TB of data (the billion triple dataset) using Amazon Elastic MapReduce. This was one of the optional assignments in the Introduction to Data Science course on Coursera.
What this script does: reads RDF data from the billion triple dataset, counts the triples for each subject, groups the subjects by that count, and outputs a histogram showing, for each count, how many subjects have it.
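
To illustrate the computation, here is a minimal in-memory sketch in Python. It is not the Pig script itself, only the same two-step aggregation on a handful of made-up triples. The function names (`subject_of`, `count_histogram`) and the sample data are hypothetical, and it assumes the subject is the first whitespace-delimited token on each line.

```python
from collections import Counter


def subject_of(line):
    # Assumption: the subject is the first whitespace-delimited token.
    return line.split(None, 1)[0]


def count_histogram(lines):
    # Step 1: count how many triples each subject appears in.
    per_subject = Counter(subject_of(line) for line in lines if line.strip())
    # Step 2: count how many subjects share each triple count.
    histogram = Counter(per_subject.values())
    # Return (triple_count, number_of_subjects) pairs, ordered by count.
    return sorted(histogram.items())


# Made-up sample data, for illustration only.
sample = [
    "<s1> <p> <o1> .",
    "<s1> <p> <o2> .",
    "<s2> <p> <o1> .",
    "<s3> <p> <o1> .",
    "<s3> <p> <o2> .",
    "<s4> <p> <o3> .",
]

for triple_count, n_subjects in count_histogram(sample):
    print(triple_count, n_subjects)
```

In Pig the same idea would be expressed as two group-and-count passes, which is what lets it run as MapReduce jobs over data far too large to hold in memory as this sketch does.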