
Pigitos

About

Pigitos is a set of tiny but highly useful Java UDFs (user-defined functions) for Apache Pig.

Contents

UDFs for manipulating maps

Pigitos provides UDFs for manipulating maps: calculating the size of a map, or retrieving its keys, values, or key/value pairs as a bag. These UDFs are especially useful when working with Apache HBase tables whose column qualifiers are created dynamically and themselves hold meaningful information that you want to process.

It seems that neither Apache Pig itself nor the Piggybank library provides such UDFs. I have found only UDFs like TOBAG or TOTUPLE, but they do not take a map as an input parameter.

Currently, Pigitos contains the following UDFs:

  • MapSize – takes a map and returns the number of entries in the map
  • MapKeysToBag – takes a map and produces a bag that contains all keys from that map
  • MapValuesToBag – takes a map and produces a bag that contains all values from that map
  • MapEntriesToBag – takes a map and produces a bag of tuples, where each tuple consists of two fields, key and value, and corresponds to one key/value pair from the map
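
To make the semantics of the four UDFs concrete, here is a minimal plain-Java sketch of what each one computes. It is not the Pigitos source: the class name MapUdfSketch and the method names are hypothetical, a Pig bag is modelled as a List, and a tuple as a Map.Entry. A real UDF would presumably extend Pig's EvalFunc and return Pig data types instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the actual Pigitos implementation.
public class MapUdfSketch {

    // MapSize: the number of entries in the map.
    public static long mapSize(Map<String, Object> map) {
        return map.size();
    }

    // MapKeysToBag: all keys of the map, collected into a "bag" (a List here).
    public static List<String> mapKeysToBag(Map<String, Object> map) {
        return new ArrayList<>(map.keySet());
    }

    // MapValuesToBag: all values of the map, collected into a "bag".
    public static List<Object> mapValuesToBag(Map<String, Object> map) {
        return new ArrayList<>(map.values());
    }

    // MapEntriesToBag: one (key, value) "tuple" per map entry.
    public static List<Map.Entry<String, Object>> mapEntriesToBag(Map<String, Object> map) {
        List<Map.Entry<String, Object>> bag = new ArrayList<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            // Copy the entry so the bag does not depend on the source map.
            bag.add(new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
        }
        return bag;
    }

    public static void main(String[] args) {
        // Made-up sample data: friend username -> some value stored in the cell.
        Map<String, Object> friendMap = new LinkedHashMap<>();
        friendMap.put("alice", "2012-01-05");
        friendMap.put("bob", "2012-03-17");

        System.out.println(mapSize(friendMap));
        System.out.println(mapKeysToBag(friendMap));
        System.out.println(mapValuesToBag(friendMap));
        System.out.println(mapEntriesToBag(friendMap));
    }
}
```

Note that a Pig bag is an unordered collection, so the ordering that a List implies in this sketch should not be relied on in an actual script.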

Here is a quick example:

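-- Assumes the Pigitos jar is registered and MapKeysToBag is DEFINEd (or referenced by its fully qualified class name).
-- Load each row key plus the whole 'friend' column family as a map (qualifier -> value).
User = LOAD 'hbase://user'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('friend:*', '-loadKey true')
  AS (username:chararray, friendMap:map[]);
-- Emit one (username, friendUsername) row per key in the user's friend map.
UserFriend = FOREACH User
  GENERATE username, FLATTEN(MapKeysToBag(friendMap)) AS friendUsername;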
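
As a rough illustration of what the script above yields, the following plain-Java sketch mimics the FOREACH ... FLATTEN step on in-memory data. The class FlattenSketch, the method flattenFriends, and the sample usernames are all made up for illustration; no Pig or HBase APIs are involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the example's result; not part of Pigitos.
public class FlattenSketch {

    // For each (username, friendMap) row, emit one {username, friendUsername}
    // pair per key in the map. A user whose map is empty produces no rows,
    // which matches how FLATTEN treats an empty bag in Pig.
    public static List<String[]> flattenFriends(Map<String, Map<String, Object>> users) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> user : users.entrySet()) {
            for (String friendUsername : user.getValue().keySet()) {
                rows.add(new String[] {user.getKey(), friendUsername});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the HBase 'user' table.
        Map<String, Object> aliceFriends = new LinkedHashMap<>();
        aliceFriends.put("bob", "");
        aliceFriends.put("carol", "");

        Map<String, Map<String, Object>> users = new LinkedHashMap<>();
        users.put("alice", aliceFriends);
        users.put("dave", new LinkedHashMap<>()); // no friends, so no output rows

        for (String[] row : flattenFriends(users)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```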

Acknowledgements

Pigitos is primarily developed at the Centre for Open Science (CeON) at the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw (UW).