This repository has been archived by the owner on Nov 4, 2019. It is now read-only.

UDF to work around loading data from CqlStorage into Pig

iamthechad/cqlstorage-udf

Loading Cassandra data using Pig and CqlStorage

The Problem

Currently, CqlStorage has an apparent issue with loading CQL3 table data. This problem is seen with Cassandra 1.2.8 and Pig 0.11.1.

The following JIRAs are open for this issue:

Loading a data structure seems to work:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

results in this:

data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, DUMPing the data gets results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. The schema generated by CqlStorage() is different: operating on the data per the schema yields wrong results, and operating on it per the actual structure causes runtime errors.

This UDF is a temporary workaround until the issue is solved.
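
To make the mismatch concrete, here is a minimal plain-Java sketch of the idea behind the workaround. It is not the UDF's source and does not use the Pig API: the class `CqlColumnSketch` and its methods are hypothetical names, and each column is modeled as a simple (name, value) pair, which is an assumption based on the DUMP output above. The sketch only shows how dropping the column name from each pair gives values that line up with the schema DESCRIBE reports.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the problem; not part of this repository or of Pig.
class CqlColumnSketch {

    // Take one (name, value) pair and keep only the value.
    public static Object unwrap(Map.Entry<String, Object> column) {
        return column.getValue();
    }

    // Unwrap every column of a row, preserving column order.
    public static List<Object> flatten(List<Map.Entry<String, Object>> row) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, Object> column : row) {
            values.add(unwrap(column));
        }
        return values;
    }

    // The row from the DUMP output above, as (name, value) pairs.
    public static List<Map.Entry<String, Object>> sampleRow() {
        List<Map.Entry<String, Object>> row = new ArrayList<>();
        row.add(new SimpleEntry<String, Object>("isbn", "0425093387"));
        row.add(new SimpleEntry<String, Object>("bookauthor", "Georgette Heyer"));
        row.add(new SimpleEntry<String, Object>("booktitle", "Death in the Stocks"));
        row.add(new SimpleEntry<String, Object>("publisher", "Berkley Pub Group"));
        row.add(new SimpleEntry<String, Object>("yearofpublication", 1986));
        return row;
    }

    public static void main(String[] args) {
        // Prints the bare values, matching the shape of the reported schema.
        System.out.println(flatten(sampleRow()));
    }
}
```

In the sketch, the fifth value stays an Integer after unwrapping, mirroring the `yearofpublication: int` field in the reported schema.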

Using this UDF

Run mvn package to generate the jar file. Place the jar somewhere your Pig script can access it, and modify your Pig script like this:

-- Register the UDF
REGISTER /path/to/cqlstorageudf-1.0-SNAPSHOT.jar;

-- FromCqlColumn will convert chararray, int, long, float, double
DEFINE FromCqlColumn com.megatome.pig.piggybank.tuple.FromCqlColumn();

-- Load data as normal
data_raw = LOAD 'cql://bookcrossing/books' USING CqlStorage();

-- Use the UDF
data = FOREACH data_raw GENERATE
    FromCqlColumn(isbn) AS ISBN,
    FromCqlColumn(bookauthor) AS BookAuthor,
    FromCqlColumn(booktitle) AS BookTitle,
    FromCqlColumn(publisher) AS Publisher,
    FromCqlColumn(yearofpublication) AS YearOfPublication;
    
-- Process data as desired
