This repository has been archived by the owner on Jul 17, 2019. It is now read-only.

Useful reusable pipeline components for Crunch jobs

spotify/crunch-lib

crunch-lib

This repository contains reusable high-level components for common use cases in processing data with Apache Crunch.

If you want to try it, it is published to the central Maven repository, so you can use this snippet (or the equivalent for Gradle, sbt, and so on):

<dependency>
   <groupId>com.spotify.crunch</groupId>
   <artifactId>crunch-lib</artifactId>
   <version>0.0.5</version>
</dependency>

AvroCollections

  • extract pulls out individual fields from a PCollection of Avro records by their field names without the need for trivial MapFns
  • keyByAvroField keys a PCollection of Avro records by a specific field using its name without the need for trivial MapFns

SPTables

  • swapKeyValue swaps the key and the value parts of a PTable
  • negateCounts negates the value part of a long-valued table, so that an ordinary ascending sort produces descending order
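
The idea behind combining these two helpers can be sketched in plain Java, without Crunch. This is only an illustration under stated assumptions: the class `NegateCountsSketch` and the method `sortDescending` are hypothetical names, and a `java.util.Map` stands in for a PTable. It is not the library's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: a plain Map plays the role of a long-valued PTable.
public class NegateCountsSketch {

    // Swap key and value and negate the count, so that an ascending sort on
    // the new key visits the largest counts first.
    static List<String> sortDescending(Map<String, Long> counts) {
        // TreeMap iterates its keys in ascending order; the keys are negated counts.
        TreeMap<Long, List<String>> byNegatedCount = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byNegatedCount.computeIfAbsent(-e.getValue(), k -> new ArrayList<>())
                          .add(e.getKey());
        }
        List<String> ordered = new ArrayList<>();
        for (List<String> items : byNegatedCount.values()) {
            ordered.addAll(items);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("rock", 3L);
        counts.put("jazz", 10L);
        counts.put("pop", 7L);
        System.out.println(sortDescending(counts)); // [jazz, pop, rock]
    }
}
```

Negating the count avoids the need for a custom descending comparator, which is presumably why the two helpers are useful together in a sort step.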

TopLists

  • topNYbyX creates a top list of elements in the provided PTable, categorised by the key of the input table and ranked by how often each value occurs under that key
  • globalTopList creates a list of the unique items in the input collection with their counts, sorted in descending order of frequency
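
As a rough illustration of what a per-key top list computes, here is a plain-Java sketch. It assumes string keys and values, breaks ties alphabetically, and uses the hypothetical names `TopListSketch`, `topNPerKey` and `pair`; none of these come from the library, and the real implementation runs as a distributed Crunch pipeline.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in: a list of (key, value) pairs plays the role of a PTable.
public class TopListSketch {

    static Map.Entry<String, String> pair(String key, String value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // For each key, count how often each value occurs and keep the n most frequent.
    static Map<String, List<String>> topNPerKey(List<Map.Entry<String, String>> pairs, int n) {
        // key -> (value -> number of occurrences)
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            counts.computeIfAbsent(p.getKey(), k -> new HashMap<>())
                  .merge(p.getValue(), 1L, Long::sum);
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : counts.entrySet()) {
            List<String> top = e.getValue().entrySet().stream()
                // Highest count first; ties broken alphabetically (an assumption).
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
            result.put(e.getKey(), top);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> plays = Arrays.asList(
            pair("SE", "abba"), pair("SE", "abba"), pair("SE", "robyn"),
            pair("US", "prince"));
        System.out.println(topNPerKey(plays, 1)); // {SE=[abba], US=[prince]}
    }
}
```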

Averages

  • meanValue calculates the mean value for each key in the provided numerically-valued PTable
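
A per-key mean can be sketched in plain Java by keeping a running sum and count for each key. This is an illustrative sketch only: `MeanValueSketch`, `meanByKey` and `pair` are hypothetical names, a list of pairs stands in for a PTable, and the library's actual implementation may differ.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: a list of (key, number) pairs plays the role of a PTable.
public class MeanValueSketch {

    static Map.Entry<String, Double> pair(String key, double value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    static Map<String, Double> meanByKey(List<Map.Entry<String, Double>> pairs) {
        // key -> {running sum, number of values seen}
        Map<String, double[]> acc = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            double[] a = acc.computeIfAbsent(p.getKey(), k -> new double[2]);
            a[0] += p.getValue();
            a[1] += 1;
        }
        Map<String, Double> means = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> durations = Arrays.asList(
            pair("a", 1.0), pair("a", 2.0), pair("a", 3.0), pair("b", 10.0));
        System.out.println(meanByKey(durations)); // {a=2.0, b=10.0}
    }
}
```

Carrying a (sum, count) pair rather than a partial mean is what makes this kind of aggregate safe to combine across partitions.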

Percentiles

  • distributed / inMemory calculate a set of percentiles for each key in the provided numerically-valued PTable
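
To make the per-key percentile computation concrete, here is a small in-memory sketch in plain Java. It assumes the nearest-rank definition of a percentile, which may not be the definition the library uses, and the names `PercentilesSketch`, `percentile` and `percentilesByKey` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: a Map from key to values plays the role of a grouped PTable.
public class PercentilesSketch {

    // Nearest-rank percentile (an assumption): the value at 1-based rank
    // ceil(p * n) in the sorted list, for p in (0, 1].
    static double percentile(List<Double> sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Sorts each key's values in memory, then reads off the requested percentiles.
    static Map<String, List<Double>> percentilesByKey(Map<String, List<Double>> valuesByKey,
                                                      double... ps) {
        Map<String, List<Double>> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : valuesByKey.entrySet()) {
            List<Double> sorted = new ArrayList<>(e.getValue());
            Collections.sort(sorted);
            List<Double> out = new ArrayList<>();
            for (double p : ps) {
                out.add(percentile(sorted, p));
            }
            result.put(e.getKey(), out);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> values = new HashMap<>();
        values.put("plays", Arrays.asList(40.0, 10.0, 30.0, 20.0));
        System.out.println(percentilesByKey(values, 0.25, 0.5, 0.75));
        // {plays=[10.0, 20.0, 30.0]}
    }
}
```

Sorting every value for a key in memory only works when each key's values fit on one machine, which is presumably the trade-off between the inMemory and distributed variants.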

DoFns

  • detach wraps a DoFn operating as a reducer such that each value given by the Iterable is already detached (preventing object reuse problems)
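
The object reuse problem mentioned here can be reproduced in plain Java. The sketch below is an illustration under assumptions, not the library's code: `DetachSketch`, `Box`, `reusing`, `detached` and `collectThenRead` are hypothetical names, and the reusing Iterable imitates a framework that hands out the same mutable object for every value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

public class DetachSketch {

    // Mutable holder standing in for a record object that a framework reuses.
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // An Iterable that hands out the SAME Box instance for every element.
    static Iterable<Box> reusing(int... values) {
        return () -> new Iterator<Box>() {
            private final Box shared = new Box(0);
            private int i = 0;
            public boolean hasNext() { return i < values.length; }
            public Box next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = values[i++];
                return shared;
            }
        };
    }

    // Wraps an Iterable so that each element is copied before it is handed out.
    static <T> Iterable<T> detached(Iterable<T> source, UnaryOperator<T> copy) {
        return () -> {
            Iterator<T> it = source.iterator();
            return new Iterator<T>() {
                public boolean hasNext() { return it.hasNext(); }
                public T next() { return copy.apply(it.next()); }
            };
        };
    }

    // Holds on to every element first, then reads the values afterwards.
    static List<Integer> collectThenRead(Iterable<Box> boxes) {
        List<Box> kept = new ArrayList<>();
        for (Box b : boxes) kept.add(b);
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) out.add(b.value);
        return out;
    }

    public static void main(String[] args) {
        // Without detaching, every kept reference points at the last value.
        System.out.println(collectThenRead(reusing(1, 2, 3))); // [3, 3, 3]
        // With detaching, each kept element is an independent copy.
        System.out.println(collectThenRead(
            detached(reusing(1, 2, 3), b -> new Box(b.value)))); // [1, 2, 3]
    }
}
```

Any reducer that stores values from the Iterable for later use, rather than consuming them one at a time, is exposed to the first behaviour.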
