Hadoop library for large-scale data processing, now an Apache Incubator project

.settings Support for embedding Pig scripts in Java test files June 26, 2013
cobertura Initial commit September 22, 2011
contrib Hourglass: GenerateIds should allow optional id range December 17, 2013
eclipselibs Support for embedding Pig scripts in Java test files June 26, 2013
examples Add examples January 28, 2013
ivy Package small subset of google's guava in DataFu November 26, 2013
licenses Initial commit September 22, 2011
plugin Add source of multiline java code July 03, 2013
src Add missing license headers January 02, 2014
staticlibs Merge with linkedin master July 01, 2013
test Add missing license headers January 02, 2014
tools Initial commit September 22, 2011
.classpath.template Fix eclipse .classpath November 26, 2013
.factorypath.template Support for embedding Pig scripts in Java test files June 26, 2013
.gitignore Updating other directories in gitignore file to start with a slash October 30, 2013
.project Update eclipse files January 21, 2013
.travis.yml Parallelize travis build January 02, 2014
CONTRIBUTORS Add Jarcec to contributors October 23, 2013
LICENSE Initial commit September 22, 2011
NOTICE Update NOTICE December 17, 2013
README.md Update README January 23, 2014
build.xml Released 1.2.0 of datafu, hence bumping version December 06, 2013
changes.md Update changes for version 1.2.0 December 06, 2013
check-license-headers.sh Update file headers November 25, 2013
ivy.xml Package small subset of google's guava in DataFu November 26, 2013
ivysettings.xml Add classes to tools resolver for mac os July 01, 2013
releasing.md Move release notes to separate file December 16, 2013
settings.xml.template Update README, add settings.xml template January 25, 2013
test.sh Clean up test classpaths November 25, 2013
test_in_background.sh Clean up test classpaths November 25, 2013
README.md

Apache DataFu

Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics.

It consists of two libraries:

  • Apache DataFu Pig: a collection of user-defined functions for Apache Pig
  • Apache DataFu Hourglass: an incremental processing framework for Apache Hadoop in MapReduce

DataFu is currently undergoing incubation with Apache. A mirror of the official git repository can be found on GitHub at https://github.com/apache/incubator-datafu.

For more information please visit the website:

If you'd like to jump in and get started, check out the corresponding guides for each library:

Blog Posts

Presentations

Papers

Getting Help

Bugs and feature requests can be filed here. For other help please see the discussion group.

Building the Code

The Apache DataFu Pig library can be built by running the command below. More information about working with the source code can be found in the DataFu Pig Contributing Guide.

ant jar

The Apache DataFu Hourglass library can be built by running the commands below. More information about working with the source code can be found in the DataFu Hourglass Contributing Guide.

cd contrib/hourglass
ant jar