LinkedIn Gradle Plugin for Apache Hadoop

The LinkedIn Gradle Plugin for Apache Hadoop (which we shall refer to as simply the "Hadoop Plugin" for brevity) will help you more effectively build, test and deploy Hadoop applications.

In particular, the Plugin will help you easily work with Hadoop applications like Apache Pig and build workflows for Hadoop workflow schedulers such as Azkaban and Apache Oozie.

The Plugin includes the LinkedIn Gradle DSL for Apache Hadoop (which we shall refer to as simply the "Hadoop DSL" for brevity), a language for specifying jobs and workflows for Azkaban.
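
To give a flavor of the Hadoop DSL, here is a minimal sketch of a workflow declared in a .gradle file. The job name, Pig script path and build directory are illustrative examples only; see the Hadoop DSL Language Reference for the full syntax and the complete set of job types.

```groovy
// Illustrative Hadoop DSL sketch; names and paths are examples only.
hadoop {
  buildPath "azkaban"                  // directory where the compiled Azkaban files are written

  workflow('countByCountryFlow') {
    pigJob('countByCountry') {
      uses 'src/main/pig/count_by_country.pig'   // Pig script this job runs
    }
    targets 'countByCountry'           // the workflow's final target job
  }
}
```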

Hadoop Plugin User Guide

The Hadoop Plugin User Guide is available on the project's Wiki pages.

Hadoop DSL Language Reference

The Hadoop DSL Language Reference is available on the project's Wiki pages.

Getting the Hadoop Plugin

The Hadoop Plugin is published at plugins.gradle.org. The plugin page there includes a short snippet to add to your build.gradle file to start using the Hadoop Plugin.
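
The snippet below is only a sketch of what applying the plugin typically looks like; the Maven coordinates, plugin id and version shown here are placeholders, so take the authoritative values from the plugin page on plugins.gradle.org.

```groovy
// Sketch only: replace the coordinates and plugin id with the values
// shown on the Hadoop Plugin page at plugins.gradle.org.
buildscript {
  repositories {
    maven { url "https://plugins.gradle.org/m2/" }
  }
  dependencies {
    classpath "com.linkedin.hadoop-plugin:hadoop-plugin:<version>"   // placeholder coordinates
  }
}

apply plugin: 'com.linkedin.hadoop-plugin'   // placeholder plugin id
```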

Can I Benefit from the Hadoop Plugin and Hadoop DSL?

You must use Gradle as your build system to use the Hadoop Plugin. If you are using Azkaban, you should start using the Hadoop Plugin immediately and develop all of your Azkaban workflows with the Hadoop DSL.

If you are using Apache Pig, the Plugin includes features that will statically validate your Pig scripts, saving you time by finding errors at build time instead of when you run your Pig script.

If you run Apache Pig or Apache Spark on a Hadoop cluster through a gateway node, the Plugin includes tasks that will automate the process of launching your Pig or Spark jobs on the gateway, without you having to manually copy your code and dependencies to the gateway first.

If you are using Gradle and you feel that you might benefit from any of the above features, consider using the Hadoop Plugin and the Hadoop DSL.

Example Project

We have added an Example Project that uses the Hadoop Plugin and DSL to build an example Azkaban workflow consisting of Apache Pig, Apache Hive and Java Map-Reduce jobs.

Apache Oozie Status

The Hadoop Plugin includes Gradle tasks for Apache Oozie, including the ability to upload versioned directories to HDFS, as well as Gradle tasks for issuing Oozie commands. If you are using Gradle as your build system and Apache Oozie as your Hadoop workflow scheduler, you might find the Hadoop Plugin useful. However, since we are no longer actively using Oozie at LinkedIn, the Oozie tasks may fall into a non-working state.

Although we started on a Hadoop DSL compiler for Oozie, we did not complete it, and it is not currently in a usable form. We are not working on it and it is unlikely ever to be completed.

Recent News

  • May 2017 We have added an Example Project that uses the Hadoop Plugin and DSL
  • April 2016 We have made a refresh of the User Guide and Hadoop DSL Language Reference Wiki pages
  • January 2016 The Hadoop Plugin is now published on plugins.gradle.org
  • November 2015 Gradle version bumped to 2.7 and the Gradle daemon enabled - tests run much, much faster
  • August 2015 Initial pull requests for Oozie versioned deployments and the Oozie Hadoop DSL compiler have been merged
  • August 2015 The Hadoop Plugin and Hadoop DSL were released on Github! See the LinkedIn Engineering Blog post for the announcement!
  • July 2015 See our talk at the Gradle Summit

Project Structure

The project structure is set up as follows:

  • azkaban-client: Code to work with Azkaban via the Azkaban REST API
  • example-project: Example project that uses the Hadoop Plugin and DSL to build an example Azkaban workflow
  • hadoop-jobs: Code for re-usable Hadoop jobs and implementations of Hadoop DSL job types
  • hadoop-plugin: Code for the various plugins that comprise the Hadoop Plugin
  • hadoop-plugin-test: Test cases for the Hadoop Plugin
  • li-hadoop-plugin: LinkedIn-specific extensions to the Hadoop Plugin
  • li-hadoop-plugin-test: Test cases for the LinkedIn-specific extensions to the Hadoop Plugin

Although the li-hadoop-plugin code is generally specific to LinkedIn, it is included in the project to show you how to use subclassing to extend the core functionality of the Hadoop Plugin for your organization (and to make sure our open-source contributions don't break the LinkedIn customizations).
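
As a rough sketch of what such an extension can look like, an organization might subclass the core plugin and layer its own defaults on top. The class names and import path below are illustrative only, not the actual hadoop-plugin or li-hadoop-plugin API; check the plugin sources for the real classes and packages.

```groovy
// Hypothetical sketch of an organization-specific extension of the core plugin.
import org.gradle.api.Project
import com.linkedin.gradle.hadoop.HadoopPlugin   // illustrative import path

class MyCompanyHadoopPlugin extends HadoopPlugin {
  @Override
  void apply(Project project) {
    super.apply(project)   // apply the core Hadoop Plugin first
    // Layer your organization's defaults on top here, e.g. configure the
    // gateway host used by the Pig and Spark tasks or add in-house repositories.
  }
}
```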

Building and Running Test Cases

To build the Plugin and run the test cases, run ./gradlew build from the top-level project directory.

To see all the test tasks, run ./gradlew tasks from the top-level project directory. You can run an individual test with ./gradlew test_testName. You can also run multiple tests by running ./gradlew test_testName1 ... test_testNameN.