Spring for Apache Hadoop is a framework for application developers to take advantage of the features of both Hadoop and Spring.

Spring for Apache Hadoop provides extensions to Spring, Spring Batch, and Spring Integration for building manageable and robust pipeline solutions around Hadoop.

Spring for Apache Hadoop extends Spring Batch with support for reading from and writing to HDFS, running various types of Hadoop jobs (Java MapReduce, Streaming, Hive, Pig), and interacting with HBase. An important goal is to let developers who do not work in Java be productive with Spring for Apache Hadoop, using the core feature set without writing any Java code.

Spring for Apache Hadoop also applies the familiar Spring programming model to Java MapReduce jobs by providing support for dependency injection of simple jobs as well as a POJO based MapReduce programming model that decouples your MapReduce classes from Hadoop specific details such as base classes and data types.


You can find out more details from the user documentation or by browsing the javadocs. If you have ideas about how to improve or extend the scope, please feel free to contribute.


  • Maven:

<!-- used for nightly builds -->
  <name>Springframework Maven SNAPSHOT Repository</name>

<!-- used for milestone/rc releases -->
  <name>Springframework Maven Milestone Repository</name>
  • Gradle:

Based on the artifact type, pick one of the repos below:

repositories {
  maven { url "" }
  maven { url "" }
  maven { url "" }
}

dependencies {
   compile "${version}"
}

The dependency shown above is the standard one; it includes the namespace support as well as core and batch support. If you don't use the namespace, you can use either the spring-data-hadoop-batch or the spring-data-hadoop-core artifact instead, depending on whether you use any of the batch features.

The available releases can be seen in the SpringSource Repository.


Spring for Apache Hadoop uses Gradle as its build system. To build the system simply run:


from the project root folder. This will compile the sources, run the tests and create the artifacts.

Supported distros

By default, Spring for Apache Hadoop compiles against the Apache Hadoop 2.2.x stable release (hadoop22) *.

The following distros and versions are also supported:

  • Apache Hadoop 1.2.x (hadoop12)
  • Pivotal HD 1.1 (phd1)
  • Cloudera CDH4 MR1 (cdh4, cdh4mr1)
  • Cloudera CDH4 YARN (cdh4yarn)
  • Cloudera CDH5 YARN (cdh5, cdh5yarn) *
  • Cloudera CDH5 MR1 (cdh5mr1)
  • Hortonworks HDP 1.3 (hdp13)
  • Hortonworks HDP 2.0 (hdp20) *

* The distributions marked with an asterisk include spring-yarn support in the build.

To compile against a specific distro version pass the -Pdistro=<label> project property, like so:

gradlew -Pdistro=hadoop12 build

Note that the chosen distro is displayed on the screen:

Using Apache Hadoop 1.2.x [1.2.1]

In this case, the specified Hadoop distribution (Apache Hadoop 1.2.x in the example above) is used to create the project binaries.
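As a rough sketch of how the distro labels above map to a build invocation, the following shell helper prints the gradlew command for a given label and rejects labels that are not in the list. The function name build_cmd is hypothetical and only illustrates the -Pdistro=<label> convention; it is not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: print the gradlew command for a distro label.
# The accepted labels are the ones listed in this README.
build_cmd() {
  case "$1" in
    hadoop12|hadoop22|phd1|cdh4|cdh4mr1|cdh4yarn|cdh5|cdh5yarn|cdh5mr1|hdp13|hdp20)
      echo "gradlew -Pdistro=$1 build"
      ;;
    *)
      # Unknown label: report on stderr and signal failure.
      echo "unknown distro label: $1" >&2
      return 1
      ;;
  esac
}

# Print the command for building against Cloudera CDH5 YARN.
build_cmd cdh5
```

Printing the command rather than running it keeps the sketch safe to try outside a checkout; in the project root the printed line could be executed directly.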

CI Builds

The status of the CI builds is available on the Status Summary Screen.

We are currently running tests against the following distributions:

  • Apache Hadoop 1.2.1
  • Apache Hadoop 2.2.0
  • Cloudera CDH4
  • Cloudera CDH5
  • Hortonworks HDP 1.3
  • Hortonworks HDP 2.0
  • Pivotal HD 1.1


For its testing, Spring for Apache Hadoop expects a pseudo-distributed/local Hadoop installation available on localhost, with HDFS configured on port 8020. The local Hadoop setup allows the project classpath to be automatically used by the Hadoop job tracker. These settings can be customized in two ways:

  • Build properties

From the command line, override the defaults with hd.fs for the file system, hd.jt for the jobtracker, hd.rm for the YARN resourcemanager, and hd.hive for the Hive host/port information. To avoid confusion, specify the protocol for hd.fs (such as 'hdfs://' or 's3://'); if none is specified, hdfs:// will be used. For example, to run against HDFS at dumbo:8020, use:

gradlew -Phd.fs=hdfs://dumbo:8020 build
  • Properties file

Through the file under the src/test/resources folder (further tweaks can be applied through the hadoop-ctx.xml file under src/test/resources/org/springframework/data/hadoop).
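The build-properties route described above can be sketched as a small shell snippet. It mirrors the documented rule that hdfs:// is assumed when hd.fs carries no protocol, then prints a command that overrides both the file system and the jobtracker. The helper normalize_fs is hypothetical, and the jobtracker address dumbo:8021 is an assumed value chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply the documented default of hdfs://
# when the file-system value has no protocol prefix.
normalize_fs() {
  case "$1" in
    *://*) echo "$1" ;;          # protocol already present, keep as-is
    *)     echo "hdfs://$1" ;;   # no protocol, assume HDFS
  esac
}

fs=$(normalize_fs "dumbo:8020")

# dumbo:8021 is an assumed jobtracker address, not a project default.
echo "gradlew -Phd.fs=$fs -Phd.jt=dumbo:8021 build"
```

Spelling out the protocol explicitly, as the text recommends, makes the helper unnecessary; it is shown only to make the fallback behaviour concrete.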

Enabling HBase/Hive/Pig/WebHdfs Tests

Note that by default only the vanilla Hadoop tests are run. You can enable additional tests (such as Hive or Pig) by adding the tasks enableHBaseTests, enableHiveTests, enablePigTests or enableWebHdfsTests (or enableAllTests for all of them) to the build. Use the properties file described above to customize the default locations of these services as well.
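To illustrate how the enable tasks might be combined with a build, the sketch below maps a short service name to the task named in this README and prints a command that enables the Hive and Pig tests. The enable_task helper and its service keys (hbase, hive, pig, webhdfs, all) are hypothetical, and placing the enable tasks before build on the same command line is an assumption about how they are meant to be used.

```shell
#!/bin/sh
# Hypothetical helper: map a service name to the enable task
# named in this README.
enable_task() {
  case "$1" in
    hbase)   echo enableHBaseTests ;;
    hive)    echo enableHiveTests ;;
    pig)     echo enablePigTests ;;
    webhdfs) echo enableWebHdfsTests ;;
    all)     echo enableAllTests ;;
    *)       echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Assumed usage: list the enable tasks ahead of the build task.
echo "gradlew $(enable_task hive) $(enable_task pig) build"
```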

Disabling test execution

You can disable all tests by skipping the test task:

gradlew -x test


Here are some ways for you to get involved in the community:

  • Get involved with the Spring community on the Spring Community Forums. Please help out on the forum by responding to questions and joining the debate.
  • Create JIRA tickets for bugs and new features and comment and vote on the ones that you are interested in.
  • Watch for upcoming articles on Spring by subscribing to

GitHub is for social coding: if you want to write code, we encourage contributions through pull requests from forks of this repository. If you want to contribute code this way, read the Spring Framework contributor guidelines.

Staying in touch

Follow the project team (Costin, Mark, Thomas) on Twitter.

In-depth articles can be found at the SpringSource team blog, and releases are announced via our news feed.
