This repository has been archived by the owner on Apr 5, 2022. It is now read-only.

Add docs for scheduling and job listeners.
markpollack committed Feb 6, 2012
1 parent d4f593c commit bda3cde
Showing 1 changed file with 107 additions and 38 deletions.
145 changes: 107 additions & 38 deletions docs/src/reference/docbook/reference/dev-guide/dev-spring-hadoop.xml
@@ -1,39 +1,108 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
xml:id="dev-guidance">
<title>Guidance and Examples</title>
<para>In this Chapter we will provide some guidance on how you can use the Spring
Hadoop project in conjunction with other Spring projects, starting with the Spring Framework
itself, then Spring Batch, and then Spring Integration. </para>
<para>Spring Hadoop provides integration with the Spring Framework to create and run Hadoop
MapReduce, Hive, and Pig jobs as well as work with HDFS and HBase. If you have simple needs
to work with Hadoop, including basic scheduling, you can add the Spring Hadoop namespace to
your Spring based project and get going quickly using Hadoop.</para>
<para>As the complexity of your Hadoop application increases, you may want to use Spring Batch
to rein in the complexity of developing a large Hadoop application. Spring Batch provides
an extension to the Spring programming model to support common batch job scenarios
characterized by the processing of large amounts of data from flat files, databases and
messaging systems. It also provides a workflow-style processing model, persistent tracking
of steps within the workflow, event notification, and administrative functionality to
start, stop, and restart a workflow. As Spring Batch was designed to be extended, Spring Hadoop
plugs into those extensibility points, allowing Hadoop-related processing to be a
first-class citizen in the Spring Batch processing model.</para>
<para>Another project of interest to Hadoop developers is Spring Integration. Spring
Integration provides an extension of the Spring programming model to support the well-known
<link xlink:href="http://www.eaipatterns.com">Enterprise Integration Patterns</link>. It
enables lightweight messaging <emphasis role="italic">within</emphasis> Spring-based
applications and supports integration with external systems via declarative adapters. These
adapters are of particular interest to Hadoop developers, as they directly support common
Hadoop use cases such as polling a directory or FTP folder for the presence of a file or
group of files. Once the files are present, a message is sent within the
application to trigger additional processing. This additional processing can call a Hadoop
MapReduce job directly or start a more complex Spring Batch based workflow. Similarly, a
step in a Spring Batch workflow can invoke functionality in Spring Integration, for example
to send a message through an email adapter.</para>
<para>Whether you use the Spring Hadoop project with the Spring Framework by itself or with
additional extensions such as Spring Batch and Spring Integration that focus on a particular
domain, you will benefit from the core values that Spring projects bring to the table,
namely modularity, reuse and extensive support for unit and integration
testing.</para>
</chapter>
<chapter version="5.0" xml:id="dev-guidance"
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:ns5="http://www.w3.org/1999/xhtml"
xmlns:ns4="http://www.w3.org/2000/svg"
xmlns:ns3="http://www.w3.org/1998/Math/MathML"
xmlns:ns="http://docbook.org/ns/docbook">
<title>Guidance and Examples</title>

<para>In this Chapter we will provide some guidance on how you can use the
Spring Hadoop project in conjunction with other Spring projects, starting
with the Spring Framework itself, then Spring Batch, and then Spring
Integration.</para>

<para>Spring Hadoop provides integration with the Spring Framework to create
and run Hadoop MapReduce, Hive, and Pig jobs as well as work with HDFS and
HBase. If your Hadoop needs are simple, including basic scheduling, you can
add the Spring Hadoop namespace to your Spring-based project and get
started quickly.</para>

<para>As the complexity of your Hadoop application increases, you may want
to use Spring Batch to rein in the complexity of developing a large Hadoop
application. Spring Batch provides an extension to the Spring programming
model to support common batch job scenarios characterized by the processing
of large amounts of data from flat files, databases and messaging systems.
It also provides a workflow-style processing model, persistent tracking of
steps within the workflow, event notification, and administrative
functionality to start, stop, and restart a workflow. As Spring Batch was
designed to be extended, Spring Hadoop plugs into those extensibility
points, allowing Hadoop-related processing to be a first-class citizen in
the Spring Batch processing model.</para>

<para>Another project of interest to Hadoop developers is Spring
Integration. Spring Integration provides an extension of the Spring
programming model to support the well-known <link
xlink:href="http://www.eaipatterns.com">Enterprise Integration
Patterns</link>. It enables lightweight messaging <emphasis
role="italic">within</emphasis> Spring-based applications and supports
integration with external systems via declarative adapters. These adapters
are of particular interest to Hadoop developers, as they directly support
common Hadoop use cases such as polling a directory or FTP folder for the
presence of a file or group of files. Once the files are present, a
message is sent within the application to trigger additional processing.
This additional processing can call a Hadoop MapReduce job directly or
start a more complex Spring Batch based workflow. Similarly, a step in a
Spring Batch workflow can invoke functionality in Spring Integration, for
example to send a message through an email adapter.</para>

<para>Whether you use the Spring Hadoop project with the Spring
Framework by itself or with additional extensions such as Spring Batch and
Spring Integration that focus on a particular domain, you will benefit
from the core values that Spring projects bring to the table, namely
modularity, reuse and extensive support for unit and integration
testing.</para>

<section>
<title>Scheduling</title>

<para>Spring Batch integrates with a variety of job schedulers and is not
a scheduling framework. Many good enterprise schedulers are available in
both the commercial and open source spaces, such as Quartz, Tivoli, and
Control-M. Spring Batch is intended to work in conjunction with a
scheduler, not to replace one. As a lightweight solution, you can use
Spring's built-in scheduling support, which provides cron-like and
other basic trigger functionality. See the <link
xlink:href="http://static.springsource.org/spring-batch/faq.html#schedulers">Task
Execution and Scheduling</link> documentation for more information. A middle
ground is to use Spring's Quartz integration; see <link
xlink:href="http://static.springsource.org/spring/docs/3.1.x/spring-framework-reference/html/scheduling.html#scheduling-quartz">Using
the OpenSymphony Quartz Scheduler</link> for more information. The Spring
Batch distribution contains an example. This documentation will be
updated to provide more directed examples with Hadoop; check for
updates on the <link
xlink:href="http://www.springsource.org/spring-data/hadoop">main web site
of Spring Hadoop</link>.</para>
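
<para>As a rough illustration of the lightweight option, the fragment
below uses the <literal>task</literal> namespace of the Spring Framework
to invoke a bean method on a cron schedule. It is a sketch under
assumptions, not a tested configuration: <literal>jobRunner</literal> and
its class <literal>com.mycompany.myapp.JobRunner</literal> are
hypothetical names for a bean with a no-argument <literal>run</literal>
method that starts the Hadoop or Spring Batch job, and the
<literal>task</literal> prefix is assumed to be bound to
<literal>http://www.springframework.org/schema/task</literal> in the
enclosing <literal>beans</literal> element.</para>

<programlisting language="XML">&lt;!-- Hypothetical bean whose run() method launches the job --&gt;
&lt;bean id="jobRunner" class="com.mycompany.myapp.JobRunner"/&gt;

&lt;!-- Thread pool that executes the scheduled method --&gt;
&lt;task:scheduler id="scheduler" pool-size="1"/&gt;

&lt;!-- Spring cron fields: second minute hour day-of-month month day-of-week --&gt;
&lt;task:scheduled-tasks scheduler="scheduler"&gt;
  &lt;task:scheduled ref="jobRunner" method="run" cron="0 0 2 * * *"/&gt;
&lt;/task:scheduled-tasks&gt;</programlisting>

<para>With this cron expression the method would be invoked once a day at
02:00. A fixed interval can be used instead by replacing the
<literal>cron</literal> attribute with <literal>fixed-delay</literal> or
<literal>fixed-rate</literal>, both given in milliseconds.</para>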
</section>

<section>
<title>Batch Job Listeners</title>

<para>Spring Batch lets you attach listeners at the job and step levels
to perform additional processing. For example, at the end of a job you can
send a notification or even start another Spring Batch job.
As a brief example, implement the interface <link
xlink:href="http://static.springsource.org/spring-batch/apidocs/org/springframework/batch/core/JobExecutionListener.html">JobExecutionListener</link>
and configure it into the Spring Batch job as shown below.</para>

<programlisting language="XML">&lt;batch:job id="job1"&gt;
  &lt;batch:step id="import" next="wordcount"&gt;
    &lt;batch:tasklet ref="script-tasklet"/&gt;
  &lt;/batch:step&gt;

  &lt;batch:step id="wordcount"&gt;
    &lt;batch:tasklet ref="wordcount-tasklet"/&gt;
  &lt;/batch:step&gt;

  &lt;!-- Job-level listeners are called before and after the job runs --&gt;
  &lt;batch:listeners&gt;
    &lt;batch:listener ref="simpleNotificationListener"/&gt;
  &lt;/batch:listeners&gt;
&lt;/batch:job&gt;

&lt;bean id="simpleNotificationListener" class="com.mycompany.myapp.SimpleNotificationListener"/&gt;</programlisting>
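
<para>Listeners can be attached at the step level in a similar way. The
fragment below is a minimal sketch rather than a tested configuration: it
assumes a hypothetical bean named
<literal>stepNotificationListener</literal> whose class,
<literal>com.mycompany.myapp.StepNotificationListener</literal> (also
hypothetical), implements the Spring Batch
<literal>StepExecutionListener</literal> interface. The step element
would take the place of the <literal>wordcount</literal> step inside the
job definition, with the <literal>listeners</literal> element nested in
the tasklet.</para>

<programlisting language="XML">&lt;batch:step id="wordcount"&gt;
  &lt;batch:tasklet ref="wordcount-tasklet"&gt;
    &lt;!-- Step-level listeners are called before and after this step only --&gt;
    &lt;batch:listeners&gt;
      &lt;batch:listener ref="stepNotificationListener"/&gt;
    &lt;/batch:listeners&gt;
  &lt;/batch:tasklet&gt;
&lt;/batch:step&gt;

&lt;!-- Hypothetical StepExecutionListener implementation --&gt;
&lt;bean id="stepNotificationListener" class="com.mycompany.myapp.StepNotificationListener"/&gt;</programlisting>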

</section>
</chapter>
