This repository has been archived by the owner on Apr 5, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 359
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add docs for scheduling and job listeners.
- Loading branch information
1 parent
d4f593c
commit bda3cde
Showing
1 changed file
with
107 additions
and
38 deletions.
There are no files selected for viewing
145 changes: 107 additions & 38 deletions
145
docs/src/reference/docbook/reference/dev-guide/dev-spring-hadoop.xml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,39 +1,108 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0" | ||
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" | ||
xml:id="dev-guidance"> | ||
<title>Guidance and Examples</title> | ||
<para>In this Chapter we will provide some guidance on how you can use the Spring | ||
Hadoop project in conjunction with other Spring projects, starting with the Spring Framework | ||
itself, then Spring Batch, and then Spring Integration. </para> | ||
<para>Spring Hadoop provides integration with the Spring Framework to create and run Hadoop | ||
MapReduce, Hive, and Pig jobs as well as work with HDFS and HBase. If you have simple needs | ||
to work with Hadoop, including basic scheduling, you can add the Spring Hadoop namespace to | ||
your Spring based project and get going quickly using Hadoop.</para> | ||
<para>As the complexity of your Hadoop application increases, you may want to use Spring Batch | ||
to regin in the complexity of developing a large Hadoop application. Spring Batch provides | ||
an extension to the Spring programming model to support common batch job scenarios | ||
characterized by the processing of large amounts of data from flat files, databases and | ||
messaging systems. It also provides a workflow style processing model, persistent tracking | ||
of steps within the workflow, event notification, as well as administrative functionality to | ||
start/stop/restart a workflow. As Spring Batch was designed to be extended, Spring Hadoop | ||
plugs into those extensibilty points, allowing for Hadoop related processing to be a first | ||
class citizen in the Spring Batch processing model.</para> | ||
<para>Another project of interest to Hadoop developers is Spring Integration. Spring | ||
Integration provides an extension of the Spring programming model to support the well-known | ||
<link xlink:href="http://www.eaipatterns.com">Enterprise Integration Patterns</link>. It | ||
enables lightweight messaging <emphasis role="italic">within</emphasis> Spring-based | ||
applications and supports integration with external systems via declarative adapters. These | ||
adapters are of particular interest to Hadoop developers, as they directly support common | ||
Hadoop use-cases such as polling a directory or FTP folder for the presence of a file or | ||
group of files. Then once the files are present, a message is sent internal to the | ||
application to do additional processing. This additional processing can be calling a Hadoop | ||
MapReduce job directly or starting a more complex Spring Batch based workflow. Similarly, a | ||
step in a Spring Batch workflow can invoke functionality in Spring Integration, for example | ||
to send a message though an email adapter.</para> | ||
<para>Not matter if you use the Spring Batch project with the Spring Framework by itself or with | ||
additional extentions such as Spring Batch and Spring Integration that focus on a particular | ||
domain, you will you benefit from the core values that Spring projects bring to the table, | ||
namely enabling modularity, reuse and extensive support for unit and integration | ||
testing.</para> | ||
</chapter> | ||
<chapter version="5.0" xml:id="dev-guidance" | ||
xmlns="http://docbook.org/ns/docbook" | ||
xmlns:xlink="http://www.w3.org/1999/xlink" | ||
xmlns:xi="http://www.w3.org/2001/XInclude" | ||
xmlns:ns5="http://www.w3.org/1999/xhtml" | ||
xmlns:ns4="http://www.w3.org/2000/svg" | ||
xmlns:ns3="http://www.w3.org/1998/Math/MathML" | ||
xmlns:ns="http://docbook.org/ns/docbook"> | ||
<title>Guidance and Examples</title> | ||
|
||
<para>In this Chapter we will provide some guidance on how you can use the | ||
Spring Hadoop project in conjunction with other Spring projects, starting | ||
with the Spring Framework itself, then Spring Batch, and then Spring | ||
Integration.</para> | ||
|
||
<para>Spring Hadoop provides integration with the Spring Framework to create | ||
and run Hadoop MapReduce, Hive, and Pig jobs as well as work with HDFS and | ||
HBase. If you have simple needs to work with Hadoop, including basic | ||
scheduling, you can add the Spring Hadoop namespace to your Spring based | ||
project and get going quickly using Hadoop.</para> | ||
|
||
<para>As the complexity of your Hadoop application increases, you may want | ||
to use Spring Batch to regin in the complexity of developing a large Hadoop | ||
application. Spring Batch provides an extension to the Spring programming | ||
model to support common batch job scenarios characterized by the processing | ||
of large amounts of data from flat files, databases and messaging systems. | ||
It also provides a workflow style processing model, persistent tracking of | ||
steps within the workflow, event notification, as well as administrative | ||
functionality to start/stop/restart a workflow. As Spring Batch was designed | ||
to be extended, Spring Hadoop plugs into those extensibilty points, allowing | ||
for Hadoop related processing to be a first class citizen in the Spring | ||
Batch processing model.</para> | ||
|
||
<para>Another project of interest to Hadoop developers is Spring | ||
Integration. Spring Integration provides an extension of the Spring | ||
programming model to support the well-known <link | ||
xlink:href="http://www.eaipatterns.com">Enterprise Integration | ||
Patterns</link>. It enables lightweight messaging <emphasis | ||
role="italic">within</emphasis> Spring-based applications and supports | ||
integration with external systems via declarative adapters. These adapters | ||
are of particular interest to Hadoop developers, as they directly support | ||
common Hadoop use-cases such as polling a directory or FTP folder for the | ||
presence of a file or group of files. Then once the files are present, a | ||
message is sent internal to the application to do additional processing. | ||
This additional processing can be calling a Hadoop MapReduce job directly or | ||
starting a more complex Spring Batch based workflow. Similarly, a step in a | ||
Spring Batch workflow can invoke functionality in Spring Integration, for | ||
example to send a message though an email adapter.</para> | ||
|
||
<para>Not matter if you use the Spring Batch project with the Spring | ||
Framework by itself or with additional extentions such as Spring Batch and | ||
Spring Integration that focus on a particular domain, you will you benefit | ||
from the core values that Spring projects bring to the table, namely | ||
enabling modularity, reuse and extensive support for unit and integration | ||
testing.</para> | ||
|
||
<section> | ||
<title>Scheduling</title> | ||
|
||
<para>Spring Batch integrates with a variety of job schedulers and is not | ||
a scheduling framework. There are many good enterprise schedulers | ||
available in both the commercial and open source spaces such as Quartz, | ||
Tivoli, Control-M, etc. It is intended to work in conjunction with a | ||
scheduler, not replace a scheduler. As a lightweight solution, you can use | ||
Spring's built in scheduling support that will give you cron like and | ||
other basic scheduling trigger functionality. See the <link | ||
xlink:href="http://static.springsource.org/spring-batch/faq.html#schedulers">Task | ||
Execution and Scheduling</link> documention for more informaiton. A middle | ||
ground it to use Spring's Quartz integration, see <link | ||
xlink:href="http://static.springsource.org/spring/docs/3.1.x/spring-framework-reference/html/scheduling.html#scheduling-quartz">Using | ||
the OpenSymphony Quartz Scheduler</link> for more information. The Spring | ||
Batch distribution contains an example, but this documentation will be | ||
updated to provide some more directed examples with Hadoop, check for | ||
updates on the <link | ||
xlink:href="http://www.springsource.org/spring-data/hadoop">main web site | ||
of Spring Hadoop</link>.</para> | ||
</section> | ||
|
||
<section> | ||
<title>Batch Job Listeners</title> | ||
|
||
<para>Spring Batch let's you attached listeners at the job and step levels | ||
to perform additional processing. For example, at the end of a job you can | ||
perform some notification or perhaps even start another Spring Batch Jobs. | ||
As a brief example, implement the interface <link | ||
xlink:href="http://static.springsource.org/spring-batch/apidocs/org/springframework/batch/core/JobExecutionListener.html">JobExecutionListener</link> | ||
and configure it into the Spring Batch job as shown below.</para> | ||
|
||
<programlisting language="XML"> <batch:job id="job1"> | ||
<batch:step id="import" next="wordcount"> | ||
<batch:tasklet ref="script-tasklet"/> | ||
</batch:step> | ||
|
||
<batch:step id="wordcount"> | ||
<batch:tasklet ref="wordcount-tasklet" /> | ||
</batch:step> | ||
|
||
<batch:listeners> | ||
<batch:listener ref="simpleNotificatonListener"/> | ||
</batch:listeners> | ||
|
||
</batch:job> | ||
<bean id="simpleNotificatonListener" class="com.mycompany.myapp.SimpleNotificationListener"/></programlisting> | ||
|
||
<para></para> | ||
</section> | ||
</chapter> |