Useful Batch Libraries

devashishshankar edited this page Feb 12, 2013 · 5 revisions

The Batch libraries include design-pattern implementations and wrappers/extensions over existing Trooper and other third-party libraries, for use in application batch jobs. Here is a list of such libraries:

High Availability wrapper

Batch jobs written on the Trooper runtime may be scheduled for distributed execution on a number of identical batch nodes, all hosting the same set of deployed jobs. Distributed execution significantly improves the availability and scalability of batch jobs. The Trooper HA batch job wrapper is non-intrusive and uses the Netflix Curator library (https://github.com/Netflix/curator), which provides a reference implementation of the Apache ZooKeeper "Leader Election" recipe. The batch job developer is shielded from this complexity and only needs to perform the following steps to run jobs in distributed mode:

  • Add a Maven dependency on the Trooper batch HA module:
<dependency>
    <groupId>org.trpr</groupId>
    <artifactId>batch-ha</artifactId>
    <version>1.0.0</version>
</dependency>
  • Configure the HABatchJob job wrapper in the job declaration. Use org.trpr.platform.batch.impl.spring.job.ha.HABatchJob instead of org.trpr.platform.batch.impl.spring.job.BatchJob when declaring the job detail bean, as shown in the example below:
<bean name="shellJobDetailBean" class="org.springframework.scheduling.quartz.JobDetailBean">
    <property name="jobClass" value="org.trpr.platform.batch.impl.spring.job.ha.HABatchJob" />
    <property name="group" value="sample-batch" />
    <property name="jobDataAsMap">
        <map>
            <entry key="jobName" value-ref="shellTaskletsJobHA" />
            <entry key="jobLocator" value-ref="jobRepository" />
            <entry key="jobLauncher" value-ref="jobLauncher" />
            <entry key="curatorClient" value-ref="curatorClient" />
        </map>
    </property>
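</bean>
<bean id="curatorClient" class="org.trpr.platform.batch.impl.spring.job.ha.CuratorClientFactory">
    <!-- ZooKeeper connect string (host:port) used by all batch nodes; replace with your own ensemble -->
    <property name="zkConnectString" value="stage-flo-zk1.nm.flipkart.com:2181" />
</bean>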
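
To show what "non-intrusive" means here, the following is a minimal Java sketch of the general wrapper idea. The job itself is left untouched, and a decorator decides on each trigger whether to run it. LeaderGatedJob and its BooleanSupplier leadership check are hypothetical names used only for illustration; they are not the HABatchJob API. The sketch also assumes that every node's scheduler fires the same trigger. In a real deployment the check would be backed by ZooKeeper, for example through Curator's LeaderLatch.hasLeadership().

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a non-intrusive HA wrapper (not Trooper's HABatchJob).
 * The wrapped job knows nothing about leader election; the wrapper only
 * decides whether this node should run it.
 */
final class LeaderGatedJob implements Runnable {
    private final Runnable delegate;          // the unmodified batch job
    private final BooleanSupplier isLeader;   // stand-in for a ZooKeeper-backed leadership check

    LeaderGatedJob(Runnable delegate, BooleanSupplier isLeader) {
        this.delegate = delegate;
        this.isLeader = isLeader;
    }

    @Override
    public void run() {
        // Assumption: every node's scheduler fires this trigger. Only the node
        // that currently holds leadership runs the job; the others skip this run.
        if (isLeader.getAsBoolean()) {
            delegate.run();
        }
    }
}
```

Because leadership is supplied from outside, the same job class can be deployed unchanged on every node. Only the wrapper differs between a single-node and a distributed setup.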

The complete batch HA example is available under the Trooper batch examples as ShellTaskletsJobHA:

Trooper/examples/example-batch-HA/src/main/resources/external/shellTaskletsJobHA/spring-batch-config.xml

Leader election among all running batch instances of a job is performed on the job property jobShard. When jobShard is not specified, the jobName property is used instead. jobShard may be used to partition execution within a batch job: for example, one batch instance may set it to "Apparel" and another to "Mobile". Falling back to the job name means that distribution/partitioning happens at the job level, as in the example above.
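
The fallback from jobShard to jobName, and the ZooKeeper leader-election recipe it feeds, can be illustrated with a small in-memory Java simulation. ShardElection and its methods are hypothetical stand-ins for ZooKeeper. Each join() plays the role of creating an ephemeral sequential znode under the election key, and the participant with the lowest sequence number leads. The sketch does not reflect how Trooper or Curator are implemented internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * In-memory simulation of the ZooKeeper leader-election recipe, one election per key.
 * Illustrative only: not Trooper or Curator code.
 */
final class ShardElection {
    /** Election key: jobShard when set, otherwise jobName. */
    static String electionKey(String jobShard, String jobName) {
        return (jobShard == null || jobShard.isEmpty()) ? jobName : jobShard;
    }

    private static final class Participant {
        final long sequence;   // plays the role of the ephemeral sequential znode's number
        final String host;
        Participant(long sequence, String host) { this.sequence = sequence; this.host = host; }
    }

    private final Map<String, List<Participant>> participants = new HashMap<>();
    private long nextSequence = 0;

    /** A host joins the election for a key (like creating an ephemeral sequential znode). */
    void join(String key, String host) {
        participants.computeIfAbsent(key, k -> new ArrayList<>())
                    .add(new Participant(nextSequence++, host));
    }

    /** A host drops out, e.g. its ZooKeeper session expired and its ephemeral znode vanished. */
    void leave(String key, String host) {
        List<Participant> list = participants.get(key);
        if (list != null) {
            list.removeIf(p -> p.host.equals(host));
        }
    }

    /** The leader for a key is the live participant with the lowest sequence number. */
    Optional<String> leaderOf(String key) {
        return participants.getOrDefault(key, Collections.emptyList()).stream()
                .min((a, b) -> Long.compare(a.sequence, b.sequence))
                .map(p -> p.host);
    }
}
```

With shards such as "Apparel" and "Mobile", two nodes can split leadership so that each shard runs on exactly one node at a time. If a leader's session ends, the next-lowest participant for that key takes over. In the real recipe each participant watches only its immediate predecessor's znode, which avoids waking every node whenever the leader changes.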

Server mode

When running HA jobs in server mode, all job hosts are kept in sync. Modifying an HA job, or adding a new one, on any one job host automatically pushes the change to all job hosts connected to the same ZooKeeper server. The console also displays the list of servers to which a job is deployed, and the console and configuration pages of all these job hosts are accessible.

HA mode

Batched Partition Reader

The Spring Batch reader interfaces described at http://static.springsource.org/spring-batch/reference/html/readersAndWriters.html retrieve or create work for Processor and Writer implementations. These interfaces are well suited to single-item reading, either individually or as a stream, and partitioning support extends this somewhat. However, multi-threaded partition readers are not directly supported. In particular, there is no direct way to read items in batches from sources such as an RPC or service call that returns N data records. The Trooper Batched Partition Reader implements batch-based reading using the "Composite" design pattern. It leverages the Spring Batch multi-threaded TaskExecutor (if configured) to perform parallel batch reads of each partition. The Trooper batch CompositeItemStreamReader is interface-compatible with the Spring ItemStream and ItemReader interfaces. A sample configuration of the composite reader is shown below:

<batch:step id="step1" xmlns="http://www.springframework.org/schema/batch">
    <batch:tasklet task-executor="greetingJobTaskExecutor">
        <batch:chunk reader="compositeGreetingDataReader" processor="compositeGreetingDataProcessor" writer="compositeGreetingDataWriter" commit-interval="10"/>
    </batch:tasklet>
</batch:step>
<!-- Composite reader wrapping the batch-capable reader; partitions are read in parallel when a task executor is configured -->
<bean id="compositeGreetingDataReader" class="org.trpr.platform.batch.impl.spring.reader.CompositeItemStreamReader">
    <constructor-arg><ref bean="greetingDataReader"/></constructor-arg>
</bean>
<bean id="greetingDataReader" class="org.trpr.example.batch.greeting.reader.GreetingJobReader">
    <property name="batchSize" value="1000" />
</bean>

The complete batch partitioned reader example is available under the Trooper batch examples as GreetingJob:

Trooper/examples/example-batch/src/main/resources/external/greetingWorkSchedulerJob/spring-batch-config.xml
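
The batched, composite reading approach described above can be sketched in plain Java. BatchSource, CompositeBatchReader and the offset-based paging are assumptions made for illustration. This is not Trooper's CompositeItemStreamReader, and it does not implement Spring Batch's ItemReader or ItemStream interfaces. The sketch fetches one batch per partition in parallel on an ExecutorService, which stands in for the configured task executor. It buffers the results and returns single items from read(), using null to signal end of input as ItemReader does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batch-capable source, e.g. one RPC call returning up to 'size' records. */
interface BatchSource<T> {
    List<T> fetchBatch(int partition, int offset, int size);
}

/** Illustrative composite reader: parallel batch reads per partition, single-item read(). */
final class CompositeBatchReader<T> {
    private final BatchSource<T> source;
    private final int batchSize;
    private final ExecutorService executor;
    private final int[] offsets;                               // next offset to read, per partition
    private final Set<Integer> active = new LinkedHashSet<>(); // partitions that may still have data
    private final Deque<T> buffer = new ArrayDeque<>();        // fetched items not yet returned

    CompositeBatchReader(BatchSource<T> source, int partitions, int batchSize, ExecutorService executor) {
        this.source = source;
        this.batchSize = batchSize;
        this.executor = executor;
        this.offsets = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            active.add(p);
        }
    }

    /** Returns the next item, or null once all partitions are exhausted (items must be non-null). */
    synchronized T read() {
        while (buffer.isEmpty() && !active.isEmpty()) {
            fetchRound();
        }
        return buffer.pollFirst();
    }

    /** Submits one batch read per active partition in parallel, then collects them in partition order. */
    private void fetchRound() {
        List<Integer> round = new ArrayList<>(active);
        List<Future<List<T>>> futures = new ArrayList<>();
        for (int p : round) {
            int offset = offsets[p];
            futures.add(executor.submit(() -> source.fetchBatch(p, offset, batchSize)));
        }
        for (int i = 0; i < round.size(); i++) {
            int p = round.get(i);
            List<T> batch;
            try {
                batch = futures.get(i).get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while reading partition " + p, e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("batch read failed for partition " + p, e.getCause());
            }
            buffer.addAll(batch);
            offsets[p] += batch.size();
            // A short batch means this partition has no more data.
            if (batch.size() < batchSize) {
                active.remove(p);
            }
        }
    }
}
```

Each round waits for every partition's batch before handing out items, so output order is deterministic: partition by partition within a round. A production reader would also need to persist its offsets for restarts, which is what ItemStream's open/update/close callbacks are for.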