# Readers
To read records from a data source, you should register an implementation of the `RecordReader` interface:

```java
Job job = new JobBuilder()
        .reader(new MyRecordReader(myDataSource))
        .build();
```
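If none of the built-in readers fits your data source, you can implement the interface yourself. Here is a minimal sketch, assuming the Easy Batch 5.x API (`org.easybatch.core.*`); the `MyDataSource` type and its `open`/`readNext`/`close` methods are hypothetical placeholders:

```java
import java.util.Date;

import org.easybatch.core.reader.RecordReader;
import org.easybatch.core.record.Header;
import org.easybatch.core.record.Record;
import org.easybatch.core.record.StringRecord;

public class MyRecordReader implements RecordReader {

    private final MyDataSource dataSource; // hypothetical custom source type
    private long recordNumber;

    public MyRecordReader(MyDataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void open() throws Exception {
        dataSource.open(); // acquire resources (connection, file handle, ...)
    }

    @Override
    public Record readRecord() throws Exception {
        String payload = dataSource.readNext(); // hypothetical accessor
        if (payload == null) {
            return null; // returning null signals the end of the data source
        }
        return new StringRecord(new Header(++recordNumber, "my data source", new Date()), payload);
    }

    @Override
    public void close() throws Exception {
        dataSource.close(); // release resources
    }
}
```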
There are several built-in record readers to read data from a variety of sources:
- flat files (delimited and fixed length)
- XML, JSON and YAML files
- MS Excel files
- in-memory strings
- databases
- JMS queues
- BlockingQueue and Iterable objects
- Java 8 streams
- and standard input
Here is a summary of the built-in readers, the record types they produce and the modules that provide them:
| Data source | Reader | Record type | Module |
|---|---|---|---|
| String | `StringRecordReader` | `StringRecord` | easy-batch-core |
| Directory | `FileRecordReader` | `FileRecord` | easy-batch-core |
| Iterable | `IterableRecordReader` | `GenericRecord` | easy-batch-core |
| Standard input | `StandardInputRecordReader` | `StringRecord` | easy-batch-core |
| Java 8 Stream | `StreamRecordReader` | `GenericRecord` | easy-batch-stream |
| Flat file | `FlatFileRecordReader` | `StringRecord` | easy-batch-flatfile |
| MS Excel file | `MsExcelRecordReader` | `MsExcelRecord` | easy-batch-msexcel |
| XML stream | `XmlRecordReader` | `XmlRecord` | easy-batch-xml |
| XML file | `XmlFileRecordReader` | `XmlRecord` | easy-batch-xml |
| JSON stream | `JsonRecordReader` | `JsonRecord` | easy-batch-json |
| JSON file | `JsonFileRecordReader` | `JsonRecord` | easy-batch-json |
| YAML stream | `YamlRecordReader` | `YamlRecord` | easy-batch-yaml |
| YAML file | `YamlFileRecordReader` | `YamlRecord` | easy-batch-yaml |
| Relational database | `JdbcRecordReader` | `JdbcRecord` | easy-batch-jdbc |
| Relational database | `JpaRecordReader` | `GenericRecord` | easy-batch-jpa |
| Relational database | `HibernateRecordReader` | `GenericRecord` | easy-batch-hibernate |
| BlockingQueue | `BlockingQueueRecordReader` | `GenericRecord` | easy-batch-core |
| JMS queue | `JmsRecordReader` | `JmsRecord` | easy-batch-jms |
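Plugging a built-in reader into a job looks the same as plugging a custom one. For example, here is a sketch of reading a delimited flat file (the file name is illustrative, and depending on the version the `FlatFileRecordReader` constructor accepts a `File`, a `Path` or a file name):

```java
// each line of tweets.csv reaches the pipeline as a StringRecord
Job job = new JobBuilder()
        .reader(new FlatFileRecordReader(new File("tweets.csv")))
        .build();
```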
Sometimes the data source may be temporarily unavailable. In that case, the record reader will fail to read data and the job will be aborted. The `RetryableRecordReader` can be used to retry reading data with a delegate `RecordReader` and a `RetryPolicy`:
```java
// SECONDS is statically imported from java.util.concurrent.TimeUnit
Job job = new JobBuilder()
        .reader(new RetryableRecordReader(unreliableDataSourceReader, new RetryPolicy(5, 1, SECONDS)))
        .build();
```
This makes the reader retry at most 5 times, waiting one second between attempts. If the data source is still unreachable after 5 attempts, the job will be aborted.
A few notes on the database readers:

- The `JdbcRecordReader` reads records in chunks. For large data sets, you can set the `maxRows` and `fetchSize` parameters to avoid loading the entire data set into memory (see the sketch after this list).
- The `JpaRecordReader` loads all data fetched by the JPQL query into a `java.util.List` object, so pay attention to large data sets when writing the JPQL query you pass to the `JpaRecordReader`. You can cap the number of rows to fetch using the `maxResults` parameter.
- The `HibernateRecordReader` uses `org.hibernate.ScrollableResults` behind the scenes to stream records in chunks. You can specify the fetch size and the maximum number of rows to fetch using the `fetchSize` and `maxResults` parameters.
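For example, here is a sketch of tuning the `JdbcRecordReader` for a large table (the query and values are illustrative, and the `setMaxRows`/`setFetchSize` setter names are an assumption based on the parameter names above):

```java
// dataSource is a pre-configured javax.sql.DataSource (e.g. a connection pool)
JdbcRecordReader reader = new JdbcRecordReader(dataSource, "select * from tweet");
reader.setMaxRows(100_000); // upper bound on the total number of rows to read
reader.setFetchSize(1_000); // rows fetched per database round trip

Job job = new JobBuilder()
        .reader(reader)
        .build();
```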
It is possible to read data from multiple files using a `MultiFileRecordReader`. This assumes that all files have the same format. A `MultiFileRecordReader` reads files in sequence, and all records are passed to the processing pipeline as if they were read from a single file. There are 4 `MultiFileRecordReader`s: `MultiFlatFileRecordReader`, `MultiXmlFileRecordReader`, `MultiJsonFileRecordReader` and `MultiYamlFileRecordReader`, to read multiple flat, XML, JSON and YAML files respectively.
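For instance, here is a sketch of reading two flat files in sequence (the file names are illustrative, and the list-of-files constructor is an assumption; check the constructors of your version):

```java
List<File> files = Arrays.asList(new File("tweets-part1.csv"), new File("tweets-part2.csv"));

// records from both files flow through the pipeline as if read from a single source
Job job = new JobBuilder()
        .reader(new MultiFlatFileRecordReader(files))
        .build();
```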
The `JdbcRecordReader` produces records of type `JdbcRecord`. A `JdbcRecord` has a `java.sql.ResultSet` as payload. In a scenario where a master job reads data from a relational database and dispatches records to workers, the master job may finish reading the data source and dispatch all records to worker queues while workers are still processing them. At that point the master job closes the database connection, and the dispatched JDBC records are no longer usable since their payloads depend on the connection that the master job has closed.
A solution to this problem is to make the master job map JDBC records to domain objects and dispatch those objects safely to workers. You can find an example in the fork/join tutorial.
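Here is a sketch of the idea, where the master job maps each `JdbcRecord` to a domain object before putting it on the worker queue, so workers never touch the `ResultSet`. The `Tweet` class, column names and queue wiring are illustrative, and the `JdbcRecordMapper` and `BlockingQueueRecordWriter` signatures are assumptions; refer to the fork/join tutorial for the complete setup:

```java
BlockingQueue<Record> workQueue = new LinkedBlockingQueue<>();

// master job: map ResultSet-backed records to detached Tweet objects
// *before* dispatching them, so workers never depend on the master's connection
Job masterJob = new JobBuilder()
        .reader(new JdbcRecordReader(dataSource, "select * from tweet"))
        .mapper(new JdbcRecordMapper<>(Tweet.class, "id", "user", "message"))
        .writer(new BlockingQueueRecordWriter(workQueue))
        .build();
```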