# Readers
To read records from a data source, you should register an implementation of the `RecordReader` interface:

```java
Job job = new JobBuilder()
        .reader(new MyRecordReader(myDataSource))
        .build();
```
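If none of the built-in readers fits your data source, you can implement the interface yourself. Here is a minimal sketch, assuming the Easy Batch 5.x API (`org.easybatch.core.*`); the `MyDataSource` type and its `open`/`readNext`/`close` methods are hypothetical placeholders:

```java
import java.util.Date;

import org.easybatch.core.reader.RecordReader;
import org.easybatch.core.record.Header;
import org.easybatch.core.record.Record;
import org.easybatch.core.record.StringRecord;

public class MyRecordReader implements RecordReader {

    private final MyDataSource dataSource; // hypothetical custom source type
    private long recordNumber;

    public MyRecordReader(MyDataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public void open() throws Exception {
        dataSource.open(); // acquire resources (connection, file handle, ...)
    }

    @Override
    public Record readRecord() throws Exception {
        String payload = dataSource.readNext(); // hypothetical accessor
        if (payload == null) {
            return null; // returning null signals the end of the data source
        }
        return new StringRecord(new Header(++recordNumber, "my data source", new Date()), payload);
    }

    @Override
    public void close() throws Exception {
        dataSource.close(); // release resources
    }
}
```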
There are several built-in record readers to read data from a variety of sources:
- flat files (delimited and fixed length)
- XML, JSON and YAML files
- MS Excel files
- in-memory strings
- databases
- JMS queues
- BlockingQueue and Iterable objects
- Java 8 streams
- and standard input
Here is a summary of the built-in readers, the record types they produce and the modules that provide them:
| Data source | Reader | Record type | Module |
|---|---|---|---|
| String | `StringRecordReader` | `StringRecord` | easy-batch-core |
| Directory | `FileRecordReader` | `FileRecord` | easy-batch-core |
| Iterable | `IterableRecordReader` | `GenericRecord` | easy-batch-core |
| Standard input | `StandardInputRecordReader` | `StringRecord` | easy-batch-core |
| Java 8 Stream | `StreamRecordReader` | `GenericRecord` | easy-batch-stream |
| Flat file | `FlatFileRecordReader` | `StringRecord` | easy-batch-flatfile |
| MS Excel file | `MsExcelRecordReader` | `MsExcelRecord` | easy-batch-msexcel |
| XML stream | `XmlRecordReader` | `XmlRecord` | easy-batch-xml |
| XML file | `XmlFileRecordReader` | `XmlRecord` | easy-batch-xml |
| JSON stream | `JsonRecordReader` | `JsonRecord` | easy-batch-json |
| JSON file | `JsonFileRecordReader` | `JsonRecord` | easy-batch-json |
| YAML stream | `YamlRecordReader` | `YamlRecord` | easy-batch-yaml |
| YAML file | `YamlFileRecordReader` | `YamlRecord` | easy-batch-yaml |
| Relational database | `JdbcRecordReader` | `JdbcRecord` | easy-batch-jdbc |
| Relational database | `JpaRecordReader` | `GenericRecord` | easy-batch-jpa |
| Relational database | `HibernateRecordReader` | `GenericRecord` | easy-batch-hibernate |
| BlockingQueue | `BlockingQueueRecordReader` | `GenericRecord` | easy-batch-core |
| JMS queue | `JmsRecordReader` | `JmsRecord` | easy-batch-jms |
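Plugging a built-in reader into a job looks the same as plugging a custom one. For example, here is a sketch of reading a delimited flat file (the file name is illustrative, and depending on the version the `FlatFileRecordReader` constructor accepts a `File`, a `Path` or a file name):

```java
// each line of tweets.csv reaches the pipeline as a StringRecord
Job job = new JobBuilder()
        .reader(new FlatFileRecordReader(new File("tweets.csv")))
        .build();
```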
Sometimes the data source may be temporarily unavailable. In that case, the record reader will fail to read data and the job will be aborted. The `RetryableRecordReader` can be used to retry reading data with a delegate `RecordReader` and a `RetryPolicy`:
```java
// SECONDS is statically imported from java.util.concurrent.TimeUnit
Job job = new JobBuilder()
        .reader(new RetryableRecordReader(unreliableDataSourceReader, new RetryPolicy(5, 1, SECONDS)))
        .build();
```
This makes the reader retry at most 5 times, waiting one second between attempts. If the data source is still unreachable after 5 attempts, the job will be aborted.
A few notes on the database readers:

- The `JdbcRecordReader` reads records in chunks. For large data sets, you can set the `maxRows` and `fetchSize` parameters to avoid loading the entire data set into memory (see the sketch after this list).
- The `JpaRecordReader` loads all data fetched by the JPQL query into a `java.util.List` object, so pay attention to large data sets when writing the JPQL query you pass to the `JpaRecordReader`. You can cap the number of rows to fetch using the `maxResults` parameter.
- The `HibernateRecordReader` uses `org.hibernate.ScrollableResults` behind the scenes to stream records in chunks. You can specify the fetch size and the maximum number of rows to fetch using the `fetchSize` and `maxResults` parameters.
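For example, here is a sketch of tuning the `JdbcRecordReader` for a large table (the query and values are illustrative, and the `setMaxRows`/`setFetchSize` setter names are an assumption based on the parameter names above):

```java
// dataSource is a pre-configured javax.sql.DataSource (e.g. a connection pool)
JdbcRecordReader reader = new JdbcRecordReader(dataSource, "select * from tweet");
reader.setMaxRows(100_000); // upper bound on the total number of rows to read
reader.setFetchSize(1_000); // rows fetched per database round trip

Job job = new JobBuilder()
        .reader(reader)
        .build();
```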
It is possible to read data from multiple files using a `MultiFileRecordReader`. This assumes that all files have the same format. A `MultiFileRecordReader` reads files in sequence, and all records are passed to the processing pipeline as if they were read from a single file. There are 4 `MultiFileRecordReader`s: `MultiFlatFileRecordReader`, `MultiXmlFileRecordReader`, `MultiJsonFileRecordReader` and `MultiYamlFileRecordReader`, to read multiple flat, XML, JSON and YAML files respectively.
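For instance, here is a sketch of reading two flat files in sequence (the file names are illustrative, and the list-of-files constructor is an assumption; check the constructors of your version):

```java
List<File> files = Arrays.asList(new File("tweets-part1.csv"), new File("tweets-part2.csv"));

// records from both files flow through the pipeline as if read from a single source
Job job = new JobBuilder()
        .reader(new MultiFlatFileRecordReader(files))
        .build();
```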
The `JdbcRecordReader` produces records of type `JdbcRecord`. A `JdbcRecord` has a `java.sql.ResultSet` as payload. In a scenario where a master job reads data from a relational database and dispatches records to workers, the master job may finish reading the data source and dispatch all records to worker queues while workers are still processing them. At that point the master job closes the database connection, and the dispatched JDBC records are no longer usable since their payloads depend on the connection that the master job has closed.
A solution to this problem is to make the master job map JDBC records to domain objects and dispatch those objects safely to workers. You can find an example in the fork/join tutorial.
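Here is a sketch of the idea, where the master job maps each `JdbcRecord` to a domain object before putting it on the worker queue, so workers never touch the `ResultSet`. The `Tweet` class, column names and queue wiring are illustrative, and the `JdbcRecordMapper` and `BlockingQueueRecordWriter` signatures are assumptions; refer to the fork/join tutorial for the complete setup:

```java
BlockingQueue<Record> workQueue = new LinkedBlockingQueue<>();

// master job: map ResultSet-backed records to detached Tweet objects
// *before* dispatching them, so workers never depend on the master's connection
Job masterJob = new JobBuilder()
        .reader(new JdbcRecordReader(dataSource, "select * from tweet"))
        .mapper(new JdbcRecordMapper<>(Tweet.class, "id", "user", "message"))
        .writer(new BlockingQueueRecordWriter(workQueue))
        .build();
```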