
filters

Mahmoud Ben Hassine edited this page Feb 7, 2020 · 3 revisions

You can filter records using a RecordFilter. This interface lets you skip the remaining stages of the pipeline for any record that satisfies a given predicate. Typical examples are:

  • filter comment records (those beginning with # for example) in a flat file
  • filter log files (with extension .log) when processing a set of files in a directory
  • etc
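The skip behaviour described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: `SimpleRecordFilter`, `CommentFilter` and `FilterPipelineSketch` are hypothetical stand-ins rather than Easy Batch types, and the convention that a filter returns `null` to drop a record is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a filter contract (NOT the real Easy Batch interface).
// Assumption of this sketch: return the payload to keep it, or null to filter it out.
interface SimpleRecordFilter {
    String apply(String payload);
}

// Filters comment records, i.e. payloads starting with the given prefix.
class CommentFilter implements SimpleRecordFilter {
    private final String prefix;

    CommentFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String apply(String payload) {
        return payload.startsWith(prefix) ? null : payload;
    }
}

class FilterPipelineSketch {
    // Runs each record through the filters in order. The first filter that
    // returns null ends processing of that record, so later stages never see it.
    static List<String> run(List<String> records, List<SimpleRecordFilter> filters) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            String current = record;
            for (SimpleRecordFilter filter : filters) {
                current = filter.apply(current);
                if (current == null) {
                    break; // record filtered: skip the remaining stages
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }
}
```

With a single `new CommentFilter("#")` in the list, a record such as `# header` is dropped at the first stage, while any record that does not start with `#` passes through to the end.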

To register a record filter, you can use the JobBuilder API as follows:

Job job = new JobBuilder()
    .filter(new MyRecordFilter()) // MyRecordFilter is your own RecordFilter implementation
    .build();

You can register as many filters as you want, anywhere in the pipeline. For each filtered record, the remaining stages of the pipeline are skipped. Several built-in implementations cover commonly used filters:

| Filter | Record type | Module | Description |
|--------|-------------|--------|-------------|
| EmptyStringRecordFilter | StringRecord | easy-batch-core | Filter String records with empty payload |
| StartsWithStringRecordFilter | StringRecord | easy-batch-core | Filter String records starting with a given prefix |
| EndsWithStringRecordFilter | StringRecord | easy-batch-core | Filter String records ending with a given suffix |
| GrepFilter | StringRecord | easy-batch-core | Keep String records containing the given pattern |
| HeaderRecordFilter | Record | easy-batch-core | Filter the header record (first record in the data source) |
| FilteredRecordsCollector | Record | easy-batch-core | Saves filtered records for later use |
| FileExtensionFilter | FileRecord | easy-batch-core | Filter File records having a file name ending with a given extension |