Skip to content

ChunkOrientedStepBuilder: All Throwables (including Errors) are retried when only retryLimit() is configured without retry() #5078

@KILL9-NO-MERCY

Description

@KILL9-NO-MERCY

Hello Spring Batch team,
Following up on previous issue #5068, I discovered a related but opposite scenario that poses a potential risk.

While reviewing the fix for #5068, I realized that when retryLimit() is configured without retry(), all Throwables (including critical Errors like OutOfMemoryError, StackOverflowError) become retryable. It would have been great to catch this alongside the previous issue.

Bug description
When configuring retryLimit() without specifying retry() in Spring Batch 6, the retry mechanism attempts to retry all Throwables, including critical JVM Errors. This occurs because ExceptionTypeFilter(used by DefaultRetryPolicy) uses matchIfEmpty = true when both includes(configured by retry()) and excludes are empty.

Environment

  • Spring Batch version: 6.0.0-RC2

Steps to reproduce

  1. Configure a chunk-oriented step with retryLimit() but WITHOUT retry():
    @bean
    public Step step() {
    return new StepBuilder("step", jobRepository)
    .chunk(10, transactionManager)
    .reader(reader())
    .processor(processor())
    .writer(writer())
    .faultTolerant()
    .retryLimit(3)
    // No retry() configuration
    .build();
    }

  2. Throw a critical Error (e.g., OutOfMemoryError) from any component(ItemReader or ItemProcessor etc)

  3. Observe that even critical Errors are being retried

Expected behavior
When only retryLimit() is configured without retry():

  • Either no exceptions should be retried
  • Or only Exception and its subclasses should be retried (excluding Error)

Actual behavior
All Throwables (including Errors) are retried due to matchIfEmpty = true.

Minimal Complete Reproducible example

@Slf4j
@Configuration
public class IssueReproductionJobConfiguration {
    @Bean
    public Job issueReproductionJob(JobRepository jobRepository, Step issueReproductionStep) {
        return new JobBuilder(jobRepository)
                .start(issueReproductionStep)
                .build();
    }

    @Bean
    public Step issueReproductionStep(JobRepository jobRepository) {
        return new StepBuilder(jobRepository)
                .chunk(3)
                .reader(issueReproductionReader())
                .processor(issueReproductionProcessor())
                .writer(issueReproductionWriter())
                .faultTolerant()
                .retryLimit(2)
                // No retry() - expecting no retry or Exception-only retry
                .build();
    }

    @Bean
    public ItemReader issueReproductionReader() {
        return new ListItemReader<>(List.of("Item_1", "Item_2", "Item_3"));
    }

    @Bean
    public ItemProcessor issueReproductionProcessor() {
        return item -> {
            if ("Item_3".equals(item)) {
                log.error("OutOfMemoryError thrown for: {}", item);
                throw new OutOfMemoryError("Processing failed for " + item);
            }

            log.info("Successfully processed: {}", item);
            return item;
        };
    }

    @Bean
    public ItemWriter issueReproductionWriter() {
        return items -> {
            log.info("Writing items: {}", items.getItems());
            items.getItems().forEach(item -> log.info("Written: {}", item));
        };
    }
}

Actual output:

Successfully processed: Item_1
Successfully processed: Item_2
OutOfMemoryError thrown for: Item_3
OutOfMemoryError thrown for: Item_3  ← Retry 1
OutOfMemoryError thrown for: Item_3  ← Retry 2
Writing items: [Item_1, Item_2] ← Item_3 is skipped and writer proceeds (reported separately in #5077)
Written: Item_1
Written: Item_2


The `OutOfMemoryError` is retried 2 times, which could worsen the system state.

**Root cause analysis**
In `ChunkOrientedStepBuilder`:
```java
if (this.retryPolicy == null) {
    if (!this.retryableExceptions.isEmpty() || this.retryLimit > 0) {
       this.retryPolicy = RetryPolicy.builder()
          .maxAttempts(this.retryLimit)
          .includes(this.retryableExceptions)  // ← Empty set!
          .build();
    }
    else {
       this.retryPolicy = throwable -> false;
    }
}

When `retryableExceptions` is empty, `DefaultRetryPolicy` uses `ExceptionTypeFilter` with both `includes` and `excludes` empty.
In `ExceptionTypeFilter.matchTraversingCauses()`:
```java
private boolean matchTraversingCauses(Throwable exception) {
    boolean emptyIncludes = super.includes.isEmpty();
    boolean emptyExcludes = super.excludes.isEmpty();

    if (emptyIncludes && emptyExcludes) {
        return super.matchIfEmpty;  // ← Returns true!
    }
    // ...
}

Since matchIfEmpty = true, all Throwables match, including critical Errors.

Suggested fix

When retryLimit() is configured without retry(), default to Exception.class to exclude Errors:

if (this.retryPolicy == null) {
    if (!this.retryableExceptions.isEmpty() || this.retryLimit > 0) {
       Set<Class> exceptions = this.retryableExceptions.isEmpty()
             ? Set.of(Exception.class)
             : this.retryableExceptions;

       this.retryPolicy = RetryPolicy.builder()
          .maxAttempts(this.retryLimit)
          .includes(exceptions)
          .build();
    }
    else {
       this.retryPolicy = throwable -> false;
    }
}

This ensures:

  • Only Exception and its subclasses are retried by default
  • Critical JVM Errors are not retried
  • Users can still explicitly include specific exceptions via retry()

Could you please review this behavior? This seems like a potential risk when users configure retry limits without specifying which exceptions to retry.

Thank you for your time and consideration!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions