Automatic Retry Scheduler for FAILED Events (Complement to Staleness Monitor) #1439

@raviprakashmishra

Feature Request: Automatic Retry Scheduler for FAILED Events

Requested: Add a configurable automatic retry scheduler that builds on the new v2 schema infrastructure (completion_attempts, explicit states, multi-instance safety).

Context

Spring Modulith 2.0 M1 introduced:

  • Staleness monitor (detects stuck events, marks as FAILED) ✅
  • Completion attempts tracking ✅
  • Multi-instance safety ✅

Missing piece: Automatic retry of FAILED events

Current Behavior

  1. Event fails or gets stuck → Staleness monitor marks as FAILED ✅
  2. Event remains in FAILED state forever ❌
  3. Manual intervention required to retry ❌

Requested Feature

Add a configurable retry scheduler:

spring:
  modulith:
    events:
      staleness:
        enabled: true
        processing-timeout: 5m
      
      # NEW: Automatic retry configuration
      retry:
        enabled: true
        fixed-delay: 60s
        max-attempts: 10              # ← Uses completion_attempts
        max-age: 7d                   # ← Age-based limit
        batch-size: 100
        states-to-retry:
          - FAILED                    # ← Retry events marked by staleness monitor
          - PUBLISHED                 # ← Retry immediate failures
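
As a rough illustration of how these keys might map to typed settings, here is a minimal sketch in plain Java. The `RetryProperties` record and its validation rules are hypothetical and not part of Spring Modulith; in a real Spring Boot application the values would more likely be bound through a configuration properties class, which is left out here to keep the example self-contained.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical holder for the proposed spring.modulith.events.retry.* keys.
// Nothing here is existing Spring Modulith API.
record RetryProperties(boolean enabled,
                       Duration fixedDelay,
                       int maxAttempts,
                       Duration maxAge,
                       int batchSize,
                       List<String> statesToRetry) {

    // Compact constructor: reject values that would make the scheduler meaningless.
    RetryProperties {
        if (fixedDelay.isZero() || fixedDelay.isNegative()) {
            throw new IllegalArgumentException("fixed-delay must be positive");
        }
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("max-attempts must be at least 1");
        }
        if (batchSize < 1) {
            throw new IllegalArgumentException("batch-size must be at least 1");
        }
        statesToRetry = List.copyOf(statesToRetry); // defensive, immutable copy
    }

    // The values from the YAML example above.
    static RetryProperties example() {
        return new RetryProperties(true, Duration.ofSeconds(60), 10,
                Duration.ofDays(7), 100, List.of("FAILED", "PUBLISHED"));
    }
}
```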

How It Would Work

  1. External system fails → Event marked FAILED by staleness monitor
  2. Retry scheduler (every 60s):
    • Queries events in FAILED state
    • Filters by completion_attempts < 10
    • Filters by age < 7 days
    • Changes state to RESUBMITTED
    • Republishes events
  3. If successful → State changes to COMPLETED
  4. If it fails again → completion_attempts++, back to FAILED
  5. After 10 attempts or 7 days → Marked as BLOCKED
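
The steps above can be sketched as a single scheduler pass over an in-memory list. This is only an illustration of the proposed state transitions, under several assumptions: the `State`, `Event`, and `RetryPass` types are hypothetical stand-ins (not Spring Modulith API), the list replaces the event publication registry, the `republish` predicate stands in for republishing and returns true on success, and only FAILED events are picked up, as in step 2.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical states mirroring the names used in this issue.
enum State { PUBLISHED, FAILED, RESUBMITTED, COMPLETED, BLOCKED }

// Hypothetical, mutable stand-in for a stored event publication.
final class Event {
    final String id;
    final Instant publishedAt;
    State state;
    int completionAttempts;

    Event(String id, Instant publishedAt, State state, int completionAttempts) {
        this.id = id;
        this.publishedAt = publishedAt;
        this.state = state;
        this.completionAttempts = completionAttempts;
    }
}

// One pass of the proposed retry scheduler (would run every fixed-delay).
final class RetryPass {
    private final int maxAttempts;
    private final Duration maxAge;
    private final int batchSize;

    RetryPass(int maxAttempts, Duration maxAge, int batchSize) {
        this.maxAttempts = maxAttempts;
        this.maxAge = maxAge;
        this.batchSize = batchSize;
    }

    void run(List<Event> events, Instant now, Predicate<Event> republish) {
        int retried = 0;
        for (Event event : events) {
            if (event.state != State.FAILED) {
                continue; // only FAILED events are candidates
            }
            Duration age = Duration.between(event.publishedAt, now);
            boolean exhausted = event.completionAttempts >= maxAttempts
                    || age.compareTo(maxAge) >= 0;
            if (exhausted) {
                event.state = State.BLOCKED; // step 5: give up
                continue;
            }
            if (retried >= batchSize) {
                continue; // batch full, leave for the next pass
            }
            retried++;
            event.state = State.RESUBMITTED;
            if (republish.test(event)) {
                event.state = State.COMPLETED; // step 3
            } else {
                event.completionAttempts++;    // step 4
                event.state = State.FAILED;
            }
        }
    }
}
```

In this sketch an event whose attempt counter reaches the limit returns to FAILED first and is moved to BLOCKED on the following pass. A real implementation would also need the multi-instance safety mentioned above (for example, claiming rows atomically), which this in-memory version does not attempt.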

Proposed state diagram:

(Image: proposed state diagram)
