Phase 5: Reaper for stalled job recovery #7

Merged
mcollina merged 1 commit into main from phase-5-reaper on Feb 14, 2026


@mcollina
Member

Summary

  • Add Reaper class that monitors for stalled jobs and requeues them for retry
  • Subscribes to storage events to track when jobs start processing
  • Starts visibility timeout timers for each processing job
  • On timeout, checks if job is still processing and requeues if stalled
  • Cancels timers when jobs complete or fail
  • Periodic check of all workers' processing queues for recovery
  • Emits 'stalled' events when jobs are recovered
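
The timer flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: `ReaperSketch`, `JobStorageSketch`, `JobEventSketch`, the event names, and the `checkJob` method are all hypothetical names chosen for the example, since the PR description does not show the real storage or Reaper API.

```typescript
// Hypothetical shapes for illustration only; the real APIs are not shown in this PR.
type JobEventSketch = { type: "processing" | "completed" | "failed"; jobId: string };

interface JobStorageSketch {
  // Registers a listener and returns a function that removes it again.
  subscribe(listener: (event: JobEventSketch) => void): () => void;
  isProcessing(jobId: string): boolean;
  requeue(jobId: string): void;
}

class ReaperSketch {
  private readonly storage: JobStorageSketch;
  private readonly visibilityTimeoutMs: number;
  private readonly onStalled: (jobId: string) => void;
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private unsubscribe: (() => void) | null = null;

  constructor(
    storage: JobStorageSketch,
    visibilityTimeoutMs: number,
    onStalled: (jobId: string) => void,
  ) {
    this.storage = storage;
    this.visibilityTimeoutMs = visibilityTimeoutMs;
    this.onStalled = onStalled;
  }

  start(): void {
    if (this.unsubscribe) return; // already started
    this.unsubscribe = this.storage.subscribe((event) => {
      if (event.type === "processing") this.track(event.jobId);
      else this.cancel(event.jobId); // completed or failed: no timer needed
    });
  }

  stop(): void {
    if (this.unsubscribe) this.unsubscribe();
    this.unsubscribe = null;
    // Clear every pending timer so nothing fires (or leaks) after stop().
    this.timers.forEach((timer) => clearTimeout(timer));
    this.timers.clear();
  }

  pendingTimerCount(): number {
    return this.timers.size;
  }

  // Runs on timeout: requeue only if the job is still in the processing state.
  checkJob(jobId: string): boolean {
    this.cancel(jobId);
    if (!this.storage.isProcessing(jobId)) return false;
    this.storage.requeue(jobId);
    this.onStalled(jobId); // stands in for emitting a 'stalled' event
    return true;
  }

  private track(jobId: string): void {
    this.cancel(jobId); // restart the timer if the job is picked up again
    const timer = setTimeout(() => this.checkJob(jobId), this.visibilityTimeoutMs);
    this.timers.set(jobId, timer);
  }

  private cancel(jobId: string): void {
    const timer = this.timers.get(jobId);
    if (timer === undefined) return;
    clearTimeout(timer);
    this.timers.delete(jobId);
  }
}
```

Keeping the timers in a map keyed by job id is one way to make cancellation on completion or failure a constant-time lookup, and it lets `stop()` clear everything in one pass.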

This completes the M1: Core Working milestone. The queue now works with MemoryStorage, including stall recovery.

Test plan

  • Reaper lifecycle tests (start/stop)
  • Stalled job detection and recovery
  • Timer cancellation on job completion/failure
  • Periodic check for stalled jobs on startup
  • Multiple workers' processing queues checked
  • Timer cleanup on stop (no leaks)
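
The periodic-check items in this plan could be exercised against a pure helper along these lines. This is a sketch under assumptions and not the merged implementation: the `ProcessingEntrySketch` shape, the `startedAt` timestamp, and the `findStalledJobs` name are hypothetical, and the actual layout of the per-worker processing queues is not shown in this PR.

```typescript
// Hypothetical data model: one processing queue per worker id, where each
// entry records when the job entered the processing state (ms timestamp).
interface ProcessingEntrySketch {
  jobId: string;
  startedAt: number;
}

interface StalledJobSketch {
  workerId: string;
  jobId: string;
}

// Scans every worker's processing queue and returns the jobs that have been
// processing for longer than the visibility timeout.
function findStalledJobs(
  processingQueues: Record<string, ProcessingEntrySketch[]>,
  visibilityTimeoutMs: number,
  now: number,
): StalledJobSketch[] {
  const stalled: StalledJobSketch[] = [];
  Object.keys(processingQueues).forEach((workerId) => {
    processingQueues[workerId].forEach((entry) => {
      if (now - entry.startedAt > visibilityTimeoutMs) {
        stalled.push({ workerId, jobId: entry.jobId });
      }
    });
  });
  return stalled;
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the scan deterministic in tests. A reaper could call such a helper once on startup and then on an interval, requeueing each returned job.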

🤖 Generated with Claude Code

The Reaper monitors for jobs that have been in processing state longer
than the visibility timeout and requeues them for retry.

Features:
- Subscribe to storage events to track when jobs start processing
- Start visibility timeout timers for each processing job
- On timeout, check if job is still processing and requeue if stalled
- Cancel timers when jobs complete or fail
- Periodic check of all workers' processing queues for recovery
- Emit 'stalled' events when jobs are recovered

This enables recovery from worker crashes or unresponsive handlers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mcollina merged commit 05f4432 into main on Feb 14, 2026
10 checks passed
@mcollina deleted the phase-5-reaper branch on February 14, 2026, 18:27