feat: migrate to pull-based Worker + EventHandler pattern #45
Merged
Replace push-based EventListener with unified pull-based architecture:

- Add EventHandler base class with config via classvars (__routing_key__, __batch_size__, __poll_interval__, __max_retries__, __claim_timeout__)
- Simplify Worker to accept handler_type directly, read config from classvars
- Add WorkerPool.register() for handler-based registration
- Rename listener/ to handler/ across all domains (source, validation, curation, record, index)
- Add VectorIndexHandler and KeywordIndexHandler with routing key support
- Add database migration for worker columns (routing_key, retry_count, claimed_at, updated_at) and partial indexes for efficient claiming

Workers now claim events using FOR UPDATE SKIP LOCKED, enabling concurrent processing without coordination. Each handler declares its own batch size and polling configuration.

Test coverage: 108 unit tests, 15 integration tests for event claiming and concurrent worker behavior.
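A minimal sketch of how classvar-based configuration could flow from a handler class into its `Worker`, assuming synchronous handlers and the standard library only. The classvar names, `Worker`, `WorkerPool.register()`, `VectorIndexHandler` and `KeywordIndexHandler` come from the commit message above; the default values, the per-handler numbers, and the frozen dataclass standing in for the PR's Pydantic `WorkerConfig` are illustrative assumptions, not the PR's code.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


class EventHandler:
    """Base class: subclasses override these classvars to configure their Worker."""

    # Hypothetical defaults; only the classvar names come from the PR.
    __routing_key__: ClassVar[Optional[str]] = None
    __batch_size__: ClassVar[int] = 1
    __poll_interval__: ClassVar[float] = 1.0
    __max_retries__: ClassVar[int] = 3
    __claim_timeout__: ClassVar[float] = 300.0

    def handle(self, event: object) -> None:
        raise NotImplementedError


@dataclass(frozen=True)
class WorkerConfig:
    # Stdlib stand-in for the PR's Pydantic WorkerConfig.
    routing_key: Optional[str]
    batch_size: int
    poll_interval: float
    max_retries: int
    claim_timeout: float


class Worker:
    def __init__(self, handler_type: type) -> None:
        # The worker receives the handler class itself and reads its classvars.
        self.handler_type = handler_type
        self.config = WorkerConfig(
            routing_key=handler_type.__routing_key__,
            batch_size=handler_type.__batch_size__,
            poll_interval=handler_type.__poll_interval__,
            max_retries=handler_type.__max_retries__,
            claim_timeout=handler_type.__claim_timeout__,
        )


class WorkerPool:
    def __init__(self) -> None:
        self.workers: list = []

    def register(self, handler_type: type) -> Worker:
        # One worker per registered handler type.
        worker = Worker(handler_type)
        self.workers.append(worker)
        return worker


# Hypothetical configuration: the class names appear in the PR, the values do not.
class VectorIndexHandler(EventHandler):
    __routing_key__ = "vector"
    __batch_size__ = 50


class KeywordIndexHandler(EventHandler):
    __routing_key__ = "keyword"
    __batch_size__ = 100
    __poll_interval__ = 0.5
```

Because the configuration is plain class attributes, a subclass overrides only what differs and inherits the rest, and no handler instance is needed to configure its worker. Names of the form `__x__` are not name-mangled, so they can be read from outside the class as shown.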
Use Pydantic Field constraints (ge, gt) for numeric validation instead of __post_init__. This provides consistent validation behavior with other domain models and better error messages.
- Add exponential backoff to event claiming (min(30, 5^retry_count) seconds)
- Rename SkippedEventsError to SkippedEvents (control flow, not an error)
- Handlers now raise SkippedEvents when the backend is unavailable instead of silently returning (which incorrectly marked events as delivered)
- Move sleep outside the UoW scope to release the DB connection during idle time
- Update tests for the new handler-based pattern
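The backoff rule in the commit above, `min(30, 5^retry_count)` seconds, can be sketched as follows. The commit does not say which timestamp the delay is measured from; this sketch assumes the time of the last failure (for example the `updated_at` column), and assumes that events which have never failed (`retry_count == 0`) are claimable immediately. `backoff_seconds` and `is_claimable` are hypothetical helper names.

```python
from datetime import datetime, timedelta

MAX_BACKOFF_SECONDS = 30


def backoff_seconds(retry_count: int) -> int:
    """Delay for a given retry_count: min(30, 5 ** retry_count) seconds."""
    return min(MAX_BACKOFF_SECONDS, 5 ** retry_count)


def is_claimable(retry_count: int, last_failed_at: datetime, now: datetime) -> bool:
    # Assumption: events that never failed are eligible immediately.
    if retry_count == 0:
        return True
    # Assumption: the delay is measured from the last failure timestamp.
    return now >= last_failed_at + timedelta(seconds=backoff_seconds(retry_count))
```

Under this rule an event that has failed once (`retry_count = 1`) waits 5 seconds, after two failures 25 seconds, and from then on the cap holds it at 30 seconds.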
@greptile
Greptile Overview

Greptile Summary

This PR migrates from push-based

Key Changes
Architecture

The migration enables true concurrent processing:
Database Migration

Adds columns for routing_key, retry_count, claimed_at, updated_at with partial indexes optimized for claiming queries.

Testing

Comprehensive test coverage with 108 unit tests and 15 integration tests, including concurrent worker tests and crash recovery scenarios.

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| server/osa/infrastructure/event/worker.py | Migrated to pull-based Worker pattern with EventHandler delegation, FOR UPDATE SKIP LOCKED claiming, and graceful shutdown |
| server/osa/domain/shared/event.py | Added EventHandler base class with classvars for configuration, WorkerConfig as Pydantic model with validation, and ClaimResult dataclass |
| server/osa/domain/shared/outbox.py | Added claim(), mark_failed_with_retry(), and reset_stale_claims() methods for pull-based event processing |
| server/osa/infrastructure/persistence/repository/event.py | Implemented claim() with FOR UPDATE SKIP LOCKED, exponential backoff logic, and retry handling |
| server/migrations/versions/add_worker_columns.py | Added routing_key, retry_count, claimed_at, updated_at columns with partial indexes for efficient claiming queries |
| server/osa/domain/index/handler/fanout_to_index_backends.py | Creates per-backend IndexRecord events with routing keys for independent retry and failure isolation |
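The claiming query in `repository/event.py` is not shown here. The following is a minimal sketch of what a PostgreSQL claim, partial index and stale-claim reset could look like, embedded as Python strings. The `id`, `event_type`, `status` and `created_at` columns, the index name, the `:name` bind-parameter style (as used by SQLAlchemy `text()`), measuring backoff from `updated_at`, and the rule that a handler without a routing key only sees un-routed events are all assumptions. `FOR UPDATE SKIP LOCKED` makes each worker silently skip rows already locked by another worker's transaction, so concurrent workers end up with disjoint batches.

```python
from typing import Dict, List, Optional

# Hypothetical partial index: only rows still waiting to be claimed are indexed,
# so it stays small however many delivered events accumulate.
PENDING_INDEX_DDL = """
CREATE INDEX IF NOT EXISTS ix_events_pending_claim
    ON events (event_type, routing_key, created_at)
    WHERE status = 'pending'
"""

# Lock a batch of eligible rows, skipping rows another worker already holds,
# then flip them to 'claimed' in the same statement.
CLAIM_SQL_TEMPLATE = """
WITH picked AS (
    SELECT id
    FROM events
    WHERE status = 'pending'
      AND event_type = ANY(:event_types)
      AND {routing_filter}
      AND (
          retry_count = 0
          OR updated_at <= now() - make_interval(secs => LEAST(30, power(5, retry_count)))
      )
    ORDER BY created_at
    LIMIT :limit
    FOR UPDATE SKIP LOCKED
)
UPDATE events AS e
SET status = 'claimed', claimed_at = now(), updated_at = now()
FROM picked
WHERE e.id = picked.id
RETURNING e.*
"""

# Return claims held past the timeout (e.g. by a crashed worker) to the queue.
RESET_STALE_SQL = """
UPDATE events
SET status = 'pending', claimed_at = NULL, updated_at = now()
WHERE status = 'claimed'
  AND claimed_at < now() - make_interval(secs => :timeout_seconds)
"""


def build_claim_sql(routing_key: Optional[str]) -> str:
    # Assumption: a handler without a routing key only claims un-routed events.
    if routing_key is None:
        routing_filter = "routing_key IS NULL"
    else:
        routing_filter = "routing_key = :routing_key"
    return CLAIM_SQL_TEMPLATE.format(routing_filter=routing_filter)


def claim_params(
    event_types: List[str], limit: int, routing_key: Optional[str]
) -> Dict[str, object]:
    params: Dict[str, object] = {"event_types": list(event_types), "limit": limit}
    if routing_key is not None:
        params["routing_key"] = routing_key
    return params
```

Doing the `SELECT ... FOR UPDATE SKIP LOCKED` and the `UPDATE` in one statement keeps the lock and the status change in the same transaction, which is the property the sequence diagram relies on.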
Sequence Diagram
```mermaid
sequenceDiagram
    participant App as FastAPI App
    participant Pool as WorkerPool
    participant Worker as Worker
    participant Outbox as Outbox
    participant Repo as EventRepository
    participant Handler as EventHandler
    participant DB as PostgreSQL
    App->>Pool: start()
    Pool->>Pool: emit ServerStarted event
    Pool->>Worker: start() (per handler)
    loop Poll Loop
        Worker->>Worker: _poll_once()
        Worker->>Outbox: claim(event_types, limit, routing_key)
        Outbox->>Repo: claim()
        Repo->>DB: SELECT ... FOR UPDATE SKIP LOCKED
        Note over DB: Locks pending events<br/>matching routing_key<br/>respecting backoff
        DB-->>Repo: locked rows
        Repo->>DB: UPDATE status='claimed', claimed_at=now
        Repo-->>Outbox: ClaimResult(events)
        Outbox-->>Worker: ClaimResult
        alt Events claimed
            Worker->>Handler: handle() or handle_batch()
            Handler-->>Worker: success
            Worker->>Outbox: mark_delivered(event_id)
            Outbox->>Repo: update_status('delivered')
            Worker->>DB: commit()
        else Handler raises SkippedEvents
            Worker->>Outbox: mark_skipped(event_id, reason)
            Outbox->>Repo: update_status('skipped')
            Worker->>DB: commit()
        else Handler raises Exception
            Worker->>Outbox: mark_failed_with_retry(event_id, error, max_retries)
            Outbox->>Repo: mark_failed_with_retry()
            alt retry_count < max_retries
                Repo->>DB: UPDATE status='pending', retry_count++
                Note over DB: Event will be retried<br/>after backoff delay
            else retry_count >= max_retries
                Repo->>DB: UPDATE status='failed' (permanent)
            end
            Worker->>DB: commit()
        else No events
            Worker->>Worker: sleep(poll_interval)
        end
    end
    loop Stale Claim Cleanup (every 60s)
        Pool->>Outbox: reset_stale_claims(max_timeout)
        Outbox->>Repo: reset_stale_claims()
        Repo->>DB: UPDATE claimed events older than timeout<br/>SET status='pending'
        Note over DB: Recovers events from<br/>crashed workers
    end
```
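The outcome branches in the diagram can be sketched in memory as follows, with no database and a caller-supplied `claim` function standing in for `Outbox.claim()`. `Event`, `poll_once` and the field names are hypothetical; the retry check follows the diagram literally (compare `retry_count` with `max_retries` before incrementing). Only the batch path (`handle_batch()`) is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class SkippedEvents(Exception):
    """Control flow, not an error: the handler chose not to process the batch now."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass
class Event:
    id: int
    status: str = "claimed"
    retry_count: int = 0
    last_error: Optional[str] = None


def mark_failed_with_retry(event: Event, error: str, max_retries: int) -> None:
    event.last_error = error
    if event.retry_count < max_retries:
        # Back to the queue; claim() skips it until the backoff has elapsed.
        event.retry_count += 1
        event.status = "pending"
    else:
        event.status = "failed"  # permanent


def poll_once(
    claim: Callable[[], List[Event]],
    handle_batch: Callable[[List[Event]], None],
    max_retries: int,
) -> bool:
    """Process one claimed batch; return False when there was nothing to do."""
    events = claim()
    if not events:
        # The caller sleeps for poll_interval outside any open unit of work.
        return False
    try:
        handle_batch(events)
    except SkippedEvents as skipped:
        # Must precede the generic clause: SkippedEvents is also an Exception.
        for event in events:
            event.status = "skipped"
            event.last_error = skipped.reason
    except Exception as exc:
        for event in events:
            mark_failed_with_retry(event, repr(exc), max_retries)
    else:
        for event in events:
            event.status = "delivered"
    # In the diagram every branch ends with commit(); there is nothing to commit here.
    return True
```

Raising `SkippedEvents` instead of returning normally is what keeps an unavailable backend from being recorded as a successful delivery.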
The routing_key was duplicated: it was set on the IndexRecord payload and also passed to outbox.append(). Only the DB column (populated via outbox.append()) is used for routing in claim(); the payload field was never read.
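A minimal sketch of the fan-out after this change, assuming an in-memory outbox: one `IndexRecord` per backend, with the routing key carried only by the outbox row, never by the payload. The `record_id` field, the `routing_key=` keyword on `append()`, `InMemoryOutbox`, `rows_for`, `fan_out` and the backend names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IndexRecord:
    # Hypothetical payload. No routing_key field: routing lives on the outbox row.
    record_id: str


@dataclass
class InMemoryOutbox:
    rows: List[Tuple[object, Optional[str]]] = field(default_factory=list)

    def append(self, event: object, routing_key: Optional[str] = None) -> None:
        # Stands in for inserting the payload plus the routing_key column.
        self.rows.append((event, routing_key))

    def rows_for(self, routing_key: Optional[str]) -> List[object]:
        # What a worker with this routing key would be allowed to claim.
        return [event for event, key in self.rows if key == routing_key]


def fan_out(record_id: str, backends: List[str], outbox: InMemoryOutbox) -> None:
    # One event per backend, so each backend retries or fails independently.
    for backend in backends:
        outbox.append(IndexRecord(record_id=record_id), routing_key=backend)
```

Keeping the key in a single place means the value that `claim()` filters on cannot drift from a copy stored in the payload.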
This was referenced Feb 3, 2026
Summary
Migrates from push-based `EventListener` to pull-based `Worker` + `EventHandler` pattern. Workers claim events using `FOR UPDATE SKIP LOCKED`, enabling concurrent processing without coordination.

Changes
- …(`__routing_key__`, `__batch_size__`, `__poll_interval__`, `__max_retries__`, `__claim_timeout__`)
- …`handler_type` directly, reads config from classvars
- …`listener/` → `handler/` across all domains

Database Migration
Adds columns to the `events` table: `routing_key`, `retry_count`, `claimed_at`, `updated_at`, with partial indexes.

Test Plan