feat: pull-based event processing with Worker + EventHandler pattern #44
Closed
rorybyrne wants to merge 5 commits into
…upport (#35)

Replace fan-out-in-service pattern with explicit per-backend IndexRecord events for failure isolation and crash-safe processing.

Key changes:
- Add BatchEventListener protocol for efficient batch processing
- Create IndexRecord event type for per-backend indexing
- Add FanOutToIndexBackends listener (RecordPublished → IndexRecord per backend)
- Add IndexRecordBatch batch listener (groups IndexRecords by backend)
- Remove in-memory buffer from VectorStorageBackend (crash-safe)
- Update BackgroundWorker to detect and dispatch to batch listeners
- Add configurable batch_size to WorkerConfig

This enables:
- Per-backend failure isolation (retry just the failed backend)
- Crash-safe processing (events stay in outbox until committed)
- Efficient batch embedding generation
- Clear failure visibility with backend name and record SRN

Closes #35
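The fan-out and batch-grouping steps named in this commit can be sketched roughly as follows. This is an illustration only: the field names (`record_srn`, `backend`) and the function names `fan_out` and `group_by_backend` are assumptions, not the PR's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordPublished:
    record_srn: str  # hypothetical field name


@dataclass(frozen=True)
class IndexRecord:
    backend: str  # hypothetical field name
    record_srn: str


def fan_out(event: RecordPublished, backends: list[str]) -> list[IndexRecord]:
    """Emit one IndexRecord per backend so each backend can fail and retry alone."""
    return [IndexRecord(backend=name, record_srn=event.record_srn) for name in backends]


def group_by_backend(events: list[IndexRecord]) -> dict[str, list[str]]:
    """Group a batch of IndexRecords by backend so each backend gets one bulk call."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for event in events:
        grouped[event.backend].append(event.record_srn)
    return dict(grouped)
```

Because each `IndexRecord` is its own outbox event, a failure in one backend leaves the events for the other backends untouched.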
Follow-up improvements to event processing redesign:
- Add round-robin fair queuing to fetch_pending() for parallel pipeline stages
- Fix source service infinite loop when offset >= limit (remaining <= 0)
- Add OSA_CONFIG_FILE env var to Dockerfile (config was being ignored)
- Reduce log noise: batch summaries at INFO, per-event details at DEBUG
- Add logging to GEO source for UID fetching and limit edge cases
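One way to picture round-robin fair queuing is the sketch below. It is not the real `fetch_pending()`; the function name, the in-memory dict of per-type queues, and the string event IDs are all assumptions made for illustration.

```python
from collections import deque


def fetch_pending_round_robin(pending_by_type: dict[str, list[str]], limit: int) -> list[str]:
    """Take one event per type in turn until `limit` is reached or every queue is empty."""
    queues = deque(deque(events) for events in pending_by_type.values() if events)
    picked: list[str] = []
    while queues and len(picked) < limit:
        queue = queues.popleft()
        picked.append(queue.popleft())
        if queue:
            # This type still has events: send it to the back of the rotation.
            queues.append(queue)
    return picked
```

With this ordering, a large backlog in one pipeline stage cannot starve the others, since every stage with pending events gets a slot per rotation.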
* feat: migrate to pull-based Worker + EventHandler pattern

  Replace push-based EventListener with unified pull-based architecture:
  - Add EventHandler base class with config via classvars (__routing_key__, __batch_size__, __poll_interval__, __max_retries__, __claim_timeout__)
  - Simplify Worker to accept handler_type directly, read config from classvars
  - Add WorkerPool.register() for handler-based registration
  - Rename listener/ to handler/ across all domains (source, validation, curation, record, index)
  - Add VectorIndexHandler and KeywordIndexHandler with routing key support
  - Add database migration for worker columns (routing_key, retry_count, claimed_at, updated_at) and partial indexes for efficient claiming

  Workers now claim events using FOR UPDATE SKIP LOCKED, enabling concurrent processing without coordination. Each handler declares its own batch size and polling configuration.

  Test coverage: 108 unit tests, 15 integration tests for event claiming and concurrent worker behavior.

* refactor: convert WorkerConfig from dataclass to Pydantic model

  Use Pydantic Field constraints (ge, gt) for numeric validation instead of __post_init__. This provides consistent validation behavior with other domain models and better error messages.

* ci: run CI on PRs to any branch, not just main
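A minimal sketch of the classvar-based configuration described in the first commit above. The classvar names come from the commit message; the default values, the `handle()` method, and the attributes stored on `Worker` are assumptions for illustration only.

```python
from typing import ClassVar, Optional


class EventHandler:
    """Base class: subclasses override the classvars to configure their worker."""

    __routing_key__: ClassVar[Optional[str]] = None
    __batch_size__: ClassVar[int] = 1            # assumed default
    __poll_interval__: ClassVar[float] = 1.0     # seconds, assumed default
    __max_retries__: ClassVar[int] = 3           # assumed default
    __claim_timeout__: ClassVar[float] = 300.0   # seconds, assumed default

    def handle(self, events: list[dict]) -> None:  # hypothetical method name
        raise NotImplementedError


class VectorIndexHandler(EventHandler):
    __routing_key__ = "vector"
    __batch_size__ = 100

    def handle(self, events: list[dict]) -> None:
        pass  # embedding generation and indexing would go here


class Worker:
    """Accepts the handler type directly and reads its config from the classvars."""

    def __init__(self, handler_type: type[EventHandler]) -> None:
        self.handler = handler_type()
        self.routing_key = handler_type.__routing_key__
        self.batch_size = handler_type.__batch_size__
        self.poll_interval = handler_type.__poll_interval__
        self.max_retries = handler_type.__max_retries__
        self.claim_timeout = handler_type.__claim_timeout__
```

Keeping the configuration on the handler class means a worker needs no separate config object: `Worker(VectorIndexHandler)` is enough to know what to claim and how many events to take at a time.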
Closing. Will re-PR from the 041 branch directly against main, since PR #40 already merged the original 035 work.
Summary
Complete redesign of event processing from the push-based `EventListener` to a pull-based `Worker` + `EventHandler` pattern. Workers claim events using `FOR UPDATE SKIP LOCKED`, enabling concurrent processing without coordination.

Key Changes
- `EventHandler` base class with config via classvars (`__routing_key__`, `__batch_size__`, `__poll_interval__`, `__max_retries__`, `__claim_timeout__`)
- `Worker` accepts `handler_type` directly and reads config from classvars
- `FanOutToIndexBackends` creates one `IndexRecord` per backend
- … `claim_timeout`

Architecture
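As a rough model of the claim rule, the sketch below decides in plain Python which events a worker may take. In the PR itself the claiming is done in the database with `FOR UPDATE SKIP LOCKED`; this loop only illustrates eligibility. The exact semantics (a claim older than `claim_timeout` counts as stale, events at `max_retries` are skipped) and the `Event` and `claim_batch` names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Event:
    id: int
    routing_key: Optional[str]
    retry_count: int = 0
    claimed_at: Optional[datetime] = None


def claim_batch(
    events: list[Event],
    routing_key: str,
    batch_size: int,
    claim_timeout: timedelta,
    max_retries: int,
    now: datetime,
) -> list[Event]:
    """Claim up to batch_size events that are unclaimed or whose claim has gone stale."""
    stale_before = now - claim_timeout
    claimed: list[Event] = []
    for event in events:
        if len(claimed) >= batch_size:
            break
        if event.routing_key != routing_key or event.retry_count >= max_retries:
            continue
        if event.claimed_at is not None and event.claimed_at > stale_before:
            continue  # another worker holds a fresh claim
        event.claimed_at = now
        claimed.append(event)
    return claimed
```

Under these assumptions, a worker that crashes mid-batch leaves its events claimed, and they become claimable again once `claim_timeout` has elapsed.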
Database Migration
Adds columns to the `events` table:
- `routing_key`: for per-backend event targeting
- `retry_count`: for tracking retry attempts
- `claimed_at`: for stale claim detection
- `updated_at`: for state tracking

Plus partial indexes for efficient claiming.
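To make the shape of such a migration concrete, here is a small sketch using an in-memory SQLite database as a stand-in for the real one. The four added column names come from the description above; the starting schema, the `status` column, the column types, and the index name `ix_events_claimable` are assumptions, and the actual migration may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration schema; the real events table is not shown in the PR.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " event_type TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)

MIGRATION = [
    "ALTER TABLE events ADD COLUMN routing_key TEXT",
    "ALTER TABLE events ADD COLUMN retry_count INTEGER NOT NULL DEFAULT 0",
    "ALTER TABLE events ADD COLUMN claimed_at TEXT",
    "ALTER TABLE events ADD COLUMN updated_at TEXT",
    # Partial index: only rows that can still be claimed are indexed.
    "CREATE INDEX ix_events_claimable ON events (routing_key, id)"
    " WHERE status = 'pending'",
]
for statement in MIGRATION:
    conn.execute(statement)

columns = [row[1] for row in conn.execute("PRAGMA table_info(events)")]
index_names = [row[1] for row in conn.execute("PRAGMA index_list(events)")]
```

A partial index keeps the claim query fast as the table grows, because already-processed events never enter the index.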
Test Plan