feat: redesign event processing with per-backend indexing and batch support #40
Merged
…upport (#35)

Replace the fan-out-in-service pattern with explicit per-backend IndexRecord events for failure isolation and crash-safe processing.

Key changes:
- Add BatchEventListener protocol for efficient batch processing
- Create IndexRecord event type for per-backend indexing
- Add FanOutToIndexBackends listener (RecordPublished → IndexRecord per backend)
- Add IndexRecordBatch batch listener (groups IndexRecords by backend)
- Remove in-memory buffer from VectorStorageBackend (crash-safe)
- Update BackgroundWorker to detect and dispatch to batch listeners
- Add configurable batch_size to WorkerConfig

This enables:
- Per-backend failure isolation (retry just the failed backend)
- Crash-safe processing (events stay in the outbox until committed)
- Efficient batch embedding generation
- Clear failure visibility with backend name and record SRN

Closes #35
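The fan-out and regrouping steps in this commit message can be sketched roughly as follows. This is an illustration only, not the code in the PR: the field names (`srn`, `backend`) and the helper functions `fan_out` and `group_by_backend` are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordPublished:
    srn: str  # record identifier; field name is assumed for this sketch


@dataclass(frozen=True)
class IndexRecord:
    backend: str  # name of the backend that should index the record
    srn: str


def fan_out(event: RecordPublished, backends: list[str]) -> list[IndexRecord]:
    """Emit one IndexRecord per backend so each can fail and retry on its own."""
    return [IndexRecord(backend=name, srn=event.srn) for name in backends]


def group_by_backend(events: list[IndexRecord]) -> dict[str, list[str]]:
    """Group a batch of IndexRecord events into one SRN list per backend."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ev in events:
        grouped[ev.backend].append(ev.srn)
    return dict(grouped)


# Two published records fanned out to two hypothetical backends.
backends = ["vector", "keyword"]
events = fan_out(RecordPublished("srn:a"), backends) + fan_out(RecordPublished("srn:b"), backends)
batches = group_by_backend(events)
```

Because every backend gets its own event, a failure in one backend leaves the other backend's events untouched, and each per-backend list can be handed to a single batch call.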
Follow-up improvements to event processing redesign:
- Add round-robin fair queuing to fetch_pending() for parallel pipeline stages
- Fix source service infinite loop when offset >= limit (remaining <= 0)
- Add OSA_CONFIG_FILE env var to Dockerfile (config was being ignored)
- Reduce log noise: batch summaries at INFO, per-event details at DEBUG
- Add logging to GEO source for UID fetching and limit edge cases
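One way to picture round-robin fair queuing is the in-memory sketch below. It assumes fairness is applied across event types and that pending events arrive as `(event_type, event_id)` pairs; the real `fetch_pending()` lives in the event repository and may be implemented quite differently (for example in SQL), so treat the function name and signature here as hypothetical.

```python
from collections import deque


def fetch_pending_round_robin(pending: list[tuple[str, int]], limit: int) -> list[tuple[str, int]]:
    """Pick up to `limit` events, taking one per event type in turn.

    `pending` is a list of (event_type, event_id) pairs in arrival order.
    """
    # One FIFO queue per event type, kept in first-seen order.
    queues: dict[str, deque] = {}
    for event_type, event_id in pending:
        queues.setdefault(event_type, deque()).append((event_type, event_id))

    picked: list[tuple[str, int]] = []
    while len(picked) < limit and queues:
        # Visit each non-empty queue once per round.
        for event_type in list(queues):
            if len(picked) >= limit:
                break
            queue = queues[event_type]
            picked.append(queue.popleft())
            if not queue:
                del queues[event_type]
    return picked


pending = [
    ("RecordPublished", 1),
    ("RecordPublished", 2),
    ("RecordPublished", 3),
    ("IndexRecord", 4),
    ("IndexRecord", 5),
]
selected = fetch_pending_round_robin(pending, limit=4)
```

With a plain oldest-first fetch, the three `RecordPublished` events would fill most of the batch; interleaving lets a later pipeline stage make progress in the same poll.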
Pull request overview
This is a significant architectural redesign of the event processing system to address critical reliability issues. The PR introduces per-backend failure isolation, crash-safe processing, and efficient batch operations. It's well-designed and comprehensively tested with 67 passing tests.
Changes:
- Adds `BatchEventListener` protocol for batch event processing
- Introduces `IndexRecord` event type for per-backend indexing requests
- Replaces direct indexing with event-driven fan-out (`FanOutToIndexBackends` → `IndexRecordBatch`)
- Removes in-memory buffers from `VectorStorageBackend`, making it stateless
- Implements round-robin fair queuing in event repository
- Fixes infinite loop in source service when `offset >= limit`
- Adds worker configuration and environment variable for Docker
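As a rough illustration of how a worker might detect a batch listener and dispatch to it, the sketch below uses `typing.Protocol` with `runtime_checkable`. The PR describes the real `BatchEventListener` as using metaclass support, so this is a simplified stand-in; the method names `handle_batch` and `handle` and the `dispatch` helper are assumptions.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BatchEventListener(Protocol):
    """Structural stand-in: anything with a handle_batch method qualifies."""

    def handle_batch(self, events: list[Any]) -> None: ...


def dispatch(listener: Any, events: list[Any], batch_size: int) -> int:
    """Deliver events to a listener and return the number of calls made."""
    calls = 0
    if isinstance(listener, BatchEventListener):
        # Batch listeners receive slices of at most batch_size events.
        for start in range(0, len(events), batch_size):
            listener.handle_batch(events[start:start + batch_size])
            calls += 1
    else:
        # Ordinary listeners still receive one event per call.
        for event in events:
            listener.handle(event)
            calls += 1
    return calls


class CollectingBatchListener:
    def __init__(self) -> None:
        self.batches: list[list[Any]] = []

    def handle_batch(self, events: list[Any]) -> None:
        self.batches.append(events)


class CollectingListener:
    def __init__(self) -> None:
        self.seen: list[Any] = []

    def handle(self, event: Any) -> None:
        self.seen.append(event)


batch_listener = CollectingBatchListener()
single_listener = CollectingListener()
batch_calls = dispatch(batch_listener, [1, 2, 3, 4, 5], batch_size=2)
single_calls = dispatch(single_listener, [1, 2, 3, 4, 5], batch_size=2)
```

Five events with a batch size of 2 reach the batch listener in three calls, which is where the savings for batch embedding generation would come from.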
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| server/osa/domain/shared/event.py | Added BatchEventListener protocol with metaclass support |
| server/osa/domain/index/event/index_record.py | New event type for per-backend indexing |
| server/osa/domain/index/listener/fanout_listener.py | Fan-out listener creates IndexRecord per backend |
| server/osa/domain/index/listener/index_batch_listener.py | Batch processes IndexRecord events by backend |
| server/osa/infrastructure/event/worker.py | Enhanced to support batch event dispatch |
| server/osa/infrastructure/persistence/repository/event.py | Added round-robin fair queuing |
| server/osa/infrastructure/index/vector/backend.py | Removed in-memory buffer, added ingest_batch |
| server/osa/domain/index/service/index.py | Simplified to query-only operations |
| server/osa/domain/source/service/source.py | Fixed infinite loop when offset >= limit |
| server/sources/geo_entrez/source.py | Added early exit for empty effective_limit |
| server/osa/config.py | Added WorkerConfig for poll settings |
| server/Dockerfile | Updated config file reference and added OSA_CONFIG_FILE |
| server/pyproject.toml | Added aarch64 platform support |
| server/osa/sdk/index/backend.py | Added ingest_batch to protocol, deprecated flush |
| server/tests/* | Comprehensive test coverage for new features |
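The infinite-loop fix listed for `source.py` concerns the case where `offset >= limit`, so that the remaining count is zero or negative. A minimal sketch of such a guard is shown below; the `pull_pages` function, its parameters, and the `fetch_page` callback are hypothetical and do not reflect the actual service code.

```python
from typing import Callable


def pull_pages(
    fetch_page: Callable[[int, int], list[int]],
    limit: int,
    page_size: int,
) -> list[int]:
    """Pull records page by page until `limit` records have been collected."""
    records: list[int] = []
    offset = 0
    while True:
        remaining = limit - offset
        if remaining <= 0:
            # Without this guard, a loop that requests `remaining` items
            # can spin forever once offset reaches or passes the limit.
            break
        page = fetch_page(offset, min(page_size, remaining))
        if not page:
            break  # source exhausted before the limit was reached
        records.extend(page)
        offset += len(page)
    return records


# A fake source holding ten records, used only for the example.
DATA = list(range(10))


def fake_fetch(offset: int, count: int) -> list[int]:
    return DATA[offset:offset + count]


first_five = pull_pages(fake_fetch, limit=5, page_size=2)
nothing = pull_pages(fake_fetch, limit=0, page_size=2)
```

The same early exit applies when the effective limit is empty from the start, as in the `limit=0` call.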
Summary
Redesigns the event processing system to address critical reliability issues:
- `IndexRecord` events per backend, enabling independent retry
- `BatchEventListener` protocol for batch embedding generation
Key Changes
- Add `BatchEventListener` protocol and `IndexRecord` event type
- Add `FanOutToIndexBackends` listener (RecordPublished → IndexRecord per backend)
- Add `IndexRecordBatch` batch listener (groups by backend, calls ingest_batch)
- Add round-robin fair queuing to `fetch_pending()`
- Fix source service infinite loop when `offset >= limit`
- Add `OSA_CONFIG_FILE` env var to Dockerfile
Test Plan
Closes #35