Queue refactor #462
PR Summary
This PR refactors the queue and batching system in the infinity-emb library, focusing on improved type safety, memory efficiency, and more robust error handling.
- Removed `batch_delay` parameter from BatchHandler initialization, potentially impacting throughput optimization
- Changed `pop_optimal_batches` in `/libs/infinity_emb/infinity_emb/inference/queue.py` to use generator pattern for better memory efficiency
- Increased result queue size from 4 to 8 in `/libs/infinity_emb/infinity_emb/inference/batch_handler.py` for improved throughput
- Added `QUEUE_TIMEOUT` constant (0.5s) in `/libs/infinity_emb/infinity_emb/inference/batch_handler.py` for consistent timeout handling
- Introduced `BaseTypeHint` in `/libs/infinity_emb/infinity_emb/transformer/abstract.py` for improved type safety across transformer classes
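
As a rough illustration of the generator pattern mentioned for `pop_optimal_batches`, here is a minimal sketch. The function `pop_batches` and its fixed-size slicing are hypothetical and are not the library's actual batching logic; the point is only that batches are produced one at a time instead of being collected into a list first.

```python
from typing import Iterator, List


def pop_batches(items: List[str], max_batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield batches lazily instead of returning a list of batches."""
    for start in range(0, len(items), max_batch_size):
        # Each batch is built only when the consumer asks for it.
        yield items[start : start + max_batch_size]


batches = pop_batches(["a", "b", "c", "d", "e"], max_batch_size=2)
first = next(batches)  # only the first batch has been materialized so far
```

A consumer that stops early (for example on shutdown) never pays for the batches it does not read, which is the memory benefit the summary refers to.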
4 file(s) reviewed, 2 comment(s)
      self._batch_handler = BatchHandler(
          max_batch_size=self._engine_args.batch_size,
          model_replicas=self._model_replicas,
    -     batch_delay=self._min_inference_t / 2,
    +     # batch_delay=self._min_inference_t / 2,
          vector_disk_cache_path=self._engine_args.vector_disk_cache_path,
logic: Removing batch_delay could lead to aggressive batching and potential resource exhaustion. Consider adding a configurable minimum delay or documenting why this was removed.
    ]


    def run_warmup(model, inputs) -> tuple[float, float, str]:
style: model parameter should be typed with BaseTypeHint
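
A sketch of what the typed signature might look like follows. The classes `DummyEmbedder` and `DummyReranker`, the `encode` method, and the body of `run_warmup` are hypothetical stand-ins; the real `BaseTypeHint` lives in the library's `transformer/abstract.py` and its exact definition is not shown in this PR page. The meaning given to the three return values here is also an assumption.

```python
from __future__ import annotations

import time
from typing import Union


class DummyEmbedder:
    """Hypothetical stand-in for one of the library's model classes."""

    def encode(self, inputs: list[str]) -> list[list[float]]:
        return [[float(len(text))] for text in inputs]


class DummyReranker(DummyEmbedder):
    """Second hypothetical model class, so the alias covers more than one type."""


# Stand-in alias; the real BaseTypeHint is defined by the library.
BaseTypeHint = Union[DummyEmbedder, DummyReranker]


def run_warmup(model: BaseTypeHint, inputs: list[str]) -> tuple[float, float, str]:
    """Assumed semantics: return (inputs per second, elapsed seconds, log message)."""
    start = time.perf_counter()
    model.encode(inputs)
    elapsed = time.perf_counter() - start
    rate = len(inputs) / elapsed if elapsed > 0 else float("inf")
    return rate, elapsed, f"warmed up on {len(inputs)} inputs"
```

With the annotation in place, a type checker can flag calls that pass an object outside the alias.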
Codecov Report
Attention: Patch coverage is

    @@            Coverage Diff             @@
    ##             main     #462      +/-   ##
    ==========================================
    - Coverage   79.08%   79.04%   -0.04%
    ==========================================
      Files          42       42
      Lines        3414     3408       -6
    ==========================================
    - Hits         2700     2694       -6
      Misses        714      714
This pull request includes multiple changes to improve type hinting, optimize batch processing, and enhance queue handling in the `infinity_emb` library. The most important changes include the introduction of a new type hint, adjustments to queue timeouts, and modifications to batch processing logic.

Type Hinting Improvements:
- Introduced `BaseTypeHint` to replace `BaseTransformer` in type hints, improving code clarity and flexibility.
- Added a `TYPE_CHECKING` import and conditional imports of type hints to avoid runtime overhead.

Queue Handling Enhancements:
- Added `QUEUE_TIMEOUT` for consistency and easier adjustments.
- Increased the size of `_result_queue` from 4 to 8 to handle more results concurrently.

Batch Processing Optimizations:
- Changed the `pop_optimal_batches` method by removing unnecessary sorting and returning a generator instead of a list.
- Removed the `latest_first` parameter from the `pop_optimal_batches` call to streamline the batch selection process.

Miscellaneous:
- Commented out the `batch_delay` parameter in the `astart` method to potentially remove unnecessary delays in batch handling.

These changes collectively aim to improve the performance, maintainability, and clarity of the `infinity_emb` library.

<!--Congratulations! You've made it this far! Thanks for submitting a PR to Infinity!
License & CLA
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.
-->
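
To illustrate the shared `QUEUE_TIMEOUT` idea from the description, here is a minimal sketch using the standard library's `queue.Queue`. The 0.5 s value comes from the review summary; the `drain` function, the `None` sentinel, and the shutdown event are hypothetical and do not mirror the library's worker loops.

```python
import queue
import threading

QUEUE_TIMEOUT = 0.5  # seconds; one constant reused by every blocking queue read


def drain(q: queue.Queue, shutdown: threading.Event) -> list:
    """Hypothetical worker loop: read until a None sentinel or shutdown is signaled."""
    results = []
    while not shutdown.is_set():
        try:
            # A bounded wait lets the loop re-check the shutdown flag regularly.
            item = q.get(timeout=QUEUE_TIMEOUT)
        except queue.Empty:
            continue
        if item is None:
            break
        results.append(item)
    return results


work: queue.Queue = queue.Queue()
for value in (1, 2, None):
    work.put(value)
drained = drain(work, threading.Event())
```

Keeping the timeout in one named constant means every loop reacts to shutdown within the same bound, and the value can be tuned in a single place.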