Model: Device-id and data-parallel inference in CLI and Torch #452
michaelfeil merged 5 commits into main
Conversation
PR Summary
Based on my analysis of the pull request, here is a concise summary of the key changes:
Added device-id and data-parallel inference capabilities to enable running models across multiple GPUs/devices:
- Added a new `--device-id` CLI option that accepts comma-separated device IDs (e.g. "0,1") for model placement across multiple GPUs/devices
- Introduced a `LoadingStrategy` class to manage device mapping, dtype configuration, and quantization settings across different hardware
- Modified BatchHandler to support multiple model replicas running in parallel across specified devices
- Updated test suite with retry logic and adjusted tolerance parameters to handle numerical differences from parallel processing
- Added proper error handling for device validation and unavailable hardware configurations
The changes enable better scaling and performance through parallel inference while maintaining the existing API interface.
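To make the summary concrete, here is a minimal, self-contained sketch of the device-id idea: a comma-separated string such as "0,1" is parsed into per-device placements, one model replica per device. The names `LoadingStrategy` fields and `parse_device_id` are illustrative assumptions, not the actual infinity_emb API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LoadingStrategy:
    # e.g. ["cuda:0", "cuda:1"]; one model replica is loaded per entry
    device_placement: list = field(default_factory=list)
    dtype: str = "float32"
    quantization: Optional[str] = None

def parse_device_id(device_id: str, device_type: str = "cuda") -> LoadingStrategy:
    """Turn a comma-separated id string such as "0,1" into a strategy (sketch)."""
    ids = [part.strip() for part in device_id.split(",") if part.strip()]
    if not ids or not all(part.isdigit() for part in ids):
        raise ValueError(f"invalid --device-id value: {device_id!r}")
    return LoadingStrategy(device_placement=[f"{device_type}:{i}" for i in ids])

strategy = parse_device_id("0,1")
print(strategy.device_placement)  # ['cuda:0', 'cuda:1']
```

A `BatchHandler` can then spin up one replica per placement entry, which is how the parallel inference described above scales across devices.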
23 file(s) reviewed, 32 comment(s)
Greptile
    --device-id TEXT    device id defines the model placement. e.g. `0,1` will place the model on MPS/CUDA/GPU 0 and 1 each
style: The word 'each' at the end of this help text is ambiguous — does it mean the model is replicated on each device, or split across devices?
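For what it's worth, the rest of the PR suggests the data-parallel reading: the model is replicated on each listed device, and incoming batches are dispatched across the replicas. A toy illustration of that dispatch (round-robin is an assumption here, not necessarily what `BatchHandler` does):

```python
from itertools import cycle

# One full model copy per listed device ("0,1" -> two replicas).
replicas = ["cuda:0", "cuda:1"]
dispatch = cycle(range(len(replicas)))

# Batches are spread across replicas rather than one model being
# sharded across both devices.
batches = ["batch-a", "batch-b", "batch-c", "batch-d"]
assignment = {b: replicas[next(dispatch)] for b in batches}
print(assignment)
# {'batch-a': 'cuda:0', 'batch-b': 'cuda:1', 'batch-c': 'cuda:0', 'batch-d': 'cuda:1'}
```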
    def update_loading_stategy(self):
        """Assign a device id to the EngineArgs object."""
        from infinity_emb.inference import loading_strategy  # type: ignore
syntax: update_loading_stategy() has a typo in its name (should be 'strategy')
    if self._loading_strategy is None:
        self.update_loading_stategy()
    elif isinstance(self._loading_strategy, dict):
        object.__setattr__(self, "_loading_strategy", LoadingStrategy(**self._loading_strategy))
style: loading strategy initialization should happen before pydantic validation to ensure the complete object is validated
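Context for the `object.__setattr__` call quoted above: it is the standard way to mutate a frozen dataclass during initialization, since normal assignment raises `FrozenInstanceError`. A self-contained sketch of the pattern, with names simplified from the PR (the real `EngineArgs` has many more fields):

```python
from dataclasses import dataclass, field
from typing import Optional, Union

@dataclass(frozen=True)
class LoadingStrategy:
    device_placement: list = field(default_factory=list)

@dataclass(frozen=True)
class EngineArgs:
    device_id: str = ""
    # May arrive as a plain dict (e.g. from deserialized config) and is
    # normalized into a LoadingStrategy before the object is used.
    _loading_strategy: Optional[Union[LoadingStrategy, dict]] = None

    def __post_init__(self):
        if isinstance(self._loading_strategy, dict):
            # Frozen dataclasses forbid `self.x = ...`, hence this escape hatch.
            object.__setattr__(
                self, "_loading_strategy", LoadingStrategy(**self._loading_strategy)
            )

args = EngineArgs(
    device_id="0,1",
    _loading_strategy={"device_placement": ["cuda:0", "cuda:1"]},
)
print(type(args._loading_strategy).__name__)  # LoadingStrategy
```

The reviewer's point stands: doing this normalization before (or during) validation means the fully-constructed object is what gets validated, rather than a raw dict.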
    embedding_dtype: EmbeddingDtype = EmbeddingDtype[MANAGER.embedding_dtype[0]]
    served_model_name: str = MANAGER.served_model_name[0]

    _loading_strategy: Optional[LoadingStrategy] = None
style: _loading_strategy should be documented with a type hint comment explaining its purpose and structure
Codecov Report

Attention: Patch coverage is

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #452      +/-   ##
==========================================
- Coverage   79.18%   79.15%   -0.04%
==========================================
  Files          41       42       +1
  Lines        3248     3363     +115
==========================================
+ Hits         2572     2662      +90
- Misses        676      701      +25

☔ View full report in Codecov by Sentry.
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.