Colbert#456
Conversation
There was a problem hiding this comment.
PR Summary
Here's a concise summary of the key changes in this PR that adds ColBERT model support:
Added support for multiple model replicas and device placement across GPUs:
- Introduced new
--device-idCLI argument to specify GPU/MPS device placement (e.g. "0,1") for model distribution - Added LoadingStrategy class to manage device mapping, dtype selection, and quantization across replicas
- Modified BatchHandler to support multiple ModelWorker instances running parallel model replicas
- Updated AsyncEmbeddingEngine to handle multiple model replicas instead of single instance
Added ColBERT-specific functionality:
- Introduced mode_colbert flag in SentenceTransformerPatched for ColBERT model detection
- Modified embedding processing to handle ColBERT's token-level embeddings and attention masks
- Added normalize_embeddings flag to control embedding normalization behavior
- Added test coverage for ColBERT models with token-level embedding verification
The changes enable efficient parallel processing across devices while maintaining backward compatibility with existing models.
25 file(s) reviewed, 9 comment(s)
Edit PR Review Bot Settings | Greptile
| model_warmup: bool = MANAGER.model_warmup[0] | ||
| vector_disk_cache_path: str = "" | ||
| device: Device = Device[MANAGER.device[0]] | ||
| device_id: DeviceID = field(default_factory=lambda: DeviceID(MANAGER.device_id[0])) |
There was a problem hiding this comment.
logic: device_id is not included in from_env() method's zip_longest parameters, causing potential initialization issues
| embeddings: "Tensor" = out_features.to(torch.float32) | ||
| if self.normalize_embeddings: |
There was a problem hiding this comment.
logic: redundant check for not self.mode_colbert since it's already in an if not self.mode_colbert block
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #456 +/- ##
==========================================
+ Coverage 79.12% 79.19% +0.07%
==========================================
Files 42 42
Lines 3367 3379 +12
==========================================
+ Hits 2664 2676 +12
Misses 703 703 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
Please provide a clear and concise description of the changes in this PR.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
Checklist
Additional Notes
Add any other context about the PR here.
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.