add dtype-based loading #461
Merged
michaelfeil merged 1 commit into main on Nov 13, 2024
Conversation
Contributor
PR Summary
This PR implements dtype-based loading strategies and device placement across transformer models, replacing manual dtype/device management with a more consistent approach.
- Added loading strategy support in /libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py with a loading_dtype parameter for model initialization
- Integrated the quantization interface via quant_interface in /libs/infinity_emb/infinity_emb/transformer/classifier/torch.py and /libs/infinity_emb/infinity_emb/transformer/crossencoder/torch.py
- Added torch.compile support in the classifier and crossencoder implementations
- Standardized float32 numpy output in CrossEncoder's encode_post method
- Removed manual half-precision conversion in favor of loading_dtype across transformer classes (see the sketch after this list)
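Below is a minimal sketch of the dtype-based loading idea, assuming a Hugging Face transformers backend; the load_classifier helper and its signature are illustrative assumptions, not the actual infinity_emb API.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical helper: select the dtype once at load time and pass it to
# from_pretrained, instead of loading in float32 and calling .half() later.
def load_classifier(
    model_id: str,
    device: str = "cuda",
    loading_dtype: torch.dtype = torch.float16,
) -> torch.nn.Module:
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, torch_dtype=loading_dtype
    )
    # Device placement happens alongside dtype selection.
    return model.to(device).eval()
```

Loading directly in the target dtype can avoid materializing the full model in float32 when a half-precision dtype is requested, which reduces peak memory during startup.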
3 file(s) reviewed, 4 comment(s)
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #461      +/-   ##
==========================================
+ Coverage   78.97%   79.08%   +0.10%
==========================================
  Files          42       42
  Lines        3392     3414      +22
==========================================
+ Hits         2679     2700      +21
- Misses        713      714       +1

☔ View full report in Codecov by Sentry.
This pull request includes several changes to improve the handling of loading strategies, device placement, and quantization in the infinity_emb library. The most important changes involve updates to the SentenceClassifier, CrossEncoder, and SentenceTransformer classes to incorporate new loading strategies and device placement, as well as handling different data types and quantization.

Improvements to handling loading strategies and device placement:

- libs/infinity_emb/infinity_emb/transformer/classifier/torch.py: Added support for loading strategies, device placement, and quantization in the SentenceClassifier class. [1] [2]
- libs/infinity_emb/infinity_emb/transformer/crossencoder/torch.py: Updated the CrossEncoder class to handle loading strategies, device placement, and quantization. [1] [2] [3] [4]
- libs/infinity_emb/infinity_emb/transformer/embedder/sentence_transformer.py: Enhanced the SentenceTransformer class to support loading strategies and device placement.

A hedged sketch of the optional torch.compile wrapping mentioned above follows.
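The following is a minimal sketch of how torch.compile support can be gated, not the PR's exact implementation; the maybe_compile helper is a hypothetical name.

```python
import torch

# Hypothetical wrapper: compile the module when torch.compile is
# available, and fall back to eager execution if compilation fails.
def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    if not hasattr(torch, "compile"):
        return model  # torch < 2.0 has no torch.compile
    try:
        return torch.compile(model, dynamic=True)
    except Exception:
        # Some backends/models are unsupported; keep the eager module.
        return model
```

Wrapping the model this way keeps the classifier and crossencoder code paths identical whether or not compilation succeeds.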
License & CLA
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.
Related Issue
Checklist
Additional Notes
Add any other context about the PR here.