Conversation
There was a problem hiding this comment.
PR Summary
This PR introduces support for matryoshka embeddings and the nomic-ai/nomic-embed-text-v1.5 model through several key changes:
- Added matryoshka dimension support with new
dimensionsfield across text, audio and image embedding models, including validation and slicing logic - Added
einopsdependency and updated version to 0.0.73 to support nomic-ai/nomic-embed-text-v1.5 model requirements - Introduced
MatryoshkaDimErrorexception class for proper dimension validation handling - Simplified error handling in audio/vision utils by removing redundant try-except blocks
- Standardized error message format across endpoints using
{ex.__class__} -> {ex}pattern
The changes appear well-structured and maintain backward compatibility while adding new capabilities.
12 file(s) reviewed, 10 comment(s)
Edit PR Review Bot Settings | Greptile
| model: Union[Unset, str] = "default/not-specified" | ||
| encoding_format: Union[Unset, EmbeddingEncodingFormat] = UNSET | ||
| user: Union[None, Unset, str] = UNSET | ||
| dimensions: Union[Unset, int] = 0 |
There was a problem hiding this comment.
style: dimensions default value of 0 should be documented in docstring to match other attributes
libs/client_infinity/infinity_client/infinity_client/models/open_ai_embedding_input_text.py
Show resolved
Hide resolved
libs/client_infinity/infinity_client/infinity_client/models/open_ai_embedding_input_audio.py
Show resolved
Hide resolved
| RootModel, | ||
| Tag, | ||
| conlist, | ||
| conint, |
There was a problem hiding this comment.
style: conint is imported but not used in the changes shown
| encoding_format: EmbeddingEncodingFormat = EmbeddingEncodingFormat.float | ||
| user: Optional[str] = None | ||
| dimensions: Optional[Annotated[int, Field(strict=True, gt=0, lt=8193)]] = None | ||
| dimensions: int = 0 |
There was a problem hiding this comment.
logic: changing from Optional[int] to int with default=0 is a breaking change for API clients expecting null values
| def matryososka_slice( | ||
| embeddings: list[np.ndarray], matryoshka_dim: Optional[int] | ||
| ) -> list[np.ndarray]: |
There was a problem hiding this comment.
syntax: function name 'matryososka_slice' is misspelled, should be 'matryoshka_slice'
| getattr(self.model_worker[0]._model, "sampling_rate", -42), | ||
| ) |
There was a problem hiding this comment.
style: sampling rate fallback of -42 should be documented or use a more meaningful default
| class MatryoshkaDimError(Exception): | ||
| pass |
There was a problem hiding this comment.
style: Empty exception class needs docstring explaining when this error is raised and what it means
| resolved_audios = await asyncio.gather( | ||
| *[resolve_audio(audio, allowed_sampling_rate, session) for audio in audio_urls] | ||
| ) |
There was a problem hiding this comment.
style: consider using gather with return_exceptions=True to handle partial failures more gracefully
|
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #494 +/- ##
==========================================
+ Coverage 79.68% 79.92% +0.23%
==========================================
Files 42 42
Lines 3476 3467 -9
==========================================
+ Hits 2770 2771 +1
+ Misses 706 696 -10 ☔ View full report in Codecov by Sentry. |
Related Issue
Checklist
Additional Notes
Add any other context about the PR here.