refactor: clarify annotation types to support sparse #2296
Codecov Report
@@            Coverage Diff             @@
##           master    #2296      +/-   ##
==========================================
+ Coverage   90.88%   90.89%   +0.01%
==========================================
  Files         222      222
  Lines       11757    11789      +32
==========================================
+ Hits        10685    10716      +31
- Misses       1072     1073       +1
…notations-update
c8f6089
to
73fd095
Compare
4497b66
to
2e4ebab
Compare
2e4ebab
to
8e20514
Compare
@@ -39,8 +39,7 @@ class VectorIndexDriver(BaseIndexDriver):
     If `method` is not 'delete', documents without content are filtered out.
     """

-    @staticmethod
-    def _get_documents_embeddings(docs: 'DocumentSet'):
+    def _get_documents_embeddings(self, docs: 'DocumentSet'):
Why not static anymore?
because the inherited function needs instance attributes
jina/drivers/index.py
Outdated
    def exec_sparse_cls_type(self) -> str:
        return self.exec.sparse_cls_type

    def _get_documents_embeddings(self, docs: 'DocumentSet'):
oh I see
But should the driver be storing this sparse class type? Maybe it should still be static and just take the type as another argument? I don't see it as being part of the Driver's expected knowledge.
well the problem is then the signature changes between parent and child
default argument being dense representation class?
It would not be used in the dense case. It is handy to access the dense embedding via a property.
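For readers following this thread, here is a minimal sketch of the trade-off being discussed, not the code in this PR. It assumes a toy setup: `ToyExecutor`, `DenseDriver`, `SparseDriver`, and the `'pairs'` sparse format are all hypothetical names invented for illustration. It shows how a child can override `_get_documents_embeddings` with the same signature as the parent while reading the sparse type from the executor through a property, instead of taking it as an extra argument.

```python
from typing import Dict, List, Sequence


class ToyExecutor:
    """Hypothetical executor that owns the sparse format setting."""

    def __init__(self, sparse_cls_type: str = 'pairs'):
        self.sparse_cls_type = sparse_cls_type


class DenseDriver:
    """Hypothetical parent driver: embeddings are returned as stored."""

    def __init__(self, executor: ToyExecutor):
        self.exec = executor

    def _get_documents_embeddings(self, docs: Sequence[Dict]) -> List:
        return [doc['embedding'] for doc in docs]


class SparseDriver(DenseDriver):
    """Hypothetical child: same signature, but it needs instance state."""

    @property
    def exec_sparse_cls_type(self) -> str:
        # The driver does not store the type itself; it asks its executor.
        return self.exec.sparse_cls_type

    def _get_documents_embeddings(self, docs: Sequence[Dict]) -> List:
        if self.exec_sparse_cls_type != 'pairs':
            raise ValueError(f'unsupported sparse type: {self.exec_sparse_cls_type}')
        # Keep only the non-zero entries as (index, value) pairs.
        return [
            [(i, v) for i, v in enumerate(doc['embedding']) if v != 0]
            for doc in docs
        ]


docs = [{'embedding': [0.0, 1.5, 0.0]}, {'embedding': [2.0, 0.0, 0.0]}]
dense = DenseDriver(ToyExecutor())._get_documents_embeddings(docs)
sparse = SparseDriver(ToyExecutor())._get_documents_embeddings(docs)
```

Because both classes expose `_get_documents_embeddings(self, docs)`, callers can treat them interchangeably. A static method taking the type as an additional argument would avoid the instance dependency, but the dense parent would then carry a parameter it never uses.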
@@ -11,6 +11,9 @@
 from jina.types.document import Document


+# TODO: Add tests for sparse vectors index driver.
just a reminder
@bwanglzu I have made changes to avoid having extra Driver classes, so that it is easier for the user to use.
The VectorIndexDriver and VectorSearchDriver still need to be unit tested.
@@ -7,6 +7,7 @@
 from jina.drivers.search import VectorSearchDriver
 from jina.executors.indexers import BaseVectorIndexer

+# TODO: Add tests for sparse vectors search driver.
just a reminder
@bwanglzu I have made changes to avoid having extra Driver classes, so that it is easier for the user to use.
The VectorIndexDriver and VectorSearchDriver still need to be unit tested.
jina/drivers/search.py
Outdated
    def exec_sparse_cls_type(self) -> str:
        """Get the sparse type from executor.

        :return: Sparse matrix type, default value is `scipy_coo`.
Do not put this note here, as it is hard to maintain. The default value will be in the base class of the executor, not here.
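A minimal sketch of that suggestion, assuming hypothetical class names (`BaseToyExecutor`, `ToyDriver`) and a hypothetical constant `DEFAULT_SPARSE_CLS_TYPE` that are not taken from the jina codebase: the default is written once on the base executor, and the driver's docstring describes the return value without repeating it.

```python
class BaseToyExecutor:
    """Hypothetical base executor: the only place the default is written."""

    DEFAULT_SPARSE_CLS_TYPE = 'scipy_coo'

    def __init__(self, sparse_cls_type: str = DEFAULT_SPARSE_CLS_TYPE):
        self.sparse_cls_type = sparse_cls_type


class ToyDriver:
    """Hypothetical driver that forwards the executor's setting."""

    def __init__(self, executor: BaseToyExecutor):
        self.exec = executor

    @property
    def exec_sparse_cls_type(self) -> str:
        """Get the sparse type from the executor.

        :return: Sparse matrix type.
        """
        return self.exec.sparse_cls_type


default_type = ToyDriver(BaseToyExecutor()).exec_sparse_cls_type
custom_type = ToyDriver(BaseToyExecutor('scipy_csr')).exec_sparse_cls_type
```

If the default ever changes, only the executor needs editing; the driver docstring cannot go stale.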
@@ -270,7 +289,7 @@ def add(self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs) -> No
         raise NotImplementedError

     def query(
-        self, vectors: 'np.ndarray', top_k: int, *args, **kwargs
+        self, vectors: 'EncodingType', top_k: int, *args, **kwargs
Would it be clearer if we created another type var, say EncodeBatchType, that is just an alias?
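One way to read this suggestion, as a sketch only: the definitions below are assumptions, with plain Python lists standing in for `np.ndarray` and a sparse matrix so the snippet runs without numpy or scipy. `DenseMatrix` and `SparseMatrix` are hypothetical names, and the `EncodingType` definition here is illustrative rather than the one in this PR.

```python
from typing import List, Tuple, Union

# Hypothetical stand-ins so the sketch runs without numpy or scipy.
DenseMatrix = List[List[float]]               # one row per document
SparseMatrix = List[List[Tuple[int, float]]]  # rows of (column, value) pairs

EncodingType = Union[DenseMatrix, SparseMatrix]
# Pure alias: the same type, but the name says "a batch of encodings".
EncodeBatchType = EncodingType


def query(vectors: EncodeBatchType, top_k: int) -> List[int]:
    """Toy query: report how many entries of each row would be considered."""
    return [min(len(row), top_k) for row in vectors]


counts = query([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], top_k=2)
```

The alias changes nothing for a type checker; its only purpose is to make signatures such as `query` read as operating on a batch.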
e3c31b6
to
24b1ddc
Compare
24b1ddc
to
40454d1
Compare
…notations-update
LGTM!
refactor: clarify annotation types to support sparse (jina-ai#2296)
No description provided.