Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: clarify annotation types to support sparse #2296

Merged
merged 16 commits into from Apr 17, 2021

Conversation

JoanFM
Copy link
Member

@JoanFM JoanFM commented Apr 9, 2021

No description provided.

@codecov
Copy link

codecov bot commented Apr 9, 2021

Codecov Report

Merging #2296 (d5b1bd1) into master (1825529) will increase coverage by 0.01%.
The diff coverage is 98.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2296      +/-   ##
==========================================
+ Coverage   90.88%   90.89%   +0.01%     
==========================================
  Files         222      222              
  Lines       11757    11789      +32     
==========================================
+ Hits        10685    10716      +31     
- Misses       1072     1073       +1     
Flag Coverage Δ
daemon 51.00% <22.22%> (-0.21%) ⬇️
jina 91.06% <98.76%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/types/sets/document.py 93.52% <96.29%> (+0.24%) ⬆️
jina/drivers/index.py 97.36% <100.00%> (+0.39%) ⬆️
jina/drivers/search.py 98.75% <100.00%> (+0.13%) ⬆️
jina/executors/encoders/__init__.py 100.00% <100.00%> (ø)
jina/executors/indexers/__init__.py 97.89% <100.00%> (ø)
jina/types/document/__init__.py 96.37% <100.00%> (-0.01%) ⬇️
jina/types/ndarray/generic.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1825529...d5b1bd1. Read the comment docs.

@github-actions
Copy link

github-actions bot commented Apr 9, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 933, delta to last 3 avg.: -1%
  • 🐢🐢 query QPS at 13, delta to last 3 avg.: -6%

Breakdown

Version Index QPS Query QPS
current 933 13
1.1.5 955 14
1.1.4 945 13

Backed by latency-tracking. Further commits will update this comment.

@jina-bot jina-bot added size/M and removed size/S labels Apr 15, 2021
@jina-bot jina-bot added the area/testing This issue/PR affects testing label Apr 15, 2021
@JoanFM JoanFM force-pushed the type-annotations-update branch 2 times, most recently from 4497b66 to 2e4ebab Compare April 15, 2021 08:47
@@ -39,8 +39,7 @@ class VectorIndexDriver(BaseIndexDriver):
If `method` is not 'delete', documents without content are filtered out.
"""

@staticmethod
def _get_documents_embeddings(docs: 'DocumentSet'):
def _get_documents_embeddings(self, docs: 'DocumentSet'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not static anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the inherited function needs instance attributes

def exec_sparse_cls_type(self) -> str:
return self.exec.sparse_cls_type

def _get_documents_embeddings(self, docs: 'DocumentSet'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But should the driver be storing this spare class type? Maybe it should still be static and just pass the type as another argument? I don't see it as being part of the Driver's expected knowledge

Copy link
Member Author

@JoanFM JoanFM Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the problem is then the signature changes between parent and child

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default argument being dense representation class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not be used in the Dense case. It is handy to access dense embbedding via a property

jina/drivers/search.py Outdated Show resolved Hide resolved
@@ -11,6 +11,9 @@
from jina.types.document import Document


# TODO: Add tests for sparse vectors index driver.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a reminder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwanglzu I have done changes to avoid having extra Driver classes, so that it is easier for the user to use.

Still the VectorIndexDriver and VectorSearchDriver need to be unit tested

@@ -7,6 +7,7 @@
from jina.drivers.search import VectorSearchDriver
from jina.executors.indexers import BaseVectorIndexer

# TODO: Add tests for sparse vectors search driver.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a reminder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwanglzu I have done changes to avoid having extra Driver classes, so that it is easier for the user to use.

Still the VectorIndexDriver and VectorSearchDriver need to be unit tested

def exec_sparse_cls_type(self) -> str:
"""Get the sparse type from executor.

:return: Sparse matrix type, default value is `scipy_coo`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not put this note here as it is hard to mantain. The default value will be in the base class of executor, not here

@@ -270,7 +289,7 @@ def add(self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs) -> No
raise NotImplementedError

def query(
self, vectors: 'np.ndarray', top_k: int, *args, **kwargs
self, vectors: 'EncodingType', top_k: int, *args, **kwargs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be more clear if we creatr anotjher type var saying EncodeBatchType that is just an alias?

@JoanFM JoanFM closed this Apr 16, 2021
@JoanFM JoanFM reopened this Apr 16, 2021
Copy link
Member

@bwanglzu bwanglzu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@JoanFM JoanFM merged commit 399728f into master Apr 17, 2021
@JoanFM JoanFM deleted the type-annotations-update branch April 17, 2021 14:51
sthagen added a commit to sthagen/jina-ai-jina that referenced this pull request Apr 17, 2021
refactor: clarify annotation types to support sparse (jina-ai#2296)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants