feat: add traversal paths #750

Merged · 15 commits merged into main from feat-traversal-paths on Jun 13, 2022

Conversation

Member
@numb3r3 commented Jun 10, 2022

No description provided.

codecov bot commented Jun 10, 2022

Codecov Report

Merging #750 (ada03bc) into main (d5be8c2) will increase coverage by 0.24%.
The diff coverage is 98.27%.

@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   81.49%   81.74%   +0.24%     
==========================================
  Files          17       17              
  Lines        1232     1205      -27     
==========================================
- Hits         1004      985      -19     
+ Misses        228      220       -8     
Flag   Coverage Δ
cas    81.74% <98.27%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                  Coverage Δ
server/clip_server/executors/clip_onnx.py       85.48% <90.90%> (+0.76%) ⬆️
client/clip_client/client.py                    88.13% <100.00%> (-0.20%) ⬇️
server/clip_server/executors/clip_hg.py         86.07% <100.00%> (+0.54%) ⬆️
server/clip_server/executors/clip_tensorrt.py   100.00% <100.00%> (+7.01%) ⬆️
server/clip_server/executors/clip_torch.py      87.03% <100.00%> (+1.09%) ⬆️
server/clip_server/executors/helper.py          100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5be8c2...ada03bc. Read the comment docs.

num_worker_preprocess: int = 4,
minibatch_size: int = 32,
traversal_paths: str = '@r',
Member

Suggested change
traversal_paths: str = '@r',

Comment on lines +55 to +56
:param traversal_paths: Default traversal paths for encoding, used if
the traversal path is not passed as a parameter with the request.
Member

Suggested change
:param traversal_paths: Default traversal paths for encoding, used if
the traversal path is not passed as a parameter with the request.

"""
super().__init__(*args, **kwargs)
self._minibatch_size = minibatch_size

self._use_default_preprocessing = use_default_preprocessing
self._max_length = max_length
self._traversal_paths = traversal_paths
Member

Suggested change
self._traversal_paths = traversal_paths

"""

traversal_paths = parameters.get('traversal_paths', self._traversal_paths)
Member

Suggested change
traversal_paths = parameters.get('traversal_paths', self._traversal_paths)
traversal_paths = parameters.get('traversal_paths', '@r')

Member Author
@numb3r3 Jun 10, 2022

@hanxiao I don't agree with this suggestion. It would break the following use case:

gateway -> encoder #1 (works on root level) -> encoder #2 (works on chunk level)

There is no single traversal_paths value the client could pass that works for both encoders:

client.post(on='/', parameters={'traversal_paths': '?????'})

Member Author

By defining the default traversal path in __init__, client.post(on='/') just works:

gateway -> encoder #1 (traversal_paths='@r') -> encoder #2 (traversal_paths='@c')
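
A minimal sketch (not from this PR) of the flow above. Encoder here is a toy stand-in for the real CLIP executors, and the names encoder_root/encoder_chunk are made up; the point is only that each executor carries its own traversal_paths default via uses_with, so the client needs no per-request parameter.

from jina import Executor, Flow, requests


class Encoder(Executor):
    """Toy stand-in for the CLIP encoders, to illustrate per-executor defaults."""

    def __init__(self, traversal_paths: str = '@r', **kwargs):
        super().__init__(**kwargs)
        self._traversal_paths = traversal_paths

    @requests
    def encode(self, docs, parameters, **kwargs):
        # the request parameter wins; otherwise fall back to this executor's default
        paths = parameters.get('traversal_paths', self._traversal_paths)
        print(f'encoding {len(docs[paths])} docs selected by {paths!r}')


f = (
    Flow()
    .add(name='encoder_root', uses=Encoder, uses_with={'traversal_paths': '@r'})
    .add(name='encoder_chunk', uses=Encoder, uses_with={'traversal_paths': '@c'})
)

with f:
    f.post(on='/')  # no per-request traversal_paths needed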

Member

Why is it impossible? I don't get it.

Member Author

@hanxiao In the flow example above, the two encoders work on documents at different levels (one on the root level, the other on the chunk level).

  • pass @r as the request parameter from the client: client.post(on='/', parameters={'traversal_paths': '@r'})
    -> encoder #2 cannot work
  • pass @c as the request parameter from the client: client.post(on='/', parameters={'traversal_paths': '@c'})
    -> encoder #1 cannot work

The sketch below shows what the two selectors pick out.
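
For reference, a small sketch of the two selectors, assuming the docarray path-selector syntax these executors rely on ('@r' for root documents, '@c' for chunks):

from docarray import Document, DocumentArray

root = Document(text='hello world')
root.chunks.extend([Document(text='hello'), Document(text='world')])
da = DocumentArray([root])

print(len(da['@r']))  # 1 -> root-level documents (what encoder #1 needs)
print(len(da['@c']))  # 2 -> chunk-level documents (what encoder #2 needs)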

Member

But you should be able to send parameters to one particular Executor.
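
One way that could look, sketched under the assumption that Jina's target_executor argument is available and that the executors are named as in the flow sketch above (both names are hypothetical): each request is routed to a single executor, so each can receive its own traversal_paths.

from jina import Client

c = Client(port=51000)  # placeholder port

c.post(on='/', parameters={'traversal_paths': '@r'}, target_executor='encoder_root')
c.post(on='/', parameters={'traversal_paths': '@c'}, target_executor='encoder_chunk')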

@@ -184,10 +181,15 @@ def _iter_doc(self, content) -> Generator['Document', None, None]:
)

def _get_post_payload(self, content, kwargs):
    parameters = {}
    if 'batch_size' in kwargs:
        parameters['minibatch_size'] = kwargs['batch_size']
Member

I disagree with exposing minibatch_size to the public client. It can easily overload a CAS server. If users can control both request_size and minibatch_size, they can occupy the full GPU on our Berlin GPU server and easily cause an OOM by setting both values large.

Member

In a client-server architecture, one should not aim to expose every server arg to the client; it is very risky.

Member Author

I see, that makes sense. Then we need to update the documentation on how to control the batch size.
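
A sketch of the takeaway for the docs (names and values are illustrative, not the real CLIPEncoder): the GPU minibatch size stays a server-side knob fixed at configuration time, while the client only influences how many documents travel per request.

from jina import Executor, Flow, requests


class CLIPLikeEncoder(Executor):
    """Hypothetical stand-in for the server-side encoder."""

    def __init__(self, minibatch_size: int = 32, **kwargs):
        super().__init__(**kwargs)
        self._minibatch_size = minibatch_size  # chosen by the server operator only

    @requests
    def encode(self, docs, **kwargs):
        # the GPU sees at most minibatch_size docs at a time, regardless of request size
        for batch in docs.batch(batch_size=self._minibatch_size):
            ...  # run the model on this batch


# server-side configuration; clients cannot override minibatch_size per request
f = Flow(port=51000).add(uses=CLIPLikeEncoder, uses_with={'minibatch_size': 16})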

@numb3r3 requested a review from hanxiao on June 10, 2022 10:58
@@ -187,7 +184,7 @@ def _get_post_payload(self, content, kwargs):
     return dict(
         on='/',
         inputs=self._iter_doc(content),
-        request_size=kwargs.get('batch_size', 8),
+        request_size=kwargs.get('batch_size', 32),
Member

Suggested change
request_size=kwargs.get('batch_size', 32),
request_size=kwargs.get('batch_size', 8),

@numb3r3 merged commit e022bd4 into main on Jun 13, 2022
@numb3r3 deleted the feat-traversal-paths branch on June 13, 2022 11:06