
feat: add batching support for rankers #1467

Merged

Conversation

deepampatel
Contributor

@deepampatel deepampatel commented Dec 15, 2020

Previously I had tried to merge all batching decorators into one; after your comments below I have reverted that commit and added a new @batching_ranker_input decorator only for ranker executors. The @batching and @batching_multi_input decorators are untouched.
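For context, a minimal usage sketch of the new decorator on a ranker's score method. The class name MyRanker and the batch_size value are illustrative only; the method signature and the slice_on default of 2 are taken from the review discussion below.

from jina.executors.decorators import batching_ranker_input

class MyRanker:  # hypothetical ranker class, for illustration only
    @batching_ranker_input(batch_size=64)  # slice_on defaults to 2, i.e. old_match_scores
    def score(self, query_meta, old_match_scores, match_meta):
        # old_match_scores arrives here in batches of at most 64 items;
        # query_meta and match_meta are passed through as-is
        ...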

@jina-bot jina-bot added size/M area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing component/executor executor/meta labels Dec 15, 2020
Member

@JoanFM JoanFM left a comment

Hey @deepampatel ,

Thank you very much for your contribution.

As a comment, I would just recommend not trying to fit every type of batching pattern into a single function, since it can become really unreadable and unmaintainable. So if you feel like you need a special function for some special executor or type of input, you are welcome to add it!

Great work!


class MultiModalExecutor:

@batching(batch_size = 64, slice_on=[1,2])
Member

Just as a comment, there is a special batching decorator for MultiModalExecutor itself.

Contributor Author

@deepampatel deepampatel Dec 15, 2020

@JoanFM Thanks for the quick feedback. I agree with you on having a special function for a special executor class; I will add a separate decorator for ranker input.
WDYT about having the slice_on parameter as a Union[int, List[int]] instead of just an int value in @batching_multi_input? I don't have any particular example in mind right now where this can be used, but the thought came up when I was trying to merge both decorators.

@deepampatel deepampatel force-pushed the feat-add-batching-support-for-rankers branch from cc4a142 to d0fb777 Compare December 27, 2020 13:05
@github-actions

github-actions bot commented Dec 27, 2020

Latency summary

Current PR yields:

  • 😶 index QPS at 1756, delta to last 3 avg.: -3%
  • 😶 query QPS at 31, delta to last 3 avg.: -4%

Breakdown

Version | Index QPS | Query QPS
current | 1756 | 31
0.8.16 | 1796 | 32
0.8.15 | 1812 | 32
0.8.14 | 1822 | 32

Backed by latency-tracking. Further commits will update this comment.

@JoanFM
Member

JoanFM commented Dec 27, 2020

This commit is a feat but also a refactor commit.

  1. feat: add support for batching dictionary data, so it can be used to add batching support for rankers.
  2. wip: refactor @batching and @batching_multi_input into a single decorator
TODO: add support for label_on and ordinal_idx_arg

Now I see the description: unless it looks very, very clean, please avoid having batching and batching_multi_input inside a single decorator; it was split for better readability.

@deepampatel
Contributor Author

Previously I had tried to merge all batching decorators into one; after your comments above I have reverted that commit and added a new @batching_ranker_input decorator only for ranker executors. The @batching and @batching_multi_input decorators are untouched. @JoanFM

@deepampatel deepampatel marked this pull request as ready for review December 28, 2020 16:06
@deepampatel deepampatel requested a review from a team as a code owner December 28, 2020 16:06
batch_size: Union[int, Callable] = None,
num_batch: Optional[int] = None,
split_over_axis: int = 0,
merge_over_axis: int = 0,
Member

since it is only for rankers, this will not be different from 0, so let's remove it. The same question for split_over_axis: will it ever be different from 0?
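For readers following the thread, a tiny standalone illustration (plain numpy, not the decorator internals) of what splitting over axis 0 and merging over axis 0 mean for a batched call:

import numpy as np

data = np.arange(12).reshape(6, 2)            # 6 rows of input
batches = np.array_split(data, 3, axis=0)     # split_over_axis=0: cut along the rows
results = [b * 10 for b in batches]           # process each batch independently
merged = np.concatenate(results, axis=0)      # merge_over_axis=0: stitch results back together
assert merged.shape == data.shape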

Member

the same for slice_on?

Contributor Author

slice_on, I feel, can be a configurable parameter.

Member

at least slice_on should be set to the default that the rankers would use

else:
if isinstance(_slice_on, int):
_slice_on = [_slice_on]
_num_data = 1
Member

_num_data is not used after here

@codecov

codecov bot commented Dec 29, 2020

Codecov Report

Merging #1467 (9e28ecc) into master (3f5a588) will increase coverage by 0.32%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1467      +/-   ##
==========================================
+ Coverage   84.37%   84.69%   +0.32%     
==========================================
  Files         108      108              
  Lines        6311     6424     +113     
==========================================
+ Hits         5325     5441     +116     
+ Misses        986      983       -3     
Impacted Files Coverage Δ
jina/executors/decorators.py 92.22% <100.00%> (+1.31%) ⬆️
jina/logging/profile.py 68.33% <0.00%> (-0.58%) ⬇️
jina/drivers/craft.py 100.00% <0.00%> (ø)
jina/proto/jina_pb2.py 100.00% <0.00%> (ø)
jina/types/ndarray/generic.py 100.00% <0.00%> (ø)
jina/executors/evaluators/rank/recall.py 100.00% <0.00%> (ø)
jina/executors/evaluators/rank/precision.py 100.00% <0.00%> (ø)
jina/executors/encoders/helper.py
jina/executors/encoders/numeric/__init__.py 42.42% <0.00%> (ø)
jina/drivers/encode.py 94.91% <0.00%> (+0.08%) ⬆️
... and 22 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9a427b...9e28ecc. Read the comment docs.

Member

@JoanFM JoanFM left a comment

There is a lot of code that is common (or can be made common) between batching_multi_input and batching_ranker_input. I am sure some helper functions can be made to unify the two a bit.
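To illustrate the kind of shared helper being hinted at, here is a rough sketch; _get_total_size and _get_slice do appear in the diff further down, but the bodies below are guesses based on this conversation, not the PR's actual implementation.

from itertools import islice
from typing import Optional

def _get_total_size(full_data_size: Optional[int], batch_size: int, num_batch: Optional[int]) -> Optional[int]:
    # guessed behavior: cap the number of items processed when num_batch is set
    capped = batch_size * num_batch if num_batch else None
    if full_data_size is None:
        return capped
    return min(full_data_size, capped) if capped is not None else full_data_size

def _get_slice(data, total_size: Optional[int]):
    # guessed behavior: take only the first total_size items;
    # slicing a dict yields an iterable of (key, value) pairs, hence the later yield_dict flag
    items = data.items() if isinstance(data, dict) else data
    return islice(items, total_size) if total_size is not None else items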

_num_data = num_data
if _num_data is not None:
if isinstance(_slice_on, List):
raise ValueError(f'When using num_data in @batching_ranker_input, an integer value '
Member

then please fix the slice_on type hint, since it suggests a List is accepted

Member

I would just adapt the type hint and remove this check

Member

Avoid this noisy part; consider that num_data is never None (the type hint suggests so).

Contributor Author

I was thinking: slice_on and num_data, when used together, allow you to batch consecutive parameters. There might be a case where you want to batch non-consecutive parameters, let's say [1, 3]; in those cases we could have slice_on as a list of indices. So whenever slice_on is passed as a list, num_data (the type hint needs to be updated to Optional) cannot be used. WDYT?
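To make the distinction concrete (an illustration of the idea only; the list form was not implemented in this PR), counting positional arguments with self at index 0:

# consecutive: slice_on=1, num_data=2 would batch scores and metas (indices 1 and 2)
# non-consecutive (hypothetical): slice_on=[1, 3] would batch scores and flags, skipping metas
def foo(self, scores, metas, flags):  # parameter names are made up for the example
    ...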

Member

Since there is not yet a case where we require this, I'd prefer to keep it simple

batch_size: Union[int, Callable] = None,
num_batch: Optional[int] = None,
slice_on: Union[int, List[int]] = 2,
num_data: int = None) -> Any:
Member

I think the num_data default should be 3 for this case, no?

Contributor Author

Out of self, query_meta, old_match_scores and match_meta, we only want to batch old_match_scores; the other two are meta and don't need to be batched.
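Spelled out, with self counted at positional index 0, old_match_scores sits at index 2, which is why slice_on defaults to 2 and a single batched argument suffices:

def score(self, query_meta, old_match_scores, match_meta):
    # positional indices: self=0, query_meta=1, old_match_scores=2, match_meta=3
    # slice_on=2 with num_data=1 therefore batches only old_match_scores
    ...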

for idx,slice_idx in enumerate(_slice_on[1:]):
batch_idx = next(data_iterators[idx+1])
if yield_dict[idx+1]:
args[slice_on] = dict(batch) if yield_dict[0] else batch
Member

why is yield_dict needed? what is batch returning?

Contributor Author

batch returns a list of tuples of (key, value)

Member

OK, I see. Maybe it is better to handle the case where a dict is provided inside batch_iterator?

Member

It feels like, up until obtaining the iterators, this code could be merged with batching_multi_input?

@deepampatel deepampatel force-pushed the feat-add-batching-support-for-rankers branch from 35d0a99 to 1918212 Compare December 30, 2020 17:10
@jina-bot jina-bot added the area/helper This issue/PR affects the helper functionality label Dec 30, 2020
@@ -107,7 +107,10 @@ def batch_iterator(data: Iterable[Any], batch_size: int, axis: int = 0,
data = iter(data)
# as iterator, there is no way to know the length of it
while True:
chunk = tuple(islice(data, batch_size))
if yield_dict:
Member

better to add an elif isinstance(data, dict) before the elif isinstance(data, Iterable); like this it will work without the need for the yield_dict parameter.
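A rough, standalone sketch of the dict handling being suggested here (the helper name _batch_dict is made up; as discussed below, the PR kept the yield_dict argument instead):

from itertools import islice

def _batch_dict(data: dict, batch_size: int):
    # possible shape of the suggested handling: yield dict chunks directly
    items = iter(data.items())
    while True:
        chunk = dict(islice(items, batch_size))
        if not chunk:
            return
        yield chunk

# list(_batch_dict({'a': 1, 'b': 2, 'c': 3}, 2)) -> [{'a': 1, 'b': 2}, {'c': 3}]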

Contributor Author

data needs to be sliced before passing it to batch_iterator because we have num_batch, which in turn sets the total_size. There are two options I think:

  1. Pass total_size to batch_iterator and slice the data inside batch_iterator.
  2. Slice it before passing it to batch_iterator and pass yield_dict. In this case a dict, when sliced, is converted to an iterable (a quick sketch of this follows below).
    wdyt?
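A quick plain-Python illustration of option 2's caveat, that slicing a dict produces an iterable of (key, value) tuples, which is exactly why yield_dict is needed to rebuild the dict per batch:

from itertools import islice

scores = {'m1': 0.9, 'm2': 0.7, 'm3': 0.4}
batch = tuple(islice(scores.items(), 2))   # (('m1', 0.9), ('m2', 0.7)) -- no longer a dict
rebuilt = dict(batch)                      # {'m1': 0.9, 'm2': 0.7} -- what yield_dict=True restores
print(batch, rebuilt)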

Member

OK I see, let's keep passing yield_dict as argument so that we do not affect current behavior (the code is a little verbose already)

total_size = _get_total_size(full_data_size, b_size, num_batch)
final_result = []
yield_dict = [isinstance(args[slice_on + i], Dict) for i in range(0,num_data)]
data_iterators = [batch_iterator(_get_slice(args[slice_on + i],total_size), b_size , yield_dict=yield_dict[i]) for i in range(0, num_data)]
Member

let's adapt this part together with batching_multi_input if possible; get_slice can be used there too, right? Also, add a space after the separating comma before total_size.

Contributor Author

@deepampatel deepampatel Dec 30, 2020

Since batching_multi_input and batching_ranker_input are very similar, we could completely remove batching_ranker_input and update batching and batching_multi_input with get_slice and yield_dict to handle dictionary data. Or should we keep batching_ranker_input separate?

Member

yes, let's try, but yield_dict can be avoided in regular batching; it can be set to default to False in the batch_decorator

Contributor Author

@deepampatel deepampatel Dec 31, 2020

if we are removing batching_ranker_input, maybe we should add dictionary support to both batching and batching_multi_input, no?

Member

maybe in the future, but now I would prefer to keep it conservative and just touch where it may be used

Contributor Author

For rankers we actually need only one parameter, old_match_scores, to be batched. So do we still need to use batching_multi_input?

Member

not true, you may also want to batch the match meta

@jina-bot jina-bot added size/S and removed size/M labels Dec 31, 2020
@deepampatel
Contributor Author
Contributor Author

#1382

@JoanFM
Member

JoanFM commented Dec 31, 2020

recheckcla

@github-actions

github-actions bot commented Dec 31, 2020

Jina CLA check ✅ All Contributors have signed the CLA.

Member

@JoanFM JoanFM left a comment

Hey @deepampatel ,

thanks for your great contribution, now we just need you to sign the CLA before merging the PR

@deepampatel
Contributor Author
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@JoanFM JoanFM merged commit 4a2c744 into jina-ai:master Dec 31, 2020
@jina-ai jina-ai locked and limited conversation to collaborators Dec 31, 2020