feat: allow rank driver access info in tags #1718

JoanFM · 2021-01-17T18:42:00Z

Changes introduced
Right now, the Driver uses required_keys to extract the meta information from Matches and Query and pass it to the Ranker Executor.

However, there is no way for the Executor to access individual fields inside tags. The user could choose to select tags, but would then get all of them.

I propose allowing the user to access fields from tags by listing them in required_keys as tags_* (with a syntax similar to the one in QueryLangDriver).
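To make the proposal concrete, here is a minimal standalone sketch of resolving required_keys against a document-like dict. It assumes a double-underscore path syntax such as `tags__price`; that spelling, and the helper names `dunder_lookup` and `extract_required`, are illustrative assumptions and not Jina's actual API.

```python
from typing import Any, Dict, Iterable


def dunder_lookup(obj: Dict[str, Any], key: str) -> Any:
    """Resolve a key like 'tags__price' against nested dicts.

    Hypothetical helper: raises KeyError if any part of the path is missing.
    """
    current: Any = obj
    for part in key.split('__'):
        current = current[part]
    return current


def extract_required(doc: Dict[str, Any], required_keys: Iterable[str]) -> Dict[str, Any]:
    """Collect only the requested fields, keyed by the requested name."""
    return {k: dunder_lookup(doc, k) for k in required_keys}


# A match with a plain field and a nested tags field (made-up data).
match = {'id': 'm1', 'length': 120, 'tags': {'price': 9.5, 'brand': 'acme'}}
print(extract_required(match, ['length', 'tags__price']))
# -> {'length': 120, 'tags__price': 9.5}
```

With this shape the ranker only receives the fields it asked for, rather than the whole tags mapping.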

TODO

Allow the executors to have two sets of required_keys, one for queries and one for matches.
Have these required_keys default to None in the base classes.
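The two TODO items could look roughly like the sketch below. The class and attribute names (`BaseRankerSketch`, `query_required_keys`, `match_required_keys`) and the `tags__*` key spelling are hypothetical and only illustrate the idea of separate key sets that default to None.

```python
from typing import Any, Dict, List, Optional, Tuple


class BaseRankerSketch:
    """Hypothetical base class: both key sets default to None."""
    query_required_keys: Optional[Tuple[str, ...]] = None
    match_required_keys: Optional[Tuple[str, ...]] = None


class PriceRanker(BaseRankerSketch):
    """Hypothetical ranker needing one field from the query and one from each match."""
    query_required_keys = ('tags__max_price',)
    match_required_keys = ('tags__price',)

    def score(self, query_meta: Dict[str, Any], match_meta: Dict[str, Any]) -> float:
        # 1.0 when the match is within the query's budget, otherwise 0.0.
        return 1.0 if match_meta['tags__price'] <= query_meta['tags__max_price'] else 0.0


def keys_or_empty(keys: Optional[Tuple[str, ...]]) -> List[str]:
    """A driver would have to treat None as 'nothing to extract'."""
    return list(keys) if keys is not None else []
```

A driver written against this sketch would call `keys_or_empty` on each set before extracting fields, so a ranker that declares no keys still works.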

@hanxiao @nan-wang @maximilianwerk does this feel like the way to go? I think this would be good for the Ranker Executor abstraction

codecov · 2021-01-17T18:46:41Z

Codecov Report

Merging #1718 (f45d71e) into master (d153381) will decrease coverage by 28.30%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #1718       +/-   ##
===========================================
- Coverage   86.52%   58.21%   -28.31%     
===========================================
  Files         148      148               
  Lines        7093     7093               
===========================================
- Hits         6137     4129     -2008     
- Misses        956     2964     +2008

Impacted Files	Coverage Δ
jina/executors/rankers/__init__.py	`58.97% <ø> (-33.34%)`	⬇️
jina/parsers/ping.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/docker/helper.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/hub/new.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/hub/list.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/hub/build.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/hub/login.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/optimizer.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/parsers/helloworld.py	`0.00% <0.00%> (-100.00%)`	⬇️
jina/helloworld/__init__.py	`0.00% <0.00%> (-100.00%)`	⬇️
... and 88 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d153381...f45d71e.

github-actions · 2021-01-17T18:53:51Z

Latency summary

Current PR yields:

😶 index QPS at 1105, delta to last 3 avg.: +1%
🐢🐢 query QPS at 18, delta to last 3 avg.: -6%

Breakdown

Version	Index QPS	Query QPS
current	1105	18
`1.0.1`	1092	18
`1.0.0`	1085	20

Backed by latency-tracking. Further commits will update this comment.

nan-wang

I like this idea.

JoanFM · 2021-01-21T14:22:18Z

> I like this idea.

My doubt now is if we should have the executor take all the tags and use them or we should do this trick so that the complexity is hidden?

nan-wang · 2021-01-21T15:43:13Z

> > I like this idea.
>
> My doubt now is if we should have the executor take all the tags and use them or we should do this trick so that the complexity is hidden?

required_keys is only used in the Chunk2DocRanker, right?

JoanFM · 2021-01-21T15:51:33Z

> > > I like this idea.
> >
> > My doubt now is if we should have the executor take all the tags and use them or we should do this trick so that the complexity is hidden?
>
> required_keys is only used in the Chunk2DocRanker, right?

In any Ranker.

I would also like to have two sets of keys, one for matches and one for queries.

JoanFM · 2021-02-15T09:21:52Z

What do you think about this, @nan-wang? Is this better than just extracting the tags from the document and having the executor developer access the fields from tags?

JoanFM · 2021-02-15T09:23:40Z

See PR on LightGBMRanker here jina-ai/jina-hub#3670

nan-wang

LGTM👍

nan-wang

LGTM👍

maximilianwerk

Please catch the case where dunder_get raises an Exception.

maximilianwerk · 2021-02-16T07:12:02Z

jina/types/document/__init__.py

        """
-        return {k: getattr(self, k) for k in args if hasattr(self, k)}
+
+        return {k: getattr(self, k) if hasattr(self, k) else dunder_get(self._pb_body, k) for k in args}


We should catch & log, when dunder_get fails here. Otherwise, one malformed document in the index can corrupt a lot of rankings/searches.

JoanFM · Feb 16, 2021

The semantics are simply that it returns None. I can make it fail if that makes us more secure.

nan-wang · Feb 16, 2021

Errors caused by a malformed document should not be caught by the Document type. Logging in a primitive type is also strange.

DocumentSet logs a warning in _extract_docs().

JoanFM · Feb 16, 2021

I am not sure what the best option is, so whatever we agree on works for me.

What do we agree on, @nan-wang @maximilianwerk?

nan-wang · Feb 16, 2021

I have no strong opinion on this. I'm OK with the following code. @maximilianwerk

    try:
        value = dunder_get(self._pb_body, k)
        ret[k] = value
        continue
    except (AttributeError, ValueError):
        defaultlogger.warning('some warning message')
        ret[k] = None

maximilianwerk · Feb 16, 2021

Ah, I am sorry, I should have read the tests in the first place. I believe it can only be an AttributeError, no? See

    with pytest.raises(AttributeError):
        d.get_attrs(*['inexistant'])

I'd rather catch it, to be honest.

JoanFM · Feb 16, 2021

See the updated version.
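For readers following the thread, here is one possible shape of the catch-and-log behaviour discussed above, written as a standalone sketch. It is not the merged code: `DocSketch`, `_nested_get`, and the logger name are hypothetical stand-ins, and `_nested_get` only imitates a dunder-style lookup over a plain dict.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger('get_attrs_sketch')


class DocSketch:
    """Hypothetical stand-in for a document with a nested body."""

    def __init__(self, doc_id: str, tags: Dict[str, Any]):
        self.id = doc_id
        self._body = {'tags': tags}

    def _nested_get(self, key: str) -> Any:
        # Walk 'a__b__c' through nested dicts; fail loudly if a part is missing.
        current: Any = self._body
        for part in key.split('__'):
            if not isinstance(current, dict) or part not in current:
                raise AttributeError(f'{key!r} not found')
            current = current[part]
        return current

    def get_attrs(self, *args: str) -> Dict[str, Any]:
        ret: Dict[str, Any] = {}
        for k in args:
            if hasattr(self, k):
                # Plain attributes are returned directly.
                ret[k] = getattr(self, k)
                continue
            try:
                ret[k] = self._nested_get(k)
            except (AttributeError, ValueError):
                # One unresolved key is logged and mapped to None
                # instead of aborting the whole extraction.
                logger.warning('could not get attribute %r, returning None', k)
                ret[k] = None
        return ret


doc = DocSketch('d1', {'price': 9.5})
print(doc.get_attrs('id', 'tags__price', 'inexistant'))
# -> {'id': 'd1', 'tags__price': 9.5, 'inexistant': None}
```

The trade-off raised in the thread still applies: catching here keeps one malformed document from breaking a ranking, at the cost of logging from a primitive type.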

nan-wang

LGTM👍

feat: allow rank driver access info in tags

48208e4

JoanFM requested a review from a team as a code owner January 17, 2021 18:42

JoanFM requested review from CatStark and yuanbit January 17, 2021 18:42

jina-bot added the size/S, area/core, area/testing, component/driver, component/executor, component/type, and executor/ranker labels Jan 17, 2021

JoanFM marked this pull request as draft January 17, 2021 18:51

fix: consider possibility no required keys

791e197

JoanFM force-pushed the rank-access-tags-info branch from 9a8fa31 to 791e197 Compare January 17, 2021 18:54

JoanFM requested review from hanxiao, maximilianwerk, nan-wang and florian-hoenicke and removed request for CatStark and yuanbit January 17, 2021 19:18

nan-wang requested changes Jan 21, 2021

jina/executors/rankers/__init__.py (outdated)

jina/executors/rankers/__init__.py (outdated)

JoanFM and others added 2 commits January 21, 2021 15:20

refactor: update jina/executors/rankers/__init__.py

ad4cfe2

Co-authored-by: Nan Wang <nan.wang@jina.ai>

refactor: update jina/executors/rankers/__init__.py

2637651

Co-authored-by: Nan Wang <nan.wang@jina.ai>

JoanFM added 2 commits January 21, 2021 18:07

Merge branch 'master' of github.com:jina-ai/jina into rank-access-tags-info

74c7a00

Merge branch 'rank-access-tags-info' of github.com:jina-ai/jina into rank-access-tags-info

5c04d60

nan-wang reviewed Feb 15, 2021

jina/types/document/__init__.py (outdated)

jina/types/document/__init__.py (outdated)

tests/unit/types/document/test_document.py (outdated)

tests/unit/types/document/test_document.py

JoanFM and others added 2 commits February 15, 2021 10:26

fix: update jina/types/document/__init__.py

6269dcb

Co-authored-by: Nan Wang <nan.wang@jina.ai>

fix: access only using dunder_get

75fa430

JoanFM requested a review from nan-wang February 15, 2021 11:33

Merge branch 'rank-access-tags-info' of https://github.com/jina-ai/jina into rank-access-tags-info

4836534

JoanFM force-pushed the rank-access-tags-info branch from b5f3fb9 to 4836534 Compare February 15, 2021 12:15

jina-bot added size/M and removed size/S labels Feb 15, 2021

JoanFM closed this Feb 15, 2021

JoanFM reopened this Feb 15, 2021

nan-wang previously approved these changes Feb 15, 2021

JoanFM added 2 commits February 15, 2021 15:57

Merge branch 'master' of https://github.com/jina-ai/jina into rank-access-tags-info

cdfc734

fix: fix hub build io test

3362e9c

JoanFM dismissed nan-wang’s stale review via 3362e9c February 15, 2021 15:49

JoanFM requested a review from nan-wang February 15, 2021 16:11

florian-hoenicke requested changes Feb 15, 2021

jina/types/document/__init__.py

jina/types/document/__init__.py (outdated)

nan-wang previously approved these changes Feb 16, 2021

Merge branch 'master' of https://github.com/jina-ai/jina into rank-access-tags-info

b333c1f

JoanFM dismissed nan-wang’s stale review via a481a44 February 16, 2021 07:06

JoanFM requested review from florian-hoenicke and nan-wang February 16, 2021 07:07

maximilianwerk requested changes Feb 16, 2021

fix: fix how to get attrs

f45d71e

JoanFM force-pushed the rank-access-tags-info branch from a481a44 to f45d71e Compare February 16, 2021 09:41

JoanFM requested a review from maximilianwerk February 16, 2021 11:42

nan-wang approved these changes Feb 16, 2021

nan-wang merged commit 5bc4955 into master Feb 16, 2021

nan-wang deleted the rank-access-tags-info branch February 16, 2021 13:03

