
feat: allow rank driver access info in tags #1718

Merged (23 commits) on Feb 16, 2021

@JoanFM (Member) commented Jan 17, 2021

Changes introduced
Right now, the Driver uses required_keys to extract meta-information from the matches and the query and passes it to the Ranker Executor.

However, the Executor has no way to access meta-information stored in tags. The user could request the whole tags field, but would then receive all of the tags.

I propose letting the user access individual fields inside tags through required_keys, written as tags__* (a syntax similar to the one used in QueryLangDriver).
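A minimal sketch of how a `tags__<field>` key could be resolved, assuming a dunder-style lookup in the spirit of `dunder_get`: plain dicts and objects stand in for the protobuf document, and `resolve_dunder` / `get_attrs` here are hypothetical helpers, not Jina's implementation (the real `dunder_get` works on protobuf messages and may differ in details).

```python
from types import SimpleNamespace
from typing import Any, Dict


def resolve_dunder(obj: Any, key: str) -> Any:
    """Hypothetical helper: resolve a key such as 'tags__price' by walking
    nested attributes / dict entries separated by double underscores."""
    head, _, rest = key.partition('__')
    if isinstance(obj, dict):
        value = obj.get(head)  # a missing dict key resolves to None
    else:
        value = getattr(obj, head)  # a missing attribute raises AttributeError
    return resolve_dunder(value, rest) if rest else value


def get_attrs(doc: Any, *keys: str) -> Dict[str, Any]:
    """Plain attributes are read directly; anything else is treated as a dunder key."""
    return {k: getattr(doc, k) if hasattr(doc, k) else resolve_dunder(doc, k) for k in keys}


doc = SimpleNamespace(id='d1', score=0.7, tags={'price': 12.5, 'meta': {'brand': 'acme'}})
print(get_attrs(doc, 'id', 'tags__price', 'tags__meta__brand'))
# {'id': 'd1', 'tags__price': 12.5, 'tags__meta__brand': 'acme'}
```

This keeps the ranker unaware of how tags are stored: it only lists the keys it needs.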

TODO

  • Allow executors to have two sets of required_keys: one for queries and one for matches.
  • Set these required_keys to None in the base classes.
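The two TODO items can be sketched in plain Python, assuming a hypothetical `BaseRanker` with separate `query_required_keys` and `match_required_keys` attributes that default to None, and documents represented as dicts. All names here are illustrative, not the API introduced by this PR.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class BaseRanker:
    """Hypothetical base class: subclasses declare which fields they need."""
    # None means "no extra fields requested".
    query_required_keys: Optional[Sequence[str]] = None
    match_required_keys: Optional[Sequence[str]] = None

    def score(self, query_meta: Dict[str, Any], match_meta: List[Dict[str, Any]]) -> List[float]:
        raise NotImplementedError


def extract(doc: Dict[str, Any], keys: Optional[Sequence[str]]) -> Dict[str, Any]:
    """Pull only the requested fields; 'tags__x' reads doc['tags']['x'] (one level only)."""
    out = {}
    for k in keys or ():
        if k.startswith('tags__'):
            out[k] = doc.get('tags', {}).get(k[len('tags__'):])
        else:
            out[k] = doc.get(k)
    return out


class PriceAwareRanker(BaseRanker):
    query_required_keys = ('tags__max_price',)
    match_required_keys = ('score', 'tags__price')

    def score(self, query_meta, match_meta):
        limit = query_meta['tags__max_price']
        # Keep the original score for affordable matches, zero it out otherwise.
        return [m['score'] if m['tags__price'] <= limit else 0.0 for m in match_meta]


def rank(ranker: BaseRanker, query: Dict[str, Any], matches: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Driver-side step: extract each key set separately, then sort by the new score."""
    q = extract(query, ranker.query_required_keys)
    ms = [extract(m, ranker.match_required_keys) for m in matches]
    scores = ranker.score(q, ms)
    return sorted(zip((m['id'] for m in matches), scores), key=lambda p: p[1], reverse=True)


query = {'id': 'q', 'tags': {'max_price': 20}}
matches = [
    {'id': 'a', 'score': 0.9, 'tags': {'price': 30}},
    {'id': 'b', 'score': 0.6, 'tags': {'price': 15}},
]
print(rank(PriceAwareRanker(), query, matches))
# [('b', 0.6), ('a', 0.0)]
```

Keeping the query keys and match keys apart means a ranker never receives match-only fields for the query, or vice versa.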

@hanxiao @nan-wang @maximilianwerk, does this feel like the right way to go? I think it would be good for the Ranker Executor abstraction.

@JoanFM requested a review from a team as a code owner on January 17, 2021 18:42
@codecov bot commented Jan 17, 2021

Codecov Report

Merging #1718 (f45d71e) into master (d153381) will decrease coverage by 28.30%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           master    #1718       +/-   ##
===========================================
- Coverage   86.52%   58.21%   -28.31%     
===========================================
  Files         148      148               
  Lines        7093     7093               
===========================================
- Hits         6137     4129     -2008     
- Misses        956     2964     +2008     
Impacted Files Coverage Δ
jina/executors/rankers/__init__.py 58.97% <ø> (-33.34%) ⬇️
jina/parsers/ping.py 0.00% <0.00%> (-100.00%) ⬇️
jina/docker/helper.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/new.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/list.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/build.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/hub/login.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/optimizer.py 0.00% <0.00%> (-100.00%) ⬇️
jina/parsers/helloworld.py 0.00% <0.00%> (-100.00%) ⬇️
jina/helloworld/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
... and 88 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d153381...f45d71e.

@JoanFM marked this pull request as draft on January 17, 2021 18:51
@github-actions bot commented Jan 17, 2021

Latency summary

Current PR yields:

  • 😶 index QPS at 1105, delta to last 3 avg.: +1%
  • 🐢🐢 query QPS at 18, delta to last 3 avg.: -6%

Breakdown

Version   Index QPS   Query QPS
current   1105        18
1.0.1     1092        18
1.0.0     1085        20

Backed by latency-tracking. Further commits will update this comment.

@nan-wang (Member) left a comment

I like this idea.

jina/executors/rankers/__init__.py (outdated, resolved)
jina/executors/rankers/__init__.py (outdated, resolved)
JoanFM and others added 2 commits January 21, 2021 15:20
Co-authored-by: Nan Wang <nan.wang@jina.ai>
Co-authored-by: Nan Wang <nan.wang@jina.ai>
@JoanFM (Member, Author) commented Jan 21, 2021

> I like this idea.

My doubt now is whether the executor should take all the tags and use them, or whether we should use this trick so that the complexity is hidden.

@nan-wang (Member)

> I like this idea.
>
> My doubt now is whether the executor should take all the tags and use them, or whether we should use this trick so that the complexity is hidden.

required_keys is only used in the Chunk2DocRanker, right?

@JoanFM (Member, Author) commented Jan 21, 2021

> I like this idea.
>
> My doubt now is whether the executor should take all the tags and use them, or whether we should use this trick so that the complexity is hidden.
>
> required_keys is only used in the Chunk2DocRanker, right?

In any Ranker.

I would also like to have two sets of keys, one for matches and one for queries.

@JoanFM (Member, Author) commented Feb 15, 2021

@nan-wang, what do you think about this? Is it better than simply extracting the tags from the document and having the executor developer read the values from tags?

@JoanFM (Member, Author) commented Feb 15, 2021

See the LightGBMRanker PR here: jina-ai/jina-hub#3670

jina/types/document/__init__.py (outdated, resolved)
jina/types/document/__init__.py (outdated, resolved)
tests/unit/types/document/test_document.py (outdated, resolved)
tests/unit/types/document/test_document.py (resolved)
@JoanFM requested a review from @nan-wang on February 15, 2021 11:33
@jina-bot added the size/M label and removed the size/S label on Feb 15, 2021
@JoanFM closed this on Feb 15, 2021
@JoanFM reopened this on Feb 15, 2021
@nan-wang (Member) previously approved these changes Feb 15, 2021

LGTM👍

jina/types/document/__init__.py (resolved)
jina/types/document/__init__.py (outdated, resolved)
@nan-wang (Member) previously approved these changes Feb 16, 2021

LGTM👍

@maximilianwerk (Member) left a comment

Please catch the exception if dunder_get raises one.

        """
    -   return {k: getattr(self, k) for k in args if hasattr(self, k)}
    +   return {k: getattr(self, k) if hasattr(self, k) else dunder_get(self._pb_body, k) for k in args}
Comment (Member):

We should catch and log when dunder_get fails here. Otherwise, a single malformed document in the index can corrupt many rankings/searches.

Reply (Member, Author):

The current semantics are that it simply returns None. I can make it fail instead if that makes it safer.

Reply (Member):

The errors caused by a malformed document should not be caught by the Document type. Logging in a primitive type is also strange.
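A minimal sketch of the alternative raised in this reply: the primitive lookup stays strict and silent, and the caller (here a hypothetical driver-level helper, `collect_match_info`) decides how to handle a malformed match, skipping it with a warning so one bad document does not affect the rest of the ranking. Names and the logger are assumptions for illustration only.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence

logger = logging.getLogger('ranker_driver')  # hypothetical logger name


def get_attrs(doc: Any, *keys: str) -> Dict[str, Any]:
    """Primitive-level lookup: raises AttributeError on a missing field, no logging."""
    return {k: getattr(doc, k) for k in keys}


def collect_match_info(matches: Sequence[Any], keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Driver-level loop: malformed matches are logged and skipped."""
    collected = []
    for m in matches:
        try:
            collected.append(get_attrs(m, *keys))
        except AttributeError as exc:
            logger.warning('skipping match %r: %s', getattr(m, 'id', None), exc)
    return collected


matches = [SimpleNamespace(id='a', score=0.9), SimpleNamespace(id='b')]  # 'b' has no score
print(collect_match_info(matches, ['id', 'score']))
# [{'id': 'a', 'score': 0.9}]
```

The trade-off is that every caller has to remember to handle the error, but the Document type itself stays free of logging.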

Reply (Member, Author):

DocumentSet already logs a warning in _extract_docs().

I am not sure what the best option is, so whatever we agree on works for me.

What do we agree on, @nan-wang @maximilianwerk?

Reply (Member):

I have no strong opinion on this. I'm OK with the following code, @maximilianwerk:

            try:
                value = dunder_get(self._pb_body, k)
                ret[k] = value
                continue
            except (AttributeError, ValueError):
                defaultlogger.warning('some warning message')
                ret[k] = None

Reply (Member):

Ah, sorry, I should have read the tests first. I believe it can only be an AttributeError, no? See:

    with pytest.raises(AttributeError):
        d.get_attrs(*['inexistant'])

I'd rather catch it, to be honest.
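A minimal sketch of the approach concluded here, catching only AttributeError inside the lookup and logging a warning before falling back to None. `dunder_lookup` is a hypothetical stand-in for `dunder_get` that walks nested attributes only; it is not Jina's code.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict

logger = logging.getLogger(__name__)


def dunder_lookup(obj: Any, key: str) -> Any:
    """Walk 'a__b__c' through nested attributes; raises AttributeError if a step is missing."""
    for part in key.split('__'):
        obj = getattr(obj, part)
    return obj


def get_attrs(doc: Any, *keys: str) -> Dict[str, Any]:
    """Return the requested fields, mapping unresolvable keys to None with a warning."""
    ret = {}
    for k in keys:
        try:
            ret[k] = dunder_lookup(doc, k)
        except AttributeError:
            logger.warning('could not resolve %r on document; using None', k)
            ret[k] = None
    return ret


doc = SimpleNamespace(id='d1', tags=SimpleNamespace(price=12.5))
print(get_attrs(doc, 'id', 'tags__price', 'inexistant'))
# {'id': 'd1', 'tags__price': 12.5, 'inexistant': None}
```

With this variant, the `pytest.raises(AttributeError)` check quoted above would no longer hold; a test would instead assert that the missing key maps to None.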

Reply (Member, Author):

See the updated version.

@nan-wang (Member) left a comment

LGTM👍

@nan-wang merged commit 5bc4955 into master on Feb 16, 2021
@nan-wang deleted the rank-access-tags-info branch on February 16, 2021 13:03

5 participants