feat: allow rank driver access info in tags #1718
Codecov Report
@@ Coverage Diff @@
## master #1718 +/- ##
===========================================
- Coverage 86.52% 58.21% -28.31%
===========================================
Files 148 148
Lines 7093 7093
===========================================
- Hits 6137 4129 -2008
- Misses 956 2964 +2008
9a8fa31 to 791e197
I like this idea.
Co-authored-by: Nan Wang <nan.wang@jina.ai>
Co-authored-by: Nan Wang <nan.wang@jina.ai>
My doubt now is if we should have the |
|
In any Ranker. I would also like to have two sets of keys, one for |
What do you think about this, @nan-wang? Is this better than just extracting the tags from the document and having the executor developer access them from tags?
See the PR on LightGBMRanker here: jina-ai/jina-hub#3670
Co-authored-by: Nan Wang <nan.wang@jina.ai>
… into rank-access-tags-info
b5f3fb9 to 4836534
LGTM👍
LGTM👍
…cess-tags-info
Please catch it if dunder_get raises an exception.
jina/types/document/__init__.py
Outdated
""" | ||
- return {k: getattr(self, k) for k in args if hasattr(self, k)}
+ return {k: getattr(self, k) if hasattr(self, k) else dunder_get(self._pb_body, k) for k in args}
We should catch and log when dunder_get fails here. Otherwise, one malformed document in the index can corrupt a lot of rankings/searches.
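To make the failure mode concrete, here is a minimal sketch of a double-underscore lookup. It is not the dunder_get used in the PR: dunder_get_sketch is a hypothetical stand-in, the plain dict stands in for the protobuf body, and raising AttributeError on a missing field is an assumption.

```python
from functools import reduce
from typing import Any


def dunder_get_sketch(obj: Any, key: str) -> Any:
    """Resolve 'a__b__c' by descending one level per '__'-separated part.

    Hypothetical stand-in for dunder_get; handles dicts and plain objects.
    """
    def step(current: Any, part: str) -> Any:
        if isinstance(current, dict):
            if part not in current:
                # Assumption: a missing field surfaces as AttributeError.
                raise AttributeError(f'no field {part!r} while resolving {key!r}')
            return current[part]
        return getattr(current, part)  # raises AttributeError if missing

    return reduce(step, key.split('__'), obj)


# A toy document: top-level fields plus a nested tags mapping.
doc = {'id': 'd1', 'tags': {'price': 9.5, 'brand': 'acme'}}
price = dunder_get_sketch(doc, 'tags__price')
```

Under these assumptions, a single document without the requested tag raises inside the dict comprehension, so the whole get_attrs call fails for that document unless the caller handles the exception.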
The current semantics are that it simply returns None. I can make it fail if that makes us safer.
Errors caused by a malformed document should not be caught by the Document type. Logging in a primitive type is also strange.
DocumentSet does log a warning in _extract_docs().
I am not sure what the best option is, so whatever we agree on works for me.
Which approach do we agree on, @nan-wang @maximilianwerk?
I have no strong opinion on this. I'm OK with the following code. @maximilianwerk
try:
    value = dunder_get(self._pb_body, k)
    ret[k] = value
    continue
except (AttributeError, ValueError):
    default_logger.warning('some warning message')
    ret[k] = None
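The fragment above only makes sense inside a loop over the requested keys. A self-contained sketch of that loop could look like the following. ToyDocument, _lookup, and the logger name are hypothetical, the standard logging module replaces Jina's logger, and only AttributeError is caught, which is an assumption about what the lookup can raise.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger('get_attrs_sketch')


class ToyDocument:
    """Hypothetical stand-in for Document: top-level attributes plus a tags dict."""

    def __init__(self, doc_id: str, tags: Dict[str, Any]):
        self.id = doc_id
        self.tags = tags

    def _lookup(self, key: str) -> Any:
        # Resolve 'tags__<name>'; any other key is treated as unknown.
        prefix, _, name = key.partition('__')
        if prefix != 'tags' or name not in self.tags:
            raise AttributeError(f'cannot resolve {key!r}')
        return self.tags[name]

    def get_attrs(self, *args: str) -> Dict[str, Any]:
        ret = {}
        for k in args:
            if hasattr(self, k):
                # Plain attributes are returned directly.
                ret[k] = getattr(self, k)
                continue
            try:
                ret[k] = self._lookup(k)
            except AttributeError:
                # Catch and log, then fall back to None for this key only.
                logger.warning('could not get attribute %r from document %s', k, self.id)
                ret[k] = None
        return ret


attrs = ToyDocument('d1', {'price': 9.5}).get_attrs('id', 'tags__price', 'tags__missing')
```

With this shape, a missing tag yields None for that key and a warning in the log, while the remaining keys of the same document are still returned.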
Ah, I am sorry, I should have read the tests in the first place. I believe it can only be an AttributeError, no? See:
with pytest.raises(AttributeError):
    d.get_attrs(*['inexistant'])
I'd rather catch it, to be honest.
see the updated version
a481a44 to f45d71e
LGTM👍
Changes introduced
Right now, the Driver uses required_keys to extract the meta information from Matches and Query to pass to the Ranker Executor.
However, there is no way the Executor can have access to the meta info from tags. The user could choose to select tags, but it would get all of them.
I propose to allow the user to access fields from tags using required_keys as tags_* (with a similar syntax as in QueryLangDriver).
TODO
@hanxiao @nan-wang @maximilianwerk does this feel like the way to go? I think this would be good for the Ranker Executor abstraction
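As an illustration of the proposal, the sketch below shows a ranker that requests a single tag field through required_keys instead of receiving every tag. PriceAwareRanker, extract_meta, the scoring formula, and the dict-based matches are all hypothetical. The double-underscore separator in 'tags__price' is an assumption borrowed from the dunder_get name, not confirmed syntax.

```python
from typing import Any, Dict, List, Tuple


class PriceAwareRanker:
    """Hypothetical ranker that asks for one tag field instead of all tags."""

    required_keys = ('length', 'tags__price')

    def score(self, match_meta: List[Dict[str, Any]]) -> List[float]:
        # Arbitrary toy formula: longer and cheaper matches score higher.
        return [m['length'] / (1.0 + m['tags__price']) for m in match_meta]


def extract_meta(doc: Dict[str, Any], keys: Tuple[str, ...]) -> Dict[str, Any]:
    """Driver-side sketch: plain keys come from the document, 'tags__x' from doc['tags']."""
    meta = {}
    for key in keys:
        if key.startswith('tags__'):
            meta[key] = doc['tags'].get(key[len('tags__'):])
        else:
            meta[key] = doc.get(key)
    return meta


matches = [
    {'id': 'm1', 'length': 10, 'tags': {'price': 4.0, 'brand': 'acme'}},
    {'id': 'm2', 'length': 10, 'tags': {'price': 1.0, 'brand': 'zeta'}},
]
ranker = PriceAwareRanker()
meta = [extract_meta(m, ranker.required_keys) for m in matches]
scores = ranker.score(meta)
```

Only the requested fields reach the ranker: the brand tag is never extracted, which is the difference from selecting tags as a whole.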