-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encapsulate source filtering #91127
Encapsulate source filtering #91127
Conversation
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good. But I'm kind of cloudy this morning so maybe don't take me at my word.
*/ | ||
public final class SourceFilter { | ||
|
||
private static final Function<Map<String, Object>, Map<String, Object>> EMPTY_FILTER = v -> v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may find it slightly more readable to have an empty
version of this class. I see you are using this EMPTY_FILTER
as a sentinel to encode the "do nothing" behavior. But when I read this class I thought "oh, I should tell Alan that the JVM will generate this constant for him. Maybe he doesn't know." Then I read further down and saw it was a sentinel. Which is fine. I thought "oh, he should leave a comment on this so no one else wonders why it's a constant instead of inline." And then I thought - oh, maybe it'd be more clear if this had an implementation that always did nothing. I dunno if that's actually true, but maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah as a sentinel value it's a bit hinky. I think I originally added it because we had some places looking for Map<String, ?>
and some looking for Map<String, Object>
and then it became convenient to use to check for no-op filtering. But it makes things difficult to read. I've replaced with just a simple boolean empty
field. I played with having an empty impl but that means turning it into an interface or an abstract base class and that just feels too complicated - there only needs to be a single implementation, so a final class makes more sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This fixes the specific case outlined in the above linked Github issue - namely this: Document: { "myArray": [] } Source filter: myArray.myField Expected filtered result: { "myArray": [] } Actual filtered result: {} This broke when we switched to from map-based source filtering to Jackson-streaming-based source filtering in some code paths - see elastic#91127. Map-based source filtering (using the automata, etc.) correctly included the empty array from the parent. Unfortunately, it's not clear when these two code paths for source filtering diverge in behavior. Note also the long comment in FilterPathBasedFilter. It's not clear what the correct behavior is when we are filtering out all contents of arrays via source filter path exclusion.
We have two implementations of source filtering, one based on Map filtering
and used by SourceLookup, and one based on jackson stream filtering used
in Get and SourceFieldMapper. There are cases when stream filtering could
be usefully applied to source in the fetch phase, for example if the source is
not being used as a Map by any other subphase; and correspondingly if a
source has already been parsed to a Map then map filtering will generally
be more efficient than stream filtering that ends up re-parsing the bytes.
This commit encapsulates all of this filtering logic into a single SourceFilter
class, which can be passed to the
filter
method onSource
. DifferentSource
implementations can choose to use map or stream filteringdepending on whether or not they have map or bytes representations
available.