Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulate source filtering #91127

Merged
merged 27 commits into from
Oct 31, 2022
Merged

Encapsulate source filtering #91127

merged 27 commits into from
Oct 31, 2022

Conversation

romseygeek
Copy link
Contributor

We have two implementations of source filtering, one based on Map filtering
and used by SourceLookup, and one based on jackson stream filtering used
in Get and SourceFieldMapper. There are cases when stream filtering could
be usefully applied to source in the fetch phase, for example if the source is
not being used as a Map by any other subphase; and correspondingly if a
source has already been parsed to a Map then map filtering will generally
be more efficient than stream filtering that ends up re-parsing the bytes.

This commit encapsulates all of this filtering logic into a single SourceFilter
class, which can be passed to the filter method on Source. Different
Source implementations can choose to use map or stream filtering
depending on whether or not they have map or bytes representations
available.

@romseygeek romseygeek added :Search/Search Search-related issues that do not fall into other categories >refactoring v8.6.0 labels Oct 26, 2022
@romseygeek romseygeek self-assigned this Oct 26, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Oct 26, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. But I'm kind of cloudy this morning so maybe don't take me at my word.

*/
public final class SourceFilter {

private static final Function<Map<String, Object>, Map<String, Object>> EMPTY_FILTER = v -> v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may find it slightly more readable to have an empty version of this class. I see you are using this EMPTY_FILTER as a sentinel to encode the "do nothing" behavior. But when I read this class I thought "oh, I should tell Alan that the JVM will generate this constant for him. Maybe he doesn't know." Then I read further down and saw it was a sentinel. Which is fine. I thought "oh, he should leave a comment on this so no one else wonders why it's a constant instead of inline." And then I thought - oh, maybe it'd be more clear if this had an implementation that always did nothing. I dunno if that's actually true, but maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as a sentinel value it's a bit hinky. I think I originally added it because we had some places looking for Map<String, ?> and some looking for Map<String, Object> and then it became convenient to use to check for no-op filtering. But it makes things difficult to read. I've replaced with just a simple boolean empty field. I played with having an empty impl but that means turning it into an interface or an abstract base class and that just feels too complicated - there only needs to be a single implementation, so a final class makes more sense to me.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@romseygeek romseygeek merged commit ba7a219 into elastic:main Oct 31, 2022
@romseygeek romseygeek deleted the source/filter branch October 31, 2022 16:48
mccheah added a commit to mccheah/elasticsearch that referenced this pull request Aug 27, 2024
This fixes the specific case outlined in the above linked Github issue - namely this:

Document: { "myArray": [] }
Source filter: myArray.myField
Expected filtered result: { "myArray": [] }
Actual filtered result: {}

This broke when we switched to from map-based source filtering to Jackson-streaming-based
source filtering in some code paths - see elastic#91127. Map-based source filtering (using the
automata, etc.) correctly included the empty array from the parent. Unfortunately, it's
not clear when these two code paths for source filtering diverge in behavior.

Note also the long comment in FilterPathBasedFilter. It's not clear what the correct
behavior is when we are filtering out all contents of arrays via source filter path
exclusion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants