Memory efficient xcontent filtering #77154

Merged 13 commits into elastic:master on Sep 13, 2021
Conversation

nik9000 (Member) commented Sep 1, 2021

I found myself needing support for something like `filter_path` on
`XContentParser`. It was simple enough to plug it in so I did. Then I
realized that it might offer more memory efficient source filtering
(#25168) so I put together a quick benchmark comparing the source
filtering that we do in `_search`. This doesn't actually make the change
to the source fetching phase - it just benchmarks how it'd be.

Filtering using the parser is about 33% faster than how we filter now
when you select a single field from a 300 byte document:
```
Benchmark                                          (excludes)  (includes)  (source)  Mode  Cnt     Score    Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message     short  avgt    5  2360.342 ±  4.715  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message     short  avgt    5  2010.278 ± 15.042  ns/op
FetchSourcePhaseBenchmark.filterXContentOnParser                  message     short  avgt    5  1588.446 ± 18.593  ns/op
```

The top line is the way we filter now. The middle line is adding a
filter to `XContentBuilder` - something we can do right now without any
of my plumbing work. The bottom line is filtering on the parser,
requiring all the new plumbing.

This isn't particularly impressive. 33% *sounds* great! But 700
nanoseconds per document isn't going to cut into anyone's search times.
If you fetch a thousand documents that's 0.7 milliseconds of savings.

But we mostly advise folks to use source filtering on fetch when the
source is large and you only want a small part of it. So I tried when
the source is about 4.3kb and you want a single field:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt     Score     Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4k_field  avgt    5  5957.128 ± 117.402  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4k_field  avgt    5  4999.073 ±  96.003  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4k_field  avgt    5  3261.478 ±  48.879  ns/op
```

That's 45% faster. Put another way, 2.7 microseconds a document. Not
bad!

But have a look at how things come out when you want a single field from
a 4 *megabyte* document:
```
Benchmark                                          (excludes)  (includes)      (source)  Mode  Cnt        Score        Error  Units
FetchSourcePhaseBenchmark.filterObjects                           message  one_4m_field  avgt    5  8266343.036 ± 176197.077  ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder                 message  one_4m_field  avgt    5  6227560.013 ±  68306.318  ns/op
FetchSourcePhaseBenchmark.filterXContentonParser                  message  one_4m_field  avgt    5  1617153.472 ±  80164.547  ns/op
```

These documents are very large. I've encountered documents like them in
real life, but they've always been the outlier for me. But a 6.5
millisecond per document savings ain't anything to sneeze at.

Take a look at what you get when I turn on gc metrics:
```
FetchSourcePhaseBenchmark.filterObjects                          message  one_4m_field  avgt    5   7036097.561 ±  84721.312   ns/op
FetchSourcePhaseBenchmark.filterObjects:·gc.alloc.rate           message  one_4m_field  avgt    5      2166.613 ±     25.975  MB/sec
FetchSourcePhaseBenchmark.filterXContentOnBuilder                message  one_4m_field  avgt    5   6104595.992 ±  55445.508   ns/op
FetchSourcePhaseBenchmark.filterXContentOnBuilder:·gc.alloc.rate message  one_4m_field  avgt    5      2496.978 ±     22.650  MB/sec
FetchSourcePhaseBenchmark.filterXContentonParser                 message  one_4m_field  avgt    5   1614980.846 ±  31716.956   ns/op
FetchSourcePhaseBenchmark.filterXContentonParser:·gc.alloc.rate  message  one_4m_field  avgt    5         1.755 ±      0.035  MB/sec
```

Parser-side filtering allocates about 1.8 MB/sec against more than 2 GB/sec for the other two approaches.
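
For a concrete feel of what "filtering on the parser" means mechanically, here's a minimal standalone sketch. It uses Jackson's stock `JsonPointerBasedFilter` rather than the `FilterPathBasedFilter` this PR wires in, so treat it as an illustration of the mechanism, not the PR's code:
```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.filter.FilteringParserDelegate;
import com.fasterxml.jackson.core.filter.JsonPointerBasedFilter;

import java.io.StringWriter;

public class ParserFilterSketch {
    public static void main(String[] args) throws Exception {
        String source = "{\"message\":\"keep me\",\"huge\":{\"blob\":\"skip me\"}}";
        JsonFactory factory = new JsonFactory();
        // Wrap the parser so only tokens under the "message" pointer survive.
        JsonParser filtered = new FilteringParserDelegate(
            factory.createParser(source),
            new JsonPointerBasedFilter("/message"),
            true,   // includePath: emit the object structure leading to matches
            true    // allowMultipleMatches
        );
        StringWriter out = new StringWriter();
        try (JsonGenerator gen = factory.createGenerator(out)) {
            // Stream the surviving tokens straight to the output. The
            // skipped fields are never materialized as Maps or Strings,
            // which is where the allocation savings come from.
            while (filtered.nextToken() != null) {
                gen.copyCurrentEvent(filtered);
            }
        }
        System.out.println(out); // {"message":"keep me"}
    }
}
```
Tokens the filter rejects are skipped at the tokenizer level, which is why the parser-side row in the gc table above barely registers an allocation rate.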

nik9000 changed the title from "Filter source on read" to "Memory efficient xcontent filtering" on Sep 1, 2021
nik9000 commented Sep 1, 2021

I'm not sure where to go from here. I think the xcontent changes could be useful outside of source filtering and I don't think I really have time to land them into source filtering. As much as I love that sweet sweet memory savings I'd be worried about this for the same reasons that @tlrx raised in #25168 (comment). Though I believe the savings from using the filtering on the reader are more substantial than in the linked issue - his solution over there lines up fairly closely with my `filterXContentOnBuilder` implementation.

Like he said there - things like highlighting cause you to lose all the savings. Well, most of the savings at least. We could probably make them ride along with the same filtering. But that feels quite complex.

nik9000 commented Sep 1, 2021

Keep in mind, these microbenchmarks look good because they are microbenchmarks. I think before we can really know what this'd do we'd have to run it in rally with realistic documents.

```java
@Override
public String toString() {
    return "NO_MATCHING";
}
```
nik9000 commented on the diff:
I added these for easier debugging.

nik9000 commented Sep 3, 2021

Like the benchmark implies, I think you could use this during the source fetch phase to speed up the fetches.

I've noticed that you get invalid JSON when you have a `**` in your exclude filter. I imagine that if we plugged this into the fetch phase we'd want to either fix that or use the object-based filtering in that case.
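
If this did get plugged into the fetch phase before a Jackson fix, a guard along these lines would do. This is a sketch only: `filterOnBuilder`/`filterOnParser` and the `FilterPath.compile` calls mirror the benchmark code quoted in the review comments below, and the PR has a `hasDoubleWildcard` check in this spirit; the types here are assumptions.
```java
// Sketch: fall back to the existing builder-based filtering when an
// exclude contains a double wildcard, since the parser-based path can
// emit invalid JSON for that case.
static BytesReference filter(BytesReference sample, Set<String> includes, Set<String> excludes) throws IOException {
    if (hasDoubleWildcard(excludes)) {
        return filterOnBuilder(sample, includes, excludes);
    }
    FilterPath[] includesFilter = FilterPath.compile(includes);
    FilterPath[] excludesFilter = FilterPath.compile(excludes);
    return filterOnParser(sample, includesFilter, excludesFilter);
}

static boolean hasDoubleWildcard(Set<String> filters) {
    return filters != null && filters.stream().anyMatch(f -> f.contains("**"));
}
```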

nik9000 added the >non-issue, :Search/Search (Search-related issues that do not fall into other categories), and v7.16.0 labels on Sep 9, 2021
nik9000 marked this pull request as ready for review on September 9, 2021
elasticmachine added the Team:Search (Meta label for search team) label on Sep 9, 2021
elasticmachine (Collaborator) commented Sep 9, 2021

Pinging @elastic/es-search (Team:Search)

nik9000 commented Sep 9, 2021

I've marked this as search/search and draft because it is going in to do search things. It's fairly general though. I'm not sure.

```java
    return filterOnBuilder(sample, includes, excludes);
}
FilterPath[] includesFilter = FilterPath.compile(includes);
return filterOnParser(sample, includesFilter, excludesFilter);
```
nik9000 commented on the diff:
The tests that this enables are super comprehensive. It's kind of a shame to do it randomly though. We could probably use more subclasses or parameterized tests or something to run them all non-randomly. They are plenty fast.
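
For illustration, a non-random variant might look like this JUnit 5 sketch; `doFilter` is a hypothetical hook standing in for the dispatch to the builder- or parser-based implementation, not something in this PR:
```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.EnumSource;

class FilterStrategyTests {
    enum Strategy { ON_BUILDER, ON_PARSER }

    @ParameterizedTest
    @EnumSource(Strategy.class)
    void keepsOnlyIncludedField(Strategy strategy) {
        // Both implementations run on every test execution instead of
        // one of them being picked at random per run.
        assertEquals("{\"a\":1}", doFilter(strategy, "{\"a\":1,\"b\":2}", new String[] { "a" }, null));
    }

    // Hypothetical dispatch hook; the real tests randomize between the
    // implementations inside a shared base class.
    private String doFilter(Strategy strategy, String json, String[] includes, String[] excludes) {
        throw new UnsupportedOperationException("dispatch to filterOnBuilder/filterOnParser here");
    }
}
```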

```java
super(xContentRegistry, deprecationHandler, restApiVersion);
JsonParser filtered = parser;
if (exclude != null) {
    filtered = new FilteringParserDelegate(filtered, new FilterPathBasedFilter(exclude, false), true, true);
```
nik9000 commented on the diff:
This should probably fail in `hasDoubleWildcard`. Unless I can figure out a way to make it work!

nik9000 commented on the diff:
I added this test.

nik9000 commented Sep 10, 2021

The double wildcard on exclude case has been fixed in Jackson 2.13: FasterXML/jackson-core#700

Turns out this is fixed in an unreleased version of jackson!
jaymode (Member) commented Sep 10, 2021

Drive by comment, this type of filtering may also exclude empty objects and arrays like what happens during response filtering, see #63842. I attempted to fix that but hit issues after opening the PR and never revisited it.

nik9000 commented Sep 10, 2021

> Drive by comment, this type of filtering may also exclude empty objects and arrays like what happens during response filtering, see #63842. I attempted to fix that but hit issues after opening the PR and never revisited it.

Thanks! I hadn't realized that was a thing. This doesn't actually plug it into the fetch phase - only demonstrate that it'd be faster. It also rigs up the plumbing that I want for tsdb, which is totally my ulterior motive for this. But I'll check if it filters out empty unrelated objects anyway.

I've double checked this - the map-based filtering that we're using now always throws out empty arrays. Here's what that looks like:

```
curl -uelastic:password -HContent-Type:application/json -XDELETE localhost:9200/test
curl -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_doc?refresh -d'{"array": []}'
curl -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/test/_search?pretty&_source=-**.whatever'
...
    "hits" : [
      {
        "_index" : "test",
        "_id" : "3Fly0XsBELTu-qwH8ixM",
        "_score" : 1.0,
        "_source" : { }
      }
    ]
...
```

So I think this change can only help here.
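
The same thing reproduces outside of REST. A hedged sketch against `XContentMapValues.filter`, the method behind the current map-based path; I'm assuming the `(map, includes, excludes)` signature here:
```java
import java.util.List;
import java.util.Map;

import org.elasticsearch.common.xcontent.support.XContentMapValues;

public class EmptyArrayFilterDemo {
    public static void main(String[] args) {
        Map<String, Object> source = Map.of("array", List.of());
        // The exclude matches nothing in this document...
        Map<String, Object> filtered =
            XContentMapValues.filter(source, new String[0], new String[] { "**.whatever" });
        // ...but the empty array is dropped anyway.
        System.out.println(filtered); // {}
    }
}
```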

romseygeek (Contributor) left a review:
LGTM

nik9000 commented Sep 13, 2021

Thanks @romseygeek! I've pulled in master and will merge when jenkins is happy.

For those following along at home this doesn't change the actual fetch phase - it just plumbs through the ability to filter an `XContentParser`, tests it, and adds a benchmark showing that it'd be wicked fast if we used it for the fetch phase. I need this off in tsdb land.

nik9000 commented Sep 13, 2021

The build failures want #77640 and #77643. Those are open but not yet merged.

nik9000 merged commit 7d55449 into elastic:master on Sep 13, 2021
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 13, 2021
elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2021
* Memory efficient xcontent filtering (backport of #77154)

* Fixup benchmark for 7.x
benwtrent added a commit that referenced this pull request Oct 21, 2021
This optimizes the text value extraction from source in categorize_text aggregation.

Early measurements indicate that the bulk of the time spent in this aggregation is inflating and deserializing the source. We can optimize this a bit (for larger sources) by only extracting the text field we care about.

The main downside here is if there is a sub-agg that requires the source, then that agg will need to extract the entire source again. This should be a rare case.

NOTE: opening as draft as measurements need to be done on some realistic data to see if this actually saves us time.

This takes advantage of the work done here: #77154
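
A hedged sketch of that pattern, written with plain Jackson rather than the `XContentParser` plumbing #77154 adds, so the class and method names here are mine: stream just one field's values out of a large source without ever building the full map.
```java
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.filter.FilteringParserDelegate;
import com.fasterxml.jackson.core.filter.JsonPointerBasedFilter;

public class ExtractTextField {
    public static List<String> messages(byte[] sourceBytes) throws Exception {
        List<String> values = new ArrayList<>();
        JsonParser p = new FilteringParserDelegate(
            new JsonFactory().createParser(sourceBytes),
            new JsonPointerBasedFilter("/message"), // the one field we care about
            false, // just the values, not the enclosing object structure
            true   // collect every match
        );
        // Only tokens for the matched field ever surface; the rest of the
        // source is skipped without being deserialized.
        while (p.nextToken() != null) {
            if (p.currentToken() == JsonToken.VALUE_STRING) {
                values.add(p.getText());
            }
        }
        return values;
    }
}
```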
benwtrent added a commit that referenced this pull request Nov 2, 2021
…es from _source (#79651)

Previously created optimizations:

 - Enabling XContentParser filtering for specific fields: #77154
 - Retrieving only specific fields from _source: #79099

Allow significant_text to only parse out the specific field it requires from `_source`. This allows for a minor optimization when `_source` is small, but can be significant (pun intended) when `_source` contains many fields, or other larger fields that `significant_text` doesn't care about.

Since `significant_text` does not allow sub-aggs, not caching the parsed `_source` is reasonable.
wchaparro assigned nik9000 and then unassigned nik9000 on Dec 16, 2021