Add xcontent parsing to completion suggestion option #23071

Merged


cbuescher
Member

This adds parsing from xContent to the CompletionSuggestion.Entry.Option.
The completion suggestion option also inlines the xContent rendering of the
contained SearchHit, so in order to reuse the SearchHit parser this also changes
the way SearchHit is parsed from using a loop-based parser to using a
ConstructingObjectParser that creates an intermediate map representation and
then later uses this output to create either a single SearchHit or use it with
additional fields defined in the parser for the completion suggestion option.
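
As a rough illustration of the approach described above (not the actual Elasticsearch code), the following self-contained sketch first collects fields into an intermediate map and then builds either a plain hit or an option with additional fields from that map. The classes `Hit` and `Option`, the field names, and the `parsedFields()` helper are simplified stand-ins assumed for this example only.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for SearchHit and CompletionSuggestion.Entry.Option.
// All names and fields here are illustrative assumptions, not the real classes.
final class IntermediateMapSketch {

    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }

        // Build a hit from the already parsed intermediate representation.
        static Hit createFromMap(Map<String, Object> values) {
            String id = (String) values.get("_id");
            Object score = values.get("_score");
            return new Hit(id, score == null ? Float.NaN : ((Number) score).floatValue());
        }
    }

    static final class Option {
        final String text;
        final Hit hit;

        Option(String text, Hit hit) {
            this.text = text;
            this.hit = hit;
        }

        // The option reuses the hit fields and reads its own field ("text") on top.
        static Option createFromMap(Map<String, Object> values) {
            return new Option((String) values.get("text"), Hit.createFromMap(values));
        }
    }

    // Stand-in for the parser output: a flat map of field name to value.
    static Map<String, Object> parsedFields() {
        Map<String, Object> values = new HashMap<>();
        values.put("_id", "1");
        values.put("_score", 1.5f);
        values.put("text", "suggestion");
        return values;
    }
}
```

The point of the design is that the same map can feed both construction paths, so the hit parsing does not have to be duplicated for the option.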

@cbuescher cbuescher added :Search/Search Search-related issues that do not fall into other categories >enhancement review v6.0.0-alpha1 labels Feb 9, 2017
@cbuescher
Member Author

@javanna @tlrx I will also open another PR as WIP with the solution Luca proposed (storing the SearchHits part in a temporary builder while parsing the completion suggestion option). I will cross-link it with this PR so we can compare which solution looks better to you.

@cbuescher
Member Author

For discussion, here is the alternative option: #23072

@cbuescher
Member Author

cbuescher commented Feb 9, 2017

I did some comparison between the two solutions.
Method:

  • parse 10000 randomly created options (same seed for both variants), measuring the cumulative time for parsing in nanos.
  • do this for 20 runs, skip the first 5 for warmup (thanks to @danielmitterdorfer for the hint), averaging the remaining 15 runs

Since there was still some variance I repeated the above measurements four times, see results below.
Comparing the average of these four experiments (779831853 nanos vs. 1321970303 nanos), it seems the approach in this PR (parsing to an intermediate map) takes about 40% less time than the one in #23072, i.e. it is roughly 1.7x as fast.
I'm a bit surprised to see so much difference, but I will double-check this again.

run  #23071 (nanos)  #23072 (nanos)
1      765970079.50   1349812899.36
2      762845337.21   1347532678.00
3      789574650.93   1293939173.21
4      800937347.79   1296596465.21
avg    779831853.86   1321970303.95
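
The measurement procedure described in this comment can be sketched roughly as follows. This is not the code that produced the numbers above; the class and method names are hypothetical, and `timedRun` stands in for one pass that parses all options and returns the cumulative parsing time in nanoseconds.

```java
import java.util.function.LongSupplier;

// A minimal sketch of "N runs, skip the first few as warmup, average the rest".
// Names are hypothetical and not taken from the PR.
final class WarmupAveragingSketch {

    static double averageAfterWarmup(LongSupplier timedRun, int totalRuns, int warmupRuns) {
        if (warmupRuns < 0 || warmupRuns >= totalRuns) {
            throw new IllegalArgumentException("need 0 <= warmupRuns < totalRuns");
        }
        long sum = 0;
        for (int run = 0; run < totalRuns; run++) {
            long nanos = timedRun.getAsLong();
            if (run >= warmupRuns) { // discard the warmup runs
                sum += nanos;
            }
        }
        return (double) sum / (totalRuns - warmupRuns);
    }

    // Runs the task `iterations` times and returns the cumulative time in nanos.
    static long timeNanos(Runnable task, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total;
    }
}
```

With the parameters mentioned above, a call would look like `averageAfterWarmup(() -> timeNanos(parseOneOption, 10000), 20, 5)`, where `parseOneOption` is a placeholder for the parsing step under test. A hand-rolled loop like this is still sensitive to JIT and GC effects, which is one reason a harness such as JMH gives more reliable numbers.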

@danielmitterdorfer
Member

@cbuescher my own tests hint that the performance difference is even more pronounced.

Methodology

We ran the benchmark with JMH on our microbenchmarking infrastructure:

  • CPU: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
  • RAM: 32 GB
  • OS: Linux Kernel version 4.4.0-38
  • JDK: Oracle JDK 1.8.0_101, VM 25.101-b13

Benchmark threads were isolated from the operating system with CPU sets and the benchmarks were pinned to these isolated CPUs. We have also enabled the performance CPU governor and locked the CPU at its base frequency of 3.5GHz.

Each benchmark ran with 3 JVM forks, 10 warmup iterations and 10 measurement iterations. We repeated the experiments three times and varied the execution order of benchmarks in order to avoid biasing.

Results

#23071
Benchmark              (contentType)  (humanReadable)  Mode  Cnt   Score   Error  Units
ParserBenchmark.parse           JSON            false  avgt   30  12.072 ± 0.062  us/op
ParserBenchmark.parse           JSON             true  avgt   30  12.215 ± 0.106  us/op
ParserBenchmark.parse          SMILE            false  avgt   30   5.795 ± 0.055  us/op
ParserBenchmark.parse          SMILE             true  avgt   30   5.847 ± 0.050  us/op
ParserBenchmark.parse           YAML            false  avgt   30  88.637 ± 0.621  us/op
ParserBenchmark.parse           YAML             true  avgt   30  89.987 ± 1.025  us/op
ParserBenchmark.parse           CBOR            false  avgt   30   5.238 ± 0.068  us/op
ParserBenchmark.parse           CBOR             true  avgt   30   5.418 ± 0.115  us/op
#23072
Benchmark              (contentType)  (humanReadable)  Mode  Cnt    Score   Error  Units
ParserBenchmark.parse           JSON            false  avgt   30   23.933 ± 0.586  us/op
ParserBenchmark.parse           JSON             true  avgt   30   23.782 ± 0.295  us/op
ParserBenchmark.parse          SMILE            false  avgt   30   10.790 ± 0.120  us/op
ParserBenchmark.parse          SMILE             true  avgt   30   10.748 ± 0.071  us/op
ParserBenchmark.parse           YAML            false  avgt   30  193.490 ± 1.951  us/op
ParserBenchmark.parse           YAML             true  avgt   30  191.485 ± 1.759  us/op
ParserBenchmark.parse           CBOR            false  avgt   30    9.596 ± 0.132  us/op
ParserBenchmark.parse           CBOR             true  avgt   30    9.240 ± 0.100  us/op

The benchmark code is available as a gist, but unfortunately it requires an additional dependency on our test framework (I will see how I can improve this situation). Please just ping me if you want to reproduce the results.

@cbuescher
Member Author

@danielmitterdorfer thanks a lot for the detailed tests and your hints yesterday. I will surely take a look at the code; this is useful to have available the next time such a question pops up.

@danielmitterdorfer
Member

Sure, you're welcome. :)

Member

@tlrx tlrx left a comment

Thanks @cbuescher, this is getting close. And sorry for the time it took me to review this!

I left some comments but I didn't find anything big. I'd be happy to have more tests for parsing methods.

searchHit.explanation(explanation);
searchHit.setInnerHits(innerHits);
if (matchedQueries.size() > 0) {
// ------------- Parsing code --------------
Member

I think you can remove this comment

if (shardId != null && nodeId != null) {
searchHit.shard(new SearchShardTarget(nodeId, shardId));
}
searchHit.fields(fields);
return searchHit;
}

private static Explanation parseExplanation(XContentParser parser) throws IOException {
private static <T> T get(String key, Map<String, Object> map, T defaultValue) {
Member

Maybe we could use map.getOrDefault() directly?

Member Author

I wanted to use the additional implicit casting that map.getOrDefault() doesn't provide. If I use it directly I need to cast in every value assignment e.g. like String id = (String) values.getOrDefault(Fields._ID, null);. If you prefer that I can make the change, but I like the conciseness of this small private helper.

Member

@tlrx tlrx Feb 15, 2017

Can you at least change it to:

@SuppressWarnings("unchecked")
private static <T> T get(String key, Map<String, Object> map, T defaultValue) {
    return (T) map.getOrDefault(key, defaultValue);
}

I don't want us to reimplement core stuff just to avoid an explicit cast

Member Author

sure
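
To show how the helper agreed on in this thread behaves at the call site, here is a small self-contained sketch. The `get` method follows the suggestion above; the surrounding class, the `sampleValues()` helper and the `_version` key are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedGetSketch {

    // Delegates to Map.getOrDefault and casts to the type the caller expects.
    // The cast is unchecked: a wrong target type only fails at the use site.
    @SuppressWarnings("unchecked")
    static <T> T get(String key, Map<String, Object> map, T defaultValue) {
        return (T) map.getOrDefault(key, defaultValue);
    }

    // Hypothetical sample data standing in for the parsed intermediate map.
    static Map<String, Object> sampleValues() {
        Map<String, Object> values = new HashMap<>();
        values.put("_id", "42");
        values.put("_version", 3L);
        return values;
    }
}
```

A caller can then write `String id = TypedGetSketch.get("_id", values, null);` without an explicit cast, which is the conciseness the author wanted, while the lookup itself stays in `Map.getOrDefault`.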

return value;
}

private static float parseScore(XContentParser parser, Void context) throws IOException {
Member

The Void context argument is not used, can we remove it? I don't think we should add it just to use method reference in the ObjectParser, or maybe I'm missing something?

Member Author

I can remove these unused Void arguments; they are meant to make the "declareParseFields()" part above more readable. I will need to include a few more lambdas there then. Take a look at the upcoming commit and let me know what you think is better.

Member

+1 on removing the Void context from all methods. The declareInnerHitsParseFields is already complex to read I think, that won't add much.

Member

Sorry, I just saw that you removed them already, thanks!

if (parser.currentToken() == XContentParser.Token.VALUE_NUMBER || parser.currentToken() == XContentParser.Token.VALUE_STRING) {
return parser.floatValue();
} else {
return Float.NaN;
Member

I think we should throw an unexpected token type exception here?

Member Author

Not really, I think. toXContent prints a null token if the score is NaN: if (Float.isNaN(score)) { builder.nullField(Fields._SCORE); } and I think we need to parse the same value back. It's also included in the tests, I think.

Member

Sorry, I meant: I think we should be more precise when parsing the value here. If token is a VALUE_NUMBER we use floatValue(), if token is VALUE_NULL we use Float.NaN and all other cases must throw an exception. What do you think?

Member Author

Agreed, but that check is already taken care of by ObjectParser, because the field is declared with ValueType.FLOAT_OR_NULL:

parser.declareField((map, value) -> map.put(Fields._SCORE, value), SearchHit::parseScore, new ParseField(Fields._SCORE),
                ValueType.FLOAT_OR_NULL);
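
The rule discussed in this thread (a number becomes the score, an explicit null maps back to NaN, anything else is an error) can be spelled out in isolation as below. In the PR itself the token check is delegated to ObjectParser via the declared value type; this sketch uses a hypothetical `Token` enum and exception type purely for illustration and does not use the XContentParser API.

```java
// Hypothetical stand-in for parser tokens; not the XContentParser API.
final class ScoreParsingSketch {

    enum Token { VALUE_NUMBER, VALUE_NULL, VALUE_STRING, START_OBJECT }

    // A number becomes the score, an explicit null maps back to NaN
    // (mirroring the null field written for a NaN score), anything else is rejected.
    static float parseScore(Token token, Number value) {
        switch (token) {
            case VALUE_NUMBER:
                return value.floatValue();
            case VALUE_NULL:
                return Float.NaN;
            default:
                throw new IllegalArgumentException("unexpected token [" + token + "] for score");
        }
    }
}
```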

if (fieldMap == null) {
fieldMap = new HashMap<>();
map.put(Fields.FIELDS, fieldMap);
}
Member

You could use map.computeIfAbsent()

Member Author

good suggestion, didn't know that one
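
For readers who have not used it either, this is roughly what the `computeIfAbsent` variant of the null check above could look like. The class name, the method name and the `"fields"` key are assumptions for the sketch; the final code in the PR may differ.

```java
import java.util.HashMap;
import java.util.Map;

final class ComputeIfAbsentSketch {

    // Returns the nested "fields" map, creating and registering it on first use.
    // Replaces the explicit "if (fieldMap == null) { create; put; }" pattern.
    @SuppressWarnings("unchecked")
    static Map<String, Object> fieldMap(Map<String, Object> map) {
        return (Map<String, Object>) map.computeIfAbsent("fields", key -> new HashMap<String, Object>());
    }
}
```

Repeated calls return the same nested map instance, so values added through one call are visible through the next.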

}

private static Map<String, Set<CharSequence>> parseContext(XContentParser parser, Void context) throws IOException {
Map<String, Set<CharSequence>> contexts = new HashMap<>();
Member

Can we move this after the ensureExpectedToken()?

ensureExpectedToken(XContentParser.Token.START_OBJECT, parser.currentToken(), parser::getTokenLocation);
while((parser.nextToken()) != XContentParser.Token.END_OBJECT) {
ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.currentToken(), parser::getTokenLocation);
String key = parser.currentName();
Member

We don't really need this key

Member Author

We do later, to use it in contexts map as key I think.

Member

Oh right, sorry.

String key = parser.currentName();
ensureExpectedToken(XContentParser.Token.START_ARRAY, parser.nextToken(), parser::getTokenLocation);
Set<CharSequence> values = new HashSet<>();
for (Object value : parser.list()) {
Member

I think we should parse the values using a while((parser.nextToken()) != XContentParser.Token.END_ARRAY) loop and check the token type for each value. We expect only strings, and we should throw an exception if we find something else.

Member Author

done

Member

Thanks!
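
The shape of the loop requested in this thread can be sketched without the real parser as follows. This is a simplified model, not the committed code: tokens are represented as plain objects from an iterator, and the `END_ARRAY` marker and exception type are assumptions of the sketch.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

final class ContextValuesSketch {

    // Marker object that plays the role of an END_ARRAY token in this sketch.
    static final Object END_ARRAY = new Object();

    // Consumes values until END_ARRAY, accepting only strings and rejecting
    // every other value type instead of silently converting it.
    static Set<CharSequence> parseValues(Iterator<Object> tokens) {
        Set<CharSequence> values = new HashSet<>();
        Object token;
        while ((token = tokens.next()) != END_ARRAY) {
            if (token instanceof String) {
                values.add((String) token);
            } else {
                throw new IllegalArgumentException("expected a string value but found [" + token + "]");
            }
        }
        return values;
    }
}
```

Compared with reading the whole list in one call, checking each element as it is consumed lets the parser fail on the first unexpected value.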

XContentType xContentType = randomFrom(XContentType.values());
boolean humanReadable = randomBoolean();
BytesReference originalBytes = toXContent(option, xContentType, humanReadable);
Option parsed;
Member

Something I did in ElasticsearchExceptionTests is to randomly shuffle the fields, just to be sure that parsers do not rely on the order of fields:

if (randomBoolean()) {
    try (XContentParser parser = createParser(xContentType.xContent(), originalBytes)) {
        originalBytes = shuffleXContent(parser, randomBoolean()).bytes();
    }
}

would you be OK to add something like this?

Member Author

sure
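
The idea behind the shuffling suggestion, checking that the parsed result does not depend on field order, can be illustrated with a toy model. The sketch below does not use the Elasticsearch test framework; `parse`, `shuffle` and `sampleFields` are hypothetical helpers, and the "document" is just an ordered list of key/value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ShuffledFieldsSketch {

    // Stand-in for a parser: consumes fields in document order into a map.
    static Map<String, Object> parse(List<Map.Entry<String, Object>> fields) {
        Map<String, Object> values = new HashMap<>();
        for (Map.Entry<String, Object> field : fields) {
            values.put(field.getKey(), field.getValue());
        }
        return values;
    }

    // Returns a copy of the fields in a random order derived from the seed.
    static List<Map.Entry<String, Object>> shuffle(List<Map.Entry<String, Object>> fields, long seed) {
        List<Map.Entry<String, Object>> copy = new ArrayList<>(fields);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Hypothetical sample document with three fields.
    static List<Map.Entry<String, Object>> sampleFields() {
        List<Map.Entry<String, Object>> fields = new ArrayList<>();
        fields.add(new SimpleEntry<String, Object>("_id", "1"));
        fields.add(new SimpleEntry<String, Object>("_score", 1.5f));
        fields.add(new SimpleEntry<String, Object>("text", "suggestion"));
        return fields;
    }
}
```

A test would then parse both the original and the shuffled field list and assert that the results are equal, which fails if the parser silently relies on a particular field order.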

if (matchedQueries.size() > 0) {
// ------------- Parsing code --------------

private static ObjectParser<Map<String, Object>, Void> PARSER = new ObjectParser<>("innerHitsParser", HashMap::new);
Member

It would be nice to have some doc that indicates that it's going to be parsed and stored in a temporary map before being created using createFromMap() method, and the reasoning around why we do this. Also, maybe we could rename to MAP_PARSER? Just an idea.

@cbuescher cbuescher force-pushed the addParsing-completionSuggestionOption branch from bced78f to b512918 Compare February 14, 2017 13:11
@cbuescher
Member Author

@tlrx thanks for the review, I addressed your comments or left questions or explanations where possible.

@tlrx
Member

tlrx commented Feb 15, 2017

Thanks @cbuescher. Do you think we could change the parsing methods like parseScore, parseExplanation etc to have package visibility and then add unit tests for each of them? I know it's a lot of work but I'd be more comfortable if they were also unit tested.

Member

@tlrx tlrx left a comment

Thanks @cbuescher, it looks good to me.

We both agreed on not implementing unit tests for the private static parsing methods like parseScore: the current randomized test covers enough cases and those methods are not intended to be reused outside of the SearchHit class.

@cbuescher cbuescher merged commit b963144 into elastic:master Feb 15, 2017
cbuescher added a commit that referenced this pull request Feb 15, 2017
@cbuescher
Member Author

Also merged into 5.x with 5d24bf1

Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v5.4.0 v6.0.0-alpha1