Add xcontent parsing to completion suggestion option #23071

cbuescher · 2017-02-09T11:23:22Z

This adds parsing from xContent to the CompletionSuggestion.Entry.Option.
The completion suggestion option also inlines the xContent rendering of the
contained SearchHit, so in order to reuse the SearchHit parser this also changes
the way SearchHit is parsed from using a loop-based parser to using a
ConstructingObjectParser that creates an intermediate map representation and
then later uses this output to create either a single SearchHit or use it with
additional fields defined in the parser for the completion suggestion option.
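A minimal sketch of the two-step idea described above, under a deliberately simplified model: a shared field handler first collects hit fields into a `Map<String, Object>`, and a separate `createFromMap` step builds the final object from that map. The option parser reuses the same handler and only adds its own field. The classes `Hit` and `CompletionOption`, the handler `declareHitField` and the `text` field are hypothetical stand-ins; the PR itself builds this on top of Elasticsearch's object parser infrastructure rather than a hand-written loop.

```java
import java.util.HashMap;
import java.util.Map;

public class IntermediateMapSketch {
    // Hypothetical simplified hit: only two of the many SearchHit fields.
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Hypothetical completion option that inlines the hit fields next to its own "text" field.
    static final class CompletionOption {
        final String text;
        final Hit hit;
        CompletionOption(String text, Hit hit) { this.text = text; this.hit = hit; }
    }

    /** Shared step: stores a known hit field in the intermediate map; returns false if the field is unknown. */
    static boolean declareHitField(Map<String, Object> map, String name, Object value) {
        switch (name) {
            case "_id":
            case "_score":
                map.put(name, value);
                return true;
            default:
                return false;
        }
    }

    /** Second step: builds a Hit from whatever the shared step collected (missing score -> NaN). */
    static Hit createFromMap(Map<String, Object> map) {
        Object score = map.getOrDefault("_score", Float.NaN);
        return new Hit((String) map.get("_id"), (Float) score);
    }

    /** The option parser reuses the hit fields and additionally handles its own "text" field. */
    static CompletionOption parseOption(Map<String, Object> input) {
        Map<String, Object> map = new HashMap<>();
        for (Map.Entry<String, Object> e : input.entrySet()) {
            if (!declareHitField(map, e.getKey(), e.getValue())) {
                if ("text".equals(e.getKey())) {
                    map.put("text", e.getValue());
                } else {
                    throw new IllegalArgumentException("unknown field [" + e.getKey() + "]");
                }
            }
        }
        return new CompletionOption((String) map.get("text"), createFromMap(map));
    }
}
```

The point of the intermediate map is that the hit-parsing logic is written once; a plain SearchHit parse would call `createFromMap` directly, while the option parser wraps it.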

cbuescher · 2017-02-09T11:25:10Z

@javanna @tlrx I will also open another PR as WIP with the solution Luca proposed (storing the SearchHits part in a temporary builder while parsing the completion suggestion option) and cross-link it with this PR, so we can compare which solution looks better to you.

cbuescher · 2017-02-09T11:49:43Z

For discussion the alternative option: #23072

cbuescher · 2017-02-09T16:52:31Z

I did some comparison between the two solutions.
Method:

parse 10000 randomly created options (same seed for both variants), measuring the cumulative time for parsing in nanos.
do this for 20 runs, skip the first 5 for warmup (thanks to @danielmitterdorfer for the hint), averaging the remaining 15 runs
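The measurement loop described above can be sketched roughly as follows. This is only an illustration of the "time every run, drop the warmup runs, average the rest" idea, assuming a hypothetical `Runnable` that parses the pre-generated options; it is not the code that produced the numbers below.

```java
public class WarmupAverageSketch {
    /**
     * Runs the workload {@code runs} times, ignores the first {@code warmupRuns},
     * and returns the average duration of the remaining runs in nanoseconds.
     */
    static double averageNanos(Runnable workload, int runs, int warmupRuns) {
        if (warmupRuns >= runs) {
            throw new IllegalArgumentException("need at least one measured run");
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            long elapsed = System.nanoTime() - start;
            if (i >= warmupRuns) {
                total += elapsed; // only measured runs contribute to the average
            }
        }
        return (double) total / (runs - warmupRuns);
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for "parse 10000 randomly created options".
        Runnable parseAll = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(Integer.parseInt(Integer.toString(i)));
            }
        };
        System.out.println(averageNanos(parseAll, 20, 5)); // 20 runs, 5 warmup, 15 averaged
    }
}
```

A hand-rolled loop like this is still sensitive to JIT and GC effects, which is why the variance mentioned below is not surprising; a harness such as JMH controls for more of that.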

Since there was still some variance I repeated the above measurements four times, see results below.
Comparing the averages of these four experiments (779831853 vs. 1321970303 nanos), the approach in this PR (parsing to an intermediate map) takes about 40% less time than the one in #23072.
I'm a bit surprised to see such a big difference, but I will double-check this.

run	#23071 (nanos)	#23072 (nanos)
1	765970079.50	1349812899.36
2	762845337.21	1347532678.00
3	789574650.93	1293939173.21
4	800937347.79	1296596465.21
avg	779831853.86	1321970303.95
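The averages and the relative difference can be recomputed from the per-run numbers in the table; the class and method names in this small sketch are only illustrative. With these numbers #23071 needs about 41% less time, i.e. it is roughly 1.7 times as fast as #23072.

```java
public class RunComparison {
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Per-run averages in nanos, copied from the table above.
        double mapBased = mean(765970079.50, 762845337.21, 789574650.93, 800937347.79);         // #23071
        double builderBased = mean(1349812899.36, 1347532678.00, 1293939173.21, 1296596465.21); // #23072
        double timeSaved = 1.0 - mapBased / builderBased; // fraction of parsing time saved by #23071
        double speedup = builderBased / mapBased;         // how many times faster #23071 is
        System.out.printf("avg #23071 = %.2f, avg #23072 = %.2f%n", mapBased, builderBased);
        System.out.printf("time saved = %.1f%%, speedup = %.2fx%n", timeSaved * 100, speedup);
    }
}
```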

danielmitterdorfer · 2017-02-10T11:53:52Z

@cbuescher my own tests hint that the performance difference is even more pronounced.

Methodology

We ran the benchmark with JMH on our microbenchmarking infrastructure:

CPU: Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
RAM: 32 GB
OS: Linux Kernel version 4.4.0-38
JDK: Oracle JDK 1.8.0_101, VM 25.101-b13

Benchmark threads were isolated from the operating system with CPU sets and the benchmarks were pinned to these isolated CPUs. We have also enabled the performance CPU governor and locked the CPU at its base frequency of 3.5GHz.

Each benchmark ran with 3 JVM forks, 10 warmup iterations and 10 measurement iterations. We repeated the experiments three times and varied the execution order of the benchmarks to avoid bias.

Results

#23071
Benchmark              (contentType)  (humanReadable)  Mode  Cnt   Score   Error  Units
ParserBenchmark.parse           JSON            false  avgt   30  12.072 ± 0.062  us/op
ParserBenchmark.parse           JSON             true  avgt   30  12.215 ± 0.106  us/op
ParserBenchmark.parse          SMILE            false  avgt   30   5.795 ± 0.055  us/op
ParserBenchmark.parse          SMILE             true  avgt   30   5.847 ± 0.050  us/op
ParserBenchmark.parse           YAML            false  avgt   30  88.637 ± 0.621  us/op
ParserBenchmark.parse           YAML             true  avgt   30  89.987 ± 1.025  us/op
ParserBenchmark.parse           CBOR            false  avgt   30   5.238 ± 0.068  us/op
ParserBenchmark.parse           CBOR             true  avgt   30   5.418 ± 0.115  us/op

#23072
Benchmark              (contentType)  (humanReadable)  Mode  Cnt    Score   Error  Units
ParserBenchmark.parse           JSON            false  avgt   30   23.933 ± 0.586  us/op
ParserBenchmark.parse           JSON             true  avgt   30   23.782 ± 0.295  us/op
ParserBenchmark.parse          SMILE            false  avgt   30   10.790 ± 0.120  us/op
ParserBenchmark.parse          SMILE             true  avgt   30   10.748 ± 0.071  us/op
ParserBenchmark.parse           YAML            false  avgt   30  193.490 ± 1.951  us/op
ParserBenchmark.parse           YAML             true  avgt   30  191.485 ± 1.759  us/op
ParserBenchmark.parse           CBOR            false  avgt   30    9.596 ± 0.132  us/op
ParserBenchmark.parse           CBOR             true  avgt   30    9.240 ± 0.100  us/op
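To put "even more pronounced" into numbers, the slowdown of #23072 relative to #23071 can be computed per content type from the Score column (humanReadable=false rows only). With the values above, #23072 takes roughly 1.8x (CBOR) to 2.2x (YAML) the time of #23071. A short sketch; the class name is only illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JmhRatioSketch {
    /** How many times longer the candidate takes than the baseline (both in us/op). */
    static double slowdown(double baselineUsPerOp, double candidateUsPerOp) {
        return candidateUsPerOp / baselineUsPerOp;
    }

    public static void main(String[] args) {
        // Scores in us/op for humanReadable=false: {#23071, #23072}, copied from the tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("JSON", new double[] {12.072, 23.933});
        scores.put("SMILE", new double[] {5.795, 10.790});
        scores.put("YAML", new double[] {88.637, 193.490});
        scores.put("CBOR", new double[] {5.238, 9.596});
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double ratio = slowdown(e.getValue()[0], e.getValue()[1]);
            System.out.printf("%-5s #23072 takes %.2fx the time of #23071%n", e.getKey(), ratio);
        }
    }
}
```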

The benchmark code is available as a gist, but unfortunately it requires an additional dependency on our test framework (I'll see how I can improve this situation). Please just ping me if you want to reproduce the results.

cbuescher · 2017-02-10T12:04:36Z

@danielmitterdorfer thanks a lot for the detailed tests and your hints yesterday. I will surely take a look at the code; it's useful to have it available the next time such a question pops up.

danielmitterdorfer · 2017-02-10T12:06:23Z

Sure, you're welcome. :)

tlrx

Thanks @cbuescher, this is getting close. And sorry for the time it took me to review this!

I left some comments but I didn't find anything big. I'd be happy to have more tests for parsing methods.

tlrx · 2017-02-14T08:30:37Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

-        searchHit.explanation(explanation);
-        searchHit.setInnerHits(innerHits);
-        if (matchedQueries.size() > 0) {
+    // ------------- Parsing code --------------


I think you can remove this comment

tlrx · 2017-02-14T08:36:07Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

        if (shardId != null && nodeId != null) {
            searchHit.shard(new SearchShardTarget(nodeId, shardId));
        }
        searchHit.fields(fields);
        return searchHit;
    }

-    private static Explanation parseExplanation(XContentParser parser) throws IOException {
+    private static <T> T get(String key, Map<String, Object> map, T defaultValue) {


Maybe we could use map.getOrDefault() directly?

I wanted to use the additional implicit casting that map.getOrDefault() doesn't provide. If I use it directly I need to cast in every value assignment e.g. like String id = (String) values.getOrDefault(Fields._ID, null);. If you prefer that I can make the change, but I like the conciseness of this small private helper.

Can you at least change it to:

@SuppressWarnings("unchecked")
private static <T> T get(String key, Map<String, Object> map, T defaultValue) {
    return (T) map.getOrDefault(key, defaultValue);
}

I don't want us to reimplement core stuff just to avoid an explicit cast
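A self-contained version of the suggested helper with a small usage example; the map contents are hypothetical. The unchecked cast happens once inside `get`, and the compiler inserts the real type check at each call site, so a value of the wrong type would surface there as a `ClassCastException`.

```java
import java.util.HashMap;
import java.util.Map;

public class MapGetSketch {
    /** Typed lookup with a default: delegates to Map.getOrDefault and casts in one place. */
    @SuppressWarnings("unchecked")
    static <T> T get(String key, Map<String, Object> map, T defaultValue) {
        return (T) map.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("_id", "42");                        // hypothetical parsed value
        String id = get("_id", values, null);           // no explicit cast at the call site
        Float score = get("_score", values, Float.NaN); // missing key falls back to the default
        System.out.println(id + " " + score);
    }
}
```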

tlrx · 2017-02-14T08:39:21Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

+        return value;
+    }
+
+    private static float parseScore(XContentParser parser, Void context) throws IOException {


The Void context argument is not used, can we remove it? I don't think we should add it just to use method reference in the ObjectParser, or maybe I'm missing something?

I can remove these unused Void arguments; they are meant to make the "declareParseFields()" part above more readable, and I will need to include a few more lambdas there then. Take a look at the upcoming commit and let me know what you think is better.

+1 on removing the Void context from all methods. declareInnerHitsParseFields is already complex to read, I think; a few more lambdas won't make much difference.

Sorry, I just saw that you removed them already, thanks!
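To illustrate the trade-off discussed here: when a field parser is a (value, context) callback, a method that keeps an unused `Void` parameter can be passed as a method reference, while a one-argument method needs a small lambda to adapt it. The interface `FieldParser` and the method `declareField` below are hypothetical and only mimic the shape of a context-aware field parser; they are not the ObjectParser API.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextArgumentSketch {
    /** Hypothetical (value, context) callback, mimicking the shape of a context-aware field parser. */
    @FunctionalInterface
    interface FieldParser<C, T> {
        T parse(String rawValue, C context);
    }

    // Keeps an unused Void parameter so it can be passed as a method reference.
    static float parseScoreWithContext(String rawValue, Void context) {
        return Float.parseFloat(rawValue);
    }

    // Drops the unused parameter; callers then need a small lambda to adapt it.
    static float parseScore(String rawValue) {
        return Float.parseFloat(rawValue);
    }

    // Hypothetical registration method standing in for declaring a field on the parser.
    static void declareField(Map<String, FieldParser<Void, Float>> registry, String name,
                             FieldParser<Void, Float> parser) {
        registry.put(name, parser);
    }

    public static void main(String[] args) {
        Map<String, FieldParser<Void, Float>> registry = new HashMap<>();
        declareField(registry, "_score", ContextArgumentSketch::parseScoreWithContext);
        declareField(registry, "_score_lambda", (value, context) -> parseScore(value));
        System.out.println(registry.get("_score").parse("1.5", null));
    }
}
```

Both registrations behave the same; the choice is only about whether the unused parameter lives in the method signature or in the lambda at the declaration site.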

tlrx · 2017-02-14T08:40:56Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

+        if (parser.currentToken() == XContentParser.Token.VALUE_NUMBER || parser.currentToken() == XContentParser.Token.VALUE_STRING) {
+            return parser.floatValue();
+        } else {
+            return Float.NaN;


I think we should throw an unexpected token type exception here?

Not really, I think: toXContent writes a null value if the score is NaN (if (Float.isNaN(score)) { builder.nullField(Fields._SCORE); }), and we need to parse that value back. I think it's also covered by the tests.

Sorry, I meant: I think we should be more precise when parsing the value here. If token is a VALUE_NUMBER we use floatValue(), if token is VALUE_NULL we use Float.NaN and all other cases must throw an exception. What do you think?

Agreed, but that check is already taken care of by ObjectParser, because the field is declared with ValueType.FLOAT_OR_NULL:

parser.declareField((map, value) -> map.put(Fields._SCORE, value), SearchHit::parseScore, new ParseField(Fields._SCORE), ValueType.FLOAT_OR_NULL);
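The stricter behavior tlrx proposes (number to float, null to NaN, anything else rejected) could look roughly like this. The `Token` enum is a hypothetical stand-in for `XContentParser.Token`, reduced to the cases that matter here; in the PR the token restriction is instead delegated to the field's declared value type, and the original `parseScore` also accepts `VALUE_STRING`.

```java
public class ScoreParsingSketch {
    // Hypothetical stand-in for XContentParser.Token, reduced to the cases used here.
    enum Token { VALUE_NUMBER, VALUE_STRING, VALUE_NULL, START_OBJECT }

    /** Numbers map to floats, null maps to NaN, every other token is rejected. */
    static float parseScore(Token token, String text) {
        switch (token) {
            case VALUE_NUMBER:
                return Float.parseFloat(text);
            case VALUE_NULL:
                // toXContent renders a NaN score as null, so null must round-trip back to NaN
                return Float.NaN;
            default:
                throw new IllegalArgumentException("unexpected token [" + token + "] for field [_score]");
        }
    }

    public static void main(String[] args) {
        System.out.println(parseScore(Token.VALUE_NUMBER, "1.5"));
        System.out.println(Float.isNaN(parseScore(Token.VALUE_NULL, null)));
    }
}
```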

tlrx · 2017-02-14T08:45:16Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

+                    if (fieldMap == null) {
+                        fieldMap = new HashMap<>();
+                        map.put(Fields.FIELDS, fieldMap);
+                    }


You could use map.computeIfAbsent()

good suggestion, didn't know that one
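For reference, the null check above collapses into a single `Map.computeIfAbsent` call, which creates and stores the nested map on first use and returns the existing one afterwards. A minimal sketch; `FIELDS` is a hypothetical stand-in for `Fields.FIELDS`, and the nested values are plain objects instead of real field instances.

```java
import java.util.HashMap;
import java.util.Map;

public class ComputeIfAbsentSketch {
    static final String FIELDS = "fields"; // hypothetical stand-in for Fields.FIELDS

    @SuppressWarnings("unchecked")
    static void addField(Map<String, Object> map, String name, Object value) {
        // Creates and registers the nested map on first use, then reuses it.
        Map<String, Object> fieldMap =
                (Map<String, Object>) map.computeIfAbsent(FIELDS, k -> new HashMap<String, Object>());
        fieldMap.put(name, value);
    }

    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        addField(map, "title", "foo");
        addField(map, "year", 2017);
        System.out.println(map.get(FIELDS));
    }
}
```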

tlrx · 2017-02-14T09:02:12Z

core/src/main/java/org/elasticsearch/search/suggest/completion/CompletionSuggestion.java

+            }
+
+            private static Map<String, Set<CharSequence>> parseContext(XContentParser parser, Void context) throws IOException {
+                Map<String, Set<CharSequence>> contexts = new HashMap<>();


Can we move this after the ensureExpectedToken()?

tlrx · 2017-02-14T09:02:46Z

core/src/main/java/org/elasticsearch/search/suggest/completion/CompletionSuggestion.java

+                ensureExpectedToken(XContentParser.Token.START_OBJECT, parser.currentToken(), parser::getTokenLocation);
+                while((parser.nextToken()) != XContentParser.Token.END_OBJECT) {
+                    ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.currentToken(), parser::getTokenLocation);
+                    String key = parser.currentName();


We don't really need this key

We do later, to use it in contexts map as key I think.

Oh right, sorry.

tlrx · 2017-02-14T09:06:21Z

core/src/main/java/org/elasticsearch/search/suggest/completion/CompletionSuggestion.java

+                    String key = parser.currentName();
+                    ensureExpectedToken(XContentParser.Token.START_ARRAY, parser.nextToken(), parser::getTokenLocation);
+                    Set<CharSequence> values = new HashSet<>();
+                    for (Object value : parser.list()) {


I think we should parse the values using a while((parser.nextToken()) != XContentParser.Token.END_ARRAY) loop and check the token type of each value. We expect only strings, and we should throw an exception if we find something else.
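A rough sketch of the suggested loop, using a hypothetical, simplified token stream (`Token`, `Event`) instead of a real `XContentParser`: values are read one token at a time until `END_ARRAY`, and any non-string value is rejected instead of being accepted via a generic list conversion.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class ContextValuesSketch {
    // Hypothetical, simplified token model standing in for XContentParser tokens.
    enum Token { START_ARRAY, END_ARRAY, VALUE_STRING, VALUE_NUMBER }

    static final class Event {
        final Token token;
        final String text;
        Event(Token token, String text) { this.token = token; this.text = text; }
    }

    /** Reads string values until END_ARRAY; assumes the caller has already consumed START_ARRAY. */
    static Set<CharSequence> parseValues(Iterator<Event> events) {
        Set<CharSequence> values = new HashSet<>();
        while (true) {
            Event e = events.next();
            if (e.token == Token.END_ARRAY) {
                return values;
            }
            if (e.token != Token.VALUE_STRING) {
                throw new IllegalArgumentException("expected VALUE_STRING but found [" + e.token + "]");
            }
            values.add(e.text);
        }
    }
}
```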

tlrx · 2017-02-14T09:15:25Z

core/src/test/java/org/elasticsearch/search/suggest/CompletionSuggestionOptionTests.java

+        XContentType xContentType = randomFrom(XContentType.values());
+        boolean humanReadable = randomBoolean();
+        BytesReference originalBytes = toXContent(option, xContentType, humanReadable);
+        Option parsed;


Something I did in ElasticsearchExceptionTests is to randomly shuffle the fields, just to be sure that parsers do not rely on the order of fields:

if (randomBoolean()) {
    try (XContentParser parser = createParser(xContentType.xContent(), originalBytes)) {
        originalBytes = shuffleXContent(parser, randomBoolean()).bytes();
    }
}

would you be OK to add something like this?
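The idea behind the shuffling can be shown without the test framework: feed the same fields to a parser in many random orders and check that the result never changes. In this sketch the field-by-field `parse` method and the `(name, value)` pair representation are hypothetical; `shuffleXContent` does the equivalent on real xContent bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleFieldsSketch {
    /** Hypothetical field-by-field parser; it must not depend on the order of the fields. */
    static String[] parse(List<String[]> fields) {
        String id = null;
        String text = null;
        for (String[] field : fields) {
            switch (field[0]) {
                case "_id":
                    id = field[1];
                    break;
                case "text":
                    text = field[1];
                    break;
                default:
                    throw new IllegalArgumentException("unknown field [" + field[0] + "]");
            }
        }
        return new String[] {id, text};
    }

    /** Returns a copy of the fields in random order, similar to shuffling the rendered xContent. */
    static List<String[]> shuffled(List<String[]> fields, Random random) {
        List<String[]> copy = new ArrayList<>(fields);
        Collections.shuffle(copy, random);
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> fields = Arrays.asList(new String[] {"_id", "1"}, new String[] {"text", "foo"});
        Random random = new Random(42);
        for (int i = 0; i < 10; i++) {
            if (!Arrays.equals(parse(fields), parse(shuffled(fields, random)))) {
                throw new AssertionError("parser depends on field order");
            }
        }
        System.out.println("order-independent");
    }
}
```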

tlrx · 2017-02-14T09:22:36Z

core/src/main/java/org/elasticsearch/search/SearchHit.java

-        if (matchedQueries.size() > 0) {
+    // ------------- Parsing code --------------
+
+    private static ObjectParser<Map<String, Object>, Void> PARSER = new ObjectParser<>("innerHitsParser", HashMap::new);


It would be nice to have some doc that indicates that it's going to be parsed and stored in a temporary map before being created using createFromMap() method, and the reasoning around why we do this. Also, maybe we could rename to MAP_PARSER? Just an idea.


cbuescher · 2017-02-14T13:20:03Z

@tlrx thanks for the review, I addressed your comments or left questions or explanations where possible.

tlrx · 2017-02-15T13:40:01Z

Thanks @cbuescher. Do you think we could change the parsing methods like parseScore, parseExplanation etc. to have package visibility and then add unit tests for each of them? I know it's a lot of work, but I'd be more comfortable if they were also unit tested.

tlrx

Thanks @cbuescher, it looks good to me.

We both agreed on not implementing unit tests for the private static parsing methods like parseScore: the current randomized test covers enough cases and those methods are not intended to be reused outside of the SearchHit class.


cbuescher · 2017-02-15T16:55:13Z

Also merged into 5.x with 5d24bf1

cbuescher added :Search/Search >enhancement review v6.0.0-alpha1 labels Feb 9, 2017

tlrx requested changes Feb 14, 2017


cbuescher added 2 commits February 14, 2017 14:09

Addressing review comments

b512918

cbuescher force-pushed the addParsing-completionSuggestionOption branch from bced78f to b512918 February 14, 2017 13:11

Changing field name constants in NestedIdentity

28d399b

Simplify internal helper method

89e03ed

tlrx approved these changes Feb 15, 2017


cbuescher merged commit b963144 into elastic:master Feb 15, 2017

cbuescher added the v5.4.0 label Feb 15, 2017
