-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add xcontent parsing to completion suggestion option #23071
Add xcontent parsing to completion suggestion option #23071
Conversation
For discussion the alternative option: #23072 |
I did some comparisson between the two solutions.
Since there was still some variance I repeated the above measurements four times, see results below.
|
@cbuescher my own tests hint that the performance difference is even more pronounced. MethodologyWe ran the benchmark with JMH on our microbenchmarking infrastructure:
Benchmark threads were isolated from the operating system with CPU sets and the benchmarks were pinned to these isolated CPUs. We have also enabled the performance CPU governor and locked the CPU at its base frequency of 3.5GHz. Each benchmark ran with 3 JVM forks, 10 warmup iterations and 10 measurement iterations. We repeated the experiments three times and varied the execution order of benchmarks in order to avoid biasing. Results
The benchmark code is available as a gist but unfortunately it requires an additional dependency on our the test framework (I see how I can improve this situation). Please just ping me if you want to reproduce the results. |
@danielmitterdorfer thanks a lot for the detailed tests and your hints yesterday, will surely take a look at the code, this is useful to have available next time such a question pops up. |
Sure, you're welcome. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cbuescher, this is getting close. And sorry for the time it took me to review this!
I left some comments but I didn't find anything big. I'd be happy to have more tests for parsing methods.
searchHit.explanation(explanation); | ||
searchHit.setInnerHits(innerHits); | ||
if (matchedQueries.size() > 0) { | ||
// ------------- Parsing code -------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this comment
if (shardId != null && nodeId != null) { | ||
searchHit.shard(new SearchShardTarget(nodeId, shardId)); | ||
} | ||
searchHit.fields(fields); | ||
return searchHit; | ||
} | ||
|
||
private static Explanation parseExplanation(XContentParser parser) throws IOException { | ||
private static <T> T get(String key, Map<String, Object> map, T defaultValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could use map.getOrDefault()
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use the additional implicit casting that map.getOrDefault()
doesn't provide. If I use it directly I need to cast in every value assignment e.g. like String id = (String) values.getOrDefault(Fields._ID, null);
. If you prefer that I can make the change, but I like the conciseness of this small private helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you at least change it to:
@SuppressWarnings("unchecked")
private static <T> T get(String key, Map<String, Object> map, T defaultValue) {
return (T) map.getOrDefault(key, defaultValue);
}
I don't want us to reimplement core stuff just to avoid an explicit cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
return value; | ||
} | ||
|
||
private static float parseScore(XContentParser parser, Void context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Void context
argument is not used, can we remove it? I don't think we should add it just to use method reference in the ObjectParser, or maybe I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove these unused Void arguments, they are meant to make the "declareParseFields()" part above more readable. I will need to include a few more lambdas there then. Take a look at the upcomming commit and let me know what you think is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on removing the Void context
from all methods. The declareInnerHitsParseFields
is already complex to read I think, that won't add much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just saw that you remove them already, thanks!
if (parser.currentToken() == XContentParser.Token.VALUE_NUMBER || parser.currentToken() == XContentParser.Token.VALUE_STRING) { | ||
return parser.floatValue(); | ||
} else { | ||
return Float.NaN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should throw an unexpected token type exception here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really I think, toXContent prints the Null token if score is NaN: if (Float.isNaN(score)) { builder.nullField(Fields._SCORE); }
and I think we need to parse the same value back. Its also included in the tests I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant: I think we should be more precise when parsing the value here. If token is a VALUE_NUMBER we use floatValue(), if token is VALUE_NULL we use Float.NaN and all other cases must throw an exception. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but that check is already be taken care of by ObjectParser by declaring the ValueType.FLOAT_OR_NULL when declaring the field:
parser.declareField((map, value) -> map.put(Fields._SCORE, value), SearchHit::parseScore, new ParseField(Fields._SCORE),
ValueType.FLOAT_OR_NULL);
if (fieldMap == null) { | ||
fieldMap = new HashMap<>(); | ||
map.put(Fields.FIELDS, fieldMap); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use map.computeIfAbsent()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion, didn't know that one
} | ||
|
||
private static Map<String, Set<CharSequence>> parseContext(XContentParser parser, Void context) throws IOException { | ||
Map<String, Set<CharSequence>> contexts = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this after the ensureExpectedToken()?
ensureExpectedToken(XContentParser.Token.START_OBJECT, parser.currentToken(), parser::getTokenLocation); | ||
while((parser.nextToken()) != XContentParser.Token.END_OBJECT) { | ||
ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.currentToken(), parser::getTokenLocation); | ||
String key = parser.currentName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do later, to use it in contexts map as key I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, sorry.
String key = parser.currentName(); | ||
ensureExpectedToken(XContentParser.Token.START_ARRAY, parser.nextToken(), parser::getTokenLocation); | ||
Set<CharSequence> values = new HashSet<>(); | ||
for (Object value : parser.list()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should parse the values using a while((parser.nextToken()) != XContentParser.Token.END_ARRAY)
loop and check the token type for each value. We expect only strings and we should throw an exception if we found something else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
XContentType xContentType = randomFrom(XContentType.values()); | ||
boolean humanReadable = randomBoolean(); | ||
BytesReference originalBytes = toXContent(option, xContentType, humanReadable); | ||
Option parsed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I did in ElasticsearchExceptionTests is to randomly shuffle the fields, just to be sure that parsers do not rely on the order of fields:
if (randomBoolean()) {
try (XContentParser parser = createParser(xContentType.xContent(), originalBytes)) {
originalBytes = shuffleXContent(parser, randomBoolean()).bytes();
}
}
would you be OK to add something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
if (matchedQueries.size() > 0) { | ||
// ------------- Parsing code -------------- | ||
|
||
private static ObjectParser<Map<String, Object>, Void> PARSER = new ObjectParser<>("innerHitsParser", HashMap::new); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some doc that indicates that it's going to be parsed and stored in a temporary map before being created using createFromMap() method, and the reasoning around why we do this. Also, maybe we could rename to MAP_PARSER? Just an idea.
This adds parsing from xContent to the CompletionSuggestion.Entry.Option. The completion suggestion option also inlines the xContent rendering of the containes SearchHit, so in order to reuse the SearchHit parser this also changes the way SearchHit is parsed from using a loop-based parser to using a ConstructingObjectParser that creates an intermediate map representation and then later uses this output to create either a single SearchHit or use it with additional fields defined in the parser for the completion suggestion option.
bced78f
to
b512918
Compare
@tlrx thanks for the review, I addressed your comments or left questions or explanations where possible. |
Thanks @cbuescher. Do you think we could change the parsing methods like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cbuescher, it looks good to me.
We both agreed on not implementing unit tests for the private static parsing methods like parseScore: the current randomized test covers enough cases and those methods are not intended to be reused outside of the SearchHit class.
This adds parsing from xContent to the CompletionSuggestion.Entry.Option. The completion suggestion option also inlines the xContent rendering of the containes SearchHit, so in order to reuse the SearchHit parser this also changes the way SearchHit is parsed from using a loop-based parser to using a ConstructingObjectParser that creates an intermediate map representation and then later uses this output to create either a single SearchHit or use it with additional fields defined in the parser for the completion suggestion option.
Also merged with 5.x with 5d24bf1 |
This adds parsing from xContent to the CompletionSuggestion.Entry.Option.
The completion suggestion option also inlines the xContent rendering of the
containes SearchHit, so in order to reuse the SearchHit parser this also changes
the way SearchHit is parsed from using a loop-based parser to using a
ConstructingObjectParser that creates an intermediate map representation and
then later uses this output to create either a single SearchHit or use it with
additional fields defined in the parser for the completion suggestion option.