Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor out sort values from InternalSearchHit #22080

Merged
merged 6 commits into from Dec 21, 2016

Conversation

cbuescher
Copy link
Member

This adds fromXContent method and unit test for sort values that are part of
InternalSearchHit. In order to centralize serialisation and xContent parsing and
rendering code, move all relevant parts to a new class which can be unit tested
much better in isolation.This is part of the preparation for parsing search
responses on the client side.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor things, LGTM though.

sortValues = new Object[0];
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move writeTo up here? I like to get reading and writing on the same page if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeVInt(sortValues.length);
if (sortValues.length > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this if is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

}
}

public boolean xContentEquals(ToXContentToBytes other) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to leave this out and do assertEquals(this.toString(), other.toString()) in the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments but I like it

sortValues = new Object[0];
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

assert parser.currentToken() == XContentParser.Token.FIELD_NAME;
assert parser.currentName().equals(Fields.SORT);
XContentParser.Token token = parser.nextToken();
assert token == XContentParser.Token.START_ARRAY;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall these be exceptions? I don't have a strong opinion but I got this same comment from Tanguy in another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine. I will change it to what you did in #22082.

}

public SearchSortValues(Object[] sortValues, DocValueFormat[] sortValueFormats) {
this.sortValues = Arrays.copyOf(sortValues, sortValues.length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we check that sortValueFormats is not null? also sortValues?

@@ -86,7 +82,7 @@

private Map<String, HighlightField> highlightFields = null;

private Object[] sortValues = EMPTY_SORT_VALUES;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this empty array could be useful, shall we rather have a SearchSortValues.EMPTY default ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do.

@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeVInt(sortValues.length);
if (sortValues.length > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

}
}

public boolean xContentEquals(ToXContentToBytes other) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

valueSuppliers.add(() -> randomByte());
valueSuppliers.add(() -> randomShort());
valueSuppliers.add(() -> randomBoolean());
valueSuppliers.add(() -> frequently() ? randomAsciiOfLengthBetween(1, 30) : randomRealisticUnicodeOfCodepointLength(30));
Copy link
Member

@javanna javanna Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, more elegant than a switch. I may steal this :)

@cbuescher
Copy link
Member Author

cbuescher commented Dec 12, 2016

@nik9000 @javanna I added a commit that addresses your last comments. For now I exclude SMILE from the xcontent types being tested because it makes roundtrip testing very hard, even if we only compare the json string representation of the object before and after parsing xContent. The problem is that when SMILE sends float values and we don't explicitely cast them back to floats, they are parsed as double and strangely get a slightly different value, if not cast back to float. I wrote a small test showing this in isolation here: https://github.com/cbuescher/elasticsearch/blob/smileFloatIssue/core/src/test/java/org/elasticsearch/common/xcontent/SmileFloatTests.java

SearchSortValues sortValues = createTestItem();
// exclude SMILE: its difficult to compare exact float values after parsing
// they get returned as doubles with a slightly different value if not cast back to float (which we cannot do due to
// losing type information on the rest layer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did quite some digging around these, I did find that the different parsers for SMILE, CBOR and JSON and YAML have a different way of parsing float, either as float, as double, or as double but with double precision. This makes comparisons a bit cumbersome. The issue you were finding with SMILE is that once you parse a float, it is parsed into a double variable, which has double precision. When you print that out it becomes a double, which reparsed it is not a float anymore. The solution isn't to disable SMILE testing though, I resorted to comparing the map representations, essentially by parsing them back into maps. That also solves eventual ordering issue, as json keys ordering doesn't matter. see https://github.com/elastic/elasticsearch/pull/22082/files#diff-3e73f4760da40a9d55345c94edf833c2R108 . I think that we have to move that assertion to some common place, not sure yet where.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this using the common helpers you pointed me to, left TODOs where we need to remove those once they are moved to a common class.

builder.startObject(); // we need to wrap xContent output in proper object to create a parser for it
builder = sortValues.toXContent(builder, ToXContent.EMPTY_PARAMS);
builder.endObject();
return builder.string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use Strings.toString instead? is pretty printing important here?

builder = sortValues.toXContent(builder, ToXContent.EMPTY_PARAMS);
builder.endObject();

XContentParser parser = xcontentType.xContent().createParser(builder.bytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, we should rather use createParser from ESTestCase (it was added yesterday I think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
parser.nextToken();
if (parser.currentToken() != XContentParser.Token.START_ARRAY) {
throw new ParsingException(parser.getTokenLocation(), "expected [" + XContentParser.Token.START_ARRAY + "] token");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the static methods in XContentParserUtils instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cbuescher
Copy link
Member Author

cbuescher commented Dec 19, 2016

@javanna I hope I adressed your last comments, using two (slightly modified) helper methods from your own PR now but leaving a note to remove these once they are available elsewhere. Not sure if we should hold this PR until those are merged.

public static SearchSortValues fromXContent(XContentParser parser) throws IOException {
XContentParserUtils.ensureFieldName(parser, parser.currentToken(), Fields.SORT);
parser.nextToken();
XContentParserUtils.ensureType(XContentParser.Token.START_ARRAY, parser.currentToken(), () -> parser.getTokenLocation());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do parser.nextToken() as part of this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but decided against it for better readability, I found advancing of the token gets hidden too much otherwise. Matter of taste I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's ok with me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you could then save the returned token rather than retrieve it back from the parser? super minor nit though, not even sure what it helps

XContentParserUtils.ensureFieldName(parser, parser.currentToken(), Fields.SORT);
parser.nextToken();
XContentParserUtils.ensureType(XContentParser.Token.START_ARRAY, parser.currentToken(), () -> parser.getTokenLocation());
return new SearchSortValues(parser.list().toArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser.list also supports nested arrays and objects. can they appear as part of sort values?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may have the same "problem" that we have in get with stored fields, where we print out potentially anything, and when parsing it back we have to figure out what formats we actually support. Note also that parser list supports binary objects, but it returns byte[] for them, which makes it cumbersome to test them, hence I am assuming that part does not get tested at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat limited here due to what is possible to go through writeTo/readFrom. Here this is essential single values, they should all be covered by the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but with serialization we can always read whatever we wrote in the same format. With parsing on the other hand, we can write many different types but we can parse only a few e.g. strings, numbers and not much more. That is why the generic methods hide complexity some times, especially because we end up supporting stuff that may not be needed and is not tested. Maybe it's ok in this case though. Can you double check if byte[] can be a potential value here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had missed that we know exactly which formats are supported thanks to the serialization methods that don't use a generic write/read method. So no byte[], no objects etc. I do wonder if it's cleaner to read the list manually and call parser.objectText() to read each value.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, the main point is about formats supported when parsing stuff back, I think we have to figure out exactly what we support and test it rather than support anything which seems to be what we do at the moment.

This adds fromXContent method and unit test for sort values that are part of
InternalSearchHit. In order to centralize serialisation and xContent parsing and
rendering code, move all relevant parts to a new class which can be unit tested
much better in isolation.This is part of the preparation for parsing search
responses on the client side.
@cbuescher
Copy link
Member Author

@javanna I updated this using the helper methods introduced in ElasticsearchAssertions and XContentHelper . What happened to XContentParserUtils.ensureFieldName()? I had to go back to my previous version of manual checks.

@javanna
Copy link
Member

javanna commented Dec 20, 2016

What happened to XContentParserUtils.ensureFieldName()? I had to go back to my previous version of manual checks.

it was removed as there were no usages for it. Feel free to add it back if it's going to be useful in other places too.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a small comment, LGTM otherwise

*/
public static void ensureFieldName(XContentParser parser, Token token, String fieldName) throws IOException {
ensureExpectedToken(Token.FIELD_NAME, token, parser::getTokenLocation);
String current = parser.currentName() != null ? parser.currentName() : "<null>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to compare to something?

Copy link
Member Author

@cbuescher cbuescher Dec 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? current is either the current tokens field name or <null> which I thought serves for printing the error message. I might be missing something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the currentName is null we have a bigger problem. why do we do the comparison even? have you tried passing in fieldName set to ? if the current token is a field name, currentName should not be null ever. I would treat that corner case differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, got it. Makes sense.

@cbuescher cbuescher merged commit bdecbb5 into elastic:master Dec 21, 2016
cbuescher added a commit that referenced this pull request Dec 21, 2016
This adds fromXContent method and unit test for sort values that are part of 
InternalSearchHit. In order to centralize serialisation and xContent parsing and
rendering code, move all relevant parts to a new class which can be unit tested
much better in isolation.This is part of the preparation for parsing search
responses on the client side.
@cbuescher
Copy link
Member Author

On 5.x with 9404411

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 21, 2016
* master:
  Simplify Unicast Zen Ping (elastic#22277)
  Replace IndicesQueriesRegistry (elastic#22289)
  Fixed document mistake and fit for 5.1.1 API
  [TEST] improve error message in ESTestCase#assertWarnings
  [TEST] remove deleted test classes from checkstyle suppressions
  [TEST] make ESSingleNodeTestCase tests repeatable (elastic#22283)
  Link for setting page in elasticsearch.yml is outdated
  Factor out sort values from InternalSearchHit (elastic#22080)
  Add ID for percolate query to Java API docs
  x_refresh.yaml tests should use unique index names and doc ids to ease debugging
  IndicesStoreIntegrationIT should not use start recovery sending as an indication that the recovery started
  Added base class for testing aggregators and some initial tests for `terms`, `top_hits` and `min` aggregations.
  Add link to foreach processor to ingest-attachment.asciidoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants