New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use NamedWritable to enable GeoBoundingBox serialisation #99163
Use NamedWritable to enable GeoBoundingBox serialisation #99163
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @craigtaverner, I've created a changelog YAML for you. |
server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/geo/BoundingBox.java
Outdated
Show resolved
Hide resolved
a142cc8
to
9b9c3fc
Compare
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
Show resolved
Hide resolved
BoundingBox allows inter-node serialization only within aggs results as components of aggregations that themselves use NamedWritable. However, if a GeoBoundingBox is added to search hits serialization fails as this type is not explictly supported by StreamOuput/StreamInput. This was never noticed before because the only way to get a BoundingBox into the hits is through a painless script, something that appears to have not been used by customers, and is only tested internally using a single-node, single-shard yaml test, which does not touch this code path. We only found this issue due to recent effort to run all yaml tests in a serverless environment, which appears to always require serialization from backend data nodes to a coordinator node. Having said that, it is a surface API that could be used by users, and so should be fixed. While the missing type could be added explicitly, a much more flexible approach is to allow named classes to encapsulate their own serialization/deserialization logic. Here we use NamedWritable for that purpose. Fixes elastic#99089
So only classes that implement this specific interface will benefit from this additional capability. Also we implement VersionedNamedWriteable to ensure that TransportVersion checking is done.
Co-authored-by: David Turner <david.turner@elastic.co>
d0e026c
to
e7a534e
Compare
The parent class AbstractNamedWriteableTestCase was not using the writeGenericValue and readGenericValue that this work changed, so we override that behaviour to ensure we're testing the new code. We also add a new test method for negative testing of errors thrown when trying to serialize using an older TransportVersion. We do not throw similar errors on reading, with the assumption that the reader can always read things it knows about, regardless of the age of the sender. Taking account the TransportVersion in the reader only make sense if reading a known type where the format has changed.
This is one reason why we went for a generic approach, allowing plugins to register additional GenericNamedWriteables that can be serialized/deserialized by the StreamInput/StreamOutput code without the server depending on xpack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just a few small comments.
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
Show resolved
Hide resolved
|
||
// Test writing fails with older version | ||
try { | ||
copyInstance(testInstance, older, valid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just verify that writing to an older stream fails, no need to make a copyInstance
which tries to read it back again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. After simplifying the test I replaced this with just the writing code.
} | ||
|
||
// Test reading succeeds with older version (we do not cripple the reader) | ||
deserializedInstance = copyInstance(testInstance, valid, older); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't really make sense, we are always sure that reading and writing always use the same transport version, and the message we're creating isn't actually something an older node would be able to create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I've simplified this failure test to only test the failure case. The first part with (valid,valid) was already tested in another test testSerialization
, and the last part is unnecessary:
Co-authored-by: David Turner <david.turner@elastic.co>
….com:craigtaverner/elasticsearch into supportNamedWritableSerializationBoundingBox
@Override | ||
protected GenericNamedWriteable copyInstance(GenericNamedWriteable original, TransportVersion version) throws IOException { | ||
return copyInstance(original, version, version); | ||
} | ||
|
||
/** | ||
* Read and write with different serialization versions | ||
*/ | ||
protected GenericNamedWriteable copyInstance( | ||
GenericNamedWriteable original, | ||
TransportVersion writeVersion, | ||
TransportVersion readVersion | ||
) throws IOException { | ||
try (BytesStreamOutput output = new BytesStreamOutput()) { | ||
output.setTransportVersion(writeVersion); | ||
output.writeGenericValue(original); | ||
try (StreamInput in = new NamedWriteableAwareStreamInput(output.bytes().streamInput(), getNamedWriteableRegistry())) { | ||
in.setTransportVersion(readVersion); | ||
return readGenericValue(in); | ||
} | ||
} | ||
} | ||
|
||
private GenericNamedWriteable readGenericValue(StreamInput in) throws IOException { | ||
Object obj = in.readGenericValue(); | ||
if (obj instanceof GenericNamedWriteable result) { | ||
return result; | ||
} | ||
throw new IllegalStateException("Read result of wrong type: " + obj.getClass().getSimpleName()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer needed, we can just use the copyInstance
implementation in the subclass.
@Override | |
protected GenericNamedWriteable copyInstance(GenericNamedWriteable original, TransportVersion version) throws IOException { | |
return copyInstance(original, version, version); | |
} | |
/** | |
* Read and write with different serialization versions | |
*/ | |
protected GenericNamedWriteable copyInstance( | |
GenericNamedWriteable original, | |
TransportVersion writeVersion, | |
TransportVersion readVersion | |
) throws IOException { | |
try (BytesStreamOutput output = new BytesStreamOutput()) { | |
output.setTransportVersion(writeVersion); | |
output.writeGenericValue(original); | |
try (StreamInput in = new NamedWriteableAwareStreamInput(output.bytes().streamInput(), getNamedWriteableRegistry())) { | |
in.setTransportVersion(readVersion); | |
return readGenericValue(in); | |
} | |
} | |
} | |
private GenericNamedWriteable readGenericValue(StreamInput in) throws IOException { | |
Object obj = in.readGenericValue(); | |
if (obj instanceof GenericNamedWriteable result) { | |
return result; | |
} | |
throw new IllegalStateException("Read result of wrong type: " + obj.getClass().getSimpleName()); | |
} |
(also needs a spotless run to clean up imports)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the parent version does not call output.writeGenericValue
or in.readGenericValue
which are the methods we added support for this to. Instead it calls directly the writeNamedWriteable
and readNamedWriteable
methods, skipping all the new code and version checks. But I see your point that at least I do not need the two-version version of the method, so I removed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
acked, I think I can see a way to make this a little less special, but let's merge this and I'll follow up with a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (left a comment about some now-unnecessary test code tho)
* Use NamedWritable to enable GeoBoundingBox serialization BoundingBox allows inter-node serialization only within aggs results as components of aggregations that themselves use NamedWritable. However, if a GeoBoundingBox is added to search hits serialization fails as this type is not explictly supported by StreamOuput/StreamInput. This was never noticed before because the only way to get a BoundingBox into the hits is through a painless script, something that appears to have not been used by customers, and is only tested internally using a single-node, single-shard yaml test, which does not touch this code path. We only found this issue due to recent effort to run all yaml tests in a serverless environment, which appears to always require serialization from backend data nodes to a coordinator node. Having said that, it is a surface API that could be used by users, and so should be fixed. While the missing type could be added explicitly, a much more flexible approach is to allow named classes to encapsulate their own serialization/deserialization logic. Here we use NamedWritable for that purpose. Fixes elastic#99089 * Update docs/changelog/99163.yaml * Use GenericNamedWriteable to limit this to specific classes So only classes that implement this specific interface will benefit from this additional capability. Also we implement VersionedNamedWriteable to ensure that TransportVersion checking is done. * Code style fix * Explicit checks of version support for GenericNamedWritable Co-authored-by: David Turner <david.turner@elastic.co> * Simplify GenericNamedWriteable checks * Move NamedWriteable methods into concrete classes * Added assert on GenericNamedWriteable min supported version * Unit tests for serialization of GenericNamedWriteable * Ensure serialization tests use the generic code path The parent class AbstractNamedWriteableTestCase was not using the writeGenericValue and readGenericValue that this work changed, so we override that behaviour to ensure we're testing the new code. We also add a new test method for negative testing of errors thrown when trying to serialize using an older TransportVersion. We do not throw similar errors on reading, with the assumption that the reader can always read things it knows about, regardless of the age of the sender. Taking account the TransportVersion in the reader only make sense if reading a known type where the format has changed. * Support CartesianBoundingBox in StreamOutput/Input This is one reason why we went for a generic approach, allowing plugins to register additional GenericNamedWriteables that can be serialized/deserialized by the StreamInput/StreamOutput code without the server depending on xpack. * Update server/src/main/java/org/elasticsearch/TransportVersion.java Co-authored-by: David Turner <david.turner@elastic.co> * Code review updates - simplify negative test to only test failure case * Code review - remove unused method after simplifying test * Simplify GeoBoundsGenericWriteableTests --------- Co-authored-by: David Turner <david.turner@elastic.co>
BoundingBox
allows inter-node serialisation only within aggs results as components of aggregations that themselves useNamedWritable
. However, if aGeoBoundingBox
is added to searchhits
, serialisation fails as this type is not explicitly supported byStreamOuput/StreamInput
. This was never noticed before because the only way to get aBoundingBox
into thehits
is through a painless script, something that appears to have not been used by customers, and is only tested internally using a single-node, single-shard yaml test, which does not touch this code path. We only found this issue due to recent effort to run all yaml tests in a serverless environment, which appears to always require serialisation from backend data nodes to a coordinator node.Having said that, it is a surface API that could be used by users, and so should be fixed. While the missing type could be added explicitly, a much more flexible approach is to allow named classes to encapsulate their own serialization/deserialization logic. Here we use
NamedWritable
for that purpose.Tasks:
Fixes #99089