Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize top-level pipeline aggs as part of InternalAggregations #40177

Merged
merged 5 commits into from
Mar 19, 2019

Conversation

javanna
Copy link
Member

@javanna javanna commented Mar 18, 2019

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level InternalAggregations object.

Closes #40059

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With elastic#40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes elastic#40059
@javanna javanna added >bug :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.7.0 v8.0.0 v7.2.0 labels Mar 18, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@javanna javanna requested a review from jimczi March 18, 2019 20:20
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments but the change looks good to me. I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

* Constructs a new aggregation providing its {@link InternalAggregation}s and {@link SiblingPipelineAggregator}s
*/
public InternalAggregations(List<InternalAggregation> aggregations, List<SiblingPipelineAggregator> topLevelPipelineAggregators) {
super(aggregations);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other ctr could call this one with: this(aggregations, Collections.emptyList()) ensuring that the topLevelPipelineAggregators list is never null ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added to my TODO list to convert this class to Writeable.

}

@Override
@SuppressWarnings("unchecked")
public void writeTo(StreamOutput out) throws IOException {
out.writeNamedWriteableList((List<InternalAggregation>)aggregations);
//TODO update version after backport
if (out.getVersion().onOrAfter(Version.V_8_0_0)) {
if (topLevelPipelineAggregators == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rely on an empty list if the aggregations are completely reduced ? This way we don't need the boolean and can call the write list directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I initially did not do it but I just realized that I can. I was worried about cases with CCS where we receive from e.g. 6.6 and then write the same object to e.g. 7.x. I just need to set the list to an empty one in that case too which removes the need for the list to be nullable 100%.

}

//TODO update version and rename after backport
public void testSerializationFromPre_8_0_0() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intent of this test and I know that we have similar tests elsewhere but I think it should be moved to a rest test or omitted if we are confident that the existing rest tests are enough to test the bwc serialization. This test checks an internal class that we are allowed to change in a minor release (even a patch release) so I don't think we should use a static representation of the serialization that we'll need to change every time we make a modification to the serialization.

Copy link
Member Author

@javanna javanna Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run. After the backport, this static version of the object is the binary representation of how we serialized the object prior to 6.7.0 (6.7.1 depending on what release the PR makes), which I am pretty sure we will not change. I can add a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run.

They are overkilled if we write them only to test serialization but since we have some ccs rest tests already it shouldn't be too costly to add one that checks the support for pipeline aggregations. I also agree that we will probably not change the serialization of this class in 6.7.x but my point was more about the general idea of adding serialized bytes from a previous version in a unit test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to do integration tests for this scenario as part of #40038 , I wanted to add coverage there for the the field collapsing bug as well. I prefer the new java test over the yaml ones personally. But our current CCS integration test don't run against multiple versions, while this unit test makes sure that we can read something that was written from e.g. 6.6 compared to simulating that by calling readFrom on master and setting the version to 6.6. Do you see what I mean? Or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intent but I forgot that we don't run the bwc tests in every module, let's leave it like this for now and we can discuss further in #40038

@javanna
Copy link
Member Author

javanna commented Mar 18, 2019

I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

yes that is also my goal, we might be able to do this in master and 7.x, indeed bwc is tricky especially for CCS which spans multiple versions. I will work on this as a followup.

@javanna javanna merged commit 3c8970c into elastic:master Mar 19, 2019
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
…astic#40177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With elastic#40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes elastic#40059
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
Version conditionals are no longer needed once elastic#40177 is back-ported all the way to 6.7.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
javanna added a commit that referenced this pull request Mar 19, 2019
* Remove version conditionals from InternalAggregations

Version conditionals are no longer needed once #40177 is back-ported all the way to 6.7.

* Disable bwc tests

Relates to #40177

* indentation
This was referenced Mar 19, 2019
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
…astic#40177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With elastic#40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes elastic#40059
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
…astic#40177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With elastic#40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes elastic#40059
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
javanna added a commit that referenced this pull request Mar 19, 2019
…0177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes #40059
javanna added a commit that referenced this pull request Mar 19, 2019
javanna added a commit that referenced this pull request Mar 19, 2019
…0177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes #40059
javanna added a commit that referenced this pull request Mar 19, 2019
javanna added a commit that referenced this pull request Mar 19, 2019
…0177)

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes #40059
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
Relates to elastic#40177 which is now merged and backported to all branches.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
Relates to elastic#40177 which is now merged and backported to all branches.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019
Relates to elastic#40177 which is now merged and backported to all branches.
javanna added a commit that referenced this pull request Mar 19, 2019
Relates to #40177 which is now merged and backported to all branches.
javanna added a commit that referenced this pull request Mar 19, 2019
Relates to #40177 which is now merged and backported to all branches.
javanna added a commit that referenced this pull request Mar 19, 2019
Relates to #40177 which is now merged and backported to all branches.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 21, 2019
As part of elastic#40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
javanna added a commit that referenced this pull request Mar 28, 2019
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 29, 2019
As part of elastic#40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
javanna added a commit that referenced this pull request Mar 29, 2019
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v6.7.0 v7.0.0-rc1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CCS: sibling pipeline aggregations don't work when minimizing roundtrips
5 participants