
Serialize top-level pipeline aggs as part of InternalAggregations #40177

Merged
merged 5 commits into from Mar 19, 2019

4 participants
@javanna
Member

commented Mar 18, 2019

We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive at the coordinating node as part of QuerySearchResult
objects from the shards and, although we may incrementally reduce
aggs (hence we may have some non-final reductions and the final
one later), all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node, which will perform the final reduction.
This breaks the assumption made until now that all reductions
happen on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each cluster's coordinating
node needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.
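The following toy model in Java shows the failure mode more concretely. It is a sketch, not Elasticsearch code. The hypothetical `Aggs` and `PipelineAgg` classes stand in for `InternalAggregations` and `SiblingPipelineAggregator`. Metrics are plain doubles that get summed across inputs. `reduce(inputs, isFinal)` loosely mirrors the non-final/final reduction split described above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a sibling pipeline aggregator: computes a new top-level
// value from the fully reduced metrics (e.g. the max across all metrics).
final class PipelineAgg {
    final String name;
    final Function<Map<String, Double>, Double> fn;

    PipelineAgg(String name, Function<Map<String, Double>, Double> fn) {
        this.name = name;
        this.fn = fn;
    }
}

// Hypothetical stand-in for InternalAggregations: partially or fully reduced metric
// values, plus the top-level pipeline aggregators that have not been applied yet.
final class Aggs {
    final Map<String, Double> values;
    final List<PipelineAgg> topLevelPipelineAggs;

    Aggs(Map<String, Double> values, List<PipelineAgg> topLevelPipelineAggs) {
        this.values = values;
        this.topLevelPipelineAggs = topLevelPipelineAggs;
    }

    // Sums each metric across the inputs. A non-final reduction (e.g. inside one remote
    // cluster) carries the pipeline aggregators forward untouched; only the final
    // reduction evaluates them, on complete data, and then drops them.
    static Aggs reduce(List<Aggs> inputs, boolean isFinal) {
        Map<String, Double> merged = new LinkedHashMap<>();
        for (Aggs input : inputs) {
            input.values.forEach((name, value) -> merged.merge(name, value, Double::sum));
        }
        // Every input belongs to the same search request, so they share one pipeline list.
        List<PipelineAgg> pipelines =
            inputs.isEmpty() ? Collections.<PipelineAgg>emptyList() : inputs.get(0).topLevelPipelineAggs;
        if (isFinal == false) {
            return new Aggs(merged, pipelines);
        }
        Map<String, Double> result = new LinkedHashMap<>(merged);
        for (PipelineAgg pipeline : pipelines) {
            result.put(pipeline.name, pipeline.fn.apply(merged));
        }
        return new Aggs(result, Collections.<PipelineAgg>emptyList());
    }
}
```

Suppose cluster A evaluated a pipeline agg during its own non-final reduction. It would see only its partial sums. Suppose instead it dropped the pipeline list. Then the CCS coordinating node would have nothing to evaluate, and that loss is what this change prevents.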

Closes #40059


@javanna javanna requested a review from jimczi Mar 18, 2019

javanna added some commits Mar 18, 2019

@jimczi

jimczi approved these changes Mar 18, 2019

Member

left a comment

I left some minor comments but the change looks good to me. I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

/**
 * Constructs a new aggregation providing its {@link InternalAggregation}s and {@link SiblingPipelineAggregator}s
 */
public InternalAggregations(List<InternalAggregation> aggregations, List<SiblingPipelineAggregator> topLevelPipelineAggregators) {
    super(aggregations);

@jimczi

jimczi Mar 18, 2019

Member

The other constructor could call this one with `this(aggregations, Collections.emptyList())`, ensuring that the `topLevelPipelineAggregators` list is never null?
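Here is a minimal sketch of the suggested constructor chaining. It assumes a stripped-down, hypothetical `Aggregations` class that uses `String` placeholders instead of the real element types.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical, stripped-down shape of the class under review. String stands in for both
// InternalAggregation and SiblingPipelineAggregator to keep the sketch self-contained.
final class Aggregations {
    private final List<String> aggregations;
    private final List<String> topLevelPipelineAggregators;

    // Delegating constructor: callers without pipeline aggregators get an empty list.
    Aggregations(List<String> aggregations) {
        this(aggregations, Collections.emptyList());
    }

    // Single point of initialization, so the "never null" invariant is enforced in one place.
    Aggregations(List<String> aggregations, List<String> topLevelPipelineAggregators) {
        this.aggregations = Objects.requireNonNull(aggregations, "aggregations");
        this.topLevelPipelineAggregators =
            Objects.requireNonNull(topLevelPipelineAggregators, "topLevelPipelineAggregators");
    }

    List<String> getAggregations() {
        return aggregations;
    }

    List<String> getTopLevelPipelineAggregators() {
        return topLevelPipelineAggregators;
    }
}
```

Only one constructor initializes the fields, so code that serializes the object can write the list unconditionally.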

@javanna

javanna Mar 19, 2019

Author Member

I added to my TODO list to convert this class to Writeable.

}

@Override
@SuppressWarnings("unchecked")
public void writeTo(StreamOutput out) throws IOException {
    out.writeNamedWriteableList((List<InternalAggregation>)aggregations);
    //TODO update version after backport
    if (out.getVersion().onOrAfter(Version.V_8_0_0)) {
        if (topLevelPipelineAggregators == null) {

@jimczi

jimczi Mar 18, 2019

Member

Can we rely on an empty list if the aggregations are completely reduced? This way we don't need the boolean and can write the list directly.

@javanna

javanna Mar 18, 2019

Author Member

Yes, I initially did not do it, but I just realized that I can. I was worried about CCS cases where we receive the object from e.g. 6.6 and then write the same object to e.g. 7.x. I just need to set the list to an empty one in that case too, which removes the need for the list to be nullable at all.
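Below is a rough sketch of the resulting wire handling. It rests on three assumptions:

* Integer version ids and `java.io.DataOutputStream` stand in for Elasticsearch's `Version` and `StreamOutput`.
* `PIPELINES_ADDED_IN` is a hypothetical cutoff version.
* `String`s replace the aggregation objects.

The reader turns a missing list into an empty one, so the writer never needs a null flag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wire format, not Elasticsearch's StreamOutput: plain ints stand in for
// org.elasticsearch.Version and Strings stand in for the aggregation objects.
final class WireAggs {
    static final int PIPELINES_ADDED_IN = 670; // hypothetical first version that sends the list

    final List<String> aggregations;
    final List<String> topLevelPipelineAggregators; // never null; empty once fully reduced

    WireAggs(List<String> aggregations, List<String> topLevelPipelineAggregators) {
        this.aggregations = aggregations;
        this.topLevelPipelineAggregators = topLevelPipelineAggregators;
    }

    byte[] toBytes(int receiverVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeList(out, aggregations);
            // No "is null" boolean: an empty list is written like any other list.
            if (receiverVersion >= PIPELINES_ADDED_IN) {
                writeList(out, topLevelPipelineAggregators);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static WireAggs fromBytes(byte[] data, int senderVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            List<String> aggregations = readList(in);
            // Older senders never write the list, so default to empty rather than null.
            List<String> pipelines = senderVersion >= PIPELINES_ADDED_IN ? readList(in) : Collections.emptyList();
            return new WireAggs(aggregations, pipelines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeList(DataOutputStream out, List<String> list) throws IOException {
        out.writeInt(list.size());
        for (String item : list) {
            out.writeUTF(item);
        }
    }

    private static List<String> readList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(in.readUTF());
        }
        return list;
    }
}
```

Forwarding an object read from an older node to a newer one mirrors the 6.6 to 7.x case above. Because the missing list was materialized as empty, it is written as-is.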

}

//TODO update version and rename after backport
public void testSerializationFromPre_8_0_0() throws IOException {

@jimczi

jimczi Mar 18, 2019

Member

I understand the intent of this test and I know that we have similar tests elsewhere but I think it should be moved to a rest test or omitted if we are confident that the existing rest tests are enough to test the bwc serialization. This test checks an internal class that we are allowed to change in a minor release (even a patch release) so I don't think we should use a static representation of the serialization that we'll need to change every time we make a modification to the serialization.

@javanna

javanna Mar 18, 2019

Author Member

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run. After the backport, this static version of the object is the binary representation of how we serialized the object prior to 6.7.0 (or 6.7.1, depending on which release the PR makes it into), which I am pretty sure we will not change. I can add a comment.

@jimczi

jimczi Mar 18, 2019

Member

I think that yaml tests are overkill for this matter, as they are integration tests and take much longer to run.

They are overkill if we write them only to test serialization, but since we already have some CCS REST tests it shouldn't be too costly to add one that checks the support for pipeline aggregations. I also agree that we will probably not change the serialization of this class in 6.7.x, but my point was more about the general idea of adding serialized bytes from a previous version to a unit test.

This comment has been minimized.

Copy link
@javanna

javanna Mar 18, 2019

Author Member

I plan to do integration tests for this scenario as part of #40038; I wanted to add coverage there for the field collapsing bug as well. I personally prefer the new Java tests over the yaml ones. But our current CCS integration tests don't run against multiple versions, while this unit test makes sure that we can read something that was actually written by e.g. 6.6, as opposed to simulating that by calling readFrom on master with the version set to 6.6. Do you see what I mean? Or am I missing something?

This comment has been minimized.

Copy link
@jimczi

jimczi Mar 18, 2019

Member

I understand the intent but I forgot that we don't run the bwc tests in every module, let's leave it like this for now and we can discuss further in #40038
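The kind of "frozen bytes" unit test discussed in this thread can be sketched as below. Everything here is hypothetical. The byte layout is a simplified `DataOutputStream` encoding: an int count, then `writeUTF` strings. `read` stands in for a real `readFrom`. The hard-coded array is both the strength and the cost debated above. It plays the role of bytes an older release actually produced, rather than bytes simulated with a version flag. But it has to be regenerated whenever the format legitimately changes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical "frozen bytes" BWC check. PRE_PIPELINES_BYTES plays the role of output captured
// once from an older release and checked into the test; read() stands in for readFrom.
final class FrozenBytesBwc {
    static final int PIPELINES_ADDED_IN = 670; // hypothetical version id

    // Old format: int count, then each name via DataOutput.writeUTF (2-byte length + bytes).
    // This encodes the single aggregation "sum" and no pipeline section.
    static final byte[] PRE_PIPELINES_BYTES = {0, 0, 0, 1, 0, 3, 0x73, 0x75, 0x6d};

    // Returns [aggregations, topLevelPipelineAggregators] as decoded for the sender's version.
    static List<List<String>> read(byte[] data, int senderVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            List<String> aggregations = readList(in);
            List<String> pipelines = senderVersion >= PIPELINES_ADDED_IN ? readList(in) : Collections.emptyList();
            // A frozen payload must be consumed exactly; leftovers mean the formats diverged.
            if (in.available() != 0) {
                throw new IllegalStateException("unread bytes: " + in.available());
            }
            return List.of(aggregations, pipelines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> readList(DataInputStream in) throws IOException {
        int size = in.readInt();
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(in.readUTF());
        }
        return list;
    }
}
```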

@javanna

Member Author

commented Mar 18, 2019

I wonder if we could, as a follow up, merge the pipeline aggregators and the internal aggregations in QuerySearchResult, might be tricky to handle bwc though so def not in the scope of this pr.

Yes, that is also my goal. We might be able to do this in master and 7.x; indeed bwc is tricky, especially for CCS, which spans multiple versions. I will work on this as a follow-up.

@javanna javanna merged commit 3c8970c into elastic:master Mar 19, 2019

8 checks passed

CLA All commits in pull request signed
elasticsearch-ci/1 Build finished.
elasticsearch-ci/2 Build finished.
elasticsearch-ci/bwc Build finished.
elasticsearch-ci/default-distro Build finished.
elasticsearch-ci/docbldesx Build finished.
elasticsearch-ci/oss-distro-docs Build finished.
elasticsearch-ci/packaging-sample Build finished.

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

@javanna javanna referenced this pull request Mar 19, 2019

Closed

Cumulative 6.7 backport #40190

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Remove version conditionals from InternalAggregations
Version conditionals are no longer needed once elastic#40177 is back-ported all the way to 6.7.

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

javanna added a commit that referenced this pull request Mar 19, 2019

Remove version conditionals from InternalAggregations (#40193)
* Remove version conditionals from InternalAggregations

Version conditionals are no longer needed once #40177 is back-ported all the way to 6.7.

* Disable bwc tests

Relates to #40177

* indentation

@javanna javanna referenced this pull request Mar 19, 2019

Closed

Cumulative 7.x backport #40195

@javanna javanna referenced this pull request Mar 19, 2019

Closed

Cumulative 7.0 backport #40196

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (elastic#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

javanna added a commit that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (#40177)

javanna added a commit that referenced this pull request Mar 19, 2019

javanna added a commit that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (#40177)

javanna added a commit that referenced this pull request Mar 19, 2019

javanna added a commit that referenced this pull request Mar 19, 2019

Serialize top-level pipeline aggs as part of InternalAggregations (#40177)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests
Relates to elastic#40177 which is now merged and backported to all branches.

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 19, 2019

Re-enable bwc tests

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40215)
Relates to #40177 which is now merged and backported to all branches.

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40218)

javanna added a commit that referenced this pull request Mar 19, 2019

Re-enable bwc tests (#40217)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 21, 2019

Move top-level pipeline aggs out of QuerySearchResult
As part of elastic#40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. This change is
made in a backwards-compatible manner.

@michaelbaamonde michaelbaamonde added v7.0.0-rc1 and removed v7.0.0 labels Mar 25, 2019

pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 25, 2019

Re-enable bwc tests (elastic#40215)

javanna added a commit that referenced this pull request Mar 28, 2019

Move top-level pipeline aggs out of QuerySearchResult (#40319)

javanna added a commit to javanna/elasticsearch that referenced this pull request Mar 29, 2019

Move top-level pipeline aggs out of QuerySearchResult (elastic#40319)

javanna added a commit that referenced this pull request Mar 29, 2019

Move top-level pipeline aggs out of QuerySearchResult (#40319)