Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only return aggregations on the first page with scroll and forbidden with scan #7497

Closed
wants to merge 2 commits into from

Conversation

@jpountz
Copy link
Contributor

commented Aug 28, 2014

Aggregations are collection-wide statistics over one or several indices so:

  • scan should not be allowed to use aggregations since it never collects all documents at once,
  • scroll should only return aggregations on the first page (see #1642)
@jpountz jpountz added the review label Aug 28, 2014
@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2014

For reference, I plan to add documentation for it, but I'm first looking for feedback to know whether this is the correct approach.

@martijnvg

This comment has been minimized.

Copy link
Member

commented Aug 28, 2014

@jpountz This approach looks good to me. I think this issue should be marked as breaking as well, since for normal scroll we did include the aggs also in subsequent responses.

@jpountz jpountz added the breaking label Aug 28, 2014
@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2014

There is one open question about the behavior with search_type=scan: scan works by collecting just enough documents on each round while aggs work by collecting all documents in one go. So I don't think there is some way around collecting matches twice (once for aggs and once for the scan). So potentially there are two options:

  1. prevent the usage of aggs with scan (what the current PR does)
  2. or collect matches twice: once in the first call purely for aggregations, and then in a streaming fashion for the scan.

I like the fact that the first option doesn't do any magic and makes sure that scan is cheap in all cases but on the other hand, 2. could make scan/scroll more consistent with normal scroll (aggs returned in the first page and ignored in subsequent pages).

What do you think?

@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 28, 2014

Side note: assuming 1. is implemented, 2. could be done from client side with negligible overhead (one round trip) by running first a count request to compute aggs and then a scan request to get hits back.

@costin

This comment has been minimized.

Copy link
Member

commented Aug 28, 2014

As a directly interested client :), I'll like to understand how the two requests would be an alternative to 2 (ideally considering the preference api as well).

@jpountz jpountz added the discuss label Aug 29, 2014
@s1monw

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2014

I personally like the fact that scan is simply a stream and I think we should keep it that way. IMO you can do the entire request in 2 steps (one for aggs and one for scan) this essentially means that you can potentially get slightly outdated aggregations but I think that is an acceptable solution. Scan requests should not support aggregations at all.

@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2014

@costin I think you can have two requests by having one that is a SCAN and only fetches hits while another query would be of type COUNT and would compute aggregations. I don't think the preference API raises particular issues: if you use it to go to particular shards, that actually acts as a filter on the documents that match, so I think that is fine?

@clintongormley

This comment has been minimized.

Copy link
Member

commented Sep 1, 2014

I'm happy with scan not supporting aggs, and scroll-without-scan just returning aggs on the first request.

@jpountz jpountz added v2.0.0 and removed discuss labels Sep 3, 2014
jpountz added 2 commits Aug 28, 2014
Aggregations are collection-wide statistics so they would always be the same.
In order to save CPU/bandwidth, we can just return them on the first page.

Same as #1642 but for aggregations.
…_type=SCAN.

Aggregations are collection-wide statistics, which is incompatible with the
collection mode of search_type=SCAN since it doesn't collect all matches on
calls to the search API.

Close #7429
@jpountz jpountz force-pushed the jpountz:fix/aggs_scan_scroll branch to fc0748f Sep 3, 2014
@jpountz jpountz added enhancement and removed review labels Sep 3, 2014
@jpountz jpountz closed this Sep 3, 2014
@jpountz jpountz changed the title Aggregations: fix behavior when used in conjunction with scroll or scan Aggregations: fixed behavior when used in conjunction with scroll or scan Sep 3, 2014
@jpountz jpountz changed the title Aggregations: fixed behavior when used in conjunction with scroll or scan Aggregations: only returned on the first page with scroll and forbidden with scan Sep 3, 2014
@clintongormley clintongormley changed the title Aggregations: only returned on the first page with scroll and forbidden with scan Aggregations: Only returned on the first page with scroll and forbidden with scan Sep 11, 2014
@clintongormley clintongormley changed the title Aggregations: Only returned on the first page with scroll and forbidden with scan Only return aggregations on the first page with scroll and forbidden with scan Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.