Explore option of supporting more flexible search types #12316

Closed
clintongormley opened this issue Jul 17, 2015 · 6 comments
Labels
:Analytics/Aggregations (Aggregations), >enhancement, high hanging fruit, Meta, :Search/Search (Search-related issues that do not fall into other categories), Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:Search (Meta label for search team)

Comments

@clintongormley

Today we have query_then_fetch and query_and_fetch. This limits the types of search functionality we can support. For instance, if you want to auto-adjust the bucket interval so that your documents fit neatly into 10 buckets, you first need to determine the min and max values in order to calculate the correct interval (e.g. see #9572 and #9531).

This requires two round trips:

  • first determine the min/max values
  • calculate the required interval
  • do a second trip to bucket documents per interval
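
The two trips above can be sketched roughly as follows. This is only an illustration, not Elasticsearch code: shards are modelled as plain in-memory lists of numbers, at least one shard is assumed to be non-empty, and the names `shard_min_max`, `shard_histogram` and `auto_histogram` are hypothetical.

```python
from collections import Counter


def shard_min_max(values):
    """Trip 1, shard side: report the local min and max."""
    return min(values), max(values)


def shard_histogram(values, lo, interval, num_buckets):
    """Trip 2, shard side: bucket local values with the agreed interval."""
    counts = Counter()
    for v in values:
        # Clamp so the global max lands in the last bucket, not one past it.
        idx = min(int((v - lo) / interval), num_buckets - 1)
        counts[idx] += 1
    return counts


def auto_histogram(shards, num_buckets=10):
    """Coordinator side: merge min/max, pick the interval, merge buckets."""
    stats = [shard_min_max(s) for s in shards if s]
    lo = min(s[0] for s in stats)
    hi = max(s[1] for s in stats)
    # Fall back to 1.0 if every value is identical (zero-width range).
    interval = (hi - lo) / num_buckets or 1.0
    merged = Counter()
    for s in shards:
        merged.update(shard_histogram(s, lo, interval, num_buckets))
    return lo, interval, [merged.get(i, 0) for i in range(num_buckets)]
```

For example, `auto_histogram([[0, 5, 10], [20, 100]])` picks an interval of 10.0 starting at 0, which no single shard could have worked out from its local values alone.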

Or to improve term count accuracy in a terms agg, you could:

  • retrieve e.g. the top 20 terms from each shard
  • choose the top 10 overall
  • do a second trip (if needed) to get accurate counts for all terms
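
A minimal sketch of this refinement, assuming each shard can be modelled as a `collections.Counter` mapping term to doc count. The function name `refine_top_terms` and its parameters are hypothetical and not part of any Elasticsearch API.

```python
from collections import Counter


def refine_top_terms(shards, size=10, shard_size=20):
    """Pick top terms from per-shard top lists, then fetch exact counts."""
    # Trip 1: each shard returns only its local top `shard_size` terms.
    merged = Counter()
    for shard in shards:
        merged.update(dict(shard.most_common(shard_size)))

    # Coordinator: choose the overall top `size` from the partial counts.
    candidates = [term for term, _ in merged.most_common(size)]

    # Trip 2: ask every shard for its exact count of each chosen term.
    exact = Counter()
    for shard in shards:
        for term in candidates:
            exact[term] += shard[term]  # a Counter returns 0 for missing terms
    return exact.most_common()
```

Note that this makes the counts of the chosen terms exact, but the choice itself is still based on partial data, so a term that never makes any shard's local top list can still be missed.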

Or to guarantee that you get the top 10 terms overall:

  • first trip retrieves the top 20 terms per shard
  • calculate the overall top 10
  • take the doc count of the 10th term -> 10th_count
  • second trip retrieves all terms that have a doc count of at least 10th_count / num_shards on a shard
  • third trip calculates accurate counts for all the terms returned by the second trip
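
The three trips can be sketched like this, again modelling each shard as a `collections.Counter` of term to doc count. The name `guaranteed_top_terms` is hypothetical, and the tie-break by term name is an assumption added here only to make the output deterministic. The guarantee rests on the observation that the partial count of the 10th term is a lower bound on its true count, and that any term whose total reaches that bound must have at least `10th_count / num_shards` docs on some shard.

```python
from collections import Counter


def guaranteed_top_terms(shards, size=10, shard_size=20):
    """Three-trip top-N terms with exact counts (illustrative only)."""
    # Trip 1: merge each shard's local top `shard_size` terms.
    merged = Counter()
    for shard in shards:
        merged.update(dict(shard.most_common(shard_size)))
    top = merged.most_common(size)
    if not top:
        return []

    # Count of the Nth term; if fewer than N terms were seen, keep everything.
    nth_count = top[-1][1] if len(top) == size else 0
    threshold = nth_count / len(shards)

    # Trip 2: every shard returns all terms at or above the threshold.
    candidates = set()
    for shard in shards:
        candidates.update(t for t, c in shard.items() if c >= threshold)

    # Trip 3: exact counts for every candidate term.
    exact = Counter()
    for shard in shards:
        for term in candidates:
            exact[term] += shard[term]

    # Sort by count descending, then by term name (assumed tie-break).
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
```

With `shards = [Counter(a=5, b=4), Counter(b=3, c=4, a=1)]`, `size=2` and `shard_size=1`, the first trip only sees `a` and `c`, but the second trip pulls in `b`, which turns out to be the real top term once the third trip counts it exactly.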

Multiple search phases would also help with clustering algorithms.

@colings86
Contributor

#10217 will be required before we do this. The decision on how many phases a request needs has to be made on the coordinating node, so the query must be parsed there first.

Also, this could get very complex. Term count accuracy would require re-running the parent aggregations in the accuracy round to get the right context (the right documents) for the terms aggregation to work on. The sub-aggregations would also have to run in the accuracy round (and not in the initial round) to get the right values. It gets even more complex if multiple terms aggregations are nested, all with accuracy set to true.

@brettlyman

We're seeing the same problem mentioned in #1305, which was closed because facets were deprecated, but we're using terms aggregations. We have a pretty complex setup with multiple shards and replicas per index, and the field being aggregated is a nested document.

When we do the terms aggregation we often see buckets with wrong counts, or even no buckets returned at all. If we change the terms aggregation to a filter aggregation looking for a specific value in the nested document that should result in a bucket, we get hits returned. Note that we're not looking for "top X" buckets, just returning all buckets and trying to get an accurate count.

I believe our queries were fine up until a couple of weeks ago, so perhaps there's a shard/routing/etc. setting that causes this to happen? Otherwise, please add my +1 to the request for a parameter to force accurate results, even though execution would be slower.

@colings86
Contributor

@clintongormley do you think this could now be closed since we have the composite aggregation?

@clintongormley
Author

@colings86 these changes are all about the top-n results, which you can't get with the composite agg without retrieving all results. I think these requests are still valid.

@colings86
Contributor

@elastic/es-search-aggs

@colings86 removed the discuss label on Mar 13, 2018
@rjernst added the Team:Analytics and Team:Search labels on May 4, 2020
@javanna
Member

javanna commented Oct 13, 2022

This is a rather old issue that has had no activity in a long while. There are no concrete plans to work on it at this time, hence I am closing it.

@javanna closed this as not planned on Oct 13, 2022