geohex_grid aggregation results missing doc_count_error_upper_bound and sum_other_doc_count fields #304

Open
philvarner opened this issue May 12, 2023 · 4 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@philvarner

What is the bug?

geohex_grid aggregation results missing doc_count_error_upper_bound and sum_other_doc_count fields.

How can one reproduce the bug?

  1. Index data with a geo_point field (named centroid in the example below).
  2. Run geohex_grid aggregation like:
POST /my_index/_search
{
  "aggs": {
    "grid_geohex_frequency": {
      "geohex_grid": {
        "field": "centroid",
        "precision": 3
      }
    }
  },
  "size": 0
}

The response is like:

"aggregations": {
  "grid_geohex_frequency": {
    "buckets": [ ... 10000 buckets ]
  }
}

What is the expected behavior?

I would expect the response to also have doc_count_error_upper_bound and sum_other_doc_count fields when there are documents that are not covered by the buckets returned, as is done with the terms aggregation.
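
To make the gap concrete, here is a minimal Python sketch that checks one aggregation result for the two terms-style fields. The helper name `missing_terms_style_fields` and both sample dictionaries are hypothetical and trimmed down for illustration; they are not real cluster output.

```python
# Summary fields that a terms aggregation reports next to "buckets".
TERMS_STYLE_FIELDS = ("doc_count_error_upper_bound", "sum_other_doc_count")


def missing_terms_style_fields(agg_result):
    """Return the terms-style summary fields absent from one aggregation result."""
    return [name for name in TERMS_STYLE_FIELDS if name not in agg_result]


# Hypothetical results: one shaped like a terms aggregation,
# one shaped like the geohex_grid result shown in this report.
terms_like = {"doc_count_error_upper_bound": 0, "sum_other_doc_count": 42, "buckets": []}
geohex_like = {"buckets": []}

print(missing_terms_style_fields(terms_like))   # []
print(missing_terms_style_fields(geohex_like))  # ['doc_count_error_upper_bound', 'sum_other_doc_count']
```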

What is your host/environment?

AWS OpenSearch Service 2.5

Do you have any screenshots?

n/a

Do you have any additional context?

no

@philvarner philvarner added bug Something isn't working untriaged labels May 12, 2023
@VijayanB
Member

VijayanB commented May 14, 2023

I believe those parameters are part of the terms aggregation. The geohex aggregation considers all documents, unlike terms, where you can use the size parameter to list only the top buckets. This is consistent with other geo aggregation types such as geohash and geotile.

@VijayanB VijayanB self-assigned this May 14, 2023
@VijayanB VijayanB added question Further information is requested and removed bug Something isn't working untriaged labels May 14, 2023
@philvarner
Author

geohex_grid also has a size parameter per the documentation, which defaults to 10000, so this is exactly like terms.

The geo aggregations should be consistent with terms, since they're just another type of bucket value. Not having these fields, especially sum_other_doc_count, limits the usefulness of geo aggregations: you have to use a heuristic (the bucket count is exactly 10000) to determine whether some documents are not represented in the bucket values, and you'd have to run a separate count to determine how many documents were not represented.
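
For illustration, the client-side workaround described above could look roughly like the Python sketch below. It rests on assumptions that are not part of this report: the search was sent with "track_total_hits": true so that hits.total is exact, every matching document has exactly one value in the geo_point field, and the helper name `summarize_grid_coverage` is made up for the example.

```python
def summarize_grid_coverage(response, agg_name, size=10000):
    """Estimate how many matching documents fall outside the returned buckets.

    Assumes each matching document lands in exactly one bucket
    (a single-valued geo_point field that is never missing).
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    covered = sum(bucket["doc_count"] for bucket in buckets)
    total = response["hits"]["total"]

    # Heuristic from the discussion: a full bucket list may have been cut off.
    possibly_truncated = len(buckets) >= size

    # Only an exact total ("eq") can be subtracted from; "gte" is a lower bound.
    if total.get("relation") == "eq":
        other = max(total["value"] - covered, 0)
    else:
        other = None

    return {
        "possibly_truncated": possibly_truncated,
        "covered_doc_count": covered,
        "estimated_other_doc_count": other,
    }


# Hypothetical response, using size=2 to keep the sample small.
sample = {
    "hits": {"total": {"value": 10, "relation": "eq"}},
    "aggregations": {
        "grid_geohex_frequency": {
            "buckets": [
                {"key": "a", "doc_count": 4},
                {"key": "b", "doc_count": 3},
            ]
        }
    },
}

print(summarize_grid_coverage(sample, "grid_geohex_frequency", size=2))
# {'possibly_truncated': True, 'covered_doc_count': 7, 'estimated_other_doc_count': 3}
```

This only approximates what sum_other_doc_count would report directly. It breaks down for multi-valued fields or documents without the field, which is part of why having the value in the response would be more useful.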

@VijayanB VijayanB added the bug Something isn't working label May 15, 2023
@VijayanB
Member

@philvarner You are absolutely right. Thanks for the detailed explanation and for reporting this bug. We will look into this issue. However, feel free to contribute if you know how to fix it.

@VijayanB
Member

@nknize This is not available with other geo aggregations like geo_hash and geo_tile. I believe this can be added to BaseGeoGrid, so that all geo aggregations inherit these values in the response. Do you see a conceptual issue with enabling this feature for geo aggregations? Please share your feedback. Thanks


2 participants