filtering aggregations #5458

Closed
jxstanford opened this issue Mar 18, 2014 · 7 comments

@jxstanford

When running a nested aggregation against a specific doc type, with a filter on that same doc type, the response still contains unfiltered buckets. Here's an example:

POST _all/summary_phys/_search
{
  "aggs": {
    "summary_phys_events": {
      "filter": {
        "type": {"value": "summary_phys"}
      },
      "aggs": {
        "events_by_date": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "300s",
            "min_doc_count": 0
          },
          "aggs": {
            "events_by_host": {
              "terms": {
                "field": "host.raw",
                "min_doc_count": 0
              },
              "aggs": {
                "avg_used": {
                  "avg": {
                    "field": "used"
                  }
                },
                "max_used": {
                  "max": {
                    "field": "used"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I get buckets with entries matching hosts that do not show up in this doc type. For example, I have only 3 values for host in this doc type [compute-4, compute-2, compute-3], but I will get buckets back with hosts from other doc types like:

"events_by_host": {
  "buckets": [
    {
      "key": "compute-4",
      "doc_count": 11,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 3677.090909090909
      }
    },
    {
      "key": "compute-2",
      "doc_count": 8,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 2304
      }
    },
    {
      "key": "compute-3",
      "doc_count": 2,
      "max_used": {
        "value": 4608
      },
      "avg_used": {
        "value": 4608
      }
    },
    {
      "key": "10.10.11.22:49509",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    },
    {
      "key": "controller",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    },
    {
      "key": "object-1",
      "doc_count": 0,
      "max_used": {
        "value": null
      },
      "avg_used": {
        "value": null
      }
    }
  ]
}

I would expect the extra hosts to be excluded by the aggregation filter, if not by the doc type in the URL path.

@jpountz
Contributor

jpountz commented Jul 24, 2014

@jxstanford This is a known limitation of min_doc_count=0: it might return terms that don't match the query or filters. We just use the index in order to compute them.
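
To illustrate the limitation described above from the client side, here is a minimal Python sketch. It is not part of Elasticsearch, and the function name `drop_empty_buckets` is hypothetical. It assumes the caller does not need genuine zero-count terms, because the response alone cannot tell those apart from terms that belong to other doc types.

```python
def drop_empty_buckets(buckets):
    """Return only the term buckets that matched at least one document."""
    return [b for b in buckets if b["doc_count"] > 0]


# Bucket keys and counts copied from the report above (sub-aggregations omitted).
events_by_host = [
    {"key": "compute-4", "doc_count": 11},
    {"key": "compute-2", "doc_count": 8},
    {"key": "compute-3", "doc_count": 2},
    {"key": "10.10.11.22:49509", "doc_count": 0},
    {"key": "controller", "doc_count": 0},
    {"key": "object-1", "doc_count": 0},
]

kept_hosts = [b["key"] for b in drop_empty_buckets(events_by_host)]
print(kept_hosts)
```

With the sample data, only the three compute hosts remain. The trade-off is that a host from the filtered doc type with no events in a given interval is dropped as well.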

@jpountz jpountz closed this as completed Jul 24, 2014
@bradvido
Contributor

bradvido commented Feb 9, 2015

Is this considered a bug? It's certainly unexpected behavior that the terms aggregation will match terms that don't match the query filter when you set min_doc_count=0.

At the very least, this should be (better) documented here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_minimum_document_count

@FabriZZio

Is there an alternative way to retrieve buckets with a doc count of 0 while still restricting the terms by a query or filter?

For example: we have a use case with documents (products) in an ES index. Each product has a locale ("nl_be", "fr_fr", ...). When searching for all locale: "nl_be" documents, we would like to retrieve an aggregation on another field "category".

#...
"query": {
  "filtered": {
    "filter": {
      "term": {
        "locale": "nl_be"
      }
    }
  }
},
"aggs": {
  "nested_categories": {
    "nested": {
      "path": "categories"
    },
    "aggs": {
      "categories": {
        "terms": {
          "field": "categories.title",
          "min_doc_count": 0,
          "size": 0
        }
      }
    }
  }
}

This returns categories from all documents, including those with a locale other than 'nl_be'.

There is probably a very good reason why min_doc_count=0 behaves as it does, but how can the above scenario be handled (returning category titles with doc count = 0 that are linked to documents with locale: "nl_be")?

Thanks in advance!
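
One possible client-side workaround for this scenario, sketched in Python under stated assumptions: run one request with only the locale filter to collect the category titles that exist for "nl_be", run the narrower search with the default min_doc_count, and merge the two so that missing categories get a count of 0. The function name `zero_fill` and the sample data are hypothetical, and nothing here is confirmed by the maintainers.

```python
def zero_fill(universe_keys, counted_buckets):
    """Build one bucket per key in universe_keys, using 0 where no count exists."""
    counts = {b["key"]: b["doc_count"] for b in counted_buckets}
    return [{"key": k, "doc_count": counts.get(k, 0)} for k in sorted(universe_keys)]


# Hypothetical data: category titles seen on "nl_be" products (first request),
# and the buckets returned for a narrower search (second request).
nl_be_categories = {"books", "games", "music"}
narrow_search_buckets = [{"key": "books", "doc_count": 4}]

filled = zero_fill(nl_be_categories, narrow_search_buckets)
print(filled)
```

This costs a second request, but the zero-count entries are then guaranteed to come from documents that match the locale filter.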

@snkiran

snkiran commented Feb 25, 2016

This is my query below, but I need to display all the missing or null values in the result:

IP: 52.8.97.179:9200
GET article_info/article/_search
{
  "fields": [
    "media_type"
  ],
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "pdf_url:.pdf"
        }
      },
      "filter": {
        "range": {
          "mediadate": {
            "from": "02-21-2016",
            "to": "02-24-2016"
          }
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "days": {
      "date_histogram": {
        "field": "mediadate",
        "interval": "day",
        "format": "MM-dd-yyyy"
      },
      "aggs": {
        "art_count": {
          "terms": {
            "field": "media_type",
            "order": { "_term": "desc" }
          }
        }
      }
    }
  }
}

@snkiran

snkiran commented Feb 25, 2016

This is the resulting output, but I need it to display all the missing or null values for the dates 02-23-2016 and 02-24-2016. Please help me:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 13352,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "days": {
      "buckets": [
        {
          "key_as_string": "02-21-2016",
          "key": 1456012800000,
          "doc_count": 12496,
          "art_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "wire", "doc_count": 68 },
              { "key": "web", "doc_count": 7371 },
              { "key": "social", "doc_count": 36 },
              { "key": "service", "doc_count": 68 },
              { "key": "print", "doc_count": 101 },
              { "key": "media", "doc_count": 36 },
              { "key": "blog", "doc_count": 4920 }
            ]
          }
        },
        {
          "key_as_string": "02-22-2016",
          "key": 1456099200000,
          "doc_count": 854,
          "art_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "wire", "doc_count": 92 },
              { "key": "web", "doc_count": 223 },
              { "key": "service", "doc_count": 92 },
              { "key": "blog", "doc_count": 539 }
            ]
          }
        },
        {
          "key_as_string": "02-23-2016",
          "key": 1456185600000,
          "doc_count": 1,
          "art_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "web", "doc_count": 1 }
            ]
          }
        },
        {
          "key_as_string": "02-24-2016",
          "key": 1456272000000,
          "doc_count": 1,
          "art_count": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "web", "doc_count": 1 }
            ]
          }
        }
      ]
    }
  }
}
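
A minimal Python sketch of one way to get the missing values on the client side, assuming it is acceptable to post-process the response: collect every media type that appears on any day, then give each day a count of 0 for the types it lacks. The function name `fill_missing_terms` is hypothetical, and a media type that appears on no day in the response will still be absent.

```python
def fill_missing_terms(day_buckets, sub_agg="art_count"):
    """Give every day bucket an entry for every term seen on any day."""
    all_keys = sorted(
        {t["key"] for day in day_buckets for t in day[sub_agg]["buckets"]}
    )
    filled = {}
    for day in day_buckets:
        counts = {t["key"]: t["doc_count"] for t in day[sub_agg]["buckets"]}
        # Terms missing from this day's buckets default to a count of 0.
        filled[day["key_as_string"]] = {k: counts.get(k, 0) for k in all_keys}
    return filled


# Two of the day buckets from the response above, trimmed to the fields used here.
days = [
    {"key_as_string": "02-22-2016", "art_count": {"buckets": [
        {"key": "wire", "doc_count": 92},
        {"key": "web", "doc_count": 223},
        {"key": "service", "doc_count": 92},
        {"key": "blog", "doc_count": 539},
    ]}},
    {"key_as_string": "02-23-2016", "art_count": {"buckets": [
        {"key": "web", "doc_count": 1},
    ]}},
]

per_day = fill_missing_terms(days)
print(per_day["02-23-2016"])
```

Because the zero entries are derived only from terms that the filtered query itself returned, this avoids the unrelated terms that min_doc_count=0 can introduce.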

@dadoonet
Member

@snkiran You will have a better chance to get an answer on discuss.elastic.co.

@snkiran

snkiran commented Feb 25, 2016

Could someone please help me as soon as possible?
