[Transform] add support for filter aggregation #52151

Closed
hendrikmuhs opened this issue Feb 10, 2020 · 2 comments · Fixed by #52483
Labels
:ml/Transform Transform

Comments

@hendrikmuhs
Contributor

hendrikmuhs commented Feb 10, 2020

Filter aggregations cover some nice use cases.

For example, to gather stats for response codes:

   "aggregations": {
      "404": {
        "filter": {
          "term": {
            "response": "404"
          }
        }
      },
      "200": {
        "filter": {
          "term": {
            "response": "200"
          }
        }
      },
      "503": {
        "filter": {
          "term": {
            "response": "503"
          }
        }
      }
   }

Note that filter supports sub-aggregations, which makes it hard to decide on the right structure in the transform destination index. For the simple example above, the output structure could be:

   {
      "geo" : {
        "src" : "CM"
      },
      "200" : 41,
      "404" : 2,
      "503" : 0
    },
    {
      "geo" : {
        "src" : "CN"
      },
      "200" : 2415,
      "404" : 138,
      "503" : 89
    },
    {
      "geo" : {
        "src" : "CO"
      },
      "200" : 76,
      "404" : 8,
      "503" : 3
    },

Here the doc_count of the filter output is used as the flat result.

If you specify a sub-aggregation, we cannot provide a flat result, because we need a nested object. For this case we could fall back to:

"my_agg_field": {
    "doc_count": 42,
    "sub_agg_field": {
        # sub agg result  
    }  
}

If a sub-aggregation is used, the user probably does not care about the doc_count field, but we do not know. To get rid of it, you can use a pipeline.
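The two output shapes proposed above can be sketched in a few lines of Python. This is only an illustration of the proposal, not the Transform implementation: the helper name `flatten_filter_result` and the `avg_bytes` sub-aggregation are hypothetical, and the input is assumed to be the per-aggregation part of a search response.

```python
def flatten_filter_result(bucket):
    """Hypothetical helper: map one filter aggregation result to an output value.

    Without sub-aggregations the doc_count becomes the flat value.
    With sub-aggregations a nested object is kept, including doc_count.
    """
    sub_aggs = {k: v for k, v in bucket.items() if k != "doc_count"}
    if not sub_aggs:
        return bucket["doc_count"]
    return {"doc_count": bucket["doc_count"], **sub_aggs}


# Filter results for one group (values taken from the "CM" example above).
response = {
    "200": {"doc_count": 41},
    "404": {"doc_count": 2},
    "503": {"doc_count": 0},
}
flat = {name: flatten_filter_result(b) for name, b in response.items()}

# A filter with a (hypothetical) sub-aggregation keeps the nested form.
nested = flatten_filter_result({"doc_count": 42, "avg_bytes": {"value": 1.5}})
```

With no sub-aggregation, each filter collapses to a single number; as soon as one is present, the caller gets an object and has to decide what to do with `doc_count`.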

Discuss

  • Should we have flattened results if filter specifies no sub aggregation?
  • If a sub-agg is given, what should the result look like?

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs
Contributor Author

hendrikmuhs commented Feb 12, 2020

We discussed this issue and agreed to flatten the result:

      "200" : 41,
      "404" : 2,
      "503" : 0

In case of sub-aggregations the result will be nested for the leaf aggregation. The field doc_count will be omitted for all inner aggregation results. It seems natural that you only care about the leaf results, at least for the majority of cases.

In case you want doc_count for an inner aggregation, you can add a value_count sub-aggregation, which also has the advantage of naming the output field.

Example:

"aggregations": {
      "os": {
        "terms": {
          "field": "machine.os.keyword"
        },
        "aggs": {
          "agent": {
            "terms": {
              "field": "agent.keyword"
            }
          },
          "count": {
            "value_count": {
              "field": "machine.os.keyword"
            }
          }
        }
      }
    }

results in:

"os" : {
        "osx" : {
          "agent" : {
            "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1" : 2,
            "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" : 2,
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" : 1
          },
          "count" : 5.0
        },
        "win xp" : {
          "agent" : {
            "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24" : 2
          },
          "count" : 2.0
        },
...
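The agreed rule (leaf results only, no doc_count for inner aggregations) can be sketched as a small recursive function. This is a minimal sketch under assumptions, not the code from the fix: the function names are hypothetical, the input is assumed to follow the usual search response layout (`buckets` for terms, `value` for single-value metrics), and the shortened agent names are made-up sample data.

```python
def flatten(agg):
    """Flatten one aggregation result (illustrative sketch only)."""
    if "buckets" in agg:  # multi-bucket aggregation such as terms
        return {b["key"]: flatten_bucket(b) for b in agg["buckets"]}
    if "value" in agg:  # single-value metric such as value_count
        return agg["value"]
    return flatten_bucket(agg)  # single-bucket aggregation such as filter


def flatten_bucket(bucket):
    """Return doc_count for a leaf bucket, otherwise only the sub-aggregation results."""
    subs = {
        k: v
        for k, v in bucket.items()
        if k not in ("key", "doc_count") and isinstance(v, dict)
    }
    if not subs:
        return bucket["doc_count"]
    # Inner bucket: doc_count is dropped, only sub-aggregations survive.
    return {name: flatten(sub) for name, sub in subs.items()}


# Assumed response shape for the "os" / "agent" / "count" example above.
response = {
    "os": {
        "buckets": [
            {
                "key": "osx",
                "doc_count": 5,
                "agent": {
                    "buckets": [
                        {"key": "Firefox", "doc_count": 3},
                        {"key": "Chrome", "doc_count": 2},
                    ]
                },
                "count": {"value": 5.0},
            },
            {
                "key": "win xp",
                "doc_count": 2,
                "agent": {"buckets": [{"key": "Chrome", "doc_count": 2}]},
                "count": {"value": 2.0},
            },
        ]
    }
}
result = {name: flatten(agg) for name, agg in response.items()}
```

The `count` field in the output comes from the value_count sub-aggregation, which is why the inner doc_count can be dropped without losing information.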

hendrikmuhs pushed a commit that referenced this issue Feb 21, 2020
add support for filter aggregations, refactor code for sub-aggregation support in mapping deduction

fixes #52151