[ML] properly nesting objects in document source (#41901) #42077

benwtrent

While working through use cases, I found it impossible to push data to a field mapped as a range type. The cause was that we were not properly nesting objects in the JSON we push to the index.

This change creates objects in the document source itself. This has two major benefits:

  • Allows _preview to show the mapped objects as they will actually be stored
  • Allows for more interesting and complex use cases with user-provided mappings
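To make the idea of "creating objects in the document source" concrete, here is a minimal Python sketch of expanding dotted field names into nested objects. It is not the code from this PR (which lives in the Java code base); the function name `nest_dotted_fields` is hypothetical, and the numeric values are placeholders rather than real aggregation results. The field names are the ones used in the example in this description.

```python
def nest_dotted_fields(flat):
    """Expand dotted keys into nested dicts, e.g. {"a.b": 1} -> {"a": {"b": 1}}.

    Hypothetical helper for illustration only; assumes no two keys conflict.
    """
    nested = {}
    for dotted_name, value in flat.items():
        *parents, leaf = dotted_name.split(".")
        target = nested
        for part in parents:
            # Create the intermediate object on first use, then descend into it.
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


# Placeholder values standing in for the min/max timestamp results.
flat_source = {"time_frame.gte": 1, "time_frame.lte": 2}
nested_source = nest_dotted_fields(flat_source)
print(nested_source)
```

With the flat form, a `date_range` mapping on `time_frame` has no object to bind to; with the nested form, `time_frame` is a single object carrying both bounds.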

Example use case that is enabled with this change:

PUT data-logs-by-client
{
  "mappings": {
    "properties": {
      "time_frame": {
        "type": "date_range"
      }
    }
  }
}

PUT _data_frame/transforms/data_log
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "data-logs-by-client"
  },
  "pivot": {
    "group_by": {
      "machine.os": {"terms": {"field": "machine.os.keyword"}},
      "machine.ip": {"terms": {"field": "clientip"}}
    },
    "aggregations": {
      "time_frame.lte": {
        "max": {
          "field": "timestamp"
        }
      },
      "time_frame.gte": {
        "min": {
          "field": "timestamp"
        }
      }
    }
  }
}

The resulting index supports range queries on time_frame, for example to determine which clients accessed the website within a given time range.
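As a rough illustration of the kind of query this enables, the sketch below builds a range query body against the `time_frame` field as a Python dict. The helper name `build_range_query` and the dates are assumptions made up for this example; the `relation` parameter is the one Elasticsearch's range query accepts for range fields (`intersects`, `contains`, `within`).

```python
import json


def build_range_query(field, start, end, relation="intersects"):
    """Build a range query body for a range-typed field.

    Hypothetical helper; the body would be sent to the _search endpoint
    of the destination index (data-logs-by-client in the example).
    """
    return {
        "query": {
            "range": {
                field: {
                    "gte": start,
                    "lte": end,
                    "relation": relation,
                }
            }
        }
    }


# Illustrative dates only; they do not come from the sample data set.
body = build_range_query("time_frame", "2019-05-01", "2019-05-10")
print(json.dumps(body, indent=2))
```

With `relation` set to `intersects`, a client matches if its recorded access window overlaps the queried window at all; `within` would require the whole window to fall inside it.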

Implementation details:

  • I chose not to create objects for the group_by fields, as I could not think of a use case for it. The mapping created still treats the fields as an object (machine in the above use case); the document source just does not show it as plainly. I could be convinced otherwise :)
  • I am not throwing an error when parsing discovers duplicate fields or conflicting objects. These validations should occur earlier in the process (see: [ML] verify that there are no duplicate leaf fields in aggs #41895), and any errors here should be logged and then allowed. This is consistent with how we treat unsupported aggregations: validate ahead of time and log if something weird occurred.
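The "log, then allow" handling of duplicates and conflicts described above can be sketched as follows. This is an assumption-laden illustration in Python, not the PR's implementation: the function name `nest_logging_conflicts` is hypothetical, and keeping the first value on a conflict is a choice made for this sketch, since the description does not say which value wins.

```python
import logging

logger = logging.getLogger("nesting_sketch")


def nest_logging_conflicts(pairs):
    """Nest (dotted_name, value) pairs, logging and skipping conflicts.

    Hypothetical helper; on a duplicate or conflicting field the first
    value is kept (an assumption for this sketch) and a warning is logged.
    """
    nested = {}
    for dotted_name, value in pairs:
        *parents, leaf = dotted_name.split(".")
        target = nested
        conflict = False
        for part in parents:
            child = target.setdefault(part, {})
            if not isinstance(child, dict):
                # A parent name is already taken by a leaf value.
                conflict = True
                break
            target = child
        if conflict or leaf in target:
            # Duplicate leaf, or the leaf name is already an object: log and move on.
            logger.warning("skipping conflicting field %s", dotted_name)
            continue
        target[leaf] = value
    return nested


result = nest_logging_conflicts([
    ("time_frame.gte", 1),
    ("time_frame.lte", 2),
    ("time_frame.lte", 3),  # duplicate leaf: logged and skipped
    ("time_frame", 4),      # conflicts with the existing object: logged and skipped
])
print(result)
```

The point of the design is that a conflict at this stage signals a gap in the earlier validation, so the sketch records it instead of failing the whole document.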

Backport of #41901

* [ML] properly nesting objects in document source

* Throw exception on agg extraction failure, cause it to fail df

* throwing error to stop df if unsupported agg is found
@elasticmachine

Pinging @elastic/ml-core

@benwtrent benwtrent merged commit 0931815 into elastic:7.x May 10, 2019
@benwtrent benwtrent deleted the feature/ml-df-better-handling-of-object-fields-7.x branch May 10, 2019 18:22