[ML] Data frame analytics - exclude fields from dest index #49531

Closed
LucaWintergerst opened this issue Nov 25, 2019 · 1 comment · Fixed by #49690
Labels
>enhancement :ml Machine learning

Comments

@LucaWintergerst
Contributor

LucaWintergerst commented Nov 25, 2019

Describe the feature:
Data frame analytics always reindexes all fields of the source index. While it is possible to exclude individual documents via the source `query`, it is not possible to exclude a set of fields from being written to the destination index.

For outlier detection, only numeric fields are needed for the analysis. However, all text fields are also copied to the destination index.

I suggest adding `_source` filtering to the `source` definition, like this:

```
PUT _ml/data_frame/analytics/test1
{
  "id": "test1",
  "source": {
    "index": [
      "demo-airbnb-listings-munich"
    ],
    "query": {
      "match_all": {}
    },
    "_source": {
      "excludes": [ "foobar" ]
    }
  },
  ...
}
```

This would keep the behaviour consistent with source filtering in search requests and in `_reindex`.

This should also be easy to implement, as we can reuse source filtering in the `_reindex` request that data frame analytics already makes.
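For illustration, a standalone `_reindex` request that drops the same field might look like the following. This is a sketch, not necessarily the exact request data frame analytics issues internally; it assumes `_reindex` accepts the same `_source` includes/excludes object inside its `source` section as a search request does, and the destination index name `demo-airbnb-listings-munich-dest` is hypothetical.

```
POST _reindex
{
  "source": {
    "index": "demo-airbnb-listings-munich",
    "query": {
      "match_all": {}
    },
    "_source": {
      "excludes": [ "foobar" ]
    }
  },
  "dest": {
    "index": "demo-airbnb-listings-munich-dest"
  }
}
```

Documents matching the query are copied, but `foobar` is left out of each copied `_source`.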

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou dimitris-athanasiou changed the title data frame analytics - exclude fields from dest index [ML] Data frame analytics - exclude fields from dest index Nov 25, 2019
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Nov 28, 2019
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes elastic#49531
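As a sketch of how the new setting could sit next to `analyzed_fields` (both use the includes/excludes structure): `_source` selects which fields are copied into the destination index, while `analyzed_fields` selects which fields the analysis uses. All field names and the destination index name below are hypothetical, and how the two settings are validated against each other is not covered here.

```
PUT _ml/data_frame/analytics/test1
{
  "source": {
    "index": [ "demo-airbnb-listings-munich" ],
    "_source": {
      "excludes": [ "description", "listing_url" ]
    }
  },
  "dest": {
    "index": "demo-airbnb-listings-munich-outliers"
  },
  "analysis": {
    "outlier_detection": {}
  },
  "analyzed_fields": {
    "includes": [ "price", "number_of_reviews" ]
  }
}
```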
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Nov 29, 2019
dimitris-athanasiou added a commit that referenced this issue Nov 29, 2019
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Nov 29, 2019
…lastic#49690)

Backport of elastic#49690
dimitris-athanasiou added a commit that referenced this issue Nov 29, 2019
…49690) (#49718)

Backport of #49690
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020
…tic#49690)
