[ML] User-friendly experience for categorization of text fields #17997

Open
elasticmachine opened this issue May 25, 2017 · 2 comments

@elasticmachine
Contributor

Original comment by @droberts195:

This came out of a Slack chat with @peteharverson. It was also something that was brought up on the IRC channel during the recent ML webinar.

We expect categorization to be applied mostly to log messages, and we expect people to store log messages in text fields, because that is what Elasticsearch's full-text search requires.

Additionally, the reverse search terms we generate as an output of categorization can only be used to efficiently search text fields.

However, at present we make it very hard for people to use a text field as their categorization_field_name when feeding a job with a datafeed. Text fields have no doc values, which is what the datafeed reads by default, so users have to set the obscure "_source": true option in the datafeed JSON.

I propose the following:

  1. If a field of type text is selected as the categorization_field_name we automatically set "_source": true in the datafeed config
  2. If a field that is not of type text is selected as the categorization_field_name we warn people that it's unlikely to work well with categorization
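A minimal sketch of the two proposed rules, in Python. The function name `apply_categorization_field`, its parameters, the flat field-name-to-type dictionary, and the warning text are hypothetical illustrations, not part of the Kibana or X-Pack code. The only real datafeed setting it touches is the `"_source": true` option discussed above, and the `indexes` value is just an example.

```python
import copy


def apply_categorization_field(datafeed_config, field_types, categorization_field_name):
    """Hypothetical helper: adjust a datafeed config for the chosen categorization field.

    field_types maps field names to Elasticsearch mapping types,
    e.g. {"message": "text", "message.keyword": "keyword"}.
    Returns (new_config, warnings); the input config is not modified.
    """
    config = copy.deepcopy(datafeed_config)
    warnings = []
    field_type = field_types.get(categorization_field_name)

    if field_type == "text":
        # Rule 1: a text field was chosen, so read it from _source automatically.
        config["_source"] = True
    else:
        # Rule 2: a non-text field was chosen, so warn the user instead of failing.
        warnings.append(
            f"Field '{categorization_field_name}' is of type '{field_type}', not 'text'; "
            "it is unlikely to work well with categorization."
        )
    return config, warnings


if __name__ == "__main__":
    # Illustrative mapping with a text field and a keyword sub-field.
    types = {"message": "text", "message.keyword": "keyword"}
    base = {"indexes": ["it_ops_new_raw"]}
    print(apply_categorization_field(base, types, "message"))
    print(apply_categorization_field(base, types, "message.keyword"))
```

Returning the warnings rather than raising keeps rule 2 advisory, which matches the proposal: a non-text field is discouraged, not forbidden.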
@elasticmachine
Contributor Author

Original comment by @skearns64:

++, this will help the vast majority of users.

@elasticmachine
Contributor Author

Original comment by @Harvey-Maddocks:

I have created a snapshot dataset called it_ops_new_raw_snapshot of an index called it_ops_new_raw, which will help with testing this behaviour. It contains a type called logs (which is just the old it_ops_app_logs dataset), whose mapping for the message field has both a text type and a keyword type.
