This came out of a Slack chat with @peteharverson. It was also raised on the IRC channel during the recent ML webinar.
We would expect categorization to be applied to log messages, and we would expect people to be storing log messages in `text` fields, because that's what you have to do to make use of Elasticsearch's text search.
Additionally, the reverse search terms we generate as an output of categorization can only be used to efficiently search `text` fields.
However, at present we make it very hard for people to use a `text` field as their `categorization_field_name` when feeding a job with a datafeed. They have to set the obscure `"_source": true` setting in the JSON.
I propose the following:
- If a field of type `text` is selected as the `categorization_field_name`, we automatically set `"_source": true` in the datafeed config.
- If a field that is not of type `text` is selected as the `categorization_field_name`, we warn people that it's unlikely to work well with categorization.
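To make the two proposed rules concrete, here is a minimal sketch of the decision logic. It is illustrative only, not the actual UI or datafeed implementation: the function name `apply_categorization_field_rules`, the shape of the `field_types` dictionary, and the warning text are all hypothetical, and the datafeed config is treated as a plain dictionary.

```python
from typing import Any, Dict, List, Optional, Tuple


def apply_categorization_field_rules(
    field_types: Dict[str, str],
    categorization_field_name: str,
    datafeed_config: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper illustrating the two rules proposed in this issue.

    field_types maps field names to their mapped type (assumed shape),
    for example {"message": "text", "host": "keyword"}.
    Returns a copy of the datafeed config plus any warnings to show the user.
    """
    updated = dict(datafeed_config)  # shallow copy; the caller's dict is not mutated
    warnings: List[str] = []

    field_type: Optional[str] = field_types.get(categorization_field_name)

    if field_type == "text":
        # Rule 1: a text field was chosen, so switch on "_source" automatically.
        updated["_source"] = True
    else:
        # Rule 2: any other (or unknown) type gets a warning, not an error.
        warnings.append(
            f"Field '{categorization_field_name}' is of type "
            f"'{field_type}', not 'text'; it is unlikely to work well "
            "with categorization."
        )

    return updated, warnings


if __name__ == "__main__":
    types = {"message": "text", "host": "keyword"}
    print(apply_categorization_field_rules(types, "message", {}))
    print(apply_categorization_field_rules(types, "host", {}))
```

The sketch returns warnings instead of raising, since the proposal is to warn about non-`text` fields rather than forbid them.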
I have created a snapshot dataset called `it_ops_new_raw_snapshot` of an index called `it_ops_new_raw`, which will help with testing this behaviour. It contains a type called `logs` (which is just the old `it_ops_app_logs` dataset), whose mapping for the `message` field has both a type `text` and a type `keyword`.
The issue description above is the original comment by @droberts195.