Classification analysis fails when dependent variable is of type text. #53876

Closed
przemekwitek opened this issue Mar 20, 2020 · 6 comments
Labels
>bug :ml Machine learning

Comments

@przemekwitek
Contributor

The error message is:

{
  "type" : "illegal_argument_exception",
  "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [glass_type] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
@przemekwitek przemekwitek self-assigned this Mar 20, 2020
@przemekwitek
Contributor Author

This reproduces easily in an integration test.

@przemekwitek
Contributor Author

przemekwitek commented Mar 20, 2020

OK, I tracked down the root cause of this issue. When starting the analysis, we first run a cardinality aggregation on the dependent_variable field so that the result can be used later for limit verification (see line 125 in ExtractedFieldsDetectorFactory.java), and only then do we verify the supported dependent_variable types in ExtractedFieldsDetector.

This order is problematic because a cardinality aggregation cannot run on a text field (it yields the error from my first post).
So we need to reverse the order: first check for supported types, then check cardinality.
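
To illustrate the reordering, here is a minimal standalone sketch. It is not the actual Elasticsearch code: the class `DependentVariableCheck`, the `validate` method, the `SUPPORTED_TYPES` set, and the `maxClasses` limit are all hypothetical names made up for this example, and the cardinality aggregation is stood in for by a plain function so the snippet runs without a cluster.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical sketch; none of these names come from the Elasticsearch code base.
final class DependentVariableCheck {

    // Assumed set of mapping types that can be aggregated on; "text" is deliberately absent.
    static final Set<String> SUPPORTED_TYPES =
        Set.of("keyword", "boolean", "byte", "short", "integer", "long", "ip");

    /**
     * Checks the field type first and only then runs the cardinality lookup.
     * cardinalityAgg stands in for the real cardinality aggregation request.
     */
    static long validate(String field,
                         Map<String, String> fieldTypes,
                         ToLongFunction<String> cardinalityAgg,
                         long maxClasses) {
        // Step 1: reject unsupported types before any aggregation is attempted.
        String type = fieldTypes.get(field);
        if (type == null || !SUPPORTED_TYPES.contains(type)) {
            throw new IllegalArgumentException(
                "dependent_variable [" + field + "] has unsupported type [" + type + "]");
        }
        // Step 2: the type is known to be aggregatable, so the cardinality lookup is safe.
        long cardinality = cardinalityAgg.applyAsLong(field);
        if (cardinality > maxClasses) {
            throw new IllegalArgumentException(
                "dependent_variable [" + field + "] has too many distinct values [" + cardinality + "]");
        }
        return cardinality;
    }
}
```

With this ordering, a text-typed `glass_type` fails in step 1 with a message about the field type, and the aggregation in step 2 is never sent.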

@javanna
Member

javanna commented Mar 20, 2020

heya @przemekwitek may I ask you to add labels to this issue please? Thanks!

@przemekwitek przemekwitek added :ml Machine learning >bug labels Mar 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@przemekwitek
Contributor Author

> heya @przemekwitek may I ask you to add labels to this issue please? Thanks!

Done.

@przemekwitek
Contributor Author

PR #53874 makes the error message more understandable for the user.
