Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML][Inference] Add support for multi-value leaves to the tree model #52531

Merged

Conversation

benwtrent
Copy link
Member

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.

One hitch in BWC is when there is a decision tree model with multi-value support, but it is serialized to a node that does not support multi-values in the leaves. I opted to throw in that situation instead of potentially losing information over the wire.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

double probOfClassOne = sigmoid(summation);
assert 0.0 <= probOfClassOne && probOfClassOne <= 1.0;
return Arrays.asList(1.0 - probOfClassOne, probOfClassOne);
return new double[] {1.0 - probOfClassOne, probOfClassOne};
Copy link
Contributor

@tveasey tveasey Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right: we turn the set of values into a collection of predicted probabilities via the softmax function, i.e. the i'th predicted probability is exp(values[i]) / sum_j{ exp(values[j]) }. I think it is also worthwhile dividing through by k = exp(max_j{ values[j] }) to handle the case the exp overflows: whence exp(values[i] - k) / sum_j{ (exp(values[j] - k)) }.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tveasey I will adjust to allow logistic_regression to be binomial and multinomial.

Your concerns about overflow are handled in the already existing Statistics#softMax function.

Thanks for the feedback!

@benwtrent benwtrent marked this pull request as ready for review February 20, 2020 15:45
out.writeDoubleArray(leafValue);
} else {
if (leafValue.length > 1) {
throw new IOException("[leaf_value] only supports multiple values after version 7.7.0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. To what extent does it cause a problem for the receiving node? Could it cause problems for the reading of a subsequent wire transfer? I don't know the serialisation code well enough to say whether this is OK or not.

Also, would it make sense to an end user who saw it? I guess they'd see a remote transport exception from a 7.6 coordinating node wrapping this exception. But which endpoint would they have called? Would leaf_value make sense to somebody who had called that endpoint?

Finally, maybe we can defend against this at an earlier stage by banning creation of multi-class classification jobs until all nodes in the cluster are on version 7.7.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent does it cause a problem for the receiving node?

Receiver node gets an exception. The serialization fails.

This type of thing is done in aggregations as well https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/bucket/composite/CompositeValuesSourceParserHelper.java#L72

I think throwing here is OK.

As for the message, I will fix it. I think something like Multi-class classification model attempted to serialize to node before version X. Multi-class classification is only supported if the node is greater than version X

maybe we can defend against this at an earlier stage by banning creation of multi-class classification jobs until all nodes in the cluster are on version 7.7.

I agree, right now it is not possible to create a multi-class model via the analytics process. Once it is, we should defend against it there.

Additionally, on PUT I can verify the minimum node version

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit e39eade into elastic:master Feb 27, 2020
@benwtrent benwtrent deleted the feature/ml-inference-multi-value-leaves branch February 27, 2020 17:17
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Feb 27, 2020
…lastic#52531)

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
benwtrent added a commit that referenced this pull request Feb 27, 2020
…model (#52531) (#52901)

* [ML][Inference] Add support for multi-value leaves to the tree model (#52531)

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants