Question: multipleModelMethod="max" when multiple classes have max #53

Closed
infiton opened this issue Mar 19, 2017 · 4 comments

infiton commented Mar 19, 2017

Hi @vruusmann,

Looking through the implementation of multipleModelMethod="max" for classification, particularly: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ProbabilityAggregator.java#L207

Suppose we have a case with three segments that are predicting three classes and we have the following probabilities:

{a: 0.8, b: 0.1, c: 0.1},
{a: 0.5, b: 0.1, c: 0.4},
{a: 0.1, b: 0.1, c: 0.8}

Then, using max, I would expect the average of the first and third models:

{a: 0.45, b: 0.1, c: 0.45}
  1. Is that your interpretation of the spec? "max: consider the model(s) that have contributed the chosen probability for the winning category. Return their average probabilities."
  2. Will the implementation linked to above return that?

vruusmann commented Mar 19, 2017

Is that your interpretation of the spec? max: consider the model(s) that have contributed the chosen probability for the winning category. Return their average probabilities;

The max multiple model method applies to the MiningModel element when the "member" model elements return probability distributions (i.e., instances of HasProbabilityDistribution):
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/mining/MiningModelEvaluator.java#L702

And I think the following is a correct implementation of the spec:
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ProbabilityAggregator.java#L96

{a: 0.45, b: 0.1, c: 0.45}. Will the implementation linked to above return that?

Here are my unit tests:
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/test/java/org/jpmml/evaluator/ProbabilityAggregatorTest.java#L33

The important detail is that the tie resolution is a two-step process. First, if there is more than one winner, you compute the average of their probability distributions. Second, if there is still more than one winner (e.g. categories a and c both still have a probability value of 0.45), then the result is the first category (as determined by the ordering of /PMML/DataDictionary/DataField/Value elements).
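
The two steps can be sketched in plain Java using the three segments from the question. This is only an illustration of the reading discussed in this thread and has not been checked against ProbabilityAggregator. The class `MaxAggregationSketch` and its methods `dist`, `aggregateMax` and `pickWinner` are hypothetical names, not JPMML-Evaluator API, and the category order a, b, c is assumed to match the DataField Value order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of JPMML-Evaluator.
class MaxAggregationSketch {

    // Builds a distribution over categories a, b, c in a fixed (assumed) order.
    static Map<String, Double> dist(double a, double b, double c) {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("a", a);
        result.put("b", b);
        result.put("c", c);
        return result;
    }

    // Step 1: find the largest probability reported by any segment,
    // keep every segment that reported it, and average those segments.
    static Map<String, Double> aggregateMax(List<Map<String, Double>> segments) {
        double max = Double.NEGATIVE_INFINITY;
        for (Map<String, Double> segment : segments) {
            for (double p : segment.values()) {
                max = Math.max(max, p);
            }
        }
        List<Map<String, Double>> winners = new ArrayList<>();
        for (Map<String, Double> segment : segments) {
            if (segment.containsValue(max)) {
                winners.add(segment);
            }
        }
        Map<String, Double> average = new LinkedHashMap<>();
        for (String category : segments.get(0).keySet()) {
            double sum = 0d;
            for (Map<String, Double> winner : winners) {
                sum += winner.get(category);
            }
            average.put(category, sum / winners.size());
        }
        return average;
    }

    // Step 2: the strict '>' keeps the first category when values are tied.
    static String pickWinner(Map<String, Double> distribution) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : distribution.entrySet()) {
            if (entry.getValue() > bestValue) {
                best = entry.getKey();
                bestValue = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> segments = Arrays.asList(
            dist(0.8, 0.1, 0.1), dist(0.5, 0.1, 0.4), dist(0.1, 0.1, 0.8));
        Map<String, Double> average = aggregateMax(segments);
        System.out.println(average);             // {a=0.45, b=0.1, c=0.45}
        System.out.println(pickWinner(average)); // a
    }
}
```

In this sketch the first and third segments both report 0.8, so they are averaged, and the remaining tie between a and c goes to a because it comes first in the map.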

Feel free to implement your question as another unit test, and run it.


infiton commented Mar 19, 2017

Thanks for the quick response! Is this tie resolution specific to your implementation, or do you think the spec outlines it?

vruusmann commented Mar 19, 2017

Is this tie resolution specific to your implementation, or do you think the spec outlines this?

This comes from the PMML spec. I can't pinpoint the exact quote at the moment, but you can find this "theme" repeated over and over when the spec is detailing classification-type models.

The JPMML-Evaluator library is very careful about maintaining the insertion order of values when constructing result objects. For example, that's the reason why the field org.jpmml.evaluator.Classification#map is a java.util.LinkedHashMap (and not a java.util.HashMap):
https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/Classification.java#L60
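
To show why the map type matters for a "first category wins" rule, here is a small standalone sketch using only java.util. The class `InsertionOrderSketch` and its method `keysInOrder` are hypothetical and unrelated to the library's own code. A LinkedHashMap iterates in insertion order, whereas HashMap makes no ordering guarantee, so a tie-break that depends on iteration order would not be reliable with it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of JPMML-Evaluator.
class InsertionOrderSketch {

    // Inserts the categories in the given order and returns the order
    // in which the map iterates over them.
    static List<String> keysInOrder(String... categories) {
        Map<String, Double> map = new LinkedHashMap<>();
        for (String category : categories) {
            map.put(category, 0.45); // equal values, i.e. a tie
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so "c" stays first here.
        System.out.println(keysInOrder("c", "a", "b")); // [c, a, b]
    }
}
```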


infiton commented Mar 19, 2017

ok thanks for the info, I will dig to find that! Really appreciate your quick responses 😄
