Feature request: Add ModelExplanation element with various model evaluation info to PMML for regressions/decision trees/other models #93

Open
sveta-levitan opened this issue Jan 9, 2019 · 4 comments

@sveta-levitan
commented Jan 9, 2019

We would like to use PMML for model visualization, and the ModelExplanation element would be very useful for that. We would like to see it added to the generated PMML, starting with regressions and decision trees.

@vruusmann

Member

commented Jan 9, 2019

The SkLearn2PMML/JPMML-SkLearn stack supports encoding basic metadata in the form of Extension elements. For example, attaching node impurity information:
https://github.com/jpmml/jpmml-sklearn/blob/1.5.9/src/test/resources/main.py#L563-L566

Of course, it would be desirable to "graduate" from custom Extension elements to standardized ModelExplanation elements.

@sveta-levitan I assume that your visualization tool is expecting IBM SPSS-style model explanations? I don't have access to IBM SPSS myself, so I would appreciate it if you could share some relevant IBM SPSS-generated PMML documents containing "well annotated" models.

@sveta-levitan

Author

commented Jan 10, 2019

Thank you, Villu. I will look for some IBM SPSS examples, but in general it would be great to follow the standard: http://dmg.org/pmml/v4-3/ModelExplanation.html
Actually, it was created mostly by IBM before it acquired SPSS.
Node impurity is not in that standard, I think, and may be harder to put into the ModelExplanation element. It is probably better to add it as an optional attribute of the Node element in TreeModel. Let me look into that for a future PMML release, when I have time.
Thank you.
Svetlana.

@vruusmann

Member

commented Jan 11, 2019

> I will look for some IBM SPSS examples, but in general it would be great to follow the standard: http://dmg.org/pmml/v4-3/ModelExplanation.html

There are two parts to the solution:

  1. Recording model-level statistics using the ModelExplanation element.
  2. Recording decision tree node-level statistics. The Node element allows a Partition child element, which then may reference further elements in the ModelExplanation element?
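To make part two more concrete, here is a minimal sketch of what a Node carrying a Partition child could look like when built with Python's standard library. The helper name `node_with_partition`, the `node-<id>` naming scheme and the sample values are hypothetical; the element and attribute names (Partition, PartitionFieldStats, NumericInfo) are meant to follow the PMML 4.3 schema, but this is an illustration, not output from any existing converter.

```python
import xml.etree.ElementTree as ET


def node_with_partition(node_id, score, record_count, field_name, mean):
    """Build a TreeModel Node element that carries a Partition child.

    Hypothetical helper: the Partition naming scheme is an assumption.
    """
    node = ET.Element(
        "Node",
        {"id": str(node_id), "score": str(score), "recordCount": str(record_count)},
    )
    # The predicate comes first inside a Node; True matches every record.
    ET.SubElement(node, "True")
    # Partition holds statistics for the records that reached this node.
    partition = ET.SubElement(
        node, "Partition", {"name": f"node-{node_id}", "size": str(record_count)}
    )
    stats = ET.SubElement(partition, "PartitionFieldStats", {"field": field_name})
    ET.SubElement(stats, "NumericInfo", {"mean": str(mean)})
    return node


# Made-up values, for illustration only.
example_node = node_with_partition(1, "setosa", 50, "petal_length", 1.46)
print(ET.tostring(example_node, encoding="unicode"))
```

Whether node-level impurity fits naturally into this structure is exactly the open question; the sketch only shows where per-node field statistics could live.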

I'm technically more intrigued by part two, as that would allow me to migrate away from the current Extension element based "hack".

Do you know of any public, successful implementations of the ModelExplanation element? I would like to see how people have done it so far.

> it was created mostly by IBM before it acquired SPSS.

Here's an example IBM SPSS-created decision tree model:
https://github.com/pmservice/wml-sample-models/blob/master/pmml/iris-species/model/iris_chaid.xml

Its Node elements have Extension/X-Node/X-NodeStats child elements. Is there a formal specification about these IBM SPSS extension elements?

> Node impurity is not in that standard, I think, and may be harder to put into ModelExplanation element.

IIRC, Scikit-Learn uses impurity to estimate the goodness of fit for classification-type decision tree models, and (R)MSE for regression-type ones. From the SkLearn2PMML/JPMML-SkLearn perspective there's not much difference between the two: the idea is that it's possible to pass a Python dict {node id : {extension_name : extension_value}} as the node_extensions conversion option.
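As a rough illustration of the dict shape described above, the following sketch computes a Gini impurity per node and arranges the results as {node id : {extension_name : extension_value}}. The per-node class counts and the extension name "impurity" are made up for the example, and the call that hands the dict to the converter is deliberately left out, since its exact form is not shown in this thread.

```python
def gini(class_counts):
    """Gini impurity of a node, given the number of training records per class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Hypothetical per-node class counts, keyed by node id (not from a real model).
node_class_counts = {
    0: [50, 50, 50],
    1: [50, 0, 0],
    2: [0, 50, 50],
}

# Shape described in the comment: {node id : {extension_name : extension_value}}.
# The extension name "impurity" is a locally devised name, which is the very
# vocabulary problem discussed here.
node_extensions = {
    node_id: {"impurity": gini(counts)}
    for node_id, counts in node_class_counts.items()
}

for node_id, extensions in node_extensions.items():
    print(node_id, extensions)
```

A regression tree would fill the same structure with a different statistic (for example a per-node mean squared error), which is why the two cases look alike from the converter's side.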

The trouble with the current implementation is that extension names are locally devised. It would be much better if there were a significant overlap in the "vocabulary" of extension names between PMML producer software.

@sveta-levitan

Author

commented Jan 11, 2019

Well, the purpose of Extensions was to overcome the lack of standard features. Once we agree on standard attribute/element names, we don't need Extensions anymore. Yes, we still have some extensions left in our PMML, but we worked hard to convert most extensions into new PMML features.
Here is an example of ModelExplanation element from a regression model that I have:

<ModelExplanation>
  <PredictiveModelQuality targetField="price" numOfRecords="160" numOfRecordsWeighted="160" numOfPredictors="25" adj-r-squared="0.763848608437667" meanAbsoluteError="974.56875"/>
</ModelExplanation>
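For a visualization tool, reading these values back could look roughly like the sketch below. It uses only Python's standard library and parses the fragment exactly as quoted above; the function name `read_model_quality` is hypothetical, and a full PMML document would declare an XML namespace that this simplified lookup does not handle.

```python
import xml.etree.ElementTree as ET

# The fragment as posted in this comment (no PMML namespace declared).
PMML_FRAGMENT = """
<ModelExplanation>
  <PredictiveModelQuality targetField="price" numOfRecords="160"
      numOfRecordsWeighted="160" numOfPredictors="25"
      adj-r-squared="0.763848608437667" meanAbsoluteError="974.56875"/>
</ModelExplanation>
"""


def read_model_quality(xml_text):
    """Return the PredictiveModelQuality attributes as a dict.

    targetField stays a string; every other attribute is parsed as a float.
    Hypothetical helper, assuming a namespace-free fragment.
    """
    root = ET.fromstring(xml_text.strip())
    quality = root.find("PredictiveModelQuality")
    if quality is None:
        return {}
    result = {"targetField": quality.get("targetField")}
    for name, value in quality.attrib.items():
        if name != "targetField":
            result[name] = float(value)
    return result


quality = read_model_quality(PMML_FRAGMENT)
for name, value in quality.items():
    print(f"{name}: {value}")
```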

In addition, the ModelStats element in that PMML includes the information normally found in the Parameter Estimates table:

<MultivariateStat name="P0000056" stdError="708.315876229608" tValue="-1.43580009163924" dF="126" pValueFinal="0.153537205555158" confidenceLowerBound="-2418.73629598133" confidenceUpperBound="384.736295981331"/>
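The element above does not show the coefficient itself, but under the usual assumptions (t value = estimate / standard error, and a confidence interval symmetric around the estimate) it can be recovered in two ways and cross-checked. The sketch below does that with the standard library only; the variable names are mine, and the interpretation of the attributes is an assumption rather than something stated in this thread.

```python
import xml.etree.ElementTree as ET

# The element as posted in this comment.
stat = ET.fromstring(
    '<MultivariateStat name="P0000056" stdError="708.315876229608" '
    'tValue="-1.43580009163924" dF="126" pValueFinal="0.153537205555158" '
    'confidenceLowerBound="-2418.73629598133" '
    'confidenceUpperBound="384.736295981331"/>'
)

std_error = float(stat.get("stdError"))
t_value = float(stat.get("tValue"))
lower = float(stat.get("confidenceLowerBound"))
upper = float(stat.get("confidenceUpperBound"))

# Assumption: tValue = estimate / stdError.
estimate_from_t = t_value * std_error
# Assumption: the interval is symmetric around the estimate.
estimate_from_ci = (lower + upper) / 2.0
# Half-width of the interval, expressed in standard errors.
critical_value = (upper - lower) / 2.0 / std_error

print("estimate from t value:  ", estimate_from_t)
print("estimate from interval: ", estimate_from_ci)
print("implied critical value: ", critical_value)
```

Both routes give an estimate of about -1017, and the implied critical value is close to 1.98, which is consistent with a two-sided 95% interval at 126 degrees of freedom.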

I will find an example for a classification model. Those usually include a confusion matrix and accuracy, at a minimum.
Thank you!
