Feature request: Add ModelExplanation element with various model evaluation info to PMML for regressions/decision trees/other models #93
The SkLearn2PMML/JPMML-SkLearn stack supports encoding basic metadata in the form of
Of course, it would be desirable to "graduate" from custom
@sveta-levitan I assume that your visualization tool is expecting IBM SPSS-style model explanations? I don't have access to IBM SPSS myself, so I would appreciate it if you could share some relevant IBM SPSS-generated PMML documents for "well-annotated" models.
Thank you, Villu. I will look for some IBM SPSS examples, but in general it would be great to follow the standard: http://dmg.org/pmml/v4-3/ModelExplanation.html
There are two parts to the solution:
I'm technically more intrigued by part two, as that would allow me to migrate away from the current
Do you know any (public and-) successful implementations of the
Here's an example IBM SPSS-created decision tree model:
IIRC, Scikit-Learn uses node impurity to estimate the goodness of fit with classification-type decision tree models, and (R)MSE with regression-type ones. From the SkLearn2PMML/JPMML-SkLearn perspective there's not much difference between the two - the idea is that it's possible to pass a Python dict
The trouble with the current implementation is that extension names are locally devised. It would be much better if there was a significant overlap in the "vocabulary" of extension names between PMML producer software.
Well, the purpose of Extensions was to overcome the lack of standard features. Once we agree on standard attribute/element names, we don't need Extensions anymore. Yes, we still have some extensions left in our PMML, but we worked hard to convert most extensions into new PMML features.
<ModelExplanation>
  <PredictiveModelQuality targetField="price" numOfRecords="160" numOfRecordsWeighted="160" numOfPredictors="25" adj-r-squared="0.763848608437667" meanAbsoluteError="974.56875"/>
</ModelExplanation>
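For illustration, here is a minimal sketch of how a producer could emit such an element from a plain Python dict of metrics. It uses only the standard library; the helper name `build_model_explanation` is hypothetical and is not part of SkLearn2PMML or JPMML-SkLearn. The attribute names and values are copied from the IBM SPSS fragment above.

```python
import xml.etree.ElementTree as ET


def build_model_explanation(target_field, metrics):
    """Hypothetical helper: wrap a dict of metrics in ModelExplanation/PredictiveModelQuality."""
    explanation = ET.Element("ModelExplanation")
    quality = ET.SubElement(explanation, "PredictiveModelQuality", targetField=target_field)
    # Each dict entry becomes one XML attribute; values are stringified as-is.
    for name, value in metrics.items():
        quality.set(name, str(value))
    return explanation


# Metric names and values taken from the fragment above.
metrics = {
    "numOfRecords": 160,
    "numOfRecordsWeighted": 160,
    "numOfPredictors": 25,
    "adj-r-squared": 0.763848608437667,
    "meanAbsoluteError": 974.56875,
}

element = build_model_explanation("price", metrics)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The sketch ignores the PMML namespace and does no schema validation, so it only shows the dict-to-attributes mapping, not a complete encoder.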
In addition, the ModelStats element of that PMML included the information normally found in the Parameter Estimates table:
<MultivariateStat name="P0000056" stdError="708.315876229608" tValue="-1.43580009163924" dF="126" pValueFinal="0.153537205555158" confidenceLowerBound="-2418.73629598133" confidenceUpperBound="384.736295981331"/>
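As a sanity check on how these attributes relate to each other, the following sketch recovers the coefficient estimate from the confidence bounds and compares it against the reported t statistic. It assumes the interval is a symmetric t-based interval around the estimate, which the fragment itself does not state.

```python
# Values copied from the MultivariateStat element above.
std_error = 708.315876229608
t_value = -1.43580009163924
lower = -2418.73629598133
upper = 384.736295981331

# Assumption: the interval is symmetric, so its midpoint is the coefficient estimate.
coefficient = (lower + upper) / 2

# The t statistic is the estimate divided by its standard error.
t_from_bounds = coefficient / std_error

# The half-width divided by the standard error is the critical t value that was used.
t_critical = (upper - lower) / 2 / std_error

print(f"coefficient estimate: {coefficient:.3f}")
print(f"t from bounds: {t_from_bounds:.6f} (reported: {t_value:.6f})")
print(f"implied critical t: {t_critical:.4f}")
```

Under that assumption the estimate comes out at about -1017, the recomputed t statistic agrees with `tValue`, and the implied critical value of roughly 1.979 is what one would expect for a two-sided 95% interval at 126 degrees of freedom.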
I will find an example for a classification model. Those usually include a confusion matrix and accuracy, at a minimum.
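For reference, both quantities are simple to derive on the producer side. A minimal sketch, using toy labels that are made up purely for illustration and are not taken from any real model:

```python
from collections import Counter

# Toy labels, invented for this example only.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

# Each (actual, predicted) pair is one cell of the confusion matrix.
confusion = Counter(zip(actual, predicted))

# Accuracy is the share of records that fall on the diagonal.
correct = sum(count for (a, p), count in confusion.items() if a == p)
accuracy = correct / len(actual)

for (a, p), count in sorted(confusion.items()):
    print(f"actual={a} predicted={p} count={count}")
print(f"accuracy={accuracy:.3f}")
```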