SelectFirstClassifier will cause the predicted value of pmml to all become 0? #370

liuhuanshuo · 2023-02-04T02:08:31Z

When I use SelectFirstClassifier, I find that in some cases the values predicted from the PMML file are all 0.

Use SelectFirstClassifier

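As shown in the following code, I built a pipeline based on SelectFirstClassifier:

from sklearn2pmml.ensemble import SelectFirstClassifier
from sklearn2pmml.pipeline import PMMLPipeline

# mapper_3, model_1 and model_2 are defined earlier (not shown here)
pipeline_t = PMMLPipeline([
    ("mapper_3", mapper_3),
    ("conditional_pipeline", SelectFirstClassifier([
        ("model_1", model_1, "X[-2] is not None"),
        ("model_2", model_2, "X[-2] is None")
    ]))
])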

If I use pipeline_t here for prediction, I get the following results

Next, I converted pipeline_t to PMML and made predictions on the same data, and found that the results all became 0!

Do not use SelectFirstClassifier

Obviously, the above results are not as expected. I suspected SelectFirstClassifier, so I modified the code as follows:

pipeline_t = PMMLPipeline([
    ("mapper_3", mapper_3),
    ("conditional_pipeline", model_1)
])

The only thing I did was remove the SelectFirstClassifier. The prediction result of the pipeline is still normal.

When this pipeline is converted to a PMML file, the prediction results are also in line with expectations.

So does this mean that SelectFirstClassifier affects the prediction results of the PMML file?


vruusmann · 2023-02-04T07:24:12Z

You have two issues combined in this report.

First, about the predicted probability values being all 0. This is an OutputField mis-placement issue - recent versions of SkLearn2PMML package try to "optimize" the layout of PMML documents by moving "common" OutputField elements from the child model level (eg. /PMML/MiningModel/Segmentation/RegressionModel) to the parent model level (eg. /PMML/MiningModel).

The JPMML-Evaluator library gets confused by this: it thinks that it is dealing with a non-probabilistic model, and that it doesn't need to provide predicted probability values at all.

Second, the precision of predicted probability values changing (eg. from 12 decimal places to 7 decimal places) when switching between language environments (direct Python, Scikit-Learn over Python, PMML).

This looks like a Python runtime configuration issue - something related to "console display preferences". The PMML engine is definitely not performing any explicit rounding; it always returns float32 values for single-precision models (ie. Model@mathContext="float") and float64 values for double-precision models (ie. Model@mathContext="double").
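As a rough illustration of the float32/float64 distinction mentioned above (not a diagnosis of this particular report), the sketch below rounds a double-precision value to single precision using only the standard library. The value `p64` and the helper `to_float32` are hypothetical names made up for this example; the point is only that a float32 value carries roughly 7 significant decimal digits, so a 12-place number and its float32 counterpart agree when printed to 7 digits.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical probability with 12 decimal places (not taken from the report)
p64 = 0.123456789012
# The closest value a single-precision number can hold
p32 = to_float32(p64)

# Full repr shows that the two values differ in the low-order digits
print(repr(p64))
print(repr(p32))

# Printed with 7 significant digits, both look the same
print(f"{p64:.7g}", f"{p32:.7g}")
```

If the Python-side numbers come from a double-precision model and the PMML-side numbers from a single-precision one (or the other way round), a comparison with a tolerance is more meaningful than comparing printed digits.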

vruusmann · 2023-02-04T07:30:20Z

Recent versions of SkLearn2PMML package try to "optimize" the layout of PMML documents by moving "common" OutputField elements from the child model level to the parent model level.

As a workaround, I'd suggest that you export the model ensemble as sklearn2pmml.ensemble.EstimatorChain (instead of a sklearn2pmml.ensemble.SelectFirstClassifier).

Keep a parallel SelectFirstEstimator object when you need to perform predict_proba(X) in Python environment.

Something like this:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.ensemble import EstimatorChain, SelectFirstClassifier

# Shared steps - all SkLearn2PMML ensemble model types expect the same step layout
steps = [...]

# Fit and export
pmml_estimator = EstimatorChain(steps, multioutput = False)
pmml_estimator.fit(X, y)

# The output file name is a placeholder
sklearn2pmml(pmml_estimator, "estimator_chain.pmml")

# Predict probabilities
py_estimator = SelectFirstClassifier(steps)
# The steps have already been fitted, so the object is ready for predict_proba(X) as-is
py_estimator.predict_proba(X)

vruusmann · 2023-02-04T07:31:49Z

The ultimate fix would be to add EstimatorChain.predict_proba(X) method for non-multioutput cases, where all child estimators are subclasses of ClassifierMixin.

liuhuanshuo · 2023-02-07T06:10:43Z

Keep a parallel SelectFirstEstimator object when you need to perform predict_proba(X) in Python environment.

In fact, the reason I chose sklearn2pmml for conversion is to call pmml in Java for prediction.

Therefore, it is not a very reasonable method to keep a prediction scheme in the Python environment.

However, there doesn't seem to be any other way I can save the SelectFirstClassifier with the predicted probability values?

vruusmann · 2023-02-07T07:15:14Z

I'm thinking about refactoring SkLearn2PMML ensemble models in the following way:

Add EstimatorChain.predict_proba(X) method, which will work only in multioutput = False mode, when all child models are classifiers.
Make SelectFirstClassifier and SelectFirstRegressor subclasses of EstimatorChain(multioutput = False).

Right now you could fix the SelectFirstClassifier PMML file by relocating OutputField elements from the top-level back to member model-level. Takes less than a minute to do in text editor. But it's very difficult to give you the instructions for doing so via GitHub.
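The manual relocation described above could also be scripted. The following is only a sketch under stated assumptions, not the SkLearn2PMML or JPMML way of doing it: it assumes the document has a single top-level MiningModel whose Output element should be copied into every member model of its Segmentation, and that an Output element belongs directly after the member model's MiningSchema. The function name `relocate_output_fields` and the `SAMPLE` document are made up for illustration; a real exported file is larger and should be checked by hand after the edit.

```python
import copy
import xml.etree.ElementTree as ET


def relocate_output_fields(pmml_text):
    """Copy the top-level MiningModel Output into each member model, then drop it."""
    root = ET.fromstring(pmml_text)
    # Reuse whatever namespace the document declares (e.g. PMML 4.4)
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    mining_model = root.find(ns + "MiningModel")
    if mining_model is None:
        return pmml_text
    output = mining_model.find(ns + "Output")
    segmentation = mining_model.find(ns + "Segmentation")
    if output is None or segmentation is None:
        return pmml_text
    for segment in segmentation.findall(ns + "Segment"):
        for model in segment:
            # Only the member model (not the predicate) carries a MiningSchema
            schema = model.find(ns + "MiningSchema")
            if schema is None or model.find(ns + "Output") is not None:
                continue
            # Assumption: Output is placed directly after MiningSchema
            model.insert(list(model).index(schema) + 1, copy.deepcopy(output))
    mining_model.remove(output)
    if ns:
        # Serialize with a default namespace instead of an "ns0:" prefix
        ET.register_namespace("", ns[1:-1])
    return ET.tostring(root, encoding="unicode")


# Tiny made-up document that only mimics the element layout under discussion
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <MiningModel functionName="classification">
    <MiningSchema/>
    <Output><OutputField name="probability(1)" feature="probability" value="1"/></Output>
    <Segmentation multipleModelMethod="selectFirst">
      <Segment><True/><RegressionModel functionName="classification"><MiningSchema/><RegressionTable intercept="0"/></RegressionModel></Segment>
      <Segment><True/><RegressionModel functionName="classification"><MiningSchema/><RegressionTable intercept="0"/></RegressionModel></Segment>
    </Segmentation>
  </MiningModel>
</PMML>"""

fixed = relocate_output_fields(SAMPLE)
print(fixed)
```

Copying the whole Output element keeps the sketch short; if only some OutputField elements are "common" ones that were moved up, the loop would need to select those instead.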

liuhuanshuo · 2023-02-27T08:26:18Z

Hi, Villu

Nice to see you pushed a new version.

But it seems that this problem is not fixed in the new version? After I save the pipeline containing SelectFirstClassifier as a PMML file, the predicted values are still all 0.

Right now you could fix the SelectFirstClassifier PMML file by relocating OutputField elements from the top-level back to member model-level. Takes less than a minute to do in text editor.

So I need to modify the generated pmml now?

But it's very difficult to give you the instructions for doing so via GitHub.

I think I understand what you mean, do I need to move the OutputField below to another location?

Could you give a simple explanation? Since the operation only takes a minute, I don't think it will be very complicated.

vruusmann · 2023-02-27T08:36:22Z

After I save the pipeline containing SelectFirstClassifier as a pmml file, the predicted values are still all 0

The fix targeted the sklearn2pmml.ensemble.EstimatorChain model type, which now provides the predict_proba(X) method. Please consider migrating from SelectFirstClassifier(steps) to EstimatorChain(steps, multioutput = False).

The latest SkLearn2PMML 0.91.0 release was about completely refactoring Python-to-PMML translation functionality (affecting ExpressionTransformer, the step predicates of SelectFirstEstimator, EstimatorChain and RuleSetClassifier).

The new translator has some crazy new capabilities (which I will blog about shortly). Also, the Python side evaluation should be 10x faster, because the expression/predicate is "precompiled" once, and then reused across all rows.

@liuhuanshuo The pandas.isnull(X) nullability check is now also supported in the predicate context. There shouldn't be any module import issues anymore.

liuhuanshuo · 2023-02-27T08:39:20Z

I examined many correctly working PMML files myself, looked at the position of OutputField, and tried some modifications.

I found that just moving the OutputField in front of the LocalTransformations seems to work.

But this can only ensure that the predicted values are not all 0, and there are still problems in the prediction of rows with null values.

liuhuanshuo · 2023-02-27T08:40:28Z

Surprised we both replied at the same time. I will look into yours first.

vruusmann closed this as completed in 041a703 Feb 26, 2023
