
Is it possible to combine LGBMClassifier and IsotonicRegressor into a single PMML? #146

Closed
liamjoy opened this issue Aug 4, 2020 · 10 comments


@liamjoy

liamjoy commented Aug 4, 2020

I have been able to do this by creating separate PMML files for the LGBMClassifier and the IsotonicRegression, and then copying the IsotonicRegression PMML into the LGBM PMML as the final Segment of a model chain. I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, they do not allow it. I am also unable to use two models in a single pipeline as only one estimator is allowed. Is it possible to do this using a single pipeline, or is there some other workaround?

@vruusmann
Member

In plain English, what is this model chain supposed to do? What is the function of LGBMClassifier, and what is the function of IsotonicRegression?

Are you trying to "smooth" the prediction of the classifier?

I have looked into using StackingClassifier/StackingRegressor, but because LGBM is a classifier and Isotonic is a regressor, they do not allow it.

I assume you're referring to Scikit-Learn's stacking estimator classes, and that it is Scikit-Learn that prevents you from building such a model chain (not the SkLearn2PMML/JPMML-SkLearn stack).

I am also unable to use two models in a single pipeline as only one estimator is allowed

Possible workaround - the first estimator should be packaged as a transformer: jpmml/sklearn2pmml#118

@liamjoy
Author

liamjoy commented Aug 4, 2020

The LGBMClassifier takes in around 100 features to predict a binary target class. The isotonic regression is used to calibrate the model predictions to match a different distribution. The output should be the probability of the target being 1, after calibration.
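
In plain Scikit-Learn terms, the setup looks roughly like this (a minimal sketch; X_train, y_train, X_calib, y_calib and X_new are placeholder names, and the use of a separate calibration set is an assumption):

from lightgbm import LGBMClassifier
from sklearn.isotonic import IsotonicRegression

# Fit the classifier on ~100 features and a binary target
classifier = LGBMClassifier()
classifier.fit(X_train, y_train)

# Calibrate the positive-class probability against a calibration set
calibrator = IsotonicRegression(out_of_bounds = "clip")
calibrator.fit(classifier.predict_proba(X_calib)[:, 1], y_calib)

# Calibrated probability of the target being 1
p_calibrated = calibrator.predict(classifier.predict_proba(X_new)[:, 1])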

Thank you, I will look into packaging the LGBMClassifier as a transformer.

@vruusmann
Member

vruusmann commented Aug 4, 2020

The isotonic regression is used to calibrate the model predictions to match a different distribution.

This looks like a "decision engineering" problem - taking the prediction of a model, and then doing something extra with it.

In such a case LGBMClassifier is still the primary/final estimator of the pipeline, and the challenge is about applying IsotonicRegression to its predicted probability.

Decision engineering is not supported by Scikit-Learn pipelines. However, the sklearn2pmml.pipeline.PMMLPipeline class lets you specify three attributes predict_transformer, predict_proba_transformer and apply_transformer to accomplish it: https://github.com/jpmml/sklearn2pmml/blob/0.61.0/sklearn2pmml/pipeline/__init__.py#L47-L51

Suppose you want to manually correct the predicted probability of a binary classifier:

from sklearn2pmml.preprocessing import ExpressionTransformer

pipeline = PMMLPipeline(.., predict_proba_transformer = ExpressionTransformer("X[1] * 0.95 + 0.1"))

In your case, you should package the IsotonicRegression as a transformer instead of hand-writing an expression.

@sidelmary

Hi Villu!

Could you suggest a way to package IsotonicRegression as a transformer, please?
I tried ModelTransformer from jpmml/sklearn2pmml#118, but ran into the following error:

SEVERE: Failed to convert
java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
	at org.jpmml.sklearn.PyClassDict.get(PyClassDict.java:57)
	at org.jpmml.sklearn.PyClassDict.getOptional(PyClassDict.java:67)
	at sklearn2pmml.pipeline.PMMLPipeline.getTransformer(PMMLPipeline.java:441)
	at sklearn2pmml.pipeline.PMMLPipeline.getPredictProbaTransformer(PMMLPipeline.java:433)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:101)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer
	at java.lang.Class.cast(Unknown Source)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
	... 7 more


I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer. The only idea that came to my mind is to iteratively build a string of "if/else" clauses that interpolate between the threshold values of scipy.interpolate.interp1d, which sklearn's IsotonicRegression is based on. But that doesn't seem like a good solution to me.

Are there any other options to wrap IsotonicRegression in a transformer? Or maybe is there a better solution with ExpressionTransformer?

@vruusmann
Member

java.lang.IllegalArgumentException: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class __main__.ModelTransformer)

Looks like you're trying to develop a custom transformer. You've implemented the Python side, but you haven't implemented the Java side yet, nor informed the SkLearn2PMML package about it.

This was discussed recently here: jpmml/sklearn2pmml#283

And I'm also not sure how to represent isotonic regression as an expression for ExpressionTransformer.

See the EstimatorTransformer class from the Scikit-Lego package (I decided to reuse an existing 3rd party class instead of coming up with my own).

Something like this:

from sklego.meta import EstimatorTransformer

# A pre-fitted Isotonic regression
isotonicRegression = ..

pipeline = PMMLPipeline(.., predict_proba_transformer = EstimatorTransformer(isotonicRegression))

@sidelmary

Thanks for the quick response!

I found EstimatorTransformer in the supported packages list at https://github.com/jpmml/jpmml-sklearn and tried to use it, but ran into two issues:

  1. I can't dump the pipeline to PMML; I get the same error: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.predict_proba_transformer' has an unsupported value (Python class sklego.meta.estimator_transformer.EstimatorTransformer). It is probably a library version issue (I use sklearn2pmml==0.49.3).
  2. I can't use predict_proba_transform; I get ValueError: Isotonic regression input should be a 1d array, since the output of the model's predict_proba is a 2D array, but isotonic regression expects a 1D array.

Is there a workaround for using EstimatorTransformer, or is building a custom transformer the only way?

@vruusmann
Member

vruusmann commented Jun 23, 2021

It is probably a library version issue (I use sklearn2pmml==0.49.3)

Exactly - support for the sklego.meta.EstimatorTransformer transformation type was added in SkLearn2PMML version 0.73.0 (released ~3 days ago).

Can't use predict_proba_transform, since the output of the model's predict_proba is a 2D array, but isotonic regression expects a 1D array.

Use a helper transformer to select a single column (e.g., the probability of class Z) out of the available ones:

pipeline = PMMLPipeline(..,
  predict_proba_transformer = Pipeline([
    ("select_col", ExpressionTransformer("X[1]")),
    ("transform_col", IsotonicRegression())
  ])
)

Is there a workaround for using EstimatorTransformer, or is building a custom transformer the only way?

Honestly, just upgrade the SkLearn2PMML package to the latest version.
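
Putting the pieces of this thread together, a hedged end-to-end sketch (classifier and calibrator stand for the pre-fitted LGBMClassifier and IsotonicRegression; whether PMMLPipeline.fit would also re-fit the predict_proba_transformer is not covered here and may depend on the SkLearn2PMML version):

from sklearn.pipeline import Pipeline
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

# `classifier` is a fitted LGBMClassifier, `calibrator` is a fitted IsotonicRegression
pipeline = PMMLPipeline([
    ("classifier", classifier)
  ],
  predict_proba_transformer = Pipeline([
    # Select the positive-class probability column, then calibrate it
    ("select_col", ExpressionTransformer("X[1]")),
    ("transform_col", calibrator)
  ])
)

sklearn2pmml(pipeline, "LGBMClassifier_IsotonicRegression.pmml")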

@sidelmary

Hi, Villu!

I updated the sklearn2pmml library and ran into a new issue while building the PMMLPipeline.
Code:

model = XGBClassifier( ... )
model.fit(x, y)
pipeline = PMMLPipeline([('classifier', model)])

Error:

 53                 self.apply_transformer = apply_transformer
   54                 # SkLearn 0.24+
---> 55                 super(PMMLPipeline, self).__init__(steps = steps, memory = memory, verbose = verbose)
   56 
   57         def __repr__(self):

TypeError: __init__() got an unexpected keyword argument 'verbose'

sklearn2pmml version:
0.73.0

0.60.0 and older work well, but EstimatorTransformer isn't supported there.

@vruusmann
Member

TypeError: __init__() got an unexpected keyword argument 'verbose'

The sklearn.pipeline.Pipeline constructor introduced the verbose parameter in Scikit-Learn 0.21.0:
https://scikit-learn.org/0.21/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline

Why would anyone use a pre-0.21 version in June 2021?

@sidelmary
Copy link

It works with the updated libraries!
Thank you for your suggestions, they helped a lot!
