
Convenience method for transforming existing estimator objects to `PMMLPipeline` objects #27

Closed
rossmeissl opened this Issue Feb 27, 2017 · 3 comments

@rossmeissl

I have about a hundred sklearn classifiers (each stored pickled, if that matters) that I'd like to export to PMML. How can I apply this tool to an existing classifier, versus wrapping a new training process?

vruusmann (Member) commented Feb 27, 2017

Simply wrap your pre-trained estimator object into a sklearn2pmml.PMMLPipeline object:

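import pickle
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

estimator = pickle.load(...)  # the pickled, pre-trained estimator
# Wrap it as the single (final) step of a PMMLPipeline
pipeline = PMMLPipeline([
  ("pretrained-estimator", estimator)
])

sklearn2pmml(pipeline, "pretrained-estimator.pmml")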
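For the case in the question (about a hundred pickled classifiers), the same wrapping can be applied in a loop. A minimal sketch, assuming one model per `*.pkl` file in a single directory; `convert_pickled_models` and its `export` callback are hypothetical names, not part of sklearn2pmml. The actual PMMLPipeline wrapping and `sklearn2pmml()` call are passed in as `export`, which keeps the loop itself free of the Java-backed conversion step:

```python
import pickle
from pathlib import Path
from typing import Any, Callable, List


def convert_pickled_models(
    pickle_dir: str,
    out_dir: str,
    export: Callable[[Any, str], None],
    pattern: str = "*.pkl",
) -> List[Path]:
    """Unpickle every model matching `pattern` and export it as <stem>.pmml.

    `export(estimator, pmml_path)` is where the PMMLPipeline wrapping and the
    sklearn2pmml() call belong (hypothetical callback, supplied by the caller).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pkl_path in sorted(Path(pickle_dir).glob(pattern)):
        with pkl_path.open("rb") as f:
            estimator = pickle.load(f)  # only unpickle files you trust
        pmml_path = out / (pkl_path.stem + ".pmml")
        export(estimator, str(pmml_path))
        written.append(pmml_path)
    return written
```

With sklearn2pmml installed, `export` could be something like `lambda est, path: sklearn2pmml(PMMLPipeline([("pretrained-estimator", est)]), path)`. Each resulting PMML document will still use the generic input field names described below.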

The main problem with using pre-trained estimators is that the resulting PMML document will be devoid of any supporting (meta-)information such as column names or data types. All input fields will be called x{index}, and they will be of some numeric data type (typically, float for decision tree-based model types, and double for all other model types).

If you have some knowledge about the "schema" of your pre-trained models, then it's possible to enhance the sklearn2pmml.PMMLPipeline object accordingly:

# The pickled DataFrameMapper that prepared the training data for `estimator`
dataframemapper = pickle.load(...)
pipeline = PMMLPipeline([
  ("pretrained-dataframemapper", dataframemapper),
  ("pretrained-estimator", estimator)
])

or, if there is no pickled mapper, declare the field names directly:

import numpy

pipeline = PMMLPipeline([
  ("pretrained-estimator", estimator)
])
# Target column name, then input column names in the order used for training
pipeline.target_field = "Species"
pipeline.active_fields = numpy.array(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"])
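The two attribute assignments above can be wrapped in a small helper that also catches a mismatched column list before export. A minimal sketch, assuming the `active_fields`/`target_field` attribute names used in this comment (other sklearn2pmml versions may spell them differently) and scikit-learn's `n_features_in_` attribute, which only estimators fitted with scikit-learn 0.24 or later record; `attach_schema` is a hypothetical helper, not part of sklearn2pmml:

```python
from typing import Sequence

import numpy


def attach_schema(pipeline, active_fields: Sequence[str], target_field: str):
    """Set column names on a PMMLPipeline that wraps a pre-trained estimator.

    Hypothetical helper. If the final estimator records how many features it
    was fitted on (n_features_in_), the number of names must match; older
    estimators without that attribute skip the check.
    """
    estimator = pipeline.steps[-1][1]
    n_expected = getattr(estimator, "n_features_in_", None)
    if n_expected is not None and n_expected != len(active_fields):
        raise ValueError(
            f"estimator expects {n_expected} features, got {len(active_fields)} names"
        )
    # Attribute names as used in the comment above (assumed for your version)
    pipeline.active_fields = numpy.array(active_fields)
    pipeline.target_field = target_field
    return pipeline
```

Usage would mirror the example above: `attach_schema(pipeline, ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], "Species")`.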

vruusmann closed this Feb 27, 2017

@rossmeissl


Thanks @vruusmann!

vruusmann (Member) commented Feb 27, 2017

It would be nice to have utility method(s) for doing this kind of wrapping work. Better yet, the wrapping could happen silently inside the sklearn2pmml function itself.

The underlying JPMML-SkLearn library has no problem accepting "raw" estimator objects:
jpmml/jpmml-sklearn@7b258cd

So, this limitation exists only in the sklearn2pmml package, in the form of the following isinstance check:
https://github.com/jpmml/sklearn2pmml/blob/master/sklearn2pmml/__init__.py#L128
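Until such a utility exists, the wrapping can be done on the caller's side. A minimal sketch of the idea with a hypothetical `ensure_pipeline` helper (not a sklearn2pmml function): instead of rejecting a raw estimator the way the isinstance check does, it passes through objects that already are pipelines and wraps anything else as a single-step pipeline. The pipeline class is a parameter only so the sketch runs without sklearn2pmml installed; in practice you would pass `PMMLPipeline`:

```python
from typing import Any


def ensure_pipeline(obj: Any, pipeline_cls: type, step_name: str = "estimator") -> Any:
    """Return `obj` unchanged if it is already a `pipeline_cls`, else wrap it.

    Hypothetical helper illustrating "silent" wrapping of raw estimators.
    """
    if isinstance(obj, pipeline_cls):
        return obj
    # Single-step pipeline: the raw estimator becomes the final step
    return pipeline_cls([(step_name, obj)])
```

For example, `sklearn2pmml(ensure_pipeline(estimator, PMMLPipeline), "model.pmml")` accepts both raw estimators and ready-made pipelines.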


vruusmann reopened this Feb 27, 2017

vruusmann changed the title from Apply to existing classifier? to Convenience method for transforming existing estimators to `PMMLPipeline` objects Feb 27, 2017

vruusmann changed the title from Convenience method for transforming existing estimators to `PMMLPipeline` objects to Convenience method for transforming existing estimator objects to `PMMLPipeline` objects Feb 27, 2017

vruusmann closed this in 5235137 Oct 18, 2017
