Mechanism for registering custom Estimator and Transformer converter classes #20
Comments
First, you need to introduce an appropriate converter class into the JPMML-SkLearn project:
There's not much documentation about it. Here's an example of implementing a converter class for Scikit-Learn's [...]
Then, build your modified JPMML-SkLearn project with Apache Maven, and drop the resulting JAR file into the sklearn2pmml [...]
Edited the title of this issue to reflect the real user need.
Actually, custom converter classes shouldn't be made part of the JPMML-SkLearn library. They could be isolated into a standalone (mini-)project, which depends on the JPMML-SkLearn library (and other Java libraries). When this (mini-)project is built, it should produce a JAR file that is suitable for dropping into sklearn2pmml [...]
A solution would be to introduce some sort of "sklearn2pmml extension module metadata" mechanism. For example, the JAR file could contain a properties file [...]
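The "properties file inside the JAR" idea can be sketched in Python with nothing but the standard library. This is only an illustration under assumptions: the metadata path `META-INF/sklearn2pmml.properties`, the Python class `mypackage.DataCleaner` and the Java class `com.example.sklearn.DataCleanerConverter` are hypothetical names, and the actual mechanism may look different.

```python
import io
import zipfile

# Hypothetical location of the metadata file inside the extension JAR.
METADATA_PATH = "META-INF/sklearn2pmml.properties"


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        mapping[key.strip()] = value.strip()
    return mapping


def read_converter_mapping(jar_file):
    """Return the Python class -> Java converter class mapping stored in a JAR."""
    with zipfile.ZipFile(jar_file) as jar:
        if METADATA_PATH not in jar.namelist():
            return {}
        return parse_properties(jar.read(METADATA_PATH).decode("utf-8"))


def build_demo_jar():
    """Build an in-memory JAR (a ZIP archive) holding only the metadata file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr(
            METADATA_PATH,
            "# Python class = Java converter class (both names are made up)\n"
            "mypackage.DataCleaner = com.example.sklearn.DataCleanerConverter\n",
        )
    buffer.seek(0)
    return buffer


mapping = read_converter_mapping(build_demo_jar())
print(mapping)
```

With such a file in place, a converter could discover which Java class handles which Python class without any change to the JPMML-SkLearn library itself.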
Is there a simpler solution, e.g. similar to http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html where an arbitrary function can be registered?
Class [...]
As you can see, in order to support a ufunc, you still need to write the conversion "business logic" in Java, and (re-)build a modified JPMML-SkLearn library.
Could you perhaps share your Python Transformer class? Maybe I can then suggest a simple and easy way of automatically translating it to PMML.
For example, the JPMML-R library now includes a general-purpose R-to-PMML expression conversion functionality:
iris.rf = randomForest(Species ~ . + I(log(Sepal.Length / Sepal.Width) + 1), data = iris)
It should be possible to build a similar Python-to-PMML expression converter. Of course, the trouble is that you cannot refer to fields by name in Scikit-Learn; you have to use field references, something like [...]
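To make the "Python-to-PMML expression converter" idea concrete, here is a minimal sketch built on Python's `ast` module (Python 3.9 or newer). It is not how JPMML-SkLearn does it; the positional reference syntax `X[0]`, the function name `expression_to_pmml` and the small set of supported operators are assumptions made for illustration. The PMML element names (`Apply`, `FieldRef`, `Constant`) and the built-in function name `ln` for the natural logarithm come from the PMML specification.

```python
import ast
import xml.etree.ElementTree as ET

# Python operator node -> PMML built-in function name.
BINARY_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
# Python function name -> PMML built-in function name (natural log is "ln").
FUNCTIONS = {"log": "ln", "exp": "exp", "sqrt": "sqrt", "abs": "abs"}


def to_pmml(node, feature_names):
    """Recursively translate a parsed Python expression into PMML elements."""
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        apply = ET.Element("Apply", function=BINARY_OPS[type(node.op)])
        apply.append(to_pmml(node.left, feature_names))
        apply.append(to_pmml(node.right, feature_names))
        return apply
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS):
        apply = ET.Element("Apply", function=FUNCTIONS[node.func.id])
        for arg in node.args:
            apply.append(to_pmml(arg, feature_names))
        return apply
    # Assumed convention: X[i] refers to the i-th input column.
    if (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)
            and node.value.id == "X" and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, int)):
        return ET.Element("FieldRef", field=feature_names[node.slice.value])
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        data_type = "integer" if isinstance(node.value, int) else "double"
        constant = ET.Element("Constant", dataType=data_type)
        constant.text = repr(node.value)
        return constant
    raise ValueError("Unsupported expression: " + ast.dump(node))


def expression_to_pmml(expression, feature_names):
    """Parse a Python expression string and return the root PMML element."""
    return to_pmml(ast.parse(expression, mode="eval").body, feature_names)


root = expression_to_pmml("log(X[0] / X[1]) + 1", ["Sepal.Length", "Sepal.Width"])
print(ET.tostring(root, encoding="unicode"))
```

The example mirrors the R formula term above: the column indices are mapped back to field names through the `feature_names` list, which is exactly the information Scikit-Learn does not carry by itself.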
Unfortunately, sharing the code will not be possible. But I can explain the actions that are performed.
The whole pipeline is a bit similar to http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html where the cleaning of the data is encapsulated in its own transformer.
Starting from the JPMML-SkLearn library version 1.2.0, the [...]
You can implement your own [...]
During conversion, you can add a list of JAR files to the application classpath using the newly introduced [...]
sklearn2pmml(estimator, mapper, "estimator_mapper.pmml", user_classpath = ["/path/to/extensions.jar"])
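If several extension JAR files live in one directory, the list passed as `user_classpath` can be assembled programmatically. The helper below is a small sketch; `collect_extension_jars` and the directory `/path/to/extensions` are hypothetical, and the `sklearn2pmml` call is left as a comment because it needs a fitted estimator and mapper.

```python
from pathlib import Path


def collect_extension_jars(directory):
    """Return sorted absolute paths of all JAR files directly inside `directory`."""
    return sorted(str(path.resolve()) for path in Path(directory).glob("*.jar"))


# Hypothetical usage; `estimator` and `mapper` are assumed to exist already:
# from sklearn2pmml import sklearn2pmml
# sklearn2pmml(estimator, mapper, "estimator_mapper.pmml",
#              user_classpath=collect_extension_jars("/path/to/extensions"))
```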
How can I integrate a custom transformer into sklearn2pmml?
I am thinking about some preprocessing code that cleans the data and handles the imputation of missing values.
Is it correct to assume that Pyrolite will handle any pickled transformer?
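One way to see why unpickling alone cannot be enough is to look at what a pickle actually contains. The sketch below uses a toy class, `MissingValueCleaner`, which is a hypothetical stand-in and not a real Scikit-Learn transformer. It shows that the pickle stores the class name and the attribute values, but nothing about what `transform()` does.

```python
import pickle


class MissingValueCleaner:
    """Toy stand-in for a custom transformer; not a real Scikit-Learn class."""

    def __init__(self, fill_value=0.0):
        self.fill_value = fill_value

    def transform(self, rows):
        # The cleaning logic lives here, in Python code only.
        return [[self.fill_value if value is None else value for value in row]
                for row in rows]


payload = pickle.dumps(MissingValueCleaner(fill_value=-1.0))

# The pickle names the class and stores its attributes...
has_class_name = b"MissingValueCleaner" in payload
has_state = b"fill_value" in payload
# ...but it does not contain the body (or even the name) of transform().
has_method = b"transform" in payload

print(has_class_name, has_state, has_method)
```

Whatever the Java side does with the unpickled object, it only receives a class name and some state, so the transformation logic still has to be supplied separately as a converter class.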