Support for transformer-only pipelines #86

AshwinSekar · 2018-10-09T20:02:17Z

I understand that a PMMLPipeline must end with an estimator to be valid for conversion to PMML. I have use cases where I have useful preprocessing pipelines that I would like to convert to PMML for evaluation in Java.

If I append a DummyClassifier or DummyRegressor to the end of the pipeline, it can be written to valid PMML; however, the target_fields information is lost, and I am unsure how to recover anything but the dummy prediction from the PMML.

Is there a recommended workflow in this situation? Should I use the jpmml-plugin to create some sort of "pass through" estimator that returns the input?

Thanks for your help!

vruusmann · 2018-10-10T14:00:29Z

> Is there a recommended workflow in this situation?

There was a similar situation with Apache Spark pipelines, and we found a fairly elegant solution there. However, Apache Spark pipelines are far more flexible than Scikit-Learn pipelines (e.g. they can have multiple models in a pipeline, and transformers can follow the last model), so the solution is probably not 1:1 transferable (and I really cannot recall its technical details).

> Should I use the jpmml-plugin to create some sort of "pass through" estimator that returns the input?

Probably the easiest solution to your problem:

1. Create a dummy-like estimator. It could very well be a subclass of DummyRegressor or DummyClassifier.
2. In its #encodeModel(Schema) method, create an empty Output element, and append an OutputField child element for every pre-processing step that you want to pass through. Be sure to use unique field names in order to avoid naming conflicts between DerivedField and OutputField elements.

Something like this should do:

<Output>
  <OutputField name="z" dataType=".." optype="..">
    <!-- refers to a DerivedField element whose name is "internal(y)" -->
    <FieldRef field="internal(y)"/>
  </OutputField>
</Output>
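
For illustration, the same Output fragment could be generated programmatically. The following is a minimal Python sketch using only the standard library `xml.etree.ElementTree`. It is not the JPMML-SkLearn `encodeModel` implementation (which is Java); the helper name `build_output` and the `dataType`/`optype` values are assumptions chosen for this example, since the snippet above leaves them unspecified.

```python
import xml.etree.ElementTree as ET


def build_output(passthrough):
    """Build a PMML-style <Output> element (hypothetical helper).

    `passthrough` maps each new OutputField name to a tuple of
    (derived field name, dataType, optype).
    """
    derived_names = {derived for derived, _, _ in passthrough.values()}
    output = ET.Element("Output")
    for name, (derived, data_type, optype) in passthrough.items():
        # OutputField names must not collide with DerivedField names.
        if name in derived_names:
            raise ValueError(f"OutputField name {name!r} clashes with a DerivedField")
        field = ET.SubElement(
            output, "OutputField", name=name, dataType=data_type, optype=optype
        )
        # Point the output field at the derived field it passes through.
        ET.SubElement(field, "FieldRef", field=derived)
    return output


# "double" and "continuous" are assumed values for this example only.
output = build_output({"z": ("internal(y)", "double", "continuous")})
xml_text = ET.tostring(output, encoding="unicode")
print(xml_text)
```

The name check mirrors the advice above: an OutputField that reuses the name of a DerivedField is rejected up front instead of producing a conflicting document.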

vruusmann · 2018-10-10T14:04:09Z

Re-purposed this issue. Would like to provide a solution that wouldn't require defining custom estimator types and renaming fields.

Perhaps the sklearn2pmml.pipeline.PMMLPipeline class should have a marker attribute transformation_only (or similar), which would inform the JPMML-SkLearn backend that the final estimator step (if any) should be skipped.
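
To make the proposal concrete, here is a rough Python sketch of how such a marker might behave. Everything in it is hypothetical: `transformation_only` is only the attribute name floated above, and `SketchPipeline`, `steps_to_encode`, `Scaler` and `Dummy` are invented for this example. None of it is existing sklearn2pmml or JPMML-SkLearn API.

```python
class SketchPipeline:
    """Hypothetical stand-in for a pipeline; not the real PMMLPipeline."""

    def __init__(self, steps, transformation_only=False):
        self.steps = list(steps)  # list of (name, step) tuples
        self.transformation_only = transformation_only

    def steps_to_encode(self):
        """Return the steps a converter backend would encode."""
        if self.transformation_only and self.steps:
            _, last_step = self.steps[-1]
            # Assumption: anything with a predict() method is a final estimator.
            if hasattr(last_step, "predict"):
                return self.steps[:-1]
        return self.steps


class Scaler:
    """Placeholder transformer."""

    def transform(self, X):
        return X


class Dummy:
    """Placeholder estimator."""

    def predict(self, X):
        return [0 for _ in X]


pipeline = SketchPipeline(
    [("scale", Scaler()), ("dummy", Dummy())], transformation_only=True
)
encoded_names = [name for name, _ in pipeline.steps_to_encode()]
print(encoded_names)
```

With the flag set, the trailing estimator is dropped and only the transformer steps remain; a pipeline that has no final estimator is returned unchanged.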

AshwinSekar · 2018-10-10T15:28:56Z

Thanks for the suggestion, I will look into creating a dummy estimator.

I noticed that the TransformationDictionary actually has all of the transforms in my pipeline in the form of derived fields. Is there any way I can use these derived fields to extract the transformed values? Can I apply the expression from getExpression() in some way to the input fields?
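
As a rough illustration of what that inspection looks like, the derived fields can be listed by reading the PMML file directly. The Python sketch below uses only the standard library; the embedded PMML fragment is hand-written for this example, the PMML 4.3 namespace is an assumption (use whatever version your file declares), and `list_derived_fields` is a hypothetical helper. It only lists the fields and their inputs; it does not evaluate the expressions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; adjust to the PMML version declared in your file.
PMML_NS = "http://www.dmg.org/PMML-4_3"
NS = {"pmml": PMML_NS}

# Tiny hand-written fragment, for illustration only.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <TransformationDictionary>
    <DerivedField name="internal(y)" dataType="double" optype="continuous">
      <Apply function="/">
        <FieldRef field="y"/>
        <Constant dataType="double">2.0</Constant>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""


def list_derived_fields(pmml_text):
    """Return (name, expression element, input field names) per DerivedField."""
    root = ET.fromstring(pmml_text)
    result = []
    for derived in root.findall("pmml:TransformationDictionary/pmml:DerivedField", NS):
        # In this fragment the first child is the expression element.
        expression = next(iter(derived), None)
        kind = expression.tag.split("}")[-1] if expression is not None else None
        # Collect every field the expression refers to, at any depth.
        inputs = [ref.get("field") for ref in derived.iter("{%s}FieldRef" % PMML_NS)]
        result.append((derived.get("name"), kind, inputs))
    return result


derived_fields = list_derived_fields(PMML_TEXT)
print(derived_fields)
```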

vruusmann · 2018-10-10T15:39:30Z

> Is there any way I can use these derived fields to extract the transformed values?

See this comment, and the issue referenced therein:
jpmml/jpmml-converter#11 (comment)

vruusmann changed the title ~~Is it possible to access only the pre-processing steps of a pipeline?~~ Support for transformer-only pipelines Oct 10, 2018

vruusmann mentioned this issue Oct 10, 2018

Support for transformer-only pipelines jpmml/jpmml-converter#11

Closed

vruusmann mentioned this issue Jan 14, 2019

Cannot cast sklearn.feature_extraction.text.TfidfVectorizer to sklearn.Estimator jpmml/sklearn2pmml#125

Closed

vruusmann mentioned this issue Apr 10, 2019

Need to Export Pipeline Without Specifying Model jpmml/jpmml-sparkml#61

Closed

vruusmann mentioned this issue May 14, 2019

encounter exceptions when export PCA model jpmml/sklearn2pmml#153

Closed

vruusmann mentioned this issue Jul 1, 2019

got a transformer object #108

Closed

vruusmann closed this as completed in 49205f8 Jul 22, 2019
