The period (.) in <output> creates problems #45

Open
AayushSameerShah opened this issue Jun 9, 2022 · 0 comments

Classification Scenario

Context

Hello,
Sometimes the target variable in a dataset is pre-encoded as 0 and 1. In some of those cases the values are stored as floats, so the class labels become 0.0 and 1.0.

Problem

When we create the PMML for such data, the Output element is generated like this:

<Output>
	<OutputField name="probability(0.0)" optype="continuous" dataType="double" feature="probability" value="0.0"/>
	<OutputField name="probability(1.0)" optype="continuous" dataType="double" feature="probability" value="1.0"/>
</Output>

Then, when we try to predict on other data with the following statement:

Dataset<Row> result = pmmlTransformer.transform(DF);

The following error is raised:

Exception in thread "main" org.apache.spark.sql.AnalysisException: No such struct field `probability(0.0)` in income_>50K, probability(0.0), probability(1.0);
...

A Fix

So I tried manually removing the decimal part (.0) from the OutputField names in the .pmml file, and that worked correctly!
The updated Output element that worked:

<Output>
	<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0.0"/>
	<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1.0"/>
</Output>
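
For anyone who needs the same workaround on many files, the manual edit above could be scripted. The following is only a minimal sketch, not part of JPMML: the class name `PmmlOutputNameFix` and the method `stripFloatSuffix` are made up for illustration, and it assumes the labels are whole numbers written as floats (such as 0.0 or 1.0) and that nothing else in the PMML refers to these output field names. It rewrites only the `name` attribute of `OutputField` elements and leaves `value` untouched, matching the hand-edited file.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a JPMML API): renames output fields such as
// "probability(0.0)" to "probability(0)" inside PMML text.
public class PmmlOutputNameFix {

    // Group 1: everything from "<OutputField" up to and including "(" in the name attribute.
    // Group 2: the integer part of the label.
    // Group 3: the closing ")" and quote of the name attribute.
    // The ".0" (or ".00", ...) between groups 2 and 3 is dropped.
    private static final Pattern FLOAT_LABEL_IN_NAME = Pattern.compile(
        "(<OutputField\\b[^>]*?\\bname=\"[^\"(]*\\()(\\d+)\\.0+(\\)\")");

    // Returns the PMML text with float-style labels removed from OutputField names.
    // Assumption: labels are whole numbers; a label such as 0.5 is left unchanged.
    public static String stripFloatSuffix(String pmml) {
        return FLOAT_LABEL_IN_NAME.matcher(pmml).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        String before = "<OutputField name=\"probability(1.0)\" optype=\"continuous\" "
            + "dataType=\"double\" feature=\"probability\" value=\"1.0\"/>";
        // Only the name attribute changes; value="1.0" is kept as is.
        System.out.println(stripFloatSuffix(before));
    }
}
```

To apply it to a file, the text could be read with `Files.readString`, passed through `stripFloatSuffix`, and written back with `Files.writeString` (Java 11+) before the model is loaded.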

I understand that this could be solved in the code that generates the PMML, but that might not always be possible. So, for convenience, I would ask the community to fix this at the JPMML level.

Thanking you 😄
