Problem writing out categorical variable to JPMML file #144

Open
puifais opened this issue Mar 18, 2019 · 4 comments

@puifais

commented Mar 18, 2019

Hello,

I am having trouble writing out a JPMML file with categorical variables. When I remove the categorical variables, the numerical variables are written out just fine.

Here is my error trace:

Standard output is empty
Standard error:
Mar 18, 2019 3:35:46 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Mar 18, 2019 3:35:54 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 7610 ms.
Mar 18, 2019 3:35:54 PM org.jpmml.sklearn.Main run
INFO: Converting..
Mar 18, 2019 3:35:54 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException
	at org.jpmml.converter.ValueUtil.getDataType(ValueUtil.java:212)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at sklearn.TypeUtil.getDataType(TypeUtil.java:38)
	at sklearn2pmml.preprocessing.LookupTransformer.encodeFeatures(LookupTransformer.java:105)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:75)
	at sklearn.Initializer.encodeFeatures(Initializer.java:41)
	at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:198)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.IllegalArgumentException
	at org.jpmml.converter.ValueUtil.getDataType(ValueUtil.java:212)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at sklearn.TypeUtil.getDataType(TypeUtil.java:38)
	at sklearn2pmml.preprocessing.LookupTransformer.encodeFeatures(LookupTransformer.java:105)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:75)
	at sklearn.Initializer.encodeFeatures(Initializer.java:41)
	at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:198)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)

Preserved joblib dump file(s): /tmp/pipeline-9o79z2_d.pkl.z

--------------------------------------------------------------------------
RuntimeError                             Traceback (most recent call last)
<ipython-input-25-91db46e5268e> in <module>
      5 filename = 'my_model.pmml'
      6 
----> 7 sklearn2pmml(pipeline_rf1, pathname+filename, debug=True)

~/usr1/local/lib/anaconda3/envs/puifai_env/lib/python3.6/site-packages/sklearn2pmml/__init__.py in sklearn2pmml(pipeline, pmml, user_classpath, with_repr, debug, java_encoding)
    250                                 print("Standard error is empty")
    251                 if retcode:
--> 252                         raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    253         finally:
    254                 if debug:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

Here are versions of my packages:

python: 3.6.8
sklearn: 0.20.3
sklearn.externals.joblib: 0.13.0
pandas: 0.24.2
sklearn_pandas: 1.8.0
sklearn2pmml: 0.43.0
openjdk: 1.8.0_181

Would you please help me? :( I've tried several different combinations of package versions, and nothing seems to work with my categorical variables.

@vruusmann

Member

commented Mar 18, 2019

This issue is a functional duplicate of #70

In brief, the problem is that your LookupTransformer output column contains mixed data type values, and the converter is unable to "infer" what the intended data type should be.

To solve the problem, simply cast all output values to the same data type. Perhaps you have a mix of int and string values, or a mix of integer and float values there.

Specifically, see these two comments:
#70 (comment)
#70 (comment)
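
As a rough illustration of the advice above, here is a minimal sketch that inspects the value types of a lookup table and casts them all to one type before the table is handed to a transformer. The `mapping` dict and both helper functions are hypothetical names made up for this example; they are not part of sklearn2pmml.

```python
def value_types(mapping):
    """Return the set of distinct type names found among the mapping's values."""
    return {type(value).__name__ for value in mapping.values()}


def cast_values(mapping, to_type):
    """Return a copy of the mapping with every value cast to to_type."""
    return {key: to_type(value) for key, value in mapping.items()}


# Hypothetical lookup table whose values mix int and float.
mapping = {"red": 1, "green": 2.5, "blue": 3}

print(value_types(mapping))   # more than one type name means mixed values

uniform = cast_values(mapping, float)
print(value_types(uniform))   # a single type name after casting
```

The default value passed alongside the mapping would need the same treatment, since it is part of the transformer's output as well.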

@vruusmann vruusmann closed this Mar 18, 2019

@puifais

Author

commented Mar 18, 2019

Hello!

I figured it out! The issue has to do with the default value for LookupTransformer being type numpy.float instead of Python float because I set it like this:

default_val = train['my_tag'].sum()/len(train)

To solve this, I simply did:

default_val = float(train['my_tag'].sum()/len(train))

You may want to consider making the code automatically convert numpy.float to float. Thank you so much for all your support!
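
For anyone hitting the same thing, a small sketch of the difference, assuming the value in question is a NumPy scalar such as `numpy.float64` (which is what arithmetic on NumPy-backed data typically yields). The helper name `to_builtin` is made up for this example.

```python
import numpy as np


def to_builtin(value):
    """Convert a NumPy scalar to the equivalent Python built-in; pass anything else through."""
    if isinstance(value, np.generic):
        return value.item()
    return value


# Stand-in for a ratio computed from NumPy-backed data.
ratio = np.float64(3) / 4

default_val = to_builtin(ratio)

# The two values compare equal, but their concrete types differ.
print(type(ratio).__name__, type(default_val).__name__)
```

Calling `float(ratio)` has the same effect for floating-point scalars; `.item()` also covers NumPy integers and booleans.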

@vruusmann

Member

commented Mar 18, 2019

> You may want to consider making the code automatically convert numpy.float to float.

I'm not a Python power user, so I wasn't aware of numpy.float vs float distinction.

Will have to investigate how the underlying Pyrolite library maps them to Java types. I suspect that numpy.float becomes some ClassDict object, whereas float becomes a regular java.lang.Float object.

Reopening to keep this on my radar.
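
A quick way to see why the two might reach the Java side differently is to look at how each value is pickled. This is only a sketch using the standard `pickle` module directly; the helper name `pickles_as_plain_float` is made up, and how the unpickling library then maps the NumPy form to Java objects is not shown here.

```python
import pickle

import numpy as np


def pickles_as_plain_float(value):
    """Return True if the pickle stream contains no reference to the numpy package."""
    return b"numpy" not in pickle.dumps(value)


# A built-in float is written as a bare float opcode.
print(pickles_as_plain_float(0.5))

# A NumPy scalar is written as a call into numpy that rebuilds the scalar.
print(pickles_as_plain_float(np.float64(0.5)))
```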

@vruusmann vruusmann reopened this Mar 18, 2019

@xiaowei1234


commented Mar 20, 2019

Yes, this was the cause of my problem too (#145).

I suspect a lot of people run into this when they try to create a PMML file.
