can't decode byte 0xd4 in position 2: invalid continuation byte? #4

Closed
liubinpoem opened this issue Oct 12, 2018 · 14 comments

Comments

@liubinpoem

I downloaded your project and tried to run the example, but I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2: invalid continuation byte

I don't know what the problem is. Any suggestions?

Thanks a lot.

@vruusmann
Member

What do you mean by "run the example"?

Is the mvn clean install command failing with such an exception?

@liubinpoem
Author

liubinpoem commented Oct 12, 2018

I followed your example and generated the JAR file. Then I ran the Python example script and got this error.
The debug information is below:

b'\xca\xae\xd4\xc2 12, 2018 10:58:55 \xc9\xcf\xce\xe7 org.jpmml.sklearn.Main run\r\n\xd0\xc5\xcf\xa2: Parsing PKL..\r\n\xca\xae\xd4\xc2 12, 2018 10:58:55 \xc9\xcf\xce\xe7 org.jpmml.sklearn.Main run\r\n\xd0\xc5\xcf\xa2: Parsed PKL in 19 ms.\r\n\xca\xae\xd4\xc2 12, 2018 10:58:55 \xc9\xcf\xce\xe7 org.jpmml.sklearn.Main run\r\n\xd0\xc5\xcf\xa2: Converting..\r\n\xca\xae\xd4\xc2 12, 2018 10:58:55 \xc9\xcf\xce\xe7 org.jpmml.sklearn.Main run\r\n\xd1\xcf\xd6\xd8: Failed to convert\r\njava.lang.IllegalArgumentException: The value object (Python class com.mycompany.Aggregator) is not a supported Transformer\r\n\tat org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)\r\n\tat com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:616)\r\n\tat com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)\r\n\tat sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:72)\r\n\tat sklearn.Initializer.encodeFeatures(Initializer.java:41)\r\n\tat sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)\r\n\tat sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:202)\r\n\tat org.jpmml.sklearn.Main.run(Main.java:145)\r\n\tat org.jpmml.sklearn.Main.main(Main.java:94)\r\nCaused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer\r\n\tat java.lang.Class.cast(Unknown Source)\r\n\tat org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)\r\n\t... 
8 more\r\n\r\nException in thread "main" java.lang.IllegalArgumentException: The value object (Python class com.mycompany.Aggregator) is not a supported Transformer\r\n\tat org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)\r\n\tat com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:616)\r\n\tat com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)\r\n\tat sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:72)\r\n\tat sklearn.Initializer.encodeFeatures(Initializer.java:41)\r\n\tat sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)\r\n\tat sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:202)\r\n\tat org.jpmml.sklearn.Main.run(Main.java:145)\r\n\tat org.jpmml.sklearn.Main.main(Main.java:94)\r\nCaused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDict to sklearn.Transformer\r\n\tat java.lang.Class.cast(Unknown Source)\r\n\tat org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)\r\n\t... 8 more\r\n'

@liubinpoem
Author

Thanks a lot for your response. I don't know why Aggregator is not a supported transformer. Is there a step that I missed that causes this error?

@vruusmann
Member

If the conversion fails, then the sklearn2pmml.sklearn2pmml function tries to catch and parse the error message using the UTF-8 charset:
https://github.com/jpmml/sklearn2pmml/blob/master/sklearn2pmml/__init__.py#L235
https://github.com/jpmml/sklearn2pmml/blob/master/sklearn2pmml/__init__.py#L239

Apparently, this default charset does not work well for Chinese users that are using Chinese Scikit-Learn error messages.

Maybe it's possible to detect Scikit-Learn's default charset in some way.
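
One possible direction, as a minimal sketch only: try UTF-8 first, then fall back to the encoding preferred by the OS locale, and finally decode with replacement characters so that the real error message is never hidden behind a UnicodeDecodeError. The helper name `decode_process_output` and its `encodings` parameter are hypothetical and are not part of sklearn2pmml; `locale.getpreferredencoding` is from the Python standard library.

```python
import locale


def decode_process_output(data, encodings=None):
    """Decode bytes captured from a child process, trying several charsets.

    Hypothetical helper for illustration; not part of sklearn2pmml.
    """
    if encodings is None:
        # Try UTF-8 first, then whatever the OS locale prefers
        # (assumption: this matches what the Java process wrote).
        encodings = ["utf-8", locale.getpreferredencoding(False)]
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong charset or unknown codec name: try the next candidate.
            continue
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return data.decode("utf-8", errors="replace")
```

With this kind of fallback, the ASCII part of the Java stack trace (for example "is not a supported Transformer") stays readable even when the localized log prefix cannot be decoded.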

@liubinpoem
Author

Thanks a lot for your response. Actually, I'm not using a Chinese Scikit-Learn. Scikit-Learn comes from Anaconda, and there are no Chinese characters in it. I'm very confused. Can you run the example correctly?

@doushi19961117

Actually, I encountered the same problem. Although the reported error is a UnicodeDecodeError, the real cause is that this transformer is not supported. I do not know what I should do.

@doushi19961117

I solved this problem by modifying the Java files: I added a main function to each Java file. The example now runs correctly.

@vruusmann
Member

I solved this problem by modifying the java file

@doushi19961117 Which Java file exactly did you modify, and in what way?

If your pipeline contains an unsupported transformer/model type, then the underlying JPMML-SkLearn throws a "class X is not supported" exception. If the character encoding of your Scikit-Learn/OS is something other than UTF-8 (some Chinese or Arabic encoding), then the error.decode("UTF-8") function call fails with a UnicodeDecodeError.
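
The failure can be reproduced with the first four bytes of the debug output quoted earlier in this thread. The sketch below assumes those bytes are GBK-encoded, which is a guess based on the reporter's locale and is not confirmed anywhere in the thread.

```python
# First four bytes of the log output quoted earlier in this thread.
raw = b"\xca\xae\xd4\xc2"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xca 0xae happens to form a valid two-byte UTF-8 sequence, but 0xd4
    # starts another two-byte sequence and 0xc2 is not a continuation byte.
    message = str(exc)
    print(message)
    # 'utf-8' codec can't decode byte 0xd4 in position 2: invalid continuation byte

# Assumption: the Java process wrote its log using the GBK charset.
text = raw.decode("gbk")
print(text)  # the month name in the localized log timestamp
```

Under that assumption the bytes decode to the Chinese word for "October", which is consistent with the "12, 2018" timestamp that follows them in the log.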

I don't have access to such "exotic" compute platforms, so I'm not in a very good position to figure out/implement a fix to this issue.

@liubinpoem
Author

@doushi19961117 How did you modify the Java file? Could you please give more details? Your method would be super helpful to me!

@liubinpoem
Author

I tried adding the Java files of the plugin example to the jpmml-sklearn project, and I also modified the sklearn2pmml.properties file. But I still get the error that the custom Aggregator is not a supported transformer.

@liubinpoem
Author

@doushi19961117 Thanks a lot for your comment. I also added a main function to each Java file, and now I can run the example correctly. Much appreciated! This solution is amazing.

@doushi19961117

@liubinpoem I am also working on a custom transformer now. Can you leave your contact details below?

@liubinpoem
Author

@doushi19961117 My email is liupoem at 126.com. I hope we can discuss this problem.

@vruusmann
Member

This issue has been handled in the SkLearn2PMML project a long time ago.
