IllegalArgumentException: Cannot create PyString with non-byte value #90

Open
wzfxue opened this Issue Sep 19, 2017 · 8 comments

wzfxue commented Sep 19, 2017

2017-09-19 13:02:58 INFO jython script : OracleExtract:-2 - Found 43 record for partition info
2017-09-19 13:02:59 INFO jython script : OracleExtract:-2 - Found 2535 record for index info
2017-09-19 13:03:05 INFO jython script : OracleExtract:-2 - 2752 Table records generated
2017-09-19 13:03:05 INFO jython script : OracleExtract:-2 - Collecting table info [13:02:21 -> 13:03:05]
2017-09-19 13:03:07 ERROR Job Launcher:87 - Traceback (most recent call last):
File "<iostream>", line 372, in <module>
File "<iostream>", line 330, in run
File "<iostream>", line 301, in write_csv
File "/home/cloudera/wherehows/backend-service-1.0-SNAPSHOT/lib/jython-standalone-2.7.1.jar/Lib/csv.py", line 148, in writerow
java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
at org.python.core.PyString.<init>(PyString.java:57)
at org.python.core.PyString.<init>(PyString.java:70)
at org.python.core.PyString.<init>(PyString.java:74)
at org.python.modules._csv.PyWriter.writer_writerow(PyWriter.java:167)
at org.python.modules._csv.PyWriter$writer_writerow_exposer.__call__(Unknown Source)
at org.python.core.PyObject.__call__(PyObject.java:484)
at csv$py.writerow$16(/home/cloudera/wherehows/backend-service-1.0-SNAPSHOT/lib/jython-standalone-2.7.1.jar/Lib/csv.py:148)
at csv$py.call_function(/home/cloudera/wherehows/backend-service-1.0-SNAPSHOT/lib/jython-standalone-2.7.1.jar/Lib/csv.py)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyBaseCode.call(PyBaseCode.java:154)
at org.python.core.PyFunction.__call__(PyFunction.java:423)
at org.python.core.PyMethod.__call__(PyMethod.java:141)
at org.python.pycode._pyx0.write_csv$10(<iostream>:302)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyBaseCode.call(PyBaseCode.java:189)
at org.python.core.PyFunction.__call__(PyFunction.java:446)
at org.python.core.PyMethod.__call__(PyMethod.java:171)
at org.python.pycode._pyx0.run$11(<iostream>:340)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyBaseCode.call(PyBaseCode.java:308)
at org.python.core.PyBaseCode.call(PyBaseCode.java:199)
at org.python.core.PyFunction.__call__(PyFunction.java:482)
at org.python.core.PyMethod.instancemethod___call__(PyMethod.java:237)
at org.python.core.PyMethod.__call__(PyMethod.java:228)
at org.python.pycode._pyx0.f$0(<iostream>:380)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1614)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.JythonEtlJob.runScript(JythonEtlJob.java:64)
at metadata.etl.JythonEtlJob.extract(JythonEtlJob.java:35)
at metadata.etl.EtlJob.run(EtlJob.java:97)
at wherehows.common.jobs.Launcher.main(Launcher.java:82)
java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value

at org.python.core.Py.JavaError(Py.java:552)
at org.python.core.PyTableCode.call(PyTableCode.java:180)
at org.python.core.PyBaseCode.call(PyBaseCode.java:154)
at org.python.core.PyFunction.__call__(PyFunction.java:423)
at org.python.core.PyMethod.__call__(PyMethod.java:141)
at org.python.pycode._pyx0.write_csv$10(<iostream>:302)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyBaseCode.call(PyBaseCode.java:189)
at org.python.core.PyFunction.__call__(PyFunction.java:446)
at org.python.core.PyMethod.__call__(PyMethod.java:171)
at org.python.pycode._pyx0.run$11(<iostream>:340)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyBaseCode.call(PyBaseCode.java:308)
at org.python.core.PyBaseCode.call(PyBaseCode.java:199)
at org.python.core.PyFunction.__call__(PyFunction.java:482)
at org.python.core.PyMethod.instancemethod___call__(PyMethod.java:237)
at org.python.core.PyMethod.__call__(PyMethod.java:228)
at org.python.pycode._pyx0.f$0(<iostream>:380)
at org.python.pycode._pyx0.call_function(<iostream>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1614)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
at metadata.etl.JythonEtlJob.runScript(JythonEtlJob.java:64)
at metadata.etl.JythonEtlJob.extract(JythonEtlJob.java:35)
at metadata.etl.EtlJob.run(EtlJob.java:97)
at wherehows.common.jobs.Launcher.main(Launcher.java:82)

Caused by: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value
at org.python.core.PyString.<init>(PyString.java:57)
at org.python.core.PyString.<init>(PyString.java:70)
at org.python.core.PyString.<init>(PyString.java:74)
at org.python.modules._csv.PyWriter.writer_writerow(PyWriter.java:167)
at org.python.modules._csv.PyWriter$writer_writerow_exposer.__call__(Unknown Source)
at org.python.core.PyObject.__call__(PyObject.java:484)
at csv$py.writerow$16(/home/cloudera/wherehows/backend-service-1.0-SNAPSHOT/lib/jython-standalone-2.7.1.jar/Lib/csv.py:148)
at csv$py.call_function(/home/cloudera/wherehows/backend-service-1.0-SNAPSHOT/lib/jython-standalone-2.7.1.jar/Lib/csv.py)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
... 28 more

Stewori Sep 19, 2017

Member

wzfxue,
can you provide a code sample to reproduce this issue?
Maybe some adjustments like in https://github.com/jythontools/jython/pull/28/files#diff-c22ddd1a64f9d775962e621420e6a09d can fix this one.

jeff5 Sep 19, 2017

Member

@Stewori It's clear what's happening, I think, even without a sample, although it would be nice to confirm what kind of str is being written. The csv module is supposed to take bytes (which may be UTF-8), but we are handling these as char, with the failure we've got used to since we got strict. Unfortunately, the regression tests don't test with UTF-8.

@wzfxue is this happening with text you have encoded to UTF-8?

Almost certainly we should use a ByteBuffer where we presently use a StringBuilder, since the file is in binary mode and the user should expect to encode the text before calling csv.writer.writerow(). Using PyStringOrUnicode here would just dig the hole deeper.

When Jython 3 comes along we reverse all that, accepting unicode, using char internally and writing a text mode file. 2.7.2?
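A minimal sketch of "encode the text before calling writerow()", assuming Python 2.7 / Jython 2.7 semantics, where the csv module works on byte strings and the output file is opened in binary mode. `encode_row` is a hypothetical helper, not part of csv or WhereHows; the `text_type` shim only lets the same code load on Python 3, where csv.writer takes str directly and this helper is not needed:

```python
# Text type: `unicode` on Python 2 / Jython 2.7, `str` on Python 3.
try:
    text_type = unicode  # noqa: F821 (name exists only on Python 2)
except NameError:
    text_type = str


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text cell encoded to bytes.

    Non-text cells (ints, None, byte strings, ...) are passed through
    unchanged, so csv.writer formats them as it normally would.
    """
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


# Intended use on Python 2.7 / Jython 2.7 (binary-mode file):
#   import csv
#   with open("out.csv", "wb") as f:
#       csv.writer(f).writerow(encode_row([u"\u4e2d\u6587", 42]))
```

Because only text cells are touched, rows that mix strings with numbers or None can be passed through as they are.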

wzfxue Sep 20, 2017

@Stewori @jeff5
Thank you very much for your help.

I am using this code: https://github.com/linkedin/WhereHows/blob/master/wherehows-etl/src/main/java/metadata/etl/JythonEtlJob.java
Python code: https://github.com/linkedin/WhereHows/blob/master/wherehows-etl/src/main/resources/jython/OracleExtract.py

I tried setting the default encoding to UTF-8:
sys.setdefaultencoding("utf-8")

but the problem still exists.

Stewori Sep 20, 2017

Member

Alright @jeff5, it looks like you have a better ad-hoc understanding of this issue than I do, so I'd better leave it to your assessment... :)

wzfxue Sep 21, 2017

@Stewori @jeff5
The problem occurs with Chinese and full-width characters.

Member

jeff5 commented Oct 21, 2017

Will fix under http://bugs.jython.org/issue2632

wzfxue commented Oct 23, 2017

@jeff5 Thanks!

jeff5 Oct 24, 2017

Member

@wzfxue: Having looked into this, I'm not so sure it is a Jython bug after all. csv.writer expects to be given byte data. Almost certainly, the message results from writing a unicode object here: https://github.com/linkedin/WhereHows/blob/master/wherehows-etl/src/main/resources/jython/OracleExtract.py#L301

I can create a test that does this, and it fails with the same kind of message. The way this module is implemented in Jython results in UTF-16 data being buffered as if it were bytes, and then when we try to write it as bytes, we get this error. If you run this in CPython, you get a message like: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 12: ordinal not in range(128).

The "fixed" behaviour for Jython would be to produce that kind of message. This is not much comfort for you as your application would continue to throw at the same point.

Assuming you are calling write_csv() defined here: https://github.com/linkedin/WhereHows/blob/2e38fff984120946280f746ee3376f7329fbeb21/wherehows-etl/src/main/resources/jython/OracleExtract.py#L294, your answer is to ensure Unicode strings in data_list are replaced by byte-strings in a known encoding (probably utf-8) before you call write_csv(), and that you read the file like that subsequently.
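Reading the file back "like that" means decoding it with the same encoding it was written in. A minimal sketch, assuming the file was written as UTF-8 (or whatever `encoding` names); `read_csv_decoded` is a hypothetical helper, and it branches on the interpreter version because the Python 2 csv module yields byte strings while the Python 3 one yields text:

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def read_csv_decoded(csv_filename, encoding="utf-8"):
    """Read a CSV file written in `encoding`; return rows of text cells."""
    if PY2:
        # Python 2 / Jython 2.7: csv.reader yields byte strings from a
        # binary-mode file, so decode each cell afterwards.
        with open(csv_filename, "rb") as f:
            return [[cell.decode(encoding) for cell in row]
                    for row in csv.reader(f)]
    # Python 3: the text-mode file does the decoding; newline="" is what
    # the csv docs recommend so the reader sees the raw line endings.
    with open(csv_filename, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))
```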

In response to your issue linkedin/WhereHows#754, that project could perhaps create a version of write_csv() that takes an encoding parameter.
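One possible shape for a write_csv() that takes an encoding parameter, sketched as a standalone function. The name `write_csv_encoded` and its signature (`csv_filename`, `columns`, `data_list`, `encoding`) are assumptions for illustration, not WhereHows' actual write_csv(). On Python 2 / Jython 2.7 it encodes unicode cells itself and writes a binary-mode file; on Python 3 it lets a text-mode file do the encoding:

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def write_csv_encoded(csv_filename, columns, data_list, encoding="utf-8"):
    """Write a header row plus data rows, with all text stored as `encoding`."""
    if PY2:
        # Python 2 / Jython 2.7: csv wants byte strings in a binary file,
        # so encode unicode cells here and leave other values alone.
        def encode(cell):
            if isinstance(cell, unicode):  # noqa: F821 (Python 2 only)
                return cell.encode(encoding)
            return cell

        with open(csv_filename, "wb") as f:
            writer = csv.writer(f)
            writer.writerow([encode(c) for c in columns])
            for row in data_list:
                writer.writerow([encode(c) for c in row])
    else:
        # Python 3: csv works on str; the file object encodes on write.
        with open(csv_filename, "w", encoding=encoding, newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(data_list)
```

Keeping the encoding as a parameter (defaulting to UTF-8) lets the caller choose the encoding that the downstream loader expects.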

Let us know if that helps. If I've guessed wrong, could you show us more of your code, or create an isolated demonstration of the problem separate from the WhereHows library?
