Hi, I declared the file encoding as UTF-8 according to PEP 263,
but when a string argument that contains non-ASCII characters is passed in, the result is garbled text.
java.lang.String(text, "utf-8") can be used to resolve the garbled text.
But String is used a lot and this call is a bit cumbersome. Is there a way to simplify it, such as encoding the string according to the script's declared encoding?
If not, is it possible to add a method that simplifies this?
Now with the Greek and French examples I get this mess:
>>> test(gk)
array: array('B', [206, 187, 207, 140, 206, 179, 206, 191, 207, 130])
unicode: u'\u03bb\u03cc\u03b3\u03bf\u03c2'
String: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "utf8string.py", line 18, in test
print "String: ", repr(t)
...
UnicodeEncodeError: 'ms936' codec can't encode character u'\u03cc' in position 1: illegal multibyte sequence
String: >>>
It is evident that the program works, in that the correct bytes end up in gk and a, and characters in the other representations of the text. What goes wrong is only in the output to the terminal, and only with java.lang.String, because it tries to make it text on screen during print, even when the repr is demanded.
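The byte values in the session above can be checked independently of Jython. The following is a minimal sketch in plain Python that uses only the numbers printed in the transcript; the ASCII codec is an assumption, standing in for any console codec that lacks some of the characters (the session itself used ms936).

```python
# UTF-8 bytes exactly as printed by array('B', [...]) in the session above.
raw = bytearray([206, 187, 207, 140, 206, 179, 206, 191, 207, 130])

# Decoding as UTF-8 yields five characters; list their code points in hex.
text = bytes(raw).decode("utf-8")
code_points = ["%04x" % ord(ch) for ch in text]
print(code_points)

# Encoding for output fails when the target codec cannot represent a
# character. ASCII is a stand-in here for the console codec (assumption).
try:
    text.encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The code points match the unicode repr in the transcript, which supports the reading that the data is intact and only the conversion for the terminal fails.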
Actually, I was a bit surprised to find that String is not immediately converted to unicode on creation, but it is probably useful that it isn't. Various actions will treat it like bytes, however, ignoring the upper byte of each UTF-16 code unit, or die in the attempt.
The work-around appears to be, if you can't simply use str and unicode, to be careful how you handle the String.
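One way to be careful, sketched here under assumptions, is to keep the text as unicode and print an ASCII-only rendering that any console codec can encode. The helper names decode_utf8 and safe_repr are hypothetical and not part of Jython; the sketch works on plain byte sequences and unicode text and does not touch java.lang.String itself.

```python
def decode_utf8(raw):
    """Turn a sequence of UTF-8 byte values into unicode text."""
    return bytes(bytearray(raw)).decode("utf-8")


def safe_repr(text):
    """Return an ASCII-only rendering of text, escaping everything else.

    Non-ASCII characters become backslash escapes, so printing the result
    cannot raise UnicodeEncodeError regardless of the console codec.
    """
    return text.encode("ascii", "backslashreplace").decode("ascii")


# Hypothetical usage with the Greek bytes from the session above.
greek = decode_utf8([206, 187, 207, 140, 206, 179, 206, 191, 207, 130])
print(safe_repr(greek))
```

The escaping trades readability for robustness: the output shows escape sequences rather than Greek letters, but it does not depend on what the terminal can display.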
Jython 2.7.2
Java 17