Strange regression with some unicode characters (e.g. with the Russian Н) #65
Comments
Thanks for reporting! This could be related to #62. Will investigate further. |
I have found a new group of evil characters.
|
The issue indeed has the same cause as #62. All of the characters you've listed contain some control bytes when UTF-8 encoded, e.g.
>>> "Н".encode("utf-8")
b'\xd0\x9d' # \x9d is OSC
>>> "қ".encode("utf-8")
b'\xd2\x9b' # \x9b is CSI |
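The check described above can be sketched in a few lines. This is only an illustration of the byte-level observation, not pyte code: the helper name `c1_bytes_in_utf8` and the `C1_CONTROLS` table are hypothetical, and the table covers only the two bytes named in this thread.

```python
# Hypothetical helper, not part of pyte: report which of the two C1 control
# bytes mentioned in this thread occur in a character's UTF-8 encoding.
C1_CONTROLS = {0x9B: "CSI", 0x9D: "OSC"}


def c1_bytes_in_utf8(char):
    """Return the names of the listed C1 control bytes found in char's UTF-8 bytes."""
    return [C1_CONTROLS[b] for b in char.encode("utf-8") if b in C1_CONTROLS]


# "\u041d" is the Cyrillic letter Н, "\u049b" is қ, "\u039d" is the Greek letter Ν.
for ch in ("\u041d", "\u049b", "\u039d", "a"):
    print(repr(ch), ch.encode("utf-8"), c1_bytes_in_utf8(ch))
```

A parser that scans raw bytes for these values before decoding would see a control sequence start in the middle of such a character, which matches the symptom reported in this issue.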
Of course they do. I listed some of them with their codes, and they indeed contain 9d and 9b, as you can see. On the other hand, in the last block I listed another group of characters, and those contain neither 9d nor 9b. That seems to be a different problem. |
The new "unprintable" group seems to be related to the way we do Unicode normalization as all of them (I think) are combining characters. |
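For readers unfamiliar with combining characters, here is a minimal sketch using only the standard `unicodedata` module. The sample string is an assumption chosen for illustration, since the actual list of broken characters is not reproduced in this thread, and `combining_marks` is a hypothetical helper name.

```python
import unicodedata


def combining_marks(text):
    """Return the characters in text whose canonical combining class is non-zero."""
    return [ch for ch in text if unicodedata.combining(ch) != 0]


# Illustrative input only: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(combining_marks(decomposed), len(decomposed))
print(combining_marks(composed), len(composed))
```

Whether normalization is applied per character or to a whole run of text changes what ends up in a single screen cell, which is one plausible place for such characters to go missing.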
What do you think, is there any chance that the bug will be fixed in the next few weeks? Or should I downgrade pyte and use 0.5.2 instead? Can I help somehow? |
The bug is a consequence of delegating input decoding to Screen (see febdad7). I am currently thinking about how to best approach this, can't guarantee the fix would arrive shortly. If you have any ideas, feel free to share them here. |
I can try to find some other broken characters if it can help |
Don't worry, the ones you've already come up with are enough. |
Is there any news about the issue? The problem is that many Japanese and Chinese characters are also corrupted. There are some simple workarounds for Cyrillic and Greek, but things get worse with Japanese and Chinese text. So the issue is a real blocker for using pyte 0.6 in a multilingual environment. |
I am still thinking about how to implement this without making the code too much of a nightmare. I have a prototype in a local branch, but it is not finished yet. Most likely I won't have much time to work on this further until next weekend, so if you have any ideas, feel free to post them here or submit a PR.
Yes, I understand it is critical, but 0.6.0 has not been released yet, so I'd suggest using the latest stable version if you're after correctness. |
I confirm the problem is fixed now! @superbobry you are a genius! Thank you very much! |
Haha, thanks! Glad it works for you :) |
pyte 0.6 has a strange regression with some Unicode characters, particularly with the Russian "Н" character:
That works:
That does not work:
As you can see, the output is empty in the second example (where the printed text contains "Н").
Everything works fine with the 0.5.x version of the module.
Another problematic character: the Greek letter Ν
Some other broken characters:
1b, 1d, 5b, 5d, 9b, 9d seem to be the root of the problem