ANSI doesn't allow nonprintable characters #83
Comments
yes, this is wrong. will see. also for cp437 line-drawing characters, you may be interested in: and maybe also, https://github.com/tehmaze/piece
I take it that the difference between this encoding and the standard cp437 one is that this one maps the first 32 characters to glyphs (smiley face, etc.) whereas the standard one leaves them with the same ordinal values?
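To make the difference concrete, here is a minimal sketch using only Python's built-in `cp437` codec. The `ART_GLYPHS` table and `decode_art()` helper are hypothetical illustrations of the "glyphs for the low bytes" idea, not the actual "cp437_art" codec, and only three entries are filled in.

```python
# Standard codec: line-drawing bytes decode to box-drawing code points.
box = b"\xda\xc4\xbf".decode("cp437")

# Standard codec: control bytes keep their ordinal values (U+0001, U+0002, ...).
ctrl = b"\x01\x02\x03".decode("cp437")

# Hypothetical partial table: the glyphs CP437 displays for bytes 1-3
# (white smiley, black smiley, heart). A real "art" codec would cover 0x00-0x1F.
ART_GLYPHS = {0x01: "\u263a", 0x02: "\u263b", 0x03: "\u2665"}


def decode_art(data):
    """Decode bytes as CP437, substituting glyphs for the bytes in ART_GLYPHS."""
    return "".join(
        ART_GLYPHS.get(b, bytes([b]).decode("cp437")) for b in data
    )
```

With this sketch, `decode_art(b"\x01A\xc4")` gives a smiley, the letter A, and a horizontal line character, whereas the standard codec would return U+0001 for the first byte.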
Thanks, but I think that wouldn't work so well for me as it wants me to give it a file, and it parses it all in one hit, whereas I need to feed it data read from a serial interface a byte at a time and process the parsed result.
For streaming terminal emulation with "screen region" access, I'd then also recommend pyte, see
oh yes, and you are correct: "cp437_art" is the one where the control characters are smileys, etc.
While I'm working on issue #84 - adding support for Unicode - is it okay if I just get rid of this check for whether the character is printable? If anyone really wanted to exclude non-printable characters, I think (haven't confirmed) that, with my fix for issue #84, they should be able to specify something like codec="ascii", and with the default setting of codec_errors="replace", most non-printable characters should get replaced.
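As a rough check of that idea, the snippet below uses only the standard library's `bytes.decode()` with the `"replace"` error handler. It does not call pexpect, so whether the proposed `codec` and `codec_errors` arguments behave identically is an assumption.

```python
# Two CP437 line-drawing bytes (0xC4, 0xB3) followed by an ASCII BEL (0x07).
data = b"ok\xc4\xb3\x07"

# Bytes outside the ASCII range are replaced with U+FFFD;
# bytes inside it, including control characters, pass through unchanged.
text = data.decode("ascii", errors="replace")
```

So bytes of 0x80 and above would be replaced, but ASCII control characters such as BEL would still reach the screen, which fits the "most" in the comment above.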
That is correct, technically a utf-8 byte sequence would fail "printable". It's up to the decoder to raise UnicodeDecodeError, etc.
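A small sketch of both halves of that statement. The `is_printable_ascii()` helper is an assumed shape for the old check (the original lines are not quoted in this thread), written with `string.printable` from the standard library.

```python
import string


def is_printable_ascii(ch):
    """Assumed form of the old check: accept only ASCII printable characters."""
    return ch in string.printable


# "é" is two bytes in UTF-8 (0xC3 0xA9); neither passes a per-byte printable test.
encoded = "\u00e9".encode("utf-8")
flags = [is_printable_ascii(chr(b)) for b in encoded]

# A truncated sequence is rejected by the decoder itself, not by the screen code.
try:
    b"\xc3".decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True
```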
This commit updates the screen and ANSI modules to support Unicode under Python 2.x. Under Python 3.x, it was already supported because strings are Unicode by default. Now, on both Python versions:
- The constructors accept a codec name (defaults to 'latin-1') and a scheme for handling encoding/decoding errors (defaults to 'replace'). The codec may be set to None to inhibit encoding/decoding.
- Unicode is now used internally for storing the screen contents.
- Methods that accept input characters will, if passed input of type 'bytes' (or, under Python 2.x, 'str'), use the specified codec to decode the input, otherwise treating it as Unicode.
- Methods that return screen contents now return Unicode, with the exception of __str__() under Python 2.x, and __bytes__() in all versions of Python, which return the screen contents encoded using the specified codec.
These changes are designed to work only with Python 2.6, 2.7, and 3.3 and later, specifically versions that provide both b'' and u'' string literals. The check in ANSI for characters being printable is also removed, as this prevents non-ASCII characters being accepted, which is not compatible with the goal of adding Unicode support. This addresses issue pexpect#83.
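The behaviour described in the commit message can be sketched roughly as follows, for Python 3 only. `ScreenSketch`, `put()`, and the argument names `codec` and `codec_errors` are hypothetical stand-ins taken from the wording in this thread, not the actual pexpect classes or signatures, and the `codec=None` option is left out.

```python
class ScreenSketch:
    """Hypothetical stand-in for a screen that stores Unicode internally."""

    def __init__(self, codec="latin-1", codec_errors="replace"):
        self.codec = codec
        self.codec_errors = codec_errors
        self.cells = []  # one Unicode character per cell

    def _decode(self, s):
        # Bytes input is decoded with the configured codec;
        # anything else is treated as Unicode already.
        if isinstance(s, bytes):
            return s.decode(self.codec, self.codec_errors)
        return s

    def put(self, s):
        self.cells.extend(self._decode(s))

    def __str__(self):
        # Screen contents are returned as Unicode.
        return "".join(self.cells)

    def __bytes__(self):
        # The bytes form is encoded with the same codec.
        return str(self).encode(self.codec, self.codec_errors)
```

For example, with `codec="cp437"`, calling `put(b"\xc4")` stores the horizontal line character U+2500, and `bytes(screen)` encodes it back to `b"\xc4"`.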
Filed pull request #96 which includes a fix for this issue.
Closing. pexpect's terminal emulation code remains in the next release but is no longer being improved; it is marked deprecated by #240. I suggest that any terminal emulation / screen scraping efforts move to more concerted projects such as https://github.com/selectel/pyte
In 345eb58, in write_ch():
I would like non-printable characters to be accepted, since I am dealing with data streams that include CP437 line drawing characters.
Obviously it's easy to comment out the above lines, but I think it might make more sense for non-printable characters to be accepted, and to leave it up to the caller to decide what to do with any that they find on the virtual "screen" as they see fit. Filtering them out, as is currently done, means that other characters do not appear in their correct locations on the screen, unless this is meant to be a way to filter out line noise, in which case you don't want the cursor to be moved when they are filtered out?
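To illustrate the positioning problem, here is a minimal sketch. `render()` is a hypothetical helper that mimics dropping non-printable characters without advancing the cursor. It is an assumed shape for the check, not the code from write_ch().

```python
import string


def render(row_bytes, drop_nonprintable):
    """Lay out one row of CP437 bytes, optionally skipping non-printables."""
    cells = []
    for b in row_bytes:
        ch = bytes([b]).decode("cp437")
        if drop_nonprintable and ch not in string.printable:
            continue  # character is discarded and the cursor does not advance
        cells.append(ch)
    return "".join(cells)


# A vertical line (0xB3), a space, "A", a space, and another vertical line.
row = b"\xb3 A \xb3"
kept = render(row, drop_nonprintable=False)
dropped = render(row, drop_nonprintable=True)
```

With the line-drawing bytes kept, "A" sits in column 2. With them dropped, it moves to column 1, so everything after a filtered character ends up one cell to the left.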