ANSI doesn't allow nonprintable characters #83

Closed
dcoshea opened this issue Jul 2, 2014 · 8 comments
@dcoshea

dcoshea commented Jul 2, 2014

In 345eb58, in write_ch():

if ch not in string.printable:
    fout = open ('log', 'a')
    fout.write ('Nonprint: ' + str(ord(ch)) + '\n')
    fout.close()
    return

I would like non-printable characters to be accepted, since I am dealing with data streams that include CP437 line drawing characters.

Obviously it's easy to comment out the above lines, but I think it would make more sense for non-printable characters to be accepted, leaving it up to the caller to decide what to do with any that they find on the virtual "screen". Filtering them out, as is currently done, means that the characters following them do not appear in their correct locations on the screen. If the check is instead meant to filter out line noise, then presumably you don't want the cursor to move when a character is filtered out?
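
A minimal sketch of the effect described above, using only the standard library. It does not call pexpect; it only repeats the same `string.printable` membership test on decoded CP437 box-drawing bytes and shows how dropping characters shifts the columns of whatever follows. The sample bytes and variable names are made up for illustration.

```python
import string

# Three CP437 bytes that form the top edge of a double-line box.
raw = b"\xc9\xcd\xbb"
box = raw.decode("cp437")

# Same membership test as the check quoted from write_ch().
rejected = [ch for ch in box if ch not in string.printable]

# A row with a vertical bar on each side of two letters.
row = b"\xb3ab\xb3".decode("cp437")
filtered = "".join(ch for ch in row if ch in string.printable)

# 'a' belongs in column 1, but lands in column 0 once the bar is dropped.
print(row.index("a"), filtered.index("a"))
```

All three box characters are rejected by the test, and in the second row the letters slide one column to the left because the leading bar never reaches the screen.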

@jquast
Member

jquast commented Jul 11, 2014

yes, this is wrong. will see.

also for cp437 line-drawing characters, you may be interested in:
https://github.com/jquast/x84/blob/master/x84/encodings/cp437_art.py

and maybe also, https://github.com/tehmaze/piece

@jquast jquast added the bug label Jul 11, 2014
@dcoshea
Author

dcoshea commented Jul 14, 2014

> also for cp437 line-drawing characters, you may be interested in:
> https://github.com/jquast/x84/blob/master/x84/encodings/cp437_art.py

I take it that the difference between this encoding and the standard cp437 one is that this one maps the first 32 characters to glyphs (smiley face, etc.) whereas the standard one leaves them with the same ordinal values?
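
To make the question concrete, here is a rough sketch of the difference being asked about. Python's stock `cp437` codec decodes the low bytes to the control characters with the same ordinal values. The `ART_GLYPHS` table and `decode_art()` helper are hypothetical names covering only three bytes; they are not taken from x84's `cp437_art.py`, just an illustration of mapping low bytes to glyphs instead.

```python
# Python's stock codec keeps the low range as control characters.
plain = b"\x01".decode("cp437")

# Hypothetical remapping for a handful of low bytes (illustration only).
ART_GLYPHS = {
    0x01: u"\u263a",  # white smiling face
    0x02: u"\u263b",  # black smiling face
    0x03: u"\u2665",  # heart suit
}


def decode_art(data):
    """Decode CP437 bytes, then swap selected control characters for glyphs."""
    return data.decode("cp437").translate(ART_GLYPHS)


art = decode_art(b"\x01 hi \x03")
```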

> and maybe also, https://github.com/tehmaze/piece

Thanks, but I think that wouldn't work so well for me as it wants me to give it a file, and it parses it all in one hit, whereas I need to feed it data read from a serial interface a byte at a time and process the parsed result.
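
For the decoding half of that byte-at-a-time requirement, the standard library's incremental decoders may be enough. This is only a sketch: `decode_stream()` is a hypothetical helper, and the list of one-byte chunks stands in for whatever a serial read loop would actually deliver.

```python
import codecs


def decode_stream(chunks, codec="cp437"):
    """Yield decoded text as each chunk arrives, buffering partial sequences."""
    decoder = codecs.getincrementaldecoder(codec)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush anything still buffered when the stream ends.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Single-byte codec: every byte decodes immediately.
cp437_out = list(decode_stream([b"\xc9", b"a"]))

# Multi-byte codec: nothing is yielded until the sequence is complete.
utf8_out = list(decode_stream([b"\xe2", b"\x94", b"\x80"], codec="utf-8"))
```

The decoded characters could then be handed to the screen one at a time, which keeps the caller in control of when the parsed result is inspected.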

@jquast
Member

jquast commented Jul 16, 2014

For streaming terminal emulation with "screen region" access, I would also recommend pyte; see the stream.feed() call in the example at https://github.com/selectel/pyte/blob/master/examples/helloworld.py

@jquast
Member

jquast commented Jul 16, 2014

oh yes, and you are correct -- in "cp437_art" the control characters are mapped to smileys, etc.

@dcoshea
Author

dcoshea commented Jul 23, 2014

While I'm working on issue #84 - adding support for Unicode - is it okay if I just get rid of this check for whether the character is printable? If anyone really wanted to exclude non-printable characters, I think (haven't confirmed) that, with my fix for issue #84, they should be able to specify something like codec="ascii", and with the default setting of codec_errors="replace", most non-printable characters should get replaced.
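
A quick check of that idea with plain `bytes.decode()`, which is presumably what a `codec`/`codec_errors` pair would end up calling (that is an assumption, not something confirmed from the patch). The sample bytes are arbitrary.

```python
# One byte above 0x7F, two letters, and an ASCII control character (BEL).
data = b"\xc9ok\x07"

text = data.decode("ascii", "replace")

# The high byte becomes U+FFFD; BEL is valid ASCII, so it passes through.
replaced = text.count(u"\ufffd")
kept_control = u"\x07" in text
```

This suggests why "most" is the right word: bytes outside the ASCII range are replaced, while control characters below 0x80 are still valid ASCII and would have to be filtered separately.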

@jquast
Member

jquast commented Jul 24, 2014

That is correct. Technically, a UTF-8 byte sequence would fail the "printable" check; it's up to the decoder to raise UnicodeDecodeError, etc.
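
Both halves of that statement can be seen with the standard library alone. The sketch below encodes one box-drawing character to UTF-8, applies the `string.printable` test to each byte, and then shows a strict decoder rejecting a truncated sequence. The variable names are illustrative.

```python
import string

# U+2554 encodes to three bytes in UTF-8.
encoded = u"\u2554".encode("utf-8")

# Treat each byte as a character, as a byte-oriented check would.
printable_flags = [chr(b) in string.printable for b in encoded]

# A strict decoder raises when the sequence is cut short.
try:
    encoded[:2].decode("utf-8")
    error_raised = False
except UnicodeDecodeError:
    error_raised = True
```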

dcoshea pushed a commit to dcoshea/pexpect that referenced this issue Jul 24, 2014
This commit updates the screen and ANSI modules to support Unicode
under Python 2.x.  Under Python 3.x, it was already supported because
strings are Unicode by default.  Now, on both Python versions:

- The constructors accept a codec name (defaults to 'latin-1') and a
  scheme for handling encoding/decoding errors (defaults to
  'replace').  The codec may be set to None to inhibit
  encoding/decoding.

- Unicode is now used internally for storing the screen contents.

- Methods that accept input characters will, if passed input of type
  'bytes' (or, under Python 2.x, 'str'), use the specified codec to
  decode the input, otherwise treating it as Unicode.

- Methods that return screen contents now return Unicode, with the
  exception of __str__() under Python 2.x, and __bytes__() in all
  versions of Python, which return the screen contents encoded using
  the specified codec.

These changes are designed to work only with Python 2.6, 2.7, and 3.3
and later, specifically versions that provide both b'' and u'' string
literals.

The check in ANSI for characters being printable is also removed, as
this prevents non-ASCII characters being accepted, which is not
compatible with the goal of adding Unicode support.  This addresses
issue pexpect#83.
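
The behaviour listed in the commit message can be sketched roughly as follows. `TinyScreen` is a hypothetical one-row stand-in written for Python 3, not pexpect's actual screen or ANSI class; only the `codec` and `codec_errors` parameter names and the `latin-1`/`replace` defaults come from the message, and the `codec=None` case is left out to keep the sketch short.

```python
class TinyScreen:
    """Hypothetical one-row screen; not pexpect's real implementation."""

    def __init__(self, cols=8, codec="latin-1", codec_errors="replace"):
        self.codec = codec
        self.codec_errors = codec_errors
        self.cells = [u" "] * cols  # contents are stored as Unicode
        self.col = 0

    def write(self, data):
        # Bytes are decoded with the configured codec; text is used as-is.
        if isinstance(data, bytes):
            data = data.decode(self.codec, self.codec_errors)
        for ch in data:
            if self.col < len(self.cells):
                self.cells[self.col] = ch
                self.col += 1

    def text(self):
        # Screen contents are returned as Unicode.
        return u"".join(self.cells)

    def __bytes__(self):
        # The bytes form is the contents encoded with the same codec.
        return self.text().encode(self.codec, self.codec_errors)


screen = TinyScreen(cols=4, codec="cp437")
screen.write(b"\xc9\xcd")  # bytes input, decoded as CP437
screen.write(u"ab")        # text input, stored directly
```

Note that no printable check appears anywhere: every decoded character takes up a cell, so later characters keep their columns.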
@dcoshea
Author

dcoshea commented Jul 24, 2014

Filed pull request #96 which includes a fix for this issue.

dcoshea pushed a commit to dcoshea/pexpect that referenced this issue Aug 5, 2014
@jquast
Member

jquast commented Sep 19, 2015

Closing. pexpect's terminal emulation code remains in the next release but is no longer being improved; it was marked deprecated by #240. I suggest that any terminal emulation / screen scraping efforts move to more concerted projects such as https://github.com/selectel/pyte

@jquast jquast closed this as completed Sep 19, 2015