New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode content crashes the pager (console) #2738

Merged
merged 2 commits into from Jan 5, 2013

Conversation

Projects
None yet
3 participants
@mdboom
Contributor

mdboom commented Jan 3, 2013

We've run into an interesting bug in the astropy project.

astropy/astropy#600

When displaying a docstring that contains Unicode and is also long enough that it gets sent to the pager it fails since the docstring can't be sent to the pager as ascii. This crashes in the middle of sending content to the pager, so the shell ends up in an inconsistent state and stops echoing the keyboard etc.

The fix (attached) is merely to encode the content sent to the pager in the same encoding as the terminal (sys.stdout.encoding). Strictly speaking, this isn't always the right thing to do, since the pager may be configured to expect a different encoding than the terminal, but that is sort of an irrational way to configure a machine... ;) For example, less, in the absence of any special environment variables to tell it otherwise, uses the standard LC* environment variables to determine what to do, which should be the same mechanism the terminal also uses by default.

If anyone can suggest a better fix, I'm all for it. Perhaps it should be configurable, defaulting to sys.stdout.encoding?

@takluyver

This comment has been minimized.

Show comment
Hide comment
@takluyver

takluyver Jan 3, 2013

Member

If the string sent to page() is a Python 2 str, calling .encode() on it might throw a UnicodeDecodeError. In Python 3, the handle returned by os.popen() expects unicode str objects, so we shouldn't encode if we keep using that.

As a quick fix, I think py3compat.cast_bytes_py2 would work. But if you've got time, could you refactor the relevant bit of code to use subprocess.Popen, which should handle bytes on both Python versions?

Member

takluyver commented Jan 3, 2013

If the string sent to page() is a Python 2 str, calling .encode() on it might throw a UnicodeDecodeError. In Python 3, the handle returned by os.popen() expects unicode str objects, so we shouldn't encode if we keep using that.

As a quick fix, I think py3compat.cast_bytes_py2 would work. But if you've got time, could you refactor the relevant bit of code to use subprocess.Popen, which should handle bytes on both Python versions?

@bfroehle

This comment has been minimized.

Show comment
Hide comment
@bfroehle

bfroehle Jan 3, 2013

Contributor

Unfortunately, this still leaves us with the same issue if the unicode string cannot be encoded using the sys.stdout.encoding codec.

I think we should do something like

pager.write(string.encode(..., errors='replace'))

It's not clear to me that sys.stdout.encoding is the right codec to use... why not pager.encoding?

Contributor

bfroehle commented Jan 3, 2013

Unfortunately, this still leaves us with the same issue if the unicode string cannot be encoded using the sys.stdout.encoding codec.

I think we should do something like

pager.write(string.encode(..., errors='replace'))

It's not clear to me that sys.stdout.encoding is the right codec to use... why not pager.encoding?

@bfroehle

This comment has been minimized.

Show comment
Hide comment
@bfroehle

bfroehle Jan 3, 2013

Contributor

Also this would be "fixed" (although still broken) if it was a try/finally block:

pager = ...
try:
    pager.write(...)
finally:
    pager.close()
Contributor

bfroehle commented Jan 3, 2013

Also this would be "fixed" (although still broken) if it was a try/finally block:

pager = ...
try:
    pager.write(...)
finally:
    pager.close()
@mdboom

This comment has been minimized.

Show comment
Hide comment
@mdboom

mdboom Jan 3, 2013

Contributor

Ok. I've updated this to use py3compat.cast_bytes_py2, which does "replace" under the hood, so that should also address @bfroehle's concern.

On Python 2, pager.encoding is None which would imply using sys.getdefaultencoding, which (at least on my Linux box here) is 'ascii', so we haven't advanced the ball. I've updated this to use pager.encoding or sys.stdout.encoding which is probably the smarter thing to do.

I did add the try ... finally block, so that at least the pager gets closed if something fails.

I attempted the refactor to subprocess.Popen, but I was seeing a change in behavior. At present, if one presses Ctrl+C out of the pager to close it, one is returned to the IPython console intact (albeit with a KeyboardInterrupt exception traceback displayed). When switching to subprocess.Popen, this would close the pager, leaving the console in a state where the arrow keys and keyboard echoing stopped working. I tinkered for a while, but couldn't get things to work in the old way. Maybe we can file a new issue to use subprocess.Popen in the future, but for now, this change works on Python 2 and Python 3 fine without changing any existing behavior, as far as I can tell.

Contributor

mdboom commented Jan 3, 2013

Ok. I've updated this to use py3compat.cast_bytes_py2, which does "replace" under the hood, so that should also address @bfroehle's concern.

On Python 2, pager.encoding is None which would imply using sys.getdefaultencoding, which (at least on my Linux box here) is 'ascii', so we haven't advanced the ball. I've updated this to use pager.encoding or sys.stdout.encoding which is probably the smarter thing to do.

I did add the try ... finally block, so that at least the pager gets closed if something fails.

I attempted the refactor to subprocess.Popen, but I was seeing a change in behavior. At present, if one presses Ctrl+C out of the pager to close it, one is returned to the IPython console intact (albeit with a KeyboardInterrupt exception traceback displayed). When switching to subprocess.Popen, this would close the pager, leaving the console in a state where the arrow keys and keyboard echoing stopped working. I tinkered for a while, but couldn't get things to work in the old way. Maybe we can file a new issue to use subprocess.Popen in the future, but for now, this change works on Python 2 and Python 3 fine without changing any existing behavior, as far as I can tell.

@takluyver

This comment has been minimized.

Show comment
Hide comment
@takluyver

takluyver Jan 3, 2013

Member

Thanks, that looks OK to me. @bfroehle , I'll let you have another look, and you can merge away if you're happy with it.

Member

takluyver commented Jan 3, 2013

Thanks, that looks OK to me. @bfroehle , I'll let you have another look, and you can merge away if you're happy with it.

@bfroehle

View changes

Show outdated Hide outdated IPython/core/page.py Outdated
@bfroehle

This comment has been minimized.

Show comment
Hide comment
@bfroehle

bfroehle Jan 3, 2013

Contributor

Looks okay to me, with the one exception of the new raise in the OSError block. Why was that added?

Contributor

bfroehle commented Jan 3, 2013

Looks okay to me, with the one exception of the new raise in the OSError block. Why was that added?

@mdboom

This comment has been minimized.

Show comment
Hide comment
@mdboom

mdboom Jan 4, 2013

Contributor

Sorry -- the new raise went in by accident. I had it in there to test what happens when the pager disappears out from under (through Ctrl+C or killing its process). I've push an amended commit.

Contributor

mdboom commented Jan 4, 2013

Sorry -- the new raise went in by accident. I had it in there to test what happens when the pager disappears out from under (through Ctrl+C or killing its process). I've push an amended commit.

bfroehle added a commit that referenced this pull request Jan 5, 2013

Merge pull request #2738 from mdboom/pager-encoding
Unicode content crashes the pager (console)

@bfroehle bfroehle merged commit 66274d0 into ipython:master Jan 5, 2013

1 check passed

default The Travis build passed
Details
@bfroehle

This comment has been minimized.

Show comment
Hide comment
@bfroehle

bfroehle Jan 5, 2013

Contributor

Thanks again @mdboom.

Contributor

bfroehle commented Jan 5, 2013

Thanks again @mdboom.

minrk added a commit that referenced this pull request Mar 5, 2013

Backport PR #2738: Unicode content crashes the pager (console)
We've run into an interesting bug in the astropy project.

astropy/astropy#600

When displaying a docstring that contains Unicode and is also long enough that it gets sent to the pager it fails since the docstring can't be sent to the pager as ascii.  This crashes in the middle of sending content to the pager, so the shell ends up in an inconsistent state and stops echoing the keyboard etc.

The fix (attached) is merely to encode the content sent to the pager in the same encoding as the terminal (`sys.stdout.encoding`).  Strictly speaking, this isn't always the right thing to do, since the pager may be configured to expect a different encoding than the terminal, but that is sort of an irrational way to configure a machine... ;)  For example, `less`, in the absence of any special environment variables to tell it otherwise, uses the standard `LC*` environment variables to determine what to do, which should be the same mechanism the terminal also uses by default.

If anyone can suggest a better fix, I'm all for it.  Perhaps it should be configurable, defaulting to `sys.stdout.encoding`?

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#2738 from mdboom/pager-encoding
Unicode content crashes the pager (console)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment