Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode content crashes the pager (console) #2738

Merged
merged 2 commits into from Jan 5, 2013

Conversation

mdboom
Copy link
Contributor

@mdboom mdboom commented Jan 3, 2013

We've run into an interesting bug in the astropy project.

astropy/astropy#600

When displaying a docstring that contains Unicode and is also long enough that it gets sent to the pager it fails since the docstring can't be sent to the pager as ascii. This crashes in the middle of sending content to the pager, so the shell ends up in an inconsistent state and stops echoing the keyboard etc.

The fix (attached) is merely to encode the content sent to the pager in the same encoding as the terminal (sys.stdout.encoding). Strictly speaking, this isn't always the right thing to do, since the pager may be configured to expect a different encoding than the terminal, but that is sort of an irrational way to configure a machine... ;) For example, less, in the absence of any special environment variables to tell it otherwise, uses the standard LC* environment variables to determine what to do, which should be the same mechanism the terminal also uses by default.

If anyone can suggest a better fix, I'm all for it. Perhaps it should be configurable, defaulting to sys.stdout.encoding?

@takluyver
Copy link
Member

If the string sent to page() is a Python 2 str, calling .encode() on it might throw a UnicodeDecodeError. In Python 3, the handle returned by os.popen() expects unicode str objects, so we shouldn't encode if we keep using that.

As a quick fix, I think py3compat.cast_bytes_py2 would work. But if you've got time, could you refactor the relevant bit of code to use subprocess.Popen, which should handle bytes on both Python versions?

@bfroehle
Copy link
Contributor

bfroehle commented Jan 3, 2013

Unfortunately, this still leaves us with the same issue if the unicode string cannot be encoded using the sys.stdout.encoding codec.

I think we should do something like

pager.write(string.encode(..., errors='replace'))

It's not clear to me that sys.stdout.encoding is the right codec to use... why not pager.encoding?

@bfroehle
Copy link
Contributor

bfroehle commented Jan 3, 2013

Also this would be "fixed" (although still broken) if it was a try/finally block:

pager = ...
try:
    pager.write(...)
finally:
    pager.close()

@mdboom
Copy link
Contributor Author

mdboom commented Jan 3, 2013

Ok. I've updated this to use py3compat.cast_bytes_py2, which does "replace" under the hood, so that should also address @bfroehle's concern.

On Python 2, pager.encoding is None which would imply using sys.getdefaultencoding, which (at least on my Linux box here) is 'ascii', so we haven't advanced the ball. I've updated this to use pager.encoding or sys.stdout.encoding which is probably the smarter thing to do.

I did add the try ... finally block, so that at least the pager gets closed if something fails.

I attempted the refactor to subprocess.Popen, but I was seeing a change in behavior. At present, if one presses Ctrl+C out of the pager to close it, one is returned to the IPython console intact (albeit with a KeyboardInterrupt exception traceback displayed). When switching to subprocess.Popen, this would close the pager, leaving the console in a state where the arrow keys and keyboard echoing stopped working. I tinkered for a while, but couldn't get things to work in the old way. Maybe we can file a new issue to use subprocess.Popen in the future, but for now, this change works on Python 2 and Python 3 fine without changing any existing behavior, as far as I can tell.

@takluyver
Copy link
Member

Thanks, that looks OK to me. @bfroehle , I'll let you have another look, and you can merge away if you're happy with it.

retval = None
else:
retval = 1
except OSError:
# Other strange problems, sometimes seen in Win2k/cygwin
retval = 1
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new raise here... why?

@bfroehle
Copy link
Contributor

bfroehle commented Jan 3, 2013

Looks okay to me, with the one exception of the new raise in the OSError block. Why was that added?

@mdboom
Copy link
Contributor Author

mdboom commented Jan 4, 2013

Sorry -- the new raise went in by accident. I had it in there to test what happens when the pager disappears out from under (through Ctrl+C or killing its process). I've push an amended commit.

bfroehle added a commit that referenced this pull request Jan 5, 2013
Unicode content crashes the pager (console)
@bfroehle bfroehle merged commit 66274d0 into ipython:master Jan 5, 2013
@bfroehle
Copy link
Contributor

bfroehle commented Jan 5, 2013

Thanks again @mdboom.

minrk added a commit that referenced this pull request Mar 5, 2013
We've run into an interesting bug in the astropy project.

astropy/astropy#600

When displaying a docstring that contains Unicode and is also long enough that it gets sent to the pager it fails since the docstring can't be sent to the pager as ascii.  This crashes in the middle of sending content to the pager, so the shell ends up in an inconsistent state and stops echoing the keyboard etc.

The fix (attached) is merely to encode the content sent to the pager in the same encoding as the terminal (`sys.stdout.encoding`).  Strictly speaking, this isn't always the right thing to do, since the pager may be configured to expect a different encoding than the terminal, but that is sort of an irrational way to configure a machine... ;)  For example, `less`, in the absence of any special environment variables to tell it otherwise, uses the standard `LC*` environment variables to determine what to do, which should be the same mechanism the terminal also uses by default.

If anyone can suggest a better fix, I'm all for it.  Perhaps it should be configurable, defaulting to `sys.stdout.encoding`?
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Unicode content crashes the pager (console)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants