Codepage of output from scripts and shell commands is not handled properly by qtconsole #768

Closed
jstenar opened this Issue · 14 comments

4 participants

@jstenar
Collaborator

On my machine, when running ls in a qtconsole, any non-ASCII characters in the output are garbage (diamond-shaped question marks).

I have a test script at https://gist.github.com/1198529 that can be used to illustrate the problem.

In a regular ipython terminal I get correct result for:

In [1]: %run run-encoding.py cp1252
Test data åäö

But, as expected, I get incorrect results for:

In [2]: %run run-encoding.py cp850
Test data †„”

In [3]: %run run-encoding.py utf-8
Test data åäö

However when running in qtconsole I get incorrect results in all three cases.

/Jörgen

@minrk
Owner

The basic reason is that the 'encoding' associated with the qtconsole is sys.getdefaultencoding(), so just like you get the wrong answer in everything but cp1252 in your Windows terminal, you get the wrong answer in everything but the default encoding (generally ascii) in the qtconsole. The question marks are the result of s.decode(sys.getdefaultencoding(), 'replace').

The general idea is that if you are printing unicode, you should be printing unicode objects, which will behave correctly, not bytes objects, which have discarded the character meaning of their contents.
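
The symptoms reported above can be reproduced outside IPython. This is a minimal Python 3 sketch, not IPython code, and it assumes the test script simply emits the bytes of "åäö" in the requested encoding:

```python
text = "åäö"

# cp850 bytes interpreted as cp1252: every byte maps to some character,
# so there is no error, just the wrong characters.
as_cp1252 = text.encode("cp850").decode("cp1252")

# UTF-8 bytes decoded as ASCII with errors="replace": none of the six
# bytes is valid ASCII, so each one becomes U+FFFD (the diamond).
as_ascii = text.encode("utf-8").decode("ascii", "replace")

# ascii() keeps the printout safe on consoles that cannot show these characters.
print(ascii(as_cp1252))
print(ascii(as_ascii))
```

Printing a text (unicode) object avoids the problem because no guess about the byte encoding is needed on the receiving side.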

@jstenar
Collaborator
@minrk
Owner

I was mistaken: we actually start with sys.stdin.encoding and fall back to getdefaultencoding, but sys.stdin.encoding is often None for subprocesses like the kernel.

In any case, I think if we give the OutStream object (what we replace sys.stdout with) a configurable encoding attr, much of this should be helped, and it would be configurable.
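
A rough sketch of that idea, for illustration only. `EncodingAwareStream` is a hypothetical stand-in, not IPython's actual OutStream, and the `sink` argument is an assumption standing in for wherever the kernel forwards text:

```python
import io
import sys


class EncodingAwareStream:
    """Hypothetical stdout replacement with a configurable encoding."""

    def __init__(self, sink, encoding=None):
        self.sink = sink
        # An explicit setting wins; otherwise keep the old default behavior.
        self.encoding = encoding or sys.getdefaultencoding()

    def write(self, data):
        # Bytes carry no character meaning, so decode them with the
        # configured encoding; text objects pass through unchanged.
        if isinstance(data, bytes):
            data = data.decode(self.encoding, "replace")
        self.sink.write(data)


sink = io.StringIO()
stream = EncodingAwareStream(sink, encoding="cp1252")
stream.write("åäö".encode("cp1252"))
```

With the attribute exposed, a user whose scripts emit cp1252 bytes could set it once instead of relying on whatever sys.stdin.encoding happens to report.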

@takluyver
Owner

It's not entirely clear what the 'correct' encoding is, because we're not limited by the terminal code page. If you do print "åäö", should we assume that to be in the encoding a terminal would force you to use, or UTF-8, or something else?

For external processes, I think we should decode the bytes as we read them from the other process, and assume that it's using the system code page. I thought we already did this, but I guess it must be going wrong somewhere.
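
One way to read "decode as we read" is sketched below in Python 3. The helper name `run_and_decode` is hypothetical, and using locale.getpreferredencoding() as "the system code page" is an assumption that will not hold for every child process:

```python
import locale
import subprocess
import sys


def run_and_decode(argv, encoding=None):
    """Run a command and return its output as text."""
    # Assume the child writes in the system's preferred encoding
    # unless the caller says otherwise.
    encoding = encoding or locale.getpreferredencoding()
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    # "replace" keeps undecodable bytes from raising; they show up as U+FFFD.
    return result.stdout.decode(encoding, "replace")


output = run_and_decode([sys.executable, "-c", "print('Test data')"])
```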

@minrk
Owner

We use sys.stdin.encoding, which can be (and often is for subprocesses) None. If we give the OutStream object an encoding with the same default behavior it currently has, it should improve the situation, allowing users to set it when stdin encoding doesn't tell us anything.

@minrk
Owner

@jstenar, can you check if the code in PR #770 makes the behavior more reasonable for you? It adds checking the locale for encoding information, so if you change the locale, it will change the default interpretation of bytes objects.

@jstenar
Collaborator
@fperez
Owner

I've just merged #770 which supposedly helped with this, but on linux I still see problems. On the terminal I get:

In [4]: %run run-encoding.py utf-8
Test data åäö

but on the qtconsole I see the little question-mark-diamonds:

In [1]: %run run-encoding.py utf-8
Test data ������

So it seems we still have issues, no?

@minrk
Owner

Arg, I switched getpreferredencoding() to getpreferredencoding(False), since I thought it was safer. Turns out the opposite makes the most sense, and fixes this particular case.
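
The lookup order discussed in this thread can be sketched as follows. `guess_encoding` is a hypothetical helper, not the code from PR #770. With its default argument, locale.getpreferredencoding() may call setlocale to pick up the user's preferences; passing False skips that step and reports whatever locale the process currently has:

```python
import locale
import sys


def guess_encoding(stream=None):
    """Pick an encoding for interpreting bytes written to stdout."""
    stream = sys.stdin if stream is None else stream
    candidates = (
        getattr(stream, "encoding", None),  # often None in a subprocess
        locale.getpreferredencoding(),      # do_setlocale defaults to True
        sys.getdefaultencoding(),
    )
    for name in candidates:
        if name:
            return name
    return "ascii"


print(guess_encoding())
```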

@fperez
Owner

@minrk, since #770 is already merged, do you want to just make that change in master? We can then retest this...

@minrk
Owner

Sure, change pushed.

@fperez
Owner

OK, with Min's fix, master does work for me now both at the terminal and the qtconsole. I should note that only utf-8 shows the output correctly; cp1252 still shows the diamonds on linux. But I imagine that's correct on a linux box...

So now that this has been merged, should we close the original issue? @jstenar?

@jstenar
Collaborator
@minrk
Owner

closed by PR #770

@minrk minrk closed this