unicode problem in qtconsole for windows #529

Closed
jstenar opened this Issue Jun 20, 2011 · 17 comments

Member
jstenar commented Jun 20, 2011

Hi,

There is a Unicode problem when running magic commands such as ls from the qtconsole. I see the same problem on both the master and newapp branches.

As you can see below, the characters åäö are displayed incorrectly (the directory with the strange name in the listing).

In [1]: ls
Volymen i enhet C har etiketten Enhet C
Volymens serienummer är 3260-74A6

Innehåll i katalogen C:\python\slask

2011-06-20 18:17 .
2011-06-20 18:17 ..
2011-06-20 18:17 ├Ñ├ñ├Â
0 fil(er) 0 byte
3 katalog(er) 12 938 858 496 byte ledigt

Owner

Note: u"åäö".encode("utf-8").decode("cp850") == u"├Ñ├ñ├Â"
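
The identity in the note can be checked directly. What follows is a minimal sketch in Python 3 syntax (the thread itself uses Python 2, hence the u prefix there); the variable names are illustrative only.

```python
# Reproduce the garbling: UTF-8 bytes misread as cp850.
original = "åäö"
utf8_bytes = original.encode("utf-8")   # two bytes per character, six in total
garbled = utf8_bytes.decode("cp850")    # cp850 maps every byte to one character

# Six bytes become six characters, matching the directory name in the listing.
assert len(utf8_bytes) == 6
assert garbled == "├Ñ├ñ├Â"
```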

Member
jstenar commented Jun 20, 2011

takluyver wrote 2011-06-20 18:40:

Note: u"åäö".encode("utf-8").decode("cp850") == u"├Ñ├ñ├Â"

I believe cp850 is the default for Windows consoles. Perhaps that is the code page used when launching processes from within Python.

/Jörgen

Owner

The default depends on region, so I'm assuming that it produces output in cp850 for you. We must therefore be decoding it correctly, re-encoding as UTF-8, then trying to decode again using cp850.

Can you try the following, to check that the process is actually producing cp850-encoded text:

import subprocess

# Run the Windows "dir" command through the shell and capture its raw output.
p = subprocess.Popen('dir /on', shell=True,
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
p.communicate()    # Let me know the results from this

Member
jstenar commented Jun 20, 2011

takluyver wrote 2011-06-20 22:43:

The default depends on region, so I'm assuming that it produces output in cp850 for you. We must therefore be decoding it correctly, re-encoding as UTF-8, then trying to decode again using cp850.

Can you try the following command, to check that the process is actually producing cp850 encoded text:

p = subprocess.Popen('dir /on', shell=True,
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
p.communicate()    # Let me know the results from this

Setting the code page to 850 before starting Python, I get:

(' Volymen i enhet C har etiketten Enhet C\r\n Volymens serienummer
\x84r 3260-74A6\r\n\r\n Inneh\x86ll i katalogen C:
python\slask\r\n\r\n2011-06-20 18:17 .\r\n2011-06-20 18:17 ..\r\n2011-06-20 18:17 \x86\x84\x94\r\n 0 fil(er) 0 byte\r\n 3 katalog(er) 12\xff799\xff311\xff872 byte ledigt\r\n',
'')

Setting the code page to 1252 before starting Python, I get:
(' Volymen i enhet C har etiketten Enhet C\r\n Volymens serienummer
\xe4r 3260-74A6\r\n\r\n Inneh\xe5ll i katalogen C:
python\slask\r\n\r\n2011-06-20 18:17 .\r\n2011-06-20 18:17 ..\r\n2011-06-20 18:17 \xe5\xe4\xf6\r\n 0 fil(er) 0 byte\r\n 3 katalog(er) 12\xa0799\xa0299\xa0584 byte ledigt\r\n',
'')

So it seems the output is encoded in the code page of the Python process.

/Jörgen
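
A quick check on byte values copied from the two outputs above suggests that both runs describe the same text, each encoded in the code page that was active. This is a sketch in Python 3 syntax; the helper name decode_console_bytes is hypothetical and not part of IPython.

```python
def decode_console_bytes(raw: bytes, codepage: str) -> str:
    """Decode raw subprocess output using the given console code page."""
    return raw.decode(codepage)

# Short fragments copied from the two reported outputs.
from_cp850 = b"Inneh\x86ll \x86\x84\x94 12\xff799"
from_cp1252 = b"Inneh\xe5ll \xe5\xe4\xf6 12\xa0799"

text_850 = decode_console_bytes(from_cp850, "cp850")
text_1252 = decode_console_bytes(from_cp1252, "cp1252")

# Both fragments decode to the same text, including the non-breaking
# space (U+00A0) that dir uses as the thousands separator.
assert text_850 == text_1252 == "Innehåll åäö 12\xa0799"
```

If the wrong code page is passed, the bytes still decode without error but yield different characters, which is why this kind of bug shows up as garbled text rather than as an exception.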

Owner

OK, and can you start the Qt console app, and try print u"åäö".encode(enc) with enc as 'cp850', 'cp1252' and 'utf-8' in turn.

Member
jstenar commented Jun 20, 2011

takluyver wrote 2011-06-21 00:07:

OK, and can you start the Qt console app, and try print u"åäö".encode(enc) with enc as 'cp850', 'cp1252' and 'utf-8' in turn.

Using the master branch, I get:
In [1]: print u"åäö".encode('cp850')
åäö

In [2]: "åäö"
Out[2]: '\xc3\xa5\xc3\xa4\xc3\xb6'

In [3]: print u"åäö".encode('cp1252')
Õõ÷

In [4]: print u"åäö".encode('utf-8')
├Ñ├ñ├Â

Using the newapp branch, I get the same traceback in all three cases:

Traceback (most recent call last):
  File "c:\python\external\ipython\IPython\zmq\ipkernel.py", line 233, in execute_request
    shell.run_cell(code)
  File "c:\python\external\ipython\IPython\core\interactiveshell.py", line 2265, in run_cell
    cell, raw_cell)
  File "c:\python\external\ipython\IPython\core\history.py", line 391, in store_inputs
    self._i00 = source_raw
  File "c:\python\external\ipython\IPython\utils\traitlets.py", line 301, in set
    new_value = self._validate(obj, value)
  File "c:\python\external\ipython\IPython\utils\traitlets.py", line 309, in _validate
    return self.validate(obj, value)
  File "c:\python\external\ipython\IPython\utils\traitlets.py", line 990, in validate
    return unicode(value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

I get this traceback even when I just type in "åäö" or u"åäö" at the prompt
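
The error message is what an ASCII decode of UTF-8 bytes produces. The sketch below, in Python 3 syntax, assumes the cell source reached the kernel as UTF-8 bytes (the In [2] result on master suggests this, but the thread does not confirm it for newapp). In Python 2, unicode(value) on a byte string performs the same ASCII decode implicitly.

```python
# Hypothetical cell source, encoded as UTF-8 bytes.
cell_source = 'print u"åäö".encode("cp850")'.encode("utf-8")

error = None
try:
    # Explicit equivalent of the implicit decode done by Python 2's unicode().
    cell_source.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# The first non-ASCII byte is the lead byte of "å", right after 'print u"'.
assert error is not None
assert error.start == 8
assert cell_source[error.start] == 0xC3
```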

Owner

The newapp problem I found as well - it should be fixed if you pull again from git.

So it seems that Qt is rendering text in your system code page too. I guess somewhere we've assumed that Qt is expecting UTF-8 text. Don't know where off the top of my head.
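
The three results reported on master are consistent with the printed bytes being decoded as cp850 at some point before display. A Python 3 sketch of that hypothesis follows; where in the pipeline the decoding happens is not established at this point in the thread, and the name displayed is illustrative.

```python
text = "åäö"

# Encode with each codec tried in the thread, then decode as cp850,
# which is the assumed (not confirmed) display-side interpretation.
displayed = {
    enc: text.encode(enc).decode("cp850")
    for enc in ("cp850", "cp1252", "utf-8")
}

# These match the three outputs reported for the master branch.
assert displayed["cp850"] == "åäö"
assert displayed["cp1252"] == "Õõ÷"
assert displayed["utf-8"] == "├Ñ├ñ├Â"
```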

Owner

@jstenar: Just to make sure, can you still replicate this with a fresh checkout?

Member
jstenar commented Jun 21, 2011

takluyver wrote 2011-06-21 14:55:

@jstenar: Just to make sure, can you still replicate this with a fresh checkout?

Running from master b45902e (it looks like newapp was merged).

In [1]: ls
Volymen i enhet C har etiketten Enhet C
Volymens serienummer är 3260-74A6

Innehåll i katalogen C:\python\slask

2011-06-20 18:17 .
2011-06-20 18:17 ..
2011-06-20 18:17 ├Ñ├ñ├Â
0 fil(er) 0 byte
3 katalog(er) 12 746 477 568 byte ledigt

Owner

And if you do print u"åäö" at the Qt console, I assume it works as expected?

Member
jstenar commented Jun 21, 2011

takluyver wrote 2011-06-21 18:25:

And if you do print u"åäö" at the qt console, I assume it works as expected?

No I get:

In [1]: print u"åäö"
├Ñ├ñ├Â

/Jörgen

Owner

Oh, that's annoying. I've marked the bug as critical.

Can you check out my qtconsole-unicode-debug branch, and try the print command again. It should spit out reprs of the text it's trying to append at the terminal it's launched from.

https://github.com/takluyver/ipython/tree/qtconsole-unicode-debug

Member
jstenar commented Jun 21, 2011

takluyver wrote 2011-06-21 19:29:

Oh, that's annoying. I've marked the bug as critical.

Can you check out my qtconsole-unicode-debug branch, and try the print command again. It should spit out reprs of the text it's trying to append at the terminal it's launched from.

https://github.com/takluyver/ipython/tree/qtconsole-unicode-debug

I got this text at the terminal:
u'\u251c\xd1\u251c\xf1\u251c\xc2\n'

/Jörgen
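
The logged repr can be run backwards to see what it started as. A short Python 3 sketch, assuming the string was produced by decoding UTF-8 bytes as cp850 (the value is copied from the debug output above; the variable names are illustrative):

```python
# Text the debug branch reported just before appending it to the widget.
logged = "\u251c\xd1\u251c\xf1\u251c\xc2\n"

# Undo the suspected wrong decode: back to bytes via cp850, then read as UTF-8.
recovered = logged.encode("cp850").decode("utf-8")

assert recovered == "åäö\n"
```

That the round trip succeeds indicates the text was already garbled before it reached the Qt widget, rather than being misrendered by Qt itself.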

Owner

OK, I think I've found it. Can you try with my iostream-unicode branch?

https://github.com/takluyver/ipython/tree/iostream-unicode

takluyver was assigned Jun 21, 2011

Member
jstenar commented Jun 21, 2011

takluyver wrote 2011-06-21 20:12:

OK, I think I've found it. Can you try with my iostream-unicode branch?

https://github.com/takluyver/ipython/tree/iostream-unicode

It works fine.

/Jörgen

Owner

Excellent, thanks. I've made PR #534 for it.

Owner

Closed by 19d5c41.

takluyver closed this Jun 22, 2011