Unicode handling.

1 parent e8b9d2b commit 506015955ade7a9cebfa2458b2ca31300c572e5e @ralphbean committed Aug 25, 2012
Showing with 14 additions and 3 deletions.
  1. +8 −2 ansi2html/converter.py
  2. +6 −1 tests/test_ansi2html.py
@@ -305,5 +305,11 @@ def _print(output):
return
# Otherwise, just process the whole thing in one go
- output = conv.convert(" ".join(sys.stdin.readlines()))
- _print(output)
+ if six.PY3:
+ output = conv.convert(" ".join(sys.stdin.readlines()))
+ _print(output)
+ else:
+ output = conv.convert(six.u(" ").join(
+ map(six.u, sys.stdin.readlines())
+ ))
+ _print(output.encode(opts.output_encoding))
@@ -54,7 +54,12 @@ def test_conversion_as_command(self, mock_stdout, mock_argv):
with open(join(_here, "ansicolor.html"), "rb") as output:
expected_data = "".join(read_to_unicode(output))
- with patch("sys.stdin", new_callable=lambda: six.StringIO(test_data)):
+ if six.PY3:
+ f = lambda: six.StringIO(test_data)
+ else:
+ f = lambda: six.StringIO(test_data.encode('utf-8'))
+
+ with patch("sys.stdin", new_callable=f):
main()
html = mock_stdout.getvalue()
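
The stdin-patching pattern in the test hunk above can be sketched on its own. This is a minimal illustration for Python 3 using the standard library's `unittest.mock`; the `main` function and the `run_with_stdin` helper here are hypothetical stand-ins, not ansi2html's actual entry point or test code.

```python
import io
import sys
from unittest.mock import patch


def main():
    """Hypothetical stand-in for a command-line entry point.

    It reads everything from stdin and writes a transformed copy to stdout.
    """
    sys.stdout.write(sys.stdin.read().upper())


def run_with_stdin(text):
    """Run main() with `text` as stdin and return what it wrote to stdout."""
    # new_callable is invoked by patch() to build the replacement object,
    # so each patched stream is a fresh in-memory text buffer.
    with patch("sys.stdin", new_callable=lambda: io.StringIO(text)), \
         patch("sys.stdout", new_callable=io.StringIO) as fake_stdout:
        main()
        return fake_stdout.getvalue()


result = run_with_stdin("abc\n")
```

On Python 2 the real stdin delivers bytes rather than text, which is presumably why the test above encodes `test_data` before handing it to the fake stdin.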

10 comments on commit 5060159

Contributor

posativ replied Sep 1, 2012

Output encoding also matters for --partial and --inline, and I think you missed the input encoding. When I run echo "ä" | lolcat | ansi2html, I get a ä.

Owner

ralphbean replied Sep 2, 2012

What is lolcat in your example?

Owner

ralphbean replied Sep 2, 2012

The part with --partial is fixed in f31c7dd

Contributor

posativ replied Sep 2, 2012

That's this tool here: https://github.com/busyloop/lolcat

My current approach to the encoding issue is commit 59014c7. Not nice, but it works for me. To get rid of it, the whole internal string flow must be unicode-aware...

Owner

ralphbean replied Sep 3, 2012

Here's an attempt at the encoding thing in db7b4db -- it's the "unicode sandwich" model: any text entering the program is immediately decoded into unicode, and any text leaving the program is re-encoded into bytes just before output. Internally, the program works with unicode all the way through.

Will that do it?
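
The sandwich model described above can be sketched roughly as follows. This is an illustration only, not the code in db7b4db: `convert`, `run`, and the encoding parameters are hypothetical names, and the byte streams are in-memory stand-ins for stdin and stdout.

```python
import io


def convert(text):
    """Hypothetical conversion step; it only ever sees unicode text."""
    return "<span>" + text + "</span>"


def run(stdin_bytes, stdout_bytes,
        input_encoding="utf-8", output_encoding="utf-8"):
    # Bottom slice: decode bytes the moment they enter the program.
    text = stdin_bytes.read().decode(input_encoding)
    # Filling: all internal processing works on unicode.
    result = convert(text)
    # Top slice: encode only when the data is about to leave the program.
    stdout_bytes.write(result.encode(output_encoding))


# "\xe4" is the character a-umlaut, "\xdf" is sharp s.
source = io.BytesIO("\xe4 \xdf\n".encode("utf-8"))
sink = io.BytesIO()
run(source, sink)
```

Keeping the decode and encode calls at the edges means the conversion logic never has to know which encoding the terminal or pipe uses.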

Contributor

posativ replied Sep 3, 2012

Works for me! But I can't say if it is the right way to do it in python3. My which python is still version 2 :)

Owner

ralphbean replied Sep 3, 2012

:) I'm just supporting it in a "looking to the future" kind of way. The test suite still passes for python3, so let's just wait until someone complains about it.

Contributor

posativ replied Sep 9, 2012

I still get an error when I use ansi2html (develop from GitHub) in a python subprocess and stdin. I guess output_unicode also needs to be encoded with the output encoding. When I run ansi2html in a regular bash session, I don't have this issue. Here's what I do in python

  File "/usr/local/share/python/ansi2html", line 9, in <module>
    load_entry_point('ansi2html==0.9.1', 'console_scripts', 'ansi2html')()
  File "/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/ansi2html-0.9.1-py2.7.egg/ansi2html/converter.py", line 315, in main
    _print(conv.convert(ansi=line, full=False)[:-1])
  File "/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/ansi2html-0.9.1-py2.7.egg/ansi2html/converter.py", line 304, in _print
    print(output_unicode)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 2175: ordinal not in range(128)
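
For context on the traceback above: on Python 2, `sys.stdout.encoding` is `None` when stdout is a pipe rather than a terminal, so `print` falls back to ASCII for unicode strings. The sketch below reproduces the same error and shows one way to avoid it by encoding explicitly before writing. The `write_encoded` helper and its UTF-8 fallback are assumptions made for illustration, not the fix applied in 130195c.

```python
import io


def write_encoded(text, stream, encoding=None):
    """Encode `text` explicitly and write the bytes to `stream`.

    Hypothetical helper: falls back to UTF-8 when the stream reports
    no encoding, as a piped stdout does on Python 2.
    """
    encoding = encoding or getattr(stream, "encoding", None) or "utf-8"
    # Python 3 text streams expose their byte stream as `.buffer`;
    # plain byte streams are written to directly.
    target = getattr(stream, "buffer", stream)
    target.write(text.encode(encoding))


# Reproduce the failure: ASCII cannot represent the sharp s (u'\xdf').
try:
    u"\xdf".encode("ascii")
    failure = None
except UnicodeEncodeError as exc:
    failure = str(exc)

# Encoding explicitly succeeds regardless of what the stream reports.
sink = io.BytesIO()
write_encoded(u"Stra\xdfe\n", sink)
```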

Owner

ralphbean replied Sep 9, 2012

Can you see if 130195c helps?

Contributor

posativ replied Sep 9, 2012

Yep, this helps. Thanks a lot!
