Unicode handling.

commit 506015955ade7a9cebfa2458b2ca31300c572e5e 1 parent e8b9d2b
Ralph Bean authored
10  ansi2html/converter.py
@@ -305,5 +305,11 @@ def _print(output):
         return
 
     # Otherwise, just process the whole thing in one go
-    output = conv.convert(" ".join(sys.stdin.readlines()))
-    _print(output)
+    if six.PY3:
+        output = conv.convert(" ".join(sys.stdin.readlines()))
+        _print(output)
+    else:
+        output = conv.convert(six.u(" ").join(
+            map(six.u, sys.stdin.readlines())
+        ))
+        _print(output.encode(opts.output_encoding))
7  tests/test_ansi2html.py
@@ -54,7 +54,12 @@ def test_conversion_as_command(self, mock_stdout, mock_argv):
         with open(join(_here, "ansicolor.html"), "rb") as output:
             expected_data = "".join(read_to_unicode(output))
 
-        with patch("sys.stdin", new_callable=lambda: six.StringIO(test_data)):
+        if six.PY3:
+            f = lambda: six.StringIO(test_data)
+        else:
+            f = lambda: six.StringIO(test_data.encode('utf-8'))
+
+        with patch("sys.stdin", new_callable=f):
             main()
 
         html = mock_stdout.getvalue()

10 notes on commit 5060159

Martin Zimmermann

Output encoding is also important for --partial and --inline, and I think you missed the input encoding. When I run echo "ä" | lolcat | ansi2html I get a ä.
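
For readers following along: one common way a missed input encoding shows up is that UTF-8 bytes get decoded with the wrong codec. The snippet below is only an illustration of that general failure mode, not code from ansi2html, and it assumes the pipe delivers UTF-8 and the wrong codec is Latin-1. Whether this is exactly what happened in the report above is not confirmed by the thread.

```python
# Illustration only; the variable names are made up for this sketch.
raw = u"\xe4".encode("utf-8")      # the bytes a UTF-8 shell sends for "ä"

wrong = raw.decode("latin-1")      # wrong codec: each byte becomes its own character
right = raw.decode("utf-8")        # matching codec: the original single character

print(repr(raw))
print(len(wrong), len(right))
```

The point is that the bytes themselves are fine; only the codec chosen at the input boundary decides whether the text survives.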

Ralph Bean
Owner

What is lolcat in your example?

Ralph Bean
Owner

The part with --partial is fixed in f31c7dd

Martin Zimmermann

That's this tool here: https://github.com/busyloop/lolcat

My current approach for the encoding thing is commit 59014c7. Not nice, but works for me. To get this removed, the whole internal string flow must be unicode aware...

Ralph Bean
Owner

Here's an attempt at the encoding thing in db7b4db -- it's the "unicode sandwich" model: any text entering the program is immediately decoded into unicode, and any text leaving the program is re-encoded into bytes just before it goes out. Everything the program does internally is unicode all the way.

Will that do it?
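
A minimal sketch of the "unicode sandwich" idea described above, for illustration. It is not the code from db7b4db: the `convert` and `run` functions, their parameters, and the `<pre>` wrapping are all hypothetical stand-ins, and in-memory byte streams take the place of stdin and stdout.

```python
import io


def convert(text):
    """Hypothetical stand-in for the real conversion; sees unicode only."""
    return u"<pre>" + text + u"</pre>"


def run(stdin_bytes, stdout_bytes,
        input_encoding="utf-8", output_encoding="utf-8"):
    # Bottom slice: decode bytes into unicode as soon as they arrive.
    text = stdin_bytes.read().decode(input_encoding)
    # Filling: everything in the middle works on unicode text.
    result = convert(text)
    # Top slice: encode back into bytes just before the text leaves.
    stdout_bytes.write(result.encode(output_encoding))


# In-memory byte streams stand in for the real stdin/stdout here.
source = io.BytesIO(u"stra\xdfe \xe4\n".encode("utf-8"))
sink = io.BytesIO()
run(source, sink)
print(repr(sink.getvalue()))
```

In a real command-line tool on Python 3 the byte streams would typically be `sys.stdin.buffer` and `sys.stdout.buffer`; on Python 2, `sys.stdin` and `sys.stdout` already carry bytes.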

Martin Zimmermann

Works for me! But I can't say if it is the right way to do it in python3. My which python is still version 2 :)

Ralph Bean
Owner

:) I'm just supporting it in a "looking to the future" kind of way. The test suite still passes for python3, so let's just wait until someone complains about it.

Martin Zimmermann

I still get an error when I use ansi2html (develop from GitHub) in a Python subprocess with stdin. I guess output_unicode also needs to be encoded with the output encoding. When I run ansi2html in a regular bash session, I don't have this issue. Here's what I do in Python:

  File "/usr/local/share/python/ansi2html", line 9, in <module>
    load_entry_point('ansi2html==0.9.1', 'console_scripts', 'ansi2html')()
  File "/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/ansi2html-0.9.1-py2.7.egg/ansi2html/converter.py", line 315, in main
    _print(conv.convert(ansi=line, full=False)[:-1])
  File "/usr/local/Cellar/python/2.7.3/lib/python2.7/site-packages/ansi2html-0.9.1-py2.7.egg/ansi2html/converter.py", line 304, in _print
    print(output_unicode)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 2175: ordinal not in range(128)
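
A likely explanation for the difference between bash and a subprocess: on Python 2, stdout attached to a pipe often has no encoding set, so printing unicode falls back to ASCII. The sketch below reproduces the same kind of failure with an ASCII-only text stream and shows that explicitly encoded bytes avoid it. It is an illustration written for Python 3 with made-up variable names, not the fix applied in the project.

```python
import io

text = u"Stra\xdfe"   # contains u'\xdf', the character named in the traceback

# A text stream that can only encode ASCII, similar in effect to a stdout
# with no usable encoding.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stream.write(text)
    ascii_stream.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding explicitly and writing bytes does not depend on the stream's codec.
byte_stream = io.BytesIO()
byte_stream.write(text.encode("utf-8"))

print(failed, repr(byte_stream.getvalue()))
```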

Ralph Bean
Owner

Can you see if 130195c helps?

Martin Zimmermann

Yep, this helps. Thanks a lot!
