New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cursor behavior for full/ambiguous width characters #9

Closed
buganini opened this Issue May 20, 2012 · 19 comments

Comments

Projects
None yet
4 participants
@buganini

buganini commented May 20, 2012

Currently I'm using following hack in my program:

--- a/pyte/screens.py
+++ b/pyte/screens.py
@@ -28,6 +28,7 @@ from __future__ import (
     absolute_import, unicode_literals, division
 )

+import bsdconv
 import copy
 import math
 import operator
@@ -36,6 +37,7 @@ from itertools import islice, repeat

 from . import modes as mo, graphics as g, charsets as cs

+width_counter=bsdconv.Bsdconv("utf-8:width:null")

 try:
     xrange
@@ -392,6 +394,10 @@ class Screen(list):
         char = char.translate([self.g0_charset,
                                self.g1_charset][self.charset])

+       width_counter.conv(char.encode("utf-8"))
+       width_info=width_counter.info()
+       width=width_info['full']*2+width_info['ambi']*2+width_info['half']
+
         # If this was the last column in a line and auto wrap mode is
         # enabled, move the cursor to the next line. Otherwise replace
         # characters already displayed with newly entered.
@@ -410,9 +416,13 @@ class Screen(list):
         self[self.cursor.y][self.cursor.x] = self.cursor.attrs \
             ._replace(data=char)

+       if width>1:
+               self[self.cursor.y][self.cursor.x+1] = self.cursor.attrs \
+                   ._replace(data="")
+
         # .. note:: We can't use :meth:`cursor_forward()`, because that
         #           way, we'll never know when to linefeed.
-        self.cursor.x += 1
+        self.cursor.x += width

     def carriage_return(self):
         """Move the cursor to the beginning of the current line."""

there should be a similar but cleaner way to determine character width, and especially, there is a set of "east-asian ambiguous width" characters, its width depends on locale, I treat it as fullwidth (*2) in my case.

http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
https://github.com/buganini/bsdconv/blob/master/codecs/inter/WIDTH.c

@superbobry

This comment has been minimized.

Contributor

superbobry commented Mar 28, 2013

Hello, can you please submit a test case for the patch?

@jquast

This comment has been minimized.

jquast commented Mar 25, 2014

there is also wcwidth.py, a port of the same in .c, which i happen to use in similar capacity

https://github.com/jquast/x84/blob/master/x84/bbs/wcwidth.py#L197

@superbobry

This comment has been minimized.

Contributor

superbobry commented Mar 25, 2014

Hello, Jeff, thank you for the link. wcwidth sounds like a way to go, but I'd prefer to re-use someone else's code here.

Do you now if there a standalone Python implementation somewhere? If not, did you consider using wcwidth from libc via ctypes?

@jquast

This comment has been minimized.

jquast commented Mar 25, 2014

  1. Regarding ctypes, I prefer pure python for these things, I presume you'd like your emulator to work equally well in windows or jython or whatever. You don't currently link with libc, do you?
  2. This is originally from http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c who hosts many famous utf8 demo files you can test with in its parent directory. The python version is almost directly translated. I think i found it from http://mothy.org/hacks/unicodewidth/
  3. its been an open issue in core python for some time, i think perhaps longer than this, but this is a tracking and open issue, http://bugs.python.org/issue12568
@jonathanslenders

This comment has been minimized.

jonathanslenders commented Mar 31, 2014

I agree with Jeff, the pure python wcwidth.py works wonderfully, but it shouldn't be in the x84 repository. Copy it over to the pyte library, or make a standalone Python package of it.

@jquast

This comment has been minimized.

jquast commented Apr 3, 2014

I'll make a standalone. Just so we're clear, コンニチハ in a terminal, when selected, should highlight two cells, not one, even though its only 5 glyphs, its 10 cells.

In [11]: wcwidth.wcswidth(u'コンニチハ'), len(u'コンニチハ')
Out[11]: (10, 5)
@jonathanslenders

This comment has been minimized.

jonathanslenders commented Apr 4, 2014

Thanks, @jquast, for me that's perfect. Can you put that on pypi? And are you willing to maintain that package? (if there is any maintenance.) If you could add some tests and docs to the project, that would be even nicer. ;)

@superbobry

This comment has been minimized.

Contributor

superbobry commented Apr 4, 2014

Sounds good to me as well. Also, +1 for the tests, there should be some for the libc version, right?

@jquast

This comment has been minimized.

jquast commented Apr 5, 2014

yup, working on it now.

@jquast

This comment has been minimized.

jquast commented Apr 8, 2014

still working on it. https://github.com/jquast/wcwidth/ Learning a lot about this stuff... I wrote a viewer utility to check my work, https://asciinema.org/a/8683

Plan to write some tests first out of what is there, then, do a rewrite of something that should be automatically maintainable by just feeding the latest unicode spec docs. Then I'll publish to pypi, eta sunday or so. I think the wcwidth() and wcswidth() interfaces, as module wcwidth will remain. eta to pypi about 5 days or so.

@jonathanslenders

This comment has been minimized.

jonathanslenders commented Apr 8, 2014

Great!

@jquast

This comment has been minimized.

jquast commented Apr 19, 2014

libc comparison complete, error report here: https://travis-ci.org/jquast/wcwidth/jobs/23326096

@superbobry you should inspect this error report regarding your request for tests compared to libc -- libc is grossly out of date compared to the spec, producing thousands of wrong results! presumably libc conforms to a posix standard that conforms to a unicode specification that is about 10+ years behind today's monospaced fonts. unicode specifications provide accurate report, will continue my efforts to work against those.

how else would you know the true width of a snowman, afterall.

libc=-1, ours=1, name=SNOWMAN WITHOUT SNOW, url http://codepoints.net/U+26C4 --oo⛄oo--

[ I do see a few errors, esp. regarding combining, (report as 1 but should actually be -1), but this is unicodedata.combining() reporting that they are not when they most certainly are. not sure if i want to fix that far in. ]

@superbobry

This comment has been minimized.

Contributor

superbobry commented Apr 20, 2014

@jquast thank you, you've done an impressive amount of work with this! I'll try to change pyte to use wcwidth during the next week. When do you plan to make a release on PyPI?

I don't see any issues with being incompatible with libc as long as wcwidth follows the Unicode spec.

@jquast

This comment has been minimized.

jquast commented May 5, 2014

I've released to pypi, https://pypi.python.org/pypi/wcwidth

Took a lot of (manual, visual) testing -- its all plugged into travis and coveralls -- let me know if you have any suggestions, there is some ambiguous areas regarding control, combining, and emoticon characters.

@superbobry

This comment has been minimized.

Contributor

superbobry commented May 8, 2014

I've been thinking about integrating wcwidth with pyte for a while and I'm not sure how to best approach this. Here's why: in the current implementation Screen is a just a character matrix, which implies that a character is always of unit width. We can work around this by adding stub characters, i.e.

def draw(self, ch):
    # ...
    width = wcswidth(ch)
    for offset in range(width - 1):
        self.buffer[self.cursor.y][self.cursor.x] = stub_character

but this is obviously a nasty hack, which might require more hacks in the future.

For instance, imagine we have the following character matrix

["", S]
 ^
 |
cursor is here

where S denotes a stub character. And then we do

screen.insert_characters(1)

We need to insert an empty character right after the cursor erasing whatever there is, but the current character has width = 2. So the question is: what resulting character matrix do we expect in this case?

@jonathanslenders

This comment has been minimized.

jonathanslenders commented May 9, 2014

I think we should replace the stub character by a space.

Try this:

echo -e "你好\e[Dx" # Print double width characters, move cursor one position left and print an 'x'

You'll see that we get a space there.

(on some terminals, you should use \b instead of \e[D)

@superbobry

This comment has been minimized.

Contributor

superbobry commented May 13, 2014

I've tried this in Guake with TERM=xterm and probably got something else

$ echo -e "你好\e[Dx" 
你好x

Can you post the expected output?

@jonathanslenders

This comment has been minimized.

jonathanslenders commented May 14, 2014

I think it should print:

你 x

(We print "你好", cursor one position left, and print 'x')
That's how I get it in OS X terminal, iterm and xterm.)

Just tried xfce4-terminal. He prints the left half of the 好 character, and prints the letter 'x' on top of the right part.
I would prefer however to replace the left part of the wide character with a space, because if we are going to render the output of pyte again to a terminal, that will work everywhere. (Otherwise, I propose to make it configurable.)

@superbobry superbobry closed this in 00de944 Jan 9, 2016

superbobry added a commit that referenced this issue Jan 9, 2016

@superbobry

This comment has been minimized.

Contributor

superbobry commented Jan 9, 2016

I've finally managed to add this to pyte. A big thanks to everyone involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment