East asian characters are not aligned correctly in console output #604

Closed
spooning opened this issue Jun 29, 2014 · 10 comments

@spooning
Contributor

Originally submitted to Google Code by xieyanbo on 1 Aug 2010

Some Unicode characters, such as East Asian ones, have a display width of 2, which causes pybot's console output to be aligned incorrectly.

Robot Framework 2.5 (Python 2.6.5 on darwin)

Demo:

0$ cat test_east_asian_width.txt
*** test cases ***
汉字应该正确对齐
Log Hello world!
#0$ pybot test_east_asian_width.txt

Test East Asian Width

汉字应该正确对齐 | PASS |

Test East Asian Width | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed

After patching:
#0$ pybot test_east_asian_width.txt

Test East Asian Width

汉字应该正确对齐 | PASS |

Test East Asian Width | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed

The patch and test case are attached.
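
The width mismatch described in this report can be reproduced with a few lines of Python. This is only a minimal sketch, not the attached patch: the helper name `display_width` is hypothetical, and the code assumes Python 3 rather than the Python 2.6 used in the report. It relies on the standard `unicodedata.east_asian_width` function, which returns `W` (Wide) or `F` (Fullwidth) for characters that occupy two console columns.

```python
import unicodedata


def display_width(text):
    """Count console columns, treating Wide (W) and Fullwidth (F) characters as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in "WF" else 1 for ch in text)


name = "汉字应该正确对齐"
chars = len(name)             # number of code points in the test name
columns = display_width(name)  # number of console columns the name occupies

# Padding based on len() under-counts the columns, so the status column drifts right.
print(chars, columns)
```

Padding the test name by `len()` instead of by column count is what would push `| PASS |` out of line with the other rows.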

@spooning spooning added this to the 2.5.3 milestone Jun 29, 2014
@spooning
Contributor Author

Originally submitted to Google Code by xieyanbo on 1 Aug 2010

See also: East Asian Width, http://unicode.org/reports/tr11/

@spooning
Contributor Author

Originally submitted to Google Code by @pekkaklarck on 4 Aug 2010

Thanks for the bug report and patch. I was able both to verify the problem and to confirm that the patch fixes it.

Importing the unicodedata module used here is, unfortunately, very slow with Jython:

$ time jython -c "import sys"
real 0m5.243s
user 0m5.664s
sys 0m0.392s

$ time jython -c "from unicodedata import east_asian_width"
real 0m10.867s
user 0m15.545s
sys 0m0.488s

Applying the patch in its current form would thus slow the start-up time with Jython by about 5 seconds, which clearly is not acceptable. Do you know whether there is any other way to find out how wide these characters actually are? If there isn't, we need to use this fix only with Python.

Because this problem apparently only affects the console output, I consider it relatively low priority.
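
One way to use the fix only with Python could look like the sketch below. It is an assumption, not what was committed: the function name `load_wide_checker` is hypothetical, and it relies on Jython reporting a `sys.platform` value that starts with `java`. The slow `unicodedata` import is deferred so that it is never executed on Jython.

```python
import sys


def load_wide_checker():
    """Return a function telling whether a character is two columns wide,
    or None when running on Jython, where importing unicodedata is slow."""
    if sys.platform.startswith("java"):
        # Jython: skip the expensive import and fall back to plain len()-based alignment.
        return None
    from unicodedata import east_asian_width  # imported lazily, only on Python
    return lambda ch: east_asian_width(ch) in "WF"


checker = load_wide_checker()
```

Callers would need to treat `None` as "assume every character is one column wide".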

@spooning
Contributor Author

Originally submitted to Google Code by xieyanbo on 5 Aug 2010

I'm sure I can optimize this code with pre-compiled data, and I have written a script that dumps all wide characters for that purpose. I'd be glad to hear any suggestions. The script is attached for anyone interested.

@spooning
Contributor Author

Originally submitted to Google Code by @pekkaklarck on 16 Aug 2010

Pre-compiled data sounds like a good solution. I modified the attached script to print the number of characters, and there were only 261 of them. I think it would be best to have a new module that contains both the characters and a single function to cut (and justify) the text correctly. xieyanbo, are you interested in trying that out? We are going to do RF 2.5.2 in the near future, and getting this in is still possible.
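
A single cut-and-justify function of the kind suggested here might look roughly like this. It is a sketch under assumptions: the names `_char_width` and `pad_or_cut` are hypothetical, `width` is assumed to be at least 3, and for brevity the width lookup uses `unicodedata` rather than the pre-compiled data discussed in this thread.

```python
import unicodedata


def _char_width(ch):
    """Return 2 for Wide/Fullwidth characters, 1 otherwise."""
    return 2 if unicodedata.east_asian_width(ch) in "WF" else 1


def pad_or_cut(text, width):
    """Return text that occupies exactly `width` console columns (width >= 3)."""
    total = sum(_char_width(ch) for ch in text)
    if total <= width:
        # Short enough: left-justify by adding spaces up to the column count.
        return text + " " * (width - total)
    # Too long: keep as many characters as fit before a trailing "...".
    limit = width - 3
    kept, used = [], 0
    for ch in text:
        w = _char_width(ch)
        if used + w > limit:
            break
        kept.append(ch)
        used += w
    # A wide character may not fit in the last free column, so pad the gap.
    return "".join(kept) + "..." + " " * (width - used - 3)
```

For example, `pad_or_cut("汉字", 6)` adds two spaces, because the two characters already fill four columns.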

@spooning
Contributor Author

Originally submitted to Google Code by xieyanbo on 16 Aug 2010

Actually, that script prints 261 ranges of wide characters; the total number of characters is 45647. I have implemented a prototype to replace the east_asian_width function. The attached generate_wild_chars.py outputs a module's source code, which includes a function "is_wild_char". "is_wild_char(c)" behaves the same as "eaw(c) in 'WF'". It could be optimized further, but I think "is_wild_char" is good enough to work in our product. Give it a try.
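
The range-based lookup described here can be sketched as follows. This is not the attached generated module: the table `_WIDE_RANGES` is a hypothetical, heavily abbreviated sample of a few well-known Wide/Fullwidth blocks, not the 261 ranges from the script, and the binary search via `bisect` is just one possible way to query it.

```python
from bisect import bisect_right

# Illustrative sample only: sorted, non-overlapping (start, end) code point ranges.
_WIDE_RANGES = [
    (0x1100, 0x115F),  # Hangul Jamo initial consonants
    (0x3041, 0x3096),  # Hiragana
    (0x30A1, 0x30FA),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0xFF01, 0xFF60),  # Fullwidth ASCII variants
]
_STARTS = [start for start, _ in _WIDE_RANGES]


def is_wild_char(ch):
    """Return True if `ch` falls inside one of the pre-compiled wide ranges."""
    code = ord(ch)
    # Find the last range whose start is <= code, then check its end.
    index = bisect_right(_STARTS, code) - 1
    return index >= 0 and code <= _WIDE_RANGES[index][1]
```

Because the table is plain data, no `unicodedata` import is needed at start-up, which is the point of pre-compiling it.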

@spooning
Contributor Author

Originally submitted to Google Code by @pekkaklarck on 23 Aug 2010

We will try to get this into 2.5.2, which we must get out this week. No promises at this point, though.

@spooning
Contributor Author

Originally submitted to Google Code by @pekkaklarck on 27 Aug 2010

Unfortunately we don't have time to get this into 2.5.2. =(

@spooning
Contributor Author

Originally submitted to Google Code by @jussimalinen on 31 Aug 2010

This is now committed in r4005, r4006, and r4007. We also implemented a check for combining characters, which have a width of 0. (These caused problems on Mac, which uses NFD normalization for file names.) The fix is coming out in 2.5.3.

Thanks for the brilliant patch, xieyanbo!
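
The combining-character check mentioned in this comment can be illustrated with the sketch below. It is not the committed code: the helper name `display_width` is hypothetical, Python 3 is assumed, and `unicodedata.combining` is used as a simplification, since it only detects characters with a non-zero canonical combining class and misses some other zero-width marks.

```python
import unicodedata


def display_width(text):
    """Count console columns: 0 for combining marks, 2 for Wide/Fullwidth, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        width += 2 if unicodedata.east_asian_width(ch) in "WF" else 1
    return width


composed = "\u00e4"                                    # "ä" as a single code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)    # "a" followed by U+0308
print(len(composed), len(decomposed))
print(display_width(composed), display_width(decomposed))
```

Both forms occupy one column on screen even though the NFD form contains two code points, which is why NFD file names would otherwise be padded too little.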

@spooning
Contributor Author

Originally submitted to Google Code by xieyanbo on 31 Aug 2010

Great job, thanks to you guys!

@spooning
Contributor Author

Originally submitted to Google Code by xieyanbo on 21 Mar 2012

The generator script and the East Asian character list on this page are not correct; don't use them. The correct version is in issue #1096; use that.
