East asian characters are not aligned correctly in console output #604
Comments
see also: East Asian Width http://unicode.org/reports/tr11/ |
Thanks for the bug report and patch. I was able both to verify the problem and to test that the patch fixes it. Importing the unicodedata module used here is, unfortunately, very slow with Jython: $ time jython -c "import sys" $ time jython -c "from unicodedata import east_asian_width" Applying the patch in its current form would thus slow the start-up time with Jython by 5 seconds, which clearly is not acceptable. Do you know whether there is any other method to find out how wide these characters actually are? If there isn't, we need to use this fix only with Python. Because this problem apparently only affects the console output, I consider it relatively low priority. |
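For readers without access to the attached patch, the unicodedata-based approach discussed above can be sketched roughly as follows. This is not the patch itself: the helper names display_width and pad_to_width and the column count of 30 are made up for illustration, and the code assumes Python 3 strings.

```python
import unicodedata


def display_width(text):
    """Return the number of console columns `text` is assumed to occupy."""
    width = 0
    for char in text:
        # 'W' (Wide) and 'F' (Fullwidth) characters take two columns.
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            width += 2
        else:
            width += 1
    return width


def pad_to_width(text, columns):
    """Left-justify `text` with spaces so it fills `columns` console columns."""
    return text + ' ' * max(columns - display_width(text), 0)


# Hypothetical usage: both rows should end at the same console column.
print(pad_to_width('汉字应该正确对齐', 30) + '| PASS |')
print(pad_to_width('Test East Asian Width', 30) + '| PASS |')
```

Padding by display width rather than by len() is what keeps the status column lined up. The import of unicodedata is the part reported as slow on Jython.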
I'm sure I can optimize this code with pre-compiled data, and I have written a script that dumps all the wide chars for that purpose. I'd be glad to hear any suggestions. The script file is attached for anyone interested. |
Pre-compiled data sounds like a good solution. I modified the attached script to print the number of characters and there were only 261 of them. I think it would be best to have a new module that would have both the characters and a single function to cut (and justify) the text correctly. xieyanbo, are you interested in trying that out? We are going to do RF 2.5.2 in the near future and getting this in is still possible. |
Actually, that script prints 261 ranges of wide characters, and the total number of characters is 45647. I have implemented a prototype to replace the east_asian_width function. The attached generate_wild_chars.py outputs a module's source code, which includes a function "is_wild_char". "is_wild_char(c)" has the same behavior as "eaw(c) in 'WF'". You can optimize it further, but I think "is_wild_char" is good enough to work in our product. Give it a try. |
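The generated module is not reproduced in this thread, so the following is only a sketch of how a pre-compiled range table could stand in for east_asian_width without importing unicodedata. The table holds just four well-known ranges for illustration, not the full generated list, and the name is_wide_char and the bisect-based lookup are assumptions rather than the attached implementation.

```python
from bisect import bisect_right

# Illustrative subset only: a real table would hold every generated range.
# Ranges are inclusive, sorted by start, and must not overlap.
_WIDE_RANGES = [
    (0x3041, 0x3096),  # Hiragana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0xFF01, 0xFF60),  # Fullwidth ASCII variants
]
_RANGE_STARTS = [start for start, _ in _WIDE_RANGES]


def is_wide_char(char):
    """Return True if `char` falls inside one of the pre-compiled ranges."""
    code = ord(char)
    # Find the last range whose start is <= code, then check its end.
    index = bisect_right(_RANGE_STARTS, code) - 1
    return index >= 0 and code <= _WIDE_RANGES[index][1]
```

Storing ranges instead of individual characters keeps the table at a few hundred entries, and a binary search keeps each lookup cheap.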
We will try to get this into 2.5.2, which we must get out this week. No promises at this point, though. |
Unfortunately we don't have time to get this into 2.5.2. =( |
This is now committed in r4005, r4006, and r4007. We also implemented a check for combining characters, which have a width of 0. (These caused problems on Mac, which uses NFD normalization for file names.) This will now come out in 2.5.3. Thanks for the brilliant patch xieyanbo! |
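The committed code is not shown here, so the snippet below only illustrates why combining characters matter for NFD-normalized names. It assumes that any character with a nonzero canonical combining class occupies zero columns, which may differ from the check that was actually committed. The helper names char_width and text_width are hypothetical.

```python
import unicodedata


def char_width(char):
    """Return the assumed console width of a single character."""
    # Assumption: a nonzero combining class means the character is drawn
    # on top of the previous one and takes no column of its own.
    if unicodedata.combining(char):
        return 0
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2
    return 1


def text_width(text):
    """Sum the assumed console widths of all characters in `text`."""
    return sum(char_width(char) for char in text)


composed = 'caf\u00e9'                              # precomposed e-acute
decomposed = unicodedata.normalize('NFD', composed)  # 'e' + combining accent

print(len(composed), len(decomposed))                # 4 5
print(text_width(composed), text_width(decomposed))  # 4 4
```

The NFD form has one more code point than the composed form, so padding based on len() would come out one column short for it.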
Great job, thanks to you guys! |
The generate script and East Asian character list on this page are not correct; don't use them. The correct version is in issue #1096; use that. |
The width of some Unicode characters (East Asian ones) is 2, which causes pybot's output to be aligned incorrectly.
Robot Framework 2.5 (Python 2.6.5 on darwin)
Demo:
0$ cat test_east_asian_width.txt
*** test cases ***
汉字应该正确对齐
Log Hello world!
#0$ pybot test_east_asian_width.txt
Test East Asian Width
汉字应该正确对齐 | PASS |
Test East Asian Width | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed
After patching:
#0$ pybot test_east_asian_width.txt
Test East Asian Width
汉字应该正确对齐 | PASS |
Test East Asian Width | PASS |
1 critical test, 1 passed, 0 failed
1 test total, 1 passed, 0 failed
The patch and test case are attached.