Writing a table title with Unicode strings produces a file with "some content is unreadable" #88

Open
tdy218 opened this Issue Sep 6, 2017 · 4 comments


tdy218 commented Sep 6, 2017

I wrote an Excel file with openpyxl under Jython 2.7.1 final. When I opened the file, I got the message "Could not open xxx.xlsx because some content is unreadable.....", as shown in the following picture.
popup warning information
I then clicked the "Open and Repair" button to open the problematic Excel file. The problem comes from the table title: all the Unicode (Chinese) strings display normally, yet they are in fact where the problem lies, as shown in the following picture.
incorrect unicode string

The same code works fine under Python 2.7.5.

works fine under Python 2.7.5

To reproduce this issue, run pip install openpyxl and then run the following Python/Jython script file.
write_unicode_table_title.py

# -*- coding:utf-8 -*-
import sys, string
import openpyxl
from openpyxl import styles
from openpyxl.worksheet import table

reload(sys)
sys.setdefaultencoding('utf-8')

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "demo"
table_start_row = 7
wlserver_config_mbeans = 4 # based on original code
table_end_row = table_start_row + wlserver_config_mbeans

# Define the table; it is named servers_table so the imported `table` module is not shadowed.
table_style = table.TableStyleInfo(name="TableStyleMedium9", showRowStripes=True)
servers_table = table.Table(displayName=u'WLServer列表', ref="A{0}:F{1}".format(table_start_row, table_end_row), tableStyleInfo=table_style)

servers_table_title = [u'序号', u'Server名称', u'Server监听地址', u'Server监听端口', u'Server启动状态', u'Server健康状态']

# Write the header cells into row table_start_row, columns A to F.
for index, item in enumerate(servers_table_title):
    ws[string.ascii_uppercase[index] + str(table_start_row)] = item
    ws[string.ascii_uppercase[index] + str(table_start_row)].font = styles.Font(name='Microsoft YaHei', color=styles.colors.WHITE, bold=False)
ws.add_table(servers_table)
wb.save('bug899_1.xlsx')
Member

jeff5 commented Sep 8, 2017

Thanks for the very clear evidence of the problem as you experience it.

The trick

reload(sys)
sys.setdefaultencoding('utf-8')

is unreliable on any Python. The Jython sys "module" is not properly reloaded because its implementation differs from CPython's. The default encoding is always 'ascii' in practice, so the effect of changing it will not have been tested. In fact, in the only place I can see it used, the test is skipped.

See http://bugs.jython.org/issue1875 and https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/

This may be the source of your problem. As far as I can see, openpyxl depends on xml.etree (in Jython) for its encoding. I think the author of xml.etree knows better than to make you call setdefaultencoding, but I wasn't able to follow how that works in the time I've given it.
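
As a rough illustration of the point about xml.etree, here is a minimal sketch that serializes a Unicode attribute with an explicit encoding and reads it back, without touching the default encoding. It uses only the standard library; openpyxl is not involved, and serialize_column is a hypothetical helper, not part of any library. On CPython the value round-trips unchanged. Whether Jython behaves the same way is exactly what is in question here.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def serialize_column(name):
    """Serialize a tableColumn-like element (hypothetical, for illustration).

    The encoding is passed explicitly to ElementTree, so the result does not
    depend on sys.getdefaultencoding().
    """
    element = ET.Element('tableColumn', {'id': '1', 'name': name})
    return ET.tostring(element, encoding='utf-8')


# Serialize a Unicode column name, then parse the bytes back.
data = serialize_column(u'序号')
round_tripped = ET.fromstring(data).get('name')
```

If round_tripped differs from the original string on one interpreter but not the other, that narrows the problem down to the XML layer rather than to openpyxl.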

tdy218 commented Sep 11, 2017

If I call sys.getdefaultencoding() after the following code, it returns utf-8.

reload(sys)
sys.setdefaultencoding('utf-8')

This is a very useful workaround for Unicode strings in Python 2.x, even though it has some issues in exceptional cases.

I also tried the following code in my case (after removing the code above), but it doesn't work either.

from org.python.core import codecs
codecs.setDefaultEncoding('utf-8')
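
To see what actually ended up in the saved workbook, independent of the default-encoding setting, one option is to read the table part straight out of the .xlsx file, which is a zip archive. The following is only a sketch: it assumes the table parts live under xl/tables/ inside the archive, and table_column_names is a hypothetical helper, not an openpyxl function.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main schema used by table parts.
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'


def table_column_names(xlsx_file):
    """Return {part name: [column names]} for each table part in an .xlsx.

    xlsx_file may be a path or a file-like object. Table parts are assumed
    to be stored as xl/tables/*.xml inside the archive.
    """
    result = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if name.startswith('xl/tables/') and name.endswith('.xml'):
                root = ET.fromstring(archive.read(name))
                columns = root.iter('{%s}tableColumn' % MAIN_NS)
                result[name] = [col.get('name') for col in columns]
    return result
```

Calling table_column_names('bug899_1.xlsx') on the file written by Jython and on the one written by Python 2.7.5 should show whether the column names stored in the table part differ between the two.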

Member

jeff5 commented Oct 21, 2017

I have raised http://bugs.jython.org/issue2633 to cover this. Despite similarities, I don't believe it's related to #90.

tdy218 commented Oct 21, 2017

@jeff5 Thanks
