XMLGenerator ignores encoding in output #40169

Closed
mlh mannequin opened this issue Apr 19, 2004 · 3 comments

mlh mannequin commented Apr 19, 2004

BPO 938076
Nosy @loewis

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/loewis'
closed_at = <Date 2004-05-06.02:40:48.000>
created_at = <Date 2004-04-19.17:18:47.000>
labels = ['expert-XML']
title = 'XMLGenerator ignores encoding in output'
updated_at = <Date 2004-05-06.02:40:48.000>
user = 'https://bugs.python.org/mlh'

bugs.python.org fields:

activity = <Date 2004-05-06.02:40:48.000>
actor = 'loewis'
assignee = 'loewis'
closed = True
closed_date = None
closer = None
components = ['XML']
creation = <Date 2004-04-19.17:18:47.000>
creator = 'mlh'
dependencies = []
files = []
hgrepos = []
issue_num = 938076
keywords = []
message_count = 3.0
messages = ['20545', '20546', '20547']
nosy_count = 2.0
nosy_names = ['loewis', 'mlh']
pr_nums = []
priority = 'high'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue938076'
versions = ['Python 2.3']

mlh mannequin commented Apr 19, 2004

When XMLGenerator is given an encoding such as 'utf-8'
and is then passed non-ASCII Unicode characters, it
fails because of its characters() method. The current
version is:

def characters(self, content):
    self._out.write(escape(content))

This ignores the encoding entirely: when writing to
something like a StringIO, it simply tries to convert
the content to an ASCII string. The encoding is used
only in the XML declaration, not to encode the output!

It may be that I've gotten things wrong, but I would
suggest the following fix:

def characters(self, content):
    self._out.write(escape(content).encode(self._encoding))

This seems to work well for me, at least.
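
For readers on current Python, the suggested fix can be sketched as below. This is only an illustration, not the code in saxutils.py: `EncodingWriter` is a hypothetical stand-in class, and the sketch assumes Python 3 with a byte-oriented output stream rather than the Python 2.3 environment the report refers to.

```python
import io
from xml.sax.saxutils import escape


class EncodingWriter:
    """Hypothetical stand-in for the part of XMLGenerator discussed above."""

    def __init__(self, out, encoding="utf-8"):
        self._out = out            # a byte-oriented stream (assumption)
        self._encoding = encoding  # the encoding declared for the document

    def characters(self, content):
        # Escape markup characters first, then encode the result with the
        # declared encoding instead of relying on an implicit ASCII conversion.
        self._out.write(escape(content).encode(self._encoding))


buf = io.BytesIO()
writer = EncodingWriter(buf, "utf-8")
writer.characters("caf\u00e9 & cr\u00e8me")
result = buf.getvalue()
```

With this approach the bytes written match the declared encoding, but a character that the encoding cannot represent would still raise an error.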

@mlh mlh mannequin closed this as completed Apr 19, 2004
@mlh mlh mannequin assigned loewis Apr 19, 2004
@mlh mlh mannequin added the topic-XML label Apr 19, 2004

loewis mannequin commented Apr 20, 2004

In general, it would be even better to generate character
references for characters not representable in the output
encoding.
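
One way to get that behaviour, sketched here under assumptions, is the standard `xmlcharrefreplace` codec error handler. The helper name `encode_with_charrefs` is hypothetical, the sketch targets Python 3, and it is not necessarily how the eventual fix was written.

```python
from xml.sax.saxutils import escape


def encode_with_charrefs(text, encoding):
    """Escape markup, then encode; characters the target encoding cannot
    represent become numeric character references such as &#8364;."""
    return escape(text).encode(encoding, "xmlcharrefreplace")


sample = "caf\u00e9 \u20ac5 <b>"
latin1_bytes = encode_with_charrefs(sample, "iso-8859-1")
ascii_bytes = encode_with_charrefs(sample, "ascii")
```

In ISO-8859-1 the é is written as a single byte while the euro sign becomes a character reference; in ASCII both become references.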


loewis mannequin commented May 6, 2004

Thanks for pointing this out. Fixed in

saxutils.py 1.21.10.2, 1.23
NEWS 1.831.4.105

Fix in PyXML is pending.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022