Unicode xmlcharrefreplace produces backslash not xml style. #44661

Closed
odontomatix mannequin opened this issue Mar 5, 2007 · 2 comments

odontomatix mannequin commented Mar 5, 2007

BPO 1674223
Nosy @doerwalter

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2007-03-05.17:54:51.000>
created_at = <Date 2007-03-05.17:11:09.000>
labels = ['invalid']
title = 'Unicode xmlcharrefreplace produces backslash not xml style.'
updated_at = <Date 2007-03-05.17:54:51.000>
user = 'https://bugs.python.org/odontomatix'

bugs.python.org fields:

activity = <Date 2007-03-05.17:54:51.000>
actor = 'doerwalter'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['None']
creation = <Date 2007-03-05.17:11:09.000>
creator = 'odontomatix'
dependencies = []
files = []
hgrepos = []
issue_num = 1674223
keywords = []
message_count = 2.0
messages = ['31431', '31432']
nosy_count = 2.0
nosy_names = ['doerwalter', 'odontomatix']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue1674223'
versions = ['Python 2.5']

odontomatix mannequin commented Mar 5, 2007

In Python 2.4.2 and 2.5, and maybe other versions too, the unicode string encoder for producing XML-style character references (for example, &#169; for the copyright symbol) produces the wrong style: it produces backslash encoding (\xa9 for the same copyright character).

Example at Python shell:
u'\u2122'.encode('unicode_escape','xmlcharrefreplace')
should produce: &#8482;
but it produces \u2122

The same happens when it is used in a program. The print output of the encoded unicode contains backslash encodings as though the method 'backslashreplace' had been used.

@odontomatix odontomatix mannequin closed this as completed Mar 5, 2007
@odontomatix odontomatix mannequin added the invalid label Mar 5, 2007
@doerwalter
Contributor

u'\u2122'.encode('unicode_escape','xmlcharrefreplace') produces \u2122 because that's the way the unicode_escape codec outputs unicode codepoints. For unicode_escape the xmlcharrefreplace error handler never kicks in. If you want the error handler to kick in, you have to use an encoding that doesn't support the character you want to encode. The best candidate is probably ascii:

>>> u"\u2122".encode("ascii", "xmlcharrefreplace")
'&#8482;'
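The difference between the two codecs can be sketched side by side. This is a minimal illustration in Python 3, which is an assumption on my part: the report concerns Python 2.4/2.5, where the results would be str rather than bytes. The variable names are illustrative only.

```python
# Python 3 sketch; the original report uses Python 2 u'' literals.
text = "\u2122"  # TRADE MARK SIGN, code point 0x2122 == 8482

# unicode_escape can represent every code point on its own,
# so the error handler is never consulted.
escaped = text.encode("unicode_escape", "xmlcharrefreplace")

# ascii cannot represent U+2122, so the xmlcharrefreplace handler
# substitutes a numeric character reference.
xml_ref = text.encode("ascii", "xmlcharrefreplace")

print(escaped)  # b'\\u2122'
print(xml_ref)  # b'&#8482;'
```

An error handler only runs for characters the chosen codec cannot encode, which is why the choice of codec, not the handler name, determines the output here.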

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022