[Py30a3] xml.parsers.expat recognizes encoding="utf-8" but not encoding="utf8" #46531

Closed
mark-summerfield mannequin opened this issue Mar 12, 2008 · 4 comments
Labels
stdlib (Python modules in the Lib dir), topic-XML, type-bug (An unexpected behavior, bug, or error)

Comments

@mark-summerfield
Mannequin

mark-summerfield mannequin commented Mar 12, 2008

BPO 2278
Nosy @birkenfeld, @mark-summerfield, @benjaminp

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2008-03-17.07:42:41.887>
created_at = <Date 2008-03-12.11:03:55.861>
labels = ['expert-XML', 'type-bug', 'library']
title = '[Py30a3] xml.parsers.expat recognizes encoding="utf-8" but not encoding="utf8"'
updated_at = <Date 2008-03-17.07:42:41.833>
user = 'https://github.com/mark-summerfield'

bugs.python.org fields:

activity = <Date 2008-03-17.07:42:41.833>
actor = 'georg.brandl'
assignee = 'none'
closed = True
closed_date = <Date 2008-03-17.07:42:41.887>
closer = 'georg.brandl'
components = ['Library (Lib)', 'XML']
creation = <Date 2008-03-12.11:03:55.861>
creator = 'mark'
dependencies = []
files = []
hgrepos = []
issue_num = 2278
keywords = []
message_count = 4.0
messages = ['63471', '63516', '63558', '63621']
nosy_count = 3.0
nosy_names = ['georg.brandl', 'mark', 'benjamin.peterson']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue2278'
versions = ['Python 3.0']

@mark-summerfield
Mannequin Author

mark-summerfield mannequin commented Mar 12, 2008

Here is how to reproduce the bug:

from xml.etree.ElementTree import parse
import io
xml1 = """<?xml version="1.0" encoding="utf8"?>
<test>text</test>"""
xml2 = """<?xml version="1.0" encoding="utf-8"?>
<test>text</test>"""
f1 = io.StringIO(xml1)
f2 = io.StringIO(xml2)
tree2 = parse(f2) # this uses "utf-8" and works fine
tree1 = parse(f1) # this uses "utf8" and raises ExpatError
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    tree1 = parse(f1)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 823, in parse
    tree.parse(source, parser)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 561, in parse
    parser.feed(data)
  File "/home/mark/opt/python30a3/lib/python3.0/xml/etree/ElementTree.py", line 1201, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: unknown encoding: line 1, column 30
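
For anyone who wants to check this on their own interpreter, the session above can be condensed into a small probe script. This is only a sketch: `try_parse` is a hypothetical helper that is not part of the original report, it feeds bytes rather than a `StringIO`, and whether the "utf8" spelling is rejected may depend on the Python and expat versions in use, so the script reports the outcome instead of assuming one.

```python
import io
from xml.etree.ElementTree import ParseError, parse


def try_parse(encoding_name):
    """Parse a tiny document that declares `encoding_name`.

    Hypothetical helper: returns None on success, or the error text
    if the parser rejects the document.
    """
    document = (
        '<?xml version="1.0" encoding="%s"?>\n<test>text</test>' % encoding_name
    )
    try:
        # The document is pure ASCII, so the bytes are the same for either label.
        parse(io.BytesIO(document.encode("ascii")))
    except (ParseError, ValueError, LookupError) as exc:
        return str(exc)
    return None


for name in ("utf-8", "utf8"):
    print(name, "->", try_parse(name) or "parsed")
```

The "utf-8" line should always report a successful parse; the "utf8" line shows how the local parser treats the unregistered spelling.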

@mark-summerfield mark-summerfield mannequin added the stdlib, topic-XML, and type-bug labels Mar 13, 2008
@benjaminp
Contributor

Should the parser recognize "utf8"? I looked at the XML standard [1] and
it referred me to the IANA's charts [2]. It appears that the only correct
way to denote UTF-8 is "UTF-8".

[1] http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
[2] http://www.iana.org/assignments/character-sets
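
As a side note on why "utf8" is an easy spelling to reach for: Python's own codec registry accepts it as an alias, independently of what the XML and IANA references above treat as the registered name. A minimal sketch, where `canonical_codec_name` is a hypothetical helper and the name it returns is Python's codec name, not necessarily an IANA label:

```python
import codecs


def canonical_codec_name(label):
    """Return the name Python's codec registry uses for `label`.

    Hypothetical helper for illustration; raises LookupError for
    labels that Python does not know.
    """
    return codecs.lookup(label).name


for label in ("utf8", "UTF-8", "utf_8"):
    print(label, "->", canonical_codec_name(label))
```

All three labels resolve to the same Python codec, which is why code that only talks to Python's codecs never notices the difference.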

@mark-summerfield
Mannequin Author

mark-summerfield mannequin commented Mar 15, 2008

You're right that the parser should not recognise "utf8" since it isn't
correct XML (as per the references you gave).

I made the mistake because I used the etree module and wrote an XML file
with encoding "utf8" which etree accepted. I've now switched to using
"UTF-8".

@birkenfeld
Member

Okay to close this, then?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022