SAXParseError on unicode (Japanese) file #53487

Closed
gianzula mannequin opened this issue Jul 13, 2010 · 2 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

gianzula mannequin commented Jul 13, 2010

BPO 9241
Nosy @amauryfa
Files
  • ff1a.xml

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-07-13.12:34:12.636>
    created_at = <Date 2010-07-13.09:04:35.100>
    labels = ['expert-XML', 'type-bug', 'invalid']
    title = 'SAXParseError on unicode (Japanese) file'
    updated_at = <Date 2010-07-13.12:34:12.630>
    user = 'https://bugs.python.org/gianzula'

    bugs.python.org fields:

    activity = <Date 2010-07-13.12:34:12.630>
    actor = 'amaury.forgeotdarc'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-07-13.12:34:12.636>
    closer = 'amaury.forgeotdarc'
    components = ['XML']
    creation = <Date 2010-07-13.09:04:35.100>
    creator = 'gianzula'
    dependencies = []
    files = ['17979']
    hgrepos = []
    issue_num = 9241
    keywords = []
    message_count = 2.0
    messages = ['110163', '110181']
    nosy_count = 2.0
    nosy_names = ['amaury.forgeotdarc', 'gianzula']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9241'
    versions = ['Python 2.5']


    When parsing a UTF-16 little-endian encoded XML file containing some Japanese characters, the xml.sax.parse function raises a SAXParseException saying "no element found". The problem occurs with:

    Python 2.5.2/Windows XP Pro SP3 32 bit
    Python 2.6.4/Windows XP Pro SP3 32 bit
    Python 2.5.2/Windows 2008 Server SP2 64 bit

    The same file is successfully processed with/on:

    Python 2.4.3/CentOS 5.4
    Python 2.6.3/CentOS 5.4

    I've attached a minimal XML file containing a single Japanese character, U+FF1A (FULLWIDTH COLON), which triggers the exception. The code used to parse the file is:

    import xml.sax
    # open() with no mode argument reads the file in text mode
    xml.sax.parse(open("ff1a.xml"), xml.sax.ContentHandler())

    Best regards,
    Gianfranco
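Why this one character matters can be checked at the byte level. The sketch below is written for Python 3 (an assumption; the thread targets Python 2.5/2.6) and builds a hypothetical stand-in for ff1a.xml, since the attachment itself is not reproduced here. It shows that U+FF1A, encoded as UTF-16-LE, contains a 0x1A byte.

```python
import codecs

# Hypothetical stand-in for the attached ff1a.xml (its exact bytes are not
# shown in this thread): a UTF-16-LE document with a BOM and one U+FF1A.
text = '<?xml version="1.0" encoding="UTF-16"?><root>\uff1a</root>'
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# U+FF1A (FULLWIDTH COLON) is stored little-endian as the bytes 1A FF,
# so its low byte is 0x1A, the Ctrl-Z character.
char_bytes = "\uff1a".encode("utf-16-le")
print(char_bytes.hex())  # 1aff

# Offset of the first 0x1A byte: 2 BOM bytes + 2 bytes per preceding character.
first_1a = data.index(b"\x1a")
print(first_1a)
```

Every character before U+FF1A in this stand-in is ASCII, whose UTF-16-LE encoding never produces 0x1A, so the first Ctrl-Z byte is exactly the low byte of U+FF1A.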

    gianzula mannequin added the topic-XML and type-bug labels on Jul 13, 2010
    @amauryfa (Member) commented:

    Your file contains the byte \x1a (Ctrl-Z), which is treated as end-of-file when a file is read in text mode on Windows. Open the file in binary mode instead; otherwise its content is truncated at that byte.

    import xml.sax
    # 'rb' = binary mode: every byte, including \x1a, reaches the parser
    xml.sax.parse(open("ff1a.xml", 'rb'), xml.sax.ContentHandler())

    This works on all versions I tried.
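A minimal Python 3 sketch of the binary-mode fix, assuming a hypothetical stand-in for ff1a.xml (the attachment is not reproduced here) and a hypothetical `Collector` handler. It also approximates the old Windows text-mode behaviour by cutting the data at the first 0x1A byte; that slice is a simulation, not the actual Python 2 / C runtime code path.

```python
import codecs
import os
import tempfile
import xml.sax


class Collector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.chars = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.chars.append(content)


# Hypothetical stand-in for ff1a.xml: UTF-16-LE with a BOM and one U+FF1A.
text = '<?xml version="1.0" encoding="UTF-16"?><root>\uff1a</root>'
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ff1a.xml")
    with open(path, "wb") as f:
        f.write(data)

    # Binary mode: the parser receives every byte, including the 0x1A.
    handler = Collector()
    with open(path, "rb") as f:
        xml.sax.parse(f, handler)

print(handler.elements)  # ['root']

# Simulated text-mode truncation: input stops at the first 0x1A byte,
# leaving an unclosed <root> element.
truncated = data[:data.index(b"\x1a")]
try:
    xml.sax.parseString(truncated, xml.sax.ContentHandler())
    error = None
except xml.sax.SAXParseException as exc:
    error = exc.getMessage()
print(error)  # no element found
```

Passing a filename instead, as in `xml.sax.parse("ff1a.xml", handler)`, also sidesteps the problem in current CPython, because the parser then opens the file in binary mode itself.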

    ezio-melotti transferred this issue from another repository on Apr 10, 2022