expat parser not xml 1.1 (breaks xmlrpclib) #56013

Closed
xrg mannequin opened this issue Apr 8, 2011 · 12 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@xrg
Mannequin

xrg mannequin commented Apr 8, 2011

BPO 11804
Nosy @loewis, @amauryfa, @ezio-melotti
Files
  • expat-test.py: Test of expat compliance to xml 1.1
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-27.08:00:59.481>
    created_at = <Date 2011-04-08.09:33:02.874>
    labels = ['expert-XML', 'type-bug']
    title = 'expat parser not xml 1.1 (breaks xmlrpclib)'
    updated_at = <Date 2012-05-27.08:00:59.480>
    user = 'https://bugs.python.org/xrg'

    bugs.python.org fields:

    activity = <Date 2012-05-27.08:00:59.480>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-27.08:00:59.481>
    closer = 'loewis'
    components = ['XML']
    creation = <Date 2011-04-08.09:33:02.874>
    creator = 'xrg'
    dependencies = []
    files = ['21580']
    hgrepos = []
    issue_num = 11804
    keywords = []
    message_count = 12.0
    messages = ['133301', '161341', '161342', '161346', '161491', '161503', '161520', '161521', '161697', '161699', '161700', '161701']
    nosy_count = 6.0
    nosy_names = ['loewis', 'amaury.forgeotdarc', 'ezio.melotti', 'santoso.wijaya', 'xrg', 'Phil.Daintree']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue11804'
    versions = ['Python 2.7']

    @xrg
    Mannequin Author

    xrg mannequin commented Apr 8, 2011

    The expat library (at the C level) is not XML 1.1 compliant, meaning that
    it won't accept the characters \x01-\x08, \x0b, \x0c and \x0e-\x1f.
    At the same time, ElementTree (or custom XML creation, such as in xmlrpclib.py:694) allows these characters to pass through. They then get blocked on the receiving side.
    Since 2.7, the expat library is the default parser for xml-rpc, so
    this is a regression, IMHO. According to the network principle, we should
    accept these characters gracefully.
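
A minimal sketch of the send/receive mismatch described above. It assumes Python 3, where `xmlrpclib` is named `xmlrpc.client`; the helper name `round_trips` is hypothetical and not part of the report or of the library.

```python
import xml.parsers.expat
import xmlrpc.client


def round_trips(value: str) -> bool:
    """Return True if `value` survives dumps() followed by loads()."""
    # dumps() escapes &, < and > but writes other characters unchanged,
    # so a raw control character ends up in the payload as-is.
    payload = xmlrpc.client.dumps((value,))
    try:
        params, _method = xmlrpc.client.loads(payload)
    except xml.parsers.expat.ExpatError:
        # The receiving side (expat) refuses the document.
        return False
    return params == (value,)


print(round_trips("plain text"))
print(round_trips("a\x01b"))
```

The point of the sketch is only that the marshalling side does not reject what the parsing side rejects; it does not say which side ought to change.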

    The attached test script demonstrates that we're not XML 1.1 compliant (but instead enforce the stricter 1.0 rule).
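
The attached expat-test.py is not reproduced in this thread. As a stand-in, here is a small sketch (Python 3, not the original script) that feeds each low control character to expat as raw element content and records which ones are refused. The helper name `expat_accepts` is hypothetical.

```python
import xml.parsers.expat


def expat_accepts(document: bytes) -> bool:
    """Return True if expat parses `document` without raising ExpatError."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Tab, line feed and carriage return are legal in XML 1.0 content.
WHITESPACE = {0x09, 0x0A, 0x0D}

rejected = [
    code
    for code in range(0x01, 0x20)
    if code not in WHITESPACE
    and not expat_accepts(b"<x>" + bytes([code]) + b"</x>")
]
print([hex(code) for code in rejected])
```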

    References:
    http://bugs.python.org/issue5166
    http://en.wikipedia.org/wiki/Valid_characters_in_XML

    @xrg xrg mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Apr 8, 2011
    @PhilDaintree
    Mannequin

    PhilDaintree mannequin commented May 22, 2012

    Another example - the following xml returned and displayed from verbose mode:

    <?xml version="1.0"?>
    <methodResponse>
    <params>
    <param>
    <value><array>
    <data>
    <value><string>0001</string></value>
    <value><string>001</string></value>
    <value><string>002</string></value>
    <value><string>100</string></value>
    <value><string>121213</string></value>
    <value><string>123456</string></value>
    <value><string>291</string></value>
    <value><string>321654</string></value>
    <value><string>580</string></value>
    <value><string>ABS</string></value>
    <value><string>ACTIVE</string></value>
    <value><string>AIRCON</string></value>
    <value><string>ALIEJA</string></value>
    <value><string>AMP</string></value>
    <value><string>ASSETS</string></value>
    <value><string>BAKE</string></value>
    <value><string>BRACE</string></value>
    <value><string>BYC</string></value>
    <value><string>CARRO</string></value>
    <value><string>CARTON</string></value>
    <value><string>CO</string></value>
    <value><string>COMPS</string></value>
    <value><string>CULOIL</string></value>
    <value><string>DECOR</string></value>
    <value><string>DVD</string></value>
    <value><string>E</string></value>
    <value><string>FOOD</string></value>
    <value><string>HDD</string></value>
    <value><string>INF</string></value>
    <value><string>LAB</string></value>
    <value><string>LINER</string></value>
    <value><string>LL</string></value>
    <value><string>MCNBI</string></value>
    <value><string>MEDS</string></value>
    <value><string>MODEL1</string></value>
    <value><string>NEM</string></value>
    <value><string>PEÃ\x87AS</string></value>
    <value><string>PENS</string></value>
    <value><string>PHONE</string></value>
    <value><string>PLANT</string></value>
    <value><string>PRJCTR</string></value>
    <value><string>PROD</string></value>
    <value><string>SERV</string></value>
    <value><string>SOCKS</string></value>
    <value><string>SS</string></value>
    <value><string>SW</string></value>
    <value><string>TACON</string></value>
    <value><string>TEST12</string></value>
    <value><string>VEGTAB</string></value>
    <value><string>ZFR</string></value>
    </data>
    </array></value>
    </param>
    </params>
    </methodResponse>

    will not parse with the error:

    File "/usr/lib/python2.7/xmlrpclib.py", line 557, in feed
    self._parser.Parse(data, 0)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 43, column 23

    the following unicode characters on that line are the trouble:

    <value><string>PEÃ\x87AS</string></value>

    @PhilDaintree
    Mannequin

    PhilDaintree mannequin commented May 22, 2012

    The xml parses happily at http://www.w3schools.com/xml/xml_validator.asp

    @amauryfa
    Member

    In the sample above, is "\x87" one character, or 4 ASCII characters?

    @PhilDaintree
    Mannequin

    PhilDaintree mannequin commented May 24, 2012

    The field in question contains the utf-8 text: PEÇAS
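
One possible reading of the display "PEÃ\x87AS", sketched in Python 3 and offered as an assumption rather than a confirmed diagnosis: the UTF-8 bytes for Ç were shown as if each byte were a separate Latin-1 character.

```python
# "PEÇAS", written with an escape so the source file encoding does not matter.
text = "PE\u00c7AS"

# UTF-8 encodes the C WITH CEDILLA as the two bytes 0xC3 0x87.
utf8_bytes = text.encode("utf-8")

# Reading those same bytes as Latin-1 yields two characters:
# U+00C3 (A WITH TILDE) followed by the control character U+0087.
misread = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes))
print(ascii(misread))
```

Under that reading, "\x87" in the verbose output stands for a single byte, not four ASCII characters.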

    @amauryfa
    Member

    Yes, but where does this data come from? How did you feed it to the parser? And this does not relate to XML 1.1.

    BTW, I found this page about XML 1.1:
    http://www.cafeconleche.org/books/effectivexml/chapters/03.html

    """
    Everything you need to know about XML 1.1 can be summed up in two rules:

    • Don't use it.
    • (For experts only) If you speak Mongolian, Yi, Cambodian, Amharic, Dhivehi, Burmese or a very few other languages and you want to write your markup (not your text but your markup) in these languages, then you can set the version attribute of the XML declaration to 1.1. Otherwise, refer to rule 1.
      """

    @loewis
    Mannequin

    loewis mannequin commented May 24, 2012

    This has nothing to do with XML 1.1 (so closing this report as "won't fix").

    The UTF-8 text that you present works very well:

    >>> import xml.parsers.expat
    >>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
    >>> p.Parse("<x>\xc3\x87</x>", 1)
    1

    The character LATIN CAPITAL LETTER C WITH CEDILLA is definitely supported in XML 1.0, so there is no need for XML 1.1 here.

    If this still fails to parse for you, it may be because the input is actually different, e.g.

    >>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
    >>> p.Parse("<x>&#195;\x87</x>", 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9

    I.e. the input might contain the characters &, #, 1, 9, 5, ;, and \x87. That is ill-formed UTF-8, and the parser is right to choke on it. Even if it were declared as XML 1.1, it would still be ill-formed, because it would still be invalid UTF-8.
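
The two sessions above use Python 2 byte strings. A rough Python 3 equivalent, as a sketch only, passes `bytes` explicitly and also checks the input with a plain UTF-8 decode before involving the parser. The helper names `first_invalid_utf8_offset` and `expat_accepts` are hypothetical.

```python
import xml.parsers.expat
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def expat_accepts(data: bytes) -> bool:
    """Return True if expat parses `data` as a complete UTF-8 document."""
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


good = b"<x>\xc3\x87</x>"    # Ç as a proper two-byte UTF-8 sequence
bad = b"<x>&#195;\x87</x>"   # character reference followed by a stray 0x87

for sample in (good, bad):
    print(first_invalid_utf8_offset(sample), expat_accepts(sample))
```

Checking the raw bytes first separates an encoding problem in the input from a limitation of the parser.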

    @loewis loewis mannequin closed this as completed May 24, 2012
    @xrg
    Mannequin Author

    xrg mannequin commented May 24, 2012

    I'm reopening the bug, as your last comment does not cover the initial report. We are not talking about invalid UTF8 here, but legal low-ASCII values.

    @xrg xrg mannequin reopened this May 24, 2012
    @PhilDaintree
    Mannequin

    PhilDaintree mannequin commented May 27, 2012

    Well, maybe this should be a different bug, as it is clearly not XML 1.1 related, as this line in the XML gives away :-)

    <?xml version="1.0"?>

    To repeat the bug ... using the webERP demo data

    #!/usr/bin/env python

    import xmlrpclib
    
    x_server = xmlrpclib.Server('http://www.weberp.org/weberp/api/api_xml-rpc.php',verbose=True)
    #Get the stock items defined in the demo webERP installation
    StockList = x_server.weberp.xmlrpc_SearchStockItems('discontinued','0','admin','weberp')
    
    if StockList[0]==0:
    	for StockID in StockList[1]:
    		print str(StockID)

    The webERP xml-rpc server uses XMLRPC for PHP http://phpxmlrpc.sourceforge.net/

    @loewis
    Mannequin

    loewis mannequin commented May 27, 2012

    Phil: it seems you have hijacked the bug report. Don't do that. If you want to report a bug, please create a new bug report. Structure it as follows:

    1. this is what I did
    2. this is what happened
    3. this is what should have happened instead.

    @PhilDaintree
    Mannequin

    PhilDaintree mannequin commented May 27, 2012

    or for less data...

    #!/usr/bin/env python

    import xmlrpclib
    
    x_server = xmlrpclib.Server('http://www.weberp.org/weberp/api/api_xml-rpc.php',verbose=True)
    #Get the stock items defined in the webERP installation
    StockList = x_server.weberp.xmlrpc_SearchStockItems('units','cm','admin','weberp')
    
    if StockList[0]==0:
    	for StockID in StockList[1]:
    		print str(StockID)

    @loewis
    Mannequin

    loewis mannequin commented May 27, 2012

    Panos: you are right. The original issue still exists.

    However, it is not a bug in Python, but one in the expat library. So I am now closing this report as out-of-scope for Python.

    There is a bug report open on expat requesting support for XML 1.1, see

    http://sourceforge.net/tracker/?func=detail&atid=110127&aid=891265&group_id=10127

    This bug report has been open since 2004. I see little hope that expat will support XML 1.1 within the next five years.

    I also fail to see the regression: expat has never supported XML 1.1.
    xmlrpclib always used expat, at least since Python 2.0. In any case, this report is about expat, not xmlrpclib, so any possible regression in xmlrpclib should be reported separately.

    @loewis loewis mannequin closed this as completed May 27, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022