expat parser not xml 1.1 (breaks xmlrpclib) #56013
Comments
The expat library (at the C level) is not XML 1.1 compliant. The attached test script demonstrates that we're not XML 1.1 compliant (but instead enforce the stricter 1.0 rule). References: |
Another example: the following XML, returned and displayed in verbose mode:
<?xml version="1.0"?>
will not parse, failing with the error:
File "/usr/lib/python2.7/xmlrpclib.py", line 557, in feed
The following Unicode characters on that line are the trouble:
<value><string>PE&#195;\x87AS</string></value> |
The xml parses happily at http://www.w3schools.com/xml/xml_validator.asp |
In the sample above, is "\x87" one character, or four ASCII characters? |
The field in question contains the utf-8 text: PEÇAS |
Yes, but where does this data come from? How did you feed it to the parser? And this does not relate to XML 1.1. BTW, I found this page about XML 1.1: """
|
This has nothing to do with XML 1.1 (so closing this report as "won't fix"). The UTF-8 text that you present works very well:
>>> import xml.parsers.expat
>>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
>>> p.Parse("<x>\xc3\x87</x>", 1)
1
The character LATIN CAPITAL LETTER C WITH CEDILLA is definitely supported in XML 1.0, so there is no need for XML 1.1 here. If this still fails to parse for you, it may be because the input is actually different, e.g.
>>> p=xml.parsers.expat.ParserCreate(encoding="utf-8")
>>> p.Parse("<x>&#195;\x87</x>", 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9
I.e. the input might contain the characters &, #, 1, 9, 5, ;, and \x87. That is ill-formed UTF-8, and the parser is right to choke on it. Even if it were declared as XML 1.1, it would still be ill-formed, because it would still be invalid UTF-8. |
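The two interactive sessions above are Python 2. As a rough Python 3 sketch of the same comparison (the helper name `parse_text` and the two constants are made up for illustration, and bytes are passed explicitly because Python 3 strings are already decoded), something like the following should show the valid UTF-8 encoding of Ç being accepted and the character reference followed by a stray \x87 byte being rejected:

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse a complete document with expat and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) encoded as valid UTF-8.
VALID = b"<x>\xc3\x87</x>"
# A character reference for U+00C3 followed by a lone continuation byte.
INVALID = b"<x>&#195;\x87</x>"

# ascii() keeps the output printable on any terminal encoding.
print(ascii(parse_text(VALID)))

try:
    parse_text(INVALID)
except xml.parsers.expat.ExpatError as exc:
    print("rejected:", exc)
```

The XML version declared in the document plays no part in either outcome; only the byte sequence does.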
I'm reopening the bug, as your last comment does not cover the initial report. We are not talking about invalid UTF-8 here, but legal low-ASCII values. |
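The attached test script is not reproduced in this thread, so the following is only a guess at the kind of check it performs, written for Python 3. It assumes that "legal low-ASCII values" refers to control characters such as U+0001, which XML 1.1 permits as character references while XML 1.0 forbids them entirely. The helper `parses` and both sample documents are hypothetical.

```python
import xml.parsers.expat


def parses(document: bytes) -> bool:
    """Return True if expat accepts the document, False on ExpatError."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# An ordinary XML 1.0 document, used as a sanity check.
PLAIN_DOC = b'<?xml version="1.0"?><x>ok</x>'
# Declares XML 1.1 and refers to U+0001, which XML 1.1 allows in this
# escaped form but XML 1.0 does not allow at all.
CONTROL_CHAR_DOC = b'<?xml version="1.1"?><x>&#1;</x>'

print(parses(PLAIN_DOC))
print(parses(CONTROL_CHAR_DOC))
```

If expat applies the XML 1.0 character rules regardless of the declared version, as the report states, the second call reports a failure.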
Well, maybe this should be a different bug, as it is clearly not XML 1.1 related, as this line in the XML gives away :-)
<?xml version="1.0"?>
To repeat the bug, using the webERP demo data:
#!/usr/bin/env python
import xmlrpclib
x_server = xmlrpclib.Server('http://www.weberp.org/weberp/api/api_xml-rpc.php',verbose=True)
# Get the stock items defined in the demo webERP installation
StockList = x_server.weberp.xmlrpc_SearchStockItems('discontinued','0','admin','weberp')
if StockList[0]==0:
    for StockID in StockList[1]:
        print str(StockID)
The webERP xml-rpc server uses XMLRPC for PHP http://phpxmlrpc.sourceforge.net/ |
Phil: it seems you have hijacked the bug report. Don't do that. If you want to report a bug, please create a new bug report. Structure it as follows:
|
or for less data...
#!/usr/bin/env python
import xmlrpclib
x_server = xmlrpclib.Server('http://www.weberp.org/weberp/api/api_xml-rpc.php',verbose=True)
# Get the stock items defined in the webERP installation
StockList = x_server.weberp.xmlrpc_SearchStockItems('units','cm','admin','weberp')
if StockList[0]==0:
    for StockID in StockList[1]:
        print str(StockID) |
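The scripts above depend on a live server. A network-free approximation, sketched here for Python 3 (where `xmlrpclib` became `xmlrpc.client`), feeds hand-built response bodies to `xmlrpc.client.loads`. The helper `make_response` and both bodies are assumptions made for illustration: the real server response is not shown in full in this thread, and the failing body is modelled on the character reference `&#195;` followed by a raw \x87 byte discussed in the earlier comment.

```python
import xml.parsers.expat
import xmlrpc.client


def make_response(string_bytes: bytes) -> bytes:
    """Wrap raw bytes in a minimal XML-RPC methodResponse (hypothetical body)."""
    return (
        b'<?xml version="1.0"?>'
        b"<methodResponse><params><param><value><string>"
        + string_bytes
        + b"</string></value></param></params></methodResponse>"
    )


# The field encoded as valid UTF-8.
GOOD = make_response("PE\u00c7AS".encode("utf-8"))
# A character reference followed by a lone UTF-8 continuation byte.
BAD = make_response(b"PE&#195;\x87AS")

params, method = xmlrpc.client.loads(GOOD)
print(ascii(params), method)

try:
    xmlrpc.client.loads(BAD)
except xml.parsers.expat.ExpatError as exc:
    print("rejected:", exc)
```

If the second body matches what the server actually sends, the fault lies in the server's encoding of the string rather than in the Python client.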
Panos: you are right. The original issue still exists. However, it is not a bug in Python, but one in the expat library. So I am now closing this report as out-of-scope for Python. There is a bug report open on expat requesting support for XML 1.1, see http://sourceforge.net/tracker/?func=detail&atid=110127&aid=891265&group_id=10127 That bug report has been open since 2004. I see little hope that expat will support XML 1.1 within the next five years. I also fail to see the regression: expat has never supported XML 1.1. |