xmlrpc library returns string which contain null ( \x00 ) #51976

Open
StevenHartland mannequin opened this issue Jan 17, 2010 · 13 comments
Labels
3.7 (EOL), topic-XML, type-bug

Comments

@StevenHartland
Mannequin

StevenHartland mannequin commented Jan 17, 2010

BPO 7727
Nosy @loewis, @pitrou, @vstinner, @serhiy-storchaka, @fredrikhl, @iritkatriel
Files
  • xmlrpc_byte_string.patch
  • xmlrpc_dump_invalid_string-2.7_2.patch: Patch for 2.7
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-01-17.19:59:27.625>
    labels = ['expert-XML', 'type-bug', '3.7']
    title = 'xmlrpc library returns string which contain null ( \\x00 )'
    updated_at = <Date 2021-12-03.12:22:16.503>
    user = 'https://bugs.python.org/StevenHartland'

    bugs.python.org fields:

    activity = <Date 2021-12-03.12:22:16.503>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2010-01-17.19:59:27.625>
    creator = 'Steven.Hartland'
    dependencies = []
    files = ['15961', '30360']
    hgrepos = []
    issue_num = 7727
    keywords = ['patch']
    message_count = 13.0
    messages = ['97972', '98095', '98096', '98097', '189782', '189801', '189803', '189808', '189822', '189831', '189851', '189919', '407580']
    nosy_count = 9.0
    nosy_names = ['loewis', 'effbot', 'pitrou', 'vstinner', 'Steven.Hartland', 'serhiy.storchaka', 'Alex Corcoles', 'fredrikhl', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7727'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @StevenHartland
    Mannequin Author

    StevenHartland mannequin commented Jan 17, 2010

    When a SimpleXMLRPCServer returns data that includes strings containing \x00, the null character is written into the response as-is, which is invalid XML.

    The expected result is that the data should be treated as binary and base64 encoded.

    The bug appears to be in the core xmlrpc library which relies on type( value ) to determine the data type. This returns str for a string even if it includes the null char.
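The report can be reproduced on Python 3 with a short sketch. It assumes the `xmlrpc.client` module (the Python 3 home of `xmlrpclib`); `survives_round_trip` is a hypothetical helper name used only for illustration, not part of the library.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def survives_round_trip(value):
    """Return True if `value` can be marshalled and parsed back unchanged."""
    try:
        xml = xmlrpc.client.dumps((value,))
        (result,), _method = xmlrpc.client.loads(xml)
    except (ExpatError, ValueError):
        # ExpatError: the parser rejected the marshalled document.
        # ValueError: caught in case a version rejects the value while dumping.
        return False
    return result == value


# The marshaller dispatches on type(value), so both values are plain strings
# as far as it is concerned.
print(type("hello") is type("a\x00b"))
print(survives_round_trip("hello"))
print(survives_round_trip("a\x00b"))
```

The point of the sketch is that nothing in the type of the value distinguishes the two strings, so any fix has to inspect the content.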

    @StevenHartland StevenHartland mannequin added the topic-XML and type-bug labels Jan 17, 2010
    @vstinner
    Member

    Marshaller.dump_string() encodes a byte string in <string>...</string> using the escape() function. A byte string can be encoded in base64 using <base64>...</base64>. It's described in the XML-RPC specification, but I don't know if all XML-RPC implementations do understand this type.
    http://www.xmlrpc.com/spec

    Should we change the default type to base64, or only fall back to base64 if the byte string cannot be encoded in XML? Testing whether a byte string can be encoded in XML can be slow, and making base64 the default type may cause compatibility issues :-/

    @vstinner
    Member

    Here is an example patch that uses the following test:

    all(32 <= ord(byte) <= 127 for byte in value)

    I don't know how much slower the patch is, but at least it doesn't raise an "ExpatError: not well-formed (invalid token): ...".

    @StevenHartland
    Mannequin Author

    StevenHartland mannequin commented Jan 21, 2010

    One thing that springs to mind: how valid is that test when applied to UTF-8 data?
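To make the question concrete, here is a small sketch of the quoted test ported to Python 3 bytes (where iterating yields integers, so `ord()` is not needed). The name `is_printable_ascii` and the sample values are illustrative assumptions, not taken from the attached patch.

```python
def is_printable_ascii(data: bytes) -> bool:
    """Python 3 spelling of the quoted test: every byte must be in 32..127."""
    return all(32 <= byte <= 127 for byte in data)


samples = {
    "plain ASCII": b"hello",
    "UTF-8 euro sign": "\u20ac".encode("utf-8"),
    "tab-separated": b"a\tb",
    "embedded NUL": b"a\x00b",
}
for label, data in samples.items():
    print(f"{label}: {is_printable_ascii(data)}")
```

Under this test, UTF-8 encoded text and strings containing a tab or newline fail the check too, even though XML can carry them, so they would presumably be sent as base64 along with the genuinely problematic NUL case.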

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented May 21, 2013

    Even if the original patch is valid, it will need reworking: xmlrpclib isn't in Python 3, and the code now lives in xmlrpc/client. It also looks as if dump_string has been renamed dump_unicode.

    @pitrou
    Member

    pitrou commented May 22, 2013

    I don't really understand the issue. If you want to pass binary data (rather than unicode text), you should use a Binary object as explained in the docs:
    http://docs.python.org/2/library/xmlrpclib.html#binary-objects
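A minimal sketch of this advice on Python 3, assuming `xmlrpc.client` (where the `xmlrpclib` code now lives). The payload is an arbitrary example value.

```python
import xmlrpc.client

payload = b"a\x00b"

# Wrap the bytes in Binary so they are marshalled as a <base64> element.
xml = xmlrpc.client.dumps((xmlrpc.client.Binary(payload),))

# By default the receiver gets a Binary wrapper back; its .data holds the bytes.
(wrapped,), _method = xmlrpc.client.loads(xml)

# With use_builtin_types=True, the receiver gets a plain bytes object instead.
(plain,), _method = xmlrpc.client.loads(xml, use_builtin_types=True)

print(wrapped.data == payload, plain == payload)
```

This keeps the choice between text and binary with the caller, which is the design the comment above argues for.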

    @loewis
    Mannequin

    loewis mannequin commented May 22, 2013

    The original report really includes two parts:
    a) when a string containing \0 is marshalled, ill-formed XML is produced
    b) the expected behavior is that base64 is used

    IMO: While a) is correct, b) is not. Antoine is correct that xmlrpclib.Binary should be used if you want to transmit binary data. Consequently, an Error should be reported if an attempt is made to produce ill-formed XML.

    OTOH, ill-formed XML can also be produced when sending a byte string that does not match the encoding declaration. Because of that, I propose to close this by documenting the limitations, rather than changing the code.

    @serhiy-storchaka
    Member

    The limitation is already documented:

    """However, it’s the caller’s responsibility to ensure that the string is free of characters that aren’t allowed in XML, such as the control characters with ASCII values between 0 and 31 (except, of course, tab, newline and carriage return); failing to do this will result in an XML-RPC request that isn’t well-formed XML. If you have to pass arbitrary bytes via XML-RPC, use the bytes class or the Binary wrapper class described below."""

    Here is a patch which forbids creating ill-formed XML.
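The attached patch is not reproduced in this thread. As a rough illustration of the idea only, the sketch below rejects any character outside the XML 1.0 `Char` production with a `ValueError`. The names `check_xml_text` and `_ILLEGAL_XML_CHAR` are hypothetical and are not taken from the patch or from `xmlrpc.client`.

```python
import re

# Anything outside the XML 1.0 "Char" production. The legal characters are
# tab, LF, CR, and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_ILLEGAL_XML_CHAR = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_text(text: str) -> str:
    """Return `text` unchanged, or raise ValueError if XML 1.0 cannot carry it."""
    match = _ILLEGAL_XML_CHAR.search(text)
    if match is not None:
        raise ValueError(
            f"character {match.group()!r} at index {match.start()} "
            "is not allowed in XML 1.0"
        )
    return text
```

A marshaller could call such a check before writing a string value, so the failure surfaces on the sending side rather than as a parse error on the receiving side.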

    @loewis
    Mannequin

    loewis mannequin commented May 22, 2013

    Serhiy: The patch fixes the OP's concern, but not the extended concern about producing ill-formed XML (at least not for 2.7). If the string contains non-UTF-8 data, yet the XML declaration says UTF-8, it's still ill-formed, and not caught by your patch.

    I wonder whether xmlrpclib.Error would be a better exception than ValueError (although ValueError is also plausible); either way, the case should be documented.

    @serhiy-storchaka
    Member

    Indeed, 2.7 needs more work. Here is a patch for 2.7.

    UnicodeError (which subclasses ValueError) can be raised implicitly here, which is why I think ValueError is a good exception.

    I'd be very grateful for your help with the documentation.

    @loewis
    Mannequin

    loewis mannequin commented May 23, 2013

    I'm still skeptical that a new exception should be introduced in 2.7.x or 3.3 (might this break existing setups?). I suggest asking the release manager for a decision.

    But if this is done, then I propose to add the following text to ServerProxy:

    versionchanged (2.7.6): Sending strings with characters that are ill-formed in XML (e.g. \x00) now raises ValueError.

    @serhiy-storchaka
    Member

    While updating the tests, I found some related errors.

    XML-RPC doesn't work in the general case for non-UTF-8 encodings:

    >>> import xmlrpclib
    >>> xmlrpclib.dumps(('\u20ac',), encoding='iso-8859-1')
    '<params>\n<param>\n<value><string>\\u20ac</string></value>\n</param>\n</params>\n'
    >>> xmlrpclib.dumps((u'\u20ac',), encoding='iso-8859-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/xmlrpclib.py", line 1085, in dumps
        data = m.dumps(params)
      File "/usr/lib/python2.7/xmlrpclib.py", line 632, in dumps
        dump(v, write)
      File "/usr/lib/python2.7/xmlrpclib.py", line 654, in __dump
        f(self, value, write)
      File "/usr/lib/python2.7/xmlrpclib.py", line 700, in dump_unicode
        value = value.encode(self.encoding)
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)

    We should use the 'xmlcharrefreplace' error handler.
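For reference, this is what the 'xmlcharrefreplace' handler does for the euro sign from the traceback. The sketch uses Python 3 string syntax and only `str.encode`; how the marshaller would wire it in is not shown here.

```python
euro = "\u20ac"

# The default "strict" handler fails: ISO-8859-1 has no euro sign.
strict_error = None
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    strict_error = exc

# "xmlcharrefreplace" substitutes a numeric character reference instead.
encoded = euro.encode("iso-8859-1", errors="xmlcharrefreplace")
print(encoded)  # b'&#8364;'
```

Because the reference is plain ASCII, an XML parser can resolve it back to U+20AC whatever encoding the document declares.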

    Non-ASCII strings are passed as Unicode strings (this should be documented).

    >>> xmlrpclib.loads(xmlrpclib.dumps(('\xe2\x82\xac',)))
    ((u'\u20ac',), None)

    '\r' and '\r\n' are deserialized as '\n'.

    >>> xmlrpclib.loads(xmlrpclib.dumps(('\r',)))
    (('\n',), None)
    >>> xmlrpclib.loads(xmlrpclib.dumps(('\r\n',)))
    (('\n',), None)

    @AlexCorcoles AlexCorcoles mannequin added the 3.7 (EOL) label Jul 15, 2017
    @iritkatriel
    Member

    2.7 is no longer relevant, and it looks like these examples are working now:

    >>> xmlrpc.client.dumps(('\u20ac',), encoding='iso-8859-1')
    '<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'
    >>> xmlrpc.client.dumps((u'\u20ac',), encoding='iso-8859-1')
    '<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'

    There is possibly still a documentation enhancement to make regarding non-ascii strings. This is what I get now with Serhiy's examples:

    >>> xmlrpc.client.loads(xmlrpc.client.dumps(('\xe2\x82\xac',)))
    (('â\x82¬',), None)
    >>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r',)))
    (('\n',), None)
    >>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r\n',)))
    (('\n',), None)
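The outputs above can be read as follows. In Python 3, `'\xe2\x82\xac'` is a three-character `str` (code points U+00E2, U+0082, U+00AC), not the UTF-8 bytes of the euro sign, so it round-trips unchanged. Carriage returns are normalized to `'\n'` on parsing. The sketch below checks both, and shows that wrapping the data in `Binary` preserves the line ending. `round_trip` is a hypothetical helper name.

```python
import xmlrpc.client


def round_trip(value):
    """Marshal a single value with dumps() and parse it back with loads()."""
    (result,), _method = xmlrpc.client.loads(xmlrpc.client.dumps((value,)))
    return result


three_chars = "\xe2\x82\xac"

# Reinterpreting those code points as bytes gives the UTF-8 form of U+20AC.
as_euro = three_chars.encode("latin-1").decode("utf-8")

print(round_trip(three_chars) == three_chars)
print(as_euro == "\u20ac")
print(repr(round_trip("\r\n")))

# Binary data is base64-encoded, so the line ending is not normalized.
print(round_trip(xmlrpc.client.Binary(b"\r\n")).data)
```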

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022