Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML-RPC changelog_since_serial() method is returning invalid XML due to \x01 in sdist name #5653

Closed
jwodder opened this Issue Apr 1, 2019 · 10 comments

Comments

Projects
None yet
3 participants
@jwodder
Copy link
Contributor

jwodder commented Apr 1, 2019

As of 2019-04-01 17:35:00 +0000, running the below Python 3 code:

#!/usr/bin/python3
import csv
from   xmlrpc.client import ServerProxy
import sys

ENDPOINT = 'https://pypi.org/pypi'
SINCE = 5011848
   
client = ServerProxy(ENDPOINT, use_builtin_types=True)
out = csv.writer(sys.stdout)
out.writerow('name version timestamp action serial'.split())
for row in client.changelog_since_serial(SINCE):
    out.writerow(row)

fails with:

Traceback (most recent call last):
  File "minimum.py", line 12, in <module>
    for row in client.changelog_since_serial(SINCE):
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1170, in single_request
    return self.parse_response(resp)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1336, in parse_response
    p.feed(data)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 439, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 5264, column 38

Manually reproducing the XML-RPC request and inspecting the response shows that the problematic piece of XML is (with line number added):

<value><array><data>
<value><string>request-s</string></value>
<value><string>0.0.1</string></value>
<value><int>1554118234</int></value>
<value><string>add source file request^As-0.0.1.tar.gz</string></value> <!-- line 5364 -->
<value><int>5016503</int></value>
</data></array></value>

The ^A is here a literal \x01 which somehow ended up in the filename of an sdist uploaded to the request-s project. If Warehouse is going to allow uploads of such filenames, it needs to properly escape them in XML-RPC responses.

@jwodder jwodder changed the title XML-RPC changelog() method is returning invalid XML due to \x01 in sdist name XML-RPC changelog_since_serial() method is returning invalid XML due to \x01 in sdist name Apr 1, 2019

@jwodder

This comment has been minimized.

Copy link
Contributor Author

jwodder commented Apr 1, 2019

Upon further investigation, it appears that the failure to escape \x01 is the fault of Python's xmlrpc.client.dumps(). BPO 26147 appears to be related.

@di

This comment has been minimized.

Copy link
Member

di commented Apr 1, 2019

Here's the project in question: https://pypi.org/project/request-s/

Here's where we check that it normalizes to the same value as the project name:

# Make sure that our filename matches the project that it is being uploaded
# to.
prefix = pkg_resources.safe_name(project.name).lower()
if not pkg_resources.safe_name(filename).lower().startswith(prefix):
raise _exc_with_message(
HTTPBadRequest,
"Start filename for {!r} with {!r}.".format(project.name, prefix),
)

AFAICT, this is a valid, albeit confusing filename, and the issue is with the client instead.

@di di added the needs discussion label Apr 1, 2019

@ewdurbin

This comment has been minimized.

Copy link
Member

ewdurbin commented Apr 1, 2019

@di is this a place where the addition of some calls to _clean_for_xml are needed?

@di

This comment has been minimized.

Copy link
Member

di commented Apr 1, 2019

@ewdurbin Not sure. Cleaning the description/summary is one thing, but actually changing the filenames might not be helpful.

@jwodder

This comment has been minimized.

Copy link
Contributor Author

jwodder commented Apr 1, 2019

It appears that \x01 (and most C0 control characters) is flat-out invalid, both in raw and &#x01; form, in XML 1.0 (which is the version of XML that PyPI returns and seems to be the only version Python's XML libraries are aware of). Hence, the problem is with the very attempt to transmit the string "request\x01-s" over XML-RPC.

@di

This comment has been minimized.

Copy link
Member

di commented Apr 1, 2019

I think given the relative infrequency of files with these characters in them, and given the "deprecation" of the XML-RPC API, we can probably just use _clean_for_xml on the filenames to ensure this isn't breaking mirrors all over the place.

@ewdurbin

This comment has been minimized.

Copy link
Member

ewdurbin commented Apr 1, 2019

it seems reasonable to at least clean the changelog entries. will PR that shortly.

@ewdurbin

This comment has been minimized.

Copy link
Member

ewdurbin commented Apr 1, 2019

Though on second thought, I'm fairly certain this will also effect the releases data as well.

@ewdurbin

This comment has been minimized.

Copy link
Member

ewdurbin commented Apr 1, 2019

seems url encoding the filename field is probably the safer bet.

@jwodder

This comment has been minimized.

Copy link
Contributor Author

jwodder commented Apr 2, 2019

The XML-RPC is working again now. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.