Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML-RPC changelog_since_serial() method is returning invalid XML due to \x01 in sdist name #5653

Closed
jwodder opened this issue Apr 1, 2019 · 10 comments

Comments

@jwodder
Copy link
Contributor

@jwodder jwodder commented Apr 1, 2019

As of 2019-04-01 17:35:00 +0000, running the below Python 3 code:

#!/usr/bin/python3
import csv
from   xmlrpc.client import ServerProxy
import sys

ENDPOINT = 'https://pypi.org/pypi'
SINCE = 5011848
   
client = ServerProxy(ENDPOINT, use_builtin_types=True)
out = csv.writer(sys.stdout)
out.writerow('name version timestamp action serial'.split())
for row in client.changelog_since_serial(SINCE):
    out.writerow(row)

fails with:

Traceback (most recent call last):
  File "minimum.py", line 12, in <module>
    for row in client.changelog_since_serial(SINCE):
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1170, in single_request
    return self.parse_response(resp)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 1336, in parse_response
    p.feed(data)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xmlrpc/client.py", line 439, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 5264, column 38

Manually reproducing the XML-RPC request and inspecting the response shows that the problematic piece of XML is (with line number added):

<value><array><data>
<value><string>request-s</string></value>
<value><string>0.0.1</string></value>
<value><int>1554118234</int></value>
<value><string>add source file request^As-0.0.1.tar.gz</string></value> <!-- line 5364 -->
<value><int>5016503</int></value>
</data></array></value>

The ^A is here a literal \x01 which somehow ended up in the filename of an sdist uploaded to the request-s project. If Warehouse is going to allow uploads of such filenames, it needs to properly escape them in XML-RPC responses.

@jwodder jwodder changed the title XML-RPC changelog() method is returning invalid XML due to \x01 in sdist name XML-RPC changelog_since_serial() method is returning invalid XML due to \x01 in sdist name Apr 1, 2019
@jwodder
Copy link
Contributor Author

@jwodder jwodder commented Apr 1, 2019

Upon further investigation, it appears that the failure to escape \x01 is the fault of Python's xmlrpc.client.dumps(). BPO 26147 appears to be related.

@di
Copy link
Member

@di di commented Apr 1, 2019

Here's the project in question: https://pypi.org/project/request-s/

Here's where we check that it normalizes to the same value as the project name:

# Make sure that our filename matches the project that it is being uploaded
# to.
prefix = pkg_resources.safe_name(project.name).lower()
if not pkg_resources.safe_name(filename).lower().startswith(prefix):
raise _exc_with_message(
HTTPBadRequest,
"Start filename for {!r} with {!r}.".format(project.name, prefix),
)

AFAICT, this is a valid, albeit confusing filename, and the issue is with the client instead.

@ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented Apr 1, 2019

@di is this a place where the addition of some calls to _clean_for_xml are needed?

@di
Copy link
Member

@di di commented Apr 1, 2019

@ewdurbin Not sure. Cleaning the description/summary is one thing, but actually changing the filenames might not be helpful.

@jwodder
Copy link
Contributor Author

@jwodder jwodder commented Apr 1, 2019

It appears that \x01 (and most C0 control characters) is flat-out invalid, both in raw and &#x01; form, in XML 1.0 (which is the version of XML that PyPI returns and seems to be the only version Python's XML libraries are aware of). Hence, the problem is with the very attempt to transmit the string "request\x01-s" over XML-RPC.

@di
Copy link
Member

@di di commented Apr 1, 2019

I think given the relative infrequency of files with these characters in them, and given the "deprecation" of the XML-RPC API, we can probably just use _clean_for_xml on the filenames to ensure this isn't breaking mirrors all over the place.

@ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented Apr 1, 2019

it seems reasonable to at least clean the changelog entries. will PR that shortly.

@ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented Apr 1, 2019

Though on second thought, I'm fairly certain this will also effect the releases data as well.

@ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented Apr 1, 2019

seems url encoding the filename field is probably the safer bet.

@jwodder
Copy link
Contributor Author

@jwodder jwodder commented Apr 2, 2019

The XML-RPC is working again now. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants