Description
Setuptools fails to correctly parse a package URL that contains query parameters
Case
A private pip server hosts package_name and serves the list of package versions as an HTML page at
https://private-pip.example.com/simple/package_name/
The page contains an HTML link that points to a package file to download:
<a href="https://download-server.example.com/package_name/package_name-0.1.2.tar.gz?tokena=A&tokenb=B">package_name-0.1.2.tar.gz</a>
Problem
The href attribute
https://.../package_name-0.1.2.tar.gz?tokena=A&tokenb=B
gets parsed as
https://.../package_name-0.1.2.tar.gz?tokena=Aamptokenb=B
instead of
https://.../package_name-0.1.2.tar.gz?tokena=A&tokenb=B
Note that the & separator comes out as amp instead of &.
Solution
The href is decoded in the function htmldecode.
The regexp called entity_sub is wrong:
setuptools/setuptools/package_index.py
Line 930 in 81f5f85
# This pattern matches a character entity reference (a decimal numeric
Instead of
entity_sub = re.compile(r'&(#(\d+|x[\da-fA-F]+)|[\w.:-]+);?').sub
it should be
entity_sub = re.compile(r'(&#(\d+|x[\da-fA-F]+)|[\w.:-]+;?)').sub
Note the position of the round brackets (parentheses) inside the regular expression.
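The symptom described above can be reproduced in isolation with a small sketch. This is not the setuptools code itself. It reuses only the current entity_sub pattern quoted in this issue, and it makes two assumptions: that the raw HTML of the index page encodes the separator as `&amp;`, and that the decoder hands only the first capture group (the entity name without the leading `&`) to `html.unescape`. The helpers `decode_group1` and `decode_group0` are hypothetical names introduced for illustration.

```python
import html
import re

# Current pattern as quoted in the issue: group 1 holds the entity name
# WITHOUT the leading '&' and trailing ';'.
entity_sub = re.compile(r'&(#(\d+|x[\da-fA-F]+)|[\w.:-]+);?').sub


def decode_group1(match):
    # Hypothetical decoder (assumption): unescape only the captured name.
    # html.unescape("amp") has no '&' to work with, so it returns "amp".
    return html.unescape(match.group(1))


def decode_group0(match):
    # Hypothetical alternative: unescape the whole matched text, e.g. "&amp;".
    return html.unescape(match.group(0))


# Assumed raw href as it would appear in the HTML source of the index page.
href = (
    "https://download-server.example.com/package_name/"
    "package_name-0.1.2.tar.gz?tokena=A&amp;tokenb=B"
)

broken = entity_sub(decode_group1, href)
fixed = entity_sub(decode_group0, href)

print(broken)  # query string ends in tokena=Aamptokenb=B
print(fixed)   # query string ends in tokena=A&tokenb=B
```

Under these assumptions the ampersand is lost because the text handed to the unescape step no longer starts with `&`. Making the whole entity available to the decoder, whether through an outer capture group or through `match.group(0)`, avoids that.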