Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI parsing #120

Closed
kshpytsya opened this issue Feb 6, 2018 · 17 comments · Fixed by #684
Closed

URI parsing #120

kshpytsya opened this issue Feb 6, 2018 · 17 comments · Fixed by #684

Comments

@kshpytsya
Copy link

$ python3
Python 3.6.4 (default, Jan  6 2018, 02:24:08)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import packaging.requirements
>>> packaging.requirements.Requirement("pip @ file://localhost/localbuilds/pip-1.3.1-py33-none-any.whl")
<Requirement('pip@ file://localhost/localbuilds/pip-1.3.1-py33-none-any.whl')>
>>> packaging.requirements.Requirement("pip @ file:///localbuilds/pip-1.3.1-py33-none-any.whl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/uken/.local/lib64/python3.6/site-packages/packaging/requirements.py", line 101, in __init__
    raise InvalidRequirement("Invalid URL given")
packaging.requirements.InvalidRequirement: Invalid URL given

According to PEP 508 the following strings should all be parseable:

"pip @ file://localhost/localbuilds/pip-1.3.1-py33-none-any.whl"
"pip @ file:///localbuilds/pip-1.3.1-py33-none-any.whl"
"pip @ /localbuilds/pip-1.3.1-py33-none-any.whl"
"pip @ localbuilds/pip-1.3.1-py33-none-any.whl"

(I have fed them into the "test program" included in PEP 508 that uses Parsley for parsing. Note that there are several typos in the grammar in the document text)

The following logic may be flawed:

if not (parsed_url.scheme and parsed_url.netloc) or (
not parsed_url.scheme and not parsed_url.netloc):
raise InvalidRequirement("Invalid URL given")

@jaraco
Copy link
Member

jaraco commented Feb 10, 2018

Looking at what urllib does there:

>>> urllib.parse.urlparse('file:///localbuilds/pip-1.3.1')
ParseResult(scheme='file', netloc='', path='/localbuilds/pip-1.3.1', params='', query='', fragment='')
>>> urllib.parse.urlparse('file://localbuilds/pip-1.3.1')
ParseResult(scheme='file', netloc='localbuilds', path='/pip-1.3.1', params='', query='', fragment='')

It does seem that urllib.parse.urlparse may not be doing what is intended by that code (localbuilds is not a netloc).

@jayvdb jayvdb mentioned this issue Jul 14, 2018
@benoit-pierre
Copy link
Member

Isn't the urllib.parse behaviour valid? According to RFC 8089, file URIs support an optional (and possibly empty authority), so file:///etc/fstab, file://localhost/etc/fstab and file:/etc/fstab all refer to the same location (but file:///localbuilds/pip-1.3.1 and file://localbuilds/pip-1.3.1 are not equivalent).

@cjerdonek
Copy link
Member

There are at least a couple open issues around urllib.parse's parsing I know of:
https://bugs.python.org/issue22852
https://bugs.python.org/issue34276
They may or may not be related to what you are discussing here.

@Qix-
Copy link

Qix- commented Jun 28, 2019

Please see pypa/pip#6658. I just wasted several hours breaking down this issue to its fullest as it pertains to pip in particular.

The logic and error reporting can be drastically improved in this package, and I second @jaraco here - six's URL parsing library (or whatever it's using under the hood) is NOT compliant by allowing file:./foo/bar for example.

@benoit-pierre is 100% correct (#120 (comment)).

@cjerdonek
Copy link
Member

Has an issue been filed with six?

@Qix-
Copy link

Qix- commented Jun 28, 2019

Nope. I'll open one. However, the error reporting can still be improved here. I might have time to open a PR today, we'll see :)

@pradyunsg
Copy link
Member

pradyunsg commented Oct 2, 2019

@Qix- could you find some time to file an issue with six?

@Qix-
Copy link

Qix- commented Oct 2, 2019

@pradyunsg Yep, sorry for the delay.

benjaminp/six#300

@pradyunsg
Copy link
Member

Okay, it seems like the bug originates from the standard library. Do we want to work around what the standard library does?

@Qix-
Copy link

Qix- commented Oct 3, 2019

I just tried the test parser with the standard library and the issue persists. Six is not bugged.

import urllib.parse as urllib_parse

def tryparse(url):
    print(url)
    parsed = urllib_parse.urlparse(url)
    unparsed = urllib_parse.urlunparse(parsed)
    parsed_again = urllib_parse.urlparse(unparsed)
    print(parsed)
    print(unparsed)
    print(parsed_again)
>>> tryparse('file:./foo/bar')
file:./foo/bar
ParseResult(scheme='file', netloc='', path='./foo/bar', params='', query='', fragment='')
file:///./foo/bar
ParseResult(scheme='file', netloc='', path='/./foo/bar', params='', query='', fragment='')

Looking at RFC 8089, it pretty clearly does not support local paths. This is something that I missed earlier. This means that, currently, a netloc and absolute path are required if the file: scheme is used. The absolute path bit was what was throwing me off, but this issue also points out the requirement of netloc which isn't right either.

@piotr-dobrogost
Copy link

@Qix-

Looking at RFC 8089, it pretty clearly does not support local paths.

By local paths you mean relative paths (ones not starting with a slash) ?
I'm asking as local has its precise meaning in the RFC 8089 as shown in #264 (comment)

RFC 8089 says this:

A file URI can be dependably dereferenced or translated to a local
file path only if it is local. A file URI is considered "local" if
it has no "file-auth", or the "file-auth" is the special string
"localhost", or a fully qualified domain name that resolves to the
machine from which the URI is being interpreted (Section 2).

@uranusjr
Copy link
Member

The URI parsing logic also recently caused pypa/pip#10098. Pip allows some custom schemes like git+https, but pacakging’s rules means it’s not possible to load a local VCS directory, e.g. six @ git+file:///srv/six (where /src/six is a locally-cloned repository).

I’m wondering, due to the many intricacies around URI parsing in general, maybe packaging should stay out of the business and simply return the unvalidated URI instead, and let the user-side perform any additional validation on their own.

@brettcannon
Copy link
Member

I would be fine with @uranusjr 's suggestion.

@ofek
Copy link
Sponsor Contributor

ofek commented May 23, 2022

FYI for those in need (like I was) Hatch now supports relative paths using the new context formatting capability https://hatch.pypa.io/latest/config/dependency/#local

@abravalheri
Copy link
Contributor

The URI parsing logic also recently caused pypa/pip#10098. Pip allows some custom schemes like git+https, but pacakging’s rules means it’s not possible to load a local VCS directory, e.g. six @ git+file:///srv/six (where /src/six is a locally-cloned repository).

I’m wondering, due to the many intricacies around URI parsing in general, maybe packaging should stay out of the business and simply return the unvalidated URI instead, and let the user-side perform any additional validation on their own.

I recently hit this problem when trying to install a local repository with pip.

Parsing the example in pip's doc results in error:

# packaging==23.0
>>> from packaging.requirements import Requirement
>>> Requirement('MyProject @ git+file:///home/user/projects/MyProject') 
packaging.requirements.InvalidRequirement: Invalid URL: git+file:///home/user/projects/MyProject

@mpizenberg
Copy link

FYI for those in need (like I was) Hatch now supports relative paths using the new context formatting capability https://hatch.pypa.io/latest/config/dependency/#local

Awesome @ofek. I’m also very interested in having support for relative path support, to enable path to relative local wheel files for pip installations.

@uranusjr
Copy link
Member

Since nobody seems to object with the validation logic going away entirely, I’m going to just propose it (see PR linked above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.