Ambiguity in relative file urls #328

Open
imankulov opened this Issue Aug 1, 2011 · 24 comments

I would like to point out that the requirements file format specification mentions the ability to set relative path names such as file:../../lib/project#egg=MyProject in the file. However, those who tried to use this syntax were probably disappointed, because in fact this variant does not work.

While trying to cope with the problem, I discovered that relative file urls are not "officially" supported at all, neither in the RFC (Wikipedia could be a starting point) nor in web browsers, which can be considered the "reference implementation".

In light of that, it would be acceptable to fix the documentation and explicitly forbid relative names for packages. On the other hand, it would be useful to accept this format. Personally, I discovered the issue when I tried to create a package with a sample subpackage in a subdirectory. The sample has a bunch of dependencies, including the "parent package", which I tried to point to as file:../ . I believe this approach does make sense.

I suggest considering my patch 54f5ade which fixes url_to_path function behavior such that urls starting with file:/ are considered as absolute, whereas urls which don't have any slash after "file:" are relative ones.
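
The rule proposed above can be sketched roughly as follows. This is only an illustration and not the code in the patch: file_url_to_path and its base_dir parameter are hypothetical names, the fragment (#egg=...) is simply dropped, and POSIX-style paths are assumed.

```python
import posixpath
from urllib.parse import unquote


def file_url_to_path(url, base_dir):
    """Hypothetical helper: map a file: URL to a path.

    "file:/..." (one or more slashes) is treated as absolute;
    "file:" followed by anything else is resolved against base_dir.
    """
    prefix = "file:"
    if not url.startswith(prefix):
        raise ValueError("not a file: URL: %r" % url)

    # Drop the scheme and any "#egg=..." fragment.
    rest = url[len(prefix):].split("#", 1)[0]

    if rest.startswith("//"):
        # "file://host/path" form: skip the (possibly empty) host part.
        rest = rest[2:]
        slash = rest.find("/")
        rest = rest[slash:] if slash != -1 else "/"

    path = unquote(rest)
    if path.startswith("/"):
        return posixpath.normpath(path)
    # No leading slash after "file:", so the path is relative.
    return posixpath.normpath(posixpath.join(base_dir, path))


print(file_url_to_path("file:../../lib/project#egg=MyProject", "/home/me/work/app"))
print(file_url_to_path("file:///tmp/foo", "/home/me/work/app"))
```

Here base_dir stands for whatever directory relative urls should be resolved against; the thread later discusses whether that should be the current directory or the directory of the requirements file.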

Member
hltbra commented Aug 2, 2011

Relative file urls were introduced by f73385a, and I can't understand why people would use them instead of using the file path directly, like:

$ pip install relative/path/to/my/package
$ pip install /abs/path/to/my/package

Could you describe a good use case nowadays?

Unfortunately, relative file urls don't work as expected, at least for me. When I try to include -e file:../#egg=mypackage in the requirements file, the current pip implementation ends up with

Obtaining file:../#egg=mypackage (from -r requirements.pip (line 5))
  Running setup.py egg_info for package from file:../#egg=mypackage
    Error [Errno 2] No such file or directory: '/../#egg=mypackage' while executing command python setup.py egg_info

Actually, I see two use cases from my own experience.

Fat project development

Sometimes we work on projects with a lot of dependencies (say, a Django project and a set of relatively independent Django apps). The same developers work in the main project codebase and also make tweaks in the dependencies. All projects are physically stored in the same directory (like "~/workspace/" in Eclipse), and everything can look like:

$ ls ~/workspace
my-project
dependency-one
dependency-two

I would like to enumerate all dependencies in my-project/requirements.pip in the form of

-e ../dependency-one#egg=dependency-one
-e ../dependency-two#egg=dependency-two

and then propose a simple instruction to deploy the workspace like:

  1. Fetch our main repository A and all dependencies B, C and D into the same directory.
  2. Optionally, create a virtual environment.
  3. Install all dependencies with pip install -r requirements.pip

Probably I could write dependencies as

-e git://repo/dependency-one#egg=dependency-one

but then I would lose the flexibility: I would have to commit every change in a dependency before I could test its effect on the main project.

Project and sample

Here is another example. I would like to create a new package (a python-openid extension) along with a sample project (a web application) demonstrating some key features of my package. I would like to store the sample in a sample subdirectory of my package, so that everyone can easily test it out. At the same time, I would like to treat the sample project as a separate application which "accidentally" resides in a subdirectory of my main package. My directory hierarchy would look like:

|-- my_openid_extension.py
|-- sample
|   |  .... 
|   |-- requirements.pip
|   |-- setup.py
|   `-- sample.py
`-- setup.py

In requirements.pip I would like to point to my openid extension as -e ../#egg=my-openid-extension, so that users can be sure they are using consistent versions of the module and the sample, and I as a developer get a convenient mechanism for developing both the main application and the extension.

Finally, I would like to point out that my use cases probably have other solutions, but my approach seems the most natural to me, especially given that relative file urls are expected to work. I would also like to emphasize that I don't believe relative file urls are the "rightest choice", but they would be an acceptable solution for my, hopefully not very odd, examples.

By the way, buildout-like variable expansion would probably be a more elegant and convenient solution.

Just a reminder that I would be glad to get any feedback on my proposal.

carwyn commented Dec 6, 2011

One of the things I noticed about the commit mentioned above is that it only interprets relative URLs that contain "../":

# Handle relative file URLs
if link.scheme == 'file' and re.search(r'\.\./', url):
  url = path_to_url(os.path.normpath(os.path.abspath(link.path)))

The use case I'm considering is to reference a directory inside my VCS checkout as an editable but not checkout yet another copy from the VCS:

-e file:./mylib/

.. with a view to adding this as an editable to a virtualenv I've created within my working area.
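
To make the limitation concrete, here is a small sketch that applies the same regular expression to a few urls. The helper name is hypothetical, and the scheme test is simplified to a startswith() check instead of the link.scheme attribute used in the quoted code.

```python
import re

# Same pattern as in the snippet quoted above.
RELATIVE_RE = re.compile(r'\.\./')


def is_treated_as_relative(url):
    """Hypothetical helper: would the quoted check rewrite this url?"""
    return url.startswith('file:') and RELATIVE_RE.search(url) is not None


for url in ('file:../mydir/mylib', 'file:./mylib/', 'file:mylib'):
    print(url, is_treated_as_relative(url))
```

Only the first url contains "../", so the "./" and bare forms are never normalized by that check.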

carwyn commented Dec 6, 2011

That's odd. The following works:

# requirements.txt is in mydir
# Use ../ to work around lack of ./ support
file:../mydir/mylib#egg=mylib

.. but the following doesn't:

# requirements.txt is in mydir
# Use ../ to work around lack of ./ support
-e file:../mydir/mylib#egg=mylib

Fabric 1.3.3

carwyn commented Dec 6, 2011

This is even more confusing than I thought.

# Works
file:/full/path/to#egg=mylib
# Doesn't work
-e file:/full/path/to#egg=mylib

# Following also work:
./mylib
/full/path/to/mylib
-e ./mylib
-e /full/path/to/mylib

# But not
./mylib#egg=mylib
-e ./mylib#egg=mylib

The ability to point to a local directory that's already checked out and add it as an editable is very useful for us: we can maintain a requirements file within the project space (in VCS) that populates a virtualenv with all the deps and then egg-links the subdirs of the VCS checkout that are being worked on.

i.e.

virtualenv --distribute --no-site-packages myenv
./myenv/bin/pip install -r requirements.txt

Where:

# requirements.txt
# Ideally I think the next line should be file://./mylib#egg=mylib
-e ./mylib
Django >= 1.3

carwyn commented Dec 6, 2011

I think "-e ./mylib" in my last comment is working by accident rather than by design?

$ ./myenv/bin/pip freeze
Django==1.3.1
-e svn+svn+ssh://<My VCS Path>@4973#egg=mylib-0.9-py2.7-dev_r4973

.. as something very odd happens with an extra svn+ prefixed to the editable path. However:

file:../mydir/mylib#egg=mylib

Works as you may expect:

$ ./myenv/bin/pip freeze
Django==1.3.1
mylib==0.9

But not as an editable!

Contributor
carljm commented Dec 12, 2011

A pull request to fully support relative file: urls, with or without -e, would be great. I'm not too concerned about whether they are "standard", just whether the syntax is unambiguous. Personally I think all URLs should have "://", so I don't like the "file:../path/to" syntax. I prefer "file://./path" or "file://../path": basically we look for "." or ".." after the "file://" and normalize/make absolute the path if it's found.

I've just been bitten by this issue, so I'd like to express my support for getting it solved. @carljm's suggestion seems pretty sound, and I bet this syntax is more in sync with the RFC and more compatible with the "reference implementations" that @imankulov mentioned.

Contributor
carljm commented Dec 13, 2011

BTW if someone does work on this, it'd be good for --find-links to also have this support, whether specified on command line or in a requirements file.

carwyn commented Dec 15, 2011

Would it make sense to use urlparse.urlparse to at least parse these locators? I think file:./ file:../ etc may well be valid for relative paths.

I think this is a serious issue and I may be able to find time to work on it.

Before I begin, I'd like to make sure we're all on the same page with regard to what the desired result is. I suggest we work in strict accordance with RFC 3986, which I believe is the most authoritative resource about URIs in existence at the time of this writing.

Given that specification, relative URIs in the file scheme should look like so:

$ mkdir -p /tmp/foo/bar ; cd /tmp/foo/bar; touch spam eggs
$ file:///tmp/foo/bar/spam # this means "on this computer, /tmp/foo/bar/spam"
$ file:spam # this means "relative to here, spam"
$ file:../bar/spam # this means "relative to the parent of the current directory, bar/spam"
$

@carljm, note that this is not what you asked for above (not all URLs will have :// in them); please let me know if this is OK with you. As far as I understand the RFC, once you've added :// (or :///) after the scheme, any following path becomes absolute. Since such a strong standard exists to guide us in this case, why deviate from it? Since urlparse complies with the RFC well, we should use only that for file URL parsing (I'd like to say all URL parsing, but auditing all that code might be too much for me), and after parsing, only standard functions like os.path.isabs() should be used to decide whether a path is relative or absolute (no regexes, startswith() checks, etc.).

If these guidelines make sense and there are no objections/comments, I'll try to implement and issue a pull request.
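
A minimal sketch of that approach, assuming Python 3 (where urlparse lives in urllib.parse). classify_file_url is a hypothetical name, and posixpath.isabs() is used in place of os.path.isabs() only so that the result does not depend on the operating system running the example.

```python
import posixpath
from urllib.parse import urlsplit


def classify_file_url(url):
    """Hypothetical helper: parse first, then decide relative vs absolute."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: %r" % url)
    kind = "absolute" if posixpath.isabs(parts.path) else "relative"
    return kind, parts.path


for url in ("file:///tmp/foo/bar/spam", "file:spam", "file:../bar/spam"):
    print(url, classify_file_url(url))
```

No regexes or startswith() checks are involved: the parser extracts the path and a single standard predicate decides how to treat it.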

Contributor
carljm commented Jan 10, 2012

@yaniv-aknin Thanks for the reference to the relevant ABNF in the RFC; I didn't realize that there was a valid standard construction for relative URLs with a specified scheme. It appears to me that you're right; I'd welcome your work on this issue, accompanied by as much normalization of the URL handling as you're able to achieve.

Contributor
carljm commented Jan 10, 2012

Note also the existing patch on #312 - it doesn't standardize url-handling on urllib, though, it just adds a regex-based special case.

rach commented Jun 18, 2013

Hi,

I met the same issue today.

It's a bit annoying that the behavior is not consistent:

-r ../prod-requirement.txt # is relative to the file
../my-app # is relative to where I launched the command

Is there any workaround at this stage to make relative paths work?

Thank you

stuaxo commented Aug 18, 2014

This may need re-opening, I definitely can't get this to work.

Using

file:./somewhere seems to resolve to /somewhere

./somefile does seem to work, though it is relative to where pip was launched from, not the requirements.txt

Resolving the path to be relative to the requirements.txt seems more logical than it being from where pip is launched from.
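
As a rough sketch of what resolving against the requirements file could look like (this is not pip's behaviour; resolve_relative_to_requirements is a hypothetical helper and POSIX-style paths are assumed):

```python
import posixpath


def resolve_relative_to_requirements(requirements_file, entry):
    """Hypothetical helper: resolve a path entry against the directory
    containing the requirements file, not the current directory."""
    base_dir = posixpath.dirname(posixpath.abspath(requirements_file))
    if posixpath.isabs(entry):
        return posixpath.normpath(entry)
    return posixpath.normpath(posixpath.join(base_dir, entry))


print(resolve_relative_to_requirements("/srv/app/deploy/requirements.txt", "../my-app"))
print(resolve_relative_to_requirements("/srv/app/deploy/requirements.txt", "./somefile"))
```

With this rule the result is the same no matter which directory pip is launched from, which matches how -r includes are described earlier in the thread.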

stuaxo referenced this issue in davidfischer/requirements-parser Aug 19, 2014: Local file parse failure #18 (Open)

dashesy commented Jan 6, 2015

On 6.0.6, ../ does not seem to work even relative to where pip is launched.

msabramo referenced this issue in jaraco/path.py Mar 13, 2015: Add url method #87 (Closed)

I'm having the same issue of file:./somewhere being resolved to /somewhere which fails. I'm wanting to use relative paths as I have python modules in git submodules and the absolute path is different on my development environment vs production environment.

Would love to see this fixed if it is officially considered a bug?

It appears to be down to this line: https://github.com/pypa/pip/blob/develop/pip/index.py#L954

With urlparse, which I think is what urllib_parse resolves to in Python 2, this results in:

>>> scheme, netloc, path, query, fragment = urlparse.urlsplit('file:./somewhere')
>>> urlparse.urlunsplit((scheme, netloc, path, query, None))
'file:///./somewhere'

So the input is a relative path and the output is essentially and incorrectly absolute, as far as I understand. Seems a bit unfortunate as it is a Python standard library module. The method documentation is here: https://docs.python.org/2/library/urlparse.html#urlparse.urlunsplit

Ok, this can be resolved by using file:// instead of just file: which might be more appropriate anyway:

>>> scheme, netloc, path, query, fragment = urlparse.urlsplit('file://somewhere')
>>> urlparse.urlunsplit((scheme, netloc, path, query, None))
'file://somewhere'
>>> scheme, netloc, path, query, fragment = urlparse.urlsplit('file://./somewhere')
>>> urlparse.urlunsplit((scheme, netloc, path, query, None))
'file://./somewhere'

But then we fall afoul of the UNC check here: https://github.com/pypa/pip/blob/develop/pip/download.py#L447

>>> from pip.download import url_to_path
>>> url_to_path('file://somewhere')
'\\\\somewhere'
>>> url_to_path('file://./somewhere')
'\\\\./somewhere'

Could it be worth improving that check? Can we check for the current operating system or any other kind of condition before making that assumption? I'd love some input here as I'm out of my depth. From the little I've seen, UNC is a Windows thing and I'm on Linux.

Edit: As a final data point, if I delete that UNC check then it works fine for me with file://somewhere style paths.
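
One possible explanation for why the UNC check is triggered can be seen by looking at how the parser splits these urls. The sketch below assumes Python 3 (urllib.parse rather than urlparse), and split_file_url is a hypothetical helper used only for inspection.

```python
from urllib.parse import urlsplit


def split_file_url(url):
    """Hypothetical helper: return the (netloc, path) pair for a url."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


for url in ("file:./somewhere", "file://somewhere",
            "file://./somewhere", "file:///somewhere"):
    netloc, path = split_file_url(url)
    print("%-20s netloc=%r path=%r" % (url, netloc, path))
```

With two slashes, whatever follows up to the next slash is parsed as the host (netloc) rather than as part of the path, so "file://somewhere" carries a host of "somewhere" and an empty path. Code that treats a non-empty netloc as a UNC host will therefore see these urls as network paths.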

stuaxo commented Jan 9, 2016

I had a quick play with this, without the unc check and it does indeed work. I reckon improving it could work.

file://somewhere and file://./somewhere look a little odd at first but definitely make sense.

stuaxo commented Jan 10, 2016

UNC paths on Windows look like \\hostname\share and usually point at an SMB share (a Windows share or Samba), but could be something more exotic like NetWare.

I don't know if Python supports this with forward slashes etc.

The VMs that Microsoft provide for testing are good for testing this sort of thing (or the same VMs via the script at xdissent https://github.com/xdissent/ievms ).

Contributor
blr246 commented Mar 14, 2017 edited

I ran into an issue where requirements.txt having relative paths does not install properly relative to the path where requirements.txt resides. I tracked it down to a bug in SETUPTOOLS_SHIM where the script tries to execute setup.py using the relative path while also setting cwd to the relative path. This is incorrect, and fixable by a simple call to os.path.abspath on the path to the package source directory.

See #4208 for proposed fix. I am not sure if it addresses other issues raised here.
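
The failure mode described above can be modelled with plain path arithmetic. This is only a sketch of my reading of the description, not the code from #4208: script_seen_by_child is a hypothetical helper, and POSIX-style paths are assumed.

```python
import posixpath


def script_seen_by_child(start_dir, source_dir, make_absolute):
    """Hypothetical model: where does a child process started with
    cwd=source_dir look for source_dir/setup.py?"""
    if make_absolute:
        # The proposed fix: make the source directory absolute first.
        source_dir = posixpath.normpath(posixpath.join(start_dir, source_dir))
    # The child's working directory is the source directory itself.
    child_cwd = posixpath.normpath(posixpath.join(start_dir, source_dir))
    # The script path passed to the child is interpreted relative to
    # that working directory unless it is already absolute.
    script_arg = posixpath.join(source_dir, "setup.py")
    return posixpath.normpath(posixpath.join(child_cwd, script_arg))


print(script_seen_by_child("/work", "libs/mylib", make_absolute=False))
print(script_seen_by_child("/work", "libs/mylib", make_absolute=True))
```

With a relative source directory the relative part is applied twice, once through the working directory and once through the script path, so the child looks in a directory that does not exist. Making the path absolute removes the duplication.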
