-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude edge cases where get_relative_url depends on CWD #2296
Conversation
When only one of the two passed paths starts with a slash, this function would produce results that depend on (and expose parts of) the actual current working directory. In a similar manner, it was also possible to "break out" of the "current directory" by starting one of the paths with `../../..` etc. Additionally, the fact that paths always end up being resolved relative to the current working directory made this function less efficient than it needs to be. Fun fact: `get_relative_url('path_a', 'path_b')` actually ended up looking up how to get to `/current/working/directory/path_a` from `/current/working/directory/path_b`. Luckily, none of these behaviors can ever actually come into effect (the function is always called without a leading slash and without paths deliberately trying to escape), but the fix is still good to reduce surprise. *The actual fix* is that we make the leading slash ignored (or quite the opposite - we always add it). Now when either of the two arguments try to go up higher than the top level, they just end up at the top level (e.g. `foo/../..` is effectively `.`), only then the "relative" calculation happens.
If your thinking is that this might get accepted more quickly than your other PR, you're wrong. My assumption is that the standard lib to correct in its handling of any input and performance is not enough of a motivator to get me to spend any time on it. And well, I just don't have the time to go through every scenario in your tests and confirm that every one is correct. Sorry, but I'm not willing to take your assertions as face value. I need to be convinced. To make matters worse, you obscure some of the tests with this: for url_slash in ('', '/'):
for other_slash in ('', '/'): Spell out every combination in text so I can see them all. Then, point out the ones which fail before the change (side-by-side tests showing which pass/fail for each implementation would be ideal). That will give me a lot less to focus on. |
How can you keep saying that the standard lib is correct, if the usage of it is outright wrong. A function that resolves paths relative to the current working directory is being used to resolve abstract paths. How is it in any way OK that To clarify: this comment describes the current state, not the state after this pull request closes those loopholes. I will print out an expanded list here, sure. |
It's not really obscuring as much as, pointing out that the leading slash doesn't matter Sorry for the prior lack of examples, that's a good point. Printed them out here |
UPD: a clearer version: Click to expand: Both with slashes - normally never used like this
Neither has slashes - 100% of real usages are in this category as far as I see
No-slash relative to with-slash - normally never used like this
Slash relative to no-slash - normally never used like this
|
Ha I found an even more interesting result. The tables above were produced when running under the directory The old implementation now almost fully matches the new implementation, but doesn't match itself having run at a different current working directory.
|
Now this is also addressed |
I have done everything you've asked for and everything at all possible. Please work with me on this. I don't know at all why you dismiss this.
I have shown that no, this stdlib function does not handle this usage correctly because it was not designed for it. The output of the old function depends on the current working directory. Please, even run this yourself (without this PR): >>> import os
>>> from mkdocs.utils import get_relative_url
>>> os.chdir('/tmp/testing')
>>> get_relative_url('bar', '../foo')
'../testing/bar'
>>> get_relative_url('../bar', 'foo')
'../../bar'
>>> os.chdir('/')
>>> get_relative_url('bar', '../foo')
'../bar'
>>> get_relative_url('../bar', 'foo')
'../bar' Actually, now that I fully realize this, I have made a much better table. Apparently, there is not a single situation where the new and old functions differ and the old function's result does not depend on the current working directory. If anyone ever used those cases, they are currently getting garbage from the old function. The new function's result is consistent in those cases, and for the already-consistent cases of the old function, the result 100% matches. Click to expand the tables
|
The code used to make that last table: oprypin@2ce90ec |
A fuzzer proving the same - very easy to understand this time: It shows that the new function 100% matches the old function, provided that the old function can match itself rather than varying on the current working directory.
If this doesn't convince you, then I need to be convinced with the reasoning why it doesn't. |
So at this point you have more than demonstrated that an issue exists. But now I need to go through every single one of those tests and confirm that what your "fix" is outputting is correct. That is going to take some time. Please be patient. |
Finally got around to reviewing this. Sorry for the delay and thanks for the work. |
When only one of the two passed paths starts with a slash, this function would produce results that depend on (and expose parts of) the actual current working directory.
In a similar manner, it was also possible to "break out" of the "current directory" by starting one of the paths with
../../..
etc.Additionally, the fact that paths always end up being resolved relative to the current working directory made this function less efficient than it needs to be.
Fun fact:
get_relative_url('path_a', 'path_b')
actually ended up looking up how to get to/current/working/directory/path_a
from/current/working/directory/path_b
.Luckily, none of these behaviors can ever actually come into effect (the function is always called without a leading slash and without paths deliberately trying to escape), but the fix is still good to reduce surprise.
The actual fix is that we make the leading slash ignored (or quite the opposite - we always add it).
Now when either of the two arguments try to go up higher than the top level, they just end up at the top level (e.g.
foo/../..
is effectively.
), only then the "relative" calculation happens.This pull request makes the behavior 100% the same as in #2272, but without the disadvantage of being a "custom replacement for the standard lib function". It also brings 35% of the performance advantage.
Build times of a site that has ~350 pages:
/home/oprypin/repos/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/a/athena-website
:/home/oprypin/repos/athena-website
:(meanwhile, #2272 achieves 6.65 sec)