Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settings: build all paths with posix_join #2431

Closed
wants to merge 1 commit into from

Conversation

MinchinWeb
Copy link
Contributor

c.f. #2383

@avaris
Copy link
Member

avaris commented Nov 3, 2018

I would've gone the other way. I mean, they are all string literals...

Copy link
Member

@avaris avaris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this is fine too.

@oulenz
Copy link
Contributor

oulenz commented Nov 4, 2018

Just out of curiosity, can I ask why posixpath.join is preferred here? If I understand this correctly, it will always insert /, regardless of the platform the code is run on, so the result is the same, right?

@avaris
Copy link
Member

avaris commented Nov 4, 2018

Some were using posixpath.join and others were strings with /. I think the idea is to unify them. But as I said, I'd have gone the other way: replace all with strings.

@justinmayer justinmayer modified the milestone: 4.0 Nov 12, 2018
@MinchinWeb
Copy link
Contributor Author

MinchinWeb commented Nov 21, 2018

The advantage of using posixpath.join is to be explicit: these are filepaths, and we are using Posix-style paths, even on non-Posix systems (like Windows). Many of the other settings stored as strings are only that: strings to be inserted into various places in the generated site.

Personally, I would prefer to use pathlib which would make it more explicit we are dealing with paths, but that would add another dependency as long as Python 2.7 is supported (pathlib was added to the standard library in Python 3.4).

@oulenz
Copy link
Contributor

oulenz commented Nov 23, 2018

Actually, as far as I can tell the url settings are also just "strings to be inserted into various places in the generated site".

With the save_as settings, can't a user actually use \'es if they are on Windows?

As for being explicit: these are the default settings used internally, they are not user-facing, so it's not clear to me whom this explicitness is aimed at? And even supposing we wanted to be explicit, I know that the first time I saw posixpath.join being used, I wrongly assumed it was somehow system-dependent like os.path.join. Its use here just seems to overcomplicate things to me.)

@MinchinWeb
Copy link
Contributor Author

With the save_as settings, can't a user actually use \'es if they are on Windows?

Sadly no, not without random errors. \ are used by Python to create "escape characters", so the slash will only appear if the following letter doesn't create a valid escape sequence. So cat\dog works, but cat\apple, cat\banana, cat\new, and cat\table (among many others) will do weird things. When dealing with paths on Windows (with Python), there are 4 common approaches:

  1. Escapes the slashes --> cat\\banana
  2. Use posix slashes --> cat/banana
  3. Use raw strings (not designed for this purpose, but generally work anyway) --> r"cat\banana"
  4. Use a class-based system to deal with filenames --> pathlib.Path("cat") / "banana"

@oulenz
Copy link
Contributor

oulenz commented Nov 25, 2018

Ok, but escaping characters in strings is an orthogonal issue. (I'm not sure why you'd say raw strings are not designed for this purpose.)

Once you escape the \'es, do you still get random errors? If so, we should probably either fix them if that's easily done, or explicitly tell windows users not to use backslashes in the documentation.

@justinmayer
Copy link
Member

Thank you for the contribution, @MinchinWeb, and apologies for the long delay in chiming in here. You previously said:

Personally, I would prefer to use pathlib which would make it more explicit we are dealing with paths, but that would add another dependency as long as Python 2.7 is supported (pathlib was added to the standard library in Python 3.4).

Given that Python 2.7 is no longer supported, perhaps we should use this opportunity to switch to a pathlib-based approach. What does everyone think about that?

@avaris
Copy link
Member

avaris commented Apr 16, 2020

Given that Python 2.7 is no longer supported, perhaps we should use this opportunity to switch to a pathlib-based approach. What does everyone think about that?

Generally, I'm in favor of moving to pathlib. One caveat though, full pathlib support came in python3.6, i.e. you can use Path objects direckly in stdlib functions using paths (like open, etc). In python3.5 you still need to convert them to strings before passing them. If we drop python3.5 (which will be EOL later this year, by the way), it'll be a much cleaner switch. Otherwise we need to scatter str(...) calls around.

@justinmayer
Copy link
Member

justinmayer commented Apr 16, 2020

Given that Python 3.5's EOL date is less than five months away, I have no problem dropping support for it now in preparation. For users on Ubuntu "Xenial" 16.04 and other LTS distributions that ship with Python 3.5, they only have another year before their underlying OS reaches its own end-of-life, so they have to make a move soon anyway. Plus, tools like Pyenv, Pythonz, and asdf & its Python plugin make it relatively straightforward to install and use arbitrary Python versions.

I say we make the jump to Python 3.6+.

@dertuxmalwieder
Copy link

I would even support going Python3.7+. There is no reason to rely on 3.6 as 3.8 is already out there.

@oulenz
Copy link
Contributor

oulenz commented Apr 17, 2020

I'm also in favour of dropping 3.5 now (i.e. with the next release). @dertuxmalwieder do you see any particular advantage in also dropping 3.6? I don't think we should unnecessarily limit the supported range of Python versions.

@dertuxmalwieder
Copy link

3.7 is two years old soon, it is not unlikely that most people can use it in 2020.

@justinmayer
Copy link
Member

It's clear how we benefit from dropping Python 3.5 support (pathlib, f-strings, etc.). I think perhaps @oulenz was trying to determine what we would gain from dropping Python 3.6, which isn't as clear to me.

@MinchinWeb
Copy link
Contributor Author

I like Pathlib. I can rework this with Pathlib, but I can't offer an ETA at the moment.

MinchinWeb added a commit to MinchinWeb/pelican that referenced this pull request Apr 23, 2020
@MinchinWeb MinchinWeb mentioned this pull request Apr 23, 2020
@MinchinWeb
Copy link
Contributor Author

I've created #2738 to do settings via Pathlib. It doesn't address dropping Python 3.5 support, although we could typecast these settings to strings to sidestep the issue.

@MinchinWeb
Copy link
Contributor Author

Further to this, after playing with moving to pathlib.Path, I think this approach (poxispath.join) is better.

Ultimately, in the generated sites, the URL's are strings within HTML and most of the websevers I've dealt with assume posix paths in the URLs they're fed, even on Windows (no idea how IIS does it...).

If we do move to pathlib.Path, we should be "normalizing" any output URLs to appease the webservers, and otherwise you'll get weird things like: <p><a href="/category\\yeah.html">Link 2</a></p> (notice the combination of path separators) and there will a bunch of hidden landmines in testing pelican on Windows due to the string matching built into the test suite that assumes posix paths.

@avaris
Copy link
Member

avaris commented Apr 23, 2020

Looking at this again, I agree. I think we should keep these as strings, primarily because they require formatting later. And for that we need to str them if we do Path here. That's unnecessary. Internally, they can be converted to pathlib.Path at appropriate time.

pathlib benefits will be more about internal path related stuff (like: #2678). It seems unnecessary here. Regarding this PR, I'd go back a bit and change posix_joins to literal strings with / when we do pathlib switch:

  1. pathlib.Path can handle paths/like/this just fine in any OS
  2. These are user exposed settings. We don't really expect users to do posix_join when they set them in their config. They will pass strings: pelican needs to handle these as strings anyway. So, default config should reflect that as well.

@MinchinWeb
Copy link
Contributor Author

So after much discussion and some exploration, it seems this is the way to go. @justinmayer do you have to push this to master then to finish it off?

justinmayer added a commit that referenced this pull request May 10, 2020
… instead of `posix_join`. Fixes #2431
@justinmayer
Copy link
Member

If I understand the perspective that @avaris and @oulenz have shared here — and please correct me if I am wrong — it seems the consensus is that this unification effort is indeed worthwhile but that string literals are preferred over posix_join. After re-reading the discourse and thinking about it further, I tend to agree with that viewpoint, so I pushed a new branch containing said unification via: d1338f2 Does that look okay to everyone?

@MinchinWeb: Between this and the Pathlib-related PR, you have spent a decent amount of time on this endeavor, and I want to recognize that and express my gratitude for your efforts. Truly much appreciated! 👏

@justinmayer
Copy link
Member

Last call... Any comments before I merge the aforementioned branch?

@avaris
Copy link
Member

avaris commented May 17, 2020

@justinmayer looks good to me

@MinchinWeb
Copy link
Contributor Author

@justinmayer thank you for shepherding this PR and the project generally. It's always a little disappointing to see something I've worked on not work out, but the discussion here has been constructive in understanding why the final solution was chosen, and I think that discussion helps make me a better programmer. I'm happy to see it resolved!

@MinchinWeb MinchinWeb deleted the posix-paths branch May 22, 2020 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants