Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-fill missing leading 0s in URL #2420

Closed
ezio-melotti opened this issue Mar 12, 2022 · 33 comments
Closed

Auto-fill missing leading 0s in URL #2420

ezio-melotti opened this issue Mar 12, 2022 · 33 comments
Labels
enhancement infra Core infrastructure for building and rendering PEPs

Comments

@ezio-melotti
Copy link
Member

I was going to add pep-* autolinking to the python/cpython repo, but I realized that the URL requires the right amount of leading 0 in order to work. For example:
https://peps.python.org/pep-0008/ -- works fine
https://peps.python.org/pep-8/ -- fails

People rarely mention the leading 0s when they write e.g. PEP-8 in a message, and the autolinker doesn't offer any formatting option. Because of this, clicking the link results in a 404.

Would it be possible to create a script that, in case of 404, tries to extract the pep number and redirects to f'https://peps.python.org/pep-{num:04d}/'?

If it's handled on this end, it will have the advantage that every repo will be able to autolink PEPs, and it will also work for people editing the URL manually (something I happen to do from time to time).

@ezio-melotti ezio-melotti added enhancement infra Core infrastructure for building and rendering PEPs labels Mar 12, 2022
@AA-Turner
Copy link
Member

I have a more general solution I'd like to try, but won't be immediate (timeframe of weeks not days). What's the urgency on this?

A

@ezio-melotti
Copy link
Member Author

The migration to GitHub Issues should happen at the end of the months, but PEP autolinking is not high priority -- the autolinking could be disabled until a fix is in place.

Just out of curiosity, what kind of solution are you thinking about?

@AA-Turner
Copy link
Member

Two approaches to try -- the first would create a dummy file for each PEP with the zeros stripped (e.g. 8.html) with per file redirects, the second similar to your suggestion of JS in the 404 page.

The entire PEPs website doesn't require JS at all though (except for a toggle button for colour schemes), so I want to avoid the second approach.

A

@ezio-melotti
Copy link
Member Author

I can see three possible ways of using the dummy files:

  • Full copies of the PEPs (not ideal)
  • Symlinks to the actual PEP (not sure if it works)
  • Separate HTML pages with a JS redirect (and possibly a link) to the right PEP

The downside of these approaches is that they will create a lot of dummy files, and it will still fail with e.g. PEP-008 or similar.

If you have access to the webserver, you could also configure HTTP redirects or use URL rewrite, but I'm not sure if this is possible with the current setup.

I appreciate the intent of keeping the repo as JS-free as possible, but I think having some JS in the 404 page is ok: if people have JS disabled and click on a broken link they will meet the same 404 page they always met; if they have JS enabled they will get redirected.

Adding a link to the PEP index to the 404 page will also help people without JS to reach the index and from there it's easy enough to ctrl+f the PEP number.

@hugovk
Copy link
Member

hugovk commented Mar 12, 2022

Rather than running JavaScript client-side (not everyone has JS enabled, it's relatively slow, more code to maintain), how about Nginx server redirect rules (similar to python/pythondotorg#2006) that will update the URL before even hitting the GitHub Pages/RTD website?

Something along the lines of:

  • \/pep-(\d)$ -> /pep-000$1 e.g. /pep-8 -> /pep-0008
  • \/pep-(\d\d)$ -> /pep-00$1 e.g. /pep-12 -> /pep-0012
  • \/pep-(\d\d\d)$ -> /pep-0$1 e.g. /pep-101 -> /pep-0101

(See also https://stackoverflow.com/a/27504007/724176)

Ping also @ewdurbin for comment :)

@AA-Turner
Copy link
Member

I'll respond to @ezio-melotti's point from #2421 (comment) here too.

My general concern is that whatever we support we will have to support forever, as links will start to propogate with non-canonical forms. Allowing peps.python.org/\d+ to me seems reasonable -- my preference would actually to be stricter and not allow existing zero-padding.

If we set the autolinking to https://peps.python.org/pep-<num>/ and <num> is a valid 4-digits number (e.g. pep-0008), then it will go straight to the right page.

The original rationale was that barely anybody writes the leading zeros, so for e.g. PEP-100 there'd still need to be a redirect. (I think we can safely assume those coming from GitHub have JS enabled).

it's relatively slow

For me with awful internet, it still is fairly fast -- I put the redirect script at the top of <head> so that it would be processed first.

server redirect rules

I don't think peps.python.org is fronted with a CDN currently, so this isn't an option sadly.

A

@ezio-melotti
Copy link
Member Author

Rather than running JavaScript client-side (not everyone has JS enabled, it's relatively slow, more code to maintain), how about Nginx server redirect rules

If that's an option, I think it would be ok. Let's see what @ewdurbin says about that :)

My general concern is that whatever we support we will have to support forever, as links will start to propogate with non-canonical forms. Allowing peps.python.org/\d+ to me seems reasonable

The concern is understandable, but wouldn't this introduce a new type of URL (without the pep- prefix) that we will have to support? If we stick to peps.python.org/pep-<num>/ where <num> might or might not have leading 0, we will still have one canonical form. IMHO supporting PEP numbers without leading 0s is worth the extra effort, since as you also mentioned, barely anybody adds them. The fact that the 0s can be added in different ways (e.g. JS and URL rewrite) means that it shouldn't be difficult to support this even if we move to a different environment.

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Mar 12, 2022

This originally came up in #2255 .

To note, the canonical way of referencing PEPs in prose text, as defined by PEP 1 and PEP 12, is with no leading zeros and a space instead of a dash. As GitHub apparently does not allow spaces (or requires a dash) in references, it was decided there that requiring the leading zeros too was acceptable, since it made more clear that the number was a GitHub autolink reference rather than a canonical PEP name, to avoid confusing others about the canonical way of referring to PEPs themselves. I typically do so like this: As in PEP 8 (PEP-0008).

That said, if we do add a new URL form, which the non-leading-zero form already is, I would support the shorter and more distinct peps.python.org/N. Being more clearly distinct from the current form is actually an advantage, as it is more obvious that it is a distinct form, and also ensures unique URLs for all PEPs rather than just <4 digit ones, while avoiding two similar-looking sets of links to the same PEP.

It could be done all in HTML without either JavaScript or server-side redirect rules by using <meta http-equiv="refresh" content="0; url={redirect_url}"> combined with auto-generating the redirect pages via a build script. We use such a script that I wrote in production for the Spyder Docs. It could be used here as a post-build step in the CI/makefile, as we do, with only slight modifications (since almost everything is done via CLI arguments rather than hardcoded).

@AA-Turner
Copy link
Member

#2421 now takes the http-equiv approach (it was simpler than I thought to integrate well in the build process).

A

@hugovk
Copy link
Member

hugovk commented Mar 13, 2022

Rather than running JavaScript client-side (not everyone has JS enabled, it's relatively slow, more code to maintain), how about Nginx server redirect rules (similar to python/pythondotorg#2006) that will update the URL before even hitting the GitHub Pages/RTD website?

Something along the lines of:

  • \/pep-(\d)$ -> /pep-000$1 e.g. /pep-8 -> /pep-0008
  • \/pep-(\d\d)$ -> /pep-00$1 e.g. /pep-12 -> /pep-0012
  • \/pep-(\d\d\d)$ -> /pep-0$1 e.g. /pep-101 -> /pep-0101

(See also stackoverflow.com/a/27504007/724176)

Ping also @ewdurbin for comment :)

Ah right, peps.python.org goes straight to GitHub Pages (https://hostingchecker.com):

image

And not into the same place as python.org:

image

So these server redirects aren't possible.

@ewdurbin
Copy link
Member

Depending on what decision is made as far as what the best course of action is, we can move peps.python.org behind Fastly and support redirects.

@hugovk
Copy link
Member

hugovk commented Mar 14, 2022

Nice, that sounds best to me, we ran into some problems with #2421.

We'd actually want something like:

  • \/(\d)$ -> /pep-000$1/ e.g. /8 -> /pep-0008/
  • \/(\d\d)$ -> /pep-00$1/ e.g. /12 -> /pep-0012/
  • \/(\d\d\d)$ -> /pep-0$1/ e.g. /101 -> /pep-0101/
  • \/(\d\d\d\d)$ -> /pep-$1/ e.g. /3156 -> /pep-3156/

So https://peps.python.org/N goes to a zero-padded https://peps.python.org/pep-NNNN

@CAM-Gerlach @ezio-melotti Is this pattern right? Shall we move peps.python.org behind Fastly and use redirects?

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Mar 14, 2022

That's what @AA-Turner preferred and I thought it sounded reasonable, since it is shorter and more distinct, but I'm not sure @ezio-melotti agreed.

The main problem with #2421 was that https://peps.python.org/404 conflicted with the 404 page, meaning all non-existed pages ended up at PEP 404, which would require either not redirecting for just PEP 404, or reverting to the pep-N scheme to disambiguate. @ewdurbin does the approach you mentioned allow us to easily avoid that problem?

@hugovk
Copy link
Member

hugovk commented Mar 25, 2022

Depending on what decision is made as far as what the best course of action is, we can move peps.python.org behind Fastly and support redirects.

Sounds like @CAM-Gerlach, @AA-Turner and I are in agreement with this and @ezio-melotti doesn't mind on the actual solution as long as it works.

@ewdurbin Please could you go ahead with this?

I've edited the suggestion in #2420 (comment) to include a trailing slash on the target, because https://peps.python.org/pep-0008 will redirect to https://peps.python.org/pep-0008/ anyway and we might as well avoid that extra request.


The main problem with #2421 was that https://peps.python.org/404 conflicted with the 404 page, meaning all non-existed pages ended up at PEP 404, which would require either not redirecting for just PEP 404, or reverting to the pep-N scheme to disambiguate. @ewdurbin does the approach you mentioned allow us to easily avoid that problem?

I believe server redirects avoids the problem entirely. The user types in https://peps.python.org/404. That goes to Fastly, which redirects it to https://peps.python.org/pep-0404/, and that's what is actually sent to GitHub Pages, and so the user gets the real PEP 404.

@ezio-melotti
Copy link
Member Author

What happens if the users tries https://peps.python.org/123 ?
Does it get redirected to https://peps.python.org/pep-0123/ -> https://peps.python.org/404 -> https://peps.python.org/pep-0404/ ?
Currently it seems that https://peps.python.org/pep-0123/ shows a 404 page, but the URL doesn't change so maybe it doesn't go through https://peps.python.org/404.

@hugovk
Copy link
Member

hugovk commented Mar 25, 2022

It would match this rule:

  • \/(\d\d\d)$ -> /pep-0$1/ e.g. /101 -> /pep-0101/

Fastly would do this:

https://peps.python.org/123 -> https://peps.python.org/pep-0123/

And then GitHub Pages will show https://peps.python.org/pep-0123/ which is indeed a "404 not found" and correct behaviour.

The problem with #2421 was internal to the website/Sphinx, but with Fastly redirects we're getting the proper redirect URL before the website itself is even loaded.

@ewdurbin
Copy link
Member

I will proceed with this next week!

@CAM-Gerlach
Copy link
Member

Thanks @ewdurbin !

@brettcannon
Copy link
Member

Do we actually still even need the pep- part of the path? Since we have a subdomain doesn't that mean peps.python.org/670 makes sense now?

@CAM-Gerlach
Copy link
Member

@brettcannon Indeed, that's exactly what we're proposing to do, and AFAIK what @ewdurbin agreed to implement :)

@ezio-melotti
Copy link
Member Author

@ewdurbin, what's the status on this?

@ewdurbin
Copy link
Member

ewdurbin commented Apr 7, 2022

Ironically it's been deprioritized by the hiring process for another Infrastructure team member. I should have time to sit down with this and #2493 tomorrow.

@ezio-melotti
Copy link
Member Author

Thanks for the update! The migration to GitHub Issues is scheduled for tomorrow, so the timing is perfect :)

@ewdurbin
Copy link
Member

ewdurbin commented Apr 8, 2022

Lost time to work on this... due to the GitHub issues migration... perhaps this weekend but likely next week.

@ewdurbin
Copy link
Member

peps.python.org deployed to Fastly including these redirects.

They are configured as 301s for now.

Overview of the implementation:

https://fiddle.fastlydemo.net/fiddle/cbc02328

@ewdurbin
Copy link
Member

Please let me know if this is satisfies the requirements by closing this issue :) I'll update to permanent redirects then.

@hugovk
Copy link
Member

hugovk commented Apr 14, 2022

Thanks!

We want the request so we have only digits after https://peps.python.org/ and then the result gets zero padded (width 4) and prefixed with pep-

Some examples:

request result
https://peps.python.org/8 https://peps.python.org/pep-0008/
https://peps.python.org/8/ https://peps.python.org/pep-0008/
https://peps.python.org/12 https://peps.python.org/pep-0012/
https://peps.python.org/12/ https://peps.python.org/pep-0012/
https://peps.python.org/101 https://peps.python.org/pep-0101/
https://peps.python.org/101/ https://peps.python.org/pep-0101/
https://peps.python.org/3156 https://peps.python.org/pep-3156/
https://peps.python.org/3156/ https://peps.python.org/pep-3156/

We didn't discuss having leading zeros in the request, but I guess these are fine too:

request result
https://peps.python.org/08 https://peps.python.org/pep-0008/
https://peps.python.org/08/ https://peps.python.org/pep-0008/
https://peps.python.org/008 https://peps.python.org/pep-0008/
https://peps.python.org/008/ https://peps.python.org/pep-0008/
https://peps.python.org/0008 https://peps.python.org/pep-0008/
https://peps.python.org/0008/ https://peps.python.org/pep-0008/

@ewdurbin
Copy link
Member

d'oh. Ok :) https://fiddle.fastlydemo.net/fiddle/71cc868a is the sketch of the service now. changes are active and cover all above scenarios.

@ewdurbin
Copy link
Member

Opened #2527 to make the fastly service transparent... even if terraform and fastly might be a little obscure 😅

@ewdurbin
Copy link
Member

I'll close this, as requirements in #2420 (comment) are all addressed.

@ezio-melotti
Copy link
Member Author

Thanks for fixing this!
I now updated the autolinking rules of both this repo and python/cpython to:

  • PEP-*https://peps.python.org/*

Here are some tests:

Note that old autolinks might take 1 day or so to update, but these should use the new autolinking rule directly.

@CAM-Gerlach
Copy link
Member

In some sense having the autolink references require the zeros made clear that they weren't actually the correct canonical names for/references to PEPs (even though, admittedly, I was the first to complain about the requirement), but I suppose that's a fairly small (and somewhat pedantic) price to pay for the substantially increased convenience. Thanks @ewdurbin !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement infra Core infrastructure for building and rendering PEPs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants