Infra: generate machine-readable PEP index #2475

hugovk · 2022-03-27T19:02:07Z

PEP 0 is a human-readable PEP index.

It would be useful to create a machine-readable version for, well, machines to read and process. This PR creates a peps.json of the key fields.

Preview:

[
 {
  "number": 1,
  "title": "PEP Purpose and Guidelines",
  "authors": "Warsaw, Hylton, Goodger, Coghlan",
  "status": "Active",
  "type": "Process",
  "url": "https://peps.python.org/pep-0001/"
 },
 {
  "number": 2,
  "title": "Procedure for Adding New Modules",
  "authors": "Faassen",
  "status": "Superseded",
  "type": "Process",
  "url": "https://peps.python.org/pep-0002/"
 },
 {
  "number": 3,
  "title": "Guidelines for Handling Bug Reports",
  "authors": "Hylton",
  "status": "Withdrawn",
  "type": "Process",
  "url": "https://peps.python.org/pep-0003/"
 },
...

https://pep-previews--2475.org.readthedocs.build/peps.json

AA-Turner

Still on my short break, but this PR provided a nice distraction on a Sunday! (Edit: review is purely through the GitHub UI and I haven't run the actual changes to verify.)

It might also be nice to include at least the created date (I don't know what other fields can be guaranteed to be present and correct in all PEPs).


pep_sphinx_extensions/pep_zero_generator/pep_index_generator.py

CAM-Gerlach

I was actually thinking something very similar—we could generate a simple static JSON API with the JSONified PEP link and header data for machine-readable applications. Maybe put this under an /api/peps endpoint (at least once we're ready to more publicly expose it)? Then if there was need/desire in the future, we could have an authors endpoint, sub-endpoints api/peps/N to get a single PEP's metadata, etc.
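As a rough illustration of the "static JSON API" idea, here is a minimal sketch that writes an index plus one file per PEP. The function name `write_static_api` and the `api/peps/<N>.json` layout are hypothetical choices for this example, not something the PR implements.

```python
import json
import tempfile
from pathlib import Path


def write_static_api(pep_dict: dict, out_dir: Path) -> None:
    """Write a static index plus one JSON file per PEP (hypothetical layout)."""
    peps_dir = out_dir / "api" / "peps"
    peps_dir.mkdir(parents=True, exist_ok=True)
    # Index of every PEP: <out_dir>/api/peps.json
    index_path = out_dir / "api" / "peps.json"
    index_path.write_text(json.dumps(pep_dict, indent=1), encoding="utf-8")
    # One "sub-endpoint" per PEP: <out_dir>/api/peps/<N>.json
    for number, metadata in pep_dict.items():
        pep_path = peps_dir / f"{number}.json"
        pep_path.write_text(json.dumps(metadata, indent=1), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    write_static_api({1: {"title": "PEP Purpose and Guidelines"}}, Path(tmp))
    written = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.json"))
    print(written)  # ['api/peps.json', 'api/peps/1.json']
```

Because everything is plain files, any static host can serve these "endpoints" without a server-side backend.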

It would be really nice to just expose all the header data, especially given the push toward ensuring they follow a consistent format and are machine-parsable, which also lets us do useful things in the rendered output (I have some additional PRs almost ready to go that improve this further).

pep_sphinx_extensions/pep_zero_generator/pep_index_generator.py

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

hugovk · 2022-03-29T14:17:29Z

It might also be nice to include at least the created date (I don't know what other fields can be guaranteed to be present and correct in all PEPs).

Added! For headers not present in a PEP, the value is None, which is written as null in the JSON.

I was actually thinking something very similar—we could generate a simple static JSON API with the JSONified PEP link and header data for machine-readable applications. Maybe put this under an /api/peps endpoint (at least once we're ready to more publicly expose it)?

Shall we put it there now? I've included a commit to move it to /api/peps.json

Then if there was need/desire in the future, we could have an authors endpoint, sub-endpoints api/peps/N to get a single PEP's metadata, etc.

Yes, stuff like that could be useful in the future.

It would be really nice to just expose all the header data, especially given the push toward ensuring they follow a consistent format and are machine-parsable, which also lets us do useful things in the rendered output (I have some additional PRs almost ready to go that improve this further).

I've added (all?) the header fields except for these, which contain email addresses:

BDFL-Delegate
PEP-Delegate
Sponsor

And these which are less useful:

Content-Type - always text/x-rst
Last-Modified - mostly $Date$
Version - mostly $Revision$

CAM-Gerlach · 2022-03-29T19:40:54Z

One more thing, sorry—why not make the top-level object an object (dictionary) with the PEP number as the key instead of an array? This would make it much easier and faster for clients to get a specific PEP, just peps[str(pep_num)] rather than having to iterate through the whole list looking for the one with number == pep_num. It also allows adding special top-level metadata.
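To make the trade-off concrete, here is a minimal sketch comparing the two shapes. The sample data is a trimmed, illustrative subset of the fields shown in the preview; note that JSON object keys are always strings, hence the `str()` conversion.

```python
import json

# Array-of-objects shape (as in the first preview): finding one PEP
# means scanning the list until "number" matches.
as_list = json.loads("""[
  {"number": 1, "title": "PEP Purpose and Guidelines"},
  {"number": 8, "title": "Style Guide for Python Code"}
]""")

# Object shape keyed by PEP number.
as_dict = json.loads("""{
  "1": {"title": "PEP Purpose and Guidelines"},
  "8": {"title": "Style Guide for Python Code"}
}""")


def find_in_list(peps: list, pep_num: int):
    # O(n): checks each entry until a match is found
    for pep in peps:
        if pep["number"] == pep_num:
            return pep
    return None


def find_in_dict(peps: dict, pep_num: int):
    # O(1) on average: JSON object keys are strings, so convert the int first
    return peps.get(str(pep_num))


print(find_in_list(as_list, 8)["title"])  # Style Guide for Python Code
print(find_in_dict(as_dict, 8)["title"])  # Style Guide for Python Code
```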

Added! For headers not present in a PEP, the value is None, which is written as null in the JSON.

Sounds good; in the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs that fully validate their format land, making them much easier and more trouble-free to use internally and externally.

Shall we put it there now? I've included a commit to move it to /api/peps.json

Seems prudent to me.

I've added (all?) the header fields except for these, which contain email addresses:

At least once my forthcoming PR is in (should be in a few hours), the field will be validated to actually match the current or historically-specified formats, such that you can confidently grab only the author name without the email. But we can wait on that, if desired.
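As a rough illustration of what a validated format enables, the sketch below keeps only the display names from an address-style header using the standard library's `email.utils.getaddresses`. It assumes values follow the `Name <email>` style from PEP 1 (or the older `email (Name)` style); the helper `names_only` is hypothetical and not part of the PEP tooling.

```python
from email.utils import getaddresses


def names_only(header_value: str) -> str:
    """Return comma-separated display names from an address-style header.

    Falls back to the bare address when an entry has no display name.
    """
    pairs = getaddresses([header_value])
    return ", ".join(name or addr for name, addr in pairs)


print(names_only("Random J. User <random@example.com>, Jane Doe <jane@example.com>"))
# Random J. User, Jane Doe
print(names_only("jane@example.com (Jane Doe)"))
# Jane Doe
```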

Content-Type - always text/x-rst

This isn't really useful now, but if we do decide to keep it around, it could change in the future if we allow, e.g., MyST PEPs (but we could always add it then).

Last-Modified - mostly $Date$
Version - mostly $Revision$

These two are a legacy of the old PEP 9 text format (AFAIK) and haven't done anything for a long time, so yeah, they should be elided (though if it's easy to do so, you could add the last modified date that's already automatically calculated and displayed below the PEP).

hugovk · 2022-03-30T14:38:59Z

Updated to this structure:

{
 "1": {
  "title": "PEP Purpose and Guidelines",
  "authors": "Warsaw, Hylton, Goodger, Coghlan",
  "discussions_to": null,
  "status": "Active",
  "type": "Process",
  "created": "13-Jun-2000",
  "python_version": null,
  "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013",
  "resolution": null,
  "requires": null,
  "replaces": null,
  "superseded_by": null,
  "url": "https://peps.python.org/pep-0001/"
 },
 "2": {
  "title": "Procedure for Adding New Modules",
  "authors": "Faassen",
  "discussions_to": null,
  "status": "Superseded",
  "type": "Process",
  "created": "07-Jul-2001",
  "python_version": null,
  "post_history": "07-Jul-2001, 09-Mar-2002",
  "resolution": null,
  "requires": null,
  "replaces": null,
  "superseded_by": null,
  "url": "https://peps.python.org/pep-0002/"
 },
...
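A minimal sketch of how a dict keyed by PEP number serializes. The `PEP` dataclass and the `build_index_json` function are hypothetical stand-ins, not the actual parser class or generator in `pep_sphinx_extensions`; the point is that `json.dumps` turns integer keys into strings and `None` into `null`.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PEP:
    # Hypothetical stand-in for the real parsed-PEP object
    number: int
    title: str
    status: str
    python_version: Optional[str] = None


def build_index_json(peps: list[PEP]) -> str:
    pep_dict = {
        pep.number: {
            "title": pep.title,
            "status": pep.status,
            "python_version": pep.python_version,  # None becomes null
            # Zero-padded to four digits, matching the URLs shown above
            "url": f"https://peps.python.org/pep-{pep.number:04d}/",
        }
        for pep in sorted(peps, key=lambda p: p.number)
    }
    # json.dumps converts the int keys to strings: "1", "2", ...
    return json.dumps(pep_dict, indent=1)


output = build_index_json([
    PEP(2, "Procedure for Adding New Modules", "Superseded"),
    PEP(1, "PEP Purpose and Guidelines", "Active"),
])
print(output)
```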

TeamSpen210 · 2022-03-30T23:48:59Z

pep_sphinx_extensions/pep_zero_generator/pep_index_generator.py

+    pep_dict = {
+        pep.number: {
+            "title": pep.title,
+            "authors": ", ".join(author.nick for author in pep.authors),


Would it be better to make this an array of nicks, so users don't need to split the string again?

Currently, all of the fields are strings (or null) and match the literal value of the headers (minus cleaned-up whitespace). If we're going to do this, it would be inconsistent not to process the other headers into more structured formats as well, which would be a good idea but is better left to a future PR. With #2484, tools can rely on each of the headers matching a certain format, so they are easier and more consistent to work with as strings and can be split with just .split(","), without having to worry much about edge cases.
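For example, a client could split and parse the string fields itself. This sketch uses sample values copied from the PEP 1 entry in the updated preview; the helper names are hypothetical, and the `%d-%b-%Y` date format is inferred from those samples.

```python
from datetime import date, datetime

entry = {
    "authors": "Warsaw, Hylton, Goodger, Coghlan",
    "created": "13-Jun-2000",
    "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013",
}


def split_field(value):
    """Split a comma-separated header value into a list (None gives [])."""
    if value is None:
        return []
    return [item.strip() for item in value.split(",")]


def parse_pep_date(value: str) -> date:
    # Dates look like "13-Jun-2000"; %b expects English month
    # abbreviations, as in the default C locale.
    return datetime.strptime(value, "%d-%b-%Y").date()


authors = split_field(entry["authors"])
created = parse_pep_date(entry["created"])
posts = [parse_pep_date(d) for d in split_field(entry["post_history"])]

print(authors)    # ['Warsaw', 'Hylton', 'Goodger', 'Coghlan']
print(created)    # 2000-06-13
print(posts[-1])  # 2013-04-07
```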

As mentioned above,

In the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs that fully validate their format land, making them much easier and more trouble-free to use internally and externally.

CAM-Gerlach

LGTM now @hugovk. IMO we could iterate on this in the future to produce a more structured format if and when need and time allow, but I think that broadens the scope of this PR too much unless you really want to do it (and it would make sense to wait for #2484).

hugovk · 2022-04-01T20:07:31Z

Yes, let's merge this as is. We've not announced or documented this API, so it's fine to iterate and change the schema if needed. Thanks all for the reviews!

hugovk · 2022-04-02T10:51:55Z

I just published the first thing using this JSON file :)

https://pypi.org/project/pepotron/

A CLI to open PEPs in your browser. Type a PEP number, a Python version to see that version's release schedule PEP, or some words to find the PEP with a matching title.

For example:

$ pep 8
https://peps.python.org/pep-0008/
$ pep 3.11
https://peps.python.org/pep-0664/
$ pep "dead batteries"
Score	Result
90	PEP 594: Removing dead batteries from the standard library
55	PEP 288: Generators Attributes and Exceptions
55	PEP 363: Syntax For Dynamic Attribute Access
55	PEP 476: Enabling certificate verification by default for stdlib http clients
52	PEP 349: Allow str() to return unicode strings

https://peps.python.org/pep-0594/
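A rough sketch of the lookup behaviour described above, working against the dict-shaped JSON. This is not pepotron's code. The version-to-PEP table is a hypothetical hard-coded mapping containing only the 3.11 example above, and the title search uses a simple word-overlap score as a stand-in for pepotron's own matching, so its scores would not match the output shown.

```python
import re

BASE_URL = "https://peps.python.org"

# Hypothetical: only the mapping from the example above.
RELEASE_SCHEDULES = {"3.11": 664}


def pep_url(number: int) -> str:
    # PEP URLs use the number zero-padded to four digits
    return f"{BASE_URL}/pep-{number:04d}/"


def word_overlap(query: str, title: str) -> float:
    """Fraction of query words that appear in the title (0.0 to 1.0)."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", title.lower()))
    return len(q & t) / len(q) if q else 0.0


def lookup(arg: str, peps: dict) -> str:
    if arg.isdigit():
        return pep_url(int(arg))
    if arg in RELEASE_SCHEDULES:
        return pep_url(RELEASE_SCHEDULES[arg])
    # Otherwise pick the PEP whose title best matches the words given
    best = max(peps, key=lambda num: word_overlap(arg, peps[num]["title"]))
    return pep_url(int(best))


peps = {
    "288": {"title": "Generators Attributes and Exceptions"},
    "594": {"title": "Removing dead batteries from the standard library"},
}
print(lookup("8", peps))               # https://peps.python.org/pep-0008/
print(lookup("3.11", peps))            # https://peps.python.org/pep-0664/
print(lookup("dead batteries", peps))  # https://peps.python.org/pep-0594/
```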

Rosuav · 2022-04-02T10:55:41Z

I like the version number feature! Installed, will be making use of.

CAM-Gerlach · 2022-04-02T15:21:32Z

Sweet! Admittedly, the version number feature is a bit of a workaround for the fact that release schedules are published as arbitrary PEPs and not in one cohesive, dedicated place, but especially as someone who's navigating between dozens of PEPs every day, that's a fantastic tool! (Though I fear ending up with hundreds of browser tabs if I'm not careful, compared with my current, less convenient approach: keeping a few PEP tabs open, typing peps. in the address bar to switch to one, and then replacing the PEP number, heh.)

It would be cool to add option flags to list/search by various header fields, or a combination of the same. E.g. pep --author <name> to list all PEPs that include a specific author, or pep --version 3.11 --type "Standards Track" --status "Accepted" to see all feature PEPs accepted for 3.11. Exposing this via Python as well would allow for the equivalent of query-param functionality to consumers without having to run a dynamic server-side backend.
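The proposed flags are only an idea at this point, but the underlying query could look something like this sketch over the dict-shaped JSON. The function `filter_peps` is hypothetical; it assumes `authors` and `python_version` are comma-separated strings (or null), as in the current schema. Entries "1" and "2" reuse values from the preview above, and "9999" is a made-up entry for illustration only.

```python
def split_field(value):
    # Comma-separated header value -> list of stripped items (None gives [])
    return [] if value is None else [v.strip() for v in value.split(",")]


def filter_peps(peps: dict, *, author=None, version=None,
                pep_type=None, status=None) -> list:
    """Return sorted PEP numbers whose headers match every criterion given."""
    matches = []
    for num, pep in peps.items():
        if author is not None and author not in split_field(pep["authors"]):
            continue
        if version is not None and version not in split_field(pep["python_version"]):
            continue
        if pep_type is not None and pep["type"] != pep_type:
            continue
        if status is not None and pep["status"] != status:
            continue
        matches.append(int(num))
    return sorted(matches)


peps = {
    "1": {"authors": "Warsaw, Hylton, Goodger, Coghlan", "python_version": None,
          "type": "Process", "status": "Active"},
    "2": {"authors": "Faassen", "python_version": None,
          "type": "Process", "status": "Superseded"},
    # Made-up entry for illustration only
    "9999": {"authors": "Example", "python_version": "3.11",
             "type": "Standards Track", "status": "Accepted"},
}

print(filter_peps(peps, author="Coghlan"))  # [1]
print(filter_peps(peps, version="3.11", pep_type="Standards Track",
                  status="Accepted"))       # [9999]
```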

Infra: generate machine-readable PEP index

36b2a5f

hugovk requested a review from AA-Turner as a code owner March 27, 2022 19:02

the-knights-who-say-ni added the CLA signed label Mar 27, 2022

hugovk requested a review from CAM-Gerlach March 27, 2022 19:02

hugovk added the infra Core infrastructure for building and rendering PEPs label Mar 27, 2022

AA-Turner requested changes Mar 27, 2022


CAM-Gerlach reviewed Mar 27, 2022


Don't create a temp list

c24b85b

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

hugovk force-pushed the peps-json branch from fb27b72 to f774598 on March 29, 2022 12:59

Infra: Include more headers in peps.json

30d0d6c

hugovk force-pushed the peps-json branch from d372948 to 52f8ca8 on March 29, 2022 14:44

Infra: Move from /peps.rss to /api/peps.json

089826a

hugovk force-pushed the peps-json branch from 52f8ca8 to 089826a on March 29, 2022 14:50

CAM-Gerlach mentioned this pull request Mar 30, 2022

Lint: Update headers and checks per current guidance & provide helpful feedback #2484

Merged

Infra: Build JSON from dict, with PEP number as key, instead of list

5f88ed5

hugovk force-pushed the peps-json branch from 9f0e8c1 to 5f88ed5 on March 30, 2022 14:42

TeamSpen210 reviewed Mar 30, 2022


CAM-Gerlach approved these changes Mar 31, 2022


hugovk merged commit fe2d145 into python:main Apr 1, 2022

hugovk deleted the peps-json branch April 1, 2022 20:07

CAM-Gerlach mentioned this pull request Apr 21, 2022

PEP 11: Add Discussions section #2544

Merged

AA-Turner mentioned this pull request May 7, 2022

Document peps.json and move it to the root #2584

Open

erlend-aasland mentioned this pull request May 13, 2022

pep8/greppable exception messages erlend-aasland/peps#1

Closed

erlend-aasland mentioned this pull request Jun 27, 2022

pep 687/mark as accepted erlend-aasland/peps#2

Closed
