Infra: generate machine-readable PEP index #2475

Merged
merged 5 commits into from
Apr 1, 2022


hugovk
Member

@hugovk hugovk commented Mar 27, 2022

PEP 0 is a human-readable PEP index.

It would be useful to create a machine-readable version for, well, machines to read and process. This PR creates a peps.json of the key fields.

Preview:

[
 {
  "number": 1,
  "title": "PEP Purpose and Guidelines",
  "authors": "Warsaw, Hylton, Goodger, Coghlan",
  "status": "Active",
  "type": "Process",
  "url": "https://peps.python.org/pep-0001/"
 },
 {
  "number": 2,
  "title": "Procedure for Adding New Modules",
  "authors": "Faassen",
  "status": "Superseded",
  "type": "Process",
  "url": "https://peps.python.org/pep-0002/"
 },
 {
  "number": 3,
  "title": "Guidelines for Handling Bug Reports",
  "authors": "Hylton",
  "status": "Withdrawn",
  "type": "Process",
  "url": "https://peps.python.org/pep-0003/"
 },
...

https://pep-previews--2475.org.readthedocs.build/peps.json

@hugovk hugovk requested a review from AA-Turner as a code owner March 27, 2022 19:02
@hugovk hugovk requested a review from CAM-Gerlach March 27, 2022 19:02
@hugovk hugovk added the infra Core infrastructure for building and rendering PEPs label Mar 27, 2022
Member

@AA-Turner AA-Turner left a comment

Still on my short break, but this PR provided a nice distraction on a Sunday! (Edit: review is purely through the GitHub UI and I haven't run the actual changes to verify.)

It might also be nice to include at least the created date (I don't know what other fields can be guaranteed to be present and correct in all PEPs).

A

Member

@CAM-Gerlach CAM-Gerlach left a comment

I was actually thinking something very similar—we could generate a simple static JSON API with the JSONified PEP link and header data for machine-readable applications. Maybe put this under an /api/peps endpoint (at least once we're ready to more publicly expose it)? Then if there was need/desire in the future, we could have an authors endpoint, sub-endpoints api/peps/N to get a single PEP's metadata, etc.

It would be really nice to just expose all the header data, especially given the push toward ensuring the headers follow a consistent, machine-parsable format, which also lets us do useful things in the rendered output (I have some additional PRs almost ready to go that improve on this further).
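
A minimal sketch of the static layout suggested above, assuming a build step that already holds the header data in a dictionary. The function name write_static_api, the sample data, and the ".json" suffix on the per-PEP files are assumptions for illustration, not part of this PR:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical header data, trimmed from the preview earlier in this thread.
PEPS = {
    1: {"title": "PEP Purpose and Guidelines", "status": "Active"},
    2: {"title": "Procedure for Adding New Modules", "status": "Superseded"},
}


def write_static_api(peps: dict, out_dir: Path) -> list:
    """Write api/peps.json plus one api/peps/<N>.json file per PEP."""
    api_dir = out_dir / "api"
    (api_dir / "peps").mkdir(parents=True, exist_ok=True)

    written = []
    index_path = api_dir / "peps.json"
    # json.dumps converts the integer keys to strings ("1", "2", ...).
    index_path.write_text(json.dumps(peps, indent=1), encoding="utf-8")
    written.append(index_path)

    for number, metadata in peps.items():
        pep_path = api_dir / "peps" / f"{number}.json"
        pep_path.write_text(json.dumps(metadata, indent=1), encoding="utf-8")
        written.append(pep_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_static_api(PEPS, Path(tmp))
    relative_paths = [p.relative_to(tmp).as_posix() for p in paths]
    single_path = Path(tmp) / "api" / "peps" / "2.json"
    single = json.loads(single_path.read_text(encoding="utf-8"))
```

Because everything is written at build time, a static host can serve these files without any server-side code.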

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@hugovk
Member Author

hugovk commented Mar 29, 2022

It might also be nice to include at least the created date (I don't know what other fields can be guaranteed to be present and correct in all PEPs).

Added! For those not present, their None is included as null in the JSON.
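
As a quick illustration of that None-to-null mapping, using the standard json module (the record below is a trimmed, illustrative entry rather than the generator's actual output):

```python
import json

# Illustrative record: the Python-Version header is absent, so it is None.
record = {
    "title": "PEP Purpose and Guidelines",
    "created": "13-Jun-2000",
    "python_version": None,
}

encoded = json.dumps(record)   # None is serialized as JSON null
decoded = json.loads(encoded)  # and null is read back as None
```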

I was actually thinking something very similar—we could generate a simple static JSON API with the JSONified PEP link and header data for machine-readable applications. Maybe put this under an /api/peps endpoint (at least once we're ready to more publicly expose it)?

Shall we put it there now? I've included a commit to move it to /api/peps.json

Then if there was need/desire in the future, we could have an authors endpoint, sub-endpoints api/peps/N to get a single PEP's metadata, etc.

Yes, stuff like that could be useful in the future.

It would be really nice to just expose all the header data, especially given the push toward ensuring the headers follow a consistent, machine-parsable format, which also lets us do useful things in the rendered output (I have some additional PRs almost ready to go that improve on this further).

I've added (all?) the header fields except for these, which contain email addresses:

  • BDFL-Delegate
  • PEP-Delegate
  • Sponsor

And these which are less useful:

  • Content-Type - always text/x-rst
  • Last-Modified - mostly $Date$
  • Version - mostly $Revision$

@CAM-Gerlach
Member

One more thing, sorry—why not make the top-level object an object (dictionary) with the PEP number as the key instead of an array? This would make it much easier and faster for clients to get a specific PEP, just peps[str(pep_num)] rather than having to iterate through the whole list looking for the one with number == pep_num. It also allows adding special top-level metadata.
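
A small sketch of the difference from a client's point of view. The helper names find_in_list and find_in_dict are made up for illustration, and the data is trimmed from the preview above:

```python
import json

# The same two PEPs in both shapes.
as_list = json.loads(
    '[{"number": 1, "title": "PEP Purpose and Guidelines"},'
    ' {"number": 2, "title": "Procedure for Adding New Modules"}]'
)
as_dict = json.loads(
    '{"1": {"title": "PEP Purpose and Guidelines"},'
    ' "2": {"title": "Procedure for Adding New Modules"}}'
)


def find_in_list(peps, pep_num):
    """Array layout: scan every entry until the number matches."""
    return next((pep for pep in peps if pep["number"] == pep_num), None)


def find_in_dict(peps, pep_num):
    """Object layout: one lookup; JSON object keys are always strings."""
    return peps.get(str(pep_num))


from_list = find_in_list(as_list, 2)
from_dict = find_in_dict(as_dict, 2)
```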

Added! For those not present, their None is included as null in the JSON.

Sounds good; in the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs land that fully validate their format for much easier and trouble-free internal and external use.

Shall we put it there now? I've included a commit to move it to /api/peps.json

Seems prudent to me.

I've added (all?) the header fields except for these, which contain email addresses:

At least once my forthcoming PR is in (should be in a few hours), the field will be validated to actually match the current or historically-specified formats, such that you can confidently grab only the author name without the email. But we can wait on that, if desired.
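
For illustration only, here is one way to keep the names and drop the addresses, assuming a header value in the "Name <email>" form. It uses the standard library's email.utils.getaddresses rather than whatever validation the forthcoming PR applies; the author_names helper and the sample value are hypothetical:

```python
from email.utils import getaddresses


def author_names(header_value):
    """Return only the display names from a 'Name <email>, ...' header value."""
    pairs = getaddresses([header_value])  # list of (name, address) tuples
    return [name for name, _address in pairs if name]


# Hypothetical header value; not taken from a real PEP.
names = author_names("Ada Example <ada@example.org>, Bob Sample <bob@example.org>")
```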

Content-Type - always text/x-rst

This isn't really useful now, but if we do decide to keep it around, it could change in the future if we allow, e.g., MyST PEPs (but we could always add it then).

Last-Modified - mostly $Date$
Version - mostly $Revision$

These two are a legacy of the old PEP 9 text format (AFAIK) and haven't done anything for a long time, so yeah, they should be elided (though if it's easy to do so, you could add the last modified date that's already automatically calculated and displayed below the PEP).

@hugovk
Member Author

hugovk commented Mar 30, 2022

Updated to this structure:

{
 "1": {
  "title": "PEP Purpose and Guidelines",
  "authors": "Warsaw, Hylton, Goodger, Coghlan",
  "discussions_to": null,
  "status": "Active",
  "type": "Process",
  "created": "13-Jun-2000",
  "python_version": null,
  "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013",
  "resolution": null,
  "requires": null,
  "replaces": null,
  "superseded_by": null,
  "url": "https://peps.python.org/pep-0001/"
 },
 "2": {
  "title": "Procedure for Adding New Modules",
  "authors": "Faassen",
  "discussions_to": null,
  "status": "Superseded",
  "type": "Process",
  "created": "07-Jul-2001",
  "python_version": null,
  "post_history": "07-Jul-2001, 09-Mar-2002",
  "resolution": null,
  "requires": null,
  "replaces": null,
  "superseded_by": null,
  "url": "https://peps.python.org/pep-0002/"
 },
...

pep_dict = {
    pep.number: {
        "title": pep.title,
        "authors": ", ".join(author.nick for author in pep.authors),

Would it be better to make this an array of nicks, so users don't need to split the string again?

Member

Currently, all of the fields are strings (or null) and match the literal value of the headers (minus cleaned-up whitespace). If we're going to do this, it would be inconsistent not to process the other headers into more structured formats as well, which would be a good idea but is better left to a future PR. With #2484, tools can rely on each header matching a certain format, so the values are easier and more consistent to work with as strings and can be split with just .split(","), without having to worry much about edge cases.

As mentioned above,

In the future, if there's any interest, we could actually parse the fields into more structured formats, especially once my PRs land that fully validate their format for much easier and trouble-free internal and external use.
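
A client-side sketch of what that looks like today, assuming the comma-separated strings and DD-Mon-YYYY dates shown in the preview above. The helpers split_field and parse_pep_date are hypothetical names, not part of this PR:

```python
from datetime import date, datetime

# Values copied from the PEP 1 entry in the preview above.
record = {
    "authors": "Warsaw, Hylton, Goodger, Coghlan",
    "created": "13-Jun-2000",
    "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013",
    "requires": None,
}


def split_field(value):
    """Split a comma-separated header string; a null field gives an empty list."""
    if value is None:
        return []
    return [item.strip() for item in value.split(",")]


def parse_pep_date(value):
    """Parse a DD-Mon-YYYY date; %b expects English month names (default C locale)."""
    return datetime.strptime(value, "%d-%b-%Y").date()


authors = split_field(record["authors"])
created = parse_pep_date(record["created"])
post_history = [parse_pep_date(item) for item in split_field(record["post_history"])]
requires = split_field(record["requires"])
```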

Member

@CAM-Gerlach CAM-Gerlach left a comment

LGTM now, @hugovk. IMO we could iterate on this in the future to produce a more structured format if and when need and time allow, but I think that broadens the scope of this PR too much unless you really want to do it (and it would make sense to wait for #2484).

@hugovk
Member Author

hugovk commented Apr 1, 2022

Yes, let's merge this as is. We've not announced or documented this API, so it's fine to iterate and change the schema if needed. Thanks all for the reviews!

@hugovk hugovk merged commit fe2d145 into python:main Apr 1, 2022
@hugovk hugovk deleted the peps-json branch April 1, 2022 20:07
@hugovk
Member Author

hugovk commented Apr 2, 2022

I just published the first thing using this JSON file :)

https://pypi.org/project/pepotron/

A CLI to open PEPs in your browser. Type a PEP number, a Python version to see that version's release schedule PEP, or some words to find the PEP with a matching title.

For example:

$ pep 8
https://peps.python.org/pep-0008/
$ pep 3.11
https://peps.python.org/pep-0664/
$ pep "dead batteries"
Score	Result
90	PEP 594: Removing dead batteries from the standard library
55	PEP 288: Generators Attributes and Exceptions
55	PEP 363: Syntax For Dynamic Attribute Access
55	PEP 476: Enabling certificate verification by default for stdlib http clients
52	PEP 349: Allow str() to return unicode strings

https://peps.python.org/pep-0594/
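
A rough sketch of two of those lookups, for illustration only. It is not pepotron's implementation: the scored search above is replaced here with a plain "all words appear in the title" match, and pep_url and search_titles are hypothetical names:

```python
def pep_url(number):
    """Build a PEP URL in the zero-padded form used on peps.python.org."""
    return f"https://peps.python.org/pep-{number:04d}/"


# A tiny title index; a real client would build this from peps.json.
TITLES = {
    8: "Style Guide for Python Code",
    594: "Removing dead batteries from the standard library",
}


def search_titles(titles, query):
    """Return PEP numbers whose title contains every word of the query."""
    words = query.lower().split()
    return [
        number
        for number, title in titles.items()
        if all(word in title.lower() for word in words)
    ]


matches = search_titles(TITLES, "dead batteries")
urls = [pep_url(number) for number in matches]
```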

@Rosuav
Contributor

Rosuav commented Apr 2, 2022

I like the version number feature! Installed, will be making use of.

@CAM-Gerlach
Member

Sweet! Admittedly, the version number feature is a bit of a workaround for the fact that release schedules are published as arbitrary PEPs and not in one cohesive, dedicated place, but especially as someone who's navigating between dozens of PEPs every day, that's a fantastic tool! (Though, I fear ending up with hundreds of browser tabs if I'm not careful, heh, vs. my current less convenient approach of having a few PEP tabs, typing peps. in the main bar to switch to one, and then replacing the PEP number, heh).

It would be cool to add option flags to list/search by various header fields, or a combination of the same. E.g. pep --author <name> to list all PEPs that include a specific author, or pep --version 3.11 --type "Standards Track" --status "Accepted" to see all feature PEPs accepted for 3.11. Exposing this via Python as well would allow for the equivalent of query-param functionality to consumers without having to run a dynamic server-side backend.
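
A hedged sketch of how such filtering could work on the client side, assuming the dictionary layout of peps.json. The filter_peps function is a hypothetical name and the entries below are made-up placeholders, not real PEPs:

```python
# Hypothetical entries in the shape of peps.json; not real PEPs.
PEPS = {
    "9001": {"title": "Example feature", "authors": "Doe, Roe",
             "status": "Accepted", "type": "Standards Track",
             "python_version": "3.11"},
    "9002": {"title": "Older feature", "authors": "Doe",
             "status": "Final", "type": "Standards Track",
             "python_version": "3.10"},
    "9003": {"title": "Example process", "authors": "Roe",
             "status": "Active", "type": "Process",
             "python_version": None},
}


def filter_peps(peps, author=None, **fields):
    """Keep PEPs whose fields equal the given values, optionally by author."""
    matches = {}
    for number, metadata in peps.items():
        if any(metadata.get(name) != wanted for name, wanted in fields.items()):
            continue
        if author is not None:
            names = [n.strip() for n in (metadata.get("authors") or "").split(",")]
            if author not in names:
                continue
        matches[number] = metadata
    return matches


by_author = filter_peps(PEPS, author="Doe")
accepted_311 = filter_peps(
    PEPS, python_version="3.11", type="Standards Track", status="Accepted"
)
```

Since the whole index is one static file, this kind of filtering needs no server-side backend; the client downloads peps.json once and queries it locally.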

Labels
CLA signed infra Core infrastructure for building and rendering PEPs
7 participants