Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDEP-1: Purpose and guidelines for pandas enhancement proposals #47444

Merged
merged 18 commits into from
Aug 3, 2022

Conversation

datapythonista
Copy link
Member

@datapythonista datapythonista commented Jun 21, 2022

Closes #28568

Initial PDEP to define purpose and guidelines for pandas enhancement proposals (equivalent to PEPs or NEPs). Feedback very welcome.

This PR also makes the PDEPs public in the roadmap page of our website.

@datapythonista datapythonista added Web pandas website PDEP pandas enhancement proposal labels Jun 21, 2022
datapythonista and others added 2 commits June 21, 2022 20:23
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@mroeschke
Copy link
Member

This would probably close #28568

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't immediately see this, but it would be good to explain how an accepted/rejected decision is made. This looks to be Numpy's: https://numpy.org/neps/nep-0000.html#how-a-nep-becomes-accepted

For example, it would be good to define:

  • How consensus is defined: 100%, 2/3rds, 51% approval from core devs?
  • How to collect consensus: PDEP PR? Monthly dev meeting?
  • Should a "call to vote" action be available to unblock perpetual discussion?

@datapythonista
Copy link
Member Author

I didn't immediately see this, but it would be good to explain how an accepted/rejected decision is made.

Very good point. I thought this was quite a big topic, and maybe more related to general governance than specific to PDEPs. That's why for now I decided to just link to the governance documentation. Based on the current governance, decisions are made by consensus, with the BDFL (Wes) having the final word when consensus is not possible.

I think the governance documentation is 7 years old. Many things changed since then, including the structure of the team (from like 3 active core devs to like 20, Python moved out of a BDFL model...). Also, besides the decision making, I think our governance seems outdated. I personally don't know most of the people in the pandas CoC committee, and I assume they are not performing any task, and probably don't even know they are part of it.

While I fully agree on what you say, I'd personally keep the discussions separate, as I think both are quiet big, and can be taken care of separately. What do you think?

@mroeschke mroeschke added this to the 1.5 milestone Jun 22, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Agreed that general decision making details can go under governance.

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG overall. Just had a few questions as I was going through it.


#### Accepted PDEP

A PDEP will be accepted by the core development team, and decisions will be made
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reword "will" to something else? This language makes it feel as if all PDEP's will be accepted.

#### Submitting a PDEP

Proposing a PDEP is done by creating a PR adding a new file to `web/pdeps/accepted/`.
The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be convenient to have an actual blank template PDEP file to be filled out.

It might also be nice to have a script that generates the next PDEP number for you(I anticipate it might be hard to find the next PDEP number if a lot of PDEPs are submitted in the future).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally leave that for later, once we need it. Those sound like good ideas to me, but I wouldn't implement them initially, I would wait to see how things work first. If we merge like PDEP per month, I think checking the last PDEP number before merging is easier than a system to autogenerate them. And for a template, I'd wait to have few actual PDEPs before deciding if it helps, or if PDEPs are too different from one to another.

Does it make sense to you to start simple and iterate later as we have more experience?

## Evolution of this PDEP

While most PDEPs aren't expected to change after accepted, in some cases like this
PDEP, they will be updated to contain its latest version, if things evolve. A log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably clarify when a PDEP can be revised v.s. when a new PDEP has to be submitted superseding the old PDEP.

The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference
for the expected format.

By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to also have a public comment period on PDEP's like tensorflow has with their RFC's?

We should probably also clarify when voting happens(define what proportion of core team members need to consider a PDEP ready before voting like @mroeschke stated) and how long core team members have to vote.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to discuss and hear other opinions. But to me personally, I wouldn't have a voting period or deadline.

I see PDEPs more like ongoing discussions, with feedback and updates, than just a voting process. Also, I assume tensorflow core devs are mainly google employees, so imposing a deadline makes more sense, as they are supposed to be working on the project X hours. But for a mostly volunteer developed project like pandas, I'd leave it more open.

But again, I'd start by seeing how things work, and if we see PDEPs keep open for too long, and seems like having a timeframe should help, surely worth giving it a try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But again, I'd start by seeing how things work, and if we see PDEPs keep open for too long, and seems like having a timeframe should help, surely worth giving it a try.

I agree, just having a PDEP is akin to a new process, so having a process for the PDEP is maybe like introducing another meta process at the same time. So in response to this comment and others related to the process, I agree with @datapythonista that to begin with we just discuss whether we want to use PDEPs, what we want out of a PDEP, and roughly what it should contain and iterate on the gaps over time. The only caveat here, is that once someone has spent effort preparing a PDEP and it is approved that we don't have "blockers" implementing, so that does affect the approval process to some degree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree - we need a voting process here - we could emulate that of NEP which is pretty simple

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a voting process here

I think what Marc proposed above is to defer a discussion about the decision process (basically our governance model) for a subsequent discussion.

(which sounds good to me to keep the discussion manageable)

for the expected format.

By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done
in the `accepted` directory and contain `Status: Accepted`. If a PDEP is finally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alsof default to start a PDEP in a "draft" status (or "under discussion", I see there is a section for those)?
To me that seems to better signal the actual status (when going through the diff of this one, I wondered why it was saying " accepted" since we are still discussing the proposal). When we decide to merge, it's then only a small change to update the status before merging. And for larger PDEPs it might also be useful to merge it already in draft status (to have a rendered version to link to, like they do with PEPs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Regarding merging PDEPs as draft, feels like it could make discussions more difficult. Like where to have the discussion once the PR is merged. Via new PRs? Or in a second PR, not being able to comment in parts not in the diff. But I guess in some cases it can be useful, maybe a PR with different parts, and merging as a Draft once the first part has been discussed, and before working on other parts.

I'll update the document if there are no objections to using Under discussion status (or Draft if people prefer this name) until PR is Accepted. And to consider merging PRs before they are accepted when it's useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I think the current guidance stated here is clear as long as a PDEP is an open PR, it should be considered "under discussion" and a draft PDEP can be a draft PR for example. I agree with @datapythonista that merging intermediate states of a PDEP would fragment the discussion and there shouldn't be more than one page/PR/location where a PDEP can be discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding merging PDEPs as draft, feels like it could make discussions more difficult.

We will probably have to see how it goes in practice. For many PDEPs a single PR with proposal, discussion, acceptance all in one will quite likely be sufficient.
But I can imagine for larger/longer-running discussions (I am thinking eg about the copy/view proposal), that it could be useful to already have a hosted version of PDEP that is still under discussion.

As some references: numpy seems to already merge NEP PRs if they are still under discussion (eg numpy/numpy#18456), but they also have substantial discussion on the mailing list. Similarly in Python, PEPs are merged quickly, but also there the discussion is mostly happening on mailing list or discourse. That's of course different with the current way in pandas where most discussion typically happens on GitHub.

@datapythonista
Copy link
Member Author

I updated the PR to incorporate the feedback from comments, and I replied to the ones I didn't directly incorporate.

I also fixes couple of visualization issues, and improved a bit the styles, so the roadmap and also other pages are clearer.

Let me know of additional feedback, and feel free to approve when you're happy with getting this merged, or add a Request changes flagged if you see any blocker. Thanks!

@datapythonista
Copy link
Member Author

@pandas-dev/pandas-core any other comment before merging this? If there are no objections I'll merge this in couple of days, and we can keep iterating as needed in separate PRs.

The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference
for the expected format.

By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure this is true - sure a lot will easily be accepted but some might be controversial and ultimately not accepted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. maybe just qualify this by changing By default, we expect a PDEP will be accepted to something like By default, we expect a PDEP (with some preliminary discussion) will be accepted

before this text we have...

A PDEP should be used for changes that are not
immediate and not obvious, and are expected to require a significant amount of
discussion and require detailed documentation before being implemented.

so the PEP is part of the detailed documentation and an issue should perhaps be the initial part of the discussion. (just as I would normally expect say a bug fix PR to have an associated issue)

and we also have

If you are unsure if you should be opening
an issue or creating a PDEP, it's probably safe to start by
opening an issue, which can
be eventually moved to a PDEP.

I think that maybe issues should always be opened before submitting a PEP, either as a specific issue or an existing issue concluding that a PEP is required. Maybe we also need to ensure that @pandas-dev/pandas-core is also notified in these cases.

If we have some form of gate for opening a PEP, then most should be accepted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just leave out the "By default, we expect a PDEP will be accepted" ? To me it doesn't seem to add much, rather than a potentially wrong/confusing message (in general PDEPs are for topics that are not trivial and thus will not always be accepted)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @datapythonista generally lgtm.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to somewhere list the possible "status" values? (now those seem to be included in the different workflow sections, but it's not very explicit)


## PDEP definition, purpose and scope

A PDEP (pandas enhancement proposal) is a proposal to a **major** change in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A PDEP (pandas enhancement proposal) is a proposal to a **major** change in
A PDEP (pandas enhancement proposal) is a proposal for a **major** change in

? (not fully sure, but "proposal to" sounds like it should be followed by a verb


- The core development team, who will have the final decision on whether a PDEP
is approved or not
- Developers of pandas and other related projects, and experienced users. Their
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Developers of pandas and other related projects, and experienced users. Their
- Contributors to pandas and other related projects, and experienced users. Their

? (might be a bit more generic wording, as "developer" sounds more code-centric?)

The file is a markdown file, you can use `web/pdeps/accepted/0001.md` as a reference
for the expected format.

By default, we expect a PDEP will be accepted, so the PR of a PDEP should be done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just leave out the "By default, we expect a PDEP will be accepted" ? To me it doesn't seem to add much, rather than a potentially wrong/confusing message (in general PDEPs are for topics that are not trivial and thus will not always be accepted)

#### Rejected PDEP

A PDEP can be rejected when the final decision is that its implementation is
not the best for the interests of the project. They are as useful as accepted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "accepted" seems a bit strange here, since this is about PDEPs that are not accepted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to contrast rejected PDEPs against accepted PDEPs; i.e. "even rejected PDEPs are useful". I think the wording is okay, but maybe "Rejected PDEPs are just as useful as accepted PDEPs..." would make it more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed the first "as" in the sentence, so I read it as "they are useful as accepted PDEP", instead of "they are as useful as accepted PDEPs", hence the confusion.

@@ -0,0 +1,121 @@
# PDEP-1: Purpose and guidelines

- Date: 21 June 2022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "Created" instead of "Date"? Since this will typically the date when the process is started, and not when it was eg accepted, "created" might denote that more correctly (and both NEPs and PEPs seem to use that)

@jorisvandenbossche
Copy link
Member

Related to my previous comment (#47444 (comment)) about starting a new PDEP as draft / under discussion: this has been changed in the text, but at the same time we still say to create the new PDEP file in the web/pdeps/accepted/ directory.
Personally, I still find that a bit confusing and sending a potentially wrong message. If we have an under_discussion status, why not also reflect that in the directory / path? (although this probably also comes back to whether we have the intention to ever merge PRs in that status, because indeed if we never do that, we would actually never use that directory in practice)

Taking one step back: do we actually need those different directories? (I suppose they are currently used to simplify the script to automate the website content to list the PDEPs)
For example, NumPy (and also Python PEPs) uses a flat directory for the documents: https://github.com/numpy/numpy/tree/main/doc/neps. And it then infers the status from the header. If we have a standard preamble that has the status in that list, that also sounds doable to automate this (instead of being based on directories)? It would make my above discussion point (about in which directory to start the proposal) moot, and can also simplify the overall process a little bit (eg you never need to move files for changing a status, but just change the Status line in the document (eg when a PDEP gets "implemented")).

@datapythonista
Copy link
Member Author

Thanks all for the feedback, very interesting points.

For the voting, I'm working on the governance documents in parallel, and I'll propose changes there to have a way to decide when a PDEP is ready to merge.

For the name to use, there are several comments about names, and many valid points. It's probably not so important, but to find something that most people are happy with, I think we can run a quick poll. I'll write here different options including the ones already proposed. Feel free to suggest more, and when we're ready I'll send a link to the pandas-dev list with a poll.

  • PDEP
  • pdEP
  • PEP
  • EP
  • PCP (pandas change proposal)
  • DEP (dataframe enhancement proposal)

I'll be addressing the rest of the comments in this PR.

@noatamir
Copy link
Member

May I suggest adding a template for new proposals as part of this PR, or as a follow up if that's preferable? 😇

I find templates useful to help facilitate structured discussions, and that important points relevant for consideration aren't missed by the proposal author(s) before submitting them, e.g. how would this affect backwards compatibility.

For reference I include these two templates which could be relevant as a starting point:

@datapythonista
Copy link
Member Author

Thanks all for the feedback. Seems like the preferred name is PDEP, so keeping it. I addressed all the comments, there are two pending things, that I think make more sense to address in future PRs:

  • How the exact decision on when a PDEP is approved (which is part of a wider discussion about governance)
  • Adding a template for PDEPs. While this seems a great idea, I personally think it's premature to have it initially, and I'd wait until we've got few PDEPs, and then decide if we think it adds value, and what should be the format, with concrete examples

Let me know if any other comment. @jreback you reviewed with Requested changes, I rewrote that whole section that other people also found misleading, I think your comments should be addressed now.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


<div class="alert alert-warning" role="alert">
pandas is in the process of moving roadmap points to PDEPs (implemented in
June 2022). During the transition, some roadmap points will exist as PDEPs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
June 2022). During the transition, some roadmap points will exist as PDEPs,
July 2022). During the transition, some roadmap points will exist as PDEPs,

to match the PDEP created date? but probably August.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah switch to August :->

web/pandas/pdeps/0001-purpose-and-guidelines.md Outdated Show resolved Hide resolved
web/pandas/pdeps/0001-purpose-and-guidelines.md Outdated Show resolved Hide resolved
status will be changed to `Status: Implemented`, so there is visibility that the PDEP
is not part of the roadmap and future plans, but a change that it already
happened. The first pandas version in which the PDEP implementation is
available will also be included in the PDEP.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet relevant, but would this be just a note in the revision history or something else?

web/pandas/pdeps/0001-purpose-and-guidelines.md Outdated Show resolved Hide resolved
web/pandas/pdeps/0001-purpose-and-guidelines.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some open comments by @simonjayhawkins


<div class="alert alert-warning" role="alert">
pandas is in the process of moving roadmap points to PDEPs (implemented in
June 2022). During the transition, some roadmap points will exist as PDEPs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah switch to August :->

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding putting up PDEPs under discussion as open PRs, the one negative aspect is that they aren't protected. The author can force-push making changes hard to follow. I think this will work well assuming no bad actors, and I think that's an okay assumption to make, but wanted to be explicit about it. Also, this downside is outweighed by the upside of being able to have all discussion in a consolidated place in my opinion.

datapythonista and others added 5 commits August 3, 2022 10:13
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@datapythonista datapythonista merged commit 62646b5 into pandas-dev:main Aug 3, 2022
@datapythonista
Copy link
Member Author

Thanks all for the reviews and feedback!

@h-vetinari
Copy link
Contributor

An unexpected & pleasant surprise that this old issue of mine got fixed. Not sure I have the time/energy to pour those old PDEP-sized ideas into documents, but glad to know that there'd now be a way to do it.

@simonjayhawkins
Copy link
Member

Thanks @datapythonista for the PR.

An unexpected & pleasant surprise that this old issue of mine got fixed. Not sure I have the time/energy to pour those old PDEP-sized ideas into documents, but glad to know that there'd now be a way to do it.

Thanks @h-vetinari. feel free to comment on those issues that a PDEP may be a good idea as others may think that they are worth pushing now.

@simonjayhawkins
Copy link
Member

and if you have already put a lot of time and effort in the discussion, you may have already made it alot easier to progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PDEP pandas enhancement proposal Web pandas website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas Enhancement Proposals?
10 participants