API: keywords only arguments #14714

tupui · 2021-09-10T08:52:45Z

Is your feature request related to a problem? Please describe.

Keywords only arguments are arguably clearer and allow easier refactoring on our side. When adding a new feature, we are already thinking about using * to declare keyword only arguments.

Describe the solution you'd like.

scikit-learn already went through this process and with their v1 coming up, the whole code base is already keywords-only arguments. Hence, if accepted, I propose we follow the same path.

They did it in two phases:

Deprecation phase with a wrapper: it takes a function and look at which arguments are keywords only. If you call the function using a positional argument instead of a keyword only argument (if f(a, *, b) and you do f(1, 2) instead of f(1, b=2)) then it will raise a deprecation warning. The function will still be working as the keyword arguments would not be enforced yet. The wrapper will convert the positional argument into a keyword argument.
Remove the wrapper: after the deprecation cycle, remove the wrapper and we are done.

xref scikit-learn/scikit-learn#15005

The text was updated successfully, but these errors were encountered:

tylerjereddy · 2021-09-12T00:16:11Z

I'm certainly in favor for new functions, but I'm perhaps less convinced that overhauling the signatures for older functions is a good place to spend energy for us or our consumers. I don't feel too strongly about it either way, but certainly may cause quite a bit of entropy on downstream code for an arguably-modest maintenance improvement.

I suppose if scikit-learn went through with it we should at least consider the idea though.

rgommers · 2021-09-14T16:25:33Z

+1 for using in new code, keyword-only args are very helpful.

For existing code, I don't think we can do it for the whole code base - too painful. We could look at it for a specific submodule if there's a good reason for it I think (for example, we want to add more keyword arguments and the order of arguments is becoming quite odd).

tupui · 2021-09-22T15:46:42Z

Ok then, let's have this for new code and keep the current case-by-case changes we already do.

What should be our workflow for proposals? Should we just close this? Have some dedicated section for proposals in the doc? I see that NumPy as something to keep track of changes https://github.com/numpy/numpy/tree/main/doc/release/upcoming_changes. Some other libraries have like a decision diary.

If we would just add this in the doc, this would already be covered with gh-13955 (which IMO we should merge)

ilayn · 2021-09-22T17:33:14Z

One particular detail is that scikit-learn API is generally very involved so probably users are using keywords anyways (somehow VSCode is affecting if I have to guess). But on SciPy side this is not the case; If we force it on very often used functions this goes straight to annoyance, say eig or curve_fit or lfilter etc.

So it is the predicted usage frequency should be the deciding factor and not the module in my opinion.

ilayn · 2021-09-22T17:34:56Z

For example, optimize.minimize is a very good candidate for kw-only usage to avoid misconceptions.

rgommers · 2021-09-22T19:17:16Z

So it is the predicted usage frequency should be the deciding factor and not the module in my opinion.

I agree, it's a judgement call. It doesn't need complex signatures to be useful though. Even simple numpy function calls are often easier to read with keywords, like np.sum(x, axis=1) should be preferred over np.sum(x, 1). Only if the positional version is unambiguous for a reader who does not remember a function signature it's good to allow both (e.g. for expand_dims(x, 1) it's pretty clear that 1 is a dimension).

josef-pkt · 2021-09-22T19:32:03Z

Another example for not using the keywords

I'm using loc and scale in stats distributions most of the time as positional arguments to do something like pdf(x, *params)

Also it becomes more important to have keyword names that are consistent across similar functions.

(In statsmodels copulas I switched to generic pdf(u, args=() because, similar to shape parameters in scipy distributions, there are no common names for copula parameters across types of copulas (theta, alpha, a1, a2, corr, ....)
args is ugly but I can write consistent code across all classes and applications.

tupui · 2021-09-22T19:42:46Z

(In statsmodels copulas I switched to generic pdf(u, args=() because, similar to shape parameters in scipy distributions, there are no common names for copula parameters across types of copulas (theta, alpha, a1, a2, corr, ....)

args is ugly but I can write consistent code across all classes and applications.

I personally disagree. This interface is not a good design and for Copulas we could have done something else. I had another design before you took over my PR.

Having common interface is handy for objects which are relevant between each other like 2 Archimedean copulas. But wanting the same interface for any distribution for instance is pointless since a user would almost never consider switching from one distribution to any other. They would rather use some defined subset of possibilities.

This is even more true when we have objects doing lot of very specialized work (like optimization methods or copulas where assumptions and use cases might be very different from each other).

Just going through old matplotlib code is terrible and should be enough to convince anyone that args, kwargs should be avoided as much as possible.

josef-pkt · 2021-09-22T19:52:09Z

All the regression models in statsmodels use params and that works great for generic code.
We can add a new model in 10 lines and (almost) everything works automatically.

The same will apply once with have regression models estimated by MLE with copulas on the inside.
From what I have seen, articles on very similar applications use all types of copulas, eg. extreme value or elliptical for extreme weather prediction.

@tupui Try to write some generic code with scipy distributions that uses shape parameters as keywords. They might be nice for users that work only with one distribution, but not if you want to write code that estimates a large number of distributions.

tupui · 2021-09-23T13:45:59Z

All the regression models in statsmodels use params and that works great for generic code.
We can add a new model in 10 lines and (almost) everything works automatically.

My objective when writing code here is not to make my-as in a maintainer-life easier. Things like type hints are always a pain point for me to add, but I am the first one to be happy to have these while I am using the library. Hence I am pushing for it. As major libraries, we should have a strong focus on the user experience. Of course the developer experience should not be completely left aside, but I would argue that most things we would discuss here are in the end not that hard... Long story short, if it's not something totally obscure, I would favour user experience over mine (type hints again).

The same will apply once with have regression models estimated by MLE with copulas on the inside.
From what I have seen, articles on very similar applications use all types of copulas, eg. extreme value or elliptical for extreme weather prediction.

I hear your points, but if you read these articles you will not find *args, **kwargs in the equations 😉 There are a handful of possibilities (and mathematical symbols) at most and we can almost always make logical groups with named parameters. We are writing scientific software here, not something ultra generic like a parser. Things like args, polymorphism, and other dynamic concepts are really nice but we should not abuse them. It just make the code impossible to use for a user otherwise. Look at things like boto, good luck finding anything even inspecting the code.

@tupui Try to write some generic code with scipy distributions that uses shape parameters as keywords. They might be nice for users that work only with one distribution, but not if you want to write code that estimates a large number of distributions.

Same as above. As Ralf already said on the mailing list, there is lot of legacy here and we would probably not redo this if we would give the module a fresh start. In the meantime, you could just have a mapping.

josef-pkt · 2021-09-23T14:31:12Z

I hear your points, but if you read these articles you will not find *args, **kwargs in the equations. There are a handful of possibilities (and mathematical symbols)

That's on of my favorites, hundreds of greek symbols and decorated characters and not consistently used across articles, books and fields. After reading 3 pages, I don't remember which is which. And I really like authors that don't use words in sentences that might give a hint about what this funny, squiggly character, that was defined many pages ago, stands for.

What's phi and psi in archimedean copulas?
The literature is pretty evenly split on whether phi or psi is used.

More specific to scipy.
Before minimize was added, it was not easy to switch optimizers with keywords inconsistent across the fmin functions.
We had to write our own wrapper in statsmodels that predates scipy.minimize.

In the fmin case, using arguments as positional wouldn't work either because keyword positions or ordering varies also across function.

tupui · 2021-09-23T14:34:42Z

That's on of my favorites, hundreds of greek symbols and decorated characters and not consistently used across articles, books and fields. After reading 3 pages, I don't remember which is which. And I really like authors that don't use words in sentences that might give a hint about what this funny, squiggly character, that was defined many pages ago, stands for.

What's phi and psi in archimedean copulas?
The literature is pretty evenly split on whether phi or psi is used.

As with software, there are some inconsistencies. It's just a matter of taking side here. Just either pick phi or psi. My point is that there is not an infinite number of parameters.

In the fmin case, using arguments as positional wouldn't work either because keyword positions or ordering varies also across function.

It's exactly the use case of having keywords only arguments. The order does not matter.

h-vetinari · 2022-03-26T13:25:04Z

@tupui
Any context/update for closing this? I still think it would be a good idea to transition to keyword-only in many of our APIs.

tupui · 2022-03-27T14:59:27Z

@tupui

Any context/update for closing this? I still think it would be a good idea to transition to keyword-only in many of our APIs.

The consensus here and during meetings was: we do this for new code (as we've been doing), we don't update all old code. On a case by case basis, we can update old code if it really makes sense doing it.

h-vetinari · 2022-03-28T01:46:37Z

Thanks @tupui; I cannot read much more than what's in this issue (so it's hard to know what was discussed and how consensus was achieved), but I'd like to know if a lightweight (at the function-definition site at least) decorator has been considered that would let us allow to move to kwarg-only functions where we deem it useful (which, IMO, is the great majority):

Say we have a function right now:

# status as of 1.8
def scipy_func(x, y, kw1=None, kw2=None):
    """The docstring for scipy_func"""
    return "a" * x + "b" * y + "c" * (kw1 is not None) + "d" * (kw2 is not None)

We could transition this as follows:

# transition in scipy 1.N: make main implementation kwarg-only (note the "*"!), but private
def _scipy_func(x, y, *, kw1=None, kw2=None):
    """The docstring for scipy_func"""
    return "a" * x + "b" * y + "c" * (kw1 is not None) + "d" * (kw2 is not None)

# create wrapper with existing behaviour that warns if keyword arguments are passed as positional
@_pos_to_kwonly_decorator_factory(_scipy_func, pos_arg_names=["x", "y"],
                                  kwonly_arg_names=["kw1", "kw2"])
def scipy_func(x, y, kw1=None, kw2=None):
    pass  # <- not just for exposition; we don't need to redefine the body here

With this wrapper, the deprecation would work as expected:

>>> scipy_func(1, 2, kw1=3, kw2=4)
'abbcd'
>>> scipy_func(1, 2, 3, 4)
<stdin>:9: DeprecationWarning: use of scipy_func with positional arguments for the following keyword arguments is deprecated:
kw1, kw2
Please add the respective argument names to the call-site, e.g. 'kw1=...'
'abbcd'
>>> scipy_func.__doc__
'The docstring for scipy_func'

After a suitable transition, we delete the wrapper and make the keyword-only function public:

# future in 1.(N+2) after transition
def scipy_func(x, y, *, kw1=None, kw2=None):
    """The docstring for scipy_func"""
    return "a" * x + "b" * y + "c" * (kw1 is not None) + "d" * (kw2 is not None)

The only thing I've left out so far is the actual logic to do the decorator; while it can surely be improved/prettified, it's not that hard:

import functools
import warnings

def _pos_to_kwonly_decorator_factory(kwonly_func, pos_arg_names, kwonly_arg_names):
    # QoI: add some checks that the signatures of kwonly_func/dummy_func
    #      match the one that's declared by the factory
    def actual_decorator_after_applying_factory_params(dummy_func):
        # note that we intentionally do not use the function passed
        # to the decorator but use the one given to the factory
        @functools.wraps(kwonly_func)
        def wrapper(*args, **kwargs):
            num_pos = len(pos_arg_names)
            if len(args) > num_pos:
                warnings.warn(f"use of {kwonly_func.__name__.strip('_')} with "
                              f"positional arguments for the following "
                              f"keyword arguments is deprecated:\n"
                              f"{', '.join(kwonly_arg_names)}\n"
                              f"Please add the respective argument names to "
                              f"the call-site, e.g. '{kwonly_arg_names[0]}=...'",
                              DeprecationWarning)
                pos_args = args[:num_pos]
                kwonly_args = dict(zip(kwonly_arg_names, args[num_pos:])) | kwargs
            else:
                pos_args, kwonly_args = args, kwargs
            return kwonly_func(*pos_args, **kwonly_args)
        return wrapper
    return actual_decorator_after_applying_factory_params

tupui · 2022-03-28T09:30:21Z

@h-vetinari we did not have to discuss code. If we would have done something, sklearn already went down that route and we could just have taken their approach verbatim as it worked well. I don't think there would be any push back if you added such decorator in our utils, but as I said it was decided to not change anything in batch.

I suggest the following, if you identify a function that you think we should refactor, open a discussion for this case. If the outcome is that we want to refactor, then we can introduce the decorator for the rewrite.

h-vetinari · 2022-03-28T10:17:36Z

OK cool. The scikit helper is also much prettier than the one I hacked together.

I think it would be good idea to have something of a standing policy that if we're touching the signature of a function in a way that needs a deprecation cycle (e.g. remove/rename a keyword) to also use that to turn the keywords into keyword-only arguments, but for now I'll just try to remember that when I come across a case.

tupui · 2022-03-28T10:18:52Z

@h-vetinari feel free to put this on the agenda of our next community meeting.

rgommers · 2022-03-28T10:49:04Z

Adding a note to http://scipy.github.io/devdocs/dev/api-dev/api-dev-toc.html about positional-only and keyword-only arguments sounds like a good idea to me.

Re wholesale changes: that's going to break a ton of code, which isn't justified. It's also not the case that all keywords should be turned into keyword-only arguments, there are cases where positional usage is just fine. So it'd be a lot of work to make decisions as well. tl;dr let's not do that.

ev-br · 2022-03-28T11:38:47Z

What might be worth it if someone wants to spend effort on, is to check various functions which receive user callables: whether they handle keyword-only arguments. I suspect we do func(x, *args) in several places (e.g. curve_fit).

OTOH, there were no user bug reports, so maybe even this may wait until actual need arises.

h-vetinari · 2022-03-28T11:43:57Z

Adding a note to http://scipy.github.io/devdocs/dev/api-dev/api-dev-toc.html about positional-only and keyword-only arguments sounds like a good idea to me.

I'll try to look into that, but if someone gets there first, please don't hesitate!

Re wholesale changes: that's going to break a ton of code, which isn't justified. It's also not the case that all keywords should be turned into keyword-only arguments, there are cases where positional usage is just fine. So it'd be a lot of work to make decisions as well. tl;dr let's not do that.

I'm not sure if that's splitting hairs, but I don't consider the addition of a deprecation warning a breaking change. If there's enough time to transition then that's just part of the normal cycle.

That said, I wasn't proposing wholesale conversion (neither in bulk nor per function - as my example showed, it's fine IMO if positional arguments stay positional), but certainly functions with many keyword arguments (say more >2) should IMO consider enforcing that "keywordness", at least long-term.

rgommers · 2022-03-28T11:57:10Z

I'm not sure if that's splitting hairs, but I don't consider the addition of a deprecation warning a breaking change. If there's enough time to transition then that's just part of the normal cycle.

That's not how it works. For some packages, deprecation warnings need to be addressed immediately. But more importantly, after 2 or more releases we will remove the deprecated features, and then it's a hard break. Which is a problem for end user code, snippets on SO, and all kinds of other static or not actively-maintained code.

h-vetinari · 2022-03-28T12:37:30Z

That's not how it works. For some packages, deprecation warnings need to be addressed immediately. But more importantly, after 2 or more releases we will remove the deprecated features, and then it's a hard break. Which is a problem for end user code, snippets on SO, and all kinds of other static or not actively-maintained code.

I respectfully disagree in turn. Projects choosing to treat warnings as errors voluntarily onboard the responsibility to deal with such breaks (they're warnings and not errors for the precise reason to not cause a hard break - if a project chooses to equate them then it's up to them to deal with the consequences). In turn, I don't think it's reasonable to call it a "hard break" if the behaviour was raising deprecation warnings before, aside from the fact that the deprecation-to-removal window could easily be more than 2 releases in cases that merit it.

The SO questions are a muddier question, but that code as well is "maintained" by the community with a certain lag (aside from being clearly timestamped and thus associated with a given API epoch), and not IMO in itself a good enough reason to indefinitely forgo (sufficiently impactful) evolution.

Backward compatibility is a good thing, but as with anything, dosis sola facit venenum, and I am opposed to elevate it to the point where worthwhile changes are stunted indefinitely based on the fact-of-life that unmaintained code will bitrot - it will do that in any case, whether we try to support that illusion or not.

rgommers · 2022-03-28T18:24:51Z

Backward compatibility is a good thing, but as with anything, dosis sola facit venenum, and I am opposed to elevate it to the point where worthwhile changes are stunted indefinitely based on the fact-of-life that unmaintained code will bitrot - it will do that in any case, whether we try to support that illusion or not.

This is not something I disagree with. That said, keyword-only arguments aren't a significant enough improvement to do a very large scale deprecation. They don't add any new functionality nor do they change how idiomatic code is already written after all.

h-vetinari · 2022-03-28T21:37:39Z

That said, keyword-only arguments aren't a significant enough improvement to do a very large scale deprecation.

This I can understand.

They don't add any new functionality nor do they change how idiomatic code is already written after all.

I'd say the provide an incremental API-safety benefit (e.g. putting supposed keyword args in the wrong order positionally); I hope we can iterate our way there on a per-function or per-module basis over time.

tupui added SciPEP SciPy Enhancement Proposal deprecated Items related to behavior that has been deprecated labels Sep 10, 2021

h-vetinari mentioned this issue Mar 11, 2022

META: deprecation tracker #15765

Open

41 tasks

tupui closed this as completed Mar 25, 2022

rossbar mentioned this issue Mar 26, 2022

Use keyword-only and position arg-only function signatures where reasonable networkx/networkx#4100

Open

This was referenced May 31, 2023

DEP: deprecate positional arguments in stats.kendalltau #18578

Closed

DEP: adjust deprecation of positional arguments #18624

Merged

DOC/DEP: update documentation on deprecation process #18638

Draft

h-vetinari mentioned this issue Jun 19, 2023

DEP: deprecate positional arguments for signatures being changed in 1.15 cycle #18703

Open

1 task

h-vetinari mentioned this issue Jul 24, 2023

REF/Design: deprecate positional use of keyword parameters statsmodels/statsmodels#8960

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: keywords only arguments #14714

API: keywords only arguments #14714

tupui commented Sep 10, 2021 •

edited

Loading

tylerjereddy commented Sep 12, 2021

rgommers commented Sep 14, 2021

tupui commented Sep 22, 2021

ilayn commented Sep 22, 2021

ilayn commented Sep 22, 2021

rgommers commented Sep 22, 2021

josef-pkt commented Sep 22, 2021

tupui commented Sep 22, 2021 •

edited

Loading

josef-pkt commented Sep 22, 2021

tupui commented Sep 23, 2021 •

edited

Loading

josef-pkt commented Sep 23, 2021

tupui commented Sep 23, 2021

h-vetinari commented Mar 26, 2022

tupui commented Mar 27, 2022

h-vetinari commented Mar 28, 2022 •

edited

Loading

tupui commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

tupui commented Mar 28, 2022

rgommers commented Mar 28, 2022

ev-br commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

rgommers commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

rgommers commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

API: keywords only arguments #14714

API: keywords only arguments #14714

Comments

tupui commented Sep 10, 2021 • edited Loading

Is your feature request related to a problem? Please describe.

Describe the solution you'd like.

tylerjereddy commented Sep 12, 2021

rgommers commented Sep 14, 2021

tupui commented Sep 22, 2021

ilayn commented Sep 22, 2021

ilayn commented Sep 22, 2021

rgommers commented Sep 22, 2021

josef-pkt commented Sep 22, 2021

tupui commented Sep 22, 2021 • edited Loading

josef-pkt commented Sep 22, 2021

tupui commented Sep 23, 2021 • edited Loading

josef-pkt commented Sep 23, 2021

tupui commented Sep 23, 2021

h-vetinari commented Mar 26, 2022

tupui commented Mar 27, 2022

h-vetinari commented Mar 28, 2022 • edited Loading

tupui commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

tupui commented Mar 28, 2022

rgommers commented Mar 28, 2022

ev-br commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

rgommers commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

rgommers commented Mar 28, 2022

h-vetinari commented Mar 28, 2022

tupui commented Sep 10, 2021 •

edited

Loading

tupui commented Sep 22, 2021 •

edited

Loading

tupui commented Sep 23, 2021 •

edited

Loading

h-vetinari commented Mar 28, 2022 •

edited

Loading