From 0b43d6a30f16a29161be4c38a05a717d305f4ed1 Mon Sep 17 00:00:00 2001
From: Stefano Borini
Date: Mon, 24 Aug 2020 19:25:37 +0100
Subject: [PATCH 01/29] PEP-XXXX Added original document from PEP-0472

---
 pep-XXXX.txt | 654 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 654 insertions(+)
 create mode 100644 pep-XXXX.txt

diff --git a/pep-XXXX.txt b/pep-XXXX.txt
new file mode 100644
index 00000000000..ddec13dce5e
--- /dev/null
+++ b/pep-XXXX.txt
@@ -0,0 +1,654 @@
+PEP: 472
+Title: Support for indexing with keyword arguments
+Version: $Revision$
+Last-Modified: $Date$
+Author: Stefano Borini, Joseph Martinot-Lagarde
+Discussions-To: python-ideas@python.org
+Status: Rejected
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 24-Jun-2014
+Python-Version: 3.6
+Post-History: 02-Jul-2014
+Resolution: https://mail.python.org/pipermail/python-dev/2019-March/156693.html
+
+Abstract
+========
+
+This PEP proposes an extension of the indexing operation to support keyword
+arguments. Notations in the form ``a[K=3,R=2]`` would become legal syntax.
+For future-proofing considerations, ``a[1:2, K=3, R=4]`` are considered and
+may be allowed as well, depending on the choice for implementation. In addition
+to a change in the parser, the index protocol (``__getitem__``, ``__setitem__``
+and ``__delitem__``) will also potentially require adaptation.
+
+Motivation
+==========
+
+The indexing syntax carries a strong semantic content, differentiating it from
+a method call: it implies referring to a subset of data. We believe this
+semantic association to be important, and wish to expand the strategies allowed
+to refer to this data.
+
+As a general observation, the number of indices needed by an indexing operation
+depends on the dimensionality of the data: one-dimensional data (e.g. a list)
+requires one index (e.g. ``a[3]``), two-dimensional data (e.g. a matrix) requires
+two indices (e.g. ``a[2,3]``) and so on. Each index is a selector along one of the
+axes of the dimensionality, and the position in the index tuple is the
+metainformation needed to associate each index to the corresponding axis.
+
+The current python syntax focuses exclusively on position to express the
+association to the axes, and also contains syntactic sugar to refer to
+non-punctiform selection (slices)
+
+::
+
+    >>> a[3]       # returns the fourth element of a
+    >>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
+    >>> a[3,2]     # multiple indexes (for multidimensional arrays)
+
+The additional notation proposed in this PEP would allow notations involving
+keyword arguments in the indexing operation, e.g.
+
+::
+
+    >>> a[K=3, R=2]
+
+which would allow to refer to axes by conventional names.
+
+One must additionally consider the extended form that allows both positional
+and keyword specification
+
+::
+
+    >>> a[3,R=3,K=4]
+
+This PEP will explore different strategies to enable the use of these notations.
+
+Use cases
+=========
+
+The following practical use cases present two broad categories of usage of a
+keyworded specification: Indexing and contextual option. For indexing:
+
+1. To provide a more communicative meaning to the index, preventing e.g. accidental
+   inversion of indexes
+
+   ::
+
+       >>> gridValues[x=3, y=5, z=8]
+       >>> rain[time=0:12, location=location]
+
+2. In some domain, such as computational physics and chemistry, the use of a
+   notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent
+   a level of accuracy
+
+   ::
+
+       >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
+
+   In this case, the index operation would return a basis set at the chosen level
+   of accuracy (represented by the parameter Z). The reason behind an indexing is that
+   the BasisSet object could be internally represented as a numeric table, where
+   rows (the "coefficient" axis, hidden to the user in this example) are associated
+   to individual elements (e.g. row 0:5 contains coefficients for element 1,
+   row 5:8 coefficients for element 2) and each column is associated to a given
+   degree of accuracy ("accuracy" or "Z" axis) so that first column is low
+   accuracy, second column is medium accuracy and so on. With that indexing,
+   the user would obtain another object representing the contents of the column
+   of the internal table for accuracy level 3.
+
+Additionally, the keyword specification can be used as an option contextual to
+the indexing. Specifically:
+
+1. A "default" option allows to specify a default return value when the index
+   is not present
+
+   ::
+
+       >>> lst = [1, 2, 3]
+       >>> value = lst[5, default=0]  # value is 0
+
+2. For a sparse dataset, to specify an interpolation strategy
+   to infer a missing point from e.g. its surrounding data.
+
+   ::
+
+       >>> value = array[1, 3, interpolate=spline_interpolator]
+
+3. A unit could be specified with the same mechanism
+
+   ::
+
+       >>> value = array[1, 3, unit="degrees"]
+
+How the notation is interpreted is up to the implementing class.
+
+Current implementation
+======================
+
+Currently, the indexing operation is handled by methods ``__getitem__``,
+``__setitem__`` and ``__delitem__``. These methods' signature accept one argument
+for the index (with ``__setitem__`` accepting an additional argument for the set
+value). In the following, we will analyze ``__getitem__(self, idx)`` exclusively,
+with the same considerations implied for the remaining two methods.
+
+When an indexing operation is performed, ``__getitem__(self, idx)`` is called.
+Traditionally, the full content between square brackets is turned into a single
+object passed to argument ``idx``:
+
+- When a single element is passed, e.g. ``a[2]``, ``idx`` will be ``2``.
+- When multiple elements are passed, they must be separated by commas: ``a[2, 3]``.
+  In this case, ``idx`` will be a tuple ``(2, 3)``. With ``a[2, 3, "hello", {}]``
+  ``idx`` will be ``(2, 3, "hello", {})``.
+- A slicing notation e.g. ``a[2:10]`` will produce a slice object, or a tuple
+  containing slice objects if multiple values were passed.
+
+Except for its unique ability to handle slice notation, the indexing operation
+has similarities to a plain method call: it acts like one when invoked with
+only one element; If the number of elements is greater than one, the ``idx``
+argument behaves like a ``*args``. However, as stated in the Motivation section,
+an indexing operation has the strong semantic implication of extraction of a
+subset out of a larger set, which is not automatically associated to a regular
+method call unless appropriate naming is chosen. Moreover, its different visual
+style is important for readability.
+
+Specifications
+==============
+
+The implementation should try to preserve the current signature for
+``__getitem__``, or modify it in a backward-compatible way. We will present
+different alternatives, taking into account the possible cases that need
+to be addressed
+
+::
+
+    C0. a[1]; a[1,2]         # Traditional indexing
+    C1. a[Z=3]
+    C2. a[Z=3, R=4]
+    C3. a[1, Z=3]
+    C4. a[1, Z=3, R=4]
+    C5. a[1, 2, Z=3]
+    C6. a[1, 2, Z=3, R=4]
+    C7. a[1, Z=3, 2, R=4]    # Interposed ordering
+
+Strategy "Strict dictionary"
+----------------------------
+
+This strategy acknowledges that ``__getitem__`` is special in accepting only
+one object, and the nature of that object must be non-ambiguous in its
+specification of the axes: it can be either by order, or by name. As a result
+of this assumption, in presence of keyword arguments, the passed entity is a
+dictionary and all labels must be specified.
+
+::
+
+    C0. a[1]; a[1,2]         -> idx = 1; idx = (1, 2)
+    C1. a[Z=3]               -> idx = {"Z": 3}
+    C2. a[Z=3, R=4]          -> idx = {"Z": 3, "R": 4}
+    C3. a[1, Z=3]            -> raise SyntaxError
+    C4. a[1, Z=3, R=4]       -> raise SyntaxError
+    C5. a[1, 2, Z=3]         -> raise SyntaxError
+    C6. a[1, 2, Z=3, R=4]    -> raise SyntaxError
+    C7. a[1, Z=3, 2, R=4]    -> raise SyntaxError
+
+Pros
+''''
+
+- Strong conceptual similarity between the tuple case and the dictionary case.
+  In the first case, we are specifying a tuple, so we are naturally defining
+  a plain set of values separated by commas. In the second, we are specifying a
+  dictionary, so we are specifying a homogeneous set of key/value pairs, as
+  in ``dict(Z=3, R=4)``;
+- Simple and easy to parse on the ``__getitem__`` side: if it gets a tuple,
+  determine the axes using positioning. If it gets a dictionary, use
+  the keywords.
+- C interface does not need changes.
+
+Neutral
+'''''''
+
+- Degeneracy of ``a[{"Z": 3, "R": 4}]`` with ``a[Z=3, R=4]`` means the notation
+  is syntactic sugar.
+
+Cons
+''''
+
+- Very strict.
+- Destroys ordering of the passed arguments. Preserving the
+  order would be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_.
+- Does not allow use cases with mixed positional/keyword arguments such as
+  ``a[1, 2, default=5]``.
+
+Strategy "mixed dictionary"
+---------------------------
+
+This strategy relaxes the above constraint to return a dictionary containing
+both numbers and strings as keys.
+
+::
+
+    C0. a[1]; a[1,2]         -> idx = 1; idx = (1, 2)
+    C1. a[Z=3]               -> idx = {"Z": 3}
+    C2. a[Z=3, R=4]          -> idx = {"Z": 3, "R": 4}
+    C3. a[1, Z=3]            -> idx = { 0: 1, "Z": 3}
+    C4. a[1, Z=3, R=4]       -> idx = { 0: 1, "Z": 3, "R": 4}
+    C5. a[1, 2, Z=3]         -> idx = { 0: 1, 1: 2, "Z": 3}
+    C6. a[1, 2, Z=3, R=4]    -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4}
+    C7. a[1, Z=3, 2, R=4]    -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4}
+
+Pros
+''''
+- Opens for mixed cases.
+
+Cons
+''''
+- Destroys ordering information for string keys. We have no way of saying if
+  ``"Z"`` in C7 was in position 1 or 3.
+- Implies switching from a tuple to a dict as soon as one specified index
+  has a keyword argument. May be confusing to parse.
+
+Strategy "named tuple"
+-----------------------
+
+Return a named tuple for ``idx`` instead of a tuple. Keyword arguments would
+obviously have their stated name as key, and positional argument would have an
+underscore followed by their order:
+
+::
+
+    C0. a[1]; a[1,2]         -> idx = 1; idx = (_0=1, _1=2)
+    C1. a[Z=3]               -> idx = (Z=3)
+    C2. a[Z=3, R=2]          -> idx = (Z=3, R=2)
+    C3. a[1, Z=3]            -> idx = (_0=1, Z=3)
+    C4. a[1, Z=3, R=2]       -> idx = (_0=1, Z=3, R=2)
+    C5. a[1, 2, Z=3]         -> idx = (_0=1, _2=2, Z=3)
+    C6. a[1, 2, Z=3, R=4]    -> (_0=1, _1=2, Z=3, R=4)
+    C7. a[1, Z=3, 2, R=4]    -> (_0=1, Z=3, _1=2, R=4)
+                             or (_0=1, Z=3, _2=2, R=4)
+                             or raise SyntaxError
+
+The required typename of the namedtuple could be ``Index`` or the name of the
+argument in the function definition, it keeps the ordering and is easy to
+analyse by using the ``_fields`` attribute. It is backward compatible, provided
+that C0 with more than one entry now passes a namedtuple instead of a plain
+tuple.
+
+Pros
+''''
+- Looks nice. namedtuple transparently replaces tuple and gracefully
+  degrades to the old behavior.
+- Does not require a change in the C interface
+
+Cons
+''''
+- According to some sources [#namedtuple]_ namedtuple is not well developed.
+  To include it as such important object would probably require rework
+  and improvement;
+- The namedtuple fields, and thus the type, will have to change according
+  to the passed arguments. This can be a performance bottleneck, and makes
+  it impossible to guarantee that two subsequent index accesses get the same
+  Index class;
+- the ``_n`` "magic" fields are a bit unusual, but ipython already uses them
+  for result history.
+- Python currently has no builtin namedtuple. The current one is available
+  in the "collections" module in the standard library.
+- Differently from a function, the two notations ``gridValues[x=3, y=5, z=8]``
+  and ``gridValues[3,5,8]`` would not gracefully match if the order is modified
+  at call time (e.g. we ask for ``gridValues[y=5, z=8, x=3])``. In a function,
+  we can pre-define argument names so that keyword arguments are properly
+  matched. Not so in ``__getitem__``, leaving the task for interpreting and
+  matching to ``__getitem__`` itself.
+
+
+Strategy "New argument contents"
+--------------------------------
+
+In the current implementation, when many arguments are passed to ``__getitem__``,
+they are grouped in a tuple and this tuple is passed to ``__getitem__`` as the
+single argument ``idx``. This strategy keeps the current signature, but expands the
+range of variability in type and contents of ``idx`` to more complex representations.
+
+We identify four possible ways to implement this strategy:
+
+- **P1**: uses a single dictionary for the keyword arguments.
+- **P2**: uses individual single-item dictionaries.
+- **P3**: similar to **P2**, but replaces single-item dictionaries with a ``(key, value)`` tuple.
+- **P4**: similar to **P2**, but uses a special and additional new object: ``keyword()``
+
+Some of these possibilities lead to degenerate notations, i.e. indistinguishable
+from an already possible representation. Once again, the proposed notation
+becomes syntactic sugar for these representations.
+
+Under this strategy, the old behavior for C0 is unchanged.
+
+::
+
+    C0: a[1]        -> idx = 1                    # integer
+        a[1,2]      -> idx = (1,2)                # tuple
+
+In C1, we can use either a dictionary or a tuple to represent key and value pair
+for the specific indexing entry. We need to have a tuple with a tuple in C1
+because otherwise we cannot differentiate ``a["Z", 3]`` from ``a[Z=3]``.
+
+::
+
+    C1: a[Z=3]      -> idx = {"Z": 3}             # P1/P2 dictionary with single key
+                    or idx = (("Z", 3),)          # P3 tuple of tuples
+                    or idx = keyword("Z", 3)      # P4 keyword object
+
+As you can see, notation P1/P2 implies that ``a[Z=3]`` and ``a[{"Z": 3}]`` will
+call ``__getitem__`` passing the exact same value, and is therefore syntactic
+sugar for the latter. Same situation occurs, although with different index, for
+P3. Using a keyword object as in P4 would remove this degeneracy.
+
+For the C2 case:
+
+::
+
+    C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}     # P1 dictionary/ordereddict
+                    or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict
+                    or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples
+                    or idx = (keyword("Z", 3),
+                              keyword("R", 4) )   # P4 keyword objects
+
+
+P1 naturally maps to the traditional ``**kwargs`` behavior, however it breaks
+the convention that two or more entries for the index produce a tuple. P2
+preserves this behavior, and additionally preserves the order. Preserving the
+order would also be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_.
+
+The remaining cases are here shown:
+
+::
+
+    C3. a[1, Z=3] -> idx = (1, {"Z": 3})                      # P1/P2
+                  or idx = (1, ("Z", 3))                      # P3
+                  or idx = (1, keyword("Z", 3))               # P4
+
+    C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4})         # P1
+                       or idx = (1, {"Z": 3}, {"R": 4})       # P2
+                       or idx = (1, ("Z", 3), ("R", 4))       # P3
+                       or idx = (1, keyword("Z", 3),
+                                    keyword("R", 4))          # P4
+
+    C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3})                # P1/P2
+                     or idx = (1, 2, ("Z", 3))                # P3
+                     or idx = (1, 2, keyword("Z", 3))         # P4
+
+    C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4})    # P1
+                          or idx = (1, 2, {"Z": 3}, {"R": 4}) # P2
+                          or idx = (1, 2, ("Z", 3), ("R", 4)) # P3
+                          or idx = (1, 2, keyword("Z", 3),
+                                          keyword("R", 4))    # P4
+
+    C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4})   # P1. Pack the keyword arguments. Ugly.
+                          or raise SyntaxError                # P1. Same behavior as in function calls.
+                          or idx = (1, {"Z": 3}, 2, {"R": 4}) # P2
+                          or idx = (1, ("Z", 3), 2, ("R", 4)) # P3
+                          or idx = (1, keyword("Z", 3),
+                                    2, keyword("R", 4))       # P4
+
+Pros
+''''
+- Signature is unchanged;
+- P2/P3 can preserve ordering of keyword arguments as specified at indexing,
+- P1 needs an OrderedDict, but would destroy interposed ordering if allowed:
+  all keyword indexes would be dumped into the dictionary;
+- Stays within traditional types: tuples and dicts. Evt. OrderedDict;
+- Some proposed strategies are similar in behavior to a traditional function call;
+- The C interface for ``PyObject_GetItem`` and family would remain unchanged.
+
+Cons
+''''
+- Apparently complex and wasteful;
+- Degeneracy in notation (e.g. ``a[Z=3]`` and ``a[{"Z":3}]`` are equivalent and
+  indistinguishable notations at the ``__[get|set|del]item__`` level).
+  This behavior may or may not be acceptable.
+- for P4, an additional object similar in nature to slice() is needed,
+  but only to disambiguate the above degeneracy.
+- ``idx`` type and layout seems to change depending on the whims of the caller;
+- May be complex to parse what is passed, especially in the case of tuple of tuples;
+- P2 Creates a lot of single keys dictionary as members of a tuple. Looks ugly.
+  P3 would be lighter and easier to use than the tuple of dicts, and still
+  preserves order (unlike the regular dict), but would result in clumsy
+  extraction of keywords.
+
+Strategy "kwargs argument"
+---------------------------
+
+``__getitem__`` accepts an optional ``**kwargs`` argument which should be keyword only.
+``idx`` also becomes optional to support a case where no non-keyword arguments are allowed.
+The signature would then be either
+
+::
+
+    __getitem__(self, idx)
+    __getitem__(self, idx, **kwargs)
+    __getitem__(self, **kwargs)
+
+Applied to our cases would produce:
+
+::
+
+    C0. a[1,2]            -> idx=(1,2); kwargs={}
+    C1. a[Z=3]            -> idx=None ; kwargs={"Z":3}
+    C2. a[Z=3, R=4]       -> idx=None ; kwargs={"Z":3, "R":4}
+    C3. a[1, Z=3]         -> idx=1    ; kwargs={"Z":3}
+    C4. a[1, Z=3, R=4]    -> idx=1    ; kwargs={"Z":3, "R":4}
+    C5. a[1, 2, Z=3]      -> idx=(1,2); kwargs={"Z":3}
+    C6. a[1, 2, Z=3, R=4] -> idx=(1,2); kwargs={"Z":3, "R":4}
+    C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement to function behavior
+
+Empty indexing ``a[]`` of course remains invalid syntax.
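A minimal sketch of how a container might consume the "kwargs argument" signature, assuming ``__getitem__(self, idx, /, **kwargs)``. Current Python cannot parse ``grid[x=1, y=0]``, so the sketch calls ``__getitem__`` directly to simulate what the interpreter would pass. The ``Grid`` class, its axis names and the ``_MISSING`` sentinel are hypothetical. The sketch uses a private sentinel instead of the ``idx=None`` in the table above, so that ``grid[None]`` could not be confused with a keyword-only access; that choice is an assumption, not part of the strategy.

```python
_MISSING = object()  # hypothetical sentinel: "no positional index was given"


class Grid:
    """Hypothetical 2-D grid whose axes are named "x" and "y"."""

    AXES = ("x", "y")

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, idx=_MISSING, /, **kwargs):
        # Normalise the positional part as today's protocol delivers it:
        # a single object for g[1], a tuple for g[1, 2].
        if idx is _MISSING:
            positional = ()
        elif isinstance(idx, tuple):
            positional = idx
        else:
            positional = (idx,)
        if len(positional) > len(self.AXES):
            raise IndexError("too many indices for Grid")
        # Positional indices fill the axes in order; keywords name them.
        coords = dict(zip(self.AXES, positional))
        for name, value in kwargs.items():
            if name not in self.AXES:
                raise TypeError(f"unknown axis {name!r}")
            if name in coords:
                raise TypeError(f"axis {name!r} given both by position and by name")
            coords[name] = value
        missing = [name for name in self.AXES if name not in coords]
        if missing:
            raise TypeError(f"missing index for axes {missing}")
        return self._rows[coords["x"]][coords["y"]]


grid = Grid([[1, 2], [3, 4]])
print(grid[1, 0])                  # today's positional indexing: 3
print(grid.__getitem__(x=1, y=0))  # simulates grid[x=1, y=0]: 3
print(grid.__getitem__(y=0, x=1))  # keyword order does not matter: 3
print(grid.__getitem__(1, y=0))    # simulates grid[1, y=0]: 3
```

Because ``self`` and ``idx`` are positional-only, any keyword name (even ``idx``) ends up in ``kwargs``. A builtin such as ``list`` already rejects keywords: ``[1, 2].__getitem__(x=1)`` raises ``TypeError``.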
+
+Pros
+''''
+- Similar to function call, evolves naturally from it;
+- Use of keyword indexing with an object whose ``__getitem__``
+  doesn't have a kwargs will fail in an obvious way.
+  That's not the case for the other strategies.
+
+Cons
+''''
+- It doesn't preserve order, unless an OrderedDict is used;
+- Forbids C7, but is it really needed?
+- Requires a change in the C interface to pass an additional
+  PyObject for the keyword arguments.
+
+
+C interface
+===========
+
+As briefly introduced in the previous analysis, the C interface would
+potentially have to change to allow the new feature. Specifically,
+``PyObject_GetItem`` and related routines would have to accept an additional
+``PyObject *kw`` argument for Strategy "kwargs argument". The remaining
+strategies would not require a change in the C function signatures, but the
+different nature of the passed object would potentially require adaptation.
+
+Strategy "named tuple" would behave correctly without any change: the class
+returned by the factory method in collections returns a subclass of tuple,
+meaning that ``PyTuple_*`` functions can handle the resulting object.
+
+Alternative Solutions
+=====================
+
+In this section, we present alternative solutions that would workaround the
+missing feature and make the proposed enhancement not worth of implementation.
+
+Use a method
+------------
+
+One could keep the indexing as is, and use a traditional ``get()`` method for those
+cases where basic indexing is not enough. This is a good point, but as already
+reported in the introduction, methods have a different semantic weight from
+indexing, and you can't use slices directly in methods. Compare e.g.
+``a[1:3, Z=2]`` with ``a.get(slice(1,3), Z=2)``.
+
+The authors however recognize this argument as compelling, and the advantage
+in semantic expressivity of a keyword-based indexing may be offset by a rarely
+used feature that does not bring enough benefit and may have limited adoption.
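The method-based workaround can be made concrete with a small sketch. The ``AccuracyTable`` class is hypothetical and only loosely modelled on the ``BasisSet`` use case: each column holds the coefficients for one accuracy level ``Z``. Today's indexing accepts ``table[1:3, 2]`` positionally. A ``get()`` method can take ``Z`` by name, but slice syntax is not allowed inside a call, so the caller has to build ``slice(1, 3)`` explicitly.

```python
class AccuracyTable:
    """Hypothetical table: one list of coefficients per accuracy level Z."""

    def __init__(self, columns):
        # columns[z] holds the coefficients for accuracy level z
        self._columns = columns

    def __getitem__(self, idx):
        # Positional-only indexing, as supported today: table[rows, z]
        rows, z = idx
        return self._columns[z][rows]

    def get(self, rows, Z=0):
        # Method-based workaround: the keyword is allowed, but slice
        # syntax is not, so callers must pass a slice object themselves.
        return self._columns[Z][rows]


table = AccuracyTable([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
print(table[1:3, 2])                # [200, 300]
print(table.get(slice(1, 3), Z=2))  # [200, 300]: same data, heavier syntax
```

Both calls return the same data; the difference the section describes is purely one of notation and semantic weight.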
+
+Emulate requested behavior by abusing the slice object
+------------------------------------------------------
+
+This extremely creative method exploits the slice objects' behavior, provided
+that one accepts to use strings (or instantiate properly named placeholder
+objects for the keys), and accept to use ":" instead of "=".
+
+::
+
+    >>> a["K":3]
+    slice('K', 3, None)
+    >>> a["K":3, "R":4]
+    (slice('K', 3, None), slice('R', 4, None))
+    >>>
+
+While clearly smart, this approach does not allow easy inquire of the key/value
+pair, it's too clever and esotheric, and does not allow to pass a slice as in
+``a[K=1:10:2]``.
+
+However, Tim Delaney comments
+
+    "I really do think that ``a[b=c, d=e]`` should just be syntax sugar for
+    ``a['b':c, 'd':e]``. It's simple to explain, and gives the greatest backwards
+    compatibility. In particular, libraries that already abused slices in this
+    way will just continue to work with the new syntax."
+
+We think this behavior would produce inconvenient results. The library Pandas uses
+strings as labels, allowing notation such as
+
+::
+
+    >>> a[:, "A":"F"]
+
+to extract data from column "A" to column "F". Under the above comment, this notation
+would be equally obtained with
+
+::
+
+    >>> a[:, A="F"]
+
+which is weird and collides with the intended meaning of keyword in indexing, that
+is, specifying the axis through conventional names rather than positioning.
+
+Pass a dictionary as an additional index
+----------------------------------------
+
+::
+
+    >>> a[1, 2, {"K": 3}]
+
+this notation, although less elegant, can already be used and achieves similar
+results. It's evident that the proposed Strategy "New argument contents" can be
+interpreted as syntactic sugar for this notation.
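Both workarounds run on current Python. The sketch below uses a hypothetical ``split_index`` helper that normalises ``a["K":3]``-style slices and trailing dictionaries into a ``(positional, keywords)`` pair. It treats any slice whose start is a string and whose step is absent as an emulated keyword. That rule is an assumption of this sketch, not an established convention, and it reproduces the Pandas collision described above: ``a[:, "A":"F"]`` is misread as the keyword ``A="F"``.

```python
def split_index(idx):
    """Hypothetical helper: split an index into positional parts and
    keywords emulated either as "name":value slices or as a dict."""
    items = idx if isinstance(idx, tuple) else (idx,)
    positional, keywords = [], {}
    for item in items:
        if isinstance(item, dict):
            # a[1, 2, {"K": 3}]: dictionary passed as an additional index
            keywords.update(item)
        elif (isinstance(item, slice)
              and isinstance(item.start, str)
              and item.step is None):
            # a["K":3]: slice abuse, the start is read as the key
            keywords[item.start] = item.stop
        else:
            positional.append(item)
    return tuple(positional), keywords


class Probe:
    """Hypothetical container that just reports how its index was split."""

    def __getitem__(self, idx):
        return split_index(idx)


a = Probe()
print(a["K":3, "R":4])    # ((), {'K': 3, 'R': 4})
print(a[1, 2, {"K": 3}])  # ((1, 2), {'K': 3})
print(a[:, "A":"F"])      # ((slice(None, None, None),), {'A': 'F'})
```

The last line shows why the authors consider this emulation inconvenient: a label range and a named axis become indistinguishable once both are spelled with a colon.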
+
+Additional Comments
+===================
+
+Commenters also expressed the following relevant points:
+
+Relevance of ordering of keyword arguments
+------------------------------------------
+
+As part of the discussion of this PEP, it's important to decide if the ordering
+information of the keyword arguments is important, and if indexes and keys can
+be ordered in an arbitrary way (e.g. ``a[1,Z=3,2,R=4]``). PEP-468 [#PEP-468]_
+tries to address the first point by proposing the use of an ordereddict,
+however one would be inclined to accept that keyword arguments in indexing are
+equivalent to kwargs in function calls, and therefore as of today equally
+unordered, and with the same restrictions.
+
+Need for homogeneity of behavior
+--------------------------------
+
+Relative to Strategy "New argument contents", a comment from Ian Cordasco
+points out that
+
+    "it would be unreasonable for just one method to behave totally
+    differently from the standard behaviour in Python. It would be confusing for
+    only ``__getitem__`` (and ostensibly, ``__setitem__``) to take keyword
+    arguments but instead of turning them into a dictionary, turn them into
+    individual single-item dictionaries." We agree with his point, however it must
+    be pointed out that ``__getitem__`` is already special in some regards when it
+    comes to passed arguments.
+
+Chris Angelico also states:
+
+    "it seems very odd to start out by saying "here, let's give indexing the
+    option to carry keyword args, just like with function calls", and then come
+    back and say "oh, but unlike function calls, they're inherently ordered and
+    carried very differently"." Again, we agree on this point. The most
+    straightforward strategy to keep homogeneity would be Strategy "kwargs
+    argument", opening to a ``**kwargs`` argument on ``__getitem__``.
+
+One of the authors (Stefano Borini) thinks that only the "strict dictionary"
+strategy is worth of implementation. It is non-ambiguous, simple, does not
+force complex parsing, and addresses the problem of referring to axes either
+by position or by name. The "options" use case is probably best handled with
+a different approach, and may be irrelevant for this PEP. The alternative
+"named tuple" is another valid choice.
+
+Having .get() become obsolete for indexing with default fallback
+----------------------------------------------------------------
+
+Introducing a "default" keyword could make ``dict.get()`` obsolete, which would be
+replaced by ``d["key", default=3]``. Chris Angelico however states:
+
+    "Currently, you need to write ``__getitem__`` (which raises an exception on
+    finding a problem) plus something else, e.g. ``get()``, which returns a default
+    instead. By your proposal, both branches would go inside ``__getitem__``, which
+    means they could share code; but there still need to be two branches."
+
+Additionally, Chris continues:
+
+    "There'll be an ad-hoc and fairly arbitrary puddle of names (some will go
+    ``default=``, others will say that's way too long and go ``def=``, except that
+    that's a keyword so they'll use ``dflt=`` or something...), unless there's a
+    strong force pushing people to one consistent name.".
+
+This argument is valid but it's equally valid for any function call, and is
+generally fixed by established convention and documentation.
+
+On degeneracy of notation
+-------------------------
+
+User Drekin commented: "The case of ``a[Z=3]`` and ``a[{"Z": 3}]`` is similar to
+current ``a[1, 2]`` and ``a[(1, 2)]``. Even though one may argue that the parentheses
+are actually not part of tuple notation but are just needed because of syntax,
+it may look as degeneracy of notation when compared to function call: ``f(1, 2)``
+is not the same thing as ``f((1, 2))``.".
+
+References
+==========
+
+.. [#keyword-1] "keyword-only args in __getitem__"
+   (http://article.gmane.org/gmane.comp.python.ideas/27584)
+
+.. [#keyword-2] "Accepting keyword arguments for __getitem__"
+   (https://mail.python.org/pipermail/python-ideas/2014-June/028164.html)
+
+.. [#keyword-3] "PEP pre-draft: Support for indexing with keyword arguments"
+   https://mail.python.org/pipermail/python-ideas/2014-July/028250.html
+
+.. [#namedtuple] "namedtuple is not as good as it should be"
+   (https://mail.python.org/pipermail/python-ideas/2013-June/021257.html)
+
+.. [#PEP-468] "Preserving the order of \*\*kwargs in a function."
+   http://legacy.python.org/dev/peps/pep-0468/
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   End:
From 0974b036af0af2aa85170338bcdc5f2895f769be Mon Sep 17 00:00:00 2001
From: Stefano Borini
Date: Mon, 24 Aug 2020 19:38:09 +0100
Subject: [PATCH 02/29] Removed metadata no longer relevant

---
 pep-XXXX.txt | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/pep-XXXX.txt b/pep-XXXX.txt
index ddec13dce5e..e7a27a70066 100644
--- a/pep-XXXX.txt
+++ b/pep-XXXX.txt
@@ -1,16 +1,16 @@
-PEP: 472
+PEP: XXXX
 Title: Support for indexing with keyword arguments
 Version: $Revision$
 Last-Modified: $Date$
-Author: Stefano Borini, Joseph Martinot-Lagarde
+Author: Stefano Borini, Jonathan Fine
 Discussions-To: python-ideas@python.org
-Status: Rejected
+Status:
 Type: Standards Track
 Content-Type: text/x-rst
-Created: 24-Jun-2014
-Python-Version: 3.6
-Post-History: 02-Jul-2014
-Resolution: https://mail.python.org/pipermail/python-dev/2019-March/156693.html
+Created: 24-Aug-2020
+Python-Version: 3.10
+Post-History:
+Resolution:
 
 Abstract
 ========
From df56201a0173ab85c37b60a00ef1296974a628eb Mon Sep 17 00:00:00 2001
From: Stefano Borini
Date: Mon, 24 Aug 2020 22:39:41 +0100
Subject: [PATCH 03/29] Analysed the mails from 4 oct 2019

---
 pep-XXXX.txt | 142 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 136 insertions(+), 6 deletions(-)

diff --git a/pep-XXXX.txt b/pep-XXXX.txt
index e7a27a70066..ca001925619 100644
--- a/pep-XXXX.txt
+++ b/pep-XXXX.txt
@@ -15,12 +15,142 @@ Resolution:
 Abstract
 ========
 
-This PEP proposes an extension of the indexing operation to support keyword
-arguments. Notations in the form ``a[K=3,R=2]`` would become legal syntax.
-For future-proofing considerations, ``a[1:2, K=3, R=4]`` are considered and
-may be allowed as well, depending on the choice for implementation. In addition
-to a change in the parser, the index protocol (``__getitem__``, ``__setitem__``
-and ``__delitem__``) will also potentially require adaptation.
+This PEP is a rework of PEP-0472, where an extension of the indexing operation
+to support keyword arguments was discussed.
+
+This PEP would allow for keyword-like arguments to be accepted during indexing
+operations. Notations in the form ``a[K=3, R=2]`` would become legal syntax.
+A final strategy will be proposed in terms of semantics and implementation.
+
+Background
+==========
+
+PEP-0472 was opened in 2014. The PEP focused on various use cases and was extracted
+from a broad discussion on implementation strategies. The PEP was eventually rejected
+in 2019 [rejection] mostly due to lack of interest despite its 5 years of existence.
+
+However, with the introduction of type hints [pep-0484] the square bracket notation has
+been used consistently to enrich the typing annotations, e.g. to specify a list of integers
+as Sequence[int]. As a result, a renewed interest in a more flexible syntax that would allow
+for named information has been expressed in many different threads.
+
+During the investigation of PEP-0472, many different strategies have been proposed to
+expand the language, but no real consensus was reached. Many corner cases have
+been examined more closely and felt awkward, backward incompatible or both.
+
+In the words of D'Aprano
+
+    To me, "()" says "arbitrary function call potentially with side-
+    effects". "[]" says "lookup".
+
+
+Current status
+==============
+
+Before attacking the problem of adding keyword arguments to the indexing
+notation, it is relevant to analyse how the indexing notation works today,
+in which contexts, and how it is different from a function call.
+
+The first critical difference of the indexing notation compared to a function
+is that indexing can be used for both getting and setting operations:
+in python, a function cannot be on the left hand side of an assignment. In other words,
+both of these are valid
+
+    x = a[1, 2]
+    a[1, 2] = 5
+
+but only the first one of these is valid
+
+    x = f(1, 2)
+    f(1, 2) = 5  # invalid
+
+This asymmetry is important to understand that there is a natural imbalance
+between the two forms, and therefore it is not a given that the two should
+behave transparently and symmetrically.
+
+The second critical difference is that functions have names assigned to their
+arguments, unless the passed parameters are captured with *args, in which case
+they end up as entries in the args tuple. In other words, functions already
+have anonymous argument semantic, exactly like the indexing operation, and
+already collect the passed arguments in a tuple, although with a different
+syntax: __getitem__ and __setitem__ always receive a tuple.
+
+The third critical difference is that the indexing operation knows how to convert
+colons to slices. This is valid
+
+    a[1:3]
+
+this one isn't
+
+    f(1:3)
+
+
+Use cases
+---------
+
+Pandas currently uses a notation such as
+
+    db[db['x'] == 1]
+
+which could be replaced with db[x=1].
+
+xarray has named dimensions.
+Positional indexing really doesn't make all that much sense with xarray, being able to use keyword-based indexing would drastically simplify things.
+Here is an example modified from the xarray documentation, where you want to assign to a subset of your array:
+
+da.isel(space=0, time=slice(None, 2))[...] = spam
+
+With this syntax this could be changed to:
+
+da[space=0, time=:2] = spam
+
+
+Hard points
+-----------
+
+* Invoking indexing _must_ accept some object. E.g. a[] is still syntax error.
+
+* We want to be able to mix single values and named indexes.
+
+
+I don't see what's confusing. All that's needed is for the slice syntax ['colon operator', if you like] to have higher precedence than the keyword syntax, as it already has higher precedence than the comma.
+
+As I said originally, I'm +0 on the whole feature but I think weird restrictions like "slice syntax only works for positional arguments" or "can't have both positional and keyword args" will be surprising to most people.
+
+* No walrus possible.
+
+Questions to address
+--------------------
+
+> If only keyword arguments are passed what happens to the positional index? Is it the empty tuple?
+
+> Currently subscript with no index (`dict()[]`) is a syntax error should it continue to be?
+
+
+Ideas
+-----
+x[foo=:2] would be equivalent to x[{'foo': slice(None, 2)}]
+da[space=0, time=:2]
+
+I read it as a slice:
+
+    da[ slice( (space=0, time=), 2, None) ]
+
+and thought "That must be a typo, because the time keyword
+doesn't have a value."
+
+
+References
+==========
+
+.. [#rejection] "Rejection of PEP-0472"
+    (https://mail.python.org/pipermail/python-dev/2019-March/156693.html)
+.. [#pep-0484] "PEP-0484 -- Type hints"
+    (https://www.python.org/dev/peps/pep-0484)
+.. [#request-1] "Allow kwargs in __{get|set|del}item__"
+    (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)
+--------------------------------------------------
+
 
 Motivation
 ==========
From 683ed105d3401d3bab77de4a6d767329fb58cb45 Mon Sep 17 00:00:00 2001
From: Stefano Borini
Date: Tue, 1 Sep 2020 07:53:04 +0100
Subject: [PATCH 04/29] Moved pep-XXXX to pep-9999 as from pep-1

---
 pep-XXXX.txt => pep-9999.txt | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
 rename pep-XXXX.txt => pep-9999.txt (96%)

diff --git a/pep-XXXX.txt b/pep-9999.txt
similarity index 96%
rename from pep-XXXX.txt
rename to pep-9999.txt
index ca001925619..ee191806317 100644
--- a/pep-XXXX.txt
+++ b/pep-9999.txt
@@ -126,7 +126,6 @@ Questions to address
 
 > Currently subscript with no index (`dict()[]`) is a syntax error should it continue to be?
 
-
 Ideas
 -----
 x[foo=:2] would be equivalent to x[{'foo': slice(None, 2)}]
@@ -140,6 +139,8 @@ and thought "That must be a typo, because the time keyword
 doesn't have a value."
 
 
+
+
 References
 ==========
 
@@ -149,8 +150,36 @@ References
     (https://www.python.org/dev/peps/pep-0484)
 .. [#request-1] "Allow kwargs in __{get|set|del}item__"
     (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)
+
+
 --------------------------------------------------
+I'm really interested in this and it's very helpful that you included two versions of the proposed API for people to try out: the jfine version (with the kw object) and the stevedaprano version (with kwd arguments passed directly to the dunder methods).
+
+My point is that all existing `__getitem__` implementations will raise errors
+if any keywords are given, even if the keyword happens to correspond to the
+name of the argument (say, `index`). This is to counter Chris B's concern that
+if an existing `__getitem__` implementation didn't use the '/' notation to
+indicate that `self` and `index` are positional, it would have a bug. I claim
+that the bug will *only* be present when someone adds keyword support to their
+`__getitem__` method without using the '/'. Since that's not existing code,
+this "proves" that adding this feature would not introduce a subtle bug in a
+lot of existing code -- only in carelessly written new (or updated) code.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 
 Motivation
 ==========
From 3387fcc8ca07d887904e15d3a11d4bc2fb09e816 Mon Sep 17 00:00:00 2001
From: Stefano Borini
Date: Tue, 1 Sep 2020 08:19:54 +0100
Subject: [PATCH 05/29] First draft with no implementation

---
 pep-9999.txt | 795 +++++++-------------------------------------------
 1 file changed, 97 insertions(+), 698 deletions(-)

diff --git a/pep-9999.txt b/pep-9999.txt
index ee191806317..af47d2df352 100644
--- a/pep-9999.txt
+++ b/pep-9999.txt
@@ -1,10 +1,11 @@
-PEP: XXXX
+PEP: 9999
 Title: Support for indexing with keyword arguments
 Version: $Revision$
 Last-Modified: $Date$
 Author: Stefano Borini, Jonathan Fine
+Sponsor:
 Discussions-To: python-ideas@python.org
-Status:
+Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 24-Aug-2020
@@ -15,189 +16,42 @@ Resolution:
 Abstract
 ========
 
-This PEP is a rework of PEP-0472, where an extension of the indexing operation
-to support keyword arguments was discussed.
+This PEP proposed extending python to allow keyword-like arguments to be
+accepted during indexing operations. Notations in the form ``a[K=3, R=2]``
+would become legal syntax. A final strategy will be proposed in terms of
+semantics and implementation.
 
-This PEP would allow for keyword-like arguments to be accepted during indexing
-operations. Notations in the form ``a[K=3, R=2]`` would become legal syntax.
-A final strategy will be proposed in terms of semantics and implementation.
+This PEP is a rework and expansion of PEP-0472, where an extension of the +indexing operation to support keyword arguments was analysed. PEP-0472 was +Rejected due to apparent lack of interest back in 2019. However, renewed +interest has prompted a re-analysis and therefore this PEP. Background ========== PEP-0472 was opened in 2014. The PEP focused on various use cases and was extracted from a broad discussion on implementation strategies. The PEP was eventually rejected -in 2019 [rejection] mostly due to lack of interest despite its 5 years of existence. +in 2019 [#rejection]_ mostly due to lack of interest despite its 5 years of existence. -However, with the introduction of type hints [pep-0484] the square bracket notation has -been used consistently to enrich the typing annotations, e.g. to specify a list of integers -as Sequence[int]. As a result, a renewed interest in a more flexible syntax that would allow -for named information has been expressed in many different threads. +However, with the introduction of type hints in PEP-0484 [#pep-0484]_ the +square bracket notation has been used consistently to enrich the typing +annotations, e.g. to specify a list of integers as Sequence[int]. As a result, +a renewed interest in a more flexible syntax that would allow for named +information has been expressed in many different threads. -During the investigation of PEP-0472, many different strategies have been proposed to -expand the language, but no real consensus was reached. Many corner cases have -been examined more closely and felt awkward, backward incompatible or both. +During the investigation of PEP-0472, many different strategies have been +proposed to expand the language, but no real consensus was reached. Many corner +cases have been examined more closely and felt awkward, backward incompatible +or both. Renewed interest was prompted by Caleb Donovick [#request-1]_ in 2019 +and Andras Tantos [#request-2]_ in 2020. 
+These requests prompted strong activity +on the python-ideas mailing list, where various options have been discussed and +a general consensus has been reached. -In the words of D'Aprano - - To me, "()" says "arbitrary function call potentially with side- - effects". "[]" says "lookup". - - -Current status -============== - -Before attacking the problem of adding keyword arguments to the indexing -notation, it is relevant to analyse how the indexing notation works today, -in which contexts, and how it is different from a function call. - -The first critical difference of the indexing notation compared to a function -is that indexing can be used for both getting and setting operations: -in python, a function cannot be on the left hand side of an assignment. In other words, -both of these are valid - - x = a[1, 2] - a[1, 2] = 5 - -but only the first one of these is valid - - x = f(1, 2) - f(1, 2) = 5 # invalid - -This asymmetry is important to understand that there is a natural imbalance -between the two forms, and therefore it is not a given that the two should -behave transparently and symmetrically. - -The second critical difference is that functions have names assigned to their -arguments, unless the passed parameters are captured with *args, in which case -they end up as entries in the args tuple. In other words, functions already -have anonymous argument semantic, exactly like the indexing operation, and -already collect the passed arguments in a tuple, although with a different -syntax: __getitem__ and __setitem__ always receive a tuple. - -The third critical difference is that the indexing operation knows how to convert -colons to slices. This is valid - - a[1:3] - -this one isn't - - f(1:3) - - -Use cases ---------- - -Pandas currently uses a notation such as - - db[db['x'] == 1] - -which could be replaced with db[x=1]. - -xarray has named dimensions.
-Positional indexing really doesn't make all that much sense with xarray, being able to use keyword-based indexing would drastically simplify things. -Here is an example modified from the xarray documentation, where you want to assign to a subset of your array: - -da.isel(space=0, time=slice(None, 2))[...] = spam - -With this syntax this could be changed to: - -da[space=0, time=:2] = spam - - -Hard points ------------ - -* Invoking indexing _must_ accept some object. E.g. a[] is still syntax error. - -* We want to be able to mix single values and named indexes. - - -I don't see what's confusing. All that's needed is for the slice syntax ['colon operator', if you like] to have higher precedence than the keyword syntax, as it already has higher precedence than the comma. - -As I said originally, I'm +0 on the whole feature but I think weird restrictions like "slice syntax only works for positional arguments" or "can't have both positional and keyword args" will be surprising to most people. - -* No walrus possible. - -Questions to address --------------------- - -> If only keyword arguments are passed what happens to the positional index? Is it the empty tuple? - -> Currently subscript with no index (`dict()[]`) is a syntax error should it continue to be? - -Ideas ------ -x[foo=:2] would be equivalent to x[{'foo': slice(None, 2)}] -da[space=0, time=:2] - -I read it as a slice: - - da[ slice( (space=0, time=), 2, None) ] - -and thought "That must be a typo, because the time keyword -doesn't have a value." - - - - -References -========== - -.. [#rejection] "Rejection of PEP-0472" - (https://mail.python.org/pipermail/python-dev/2019-March/156693.html) -.. [#pep-0484] "PEP-0484 -- Type hints" - (https://www.python.org/dev/peps/pep-0484) -.. 
[#request-1] "Allow kwargs in __{get|set|del}item__" - (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/) - - --------------------------------------------------- - -I'm really interested in this and it's very helpful that you included two versions of the proposed API for people to try out: the jfine version (with the kw object) and the stevedaprano version (with kwd arguments passed directly to the dunder methods). - -My point is that all existing `__getitem__` implementations will raise errors -if any keywords are given, even if the keyword happens to correspond to the -name of the argument (say, `index`). This is to counter Chris B's concern that -if an existing `__getitem__` implementation didn't use the '/' notation to -indicate that `self` and `index` are positional, it would have a bug. I claim -that the bug will *only* be present when someone adds keyword support to their -`__getitem__` method without using the '/'. Since that's not existing code, -this "proves" that adding this feature would not introduce a subtle bug in a -lot of existing code -- only in carelessly written new (or updated) code. - - - - - - - - - - - - - - - - -Motivation -========== - -The indexing syntax carries a strong semantic content, differentiating it from -a method call: it implies referring to a subset of data. We believe this -semantic association to be important, and wish to expand the strategies allowed -to refer to this data. - -As a general observation, the number of indices needed by an indexing operation -depends on the dimensionality of the data: one-dimensional data (e.g. a list) -requires one index (e.g. ``a[3]``), two-dimensional data (e.g. a matrix) requires -two indices (e.g. ``a[2,3]``) and so on. Each index is a selector along one of the -axes of the dimensionality, and the position in the index tuple is the -metainformation needed to associate each index to the corresponding axis. 
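The positional association between indices and axes can be observed directly with today's protocol. The minimal sketch below uses a hypothetical ``IndexRecorder`` class whose ``__getitem__`` returns whatever object Python passes to it. It shows four things. A single index arrives unchanged. Comma-separated indices arrive as one tuple whose order is the only thing that identifies the axes. Colon notation arrives as ``slice`` objects. The keyword form discussed in this PEP is rejected by the parser of current interpreters.

```python
class IndexRecorder:
    """Hypothetical helper whose __getitem__ returns the object Python passes in."""

    def __getitem__(self, idx):
        return idx


def keyword_indexing_is_valid():
    """Return True if the running interpreter accepts ``a[K=3, R=2]`` as syntax."""
    try:
        compile("a[K=3, R=2]", "<example>", "eval")
    except SyntaxError:
        return False
    return True


a = IndexRecorder()
print(a[3])         # 3: one index is passed as-is
print(a[2, 3])      # (2, 3): several indices arrive as a single tuple
print(a[3, 2])      # (3, 2): only the position tells the axes apart
print(a[1:10:2])    # slice(1, 10, 2): colon notation becomes a slice object
print(a[1:3, 5])    # (slice(1, 3, None), 5)
print(keyword_indexing_is_valid())  # False on interpreters without this feature
```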
+Motivation and Use cases +======================== The current python syntax focuses exclusively on position to express the -association to the axes, and also contains syntactic sugar to refer to +index, and also contains syntactic sugar to refer to non-punctiform selection (slices) :: @@ -213,19 +67,17 @@ keyword arguments in the indexing operation, e.g. >>> a[K=3, R=2] -which would allow to refer to axes by conventional names. +which would allow a more flexible way to index content. One must additionally consider the extended form that allows both positional and keyword specification :: - >>> a[3,R=3,K=4] - -This PEP will explore different strategies to enable the use of these notations. + >>> a[3, R=3, K=4] Use cases -========= +--------- The following practical use cases present two broad categories of usage of a keyworded specification: Indexing and contextual option. For indexing: @@ -238,564 +90,111 @@ keyworded specification: Indexing and contextual option. For indexing: >>> gridValues[x=3, y=5, z=8] >>> rain[time=0:12, location=location] -2. In some domain, such as computational physics and chemistry, the use of a - notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent - a level of accuracy +2. To enrich the typing notation with keywords, especially when using generics - :: + :: - >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) - - In this case, the index operation would return a basis set at the chosen level - of accuracy (represented by the parameter Z). The reason behind an indexing is that - the BasisSet object could be internally represented as a numeric table, where - rows (the "coefficient" axis, hidden to the user in this example) are associated - to individual elements (e.g.
row 0:5 contains coefficients for element 1, - row 5:8 coefficients for element 2) and each column is associated to a given - degree of accuracy ("accuracy" or "Z" axis) so that first column is low - accuracy, second column is medium accuracy and so on. With that indexing, - the user would obtain another object representing the contents of the column - of the internal table for accuracy level 3. + def function(value: MyType[T=int]): ... -Additionally, the keyword specification can be used as an option contextual to -the indexing. Specifically: -1. A "default" option allows to specify a default return value when the index - is not present +3. In some domains, such as computational physics and chemistry, the use of a + notation such as ``Basis[Z=5]`` is a Domain Specific Language notation to represent + a level of accuracy :: - >>> lst = [1, 2, 3] - >>> value = lst[5, default=0] # value is 0 + >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) -2. For a sparse dataset, to specify an interpolation strategy - to infer a missing point from e.g. its surrounding data. +4. Pandas currently uses a notation such as :: + + >>> db[db['x'] == 1] - >>> value = array[1, 3, interpolate=spline_interpolator] - -3. A unit could be specified with the same mechanism - - :: + which could be replaced with ``db[x=1]``. - >>> value = array[1, 3, unit="degrees"] +5. xarray has named dimensions. Currently these are handled with methods such as ``.isel``: -How the notation is interpreted is up to the implementing class. + :: + + >>> data.isel(row=10) # Returns the row at position 10 (the eleventh row) -Current implementation -====================== + which could also be replaced with ``data[row=10]``. A more complex example: -Currently, the indexing operation is handled by methods ``__getitem__``, -``__setitem__`` and ``__delitem__``. These methods' signature accept one argument -for the index (with ``__setitem__`` accepting an additional argument for the set -value).
In the following, we will analyze ``__getitem__(self, idx)`` exclusively, -with the same considerations implied for the remaining two methods. + :: -When an indexing operation is performed, ``__getitem__(self, idx)`` is called. -Traditionally, the full content between square brackets is turned into a single -object passed to argument ``idx``: + >>> da.isel(space=0, time=slice(None, 2))[...] = spam + >>> da[space=0, time=:2] = spam -- When a single element is passed, e.g. ``a[2]``, ``idx`` will be ``2``. -- When multiple elements are passed, they must be separated by commas: ``a[2, 3]``. - In this case, ``idx`` will be a tuple ``(2, 3)``. With ``a[2, 3, "hello", {}]`` - ``idx`` will be ``(2, 3, "hello", {})``. -- A slicing notation e.g. ``a[2:10]`` will produce a slice object, or a tuple - containing slice objects if multiple values were passed. -Except for its unique ability to handle slice notation, the indexing operation -has similarities to a plain method call: it acts like one when invoked with -only one element; If the number of elements is greater than one, the ``idx`` -argument behaves like a ``*args``. However, as stated in the Motivation section, -an indexing operation has the strong semantic implication of extraction of a -subset out of a larger set, which is not automatically associated to a regular -method call unless appropriate naming is chosen. Moreover, its different visual -style is important for readability. +It is important to note that how the notation is interpreted is up to the +implementation. This PEP only defines and dictates the behavior of Python +regarding passed keyword arguments, not how these arguments should be +interpreted and used by the implementing class. -Specifications +Current status ============== -The implementation should try to preserve the current signature for -``__getitem__``, or modify it in a backward-compatible way.
We will present -different alternatives, taking into account the possible cases that need -to be addressed - -:: - - C0. a[1]; a[1,2] # Traditional indexing - C1. a[Z=3] - C2. a[Z=3, R=4] - C3. a[1, Z=3] - C4. a[1, Z=3, R=4] - C5. a[1, 2, Z=3] - C6. a[1, 2, Z=3, R=4] - C7. a[1, Z=3, 2, R=4] # Interposed ordering - -Strategy "Strict dictionary" ----------------------------- - -This strategy acknowledges that ``__getitem__`` is special in accepting only -one object, and the nature of that object must be non-ambiguous in its -specification of the axes: it can be either by order, or by name. As a result -of this assumption, in presence of keyword arguments, the passed entity is a -dictionary and all labels must be specified. - -:: - - C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2) - C1. a[Z=3] -> idx = {"Z": 3} - C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4} - C3. a[1, Z=3] -> raise SyntaxError - C4. a[1, Z=3, R=4] -> raise SyntaxError - C5. a[1, 2, Z=3] -> raise SyntaxError - C6. a[1, 2, Z=3, R=4] -> raise SyntaxError - C7. a[1, Z=3, 2, R=4] -> raise SyntaxError - -Pros -'''' - -- Strong conceptual similarity between the tuple case and the dictionary case. - In the first case, we are specifying a tuple, so we are naturally defining - a plain set of values separated by commas. In the second, we are specifying a - dictionary, so we are specifying a homogeneous set of key/value pairs, as - in ``dict(Z=3, R=4)``; -- Simple and easy to parse on the ``__getitem__`` side: if it gets a tuple, - determine the axes using positioning. If it gets a dictionary, use - the keywords. -- C interface does not need changes. - -Neutral -''''''' - -- Degeneracy of ``a[{"Z": 3, "R": 4}]`` with ``a[Z=3, R=4]`` means the notation - is syntactic sugar. - -Cons -'''' - -- Very strict. -- Destroys ordering of the passed arguments. Preserving the - order would be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_. 
-- Does not allow use cases with mixed positional/keyword arguments such as - ``a[1, 2, default=5]``. - -Strategy "mixed dictionary" ---------------------------- - -This strategy relaxes the above constraint to return a dictionary containing -both numbers and strings as keys. - -:: - - C0. a[1]; a[1,2] -> idx = 1; idx = (1, 2) - C1. a[Z=3] -> idx = {"Z": 3} - C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4} - C3. a[1, Z=3] -> idx = { 0: 1, "Z": 3} - C4. a[1, Z=3, R=4] -> idx = { 0: 1, "Z": 3, "R": 4} - C5. a[1, 2, Z=3] -> idx = { 0: 1, 1: 2, "Z": 3} - C6. a[1, 2, Z=3, R=4] -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4} - C7. a[1, Z=3, 2, R=4] -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4} - -Pros -'''' -- Opens for mixed cases. - -Cons -'''' -- Destroys ordering information for string keys. We have no way of saying if - ``"Z"`` in C7 was in position 1 or 3. -- Implies switching from a tuple to a dict as soon as one specified index - has a keyword argument. May be confusing to parse. - -Strategy "named tuple" ------------------------ - -Return a named tuple for ``idx`` instead of a tuple. Keyword arguments would -obviously have their stated name as key, and positional argument would have an -underscore followed by their order: - -:: - - C0. a[1]; a[1,2] -> idx = 1; idx = (_0=1, _1=2) - C1. a[Z=3] -> idx = (Z=3) - C2. a[Z=3, R=2] -> idx = (Z=3, R=2) - C3. a[1, Z=3] -> idx = (_0=1, Z=3) - C4. a[1, Z=3, R=2] -> idx = (_0=1, Z=3, R=2) - C5. a[1, 2, Z=3] -> idx = (_0=1, _2=2, Z=3) - C6. a[1, 2, Z=3, R=4] -> (_0=1, _1=2, Z=3, R=4) - C7. a[1, Z=3, 2, R=4] -> (_0=1, Z=3, _1=2, R=4) - or (_0=1, Z=3, _2=2, R=4) - or raise SyntaxError - -The required typename of the namedtuple could be ``Index`` or the name of the -argument in the function definition, it keeps the ordering and is easy to -analyse by using the ``_fields`` attribute. It is backward compatible, provided -that C0 with more than one entry now passes a namedtuple instead of a plain -tuple. - -Pros -'''' -- Looks nice. 
namedtuple transparently replaces tuple and gracefully - degrades to the old behavior. -- Does not require a change in the C interface - -Cons -'''' -- According to some sources [#namedtuple]_ namedtuple is not well developed. - To include it as such important object would probably require rework - and improvement; -- The namedtuple fields, and thus the type, will have to change according - to the passed arguments. This can be a performance bottleneck, and makes - it impossible to guarantee that two subsequent index accesses get the same - Index class; -- the ``_n`` "magic" fields are a bit unusual, but ipython already uses them - for result history. -- Python currently has no builtin namedtuple. The current one is available - in the "collections" module in the standard library. -- Differently from a function, the two notations ``gridValues[x=3, y=5, z=8]`` - and ``gridValues[3,5,8]`` would not gracefully match if the order is modified - at call time (e.g. we ask for ``gridValues[y=5, z=8, x=3])``. In a function, - we can pre-define argument names so that keyword arguments are properly - matched. Not so in ``__getitem__``, leaving the task for interpreting and - matching to ``__getitem__`` itself. - - -Strategy "New argument contents" --------------------------------- - -In the current implementation, when many arguments are passed to ``__getitem__``, -they are grouped in a tuple and this tuple is passed to ``__getitem__`` as the -single argument ``idx``. This strategy keeps the current signature, but expands the -range of variability in type and contents of ``idx`` to more complex representations. - -We identify four possible ways to implement this strategy: - -- **P1**: uses a single dictionary for the keyword arguments. -- **P2**: uses individual single-item dictionaries. -- **P3**: similar to **P2**, but replaces single-item dictionaries with a ``(key, value)`` tuple. 
-- **P4**: similar to **P2**, but uses a special and additional new object: ``keyword()`` - -Some of these possibilities lead to degenerate notations, i.e. indistinguishable -from an already possible representation. Once again, the proposed notation -becomes syntactic sugar for these representations. - -Under this strategy, the old behavior for C0 is unchanged. - -:: - - C0: a[1] -> idx = 1 # integer - a[1,2] -> idx = (1,2) # tuple - -In C1, we can use either a dictionary or a tuple to represent key and value pair -for the specific indexing entry. We need to have a tuple with a tuple in C1 -because otherwise we cannot differentiate ``a["Z", 3]`` from ``a[Z=3]``. - -:: - - C1: a[Z=3] -> idx = {"Z": 3} # P1/P2 dictionary with single key - or idx = (("Z", 3),) # P3 tuple of tuples - or idx = keyword("Z", 3) # P4 keyword object - -As you can see, notation P1/P2 implies that ``a[Z=3]`` and ``a[{"Z": 3}]`` will -call ``__getitem__`` passing the exact same value, and is therefore syntactic -sugar for the latter. Same situation occurs, although with different index, for -P3. Using a keyword object as in P4 would remove this degeneracy. - -For the C2 case: - -:: - - C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4} # P1 dictionary/ordereddict - or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict - or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples - or idx = (keyword("Z", 3), - keyword("R", 4) ) # P4 keyword objects - - -P1 naturally maps to the traditional ``**kwargs`` behavior, however it breaks -the convention that two or more entries for the index produce a tuple. P2 -preserves this behavior, and additionally preserves the order. Preserving the -order would also be possible with an OrderedDict as drafted by PEP-468 [#PEP-468]_. - -The remaining cases are here shown: - -:: - - C3. a[1, Z=3] -> idx = (1, {"Z": 3}) # P1/P2 - or idx = (1, ("Z", 3)) # P3 - or idx = (1, keyword("Z", 3)) # P4 - - C4. 
a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4}) # P1 - or idx = (1, {"Z": 3}, {"R": 4}) # P2 - or idx = (1, ("Z", 3), ("R", 4)) # P3 - or idx = (1, keyword("Z", 3), - keyword("R", 4)) # P4 - - C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3}) # P1/P2 - or idx = (1, 2, ("Z", 3)) # P3 - or idx = (1, 2, keyword("Z", 3)) # P4 - - C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4}) # P1 - or idx = (1, 2, {"Z": 3}, {"R": 4}) # P2 - or idx = (1, 2, ("Z", 3), ("R", 4)) # P3 - or idx = (1, 2, keyword("Z", 3), - keyword("R", 4)) # P4 - - C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4}) # P1. Pack the keyword arguments. Ugly. - or raise SyntaxError # P1. Same behavior as in function calls. - or idx = (1, {"Z": 3}, 2, {"R": 4}) # P2 - or idx = (1, ("Z", 3), 2, ("R", 4)) # P3 - or idx = (1, keyword("Z", 3), - 2, keyword("R", 4)) # P4 - -Pros -'''' -- Signature is unchanged; -- P2/P3 can preserve ordering of keyword arguments as specified at indexing, -- P1 needs an OrderedDict, but would destroy interposed ordering if allowed: - all keyword indexes would be dumped into the dictionary; -- Stays within traditional types: tuples and dicts. Evt. OrderedDict; -- Some proposed strategies are similar in behavior to a traditional function call; -- The C interface for ``PyObject_GetItem`` and family would remain unchanged. - -Cons -'''' -- Apparently complex and wasteful; -- Degeneracy in notation (e.g. ``a[Z=3]`` and ``a[{"Z":3}]`` are equivalent and - indistinguishable notations at the ``__[get|set|del]item__`` level). - This behavior may or may not be acceptable. -- for P4, an additional object similar in nature to slice() is needed, - but only to disambiguate the above degeneracy. -- ``idx`` type and layout seems to change depending on the whims of the caller; -- May be complex to parse what is passed, especially in the case of tuple of tuples; -- P2 Creates a lot of single keys dictionary as members of a tuple. Looks ugly. 
- P3 would be lighter and easier to use than the tuple of dicts, and still - preserves order (unlike the regular dict), but would result in clumsy - extraction of keywords. - -Strategy "kwargs argument" ---------------------------- - -``__getitem__`` accepts an optional ``**kwargs`` argument which should be keyword only. -``idx`` also becomes optional to support a case where no non-keyword arguments are allowed. -The signature would then be either - -:: - - __getitem__(self, idx) - __getitem__(self, idx, **kwargs) - __getitem__(self, **kwargs) - -Applied to our cases would produce: - -:: - - C0. a[1,2] -> idx=(1,2); kwargs={} - C1. a[Z=3] -> idx=None ; kwargs={"Z":3} - C2. a[Z=3, R=4] -> idx=None ; kwargs={"Z":3, "R":4} - C3. a[1, Z=3] -> idx=1 ; kwargs={"Z":3} - C4. a[1, Z=3, R=4] -> idx=1 ; kwargs={"Z":3, "R":4} - C5. a[1, 2, Z=3] -> idx=(1,2); kwargs={"Z":3} - C6. a[1, 2, Z=3, R=4] -> idx=(1,2); kwargs={"Z":3, "R":4} - C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement to function behavior - -Empty indexing ``a[]`` of course remains invalid syntax. - -Pros -'''' -- Similar to function call, evolves naturally from it; -- Use of keyword indexing with an object whose ``__getitem__`` - doesn't have a kwargs will fail in an obvious way. - That's not the case for the other strategies. - -Cons -'''' -- It doesn't preserve order, unless an OrderedDict is used; -- Forbids C7, but is it really needed? -- Requires a change in the C interface to pass an additional - PyObject for the keyword arguments. - - -C interface -=========== - -As briefly introduced in the previous analysis, the C interface would -potentially have to change to allow the new feature. Specifically, -``PyObject_GetItem`` and related routines would have to accept an additional -``PyObject *kw`` argument for Strategy "kwargs argument". 
The remaining -strategies would not require a change in the C function signatures, but the -different nature of the passed object would potentially require adaptation. - -Strategy "named tuple" would behave correctly without any change: the class -returned by the factory method in collections returns a subclass of tuple, -meaning that ``PyTuple_*`` functions can handle the resulting object. - -Alternative Solutions -===================== - -In this section, we present alternative solutions that would workaround the -missing feature and make the proposed enhancement not worth of implementation. - -Use a method ------------- - -One could keep the indexing as is, and use a traditional ``get()`` method for those -cases where basic indexing is not enough. This is a good point, but as already -reported in the introduction, methods have a different semantic weight from -indexing, and you can't use slices directly in methods. Compare e.g. -``a[1:3, Z=2]`` with ``a.get(slice(1,3), Z=2)``. - -The authors however recognize this argument as compelling, and the advantage -in semantic expressivity of a keyword-based indexing may be offset by a rarely -used feature that does not bring enough benefit and may have limited adoption. - -Emulate requested behavior by abusing the slice object ------------------------------------------------------- - -This extremely creative method exploits the slice objects' behavior, provided -that one accepts to use strings (or instantiate properly named placeholder -objects for the keys), and accept to use ":" instead of "=". - -:: - - >>> a["K":3] - slice('K', 3, None) - >>> a["K":3, "R":4] - (slice('K', 3, None), slice('R', 4, None)) - >>> - -While clearly smart, this approach does not allow easy inquire of the key/value -pair, it's too clever and esotheric, and does not allow to pass a slice as in -``a[K=1:10:2]``. - -However, Tim Delaney comments - - "I really do think that ``a[b=c, d=e]`` should just be syntax sugar for - ``a['b':c, 'd':e]``. 
It's simple to explain, and gives the greatest backwards - compatibility. In particular, libraries that already abused slices in this - way will just continue to work with the new syntax." - -We think this behavior would produce inconvenient results. The library Pandas uses -strings as labels, allowing notation such as - -:: - - >>> a[:, "A":"F"] - -to extract data from column "A" to column "F". Under the above comment, this notation -would be equally obtained with - -:: - - >>> a[:, A="F"] - -which is weird and collides with the intended meaning of keyword in indexing, that -is, specifying the axis through conventional names rather than positioning. - -Pass a dictionary as an additional index ----------------------------------------- - -:: - - >>> a[1, 2, {"K": 3}] - -this notation, although less elegant, can already be used and achieves similar -results. It's evident that the proposed Strategy "New argument contents" can be -interpreted as syntactic sugar for this notation. - -Additional Comments -=================== - -Commenters also expressed the following relevant points: - -Relevance of ordering of keyword arguments ------------------------------------------- - -As part of the discussion of this PEP, it's important to decide if the ordering -information of the keyword arguments is important, and if indexes and keys can -be ordered in an arbitrary way (e.g. ``a[1,Z=3,2,R=4]``). PEP-468 [#PEP-468]_ -tries to address the first point by proposing the use of an ordereddict, -however one would be inclined to accept that keyword arguments in indexing are -equivalent to kwargs in function calls, and therefore as of today equally -unordered, and with the same restrictions. 
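The ordering concern above predates Python 3.6. Since that release, PEP 468 guarantees that ``**kwargs`` preserves the order in which the keywords were written at the call site. An indexing protocol built on ``**kwargs`` would therefore inherit that ordering. The minimal sketch below checks this. ``getitem_like`` is a hypothetical stand-in for ``__getitem__(self, idx, **kwargs)`` under the "kwargs argument" strategy. It is called as a plain function because ``a[1, Z=3, R=4]`` is not valid syntax today.

```python
def keyword_order(**kwargs):
    """Return keyword names in call-site order (guaranteed since Python 3.6, PEP 468)."""
    return list(kwargs)


def getitem_like(idx, **kwargs):
    """Hypothetical stand-in for __getitem__(self, idx, **kwargs)."""
    # Positional part and ordered keyword part are kept separate.
    return idx, list(kwargs.items())


print(keyword_order(Z=3, R=4))    # ['Z', 'R']
print(keyword_order(R=4, Z=3))    # ['R', 'Z']
print(getitem_like(1, Z=3, R=4))  # (1, [('Z', 3), ('R', 4)])
```

Interposed forms such as C7 (``a[1, Z=3, 2, R=4]``) have no equivalent here, because a function call also rejects a positional argument after a keyword argument.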
- -Need for homogeneity of behavior --------------------------------- - -Relative to Strategy "New argument contents", a comment from Ian Cordasco -points out that +Before attacking the problem of adding keyword arguments to the indexing +notation, it is relevant to analyse how the indexing notation works today, +in which contexts, and how it is different from a function call. - "it would be unreasonable for just one method to behave totally - differently from the standard behaviour in Python. It would be confusing for - only ``__getitem__`` (and ostensibly, ``__setitem__``) to take keyword - arguments but instead of turning them into a dictionary, turn them into - individual single-item dictionaries." We agree with his point, however it must - be pointed out that ``__getitem__`` is already special in some regards when it - comes to passed arguments. +The first critical difference of the indexing notation compared to a function +is that indexing can be used for both getting and setting operations: +in python, a function cannot be on the left hand side of an assignment. In other words, +both of these are valid -Chris Angelico also states: + x = a[1, 2] + a[1, 2] = 5 - "it seems very odd to start out by saying "here, let's give indexing the - option to carry keyword args, just like with function calls", and then come - back and say "oh, but unlike function calls, they're inherently ordered and - carried very differently"." Again, we agree on this point. The most - straightforward strategy to keep homogeneity would be Strategy "kwargs - argument", opening to a ``**kwargs`` argument on ``__getitem__``. +but only the first one of these is valid -One of the authors (Stefano Borini) thinks that only the "strict dictionary" -strategy is worth of implementation. It is non-ambiguous, simple, does not -force complex parsing, and addresses the problem of referring to axes either -by position or by name. 
The "options" use case is probably best handled with -a different approach, and may be irrelevant for this PEP. The alternative -"named tuple" is another valid choice. + x = f(1, 2) + f(1, 2) = 5 # invalid -Having .get() become obsolete for indexing with default fallback ----------------------------------------------------------------- +This asymmetry is important to understand that there is a natural imbalance +between the two forms, and therefore it is not a given that the two should +behave transparently and symmetrically. + +The second critical difference is that functions have names assigned to their +arguments, unless the passed parameters are captured with *args, in which case +they end up as entries in the args tuple. In other words, functions already +have anonymous argument semantic, exactly like the indexing operation, and +already collect the passed arguments in a tuple, although with a different +syntax: __getitem__ and __setitem__ always receive a tuple. -Introducing a "default" keyword could make ``dict.get()`` obsolete, which would be -replaced by ``d["key", default=3]``. Chris Angelico however states: +The third critical difference is that the indexing operation knows how to convert +colon notations to slices, thanks to support from the parser. This is valid - "Currently, you need to write ``__getitem__`` (which raises an exception on - finding a problem) plus something else, e.g. ``get()``, which returns a default - instead. By your proposal, both branches would go inside ``__getitem__``, which - means they could share code; but there still need to be two branches." + a[1:3] -Additionally, Chris continues: +this one isn't + + f(1:3) - "There'll be an ad-hoc and fairly arbitrary puddle of names (some will go - ``default=``, others will say that's way too long and go ``def=``, except that - that's a keyword so they'll use ``dflt=`` or something...), unless there's a - strong force pushing people to one consistent name.". 
+Compatibility Hard points +------------------------- -This argument is valid but it's equally valid for any function call, and is -generally fixed by established convention and documentation. +After discussion, it was found out that the new syntax will have a fixed set of hard points, no matter +the final implementation: -On degeneracy of notation -------------------------- +* Invoking indexing _must_ accept some object. E.g. `a[]` is still syntax error. +* It must be possible to mix single values and named indexes, e.g. `a[1, 2, foo=3]` +* No walrus allowed. E.g. `a[foo:=3] is disallowed. -User Drekin commented: "The case of ``a[Z=3]`` and ``a[{"Z": 3}]`` is similar to -current ``a[1, 2]`` and ``a[(1, 2)]``. Even though one may argue that the parentheses -are actually not part of tuple notation but are just needed because of syntax, -it may look as degeneracy of notation when compared to function call: ``f(1, 2)`` -is not the same thing as ``f((1, 2))``.". References ========== -.. [#keyword-1] "keyword-only args in __getitem__" - (http://article.gmane.org/gmane.comp.python.ideas/27584) - -.. [#keyword-2] "Accepting keyword arguments for __getitem__" - (https://mail.python.org/pipermail/python-ideas/2014-June/028164.html) - -.. [#keyword-3] "PEP pre-draft: Support for indexing with keyword arguments" - https://mail.python.org/pipermail/python-ideas/2014-July/028250.html - -.. [#namedtuple] "namedtuple is not as good as it should be" - (https://mail.python.org/pipermail/python-ideas/2013-June/021257.html) +.. [#rejection] "Rejection of PEP-0472" + (https://mail.python.org/pipermail/python-dev/2019-March/156693.html) +.. [#pep-0484] "PEP-0484 -- Type hints" + (https://www.python.org/dev/peps/pep-0484) +.. [#request-1] "Allow kwargs in __{get|set|del}item__" + (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/) +.. 
[#request-2] "PEP 472 -- Support for indexing with keyword arguments" + (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/) -.. [#PEP-468] "Preserving the order of \*\*kwargs in a function." - http://legacy.python.org/dev/peps/pep-0468/ Copyright ========= From 0e0d4e7f22af3321d418ffbb2e77eb2b1ff86577 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Wed, 2 Sep 2020 07:43:43 +0100 Subject: [PATCH 06/29] Addressed review comments --- pep-9999.txt | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index af47d2df352..7cb49415485 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -21,25 +21,30 @@ accepted during indexing operations. Notations in the form ``a[K=3, R=2]`` would become legal syntax. A final strategy will be proposed in terms of semantics and implementation. -This PEP is a rework and expansion of PEP-0472, where an extension of the -indexing operation to support keyword arguments was analysed. PEP-0472 was +This PEP is a rework and expansion of PEP 472, where an extension of the +indexing operation to support keyword arguments was analysed. PEP 472 was Rejected due to apparent lack of interest back in 2019. However, renewed interest has prompted a re-analysis and therefore this PEP. Background ========== -PEP-0472 was opened in 2014. The PEP focused on various use cases and was extracted +PEP 472 was opened in 2014. The PEP focused on various use cases and was extracted from a broad discussion on implementation strategies. The PEP was eventually rejected in 2019 [#rejection]_ mostly due to lack of interest despite its 5 years of existence. -However, with the introduction of type hints in PEP-0484 [#pep-0484]_ the +However, with the introduction of type hints in PEP 484 [#pep-0484]_ the square bracket notation has been used consistently to enrich the typing -annotations, e.g. to specify a list of integers as Sequence[int]. 
As a result, -a renewed interest in a more flexible syntax that would allow for named -information has been expressed in many different threads. +annotations, e.g. to specify a list of integers as Sequence[int]. Additionally, +there has been an expanded growth of packages for data analysis such as pandas +and xarray, which use names to describe columns in a table (pandas) or axis in +an nd-array (xarray). These packages allow users to access specific data by +names, but cannot currently use index notation ([]) for this functionality. -During the investigation of PEP-0472, many different strategies have been +As a result, a renewed interest in a more flexible syntax that would allow for +named information has been expressed in many different threads. + +During the investigation of PEP 472, many different strategies have been proposed to expand the language, but no real consensus was reached. Many corner cases have been examined more closely and felt awkward, backward incompatible or both. Renewed interest was prompted by Caleb Donovick [#request-1]_ in 2019 @@ -109,9 +114,9 @@ keyworded specification: Indexing and contextual option. For indexing: :: - >>> db[db['x'] == 1] + >>> df[df['x'] == 1] - which could be replaced with db[x=1]. + which could be replaced with df[x=1]. 5. xarray has named dimensions. Currently these are handled with functions .isel: @@ -186,9 +191,9 @@ the final implementation: References ========== -.. [#rejection] "Rejection of PEP-0472" +.. [#rejection] "Rejection of PEP 472" (https://mail.python.org/pipermail/python-dev/2019-March/156693.html) -.. [#pep-0484] "PEP-0484 -- Type hints" +.. [#pep-0484] "PEP 484 -- Type hints" (https://www.python.org/dev/peps/pep-0484) .. 
[#request-1] "Allow kwargs in __{get|set|del}item__" (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/) From f6846c1afadb438daa44c04d728d75aa7575c456 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Wed, 2 Sep 2020 08:35:36 +0100 Subject: [PATCH 07/29] Added clarification for the current status of index call --- pep-9999.txt | 81 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 73 insertions(+), 8 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 7cb49415485..ad2d527950f 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -149,13 +149,17 @@ is that indexing can be used for both getting and setting operations: in python, a function cannot be on the left hand side of an assignment. In other words, both of these are valid - x = a[1, 2] - a[1, 2] = 5 + :: + + >>> x = a[1, 2] + >>> a[1, 2] = 5 but only the first one of these is valid - x = f(1, 2) - f(1, 2) = 5 # invalid + :: + + >>> x = f(1, 2) + >>> f(1, 2) = 5 # invalid This asymmetry is important to understand that there is a natural imbalance between the two forms, and therefore it is not a given that the two should @@ -164,9 +168,38 @@ behave transparently and symmetrically. The second critical difference is that functions have names assigned to their arguments, unless the passed parameters are captured with *args, in which case they end up as entries in the args tuple. In other words, functions already -have anonymous argument semantic, exactly like the indexing operation, and -already collect the passed arguments in a tuple, although with a different -syntax: __getitem__ and __setitem__ always receive a tuple. +have anonymous argument semantic, exactly like the indexing operation. However, +__(get|set|del)item__ is not always receiving a tuple as the `index` argument +(to be uniform in behavior with *args). 
In fact, given a trivial class: + + + :: + + class X: + def __getitem__(self, index): + print(index) + +The index operation basically forwards the content of the square brackets "as is" +in the `index` argument: + + :: + + >>> x=X() + >>> x[0] + 0 + >>> x[0,1] + (0, 1) + >>> x[(0,1)] + (0, 1) + >>> + >>> x[()] + () + >>> x[{1,2,3}] + {1, 2, 3} + >>> x["hello"] + hello + >>> x["hello", "hi"] + ('hello', 'hi') The third critical difference is that the indexing operation knows how to convert colon notations to slices, thanks to support from the parser. This is valid @@ -177,7 +210,7 @@ this one isn't f(1:3) -Compatibility Hard points +Compatibility Hard Points ------------------------- After discussion, it was found out that the new syntax will have a fixed set of hard points, no matter @@ -188,6 +221,38 @@ the final implementation: * No walrus allowed. E.g. `a[foo:=3] is disallowed. +Syntax and semantics +==================== + +Ricky Teachey's proposal. + + +Alternative Syntax and Semantics +================================ + +Steven's proposal + + +Rejected Ideas +============== + + + + + + + + + + + + + + + + + + References ========== From 1232ffc23bf490eacf9cff3d8bbd4f1682f76a25 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Thu, 3 Sep 2020 08:54:37 +0100 Subject: [PATCH 08/29] Reformatted discourse for better flow. Brought some concepts from 4/10/2019 py-ideas thread. --- pep-9999.txt | 201 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 134 insertions(+), 67 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index ad2d527950f..ea3352e6558 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -17,7 +17,7 @@ Abstract ======== This PEP proposed extending python to allow keyword-like arguments to be -accepted during indexing operations. Notations in the form ``a[K=3, R=2]`` +accepted during indexing operations. Notations in the form ``a[42, K=3, R=2]`` would become legal syntax. 
A final strategy will be proposed in terms of semantics and implementation. @@ -26,8 +26,11 @@ indexing operation to support keyword arguments was analysed. PEP 472 was Rejected due to apparent lack of interest back in 2019. However, renewed interest has prompted a re-analysis and therefore this PEP. +Overview +======== + Background -========== +---------- PEP 472 was opened in 2014. The PEP focused on various use cases and was extracted from a broad discussion on implementation strategies. The PEP was eventually rejected @@ -52,48 +55,21 @@ and Andras Tantos [#request-2]_ in 2020. These requests prompted a strong activi on the python-ideas mailing list, where various options have been discussed and a general consensus has been reached. -Motivation and Use cases -======================== - -The current python syntax focuses exclusively on position to express the -index, nd also contains syntactic sugar to refer to -non-punctiform selection (slices) - -:: - - >>> a[3] # returns the fourth element of a - >>> a[1:10:2] # slice notation (extract a non-trivial data subset) - >>> a[3,2] # multiple indexes (for multidimensional arrays) - -The additional notation proposed in this PEP would allow notations involving -keyword arguments in the indexing operation, e.g. - -:: - - >>> a[K=3, R=2] - -which would allow a more flexible way to indicise content. - -One must additionally consider the extended form that allows both positional -and keyword specification - -:: - - >>> a[3, R=3, K=4] - Use cases --------- -The following practical use cases present two broad categories of usage of a -keyworded specification: Indexing and contextual option. For indexing: +The following practical use cases present different cases where a keyworded +specification would improve notation and provide additional value: 1. To provide a more communicative meaning to the index, preventing e.g. 
accidental inversion of indexes :: - >>> gridValues[x=3, y=5, z=8] - >>> rain[time=0:12, location=location] + >>> grid_position[x=3, y=5, z=8] + >>> rain_amount[time=0:12, location=location] + >>> matrix[row=20, col=40] + 2. To enrich the typing notation with keywords, especially during the use of generics @@ -124,30 +100,62 @@ keyworded specification: Indexing and contextual option. For indexing: >>> data.isel(row=10) # Returns the tenth row - which could also be replaced with data[row=10]. A more complex example: + which could also be replaced with `data[row=10]`. A more complex example: :: - >>> da.isel(space=0, time=slice(None, 2))[...] = spam - >>> da[space=0, time=:2] = spam + >>> # old syntax + >>> da.isel(space=0, time=slice(None, 2))[...] = spam + >>> # new syntax + >>> da[space=0, time=:2] = spam # new syntax -It is important to note that How the notation is interpreted is up to the +It is important to note that how the notation is interpreted is up to the implementation. This PEP only defines and dictates the behavior of python -regarding passed keyword arguments. Not how these arguments should be +regarding passed keyword arguments, not how these arguments should be interpreted and used by the implementing class. +Syntax and Semantics +==================== + Current status -============== +-------------- -Before attacking the problem of adding keyword arguments to the indexing -notation, it is relevant to analyse how the indexing notation works today, -in which contexts, and how it is different from a function call. +Before attacking the problem of detailing the new syntax and semantics to the +indexing notation, it is relevant to analyse how the indexing notation works +today, in which contexts, and how it is different from a function call. 
-The first critical difference of the indexing notation compared to a function -is that indexing can be used for both getting and setting operations: -in python, a function cannot be on the left hand side of an assignment. In other words, -both of these are valid +Subscripting obj[x] is, effectively, an alternate and specialised form of +function call syntax with a number of differences and restrictions compared to +obj(x). The current python syntax focuses exclusively on position to express +the index, and also contains syntactic sugar to refer to non-punctiform +selection (slices). Some common examples: + +:: + + >>> a[3] # returns the fourth element of a + >>> a[1:10:2] # slice notation (extract a non-trivial data subset) + >>> a[3, 2] # multiple indexes (for multidimensional arrays) + +This translates into a __(get|set|del)item__ dunder call which is passed a single +parameter containing the index (for __getitem__ and __delitem__) or two parameters +containing index and value (for __setitem__). + +The behavior of the indexing call is fundamentally different from a function call +in various aspects: + +The first difference is in meaning to the reader. A function call says +"arbitrary function call potentially with side-effects". An indexing operation +says "lookup", typically to point at a subset or specific sub-aspect of an +entity (as in the case of typing notation). This fundamental difference means +that, while we cannot prevent abuse, implementors should be aware that the +introduction of keyword arguments to alter the behavior of the lookup may +violate this intrinsic meaning. + +The second difference of the indexing notation compared to a function +is that indexing can be used for both getting and setting operations. +In python, a function cannot be on the left hand side of an assignment. 
In +other words, both of these are valid :: @@ -163,9 +171,9 @@ but only the first one of these is valid This asymmetry is important to understand that there is a natural imbalance between the two forms, and therefore it is not a given that the two should -behave transparently and symmetrically. +behave transparently and symmetrically. -The second critical difference is that functions have names assigned to their +The third difference is that functions have names assigned to their arguments, unless the passed parameters are captured with *args, in which case they end up as entries in the args tuple. In other words, functions already have anonymous argument semantic, exactly like the indexing operation. However, @@ -175,9 +183,9 @@ __(get|set|del)item__ is not always receiving a tuple as the `index` argument :: - class X: - def __getitem__(self, index): - print(index) + class X: + def __getitem__(self, index): + print(index) The index operation basically forwards the content of the square brackets "as is" in the `index` argument: @@ -187,32 +195,52 @@ in the `index` argument: >>> x=X() >>> x[0] 0 - >>> x[0,1] + >>> x[0, 1] (0, 1) - >>> x[(0,1)] + >>> x[(0, 1)] (0, 1) >>> >>> x[()] () - >>> x[{1,2,3}] + >>> x[{1, 2, 3}] {1, 2, 3} >>> x["hello"] hello >>> x["hello", "hi"] ('hello', 'hi') -The third critical difference is that the indexing operation knows how to convert +The fourth difference is that the indexing operation knows how to convert +colon notations to slices, thanks to support from the parser. This is valid + + :: + + a[1:3] + +this one isn't + + :: + + f(1:3) + +The fifth difference is that there's no zero-argument form. This is valid + colon notations to slices, thanks to support from the parser. 
This is valid - a[1:3] + :: + + f() this one isn't + + :: + + a[] - f(1:3) Compatibility Hard Points ------------------------- +Any change to the current behavior After discussion, it was found out that the new syntax will have a fixed set of hard points, no matter the final implementation: @@ -221,34 +249,73 @@ the final implementation: * No walrus allowed. E.g. `a[foo:=3] is disallowed. -Syntax and semantics -==================== +New Proposal +------------ -Ricky Teachey's proposal. +We propose to allow notations involving keyword arguments in the indexing +operation, e.g. +:: -Alternative Syntax and Semantics -================================ + >>> a[K=3, R=2] -Steven's proposal +which would allow a more flexible way to indicise content. +One must additionally consider the extended form that allows both positional +and keyword specification -Rejected Ideas -============== +:: + + >>> a[3, R=3, K=4] + + +We also ensure that the current semantic for slices is applied to keyword arguments +as well. This syntax is valid: + +:: + + >>> a[3, R=3:10, K=4] +Syntax and semantics (Ricky Teachey's proposal) +=============================================== +This proposal introduces new dunders __(get|set|del)item_ex__ +that are invoked over the __(get|set|del)item__ triad, if they are present. +The rationale around this choice is to make the intuition around how to add kwd +arg support to square brackets more obvious and in line with the function +behavior. It would also make writing code for specialized libraries that tend +to use item dunders, like pandas and xarray, much easier. Right now such +libraries have to rely on their own efforts to break up a key, or use +"functions in stead" (e.g. iloc()) +Problems with this approach: +* __setitem_ex__ value would need to be the first element, because the index is of arbitrary length. 
+Alternative Syntax and Semantics +================================ + +Steven's proposal + + +Rejected Ideas +============== + +PEP 472 presents a good amount of ideas that are now all to be considered Rejected. A personal email from D'Aprano to the Author +specifically said: +"I have now carefully read through PEP 472 in full, and I am afraid I +cannot support any of the strategies currently in the PEP." +Moreover, additional ideas and discussion occurred during the re-evaluation of the PEP: +1. create a new "kwslice" object From 21bdab8953450a82af511dd25d8002404f9cf517 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 7 Sep 2020 08:10:10 +0100 Subject: [PATCH 09/29] Added more email distilled --- pep-9999.txt | 280 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 274 insertions(+), 6 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index ea3352e6558..4549f31c29c 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -3,7 +3,7 @@ Title: Support for indexing with keyword arguments Version: $Revision$ Last-Modified: $Date$ Author: Stefano Borini, Jonathan Fine -Sponsor: +Sponsor: Steven D'Aprano Discussions-To: python-ideas@python.org Status: Draft Type: Standards Track @@ -278,8 +278,197 @@ as well. This syntax is valid: -Syntax and semantics (Ricky Teachey's proposal) -=============================================== + +Alternative Syntax and Semantics (Steven's proposal) +==================================================== + +Steven's proposal + +thing[ind1, ind2, kwd1=v1, kw2=v2] = value + +Translating to: + +thing.__setitem__(self, (ind1, ind2), value, kwd1=v1, kw2=v2) + +which is pretty darn weird -- particularly if you try to write the handler this way: + +def __setitem__(self, *args, **kwargs): + +so: args would always be a 2-tuple, something like: ((ind1, ind2), value) + + + def __getitem__(self, index, *, spam, eggs=None): + + +Is that even a question? 
+ + obj[index, keyword=value] + +where index is any comma-separated list of expressions, including +slices, keyword is an identifier, and value is any expression including +slices. Are there even any other options being considered? + +A few points: + +- The index is optional, but it will be a runtime TypeError if the +method isn't defined with the corresponding parameter taking a default +value. + +- You can have multiple keyword=value pairs, separated by commas. They +must all follow the index part. Duplicate keywords are a runtime error, +just as they are for function calls. + +- An empty subscript remains a syntax error, even if the method +signature would allow it. + +I think the simplest default value would be Ellipsis. So foo[a=1, b=2] would be equivalent to foo[..., a=1, b=2]. + +But I don't see why this is a problem we have to deal with. The index argument can just not be passed at all, and it is up to the class developer to pick an appropriate sentinel if needed. +Though it would be a TypeError (rather than a SyntaxError) if no index were passed in. + +As "proper" exception handling should be close to the operation, and catch specific exceptions, most cases will probably be fine. But not all. For example, there might be code in the wild that does +try: +    a_function(something) +except TypeError: +    do_something() + +And there is something in a_function that uses indexing -- someone messes with that code, and puts something new in an index that used to be a SyntaxError and is now a TypeError -- in the past, that wouldn't have even run, but now it will, and the problem might not be caught in tests because the TypeError is being handled.
+ +Given that this would be a change from a compile time error to a runtime error, there is no code that used to work that will break, but it would be easier to write broken code in certain situations -- maybe not a huge deal, but worth thinking about. + + + + +MISSING = object() + +def __getitem__(self, key, x=MISSING, y=MISSING): +    if x is MISSING and y is MISSING: +        x, y = key +    if x is MISSING: +        x = key +    if y is MISSING: +        y = key + +And probably that code I just wrote has bugs. And it gets more complicated if we want to have more arguments than just two. And even more complicated if we want some of the arguments to be positional only or any other combination of things. + +This is code you would not have to write if we could do this instead with a new dunder or subscript processor: + +def __getx__(self, x, y): ... + +And these all just work: + +q[1, 2] +q[1, y=2] +q[y=2, x=1] + +1 is assigned to x and 2 is assigned to y in all of these for both versions, but the second certainly requires no parsing of parameters. Python does it for us. That's a lot of easily available flexibility. + + +We could treat d[1, a=3] either as d[1,] + kwargs or as d[1] + kwargs. Have people debated this yet? + + +I don't think that anyone wants adding a keyword to a single-valued +subscript to change it to a tuple. At least, I really hope that nobody +wants this! + +So given the current behaviour: + + obj[1] # calls __getitem__(1) + obj[1,] # calls __getitem__((1,)) + +I expect that the first will be the most common. If we add a keyword to +the subscript: + + obj[1, a=3] + +I would expect that it turns into `__getitem__(1, a=3)` which is +almost surely what the reader and coder expects. It would be quite weird +for the subscript 1 to turn into a tuple just because I add a keyword.
+ +That does leave the second case a little trickier to add a keyword to; +it would require a pair of parens to disambiguate it from above: + + obj[(1,), a=3] + +but I think that's likely to be obvious to the developer who is adding +in the keyword where previously no keyword existed. + + +That's a fair ruling. In general, when keywords are present, the rule that you can always omit an outermost pair of parentheses is no longer true. The rule is that, without keywords, d[(...)] and d[...] are always equivalent regardless of what "..." stands for, as long as (...) is a valid expression (which it isn't if there are slices involved). Example:: + +    d[1] ~~~ d[(1)] +    d[1,] ~~~ d[(1,)] +    d[1, 2] ~~~ d[(1, 2)] + +But there is absolutely no such rule if keywords are present. + +FYI, Jonathan's post (once I "got" it) led me to a new way of reasoning about the various proposals (__keyfn__, __subscript__ and what I will keep calling "Steven's proposal") based on what the compiler and interpreter need to do to support this corner case. My tentative conclusion is that Steven's proposal is superior. But I have been reviewing my reasoning and pseudo-code a few times and I'm still not happy with it, so posting it will have to wait. + + +What the dict class's `__getitem__` would do with that is a different issue -- probably it would be an error. + + +> Doesn't that mean that an "index" will not be an allowable index label, and that this conflict will depend on knowing the particular implementation details of the dunder methods? +> + +Yes, that would be correct. However, the function could instead be defined as: + +def __getitem__(self, index, /, **kwargs): +    ... + +and then there'd be no conflict (as "self" and "index" must be passed +positionally). + +Good edge case to consider. But would it really be such a problem? If you have an existing class like this: + +    class C: +        def __getitem__(self, index): ...
+ +    c = C() + +then presumably calling `c[1, index=2]` would just be an error (since it would be like attempting to call the method with two values for the `index` argument), and ditto for `c[1, 2, 3, index=4]`. The only odd case might be `c[index=1]` -- but presumably that would be equivalent to `c[(), index=1]` so it would still fail. + +My point is that all existing `__getitem__` implementations will raise errors if any keywords are given, even if the keyword happens to correspond to the name of the argument (say, `index`). This is to counter Chris B's concern that if an existing `__getitem__` implementation didn't use the '/' notation to indicate that `self` and `index` are positional, it would have a bug. I claim that the bug will *only* be present when someone adds keyword support to their `__getitem__` method without using the '/'. Since that's not existing code, this "proves" that adding this feature would not introduce a subtle bug in a lot of existing code -- only in carelessly written new (or updated) code. + +The primary issue I was trying to address was finding a way to reliably and clearly avoid conflicts between the index labels and the positional argument names. So if you have: + +__getitem__(self, index, **kwargs) + +You can't have an index label named "index", because it conflicts with the "index" positional argument. + +Apparently that isn't an issue if you structure it like this instead: + +__getitem__(self, index, /, **kwargs) + +But projects would need to know to do that. + +> In the "New syntax", wouldn't these examples map to: +> +> d[1, 2, a=3] => d.__getitem__((1, 2), a=3) +> and +> d[(1, 2), a=3] => d.__getitem__((1, 2), a=3) +> +Not quite. The second should be: + +d[(1, 2), a=3] => d.__getitem__(((1, 2),), a=3) + + py> d = Demo() + py> d[(1, 2)] # Tuple single arg. + (1, 2) + py> d[1, 2] # Still a tuple. + (1, 2) + + +Adding a keyword arg should not change this.
+ + + +Alternative Syntax and semantics +================================ + +Adding new dunders +------------------ This proposal introduces new dunders __(get|set|del)item_ex__ that are invoked over the __(get|set|del)item__ triad, if they are present. @@ -296,12 +485,88 @@ libraries have to rely on their own efforts to break up a key, or use Problems with this approach: * __setitem_ex__ value would need to be the first element, because the index is of arbitrary length. +* It will slow down subscripting. For every subscript access, this new dunder + attribute gets investigated on the class, and if it is not present then the + default key translation function is executed. Different ideas were proposed to handle this, from wrapping the method + only at class instantiation time (would not work when monkeypatching) +* It adds complexity -Alternative Syntax and Semantics -================================ +Again, implicit on your argument here is the assumption that all keyword indices necessarily map into positional indices. This may be the case with the use-case you had in mind. But for other use-cases brought up so far that assumption is false. +xarray, which is the primary python package for numpy arrays with labelled dimensions. It supports adding and indexing by additional dimensions that don't correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. They are called "non-dimension coordinates". + +Other people have wanted to allow parameters to be added when indexing, arguments in the index that change how the indexing behaves. These don't correspond to any dimension, either. 
+
+Possibly have to invert value and index?
+class A:
+    __keyfn__ = None
+    def __setitem__(self, val, x=0, y=0, z=0):
+        print((val, x, y, z))
+
+    >>> a = A()
+    >>> a[1, z=2] = 'hello'
+    ('hello', 1, 0, 2)*
+
+
+Adding an adapter function
+--------------------------
+
+Similar to the above, in the sense that a pre-function would be called to convert the "new style" indexing into "old style" indexing, which is then passed on.
+Has problems similar to the above.
+
+
+A single bit to change the behavior
+-----------------------------------
+
+Ricky has given some examples. Here are more, all assuming
+    __keyfn__ = True
+
+First, this use of __keyfn__ would allow
+    >>> d[1, 2, z=3]
+to result in
+    >>> d.__getitem__(1, 2, z=3)
+
+Some further examples:
+    >>> d[1, 2]
+    >>> d.__getitem__(1, 2)
+
+    >>> d[(1, 2)]
+    >>> d.__getitem__((1, 2))
+
+    >>> d[a=1, b=2]
+    >>> d.__getitem__(a=1, b=2)
+
+I find the above easy to understand and use. For Steven's proposal the calls to __getitem__ would be
+
+    >>> d[1, 2, z=3]
+    >>> d.__getitem__((1, 2), z=3)
+
+    >>> d[1, 2]
+    >>> d.__getitem__((1, 2))
+
+    >>> d[(1, 2)]  # Same result as d[1, 2]
+    >>> d.__getitem__((1, 2))  # From d[(1, 2)]
+
+    >>> d[a=1, b=2]
+    >>> d.__getitem__((), a=1, b=2)
+
+I find these harder to understand and use, which is precisely the point Ricky made in his most recent post. That's because there's a clear and precise analogy between
+    >>> x(1, 2, a=3, b=4)
+    >>> x[1, 2, a=3, b=4]
+
+I think it reasonable to argue that adding a single bit to every class is not worth the benefit it provides. However, this argument should be supported by evidence. (As indeed should the argument that it is worth the benefit.)
+
+I also think it reasonable to argue that now is not the time to allow __keyfn__ to have values other than None or True, and that allowing further values should require an additional PEP.
+
+I don't recall seeing an argument that Steven's proposal is as easy to understand and use as mine (with __keyfn__ == None).
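The two conventions compared above can be placed side by side in a small emulator. This is a sketch under stated assumptions. `translate_keyfn` is a hypothetical helper, and `args` is the tuple of comma-separated indices as written. The `d[1]` versus `d[1,]` distinction is deliberately not modelled:

```python
def translate_keyfn(obj, args, kwargs):
    """Hypothetical translation of obj[<args>, <kwargs>]; not real behaviour."""
    cls = type(obj)
    if getattr(cls, "__keyfn__", None) is True:
        # "Single bit" convention: behave exactly like a function call.
        return cls.__getitem__(obj, *args, **kwargs)
    # Tuple convention (Steven's proposal as summarised above): one index is
    # passed as-is, several are packed into a tuple, none becomes ().
    key = args[0] if len(args) == 1 else tuple(args)
    return cls.__getitem__(obj, key, **kwargs)


class FunctionLike:
    __keyfn__ = True

    def __getitem__(self, x=0, y=0, z=0):
        return (x, y, z)


class TupleLike:
    def __getitem__(self, key, **kwargs):
        return (key, kwargs)


print(translate_keyfn(FunctionLike(), (1, 2), {"z": 3}))    # (1, 2, 3)
print(translate_keyfn(FunctionLike(), ((1, 2),), {}))       # ((1, 2), 0, 0)
print(translate_keyfn(TupleLike(), (1, 2), {"z": 3}))       # ((1, 2), {'z': 3})
print(translate_keyfn(TupleLike(), (), {"a": 1, "b": 2}))   # ((), {'a': 1, 'b': 2})
```

Under the `__keyfn__` convention, `d[1, 2]` and `d[(1, 2)]` reach `__getitem__` differently. Under the tuple convention they are indistinguishable. That is the trade-off debated in these messages.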
+ + +Yes. I find it a big flaw that the signature of __setitem__ is so strongly influenced by the value of __keyfunc__. For example, a static type checker (since PEP 484 I care deeply about those and they're popping up like mushrooms :-) would have to hard-code a special case for this, because there really is nothing else in Python where the signature of a dunder depends on the value of another dunder. + +And in case you don't care about static type checkers, I think it's the same for human readers. Whenever I see a __setitem__ function I must look everywhere else in the class (and in all its base classes) for a __keyfn__ before I can understand how the __setitem__ function's signature is mapped from the d[...] notation. + +Finally, I am unsure how you would deal with the difference between d[1] and d[1,], which must be preserved (for __keyfn__ = True or absent, for backwards compatibility). The bytecode compiler cannot assume to know the value of __keyfn__ (because d could be defined in another module or could be an instance of one of several classes defined in the current module). (I think this problem is also present in the __subscript__ version.) -Steven's proposal Rejected Ideas @@ -317,6 +582,9 @@ Moreover, additional ideas and discussion occurred during the re-evaluation of t 1. create a new "kwslice" object +Has anyone suggested attaching the keyword args as attributes +on the slice object? 
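This bears on the question of attaching keyword arguments to slice objects. Built-in `slice` instances have no attribute dictionary, and the type cannot be subclassed. Any such scheme would therefore need a new type along the lines of the "kwslice" idea above. The sketch below is illustrative only: `KwSlice` is a hypothetical name, and its fields are assumptions:

```python
from dataclasses import dataclass

s = slice(1, 10, 2)

# Built-in slices reject new attributes ...
try:
    s.label = "time"
except AttributeError as exc:
    print("AttributeError:", exc)

# ... and the slice type is not an acceptable base class.
try:
    class LabelledSlice(slice):
        pass
except TypeError as exc:
    print("TypeError:", exc)


@dataclass(frozen=True)
class KwSlice:
    """Hypothetical "kwslice": a slice plus the keyword it was given for."""
    name: str
    start: object = None
    stop: object = None
    step: object = None

    def as_slice(self):
        return slice(self.start, self.stop, self.step)


k = KwSlice("time", 0, 12)
print(k.as_slice())  # slice(0, 12, None)
```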
+ From 53ad5df4dcd0bc1bbc806fce96334f58c6d404a0 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Thu, 10 Sep 2020 08:37:31 +0100 Subject: [PATCH 10/29] Added more extracts from emails --- pep-9999.txt | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 184 insertions(+), 1 deletion(-) diff --git a/pep-9999.txt b/pep-9999.txt index 4549f31c29c..96946744fbc 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -107,8 +107,21 @@ specification would improve notation and provide additional value: >>> # old syntax >>> da.isel(space=0, time=slice(None, 2))[...] = spam >>> # new syntax - >>> da[space=0, time=:2] = spam # new syntax + >>> da[space=0, time=:2] = spam + Another example: + + :: + + >>> # old syntax + >>> ds["empty"].loc[dict(lon=5, lat=6)] = 10 + >>> # new syntax + >>> ds["empty"][lon=5, lat=6] = 10 + + >>> # old syntax + >>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10 + >>> # new syntax + >>> ds["empty"][lon=1:5, lat=6:] = 10 It is important to note that how the notation is interpreted is up to the implementation. This PEP only defines and dictates the behavior of python @@ -462,6 +475,49 @@ d[(1, 2), a=3] => d.__getitem__(((1, 2),), a=3) Adding a keyword arg should not change this. +An extra **kwds would be quite sufficient for xarray. We don't need to distinguish between `d[day=3, detector=4]` and `d[day=4, detector=3]`, at least not any differently from normal Python keyword arguments. + +One question that comes up: should d[**kwargs] be valid syntax? d[*args] currently is not, but that's OK since d[tuple(args)] is identical. + +On the other hand, we probably do need d[**kwargs] since there's no way to dynamically unpack keyword arguments (short of directly calling __getitem__). And perhaps for symmetry this suggests d[*args] should be valid, too, defined as equivalent to d[tuple(args)]. 
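The asymmetry between positional and keyword unpacking is visible with today's semantics. In the sketch below, `Echo` is an illustrative class. The direct dunder call is the workaround mentioned in the message, and `d[1, 2, **labels]` in the comment is hypothetical syntax:

```python
class Echo:
    def __getitem__(self, key, **kwargs):
        return key, kwargs


d = Echo()
args = [1, 2]
# Positional indices can already be built dynamically: the tuple is the key.
assert d[tuple(args)] == d[1, 2] == ((1, 2), {})

# Keyword indices have no subscript spelling, so a dynamic dict of labels
# can only be passed by calling the dunder directly; a hypothetical
# d[1, 2, **labels] would be the subscript equivalent.
labels = {"day": 3, "detector": 4}
print(type(d).__getitem__(d, tuple(args), **labels))
# ((1, 2), {'day': 3, 'detector': 4})
```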
+
+
+If d[] were to be allowed, I would expect it to pass an empty
+tuple as the index, since it's the limiting case of reducing the
+number of positional indices.
+
+
+We have `Tuple[int, int]` as a tuple of two integers. And we have `Tuple[int]` as a tuple of one integer. And occasionally we need to spell a tuple of *no* values, since that's the type of `()`. But we currently are forced to write that as `Tuple[()]`. If we allowed `Tuple[]` that odd edge case would be removed.
+
+So I probably would be okay with allowing `obj[]` syntactically, as long as the dict type could be made to reject it.
+
+For what it's worth, NumPy uses `None` to indicate inserting a new
+axis/dimensions (we have an `np.newaxis` alias as well):
+
+    arr = np.array(5)
+    arr.ndim == 0
+    arr[None].ndim == arr[None,].ndim == 1
+
+So that would be problematic. There are two (subtly different [1])
+acceptable choices for `ndarray[]`:
+
+I think it is worth directly discussing the availability of slices in PEP 472-style keyword indices, since we seem to have mostly converged on a dunder method signature. This is an issue that has been alluded to regarding keyword-based (labelled) indices but not directly addressed. The basic syntax would be something like d[x=1:3].
+
+
+Type hints are indeed dispatched differently, but this is done based on information that is only available at runtime. Since PEP 560, for `x[y]`, if no `__getitem__` method is found, and `x` is a type (class) object, and `x` has a class method `__class_getitem__`, that method is called. Extending this with keyword args is straightforward. Modifying the compiler to generate different bytecode for this case is essentially impossible.
+
+See https://github.com/python/cpython/blob/6844b56176c41f0a0e25fcd4fef5463bcdbc7d7c/Objects/abstract.c#L181-L198 for the code (it's part of PyObject_GetItem).
+
+
+Should
+
+    q[1, 2, k=3]
+    q[(1, 2), k=3]
+
+evaluate the same way?
+
+That is, `d[::]` is syntactically valid, but `d[(::)]` is not. Try it.
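You can try the PEP 560 dispatch mentioned above, and the tuple-packing questions around it, directly. A minimal sketch with the illustrative class names `Vector` and `Echo`:

```python
class Vector:
    # PEP 560: Vector[...] calls this method because Vector is a class whose
    # metaclass (type) has no __getitem__. It is implicitly a classmethod.
    def __class_getitem__(cls, item):
        return (cls.__name__, item)


print(Vector[int])        # ('Vector', <class 'int'>)
print(Vector[int, str])   # ('Vector', (<class 'int'>, <class 'str'>))
print(Vector[()])         # ('Vector', ()), the spelling used for Tuple[()]


class Echo:
    def __getitem__(self, key):
        return key


q = Echo()
assert q[1, 2] == q[(1, 2)] == (1, 2)   # the two spellings are identical today
assert q[::] == slice(None, None, None) # slice syntax is only legal inside []
```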
+ + Alternative Syntax and semantics @@ -508,6 +564,22 @@ lass A: ('hello', 1, 0, 2)* +Objection 1: Slowing Things Down + +The INTENDED EFFECT of the changes to internals will be as Jonathan Fine described: every time a subscript operation occurs, this new dunder attribute gets investigated on the class, and if it is not present then the default key translation function is executed. + +If things were implemented in exactly that way, obviously it would slow everything down a lot. Every subscripting operation gets slowed down everywhere and that's probably not an option. + +the new dunder is *only* effective if added at class creation time, +not if it's added later. You may not care about this, but it is a very +different behaviour than any other dunder method Python supports - so +quite apart from the problems people would have learning and +remembering that this is a special case, you have to document *in your +proposal* that you intend to allow this. And other people *do* care +about disallowing dynamic features like monkeypatching. + + + Adding an adapter function -------------------------- @@ -585,6 +657,117 @@ Moreover, additional ideas and discussion occurred during the re-evaluation of t Has anyone suggested attaching the keyword args as attributes on the slice object? +We'll also need to decide how to combine subscripts and keywords: + + obj[a, b:c, x=1] + # is this a tuple argument (a, slice(b, c), key(x=1)) + # or key argument key(a, slice(b, c), x=1) + +would get the job done, but requires everyone who +needs keyword arguments to parse the tuple and/or key object by hand to +extract them. Having done something similiar in the past (emulating +keyword-only arguments in Python 2), I can tell you this is painful. + +It would also open up to the get/set/del function to always accept arbitrary +keyword arguments, whether they make sense or not. We want the developer +to be able to specify which arguments make sense and which ones do not. 
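The sketch below contrasts two approaches: letting the method signature declare the accepted keywords, and parsing a packed key by hand. It assumes keyword indices would arrive as ordinary keyword arguments, emulated here with a direct call. The packed-key alternative is modelled as a hypothetical `(index, dict_of_keywords)` tuple, and both class names are illustrative:

```python
class Declared:
    # Keyword-only parameters state which labels make sense; anything else is
    # rejected by the interpreter's normal argument binding.
    def __getitem__(self, index, *, unit="m"):
        return index, unit


class HandParsed:
    # Hypothetical alternative: everything arrives as one packed key.
    def __getitem__(self, key):
        index, keywords = key
        unknown = set(keywords) - {"unit"}
        if unknown:
            raise TypeError(f"unexpected keyword indices: {sorted(unknown)}")
        return index, keywords.get("unit", "m")


d, h = Declared(), HandParsed()
assert d.__getitem__(3, unit="cm") == h[3, {"unit": "cm"}] == (3, "cm")

for call in (lambda: d.__getitem__(3, colour="red"),
             lambda: h[3, {"colour": "red"}]):
    try:
        call()
    except TypeError as exc:
        print("TypeError:", exc)
```

The results are equivalent, but `HandParsed` has to repeat by hand the validation that `Declared` gets for free.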
+ + +Again, implicit on your argument here is the assumption that all keyword +indices necessarily map into positional indices. This may be the case with the +use-case you had in mind. But for other use-cases brought up so far that +assumption is false. Your approach would make those use cases extremely +difficult if not impossible. + +xarray, which is the primary python package for numpy arrays with labelled +dimensions. It supports adding and indexing by additional dimensions that +don't correspond directly to the dimensions of the underlying numpy array, and +those have no position to match up to. They are called "non-dimension +coordinates". + +Other people have wanted to allow parameters to be added when indexing, +arguments in the index that change how the indexing behaves. These don't +correspond to any dimension, either. + + +Adding keywords to indexation for custom classes is not +the same as modifying the standard dict type for typing. + + +Common objections +================= + +> Just use a method call. + +One of the use cases is typing, where the [] is used exclusively, and function calls are out of the question. +Moreover, function calls do not handle slice notation, which is commonly used in some cases for arrays. + +One problem is type hint creation has been extended to built-ins in python 3.9, so that you do not have to import Dict, List, et al anymore. + +Without kwd args inside [ ], you would not be able to do this: + +Vector = dict[i=float, j=float] + +...but for obvious reasons, call syntax using built ins to create custom type hints isn't an option : + +dict(i=float, j=float) # this syntax isn't available + +We could treat d[1, a=3] either as d[1,] + kwargs or as d[1] + kwargs. Have people debated this yet? + + +I don't think that anyone wants adding a keyword to a single-valued +subscript to change it to a tuple. At least, I really hope that nobody +wants this! 
+ +So given the current behaviour: + + obj[1] # calls __getitem__(1) + obj[1,] # calls __getitem__((1,)) + +I expect that the first will be the most common. If we add a keyword to +the subscript: + + obj[1, a=3] + +I would expect that the it turns into `__getitem__(1, a=3)` which is +almost surely what the reader and coder expects. It would be quite weird +for the subscript 1 to turn into a tuple just because I add a keyword. + + +What I should have said was that a[1,] +would continue to create a tuple, regardless of whether old or new style +indexing was happening. + + +d[1] -> d.__getitem__(1) +d[1,] -> d.__getitem__((1,)) +d[1, 2] -> d.__getitem__((1, 2)) +d[a=3] -> d.__getitem__((), a=3) +d[1, a=3] -> d.__getitem__((1,), a=3) +d[1, 2, a=3] -> d.__getitem__((1, 2), a=3) + +d[1] = val -> d.__setitem__(1, val) +d[1,] = val -> d.__setitem__((1,), val) +d[1, 2] = val -> d.__setitem__((1, 2), val) +d[a=3] = val -> d.__setitem__((), val, a=3) +d[1, a=3] = val -> d.__setitem__((1,), val, a=3) +d[1, 2, a=3] = val -> d.__setitem__((1, 2), val, a=3) + +SHOULD BE: +d[1, a=3] -> d.__getitem__(1, a=3) +SHOULD BE: +d[1, a=3] = val -> d.__setitem__(1, val, a=3) + + + +If you're worried about people doing things like + + a[1, 2, 3, value = 4] = 5 + +I'm not sure that's really a problem -- usually it will result in +an exception due to specifying more than one value for a parameter. + + From d3dd065f6e5949d86f0746c1ff5d4745b3861e92 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 14 Sep 2020 08:22:31 +0100 Subject: [PATCH 11/29] Added Steven points --- pep-9999.txt | 330 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 330 insertions(+) diff --git a/pep-9999.txt b/pep-9999.txt index 96946744fbc..c2afcc0ced2 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -295,6 +295,208 @@ as well. 
This syntax is valid: Alternative Syntax and Semantics (Steven's proposal) ==================================================== + +(1) An empty subscript is still illegal, regardless of context. + + obj[] # SyntaxError + + +(2) A single subscript remains a single argument: + + obj[index] + # calls type(obj).__getitem__(index) + + obj[index] = value + # calls type(obj).__setitem__(index, value) + + del obj[index] + # calls type(obj).__delitem__(index) + +(This remains the case even if the index is followed by keywords; see +point 5 below.) + + +(3) Comma-seperated arguments are still parsed as a tuple and passed as +a single positional argument: + + obj[spam, eggs] + # calls type(obj).__getitem__((spam, eggs)) + + obj[spam, eggs] = value + # calls type(obj).__setitem__((spam, eggs), value) + + del obj[spam, eggs] + # calls type(obj).__delitem__((spam, eggs)) + + +Points (1) to (3) mean that classes which do not want to support keyword +arguments in subscripts need do nothing at all. (Completely backwards +compatible.) + + +(4) Keyword arguments, if any, must follow positional arguments. + + obj[1, 2, spam=None, 3) # SyntaxError + +This is like function calls, where intermixing positional and keyword +arguments give a SyntaxError. + + +(5) Keyword subscripts, if any, will be handled like they are in +function calls. Examples: + + # Single index with keywords: + + obj[index, spam=1, eggs=2] + # calls type(obj).__getitem__(index, spam=1, eggs=2) + + obj[index, spam=1, eggs=2] = value + # calls type(obj).__setitem__(index, value, spam=1, eggs=2) + + del obj[index, spam=1, eggs=2] + # calls type(obj).__delitem__(index, spam=1, eggs=2) + + + # Comma-separated indices with keywords: + + obj[foo, bar, spam=1, eggs=2] + # calls type(obj).__getitem__((foo, bar), spam=1, eggs=2) + + and *mutatis mutandis* for the set and del cases. 
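Points (1) to (5) can be emulated with a small translation function that builds the argument list the interpreter would pass. This is a sketch only. `translate_subscript` and `Recorder` are hypothetical names, and `indices` is the list of comma-separated index expressions. Keyword-only subscripts follow the direct-call form given in point (9):

```python
def translate_subscript(obj, indices, kwargs=None, trailing_comma=False):
    """Hypothetical translation of obj[<indices>, <kwargs>]; not real behaviour.

    trailing_comma is True for spellings such as obj[1,].
    """
    kwargs = kwargs or {}
    getitem = type(obj).__getitem__
    if not indices and not kwargs:
        # Point (1): the real parser would raise SyntaxError for obj[].
        raise ValueError("empty subscript")
    if not indices:
        # Keyword-only subscript, as in point (9).
        return getitem(obj, **kwargs)
    if len(indices) == 1 and not trailing_comma:
        # Points (2) and (5): a single index is passed unchanged.
        return getitem(obj, indices[0], **kwargs)
    # Points (3) and (5): several indices are packed into one tuple.
    return getitem(obj, tuple(indices), **kwargs)


class Recorder:
    def __getitem__(self, index=None, **kwargs):
        return index, kwargs


r = Recorder()
# Without keywords the emulation matches today's real subscripts.
assert translate_subscript(r, [1]) == r[1] == (1, {})
assert translate_subscript(r, [1], trailing_comma=True) == r[1,] == ((1,), {})
assert translate_subscript(r, [1, 2]) == r[1, 2] == ((1, 2), {})

print(translate_subscript(r, [1], {"spam": 1}))
# (1, {'spam': 1})
print(translate_subscript(r, ["foo", "bar"], {"spam": 1, "eggs": 2}))
# (('foo', 'bar'), {'spam': 1, 'eggs': 2})
```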
+ + +(6) The same rules apply with respect to keyword subscripts as for +keywords in function calls: + +- the interpeter matches up each keyword subscript to a named parameter + in the appropriate method; + +- if a named parameter is used twice, that is an error; + +- if there are any named parameters left over (without a value) when the + keywords are all used, they are assigned their default value (if any); + +- if any such parameter doesn't have a default, that is an error; + +- if there are any keyword subscripts remaining after all the named + parameters are filled, and the method has a `**kwargs` parameter, + they are bound to the `**kwargs` parameter as a dict; + +- but if no `**kwargs` parameter is defined, it is an error. + + +(7) Sequence unpacking remains a syntax error inside subscripts: + + obj[*items] + +Reason: unpacking items would result it being immediately repacked into +a tuple. Anyone using sequence unpacking in the subscript is probably +confused as to what is happening, and it is best if they receive an +immediate syntax error with an informative error message. + + +(8) Dict unpacking is permitted: + + items = {'spam': 1, 'eggs': 2} + obj[index, **items] + # equivalent to obj[index, spam=1, eggs=2] + + +(9) Keyword-only subscripts are permitted: + + obj[spam=1, eggs=2] + # calls type(obj).__getitem__(spam=1, eggs=2) + + del obj[spam=1, eggs=2] + # calls type(obj).__delitem__(spam=1, eggs=2) + +but note that the setter is awkward since the signature requires the +first parameter: + + obj[spam=1, eggs=2] = value + # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2) + +Proposed solution: this is a runtime error unless the setitem method +gives the first parameter a default, e.g.: + + def __setitem__(self, index=None, value=None, **kwargs) + +Note that the second parameter will always be present, nevertheless, to +satisfy the interpreter, it too will require a default value. 
+ +(Editorial comment: this is undoubtably an awkward and ugly corner case, +but I am reluctant to prohibit keyword-only assignment.) + + +Comments +-------- + +(a) Non-keyword subscripts are treated the same as the status quo, +giving full backwards compatibility. + + +(b) Technically, if a class defines their getter like this: + + def __getitem__(self, index): + +then the caller could call that using keyword syntax: + + obj[index=1] + +but this should be harmless with no behavioural difference. But classes +that wish to avoid this can define their parameters as positional-only: + + def __getitem__(self, index, /): + + +(c) If the method is declared with no positional arguments (aside from +self), only keyword subscripts can be given: + + def __getitem__(self, *, index) + # requires obj[index=1] not obj[1] + +Although this is unavoidably awkward for setters: + + # Intent is for the object to only support keyword subscripts. + def __setitem__(self, i=None, value=None, /, *, index) + if i is not None: + raise TypeError('only keyword arguments permitted') + + +Gotchas +------- + +If the subscript dunders are declared to use positional-or-keyword +parameters, there may be some surprising cases when arguments are passed +to the method. Given the signature: + + def __getitem__(self, index, direction='north') + +if the caller uses this: + + obj[0, 'south'] + +they will probably be surprised by the method call: + + # expected type(obj).__getitem__(0, direction='south') + # but actually get: + obj.__getitem__((0, 'south'), direction='north') + + +Solution: best practice suggests that keyword subscripts should be +flagged as keyword-only when possible: + + def __getitem__(self, index, *, direction='north') + +The interpreter need not enforce this rule, as there could be scenarios +where this is the desired behaviour. But linters may choose to warn +about subscript methods which don't use the keyword-only flag. 
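The gotcha already happens in current Python, because tuple packing takes place before `__getitem__` is called. A short sketch with illustrative class names follows. The last call emulates the proposed `s[0, direction='south']` by invoking the dunder directly:

```python
class Compass:
    def __getitem__(self, index, direction="north"):
        return index, direction


class SafeCompass:
    # Keyword-only: "direction" can only ever be supplied by name.
    def __getitem__(self, index, *, direction="north"):
        return index, direction


c = Compass()
# The comma packs (0, 'south') into one tuple first, so 'south' never
# reaches the direction parameter.
print(c[0, "south"])  # ((0, 'south'), 'north')

s = SafeCompass()
# Emulating the hypothetical s[0, direction='south']:
print(type(s).__getitem__(s, 0, direction="south"))  # (0, 'south')
```

`s[0, "south"]` still packs a tuple in the same way. The keyword-only marker does not change that. What it does is make the intended calling convention explicit, and it rejects a second positional value outright.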
+ + + + + + Steven's proposal thing[ind1, ind2, kwd1=v1, kw2=v2] = value @@ -769,6 +971,134 @@ an exception due to specifying more than one value for a parameter. +> 2. m.__get__(1, 2, a=3, b=4) # change positional argument handling from +> current behavior + +Advantages: + +1. Consistency with other methods and functions. + +Disadvantages: + +1. Breaks backwards compatibility. + +2. Will require a long and painful transition period during which time +libraries will have to somehow support both calling conventions. + + + + +> 3. m.__get__((1, 2), {'a': 3, 'b': 4}) # handling of positional +> arguments unchanged from current behavior + +I assume that if there are no keyword arguments given, only the +first argument is passed to the method (as opposed to passing an +empty dict). If not, the advantages listed below disappear. + +Advantages: + +(1) Existing positional only subscripting does not change (backwards +compatible). + +(2) Requires no extra effort for developers who don't need or want +keyword parameters in their subscript methods. Just do nothing. + +Disadvantages: + +(1) Forces people to do their own parsing of keyword arguments to local +variables inside the method, instead of allowing the interpreter to do +it. + +(2) Compounds the "Special case breaks the rules" of subscript methods +to keyword arguments as well as positional arguments. + +(3) It's not really clear to me that anyone actually wants this, apart +from just suggesting it as an option. What's the concrete use-case for +this? + + + + +> 4. m.__get__(KeyObject( (1, 2), {'a': 3, 'b': 4} )) # change +> positional argument handling from current behavior only in the case that +> kwd args are provided + +Use-case: you want to wrap an arbitrary number of positional arguments, +plus an arbitrary set of keyword arguments, into a single hashable "key +object", for some unstated reason, and be able to store that key object +into a dict. 
+
+Advantage (double-edged, possible):
+
+(1) Requires no change to the method signature to support keyword
+parameters (whether you want them or not, you will get them).
+
+Disadvantages:
+
+(1) If you don't want keyword parameters in your subscript methods, you
+can't just *do nothing* and have them be a TypeError; you have to
+explicitly check for a KeyObject argument and raise:
+
+    def __getitem__(self, index):
+        if isinstance(index, KeyObject):
+            raise TypeError('MyClass index takes no keyword arguments')
+
+(2) Seems to be a completely artificial and useless use-case to me. If
+there is a concrete use-case for this, either I have missed it (in
+which case my apologies) or Jonathan seems to be unwilling or unable to
+give it. But if you really wanted it, you could get it with this
+signature and a single line in the body:
+
+    def __getitem__(self, *args, **kw):
+        key = KeyObject(*args, **kw)
+
+(3) Forces those who want named keyword parameters to parse them from
+the KeyObject value themselves.
+
+Since named keyword parameters are surely going to be the most common
+use-case (just as they are for other functions), this makes the common
+case difficult and the rare and unusual case easy.
+
+(4) KeyObject doesn't exist. We would need a new builtin type to support
+this, as well as the new syntax. This increases the complexity and
+maintenance burden of this new feature.
+
+(5) Compounds the "kind of screwy" (Greg's words) nature of subscripting
+by extending it to keyword arguments as well as positional arguments.
+
+
+
+but it would still not know that:
+
+t = (1,2,3)
+something[t]
+
+is the same as:
+
+something[1,2,3]
+
+would it?
+
+
+
+
+ a[17, 42]
+ a[time = 17, money = 42]
+ a[money = 42, time = 17]
+
+With a fresh new dunder, it's dead simple:
+
+ def __getindex__(self, time, money):
+ ...
+
+With a __getitem__ that's been enhanced to take keyword args, but
+still get positional args packed into a tuple, it's nowhere near
+as easy.
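The sketch below works through the `time`/`money` example to show the extra unpacking step. `__getindex__` is the hypothetical dunder from the message. Here it is defined as an ordinary method and only ever called explicitly. `Ledger` is an illustrative name, and the direct `__getitem__` calls emulate keyword subscripts:

```python
class Ledger:
    # Shape of the hypothetical new dunder: indices are bound exactly like
    # function arguments.
    def __getindex__(self, time, money):
        return f"time={time}, money={money}"

    # Enhanced __getitem__ (tuple convention): positional indices still
    # arrive packed, so they must be unpacked by hand before binding.
    def __getitem__(self, key=(), **kwargs):
        args = key if isinstance(key, tuple) else (key,)
        return self.__getindex__(*args, **kwargs)


a = Ledger()
print(a[17, 42])                                   # time=17, money=42
# Direct calls emulate a[time=17, money=42] and a[17, money=42]:
print(type(a).__getitem__(a, money=42, time=17))   # time=17, money=42
print(type(a).__getitem__(a, 17, money=42))        # time=17, money=42
```

The `isinstance` check is exactly where the ambiguity raised above bites. `a[t]` with `t = (17, 42)` is indistinguishable from `a[17, 42]`.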
+ + + + + References From 369f5d32d5a8236779a85301fa974624ea56eb61 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 14 Sep 2020 17:11:12 +0100 Subject: [PATCH 12/29] Polishing the use cases and general opening discussion --- pep-9999.txt | 66 ++++++++++++++++++++-------------------------------- 1 file changed, 25 insertions(+), 41 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index c2afcc0ced2..c35ef9b14ee 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -18,7 +18,7 @@ Abstract This PEP proposed extending python to allow keyword-like arguments to be accepted during indexing operations. Notations in the form ``a[42, K=3, R=2]`` -would become legal syntax. A final strategy will be proposed in terms of +would become legal syntax. A strategy will be proposed in terms of semantics and implementation. This PEP is a rework and expansion of PEP 472, where an extension of the @@ -32,9 +32,14 @@ Overview Background ---------- -PEP 472 was opened in 2014. The PEP focused on various use cases and was extracted -from a broad discussion on implementation strategies. The PEP was eventually rejected -in 2019 [#rejection]_ mostly due to lack of interest despite its 5 years of existence. +PEP 472 was opened in 2014. The PEP detailed various use cases and was created by +extracting implementation strategies from a broad discussion on the +python-ideas mailing list, although no clear consensus was reached on which strategy +should be used. Many corner cases have been examined more closely and felt +awkward, backward incompatible or both. + +The PEP was eventually rejected in 2019 [#rejection]_ mostly +due to lack of interest for the feature despite its 5 years of existence. However, with the introduction of type hints in PEP 484 [#pep-0484]_ the square bracket notation has been used consistently to enrich the typing @@ -45,15 +50,11 @@ an nd-array (xarray). 
These packages allow users to access specific data by names, but cannot currently use index notation ([]) for this functionality. As a result, a renewed interest in a more flexible syntax that would allow for -named information has been expressed in many different threads. - -During the investigation of PEP 472, many different strategies have been -proposed to expand the language, but no real consensus was reached. Many corner -cases have been examined more closely and felt awkward, backward incompatible -or both. Renewed interest was prompted by Caleb Donovick [#request-1]_ in 2019 -and Andras Tantos [#request-2]_ in 2020. These requests prompted a strong activity -on the python-ideas mailing list, where various options have been discussed and -a general consensus has been reached. +named information has been expressed occasionally in many different threads on +python-ideas, recently by Caleb Donovick [#request-1]_ in 2019 and Andras +Tantos [#request-2]_ in 2020. These requests prompted a strong activity on the +python-ideas mailing list, where the various options have been re-discussed and +a general consensus on an implementation strategy has now been reached. Use cases --------- @@ -250,46 +251,29 @@ this one isn't a[] -Compatibility Hard Points -------------------------- - -Any change to the current behavior -After discussion, it was found out that the new syntax will have a fixed set of hard points, no matter -the final implementation: - -* Invoking indexing _must_ accept some object. E.g. `a[]` is still syntax error. -* It must be possible to mix single values and named indexes, e.g. `a[1, 2, foo=3]` -* No walrus allowed. E.g. `a[foo:=3] is disallowed. - - New Proposal ------------ -We propose to allow notations involving keyword arguments in the indexing -operation, e.g. +The new notation will make all of the following valid notation: :: - >>> a[K=3, R=2] - -which would allow a more flexible way to indicise content. 
- -One must additionally consider the extended form that allows both positional -and keyword specification - -:: - - >>> a[3, R=3, K=4] - + >>> a[1] # Current case, single index + >>> a[1, 2] # Current case, multiple indexes + >>> a[1, 2:5] # Current case, slicing. + >>> a[3, R=3, K=4] # New case. Single index, and keyword arguments + >>> a[K=3, R=2] # New case. No index with keyword arguments + >>> a[3, R=3:10, K=4] # New case. Slice in keyword argument -We also ensure that the current semantic for slices is applied to keyword arguments -as well. This syntax is valid: +The new notation will NOT make the following valid notation: :: - >>> a[3, R=3:10, K=4] + >>> a[] # INVALID. No index and no keyword arguments. +Throughout this proposal, we will stress the difference between _index_ and keyword _argument_, as it is important +to understand the fundamental Alternative Syntax and Semantics (Steven's proposal) From 254fdfc108da25b2e9c731571c9d9dcce1fe3b0f Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Tue, 15 Sep 2020 08:23:45 +0100 Subject: [PATCH 13/29] Polished all cases from Steven proposal --- pep-9999.txt | 289 +++++++++++++++++++++++++++------------------------ 1 file changed, 153 insertions(+), 136 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index c35ef9b14ee..764fb207f43 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -254,6 +254,34 @@ this one isn't New Proposal ------------ +Before describing the new proposal, it is important to stress the difference in +nomenclature between _index_ and keyword _argument_, as it is important to +understand the fundamental asymmetry between the two. The ``__(get|set|del)item__`` +is fundamentally an indexing operation, and the way the element is retrieved, +set, or deleted is through an index. + +The current status quo is to build a _final_ index from what is passed between +square brackets, the _positional_ index. 
In other words, what is passed in the
+square brackets is trivially used to generate what the code in ``__getitem__`` then uses
+for the indexing operation. As we already saw for the dict, ``d[1]`` has a
+positional index of ``1`` and also a final index of ``1`` (because it's the element that is
+then added to the dictionary), and ``d[1, 2]`` has a positional index of ``(1, 2)`` and
+a final index also of ``(1, 2)`` (because, again, it's the element that is added to the dictionary).
+However, the positional index of ``d[1, 2:3]`` is not accepted by the dictionary, because
+the dictionary has no way to transform it into a final index: the slice object is
+unhashable. The positional index is what is currently known as the ``index`` parameter in
+``__getitem__``. Nevertheless, nothing prevents one from writing a dictionary-like class that
+creates the final index by, e.g., converting the positional index to a string.
+
+The new proposal extends the current status quo and grants more flexibility to
+create the _final_ index via an enhanced syntax that combines the positional index
+and keyword arguments, if passed.
+
+This highlights an important point: in the context of the index operation, keyword
+arguments may be used to make indexing decisions that produce the final index, and
+therefore must accept values that are unconventional for functions. See, for
+example, use case 1, where a slice is accepted.
+
 The new notation will make all of the following valid notation:
 
 ::
@@ -264,6 +292,7 @@ The new notation will make all of the following valid notation:
     >>> a[3, R=3, K=4]     # New case. Single index, and keyword arguments
     >>> a[K=3, R=2]        # New case. No index with keyword arguments
     >>> a[3, R=3:10, K=4]  # New case. Slice in keyword argument
+    >>> a[3, R=..., K=4]     # New case.
Ellipsis in keyword argument The new notation will NOT make the following valid notation: @@ -272,104 +301,107 @@ The new notation will NOT make the following valid notation: >>> a[] # INVALID. No index and no keyword arguments. -Throughout this proposal, we will stress the difference between _index_ and keyword _argument_, as it is important -to understand the fundamental +Syntax and Semantics +==================== -Alternative Syntax and Semantics (Steven's proposal) -==================================================== +The following old semantics are preserved: +1. As said above, an empty subscript is still illegal, regardless of context. -(1) An empty subscript is still illegal, regardless of context. +:: obj[] # SyntaxError -(2) A single subscript remains a single argument: +2. A single index value remains a single index value when passed: + +:: obj[index] - # calls type(obj).__getitem__(index) + # calls type(obj).__getitem__(obj, index) obj[index] = value - # calls type(obj).__setitem__(index, value) + # calls type(obj).__setitem__(obj, index, value) del obj[index] - # calls type(obj).__delitem__(index) + # calls type(obj).__delitem__(obj, index) -(This remains the case even if the index is followed by keywords; see -point 5 below.) +This remains the case even if the index is followed by keywords; see point 5 below. - -(3) Comma-seperated arguments are still parsed as a tuple and passed as +3. 
Comma-separated arguments are still parsed as a tuple and passed as
 a single positional argument:
 
+::
+
     obj[spam, eggs]
-    # calls type(obj).__getitem__((spam, eggs))
+    # calls type(obj).__getitem__(obj, (spam, eggs))
 
     obj[spam, eggs] = value
-    # calls type(obj).__setitem__((spam, eggs), value)
+    # calls type(obj).__setitem__(obj, (spam, eggs), value)
 
     del obj[spam, eggs]
-    # calls type(obj).__delitem__((spam, eggs))
+    # calls type(obj).__delitem__(obj, (spam, eggs))
 
 
-Points (1) to (3) mean that classes which do not want to support keyword
-arguments in subscripts need do nothing at all. (Completely backwards
-compatible.)
+The points above mean that classes which do not want to support keyword
+arguments in subscripts need do nothing at all, and the feature is therefore
+completely backwards compatible.
 
+4. Keyword arguments, if any, must follow positional arguments.
 
-(4) Keyword arguments, if any, must follow positional arguments.
+::
 
-    obj[1, 2, spam=None, 3)  # SyntaxError
+    obj[1, 2, spam=None, 3]  # SyntaxError
 
 This is like function calls, where intermixing positional and keyword
 arguments give a SyntaxError.
-
-(5) Keyword subscripts, if any, will be handled like they are in
+5. Keyword subscripts, if any, will be handled like they are in
 function calls.
Examples: +:: + # Single index with keywords: obj[index, spam=1, eggs=2] - # calls type(obj).__getitem__(index, spam=1, eggs=2) + # calls type(obj).__getitem__(obj, index, spam=1, eggs=2) obj[index, spam=1, eggs=2] = value - # calls type(obj).__setitem__(index, value, spam=1, eggs=2) + # calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2) del obj[index, spam=1, eggs=2] - # calls type(obj).__delitem__(index, spam=1, eggs=2) - + # calls type(obj).__delitem__(obj, index, spam=1, eggs=2) # Comma-separated indices with keywords: obj[foo, bar, spam=1, eggs=2] - # calls type(obj).__getitem__((foo, bar), spam=1, eggs=2) + # calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2) - and *mutatis mutandis* for the set and del cases. + obj[foo, bar, spam=1, eggs=2] = value + # calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2) + del obj[foo, bar, spam=1, eggs=2] + # calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2) -(6) The same rules apply with respect to keyword subscripts as for +6.
The same rules apply with respect to keyword subscripts as for keywords in function calls: -- the interpeter matches up each keyword subscript to a named parameter - in the appropriate method; - -- if a named parameter is used twice, that is an error; - -- if there are any named parameters left over (without a value) when the - keywords are all used, they are assigned their default value (if any); - -- if any such parameter doesn't have a default, that is an error; - -- if there are any keyword subscripts remaining after all the named - parameters are filled, and the method has a `**kwargs` parameter, - they are bound to the `**kwargs` parameter as a dict; + - the interpreter matches up each keyword subscript to a named parameter + in the appropriate method; + - if a named parameter is used twice, that is an error; + - if there are any named parameters left over (without a value) when the + keywords are all used, they are assigned their default value (if any); + - if any such parameter doesn't have a default, that is an error; + - if there are any keyword subscripts remaining after all the named + parameters are filled, and the method has a ``**kwargs`` parameter, + they are bound to the ``**kwargs`` parameter as a dict; + - but if no ``**kwargs`` parameter is defined, it is an error. -- but if no `**kwargs` parameter is defined, it is an error. +7. Sequence unpacking remains a syntax error inside subscripts: -(7) Sequence unpacking remains a syntax error inside subscripts: +:: obj[*items] @@ -378,90 +410,112 @@ a tuple. Anyone using sequence unpacking in the subscript is probably confused as to what is happening, and it is best if they receive an immediate syntax error with an informative error message. +8. Dict unpacking is permitted: -(8) Dict unpacking is permitted: +:: items = {'spam': 1, 'eggs': 2} obj[index, **items] # equivalent to obj[index, spam=1, eggs=2] -(9) Keyword-only subscripts are permitted: +9. Keyword-only subscripts are permitted.
The positional index will be the empty tuple: + +:: obj[spam=1, eggs=2] - # calls type(obj).__getitem__(spam=1, eggs=2) + # calls type(obj).__getitem__(obj, (), spam=1, eggs=2) + + obj[spam=1, eggs=2] = 5 + # calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2) del obj[spam=1, eggs=2] - # calls type(obj).__delitem__(spam=1, eggs=2) + # calls type(obj).__delitem__(obj, (), spam=1, eggs=2) -but note that the setter is awkward since the signature requires the -first parameter: - obj[spam=1, eggs=2] = value - # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2) +10. Keyword arguments must allow slice syntax. -Proposed solution: this is a runtime error unless the setitem method -gives the first parameter a default, e.g.: +:: - def __setitem__(self, index=None, value=None, **kwargs) + obj[3:4, spam=1:4, eggs=2] + # calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2) -Note that the second parameter will always be present, nevertheless, to -satisfy the interpreter, it too will require a default value. -(Editorial comment: this is undoubtably an awkward and ugly corner case, -but I am reluctant to prohibit keyword-only assignment.) +This may open up the possibility to accept the same syntax for general function +calls, but this is not part of this recommendation. +11. Keyword arguments must allow Ellipsis + +:: + + obj[..., spam=..., eggs=2] + # calls type(obj).__getitem__(obj, Ellipsis, spam=Ellipsis, eggs=2) -Comments --------- -(a) Non-keyword subscripts are treated the same as the status quo, -giving full backwards compatibility. +12. Keyword arguments allow for default values + +:: + # Given type(obj).__getitem__(obj, index, spam=True, eggs=2) + obj[3] # Valid. index = 3, spam = True, eggs = 2 + obj[3, spam=False] # Valid. index = 3, spam = False, eggs = 2 + obj[spam=False] # Valid. index = (), spam = False, eggs = 2 + obj[] # Invalid. 
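The proposed syntax is not valid in any released Python. The rules above can, however, be approximated today by calling the dunder explicitly. The following is a minimal sketch under that assumption: the hypothetical helper ``emulated_getitem(obj, *indices, **keywords)`` stands in for ``obj[...]``. Neither it nor the ``Demo`` class is part of the proposal. The helper builds the single positional index according to rules 1, 2, 3, 5 and 9. It then leaves rules 6, 8 and 12 to ordinary argument binding.

```python
# Hypothetical emulation: keyword subscripts are not valid syntax today,
# so emulated_getitem(obj, ...) stands in for obj[...].
def emulated_getitem(obj, *indices, **keywords):
    if not indices and not keywords:
        # Rule 1: an empty subscript stays illegal.
        raise SyntaxError("obj[] is not allowed")
    if not indices:
        index = ()              # Rule 9: keyword-only subscript -> empty tuple
    elif len(indices) == 1:
        index = indices[0]      # Rules 2 and 5: a single index is passed as-is
    else:
        index = indices         # Rule 3: comma-separated indices -> one tuple
    # Rule 6: ordinary argument binding matches keywords to named parameters.
    return type(obj).__getitem__(obj, index, **keywords)


class Demo:
    # Positional-only index, keyword-only options with defaults (rule 12).
    def __getitem__(self, index, /, *, spam=True, eggs=2):
        return index, spam, eggs


d = Demo()
print(emulated_getitem(d, 3))                  # (3, True, 2)
print(emulated_getitem(d, 3, spam=False))      # (3, False, 2)
print(emulated_getitem(d, spam=False))         # ((), False, 2)
print(emulated_getitem(d, 1, 2, eggs=5))       # ((1, 2), True, 5)
print(emulated_getitem(d, 1, **{"spam": 0}))   # (1, 0, 2)  (rule 8)
print(emulated_getitem(d, slice(3, 4), spam=slice(1, 4)))
# (slice(3, 4, None), slice(1, 4, None), 2)  (rule 10)
```

An unknown keyword such as ``emulated_getitem(d, 1, ham=3)`` raises ``TypeError`` from normal argument binding. This is the outcome rule 6 asks for. Rule 7 cannot be reproduced this way, because a Python helper taking ``*indices`` accepts sequence unpacking.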
+ +Corner case and Gotchas +----------------------- -(b) Technically, if a class defines their getter like this: +With the introduction of the new notation, a few corner cases need to be analysed: + +1. Technically, if a class defines their getter like this: + +:: def __getitem__(self, index): then the caller could call that using keyword syntax: +:: + obj[index=1] but this should be harmless with no behavioural difference. But classes that wish to avoid this can define their parameters as positional-only: - def __getitem__(self, index, /): - +:: -(c) If the method is declared with no positional arguments (aside from -self), only keyword subscripts can be given: + def __getitem__(self, index, /): - def __getitem__(self, *, index) - # requires obj[index=1] not obj[1] +2. a similar case occurs with setter notation -Although this is unavoidably awkward for setters: +:: - # Intent is for the object to only support keyword subscripts. - def __setitem__(self, i=None, value=None, /, *, index) - if i is not None: - raise TypeError('only keyword arguments permitted') + # Given type(obj).__getitem__(self, index, value): + obj[1, value=3] = 5 -Gotchas -------- +This poses no issue because the value is passed automatically, and the python interpreter will raise +``TypeError: got multiple values for keyword argument 'value'`` + -If the subscript dunders are declared to use positional-or-keyword +3. If the subscript dunders are declared to use positional-or-keyword parameters, there may be some surprising cases when arguments are passed to the method. 
Given the signature: +:: + def __getitem__(self, index, direction='north') if the caller uses this: +:: + obj[0, 'south'] they will probably be surprised by the method call: +:: + # expected type(obj).__getitem__(0, direction='south') # but actually get: obj.__getitem__((0, 'south'), direction='north') @@ -470,6 +524,8 @@ they will probably be surprised by the method call: Solution: best practice suggests that keyword subscripts should be flagged as keyword-only when possible: +:: + def __getitem__(self, index, *, direction='north') The interpreter need not enforce this rule, as there could be scenarios @@ -481,75 +537,21 @@ about subscript methods which don't use the keyword-only flag. -Steven's proposal - -thing[ind1, ind2, kwd1=v1, kw2=v2] = value - -Translating to: - -thing.__setitem__(self, (ind1, ind2), value, kwd1=v1, kw2=v2) - -which is pretty darn weird -- particularly if you try to write the handler this way: - -def __setitem__(self, *args, **kwargs): - -so: args would always be a 2-tuple, something like: ((ind1, ind2), value) - - - def __getitem__(self, index, *, spam, eggs=None): - - -Is that even a question? - obj[index, keyword=value] -where index is any comma-separated list of expressions, including -slices, keyword is an identifier, and value is any expression including -slices. Are there even any other options being considered? -A few points: -- The index is optional, but it will be a runtime TypeError if the -method isn't defined with the corresponding parameter taking a default -value. -- You can have multiple keyword=value pairs, separated by commas. They -must all follow the index part. Duplicate keywords is a runtime error, -just as they are are for function calls. -- An empty subscript remains a syntax error, even if the method -signature would allow it. - think the simplest default value would be Ellipsis. So foo[a=1, b=2] would be equivalent to foo[..., a=1, b=2] -But I don't see why this is a problem we have to deal with. 
The index argument can just not be passed at all, and it is up to the class developer to pick an appropriate sentinel if needed. -The index argument can just not be passed at all, and it is up to the class developer to pick an appropriate sentinel if needed. -though it would be a (TypeError, rather than a SyntaxError) if no index were passed in. -As "proper" exception handling should be close to the operation, and catch specific Exceptions, most case will probably be fine. But not all. For example, there might be code in the wild that does -try: - a_function(something) -except TypeError: - do_something -And there is something in a_function that uses indexing -- someone messes with that code, and puts something new in an index that used to be a SyntaxError and is now a TypeError -- in the past, that wouldn't have even run, but now it will, and the problem might not be caught in tests because the TypeError is being handled. -Given that this would be a change from a compile time error to a runtime error, there is no code that used to work that will break, but it would be easier to write broken code in certain situations -- maybe not a huge deal, but worth thinking about. -MISSING = object() - -def __getitem__(self, key, x=MISSING, y=MISSING): - if x is MISSING and y is MISSING:: - x, y = key - if x is missing: - x, = key - if y is MISSING: - y, = key - -And probably that code I just wrote has bugs. And it gets more complicated if we want to have more arguments than just two. And even more complicated if we want some of the arguments to be positional only or any other combination of things. This is code you would not have to write if we could do this instead with a new dunder or subscript processor: @@ -764,15 +766,12 @@ remembering that this is a special case, you have to document *in your proposal* that you intend to allow this. And other people *do* care about disallowing dynamic features like monkeypatching. 
- - Adding an adapter function -------------------------- Similar to the above, in the sense that a pre-function would be called to convert the "new style" indexing into "old style indexing" that is then passed. Has problems similar to the above. - A single bit to change the behavior ----------------------------------- @@ -1081,6 +1080,24 @@ as easy. +on non-specified parameter index +but note that the setter is awkward since the signature requires the +first parameter: + + obj[spam=1, eggs=2] = value + # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2) + +Proposed solution: this is a runtime error unless the setitem method +gives the first parameter a default, e.g.: + + def __setitem__(self, index=None, value=None, **kwargs) + +Note that the second parameter will always be present, nevertheless, to +satisfy the interpreter, it too will require a default value. + +(Editorial comment: this is undoubtably an awkward and ugly corner case, +but I am reluctant to prohibit keyword-only assignment.) + From b5f090b62ca80ebb735ceca4410fe825a178ac3f Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Thu, 17 Sep 2020 08:59:20 +0100 Subject: [PATCH 14/29] More cleanup to various sections --- pep-9999.txt | 403 ++++++++++++++++++++++++--------------------------- 1 file changed, 192 insertions(+), 211 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 764fb207f43..375a047d56d 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -301,6 +301,11 @@ The new notation will NOT make the following valid notation: >>> a[] # INVALID. No index and no keyword arguments. +It is worth stressing out that none of what is proposed in this PEP will change +the behavior of the current core classes that use indexing. Adding keywords to indexation for +custom classes is not the same as modifying e.g. the standard dict type, which +will remain the same and will continue not to accept keyword arguments. 
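The claim that built-in containers will keep rejecting keyword arguments can be checked in current CPython without any new syntax. Under this proposal, ``d[1, a=2]`` would translate to roughly ``type(d).__getitem__(d, 1, a=2)``, and built-in item methods already refuse keyword arguments. A minimal sketch follows; ``rejects_keywords`` and ``Accepting`` are hypothetical names used only for illustration.

```python
def rejects_keywords(container, index):
    """Hypothetical helper: approximate container[index, a=2] under the
    proposal by calling the item dunder directly, and report whether the
    call is refused with TypeError."""
    try:
        type(container).__getitem__(container, index, a=2)
    except TypeError:
        return True
    return False


class Accepting:
    # A custom class opts in explicitly by declaring **kwargs.
    def __getitem__(self, index, /, **kwargs):
        return index, kwargs


for container, index in [({1: "one"}, 1), ([10, 20], 0), ((10, 20), 0), (Accepting(), 0)]:
    print(type(container).__name__, rejects_keywords(container, index))
# dict True
# list True
# tuple True
# Accepting False
```

Only classes that declare the extra parameters see a behavioural change. This is why the proposal leaves existing built-in types untouched.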
+ Syntax and Semantics ==================== @@ -373,6 +378,11 @@ function calls. Examples: del obj[index, spam=1, eggs=2] # calls type(obj).__delitem__(obj, index, spam=1, eggs=2) +This ensures that a single positional index will not turn into a tuple +just because one adds a keyword value. + +:: + # Comma-separated indices with keywords: obj[foo, bar, spam=1, eggs=2] @@ -384,6 +394,7 @@ function calls. Examples: del obj[foo, bar, spam=1, eggs=2] # calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2) + 6. The same rules apply with respect to keyword subscripts as for keywords in function calls: @@ -410,6 +421,9 @@ a tuple. Anyone using sequence unpacking in the subscript is probably confused as to what is happening, and it is best if they receive an immediate syntax error with an informative error message. +This restriction has however been considered arbitrary by some, and it might +be lifted at a later stage for symmetry with kwargs unpacking, see next. + 8. Dict unpacking is permitted: :: @@ -462,6 +476,15 @@ calls, but this is not part of this recommendation. obj[] # Invalid. +Existing indexing implementations in standard classes +----------------------------------------------------- + +As said before, we recommend that current classes that use indexing operations +do not modify their behavior. In other words, if ``d`` is a ``dict``, the +statement ``d[1, a=2]`` will raise ``TypeError``, as its implementation does +not support the use of keyword arguments. The same holds for all other built-in +classes (list, tuple, str, etc.). + Corner case and Gotchas ----------------------- @@ -473,19 +496,30 @@ With the introduction of the new notation, a few corner cases need to be analyse def __getitem__(self, index): -then the caller could call that using keyword syntax: +then the caller could call that using keyword syntax, like these two cases: :: + obj[3, index=4] obj[index=1] -but this should be harmless with no behavioural difference.
But classes -that wish to avoid this can define their parameters as positional-only: +The resulting behavior would be an error automatically, since it would be like +attempting to call the method with two values for the ``index`` argument, and +a ``TypeError`` will be raised. In the first case, the positional ``index`` would be ``3``; +in the second case, it would be the empty tuple ``()``. + +Note that this behavior applies to all currently existing classes that rely on +indexing, meaning that there is no way for the new behavior to introduce +backward compatibility issues in this respect. + +Classes that wish to stress this behavior explicitly can define their +parameters as positional-only: :: def __getitem__(self, index, /): + 2. a similar case occurs with setter notation :: @@ -533,208 +567,207 @@ where this is the desired behaviour. But linters may choose to warn about subscript methods which don't use the keyword-only flag. +4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.: + ``d[1, a=3]`` is treated as ``__getitem__(1, a=3)``, not ``__getitem__((1,), a=3)``. + In other words, adding a keyword to a single-valued subscript will not change it into a tuple. + For those cases where an actual tuple needs to be passed, explicit parentheses are required. + Given the current behaviour, the fourth case below is required to + disambiguate the call, making it an explicit tuple. Note that this behavior + reveals that the ``obj[1,]`` notation is shorthand for + ``obj[(1,)]`` (and also ``obj[1]`` is shorthand for ``obj[(1)]``, with the expected behavior). + When keywords are present, the rule that you can omit this outermost pair of parentheses is no + longer true.
+:: + obj[1] # calls __getitem__(1) + obj[1, a=3] # calls __getitem__(1, a=3) + obj[1,] # calls __getitem__((1,)) + obj[(1,), a=3] # calls __getitem__((1,), a=3) +This is particularly relevant in the following case: +:: + obj[1, 2] # calls __getitem__((1, 2)) + obj[(1, 2)] # same as above + obj[1, 2, a=3] # calls __getitem__((1, 2), a=3) + obj[(1, 2), a=3] # calls __getitem__((1, 2), a=3): a single index that happens to be a tuple +Rejected Ideas +============== +Previous PEP 472 solutions +-------------------------- +PEP 472 presents a good number of ideas that are now all to be considered +Rejected. A personal email from D'Aprano to the Author specifically said: +"I have now carefully read through PEP 472 in full, and I am afraid I +cannot support any of the strategies currently in the PEP." +We agree that those options are inferior to the one currently presented, for one +reason or another. +Adding new dunders +------------------ +It was proposed to introduce new dunders ``__(get|set|del)item_ex__`` +that are invoked in preference to the ``__(get|set|del)item__`` triad, if they are present. +The rationale behind this choice is to make the intuition around how to add keyword +argument support to square brackets more obvious and in line with the function +behavior. +Adding an adapter function +-------------------------- +Similar to the above, in the sense that a pre-function would be called to +convert the "new style" indexing into "old style indexing" that is then passed. +It has problems similar to the above. +Create a new "kwslice" object +----------------------------- -This is code you would not have to write if we could do this instead with a new dunder or subscript processor: - -def __getx__(self, x, y): ... - -And these all just work: - -q[1, 2] -q[1, y=2] -q[y=2, x=1] - -1 is assigned to x and 2 is assigned to y in all of these for both versions, but the second certain requires not parsing of parameters.
- - -We could treat d[1, a=3] either as d[1,] + kwargs or as d[1] + kwargs. Have people debated this yet? - - -I don't think that anyone wants adding a keyword to a single-valued -subscript to change it to a tuple. At least, I really hope that nobody -wants this! - -So given the current behaviour: - - obj[1] # calls __getitem__(1) - obj[1,] # calls __getitem__((1,)) - -I expect that the first will be the most common. If we add a keyword to -the subscript: - - obj[1, a=3] +This proposal has already been explored in "New arguments contents" P4 in PEP 472. -I would expect that the it turns into `__getitem__(1, a=3)` which is -almost surely what the reader and coder expects. It would be quite weird -for the subscript 1 to turn into a tuple just because I add a keyword. +:: -That does leave the second case a little trickier to add a keyword to, -it would require a pair of parens to disambiguate it from above: + obj[a, b:c, x=1] # calls __getitem__(a, slice(b, c), key(x=1)) - obj[(1,), a=3] +This solution requires everyone who needs keyword arguments to parse the tuple +and/or key object by hand to extract them. This is painful and opens up to the +get/set/del function to always accept arbitrary keyword arguments, whether they +make sense or not. We want the developer to be able to specify which arguments +make sense and which ones do not. -but I think that's likely to be obvious to the developer who is adding -in the keyword where previously no keyword existed. +Allowing for empty index notation obj[] +--------------------------------------- +The current proposal prevents ``obj[]`` from being valid notation. However +a commenter stated -That's a fair ruling. In general, when keywords are present, the rule that you can always omit an outermost pair of parentheses is no longer true. That is, d[(...)] and d[...] are always equivalent regardless what "..." stands for, as long as (...) is a valid expression (which it isn't if there are slices involved). 
Example: ``` -d[1] ~~~ d[(1)] -d[1,] ~~~ d[(1,)] -d[1, 2] ~~~ d[(1, 2)] +We have `Tuple[int, int]` as a tuple of two integers. And we have `Tuple[int]` +as a tuple of one integer. And occasionally we need to spell a tuple of *no* +values, since that's the type of `()`. But we currently are forced to write +that as `Tuple[()]`. If we allowed `Tuple[]` that odd edge case would be +removed. + +So I probably would be okay with allowing `obj[]` syntactically, as long as the +dict type could be made to reject it. ``` -But there is absolutely no such rule if keywords are present. - -FYI, Jonathan's post (once I "got" it) led me to a new way of reasoning about the various proposals (__keyfn__, __subscript__ and what I will keep calling "Steven's proposal") based on what the compiler and interpreter need to do to support this corner case. My tentative conclusion is that Steven's proposal is superior. But I have been reviewing my reasoning and pseudo-code a few times and I'm still not happy with it, so posting it will have to wait. - - -What the dict class's `__getitem__` would do with that is a different issue -- probably it would be an error. - - -> Doesn't that mean that a "index" will not be an allowable index label, and that this conflict will depend on knowing the particular implementation details of the dunder methods? -> - -Yes, that would be correct. However, the function could instead be defined as: - -def __getitem__(self, index, /, **kwargs): - ... - -and then there'd be no conflict (as "self" and "index" must be passed -positionally). - -Good edge case to consider. But would it really be such a problem? If you have an existing class like this: - - class C: - def __getitem__(self, index): ... - - c = C() -then presumably calling `c[1, index=2]` would just be an error (since it would be like attempting to call the method with two values for the `index` argument), and ditto for `c[1, 2, 3, index=4]`. 
The only odd case might be `c[index=1]` -- but presumably that would be equivalent to `c[(), index=1]` so it would still fail. +This proposal already established that, in case no positional index is given, the +passed value must be the empty tuple. Allowing for the empty index notation would +make the dictionary type accept it automatically, to insert or refer to the value with +the empty tuple as key. Moreover, a typing notation such as ``Tuple[]`` can easily +be written as ``Tuple`` without the indexing notation. -My point is that all existing `__getitem__` implementations will raise errors if any keywords are given, even if the keyword happens to correspond to the name of the argument (say, `index`). This is to counter Chris B's concern that if an existing `__getitem__` implementation didn't use the '/' notation to indicate that `self` and `index` are positional, it would have a bug. I claim that the bug will *only* be present when someone adds keyword support to their `__getitem__` method without using the '/'. Since that's not existing code, this "proves" that adding this feature would not introduce a subtle bug in a lot of existing code -- only in carelessly written new (or updated) code. +Use None instead of the empty tuple when no positional index is given +--------------------------------------------------------------------- -The primary issue I was trying to find a way to reliably and clearly avoid conflicts between the index labels and the positional argument names. So if you have: +The case ``obj[k=3]`` will lead to a call ``__getitem__((), k=3)``. +The alternative ``__getitem__(None, k=3)`` was considered but rejected: +NumPy uses `None` to indicate inserting a new axis/dimensions (there's +a `np.newaxis` alias as well): -__getitem__(self, index, **kwargs) - -You can't have an index label named "index", because it conflicts with the "index" positional argument. 
- -Apparently that isn't an issue if you structure it like this instead: - -__getitem__(self, index, /, **kwargs) +:: -But projects would need to know to do that. + arr = np.array(5) + arr.ndim == 0 + arr[None].ndim == arr[None,].ndim == 1 -> In the "New syntax", wouldn't these examples map to: -> -> d[1, 2, a=3] => d.__getitem__((1, 2), a=3) -> and -> d[(1, 2), a=3] => d.__getitem__((1, 2), a=3) -> -Not quite. The second should be: +So the final conclusion is that we favor the following series: -d[(1, 2), a=3] => d.__getitem__(((1, 2),), a=3) +:: - py> d = Demo() - py> d[(1, 2)] # Tuple single arg. - (1, 2) - py> d[1, 2] # Still a tuple. - (1, 2) + obj[k=3] # __getitem__((), k=3). Empty tuple + obj[1, k=3] # __getitem__(1, k=3). Integer + obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple +more than this: -Adding a keyword arg should not change this. +:: -An extra **kwds would be quite sufficient for xarray. We don't need to distinguish between `d[day=3, detector=4]` and `d[day=4, detector=3]`, at least not any differently from normal Python keyword arguments. + obj[k=3] # __getitem__(None, k=3). None + obj[1, k=3] # __getitem__(1, k=3). Integer + obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple -One question that comes up: should d[**kwargs] be valid syntax? d[*args] currently is not, but that's OK since d[tuple(args)] is identical. +With the first more in line with a *args semantics for calling a routine with +no positional arguments -On the other hand, we probably do need d[**kwargs] since there's no way to dynamically unpack keyword arguments (short of directly calling __getitem__). And perhaps for symmetry this suggests d[*args] should be valid, too, defined as equivalent to d[tuple(args)]. +:: + >>> def foo(*args, **kwargs): + ... print(args, kwargs) + ... + >>> foo(k=3) + () {'k': 3} -If d[] were to be allowed, I would expect it to pass an empty -tuple as the index, since it's the limiting case of reducing the -number of positional indices. 
+Although we accept the following asymmetry: +:: -We have `Tuple[int, int]` as a tuple of two integers. And we have `Tuple[int]` as a tuple of one integer. And occasionally we need to spell a tuple of *no* values, since that's the type of `()`. But we currently are forced to write that as `Tuple[()]`. If we allowed `Tuple[]` that odd edge case would be removed. + >>> foo(1, k=3) + (1,) {'k': 3} -So I probably would be okay with allowing `obj[]` syntactically, as long as the dict type could be made to reject it. -For what its worth, NumPy uses `None` to indicate inserting a new -axis/dimensions (we have an `np.newaxis` alias as well): +Common objections +================= - arr = np.array(5) - arr.ndim == 0 - arr[None].ndim == arr[None,].ndim == 1 +1. Just use a method call. -So that would be problematic. There are two (subtly different [1]) -acceptable choices for `ndarray[]`: +One of the use cases is typing, where the indexing is used exclusively, and +function calls are out of the question. Moreover, function calls do not handle +slice notation, which is commonly used in some cases for arrays. -I think it is worth directly discussing the availability of slices in PEP 472-style keyword indices, since we seem to have mostly converged on a dunder method signature. This is an issue that has been alluded to regarding keyword-based (labelled) indices but not directly addressed. The basic syntax would be something like d[x=1:3]. +One problem is type hint creation has been extended to built-ins in python 3.9, +so that you do not have to import Dict, List, et al anymore. +Without kwdargs inside [], you would not be able to do this: -Type hints are indeed dispatched differently, but this is done based on information that is only available at runtime. Since PEP 560, for `x[y]`, if no `__getitem__` method is found, and `x` is a type (class) object, and `x` has a class method `__class_getitem__`, that method is called. Extending this with keyword args is straightforward. 
Modifying the compiler to generate different bytecode for this case is essentially impossible. +:: + + Vector = dict[i=float, j=float] -See https://github.com/python/cpython/blob/6844b56176c41f0a0e25fcd4fef5463bcdbc7d7c/Objects/abstract.c#L181-L198 for the code (it's part of PyObject_GetItem). +but for obvious reasons, call syntax using builtins to create custom type hints +isn't an option: +:: -Should -q[1, 2, k=3] -q[(1, 2), k=3] + dict(i=float, j=float) # would create a dictionary, not a type -evaluate the same way? -hat is, `d[::]` is syntactically valid, but `d[(::)]` is not. Try it. +*********************************************************************************** +The problems with this approach were found to be: -Alternative Syntax and semantics -================================ +* It will slow down subscripting. For every subscript access, this new dunder + attribute gets investigated on the class, and if it is not present then the + default key translation function is executed. Different ideas were proposed to handle this, from wrapping the method + only at class instantiation time (would not work when monkeypatching) -Adding new dunders ------------------- +* It adds complexity -This proposal introduces new dunders __(get|set|del)item_ex__ -that are invoked over the __(get|set|del)item__ triad, if they are present. +def __getx__(self, x, y): ... -The rationale around this choice is to make the intuition around how to add kwd -arg support to square brackets more obvious and in line with the function -behavior. It would also make writing code for specialized libraries that tend -to use item dunders, like pandas and xarray, much easier. Right now such -libraries have to rely on their own efforts to break up a key, or use -"functions in stead" (e.g. iloc()) +And these all just work: +q[1, 2] +q[1, y=2] +q[y=2, x=1] +1 is assigned to x and 2 is assigned to y in all of these for both versions, but the second certain requires not parsing of parameters. 
Python does it for us. That's a lot of easily available flexibility. -Problems with this approach: -* __setitem_ex__ value would need to be the first element, because the index is of arbitrary length. -* It will slow down subscripting. For every subscript access, this new dunder - attribute gets investigated on the class, and if it is not present then the - default key translation function is executed. Different ideas were proposed to handle this, from wrapping the method - only at class instantiation time (would not work when monkeypatching) -* It adds complexity +* In __setitem_ex__ value would need to be the first element, because the index is of arbitrary length. Again, implicit on your argument here is the assumption that all keyword indices necessarily map into positional indices. This may be the case with the use-case you had in mind. But for other use-cases brought up so far that assumption is false. xarray, which is the primary python package for numpy arrays with labelled dimensions. It supports adding and indexing by additional dimensions that don't correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. They are called "non-dimension coordinates". @@ -766,11 +799,30 @@ remembering that this is a special case, you have to document *in your proposal* that you intend to allow this. And other people *do* care about disallowing dynamic features like monkeypatching. -Adding an adapter function --------------------------- -Similar to the above, in the sense that a pre-function would be called to convert the "new style" indexing into "old style indexing" that is then passed. -Has problems similar to the above. + +So that would be problematic. There are two (subtly different [1]) +acceptable choices for `ndarray[]`: + +I think it is worth directly discussing the availability of slices in PEP +472-style keyword indices, since we seem to have mostly converged on a dunder +method signature. 
This is an issue that has been alluded to regarding +keyword-based (labelled) indices but not directly addressed. The basic syntax +would be something like d[x=1:3]. + +Type hints are indeed dispatched differently, but this is done based on +information that is only available at runtime. Since PEP 560, for `x[y]`, if no +`__getitem__` method is found, and `x` is a type (class) object, and `x` has a +class method `__class_getitem__`, that method is called. Extending this with +keyword args is straightforward. Modifying the compiler to generate different +bytecode for this case is essentially impossible. + +See https://github.com/python/cpython/blob/6844b56176c41f0a0e25fcd4fef5463bcdbc7d7c/Objects/abstract.c#L181-L198 for the code (it's part of PyObject_GetItem). + + +Alternative Syntax and semantics +================================ + A single bit to change the behavior ----------------------------------- @@ -829,35 +881,6 @@ Finally, I am unsure how you would deal with the difference between d[1] and d[1 Rejected Ideas ============== -PEP 472 presents a good amount of ideas that are now all to be considered Rejected. A personal email from D'Aprano to the Author -specifically said: - -"I have now carefully read through PEP 472 in full, and I am afraid I -cannot support any of the strategies currently in the PEP." - -Moreover, additional ideas and discussion occurred during the re-evaluation of the PEP: - -1. create a new "kwslice" object - -Has anyone suggested attaching the keyword args as attributes -on the slice object? - -We'll also need to decide how to combine subscripts and keywords: - - obj[a, b:c, x=1] - # is this a tuple argument (a, slice(b, c), key(x=1)) - # or key argument key(a, slice(b, c), x=1) - -would get the job done, but requires everyone who -needs keyword arguments to parse the tuple and/or key object by hand to -extract them. 
Having done something similiar in the past (emulating -keyword-only arguments in Python 2), I can tell you this is painful. - -It would also open up to the get/set/del function to always accept arbitrary -keyword arguments, whether they make sense or not. We want the developer -to be able to specify which arguments make sense and which ones do not. - - Again, implicit on your argument here is the assumption that all keyword indices necessarily map into positional indices. This may be the case with the use-case you had in mind. But for other use-cases brought up so far that @@ -875,48 +898,6 @@ arguments in the index that change how the indexing behaves. These don't correspond to any dimension, either. -Adding keywords to indexation for custom classes is not -the same as modifying the standard dict type for typing. - - -Common objections -================= - -> Just use a method call. - -One of the use cases is typing, where the [] is used exclusively, and function calls are out of the question. -Moreover, function calls do not handle slice notation, which is commonly used in some cases for arrays. - -One problem is type hint creation has been extended to built-ins in python 3.9, so that you do not have to import Dict, List, et al anymore. - -Without kwd args inside [ ], you would not be able to do this: - -Vector = dict[i=float, j=float] - -...but for obvious reasons, call syntax using built ins to create custom type hints isn't an option : - -dict(i=float, j=float) # this syntax isn't available - -We could treat d[1, a=3] either as d[1,] + kwargs or as d[1] + kwargs. Have people debated this yet? - - -I don't think that anyone wants adding a keyword to a single-valued -subscript to change it to a tuple. At least, I really hope that nobody -wants this! - -So given the current behaviour: - - obj[1] # calls __getitem__(1) - obj[1,] # calls __getitem__((1,)) - -I expect that the first will be the most common. 
If we add a keyword to -the subscript: - - obj[1, a=3] - -I would expect that the it turns into `__getitem__(1, a=3)` which is -almost surely what the reader and coder expects. It would be quite weird -for the subscript 1 to turn into a tuple just because I add a keyword. What I should have said was that a[1,] From 2e237599aac44f1b01ef532dca2d982f788e8f48 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Fri, 18 Sep 2020 08:56:43 +0100 Subject: [PATCH 15/29] More cleanup to the extended dunder --- pep-9999.txt | 309 +++++++++++++++++++++------------------------------ 1 file changed, 125 insertions(+), 184 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 375a047d56d..e1db6d42725 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -378,11 +378,6 @@ function calls. Examples: del obj[index, spam=1, eggs=2] # calls type(obj).__delitem__(obj, index, spam=1, eggs=2) -This ensures that a single positional index will not turn into a tuple -just because one adds a keyword value. - -:: - # Comma-separated indices with keywords: obj[foo, bar, spam=1, eggs=2] @@ -394,6 +389,13 @@ just because one adds a keyword value. del obj[foo, bar, spam=1, eggs=2] # calls type(obj).__detitem__(obj, (foo, bar), spam=1, eggs=2) +Note that: + + - a single positional index will not turn into a tuple + just because one adds a keyword value. + - for ``__setitem__``, the same order is retained for index and value. + The keyword arguments go at the end, as is normal for a function + definition. 6. The same rules apply with respect to keyword subscripts as for keywords in function calls: @@ -475,6 +477,12 @@ calls, but this is not part of this recommendation. obj[spam=False] # Valid. index = (), spam = False, eggs = 2 obj[] # Invalid. +13. 
The same semantics given above must be extended to __class__getitem__ + Since PEP 560, type hints are dispatched so that for ``x[y]``, if no + ``__getitem__`` method is found, and ``x`` is a type (class) object, + and ``x`` has a class method ``__class_getitem__``, that method is + called. The same changes should be applied to this method as well, + so that a writing like ``list[T=int]`` can be accepted. Existing indexing implementations in standard classes ----------------------------------------------------- @@ -619,7 +627,85 @@ that are invoked over the __(get|set|del)item__ triad, if they are present. The rationale around this choice is to make the intuition around how to add kwd arg support to square brackets more obvious and in line with the function -behavior. +behavior. Given: + +:: + + def __getitem_ex__(self, x, y): ... + +These all just work and produce the same result effortlessly: + +:: + + obj[1, 2] + obj[1, y=2] + obj[y=2, x=1] + +In other words, this solution would unify the behavior of __getitem__ to the traditional +function signature, but since we can't change __getitem__ and break backward compatibility, +we would have an extended version that is used preferentially. + +The problems with this approach were found to be: + +- It will slow down subscripting. For every subscript access, this new dunder + attribute gets investigated on the class, and if it is not present then the + default key translation function is executed. + Different ideas were proposed to handle this, from wrapping the method + only at class instantiation time, to add a bit flag to signal the availability + of these methods. Regardess of the solution, the new dunder would be effective + only if added at class creation time, not if it's added later. This would + be unusual and would disallow (and behave unexpectedly) monkeypatching of the + methods for whatever reason it might be needed. + +- It adds complexity to the mechanism. 
+ +- Will require a long and painful transition period during which time + libraries will have to somehow support both calling conventions, because most + likely, the extended methods will delegate to the traditional ones when the + right conditions are matched in the arguments, or some classes will support + the traditional dunder and others the extended dunder. While this will not + affect calling code, it will affect development. + +- it would potentially lead to mixed situations where the extended version is + defined for the getter, but not for the setter. + +- In the __setitem_ex__ signature, value would have to be made the first + element, because the index is of arbitrary length depending on the specified + indexes. This would look awkward because the visual notation does not match + the signature: + +:: + + obj[1, 2] = 3 # calls obj.__setitem_ex__(3, 1, 2) + +- the solution relies on the assumption that all keyword indices necessarily map + into positional indices, or that they must have a name. This assumption may be + false: xarray, which is the primary python package for numpy arrays with + labelled dimensions, supports indexing by additional dimensions (so called + "non-dimension coordinates) that don't correspond directly to the dimensions + of the underlying numpy array, and those have no position to match up to. + In other words, anonymous indexes are a plausible use case that this solution + would remove, although it could be argued that using ``*args`` would solve + that issue. + + + + + + + + + + + + + + + + + + + Adding an adapter function -------------------------- @@ -628,7 +714,6 @@ Similar to the above, in the sense that a pre-function would be called to convert the "new style" indexing into "old style indexing" that is then passed. Has problems similar to the above. 
- create a new "kwslice" object ----------------------------- @@ -644,6 +729,39 @@ get/set/del function to always accept arbitrary keyword arguments, whether they make sense or not. We want the developer to be able to specify which arguments make sense and which ones do not. + +Using a single bit to change the behavior +----------------------------------------- + +A special class dunder flag + + + __keyfn__ = True + +would change the signature of the __get|set|delitem__ to a "function like" dispatch, +meaning that this + +:: + + >>> d[1, 2, z=3] + +would result in a call to + +:: + >>> d.__getitem__(1, 2, z=3) # instead of d.__getitem__((1, 2), z=3) + +This option has been rejected because it feels odd that a signature of a method +depends on a specific value of another dunder. It would be confusing for both +static type checkers and for humans: a static type checker would have to hard-code +a special case for this, because there really is nothing else in Python +where the signature of a dunder depends on the value of another dunder. +A human that has to implement a __getitem__ dunder would have to look if in the +class (or in any of its subclasses) for a __keyfn__ before the dunder can be written. +Moreover, adding a base classes that have the __keyfn__ flag set would break +the signature of the current methods. This would be even more problematic if the +flag is changed at runtime, or if the flag is generated by calling a function +that returns randomly True or something else. + Allowing for empty index notation obj[] --------------------------------------- @@ -746,77 +864,22 @@ isn't an option: -The problems with this approach were found to be: - -* It will slow down subscripting. For every subscript access, this new dunder - attribute gets investigated on the class, and if it is not present then the - default key translation function is executed. 
Different ideas were proposed to handle this, from wrapping the method - only at class instantiation time (would not work when monkeypatching) - -* It adds complexity -def __getx__(self, x, y): ... -And these all just work: -q[1, 2] -q[1, y=2] -q[y=2, x=1] -1 is assigned to x and 2 is assigned to y in all of these for both versions, but the second certain requires not parsing of parameters. Python does it for us. That's a lot of easily available flexibility. -* In __setitem_ex__ value would need to be the first element, because the index is of arbitrary length. -Again, implicit on your argument here is the assumption that all keyword indices necessarily map into positional indices. This may be the case with the use-case you had in mind. But for other use-cases brought up so far that assumption is false. -xarray, which is the primary python package for numpy arrays with labelled dimensions. It supports adding and indexing by additional dimensions that don't correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. They are called "non-dimension coordinates". -Other people have wanted to allow parameters to be added when indexing, arguments in the index that change how the indexing behaves. These don't correspond to any dimension, either. -possibly have to invert value and index?s -lass A: - __keyfn__ = None - def __setitem__(self, val, x=0, y=0, z=0): - print((val, x, y, z)) - - >>> a = A() - >>> a[1, z=2] = 'hello' - ('hello', 1, 0, 2)* - - -Objection 1: Slowing Things Down - -The INTENDED EFFECT of the changes to internals will be as Jonathan Fine described: every time a subscript operation occurs, this new dunder attribute gets investigated on the class, and if it is not present then the default key translation function is executed. - -If things were implemented in exactly that way, obviously it would slow everything down a lot. 
Every subscripting operation gets slowed down everywhere and that's probably not an option. - -the new dunder is *only* effective if added at class creation time, -not if it's added later. You may not care about this, but it is a very -different behaviour than any other dunder method Python supports - so -quite apart from the problems people would have learning and -remembering that this is a special case, you have to document *in your -proposal* that you intend to allow this. And other people *do* care -about disallowing dynamic features like monkeypatching. So that would be problematic. There are two (subtly different [1]) acceptable choices for `ndarray[]`: -I think it is worth directly discussing the availability of slices in PEP -472-style keyword indices, since we seem to have mostly converged on a dunder -method signature. This is an issue that has been alluded to regarding -keyword-based (labelled) indices but not directly addressed. The basic syntax -would be something like d[x=1:3]. - -Type hints are indeed dispatched differently, but this is done based on -information that is only available at runtime. Since PEP 560, for `x[y]`, if no -`__getitem__` method is found, and `x` is a type (class) object, and `x` has a -class method `__class_getitem__`, that method is called. Extending this with -keyword args is straightforward. Modifying the compiler to generate different -bytecode for this case is essentially impossible. - See https://github.com/python/cpython/blob/6844b56176c41f0a0e25fcd4fef5463bcdbc7d7c/Objects/abstract.c#L181-L198 for the code (it's part of PyObject_GetItem). @@ -824,132 +887,10 @@ Alternative Syntax and semantics ================================ -A single bit to change the behavior ------------------------------------ - -Ricky has given some examples. 
Here are more, all assuming - __keyfn__ = True - -First, this use of __keyfn__ would allow - >>> d[1, 2, z=3] -to result in - >>> d.__getitem__(1, 2, z=3) - -Some further examples: - >>> d[1, 2] - >>> d.__getitem__(1, 2) - - >>> d[(1, 2)] - >>> d.__getitem__((1, 2)) - - >>> d[a=1, b=2] - >>> d.__getitem__(a=1, b=2) - -I find the above easy to understand and use. For Steven's proposal the calls to __getitem__ would be - - >>> d[1, 2, z=3] - >>> d.__getitem__((1, 2), z=3) - - >>> d[1, 2] - >>> d.__getitem__((1, 2) - - >>> d[(1, 2)] # Same result as d[1, 2] - >>> d.__getitem__((1, 2)) # From d[(1, 2)] - - >>> d[a=1, b=2] - >>> d.__getitem__((), a=1, b=2) - -I find these harder to understand and use, which is precisely the point Ricky made in his most recent post. That's because there's a clear and precise analogy between - >>> x(1, 2, a=3, b=4) - >>> x[1, 2, a=3, b=4] - -I think it reasonable to argue adding a single bit to every class is not worth the benefit it provides. However, this argument should be supported by evidence. (As indeed should the argument that it is worth the benefit.) - -I also think it reasonable to argue that now is not the time to allow __keyfn__ to have values other than None or True. And that allowing further values should require an additional PEP. - -I don't recall seeing an argument that Steven's proposal is as easy to understand and use as mine (with __keyfn__ == None). - - -Yes. I find it a big flaw that the signature of __setitem__ is so strongly influenced by the value of __keyfunc__. For example, a static type checker (since PEP 484 I care deeply about those and they're popping up like mushrooms :-) would have to hard-code a special case for this, because there really is nothing else in Python where the signature of a dunder depends on the value of another dunder. - -And in case you don't care about static type checkers, I think it's the same for human readers. 
Whenever I see a __setitem__ function I must look everywhere else in the class (and in all its base classes) for a __keyfn__ before I can understand how the __setitem__ function's signature is mapped from the d[...] notation. - -Finally, I am unsure how you would deal with the difference between d[1] and d[1,], which must be preserved (for __keyfn__ = True or absent, for backwards compatibility). The bytecode compiler cannot assume to know the value of __keyfn__ (because d could be defined in another module or could be an instance of one of several classes defined in the current module). (I think this problem is also present in the __subscript__ version.) - - Rejected Ideas ============== -Again, implicit on your argument here is the assumption that all keyword -indices necessarily map into positional indices. This may be the case with the -use-case you had in mind. But for other use-cases brought up so far that -assumption is false. Your approach would make those use cases extremely -difficult if not impossible. - -xarray, which is the primary python package for numpy arrays with labelled -dimensions. It supports adding and indexing by additional dimensions that -don't correspond directly to the dimensions of the underlying numpy array, and -those have no position to match up to. They are called "non-dimension -coordinates". - -Other people have wanted to allow parameters to be added when indexing, -arguments in the index that change how the indexing behaves. These don't -correspond to any dimension, either. - - - - -What I should have said was that a[1,] -would continue to create a tuple, regardless of whether old or new style -indexing was happening. 
- - -d[1] -> d.__getitem__(1) -d[1,] -> d.__getitem__((1,)) -d[1, 2] -> d.__getitem__((1, 2)) -d[a=3] -> d.__getitem__((), a=3) -d[1, a=3] -> d.__getitem__((1,), a=3) -d[1, 2, a=3] -> d.__getitem__((1, 2), a=3) - -d[1] = val -> d.__setitem__(1, val) -d[1,] = val -> d.__setitem__((1,), val) -d[1, 2] = val -> d.__setitem__((1, 2), val) -d[a=3] = val -> d.__setitem__((), val, a=3) -d[1, a=3] = val -> d.__setitem__((1,), val, a=3) -d[1, 2, a=3] = val -> d.__setitem__((1, 2), val, a=3) - -SHOULD BE: -d[1, a=3] -> d.__getitem__(1, a=3) -SHOULD BE: -d[1, a=3] = val -> d.__setitem__(1, val, a=3) - - - -If you're worried about people doing things like - - a[1, 2, 3, value = 4] = 5 - -I'm not sure that's really a problem -- usually it will result in -an exception due to specifying more than one value for a parameter. - - - -> 2. m.__get__(1, 2, a=3, b=4) # change positional argument handling from -> current behavior - -Advantages: - -1. Consistency with other methods and functions. - -Disadvantages: - -1. Breaks backwards compatibility. - -2. Will require a long and painful transition period during which time -libraries will have to somehow support both calling conventions. - - > 3. m.__get__((1, 2), {'a': 3, 'b': 4}) # handling of positional From 69eecde17164b219999ed5c9b4ad2b3c16eb8218 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Fri, 18 Sep 2020 19:11:43 +0100 Subject: [PATCH 16/29] Basic implementation completed. We now need the C part --- pep-9999.txt | 198 ++++----------------------------------------------- 1 file changed, 13 insertions(+), 185 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index e1db6d42725..3cd98b25dc0 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -601,7 +601,15 @@ This is particularly relevant in the following case: obj[1, 2] # calls __getitem__((1, 2)) obj[(1, 2)] # same as above obj[1, 2, a=3] # calls __getitem__((1, 2), a=3) - obj[(1, 2), a=3] # calls __getitem__(((1, 2),), a=3). 
Not __getitem__((1, 2), a=3) + obj[(1, 2), a=3] # calls __getitem__(((1, 2),), a=3). NOT __getitem__((1, 2), a=3) + +And particularly when the tuple is extracted as a variable: + +:: + + t = (1, 2) + obj[t] # calls __getitem__((1, 2)) + obj[t, a=3] # calls __getitem__(((1, 2),), a=3). NOT __getitem__((1, 2), a=3) Rejected Ideas @@ -619,6 +627,10 @@ cannot support any of the strategies currently in the PEP." We agree that those options are inferior to the currently presented, for one reason or another. +To keep this document compact, we will not present here the objections for +all options presented in PEP 472. Suffice to say that they were discussed, +and each proposed alternative had one or few dealbreakers. + Adding new dunders ------------------ @@ -688,25 +700,6 @@ The problems with this approach were found to be: would remove, although it could be argued that using ``*args`` would solve that issue. - - - - - - - - - - - - - - - - - - - Adding an adapter function -------------------------- @@ -859,171 +852,6 @@ isn't an option: dict(i=float, j=float) # would create a dictionary, not a type - -*********************************************************************************** - - - - - - - - - - - - - - - - -So that would be problematic. There are two (subtly different [1]) -acceptable choices for `ndarray[]`: - -See https://github.com/python/cpython/blob/6844b56176c41f0a0e25fcd4fef5463bcdbc7d7c/Objects/abstract.c#L181-L198 for the code (it's part of PyObject_GetItem). - - -Alternative Syntax and semantics -================================ - - - -Rejected Ideas -============== - - - -> 3. m.__get__((1, 2), {'a': 3, 'b': 4}) # handling of positional -> arguments unchanged from current behavior - -I assume that if there are no keyword arguments given, only the -first argument is passed to the method (as opposed to passing an -empty dict). If not, the advantages listed below disappear. 
- -Advantages: - -(1) Existing positional only subscripting does not change (backwards -compatible). - -(2) Requires no extra effort for developers who don't need or want -keyword parameters in their subscript methods. Just do nothing. - -Disadvantages: - -(1) Forces people to do their own parsing of keyword arguments to local -variables inside the method, instead of allowing the interpreter to do -it. - -(2) Compounds the "Special case breaks the rules" of subscript methods -to keyword arguments as well as positional arguments. - -(3) It's not really clear to me that anyone actually wants this, apart -from just suggesting it as an option. What's the concrete use-case for -this? - - - - -> 4. m.__get__(KeyObject( (1, 2), {'a': 3, 'b': 4} )) # change -> positional argument handling from current behavior only in the case that -> kwd args are provided - -Use-case: you want to wrap an arbitrary number of positional arguments, -plus an arbitrary set of keyword arguments, into a single hashable "key -object", for some unstated reason, and be able to store that key object -into a dict. - -Advantage (double-edged, possible): - -(1) Requires no change to the method signature to support keyword -parameters (whether you want them or not, you will get them). - -Disadvantages: - -(1) If you don't want keyword parameters in your subscript methods, you -can't just *do nothing* and have them be a TypeError, you have to -explicitly check for a KeyObject argument and raise: - - def __getitem__(self, index): - if isinstance(item, KeyObject): - raise TypeError('MyClass index takes no keyword arguments') - -(2) Seems to be a completely artificial and useless use-case to me. If -there is a concrete use-case for this, either I have missed it, (in -which case my apologies) or Jonathan seems to be unwilling or unable to -give it. 
But if you really wanted it, you could get it with this -signature and a single line in the body: - - def __getitem__(self, *args, **kw): - key = KeyObject(*args, **kw) - -(3) Forces those who want named keyword parameters to parse them from -the KeyObject value themselves. - -Since named keyword parameters are surely going to be the most common -use-case (just as they are for other functions), this makes the common -case difficult and the rare and unusual case easy. - -(4) KeyObject doesn't exist. We would need a new builtin type to support -this, as well as the new syntax. This increases the complexity and -maintenance burden of this new feature. - -(5) Compounds the "kind of screwy" (Greg's words) nature of subscripting -by extending it to keyword arguments as well as positional arguments. - - - -but it would still not know that: - -t = (1,2,3) -something[t] - -is the same as: - -something[1,2,3] - -would it? - - - - - a[17, 42] - a[time = 17, money = 42] - a[money = 42, time = 17] - -With a fresh new dunder, it's dead simple: - - def __getindex__(self, time, money): - ... - -With a __getitem__ that's been enhanced to take keyword args, but -still get positional args packed into a tuple, it's nowhere near -as easy. - - - -on non-specified parameter index -but note that the setter is awkward since the signature requires the -first parameter: - - obj[spam=1, eggs=2] = value - # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2) - -Proposed solution: this is a runtime error unless the setitem method -gives the first parameter a default, e.g.: - - def __setitem__(self, index=None, value=None, **kwargs) - -Note that the second parameter will always be present, nevertheless, to -satisfy the interpreter, it too will require a default value. - -(Editorial comment: this is undoubtably an awkward and ugly corner case, -but I am reluctant to prohibit keyword-only assignment.) 
- - - - - References ========== From 9f1380dd0002542adf8cdc8a9e23ff47bcf1ccc3 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Fri, 18 Sep 2020 19:47:15 +0100 Subject: [PATCH 17/29] Added attempt at the C interface --- pep-9999.txt | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/pep-9999.txt b/pep-9999.txt index 3cd98b25dc0..317d142d85b 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -612,6 +612,29 @@ And particularly when the tuple is extracted as a variable: obj[t, a=3] # calls __getitem__(((1, 2),), a=3). NOT __getitem__((1, 2), a=3) +C Interface +=========== + +Resolution of the indexing operation is performed through a call to +``PyObject_GetItem(PyObject *o, PyObject *key)`` for the get operation, +``PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value)`` for the set operation, and +``PyObject_DelItem(PyObject *o, PyObject *key)`` for the del operation. + +These functions are used extensively within the python executable, and are +also part of the public C API, as exported by ``Include/abstract.h``. It is clear that +the signature of this function cannot be changed, and different C level functions +need to be implemented to support the extended call. We propose +``PyObject_GetItemEx(PyObject *o, PyObject *key, PyObject *kwargs)``, +``PyObject_SetItemEx(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)`` and +``PyObject_DetItemEx(PyObject *o, PyObject *key, PyObject *kwargs)``. + +Additionally, new opcodes will be needed for the enhanced call. +Currently, the implementation uses ``BINARY_SUBSCR``, ``STORE_SUBSCR`` and ``DELETE_SUBSCR`` +to invoke the old functions. We propose ``BINARY_SUBSCR_EX``, +``STORE_SUBSCR_EX`` and ``DELETE_SUBSCR_EX`` for the extended operation. The parser will +have to generate these new opcodes. The PyObject_(Get|Set|Del)Item implementations +will call the extended methods passing NULL as kwargs. 
+ Rejected Ideas ============== From 931566792ec7d74d8b8f9a9274c01e55c9b3b204 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Fri, 18 Sep 2020 23:51:26 +0100 Subject: [PATCH 18/29] Removed snafu --- pep-9999.txt | 2 -- 1 file changed, 2 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 317d142d85b..5b42a2b041d 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -238,8 +238,6 @@ this one isn't The fifth difference is that there's no zero-argument form. This is valid -colon notations to slices, thanks to support from the parser. This is valid - :: f() From f7423f482c312c5820ab2cd48d306833af5744ca Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Fri, 18 Sep 2020 23:55:37 +0100 Subject: [PATCH 19/29] Reworded passage about stdlib classes --- pep-9999.txt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 5b42a2b041d..e990a5bb4a8 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -300,9 +300,11 @@ The new notation will NOT make the following valid notation: It is worth stressing out that none of what is proposed in this PEP will change -the behavior of the current core classes that use indexing. Adding keywords to indexation for -custom classes is not the same as modifying e.g. the standard dict type, which -will remain the same and will continue not to accept keyword arguments. +the behavior of the current core classes that use indexing. Adding keywords to +the index operation for custom classes is not the same as modifying e.g. the +standard dict type to handle keyword arguments. In fact, dict (as well as list and other +stdlib classes with indexing semantics) will remain the same and will continue +not to accept keyword arguments. 
Syntax and Semantics From f210881b43ae908bec71a414339d6caca44bd177 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Sat, 19 Sep 2020 00:22:01 +0100 Subject: [PATCH 20/29] Fixed incorrect resolution for single value tuples --- pep-9999.txt | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index e990a5bb4a8..66fdbd9ee3c 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -576,13 +576,19 @@ about subscript methods which don't use the keyword-only flag. 4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.: - ``d[1, a=3]`` is treated as ``__getitem__(1, a=3)``, not ``__getitem__((1,), a=3)`` + ``d[1, a=3]`` is treated as ``__getitem__(1, a=3)``, NOT ``__getitem__((1,), a=3)``. It would be + extremely confusing if adding keyword arguments were to change the type of the passed index. In other words, adding a keyword to a single-valued subscript will not change it into a tuple. For those cases where an actual tuple needs to be passed, a proper syntax will have to be used: - So given the current behaviour, the fourth case below is required to - disambiguate the call, making it an explicit tuple. Note that this behavior - just reveals the truth that the ``obj[1,]`` notation is shorthand for +:: + + obj[(1,), a=3] # calls __getitem__((1,), a=3) + + In this case, the call is passing a single element (which is passed as is, as from rule above), + only that the single element happens to be a tuple. + + Note that this behavior just reveals the truth that the ``obj[1,]`` notation is shorthand for ``obj[(1,)]`` (and also ``obj[1]`` is shorthand for ``obj[(1)]``, with the expected behavior). When keywords are present, the rule that you can omit this outermost pair of parentheses is no longer true. @@ -594,14 +600,14 @@ about subscript methods which don't use the keyword-only flag. 
obj[1,] # calls __getitem__((1,)) obj[(1,), a=3] # calls __getitem__((1,), a=3) -This is particularly relevant in the following case: +This is particularly relevant in the case where two entries are passed: :: obj[1, 2] # calls __getitem__((1, 2)) obj[(1, 2)] # same as above obj[1, 2, a=3] # calls __getitem__((1, 2), a=3) - obj[(1, 2), a=3] # calls __getitem__(((1, 2),), a=3). NOT __getitem__((1, 2), a=3) + obj[(1, 2), a=3] # calls __getitem__((1, 2), a=3) And particularly when the tuple is extracted as a variable: @@ -609,8 +615,12 @@ And particularly when the tuple is extracted as a variable: t = (1, 2) obj[t] # calls __getitem__((1, 2)) - obj[t, a=3] # calls __getitem__(((1, 2),), a=3). NOT __getitem__((1, 2), a=3) + obj[t, a=3] # calls __getitem__((1, 2), a=3) +Why? because in the case ``obj[1, 2, a=3]`` we are passing two elements (which +are then packed as a tuple and passed as the index). In the case ``obj[(1, 2), a=3]`` +we are passing a single element (which is passed as is) which happens to be a tuple. +The final result is that they are the same. C Interface =========== From 03358d6b75ec62adbbcbd1ed18b0b4a01b659ee9 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Sat, 19 Sep 2020 00:24:47 +0100 Subject: [PATCH 21/29] Specified which author in case of additional inquire --- pep-9999.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pep-9999.txt b/pep-9999.txt index 66fdbd9ee3c..3f4b6b95410 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -652,7 +652,8 @@ Previous PEP 472 solutions -------------------------- PEP 472 presents a good amount of ideas that are now all to be considered -Rejected. A personal email from D'Aprano to the Author specifically said: +Rejected. A personal email from D'Aprano to one of the authors (Stefano Borini) +specifically said: "I have now carefully read through PEP 472 in full, and I am afraid I cannot support any of the strategies currently in the PEP." 
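The series settles on an index-resolution rule, as corrected in "[PATCH 20/29] Fixed incorrect resolution for single value tuples". Together with the ``__setitem__`` ordering note from patch 15 (index, then value, then keywords), that rule can be approximated in today's Python.

Because ``obj[1, a=3]`` is not yet valid syntax, the sketch below does two things. It passes the bracket contents explicitly, and it calls the dunders directly. Some details are hypothetical and are not part of the draft:

* the helper names ``resolve_subscript``, ``getitem``, ``setitem`` and ``Recorder``;
* the ``trailing_comma`` flag;
* raising ``SyntaxError`` at runtime for ``obj[]``, which only stands in for what the parser would reject.

This is a minimal model of the rules as described, assuming the draft's semantics. It is not the proposed C-level implementation.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple


def resolve_subscript(
    items: Sequence[Any],
    keywords: Mapping[str, Any],
    trailing_comma: bool = False,
) -> Tuple[Any, Dict[str, Any]]:
    """Return the (index, kwargs) pair the draft would pass to the dunder.

    Hypothetical helper: ``items`` holds the comma-separated positional
    entries written inside the brackets, in order; ``trailing_comma``
    marks forms such as obj[1,].
    """
    kwargs = dict(keywords)
    if not items:
        if not kwargs:
            # obj[] remains invalid under the current draft.
            raise SyntaxError("empty subscript")
        # Keyword-only subscript: the index is the empty tuple.
        return (), kwargs
    if len(items) == 1 and not trailing_comma:
        # A single entry is passed as-is, even if it happens to be a tuple
        # and even when keywords are present.
        return items[0], kwargs
    # Two or more entries (or a trailing comma) are packed into a tuple.
    return tuple(items), kwargs


def getitem(obj: Any, items: Sequence[Any],
            keywords: Optional[Mapping[str, Any]] = None,
            trailing_comma: bool = False) -> Any:
    """Emulate obj[...] by calling type(obj).__getitem__ directly."""
    index, kwargs = resolve_subscript(items, keywords or {}, trailing_comma)
    return type(obj).__getitem__(obj, index, **kwargs)


def setitem(obj: Any, items: Sequence[Any], value: Any,
            keywords: Optional[Mapping[str, Any]] = None,
            trailing_comma: bool = False) -> None:
    """Emulate obj[...] = value: index first, then value, then keywords."""
    index, kwargs = resolve_subscript(items, keywords or {}, trailing_comma)
    type(obj).__setitem__(obj, index, value, **kwargs)


class Recorder:
    """Toy container that records exactly what its dunders receive."""

    def __init__(self) -> None:
        self.last_set: Any = None

    def __getitem__(self, index: Any, **kwargs: Any) -> Any:
        return index, kwargs

    def __setitem__(self, index: Any, value: Any, **kwargs: Any) -> None:
        self.last_set = (index, value, kwargs)


r = Recorder()
t = (1, 2)
print(getitem(r, [1], {"a": 3}))             # obj[1, a=3]    -> (1, {'a': 3})
print(getitem(r, [1, 2], {"a": 3}))          # obj[1, 2, a=3] -> ((1, 2), {'a': 3})
print(getitem(r, [t], {"a": 3}))             # obj[t, a=3]    -> ((1, 2), {'a': 3})
print(getitem(r, [1], trailing_comma=True))  # obj[1,]        -> ((1,), {})
print(getitem(r, [], {"a": 3}))              # obj[a=3]       -> ((), {'a': 3})
setitem(r, [1], "v", {"a": 3})               # obj[1, a=3] = "v"
print(r.last_set)                            # -> (1, 'v', {'a': 3})
```

The keywords are handed to the dunder as ordinary keyword arguments. As a result, Python's usual binding rules (defaults, missing or unexpected keywords, ``**kwargs`` collection) apply unchanged, which matches rule 6 of the draft.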
From 3dcefb313ef17c78b13e1678c2016148ada8d887 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Sat, 19 Sep 2020 00:27:05 +0100 Subject: [PATCH 22/29] Closed quote --- pep-9999.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-9999.txt b/pep-9999.txt index 3f4b6b95410..7033ef84bca 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -728,7 +728,7 @@ The problems with this approach were found to be: into positional indices, or that they must have a name. This assumption may be false: xarray, which is the primary python package for numpy arrays with labelled dimensions, supports indexing by additional dimensions (so called - "non-dimension coordinates) that don't correspond directly to the dimensions + "non-dimension coordinates") that don't correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. In other words, anonymous indexes are a plausible use case that this solution would remove, although it could be argued that using ``*args`` would solve From dedd4828673c17418bbcbb042897e49b643e720a Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Sat, 19 Sep 2020 01:23:33 +0100 Subject: [PATCH 23/29] Escaped * to let CI pass --- pep-9999.txt | 44 +++++++++++++++++++++----------------------- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 7033ef84bca..2a3889904e5 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -188,11 +188,11 @@ between the two forms, and therefore it is not a given that the two should behave transparently and symmetrically. The third difference is that functions have names assigned to their -arguments, unless the passed parameters are captured with *args, in which case +arguments, unless the passed parameters are captured with \*args, in which case they end up as entries in the args tuple. In other words, functions already have anonymous argument semantic, exactly like the indexing operation. 
However, __(get|set|del)item__ is not always receiving a tuple as the `index` argument -(to be uniform in behavior with *args). In fact, given a trivial class: +(to be uniform in behavior with \*args). In fact, given a trivial class: :: @@ -407,9 +407,9 @@ keywords in function calls: keywords are all used, they are assigned their default value (if any); - if any such parameter doesn't have a default, that is an error; - if there are any keyword subscripts remaining after all the named - parameters are filled, and the method has a `**kwargs` parameter, - they are bound to the `**kwargs` parameter as a dict; - - but if no `**kwargs` parameter is defined, it is an error. + parameters are filled, and the method has a ``**kwargs`` parameter, + they are bound to the ``**kwargs`` parameter as a dict; + - but if no ``**kwargs`` parameter is defined, it is an error. 7. Sequence unpacking remains a syntax error inside subscripts: @@ -642,8 +642,8 @@ Additionally, new opcodes will be needed for the enhanced call. Currently, the implementation uses ``BINARY_SUBSCR``, ``STORE_SUBSCR`` and ``DELETE_SUBSCR`` to invoke the old functions. We propose ``BINARY_SUBSCR_EX``, ``STORE_SUBSCR_EX`` and ``DELETE_SUBSCR_EX`` for the extended operation. The parser will -have to generate these new opcodes. The PyObject_(Get|Set|Del)Item implementations -will call the extended methods passing NULL as kwargs. +have to generate these new opcodes. The ``PyObject_(Get|Set|Del)Item`` implementations +will call the extended methods passing ``NULL`` as kwargs. Rejected Ideas ============== @@ -655,8 +655,8 @@ PEP 472 presents a good amount of ideas that are now all to be considered Rejected. A personal email from D'Aprano to one of the authors (Stefano Borini) specifically said: -"I have now carefully read through PEP 472 in full, and I am afraid I -cannot support any of the strategies currently in the PEP." 
+ I have now carefully read through PEP 472 in full, and I am afraid I + cannot support any of the strategies currently in the PEP. We agree that those options are inferior to the currently presented, for one reason or another. @@ -668,8 +668,8 @@ and each proposed alternative had one or few dealbreakers. Adding new dunders ------------------ -It was proposed to introduce new dunders __(get|set|del)item_ex__ -that are invoked over the __(get|set|del)item__ triad, if they are present. +It was proposed to introduce new dunders ``__(get|set|del)item_ex__`` +that are invoked over the ``__(get|set|del)item__`` triad, if they are present. The rationale around this choice is to make the intuition around how to add kwd arg support to square brackets more obvious and in line with the function @@ -795,16 +795,14 @@ Allowing for empty index notation obj[] The current proposal prevents ``obj[]`` from being valid notation. However a commenter stated -``` -We have `Tuple[int, int]` as a tuple of two integers. And we have `Tuple[int]` -as a tuple of one integer. And occasionally we need to spell a tuple of *no* -values, since that's the type of `()`. But we currently are forced to write -that as `Tuple[()]`. If we allowed `Tuple[]` that odd edge case would be -removed. + We have ``Tuple[int, int]`` as a tuple of two integers. And we have `Tuple[int]` + as a tuple of one integer. And occasionally we need to spell a tuple of *no* + values, since that's the type of ``()``. But we currently are forced to write + that as ``Tuple[()]``. If we allowed ``Tuple[]`` that odd edge case would be + removed. -So I probably would be okay with allowing `obj[]` syntactically, as long as the -dict type could be made to reject it. -``` + So I probably would be okay with allowing ``obj[]`` syntactically, as long as the + dict type could be made to reject it. This proposal already established that, in case no positional index is given, the passed value must be the empty tuple. 
Allowing for the empty index notation would @@ -818,7 +816,7 @@ Use None instead of the empty tuple when no positional index is given The case ``obj[k=3]`` will lead to a call ``__getitem__((), k=3)``. The alternative ``__getitem__(None, k=3)`` was considered but rejected: NumPy uses `None` to indicate inserting a new axis/dimensions (there's -a `np.newaxis` alias as well): +a ``np.newaxis`` alias as well): :: @@ -842,7 +840,7 @@ more than this: obj[1, k=3] # __getitem__(1, k=3). Integer obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple -With the first more in line with a *args semantics for calling a routine with +With the first more in line with a \*args semantics for calling a routine with no positional arguments :: @@ -873,7 +871,7 @@ slice notation, which is commonly used in some cases for arrays. One problem is type hint creation has been extended to built-ins in python 3.9, so that you do not have to import Dict, List, et al anymore. -Without kwdargs inside [], you would not be able to do this: +Without kwdargs inside ``[]``, you would not be able to do this: :: From a25850868e99273ac1c64983896e642b11e69398 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Sat, 19 Sep 2020 14:26:34 +0100 Subject: [PATCH 24/29] More reformatting --- pep-9999.txt | 426 ++++++++++++++++++++++++++------------------------- 1 file changed, 215 insertions(+), 211 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 2a3889904e5..89049796393 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -147,9 +147,9 @@ selection (slices). 
Some common examples: :: - >>> a[3] # returns the fourth element of a - >>> a[1:10:2] # slice notation (extract a non-trivial data subset) - >>> a[3, 2] # multiple indexes (for multidimensional arrays) + >>> a[3] # returns the fourth element of a + >>> a[1:10:2] # slice notation (extract a non-trivial data subset) + >>> a[3, 2] # multiple indexes (for multidimensional arrays) This translates into a __(get|set|del)item__ dunder call which is passed a single parameter containing the index (for __getitem__ and __delitem__) or two parameters @@ -282,7 +282,7 @@ example use case 1, where a slice is accepted. The new notation will make all of the following valid notation: -:: + :: >>> a[1] # Current case, single index >>> a[1, 2] # Current case, multiple indexes @@ -294,11 +294,10 @@ The new notation will make all of the following valid notation: The new notation will NOT make the following valid notation: -:: + :: >>> a[] # INVALID. No index and no keyword arguments. - It is worth stressing out that none of what is proposed in this PEP will change the behavior of the current core classes that use indexing. Adding keywords to the index operation for custom classes is not the same as modifying e.g. the @@ -306,7 +305,6 @@ standard dict type to handle keyword arguments. In fact, dict (as well as list a stdlib classes with indexing semantics) will remain the same and will continue not to accept keyword arguments. - Syntax and Semantics ==================== @@ -314,173 +312,178 @@ The following old semantics are preserved: 1. As said above, an empty subscript is still illegal, regardless of context. -:: - - obj[] # SyntaxError + :: + obj[] # SyntaxError 2. 
A single index value remains a single index value when passed: -:: + :: - obj[index] - # calls type(obj).__getitem__(obj, index) + obj[index] + # calls type(obj).__getitem__(obj, index) - obj[index] = value - # calls type(obj).__setitem__(obj, index, value) + obj[index] = value + # calls type(obj).__setitem__(obj, index, value) - del obj[index] - # calls type(obj).__delitem__(obj, index) + del obj[index] + # calls type(obj).__delitem__(obj, index) -This remains the case even if the index is followed by keywords; see point 5 below. + This remains the case even if the index is followed by keywords; see point 5 below. 3. Comma-seperated arguments are still parsed as a tuple and passed as -a single positional argument: + a single positional argument: -:: + :: - obj[spam, eggs] - # calls type(obj).__getitem__(obj, (spam, eggs)) + obj[spam, eggs] + # calls type(obj).__getitem__(obj, (spam, eggs)) - obj[spam, eggs] = value - # calls type(obj).__setitem__(obj, (spam, eggs), value) + obj[spam, eggs] = value + # calls type(obj).__setitem__(obj, (spam, eggs), value) - del obj[spam, eggs] - # calls type(obj).__delitem__(obj, (spam, eggs)) + del obj[spam, eggs] + # calls type(obj).__delitem__(obj, (spam, eggs)) -The points above mean that classes which do not want to support keyword -arguments in subscripts need do nothing at all, and the feature is therefore -completely backwards compatible. + The points above mean that classes which do not want to support keyword + arguments in subscripts need do nothing at all, and the feature is therefore + completely backwards compatible. 4. Keyword arguments, if any, must follow positional arguments. -:: + :: - obj[1, 2, spam=None, 3] # SyntaxError + obj[1, 2, spam=None, 3] # SyntaxError -This is like function calls, where intermixing positional and keyword -arguments give a SyntaxError. + This is like function calls, where intermixing positional and keyword + arguments give a SyntaxError. 5. 
Keyword subscripts, if any, will be handled like they are in -function calls. Examples: + function calls. Examples: -:: + :: + + # Single index with keywords: - # Single index with keywords: + obj[index, spam=1, eggs=2] + # calls type(obj).__getitem__(obj, index, spam=1, eggs=2) - obj[index, spam=1, eggs=2] - # calls type(obj).__getitem__(obj, index, spam=1, eggs=2) + obj[index, spam=1, eggs=2] = value + # calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2) - obj[index, spam=1, eggs=2] = value - # calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2) + del obj[index, spam=1, eggs=2] + # calls type(obj).__delitem__(obj, index, spam=1, eggs=2) - del obj[index, spam=1, eggs=2] - # calls type(obj).__delitem__(obj, index, spam=1, eggs=2) + # Comma-separated indices with keywords: - # Comma-separated indices with keywords: + obj[foo, bar, spam=1, eggs=2] + # calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2) - obj[foo, bar, spam=1, eggs=2] - # calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2) + obj[foo, bar, spam=1, eggs=2] = value + # calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2) - obj[foo, bar, spam=1, eggs=2] = value - # calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2) + del obj[foo, bar, spam=1, eggs=2] + # calls type(obj).__detitem__(obj, (foo, bar), spam=1, eggs=2) - del obj[foo, bar, spam=1, eggs=2] - # calls type(obj).__detitem__(obj, (foo, bar), spam=1, eggs=2) + Note that: -Note that: + - a single positional index will not turn into a tuple + just because one adds a keyword value. - - a single positional index will not turn into a tuple - just because one adds a keyword value. - - for ``__setitem__``, the same order is retained for index and value. - The keyword arguments go at the end, as is normal for a function - definition. + - for ``__setitem__``, the same order is retained for index and value. + The keyword arguments go at the end, as is normal for a function + definition. 
6. The same rules apply with respect to keyword subscripts as for -keywords in function calls: + keywords in function calls: + + - the interpeter matches up each keyword subscript to a named parameter + in the appropriate method; + + - if a named parameter is used twice, that is an error; - - the interpeter matches up each keyword subscript to a named parameter - in the appropriate method; - - if a named parameter is used twice, that is an error; - - if there are any named parameters left over (without a value) when the - keywords are all used, they are assigned their default value (if any); - - if any such parameter doesn't have a default, that is an error; - - if there are any keyword subscripts remaining after all the named - parameters are filled, and the method has a ``**kwargs`` parameter, - they are bound to the ``**kwargs`` parameter as a dict; - - but if no ``**kwargs`` parameter is defined, it is an error. + - if there are any named parameters left over (without a value) when the + keywords are all used, they are assigned their default value (if any); + + - if any such parameter doesn't have a default, that is an error; + + - if there are any keyword subscripts remaining after all the named + parameters are filled, and the method has a ``**kwargs`` parameter, + they are bound to the ``**kwargs`` parameter as a dict; + + - but if no ``**kwargs`` parameter is defined, it is an error. 7. Sequence unpacking remains a syntax error inside subscripts: -:: + :: - obj[*items] + obj[*items] -Reason: unpacking items would result it being immediately repacked into -a tuple. Anyone using sequence unpacking in the subscript is probably -confused as to what is happening, and it is best if they receive an -immediate syntax error with an informative error message. + Reason: unpacking items would result it being immediately repacked into + a tuple. 
Anyone using sequence unpacking in the subscript is probably + confused as to what is happening, and it is best if they receive an + immediate syntax error with an informative error message. -This restriction has however been considered arbitrary by some, and it might -be lifted at a later stage for symmetry with kwargs unpacking, see next. + This restriction has however been considered arbitrary by some, and it might + be lifted at a later stage for symmetry with kwargs unpacking, see next. 8. Dict unpacking is permitted: -:: + :: - items = {'spam': 1, 'eggs': 2} - obj[index, **items] - # equivalent to obj[index, spam=1, eggs=2] + items = {'spam': 1, 'eggs': 2} + obj[index, **items] + # equivalent to obj[index, spam=1, eggs=2] 9. Keyword-only subscripts are permitted. The positional index will be the empty tuple: -:: + :: - obj[spam=1, eggs=2] - # calls type(obj).__getitem__(obj, (), spam=1, eggs=2) + obj[spam=1, eggs=2] + # calls type(obj).__getitem__(obj, (), spam=1, eggs=2) - obj[spam=1, eggs=2] = 5 - # calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2) + obj[spam=1, eggs=2] = 5 + # calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2) - del obj[spam=1, eggs=2] - # calls type(obj).__delitem__(obj, (), spam=1, eggs=2) + del obj[spam=1, eggs=2] + # calls type(obj).__delitem__(obj, (), spam=1, eggs=2) 10. Keyword arguments must allow slice syntax. -:: - - obj[3:4, spam=1:4, eggs=2] - # calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2) + :: + obj[3:4, spam=1:4, eggs=2] + # calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2) -This may open up the possibility to accept the same syntax for general function -calls, but this is not part of this recommendation. + This may open up the possibility to accept the same syntax for general function + calls, but this is not part of this recommendation. 11. 
Keyword arguments must allow Ellipsis -:: + :: - obj[..., spam=..., eggs=2] - # calls type(obj).__getitem__(obj, Ellipsis, spam=Ellipsis, eggs=2) + obj[..., spam=..., eggs=2] + # calls type(obj).__getitem__(obj, Ellipsis, spam=Ellipsis, eggs=2) 12. Keyword arguments allow for default values -:: - # Given type(obj).__getitem__(obj, index, spam=True, eggs=2) - obj[3] # Valid. index = 3, spam = True, eggs = 2 - obj[3, spam=False] # Valid. index = 3, spam = False, eggs = 2 - obj[spam=False] # Valid. index = (), spam = False, eggs = 2 - obj[] # Invalid. + :: + + # Given type(obj).__getitem__(obj, index, spam=True, eggs=2) + obj[3] # Valid. index = 3, spam = True, eggs = 2 + obj[3, spam=False] # Valid. index = 3, spam = False, eggs = 2 + obj[spam=False] # Valid. index = (), spam = False, eggs = 2 + obj[] # Invalid. -13. The same semantics given above must be extended to __class__getitem__ +13. The same semantics given above must be extended to ``__class__getitem__``: Since PEP 560, type hints are dispatched so that for ``x[y]``, if no ``__getitem__`` method is found, and ``x`` is a type (class) object, - and ``x`` has a class method ``__class_getitem__``, that method is + and ``x`` has a class method ``__class_getitem__``, that method is called. The same changes should be applied to this method as well, so that a writing like ``list[T=int]`` can be accepted. @@ -500,79 +503,78 @@ With the introduction of the new notation, a few corner cases need to be analyse 1. 
Technically, if a class defines their getter like this: -:: - - def __getitem__(self, index): + :: -then the caller could call that using keyword syntax, like these two cases: + def __getitem__(self, index): -:: + then the caller could call that using keyword syntax, like these two cases: - obj[3, index=4] - obj[index=1] + :: -The resulting behavior would be an error automatically, since it would be like -attempting to call the method with two values for the ``index`` argument, and -a ``TypeError`` will be raised. In the first case, the ``index`` would be ``3``, -in the second case, it would be the empty tuple ``()``. + obj[3, index=4] + obj[index=1] -Note that this behavior applies for all currently existing classes that rely on -indexing, meaning that there is no way for the new behavior to introduce -backward compatibility issues on this respect. + The resulting behavior would be an error automatically, since it would be like + attempting to call the method with two values for the ``index`` argument, and + a ``TypeError`` will be raised. In the first case, the ``index`` would be ``3``, + in the second case, it would be the empty tuple ``()``. -Classes that wish to stress this behavior explicitly can define their -parameters as positional-only: + Note that this behavior applies for all currently existing classes that rely on + indexing, meaning that there is no way for the new behavior to introduce + backward compatibility issues on this respect. -:: + Classes that wish to stress this behavior explicitly can define their + parameters as positional-only: - def __getitem__(self, index, /): + :: + def __getitem__(self, index, /): 2. 
a similar case occurs with setter notation -:: + :: - # Given type(obj).__getitem__(self, index, value): + # Given type(obj).__getitem__(self, index, value): - obj[1, value=3] = 5 + obj[1, value=3] = 5 -This poses no issue because the value is passed automatically, and the python interpreter will raise -``TypeError: got multiple values for keyword argument 'value'`` + This poses no issue because the value is passed automatically, and the python interpreter will raise + ``TypeError: got multiple values for keyword argument 'value'`` 3. If the subscript dunders are declared to use positional-or-keyword -parameters, there may be some surprising cases when arguments are passed -to the method. Given the signature: + parameters, there may be some surprising cases when arguments are passed + to the method. Given the signature: -:: + :: - def __getitem__(self, index, direction='north') + def __getitem__(self, index, direction='north') -if the caller uses this: + if the caller uses this: -:: + :: - obj[0, 'south'] + obj[0, 'south'] -they will probably be surprised by the method call: + they will probably be surprised by the method call: -:: + :: - # expected type(obj).__getitem__(0, direction='south') - # but actually get: - obj.__getitem__((0, 'south'), direction='north') + # expected type(obj).__getitem__(0, direction='south') + # but actually get: + obj.__getitem__((0, 'south'), direction='north') -Solution: best practice suggests that keyword subscripts should be -flagged as keyword-only when possible: + Solution: best practice suggests that keyword subscripts should be + flagged as keyword-only when possible: -:: + :: - def __getitem__(self, index, *, direction='north') + def __getitem__(self, index, *, direction='north') -The interpreter need not enforce this rule, as there could be scenarios -where this is the desired behaviour. But linters may choose to warn -about subscript methods which don't use the keyword-only flag. 
+ The interpreter need not enforce this rule, as there could be scenarios + where this is the desired behaviour. But linters may choose to warn + about subscript methods which don't use the keyword-only flag. 4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.: @@ -581,9 +583,9 @@ about subscript methods which don't use the keyword-only flag. In other words, adding a keyword to a single-valued subscript will not change it into a tuple. For those cases where an actual tuple needs to be passed, a proper syntax will have to be used: -:: + :: - obj[(1,), a=3] # calls __getitem__((1,), a=3) + obj[(1,), a=3] # calls __getitem__((1,), a=3) In this case, the call is passing a single element (which is passed as is, as from rule above), only that the single element happens to be a tuple. @@ -593,34 +595,34 @@ about subscript methods which don't use the keyword-only flag. When keywords are present, the rule that you can omit this outermost pair of parentheses is no longer true. 
-:: + :: - obj[1] # calls __getitem__(1) - obj[1, a=3] # calls __getitem__(1, a=3) - obj[1,] # calls __getitem__((1,)) - obj[(1,), a=3] # calls __getitem__((1,), a=3) + obj[1] # calls __getitem__(1) + obj[1, a=3] # calls __getitem__(1, a=3) + obj[1,] # calls __getitem__((1,)) + obj[(1,), a=3] # calls __getitem__((1,), a=3) -This is particularly relevant in the case where two entries are passed: + This is particularly relevant in the case where two entries are passed: -:: + :: - obj[1, 2] # calls __getitem__((1, 2)) - obj[(1, 2)] # same as above - obj[1, 2, a=3] # calls __getitem__((1, 2), a=3) - obj[(1, 2), a=3] # calls __getitem__((1, 2), a=3) + obj[1, 2] # calls __getitem__((1, 2)) + obj[(1, 2)] # same as above + obj[1, 2, a=3] # calls __getitem__((1, 2), a=3) + obj[(1, 2), a=3] # calls __getitem__((1, 2), a=3) -And particularly when the tuple is extracted as a variable: + And particularly when the tuple is extracted as a variable: -:: + :: - t = (1, 2) - obj[t] # calls __getitem__((1, 2)) - obj[t, a=3] # calls __getitem__((1, 2), a=3) - -Why? because in the case ``obj[1, 2, a=3]`` we are passing two elements (which -are then packed as a tuple and passed as the index). In the case ``obj[(1, 2), a=3]`` -we are passing a single element (which is passed as is) which happens to be a tuple. -The final result is that they are the same. + t = (1, 2) + obj[t] # calls __getitem__((1, 2)) + obj[t, a=3] # calls __getitem__((1, 2), a=3) + + Why? because in the case ``obj[1, 2, a=3]`` we are passing two elements (which + are then packed as a tuple and passed as the index). In the case ``obj[(1, 2), a=3]`` + we are passing a single element (which is passed as is) which happens to be a tuple. + The final result is that they are the same. C Interface =========== @@ -677,18 +679,18 @@ behavior. Given: :: - def __getitem_ex__(self, x, y): ... + def __getitem_ex__(self, x, y): ... 
These all just work and produce the same result effortlessly: :: - obj[1, 2] - obj[1, y=2] - obj[y=2, x=1] + obj[1, 2] + obj[1, y=2] + obj[y=2, x=1] -In other words, this solution would unify the behavior of __getitem__ to the traditional -function signature, but since we can't change __getitem__ and break backward compatibility, +In other words, this solution would unify the behavior of ``__getitem__`` to the traditional +function signature, but since we can't change ``__getitem__`` and break backward compatibility, we would have an extended version that is used preferentially. The problems with this approach were found to be: @@ -720,9 +722,9 @@ The problems with this approach were found to be: indexes. This would look awkward because the visual notation does not match the signature: -:: + :: - obj[1, 2] = 3 # calls obj.__setitem_ex__(3, 1, 2) + obj[1, 2] = 3 # calls obj.__setitem_ex__(3, 1, 2) - the solution relies on the assumption that all keyword indices necessarily map into positional indices, or that they must have a name. This assumption may be @@ -748,7 +750,7 @@ This proposal has already been explored in "New arguments contents" P4 in PEP 47 :: - obj[a, b:c, x=1] # calls __getitem__(a, slice(b, c), key(x=1)) + obj[a, b:c, x=1] # calls __getitem__(a, slice(b, c), key(x=1)) This solution requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them. 
This is painful and opens up to the @@ -762,29 +764,31 @@ Using a single bit to change the behavior A special class dunder flag +:: __keyfn__ = True -would change the signature of the __get|set|delitem__ to a "function like" dispatch, +would change the signature of the ``__get|set|delitem__`` to a "function like" dispatch, meaning that this :: - >>> d[1, 2, z=3] + >>> d[1, 2, z=3] would result in a call to :: - >>> d.__getitem__(1, 2, z=3) # instead of d.__getitem__((1, 2), z=3) + + >>> d.__getitem__(1, 2, z=3) # instead of d.__getitem__((1, 2), z=3) This option has been rejected because it feels odd that a signature of a method depends on a specific value of another dunder. It would be confusing for both static type checkers and for humans: a static type checker would have to hard-code a special case for this, because there really is nothing else in Python where the signature of a dunder depends on the value of another dunder. -A human that has to implement a __getitem__ dunder would have to look if in the -class (or in any of its subclasses) for a __keyfn__ before the dunder can be written. -Moreover, adding a base classes that have the __keyfn__ flag set would break +A human that has to implement a ``__getitem__`` dunder would have to look if in the +class (or in any of its subclasses) for a ``__keyfn__`` before the dunder can be written. +Moreover, adding a base classes that have the ``__keyfn__`` flag set would break the signature of the current methods. This would be even more problematic if the flag is changed at runtime, or if the flag is generated by calling a function that returns randomly True or something else. @@ -820,43 +824,43 @@ a ``np.newaxis`` alias as well): :: - arr = np.array(5) - arr.ndim == 0 - arr[None].ndim == arr[None,].ndim == 1 + arr = np.array(5) + arr.ndim == 0 + arr[None].ndim == arr[None,].ndim == 1 So the final conclusion is that we favor the following series: :: - obj[k=3] # __getitem__((), k=3). 
Empty tuple - obj[1, k=3] # __getitem__(1, k=3). Integer - obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple + obj[k=3] # __getitem__((), k=3). Empty tuple + obj[1, k=3] # __getitem__(1, k=3). Integer + obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple more than this: :: - obj[k=3] # __getitem__(None, k=3). None - obj[1, k=3] # __getitem__(1, k=3). Integer - obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple + obj[k=3] # __getitem__(None, k=3). None + obj[1, k=3] # __getitem__(1, k=3). Integer + obj[1, 2, k=3] # __getitem__((1, 2), k=3). Tuple With the first more in line with a \*args semantics for calling a routine with no positional arguments :: - >>> def foo(*args, **kwargs): - ... print(args, kwargs) - ... - >>> foo(k=3) - () {'k': 3} + >>> def foo(*args, **kwargs): + ... print(args, kwargs) + ... + >>> foo(k=3) + () {'k': 3} Although we accept the following asymmetry: :: - >>> foo(1, k=3) - (1,) {'k': 3} + >>> foo(1, k=3) + (1,) {'k': 3} Common objections @@ -864,25 +868,25 @@ Common objections 1. Just use a method call. -One of the use cases is typing, where the indexing is used exclusively, and -function calls are out of the question. Moreover, function calls do not handle -slice notation, which is commonly used in some cases for arrays. + One of the use cases is typing, where the indexing is used exclusively, and + function calls are out of the question. Moreover, function calls do not handle + slice notation, which is commonly used in some cases for arrays. -One problem is type hint creation has been extended to built-ins in python 3.9, -so that you do not have to import Dict, List, et al anymore. + One problem is type hint creation has been extended to built-ins in python 3.9, + so that you do not have to import Dict, List, et al anymore. 
-Without kwdargs inside ``[]``, you would not be able to do this: + Without kwdargs inside ``[]``, you would not be able to do this: -:: - - Vector = dict[i=float, j=float] + :: + + Vector = dict[i=float, j=float] -but for obvious reasons, call syntax using builtins to create custom type hints -isn't an option: + but for obvious reasons, call syntax using builtins to create custom type hints + isn't an option: -:: + :: - dict(i=float, j=float) # would create a dictionary, not a type + dict(i=float, j=float) # would create a dictionary, not a type References ========== From a6dbac49625d2c1df5105af2254fadbbd98773be Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 21 Sep 2020 21:16:59 +0100 Subject: [PATCH 25/29] Integrated changes from Jonathan Fine --- pep-9999.txt | 122 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 110 insertions(+), 12 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 89049796393..b6febf7eb5e 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -16,15 +16,31 @@ Resolution: Abstract ======== -This PEP proposed extending python to allow keyword-like arguments to be -accepted during indexing operations. Notations in the form ``a[42, K=3, R=2]`` -would become legal syntax. A strategy will be proposed in terms of -semantics and implementation. +At present keyword arguments are allowed in function calls, but not in +item access. This PEP proposes that Python be extended to allow keyword +arguments in item access. + +The following example shows keyword arguments for ordinary function calls: + +:: + + >>> val = f(1, 2, a=3, b=4) + +The proposal would extend the syntax to allow a similar construct +to indexing operations: + +:: + + >>> val = x[1, 2, a=3, b=4] # getitem + >>> x[1, 2, a=3, b=4] = val # setitem + >>> del x[1, 2, a=3, b=4] # delitem + + +and would also provide appropriate semantics. + +This PEP is a successor to PEP 472, which was rejected due to lack of +interest in 2019. 
Since then there's been renewed interest in the feature. -This PEP is a rework and expansion of PEP 472, where an extension of the -indexing operation to support keyword arguments was analysed. PEP 472 was -Rejected due to apparent lack of interest back in 2019. However, renewed -interest has prompted a re-analysis and therefore this PEP. Overview ======== @@ -145,11 +161,11 @@ obj(x). The current python syntax focuses exclusively on position to express the index, and also contains syntactic sugar to refer to non-punctiform selection (slices). Some common examples: -:: + :: - >>> a[3] # returns the fourth element of a - >>> a[1:10:2] # slice notation (extract a non-trivial data subset) - >>> a[3, 2] # multiple indexes (for multidimensional arrays) + >>> a[3] # returns the fourth element of a + >>> a[1:10:2] # slice notation (extract a non-trivial data subset) + >>> a[3, 2] # multiple indexes (for multidimensional arrays) This translates into a __(get|set|del)item__ dunder call which is passed a single parameter containing the index (for __getitem__ and __delitem__) or two parameters @@ -647,6 +663,86 @@ to invoke the old functions. We propose ``BINARY_SUBSCR_EX``, have to generate these new opcodes. The ``PyObject_(Get|Set|Del)Item`` implementations will call the extended methods passing ``NULL`` as kwargs. + +Workarounds +=========== + +Every PEP that changes the Python language should "clearly explain why +the existing language specification is inadequate to address the +problem that the PEP solves." [#pep-0001]_ + +Some rough equivalents to the proposed extension, which we call work-arounds, +are already possible. The work-arounds provide an alternative to enabling the +new syntax, while leaving the semantics to be defined elsewhere. + +These work-arounds follow. In them the helpers ``H`` and ``P`` are not intended to +be universal. For example, a module or package might require the use of its own +helpers. + +1. 
User defined classes can be given ``getitem`` and ``delitem`` methods, + that respectively get and delete values stored in a container. + + :: + + >>> val = x.getitem(1, 2, a=3, b=4) + >>> x.delitem(1, 2, a=3, b=4) + + The same can't be done for ``setitem``. It's not valid syntax. + + :: + + >>> x.setitem(1, 2, a=3, b=4) = val + SyntaxError: can't assign to function call + +2. A helper class, here called ``H``, can be used to swap the container + and parameter roles. In other words, we use + + :: + + H(1, 2, a=3, b=4)[x] + + as a substitute for + + :: + + x[1, 2, a=3, b=4] + + This method will work for ``getitem``, ``delitem`` and also for + ``setitem``. This is because + + :: + + >>> H(1, 2, a=3, b=4)[x] = val + + is valid syntax, which can be given the appropriate semantics. + +3. A helper function, here called ``P``, can be used to store the + arguments in a single object. For example + + :: + + >>> x[P(1, 2, a=3, b=4)] = val + + is valid syntax, and can be given the appropriate semantics. + +4. The ``lo:hi:step`` syntax for slices is sometimes very useful. This + syntax is not directly available in the work-arounds. However + + :: + + s[lo:hi:step] + + provides a work-around that is available everything, where + + :: + + class S: + def __getitem__(self, key): return key + + s = S() + + defines the helper object `s`. + Rejected Ideas ============== @@ -899,6 +995,8 @@ References (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/) .. [#request-2] "PEP 472 -- Support for indexing with keyword arguments" (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/) +.. 
[#pep-0001] "PEP 1 -- PEP Purpose and Guidelines" + (https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep) Copyright From 702a3e702d7d3a3e97eb08cdbbd4e5c9e52beb5b Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 21 Sep 2020 21:42:02 +0100 Subject: [PATCH 26/29] Added double backticks to format inline code --- pep-9999.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index b6febf7eb5e..9f54907c25a 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -155,9 +155,9 @@ Before attacking the problem of detailing the new syntax and semantics to the indexing notation, it is relevant to analyse how the indexing notation works today, in which contexts, and how it is different from a function call. -Subscripting obj[x] is, effectively, an alternate and specialised form of +Subscripting ``obj[x]`` is, effectively, an alternate and specialised form of function call syntax with a number of differences and restrictions compared to -obj(x). The current python syntax focuses exclusively on position to express +``obj(x)``. The current python syntax focuses exclusively on position to express the index, and also contains syntactic sugar to refer to non-punctiform selection (slices). Some common examples: @@ -167,9 +167,9 @@ selection (slices). Some common examples: >>> a[1:10:2] # slice notation (extract a non-trivial data subset) >>> a[3, 2] # multiple indexes (for multidimensional arrays) -This translates into a __(get|set|del)item__ dunder call which is passed a single -parameter containing the index (for __getitem__ and __delitem__) or two parameters -containing index and value (for __setitem__). +This translates into a ``__(get|set|del)item__`` dunder call which is passed a single +parameter containing the index (for ``__getitem__`` and ``__delitem__``) or two parameters +containing index and value (for ``__setitem__``). 
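As an illustration of this translation (not part of the proposal itself), the following sketch uses a hypothetical ``Recorder`` class that logs which dunder method each subscript form invokes and which key it receives. Note that ``a[3, 2]`` arrives as one tuple and ``a[1:10:2]`` as a single ``slice`` object, so the method always gets exactly one index argument.

```python
class Recorder:
    """Hypothetical container that records every subscript call it receives."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, index):
        self.calls.append(("__getitem__", index))
        return index

    def __setitem__(self, index, value):
        self.calls.append(("__setitem__", index, value))

    def __delitem__(self, index):
        self.calls.append(("__delitem__", index))


a = Recorder()
a[3]            # __getitem__ receives 3
a[1:10:2]       # __getitem__ receives slice(1, 10, 2)
a[3, 2]         # __getitem__ receives the tuple (3, 2)
a[3] = "v"      # __setitem__ receives 3 and "v"
del a[3]        # __delitem__ receives 3

for call in a.calls:
    print(call)
```

Running it prints one line per call, for example ``('__getitem__', slice(1, 10, 2))`` for the slice form.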
The behavior of the indexing call is fundamentally different from a function call in various aspects: From 64b208349c8cce3aa5ae0ca6898a67f46dd76998 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 21 Sep 2020 21:43:53 +0100 Subject: [PATCH 27/29] Fixed phrasing --- pep-9999.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 9f54907c25a..1313a952834 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -199,9 +199,9 @@ but only the first one of these is valid >>> x = f(1, 2) >>> f(1, 2) = 5 # invalid -This asymmetry is important to understand that there is a natural imbalance -between the two forms, and therefore it is not a given that the two should -behave transparently and symmetrically. +This asymmetry is important, and makes one understand that there is a natural +imbalance between the two forms. It is therefore not a given that the two +should behave transparently and symmetrically. The third difference is that functions have names assigned to their arguments, unless the passed parameters are captured with \*args, in which case From 706b077620f20d8749842e367c7df008c65d2cc9 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Mon, 21 Sep 2020 21:44:58 +0100 Subject: [PATCH 28/29] Fixed more backticks --- pep-9999.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-9999.txt b/pep-9999.txt index 1313a952834..1ecb22ba515 100644 --- a/pep-9999.txt +++ b/pep-9999.txt @@ -207,7 +207,7 @@ The third difference is that functions have names assigned to their arguments, unless the passed parameters are captured with \*args, in which case they end up as entries in the args tuple. In other words, functions already have anonymous argument semantic, exactly like the indexing operation. However, -__(get|set|del)item__ is not always receiving a tuple as the `index` argument +__(get|set|del)item__ is not always receiving a tuple as the ``index`` argument (to be uniform in behavior with \*args). 
In fact, given a trivial class: @@ -218,7 +218,7 @@ __(get|set|del)item__ is not always receiving a tuple as the `index` argument print(index) The index operation basically forwards the content of the square brackets "as is" -in the `index` argument: +in the ``index`` argument: :: From 9de50067260a464d0aea0894eb612589b6014ad9 Mon Sep 17 00:00:00 2001 From: Stefano Borini Date: Tue, 22 Sep 2020 21:19:38 +0100 Subject: [PATCH 29/29] Assigned official PEP number --- pep-9999.txt => pep-0637.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename pep-9999.txt => pep-0637.txt (99%) diff --git a/pep-9999.txt b/pep-0637.txt similarity index 99% rename from pep-9999.txt rename to pep-0637.txt index 1ecb22ba515..abaede2c0de 100644 --- a/pep-9999.txt +++ b/pep-0637.txt @@ -1,4 +1,4 @@ -PEP: 9999 +PEP: 637 Title: Support for indexing with keyword arguments Version: $Revision$ Last-Modified: $Date$
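The Workarounds above do not define the helpers ``H`` and ``P``. What follows is a minimal sketch, assuming the container is an ordinary ``dict`` keyed by the packed arguments. ``P`` bundles positional and keyword arguments into a single hashable key, and ``H`` swaps the container and argument roles by implementing its own ``__getitem__``, ``__setitem__`` and ``__delitem__``. Both classes are hypothetical and only show that the syntax can be given semantics; a real package would define its own helpers.

```python
class P:
    """Hypothetical helper: packs positional and keyword arguments into one hashable key."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = frozenset(kwargs.items())  # keyword order is irrelevant

    def __eq__(self, other):
        return isinstance(other, P) and (self.args, self.kwargs) == (other.args, other.kwargs)

    def __hash__(self):
        return hash((self.args, self.kwargs))


class H:
    """Hypothetical helper: swaps roles so that H(...)[x] stands in for x[...]."""

    def __init__(self, *args, **kwargs):
        self.key = P(*args, **kwargs)

    def __getitem__(self, container):
        return container[self.key]

    def __setitem__(self, container, value):
        container[self.key] = value

    def __delitem__(self, container):
        del container[self.key]


x = {}                                  # a plain dict serves as the container here
x[P(1, 2, a=3, b=4)] = "val"            # work-around 3: pack the arguments with P
print(H(1, 2, a=3, b=4)[x])             # work-around 2: read the value back through H
H(1, 2, a=3, b=4)[x] = "new"            # assignment works because H defines __setitem__
print(x[P(1, 2, b=4, a=3)])             # keyword order does not change the key
del H(1, 2, a=3, b=4)[x]                # deletion goes through H.__delitem__
print(len(x))
```

The sketch should print ``val``, ``new`` and ``0``. Packing the keywords into a ``frozenset`` is one design choice among several; it makes ``P(a=3, b=4)`` and ``P(b=4, a=3)`` the same key.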