Add 'key' argument to `list.index()` #113773

kristjanvalur · 2024-01-06T14:31:55Z

Feature or enhancement

Proposal:

It would be useful to be able to search a list in a functional-style manner. :

# list with structured data
data = [(i, chr(i)) for i in range(100)]

# find index of "a" in classic manner:
for idx, item in enumerate(data):
    if item[1] == "a":
        break

# functional-style search:
idx = data.index("a", key=lambda i: i[1])

This idea came up in the discussion to pr #112498, in the context of searching for objects in a heap.

Task = namedtuple('Task', ('priority', 'name', 'timestamp', 'callback'))
data: List[Task] = get_data()

# remove "bob"
del data[data.index("bob", key=attrgetter("name"))]

An implementation is provided in pr #113772

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

I mentioned this in pr #112498, as a way to decouple searching and heap removal.
Instead of having

heapq.heapremove(heap, value, key=keymethod)

the desired functionality could be easily provided if list itself provided the search functionality:

heapq.heapremove(heap, heap.index(value, key=keymethod))

Linked PRs

gh-113773: add list.index() "key" named argument #113772

The text was updated successfully, but these errors were encountered:

grigoriev-semyon · 2024-01-06T15:46:13Z

+1
I think it would be useful

Eclips4 · 2024-01-06T15:52:02Z

I'm neutral on this.
Seems that this really solves some use cases and this doesn't break any backward compatibility, so it's probably ok to have this feature :)

terryjreedy · 2024-01-06T17:13:55Z

The example is not very motivating since the answer is data.index((ord('a'), 'a')) which is just ord('a'). A better example would a list of tuples of a character and a random number, so that the exact tuple being searched for is unknown. Can you find more use cases in the stdlib? (Perhaps search for breaks.)

kristjanvalur · 2024-01-06T20:08:07Z

The example is not very motivating since the answer is data.index((ord('a'), 'a'))

It was not supposed to be motivating but illustrate how to find an object without exact equivalence.

The original motivation is in the referenced PR where a search is performed for a dataclass instance based on an attribute, in the functional style.

Since adding this to the c implementation is trivial, it seems preferable compared to a convoluted approach using 'indexOf' and 'map'

It is equivalent to data.index(value) if key is none else indexOf(map(key, data), value) , which does the job but is needlessly complicated.

oscarbenjamin · 2024-01-06T23:11:26Z

Changing the interface of basic types like list and tuple is problematic especially when that interface is specified in an ABC:
https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes

Other types that are designed to mimic list or to satisfy the Sequence ABC would not be compatible with list any more if list changed the signature of its index method.

There already is an indexOf function that can be used to do this:

In [9]: from operator import indexOf

In [10]: data = [(i, chr(i)) for i in range(100)]

In [11]: indexOf(map(lambda c: c[1] == 'a', data), True)
Out[11]: 97

Using indexOf and map like this already works with any sequence (in fact any iterable).

If this is a common enough operation that using indexOf and map is too verbose then a function could be added:

In [12]: def index_key(iterable, key):
    ...:     """Return first item x in iterable such that key(x) is True"""
    ...:     return indexOf(map(key, iterable), True)
    ...: 

In [14]: index_key(data, lambda c: c[1] == 'a')
Out[14]: 97

If it is not worth adding a function for this that could work for any iterable then it is definitely not worth changing the interface of basic concrete types like list or the Sequence ABC.

kristjanvalur · 2024-01-07T12:46:38Z

Sure, we could modify indexOf or add another one to operators since that one has a weird non-pep8 name. That would complement this idea nicely.

But the motivation here is specifically in the context of heaps which are based on concrete lists and where performance is important. The use of key functions is prevalent in the lib when searching for things, for example in the bisect module.

Regarding ABCs: list.index() has optional start and stop positional-only arguments and thus already has a different signature from Sequence.index(). Adding optional functionality does not break inheritance, any more than adding methods or overloading does.

Anyway, this is convenience/performance feature only, such as many features are, and maybe should be considered only in the context of #112498.

oscarbenjamin · 2024-01-07T13:46:31Z

Anyway, this is convenience/performance feature only

Then it needs a performance justification:

In [12]: N = 1000000

In [13]: a = list(range(N))

In [14]: %time a.index(N-1)
CPU times: user 44.9 ms, sys: 2 µs, total: 44.9 ms
Wall time: 44.3 ms
Out[14]: 999999

In [15]: from operator import indexOf

In [16]: %time indexOf(a, N-1)
CPU times: user 47.2 ms, sys: 0 ns, total: 47.2 ms
Wall time: 47 ms
Out[16]: 999999

In [17]: %timeit a.index(N-1)
17 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: %timeit indexOf(a, N-1)
22.2 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

There is not much speed difference between list.index and indexOf. Throw in a key function and that speed difference will be less significant:

In [20]: %time indexOf(map(abs, a), N-1)
CPU times: user 85.2 ms, sys: 0 ns, total: 85.2 ms
Wall time: 85 ms
Out[20]: 999999

In [22]: %time indexOf(map(lambda a: a if a >= 0 else -a, a), N-1)
CPU times: user 132 ms, sys: 1 µs, total: 132 ms
Wall time: 132 ms
Out[22]: 999999

In [24]: %timeit indexOf(map(abs, a), N-1)
46.4 ms ± 287 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [25]: %timeit indexOf(map(lambda a: a if a >= 0 else -a, a), N-1)
109 ms ± 6.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

A list method is significantly less general than using indexOf or making a function that can work with any iterable.

In the first instance it would be better to see if indexOf can be optimised in any simple way rather than changing the interface of list.index.

kristjanvalur · 2024-01-07T13:57:01Z

Fair enough. I didn't realize that indexOf had a c implementation, essentially PySequence_Index(). A PySequence_IndexKey() can be trivially added and the indexOf modified accordingly. Similarly, countOf could get the same enhancement, PySequence_CountKey().

Is everyone happy with indexOf and countOf?, btw? Would this be an opportunity to rename them (index_of) and keep the old ones around as aliases? Or is there some deeper significance to their strange capitalization? After all, they don't represent real operators.

kristjanvalur · 2024-01-07T13:58:50Z

Also, I should note, list.index() does not have any .rst documentation. Probably many of the old builtins don't. But that is why I was surprised when I stumbled upon its start and stop arguments.

pochmann3 · 2024-01-07T17:46:51Z

list.index() does not have any .rst documentation

It does, just not in the section specific to lists but in "Common Sequence Operations" of Sequence Types — list, tuple, range. Which the lists section also references: "Lists implement all of the common and mutable sequence operations. Lists also provide the following additional method:"

sobolevn · 2024-01-07T19:44:05Z

One more thing: can you please also provide some prototype of the type-hint that this new parameter should have? This would be very helpful for us in typeshed.

Right now this seems rather hard in my opinion :(

kristjanvalur · 2024-01-08T15:51:33Z

It does, just not in the section specific to lists but in "Common Sequence Operations" of [Sequence Types — list, tuple, range]

I see. It also mentions that "start" and "stop" arguments are not supported by all implementations. I only saw it for list, maybe it exists for tuple too. But anyway, it seems that modifying or adding an argument like this to a single implementation goes a bit against the grain. I guess enhancing indexOf with would work, but once we start using functional operators, there is perhaps no need create a special case for somethign which can be done with map.

A user can simply define

def index_of_key(seq, val, *, key=None):
  return indexOf(seq if key is None else map(key, seq), val)

I guess we can drop this idea, then. Still, good talk.

kristjanvalur · 2024-01-08T16:04:54Z

One more thing: can you please also provide some prototype of the type-hint that this new parameter should have? This would be very helpful for us in typeshed.

Type annotations is only my third language, but it is something like:

T = TypeVar("T")
V = TypeVar("V")
class List[T];
    @overload
    def index(value: T, start: int=0, stop: int=maxsize, /) -> int:
       ...
    @overload
    def index(value: V, start: int=0, stop: int=maxsize, /, *, key: Callable[[T], V]) -> int:
        ...
    def index(value: Any, start: int=0, stop: int=maxsize, /, *, key: Optional[Callable[[T], Any]]) -> int:
       pass

Yup, messy. The type of the first argument changes depending on the presence of a key argument. Again, does not speak in the favour of this proposal :(

kristjanvalur · 2024-01-09T13:03:19Z

Actually, I guess the above is incorrect. In the end, a comparison is performed, PyObject_RichCompareBool(item, value), which can invoke the comparison operators on the list values. value isn't restricted to T:

T = TypeVar("T")
class List[T];
    def index(value: Any, start: int=0, stop: int=maxsize, /, *, key: Optional[Callable[[T], Any]]) -> int:
       ...

It is interesting to note that the left side in the comparson is "item" and not "value" which makes it hard to create special comparison object to search for. the magic methods (__eq__ et al) are invoked for the list items, not the value. An oversight, maybe? (this is consistent with what PySequence_Index() does, sequence item is on the left of the comparison.)

encukou · 2024-03-13T13:23:13Z

Is everyone happy with indexOf and countOf?

I'd be happy with adding key for those, yes. One can use a map or generator expression, but that's not as obvious.
Changing list/sequence API is, of course, a much bigger deal -- IMO that would need discussion on Discourse not just on an issue here.

Note that operator has both a C implementation and a Python one, as per PEP-399.

Would this be an opportunity to rename them (index_of) and keep the old ones around as aliases?

That should be a separate change.
But it's OK to let them be; PEP 8 specifically says:

Some other good reasons to ignore a particular guideline: [...]

Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.

kristjanvalur · 2024-03-13T16:22:04Z

I've been persuaded that changing the list.index() method is not a good idea. I guess I should close this. adding key to indexOf might be much simpler. I guess I should close this?

encukou · 2024-03-13T16:31:34Z

Yes, I guess it's best to close this issue and open a new one; reusing it for indexOf could be confusing.

kristjanvalur added the type-feature A feature request or enhancement label Jan 6, 2024

bedevere-app bot mentioned this issue Jan 6, 2024

gh-113773: add list.index() "key" named argument #113772

Closed

grigoriev-semyon mentioned this issue Jan 6, 2024

Add key argument to tuple.index() #113779

Closed

kristjanvalur closed this as completed Mar 13, 2024

kristjanvalur closed this as not planned Won't fix, can't repro, duplicate, stale Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add 'key' argument to `list.index()` #113773

Add 'key' argument to `list.index()` #113773

kristjanvalur commented Jan 6, 2024 •

edited

grigoriev-semyon commented Jan 6, 2024

Eclips4 commented Jan 6, 2024

terryjreedy commented Jan 6, 2024

kristjanvalur commented Jan 6, 2024 •

edited

oscarbenjamin commented Jan 6, 2024

kristjanvalur commented Jan 7, 2024

oscarbenjamin commented Jan 7, 2024

kristjanvalur commented Jan 7, 2024 •

edited

kristjanvalur commented Jan 7, 2024

pochmann3 commented Jan 7, 2024 •

edited

sobolevn commented Jan 7, 2024

kristjanvalur commented Jan 8, 2024 •

edited

kristjanvalur commented Jan 8, 2024

kristjanvalur commented Jan 9, 2024

encukou commented Mar 13, 2024

kristjanvalur commented Mar 13, 2024

encukou commented Mar 13, 2024

Add 'key' argument to list.index() #113773

Add 'key' argument to list.index() #113773

Comments

kristjanvalur commented Jan 6, 2024 • edited

Feature or enhancement

Proposal:

Has this already been discussed elsewhere?

Links to previous discussion of this feature:

Linked PRs

grigoriev-semyon commented Jan 6, 2024

Eclips4 commented Jan 6, 2024

terryjreedy commented Jan 6, 2024

kristjanvalur commented Jan 6, 2024 • edited

oscarbenjamin commented Jan 6, 2024

kristjanvalur commented Jan 7, 2024

oscarbenjamin commented Jan 7, 2024

kristjanvalur commented Jan 7, 2024 • edited

kristjanvalur commented Jan 7, 2024

pochmann3 commented Jan 7, 2024 • edited

sobolevn commented Jan 7, 2024

kristjanvalur commented Jan 8, 2024 • edited

kristjanvalur commented Jan 8, 2024

kristjanvalur commented Jan 9, 2024

encukou commented Mar 13, 2024

kristjanvalur commented Mar 13, 2024

encukou commented Mar 13, 2024

Add 'key' argument to `list.index()` #113773

Add 'key' argument to `list.index()` #113773

kristjanvalur commented Jan 6, 2024 •

edited

kristjanvalur commented Jan 6, 2024 •

edited

kristjanvalur commented Jan 7, 2024 •

edited

pochmann3 commented Jan 7, 2024 •

edited

kristjanvalur commented Jan 8, 2024 •

edited