Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'key' argument to list.index() #113773

Closed
kristjanvalur opened this issue Jan 6, 2024 · 17 comments
Closed

Add 'key' argument to list.index() #113773

kristjanvalur opened this issue Jan 6, 2024 · 17 comments
Labels
type-feature A feature request or enhancement

Comments

@kristjanvalur
Copy link
Contributor

kristjanvalur commented Jan 6, 2024

Feature or enhancement

Proposal:

It would be useful to be able to search a list in a functional-style manner. :

# list with structured data
data = [(i, chr(i)) for i in range(100)]

# find index of "a" in classic manner:
for idx, item in enumerate(data):
    if item[1] == "a":
        break

# functional-style search:
idx = data.index("a", key=lambda i: i[1])

This idea came up in the discussion to pr #112498, in the context of searching for objects in a heap.

Task = namedtuple('Task', ('priority', 'name', 'timestamp', 'callback'))
data: List[Task] = get_data()

# remove "bob"
del data[data.index("bob", key=attrgetter("name"))]

An implementation is provided in pr #113772

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

I mentioned this in pr #112498, as a way to decouple searching and heap removal.
Instead of having

heapq.heapremove(heap, value, key=keymethod)

the desired functionality could be easily provided if list itself provided the search functionality:

heapq.heapremove(heap, heap.index(value, key=keymethod))

Linked PRs

@kristjanvalur kristjanvalur added the type-feature A feature request or enhancement label Jan 6, 2024
@grigoriev-semyon
Copy link
Contributor

+1
I think it would be useful

@Eclips4
Copy link
Member

Eclips4 commented Jan 6, 2024

I'm neutral on this.
Seems that this really solves some use cases and this doesn't break any backward compatibility, so it's probably ok to have this feature :)

@terryjreedy
Copy link
Member

The example is not very motivating since the answer is data.index((ord('a'), 'a')) which is just ord('a'). A better example would a list of tuples of a character and a random number, so that the exact tuple being searched for is unknown. Can you find more use cases in the stdlib? (Perhaps search for breaks.)

@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Jan 6, 2024

The example is not very motivating since the answer is data.index((ord('a'), 'a'))

It was not supposed to be motivating but illustrate how to find an object without exact equivalence.

The original motivation is in the referenced PR where a search is performed for a dataclass instance based on an attribute, in the functional style.

Since adding this to the c implementation is trivial, it seems preferable compared to a convoluted approach using 'indexOf' and 'map'

It is equivalent to data.index(value) if key is none else indexOf(map(key, data), value) , which does the job but is needlessly complicated.

@oscarbenjamin
Copy link
Contributor

Changing the interface of basic types like list and tuple is problematic especially when that interface is specified in an ABC:
https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes

Other types that are designed to mimic list or to satisfy the Sequence ABC would not be compatible with list any more if list changed the signature of its index method.

There already is an indexOf function that can be used to do this:

In [9]: from operator import indexOf

In [10]: data = [(i, chr(i)) for i in range(100)]

In [11]: indexOf(map(lambda c: c[1] == 'a', data), True)
Out[11]: 97

Using indexOf and map like this already works with any sequence (in fact any iterable).

If this is a common enough operation that using indexOf and map is too verbose then a function could be added:

In [12]: def index_key(iterable, key):
    ...:     """Return first item x in iterable such that key(x) is True"""
    ...:     return indexOf(map(key, iterable), True)
    ...: 

In [14]: index_key(data, lambda c: c[1] == 'a')
Out[14]: 97

If it is not worth adding a function for this that could work for any iterable then it is definitely not worth changing the interface of basic concrete types like list or the Sequence ABC.

@kristjanvalur
Copy link
Contributor Author

Sure, we could modify indexOf or add another one to operators since that one has a weird non-pep8 name. That would complement this idea nicely.

But the motivation here is specifically in the context of heaps which are based on concrete lists and where performance is important. The use of key functions is prevalent in the lib when searching for things, for example in the bisect module.

Regarding ABCs: list.index() has optional start and stop positional-only arguments and thus already has a different signature from Sequence.index(). Adding optional functionality does not break inheritance, any more than adding methods or overloading does.

Anyway, this is convenience/performance feature only, such as many features are, and maybe should be considered only in the context of #112498.

@oscarbenjamin
Copy link
Contributor

Anyway, this is convenience/performance feature only

Then it needs a performance justification:

In [12]: N = 1000000

In [13]: a = list(range(N))

In [14]: %time a.index(N-1)
CPU times: user 44.9 ms, sys: 2 µs, total: 44.9 ms
Wall time: 44.3 ms
Out[14]: 999999

In [15]: from operator import indexOf

In [16]: %time indexOf(a, N-1)
CPU times: user 47.2 ms, sys: 0 ns, total: 47.2 ms
Wall time: 47 ms
Out[16]: 999999

In [17]: %timeit a.index(N-1)
17 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: %timeit indexOf(a, N-1)
22.2 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

There is not much speed difference between list.index and indexOf. Throw in a key function and that speed difference will be less significant:

In [20]: %time indexOf(map(abs, a), N-1)
CPU times: user 85.2 ms, sys: 0 ns, total: 85.2 ms
Wall time: 85 ms
Out[20]: 999999

In [22]: %time indexOf(map(lambda a: a if a >= 0 else -a, a), N-1)
CPU times: user 132 ms, sys: 1 µs, total: 132 ms
Wall time: 132 ms
Out[22]: 999999

In [24]: %timeit indexOf(map(abs, a), N-1)
46.4 ms ± 287 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [25]: %timeit indexOf(map(lambda a: a if a >= 0 else -a, a), N-1)
109 ms ± 6.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

A list method is significantly less general than using indexOf or making a function that can work with any iterable.

In the first instance it would be better to see if indexOf can be optimised in any simple way rather than changing the interface of list.index.

@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Jan 7, 2024

Fair enough. I didn't realize that indexOf had a c implementation, essentially PySequence_Index(). A PySequence_IndexKey() can be trivially added and the indexOf modified accordingly. Similarly, countOf could get the same enhancement, PySequence_CountKey().

Is everyone happy with indexOf and countOf?, btw? Would this be an opportunity to rename them (index_of) and keep the old ones around as aliases? Or is there some deeper significance to their strange capitalization? After all, they don't represent real operators.

@kristjanvalur
Copy link
Contributor Author

Also, I should note, list.index() does not have any .rst documentation. Probably many of the old builtins don't. But that is why I was surprised when I stumbled upon its start and stop arguments.

@pochmann3
Copy link

pochmann3 commented Jan 7, 2024

list.index() does not have any .rst documentation

It does, just not in the section specific to lists but in "Common Sequence Operations" of Sequence Types — list, tuple, range. Which the lists section also references: "Lists implement all of the common and mutable sequence operations. Lists also provide the following additional method:"

@sobolevn
Copy link
Member

sobolevn commented Jan 7, 2024

One more thing: can you please also provide some prototype of the type-hint that this new parameter should have? This would be very helpful for us in typeshed.

Right now this seems rather hard in my opinion :(

@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Jan 8, 2024

It does, just not in the section specific to lists but in "Common Sequence Operations" of [Sequence Types — list, tuple, range]

I see. It also mentions that "start" and "stop" arguments are not supported by all implementations. I only saw it for list, maybe it exists for tuple too. But anyway, it seems that modifying or adding an argument like this to a single implementation goes a bit against the grain. I guess enhancing indexOf with would work, but once we start using functional operators, there is perhaps no need create a special case for somethign which can be done with map.

A user can simply define

def index_of_key(seq, val, *, key=None):
  return indexOf(seq if key is None else map(key, seq), val) 

I guess we can drop this idea, then. Still, good talk.

@kristjanvalur
Copy link
Contributor Author

One more thing: can you please also provide some prototype of the type-hint that this new parameter should have? This would be very helpful for us in typeshed.

Type annotations is only my third language, but it is something like:

T = TypeVar("T")
V = TypeVar("V")
class List[T];
    @overload
    def index(value: T, start: int=0, stop: int=maxsize, /) -> int:
       ...
    @overload
    def index(value: V, start: int=0, stop: int=maxsize, /, *, key: Callable[[T], V]) -> int:
        ...
    def index(value: Any, start: int=0, stop: int=maxsize, /, *, key: Optional[Callable[[T], Any]]) -> int:
       pass

Yup, messy. The type of the first argument changes depending on the presence of a key argument. Again, does not speak in the favour of this proposal :(

@kristjanvalur
Copy link
Contributor Author

Actually, I guess the above is incorrect. In the end, a comparison is performed, PyObject_RichCompareBool(item, value), which can invoke the comparison operators on the list values. value isn't restricted to T:

T = TypeVar("T")
class List[T];
    def index(value: Any, start: int=0, stop: int=maxsize, /, *, key: Optional[Callable[[T], Any]]) -> int:
       ...

It is interesting to note that the left side in the comparson is "item" and not "value" which makes it hard to create special comparison object to search for. the magic methods (__eq__ et al) are invoked for the list items, not the value. An oversight, maybe? (this is consistent with what PySequence_Index() does, sequence item is on the left of the comparison.)

@encukou
Copy link
Member

encukou commented Mar 13, 2024

Is everyone happy with indexOf and countOf?

I'd be happy with adding key for those, yes. One can use a map or generator expression, but that's not as obvious.
Changing list/sequence API is, of course, a much bigger deal -- IMO that would need discussion on Discourse not just on an issue here.

Note that operator has both a C implementation and a Python one, as per PEP-399.

Would this be an opportunity to rename them (index_of) and keep the old ones around as aliases?

That should be a separate change.
But it's OK to let them be; PEP 8 specifically says:

Some other good reasons to ignore a particular guideline: [...]

  • Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.

@kristjanvalur
Copy link
Contributor Author

I've been persuaded that changing the list.index() method is not a good idea. I guess I should close this. adding key to indexOf might be much simpler. I guess I should close this?

@encukou
Copy link
Member

encukou commented Mar 13, 2024

Yes, I guess it's best to close this issue and open a new one; reusing it for indexOf could be confusing.

@kristjanvalur kristjanvalur closed this as not planned Won't fix, can't repro, duplicate, stale Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

8 participants