Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many functions that accept Iterable should actually accept Iterable | SupportsGetItem #7813

Closed
matangover opened this issue May 9, 2022 · 28 comments

Comments

@matangover
Copy link
Contributor

matangover commented May 9, 2022

Many functions in stdlib that are documented to accept an "iterable" are annotated as Iterable[T]. This means the arguments must have __iter__. However, the actual functions don't require the arguments to have __iter__, they also work with arguments that have only __getitem__. That's because internally they use iter (in Python) or PySequence_Fast (in C).

As the docs for both iter and PySequence_Fast functions show, these functions support two protocols for the argument: the "iterable" protocol (__iter__) and the "sequence" protocol (__getitem__).

Example functions that have this bug right now:

enumerate
iter
str.join
bytes.__new__
bytes.join

For enumerate, a bug was actually filed on the mypy repo but in fact I believe this should be corrected in typeshed.

Aside: for loops also similarly accept "iterables" that have only __getitem__, but that is implemented inside type checkers themselves. For now mypy still doesn't support it properly, but Pyright does.

@hauntsaninja
Copy link
Collaborator

Related python/mypy#2220

@matangover
Copy link
Contributor Author

I suggest adding to _typeshed an alias AnyIterable: TypeAlias = Union[Iterable | SupportsGetItem]

@matangover
Copy link
Contributor Author

Or, well, since SupportsGetItem has two type parameters that won't exactly work, but something along those lines.

@JelleZijlstra
Copy link
Member

This affects every single place where Iterable is accepted as an argument, except in the unlikely case that the code directly calls .__iter__.

We could add an alias like you suggest and use it everywhere, but that doesn't seem like a great solution. The alias wouldn't get used in code outside typeshed, it would be harder for users to pick up this special case, and type checker error messages may get worse.

@matangover
Copy link
Contributor Author

The alias wouldn't get used in code outside typeshed, it would be harder for users to pick up this special case

I don't think it's common to call iter or PySequence_Fast on function arguments in "user" code... At least I never do that in my code. Do you suggest to add a new type to typing_extensions or typing?

type checker error messages may get worse

True. But in a minor way. It would probably only change from showing "Iterable" to "SequenceOrIterable". Which is in fact more correct and less misleading.

@matangover
Copy link
Contributor Author

In the end, the job of typeshed is to correctly annotate the stdlib

@JelleZijlstra
Copy link
Member

I don't think it's common to call iter or PySequence_Fast on function arguments in "user" code... At least I never do that in my code. Do you suggest to add a new type to typing_extensions or typing?

Anything useful you do with an iterable will ultimately call one of those functions (or some other part of the Python implementation that supports the __getitem__-based protocol).

@matangover
Copy link
Contributor Author

Right. So maybe indeed the right thing is to add it to typing or collections.abc. Maybe we could fix typeshed's annotations first, and later if a new type is introduced to Python it can be adopted.

@TeamSpen210
Copy link
Contributor

I don't think typeshed needs changes here, the problem is that Iterable has special additional behaviour not representable in the stub. It'd need to be handled specially in the type checker.

@matangover
Copy link
Contributor Author

matangover commented May 9, 2022

I disagree, the type annotations are wrong. They specify the argument has to have __iter__ but it's not true. This is really easy to fix in typeshed.

@matangover
Copy link
Contributor Author

matangover commented May 9, 2022

special additional behaviour not representable in the stub

Wait what behavior do you mean? Apart from having __getitem__?

@JelleZijlstra
Copy link
Member

JelleZijlstra commented May 9, 2022

The behavior is representable in a stub. Iterable would be defined something like this:

class _IterIterable(Protocol[_T_co]):
    def __iter__(self) -> Iterator[_T_co]: ...
class _GetItemIterable(Protocol[_T_co]):
    def __getitem__(self, __idx: int) -> _T_co: ...
Iterable: TypeAlias = _IterIterable[_T_co] | _GetItemIterable[_T_co]

But I still don't like this change much. Suppose we add a typing_extensions.RealIterable defined like the above. And then:

  • We get generations of confused users who need to be explained what the difference is and why it matters. (With a 99% chance that it doesn't actually matter to them.)
  • We still need to maintain the existing Iterable ABC so people can use it as a base class.
  • Library authors will be pressured to change their type annotations from Iterable to RealIterable, creating tons of work throughout the ecosystem.

And all that for an edge case so rare that it's only been reported a handful of times in mypy's years of history.

@TeamSpen210
Copy link
Contributor

Even that's not exactly correct, since it doesn't do the __iter__ = None behaviour, and since unions aren't ordered this is ambiguous with conflicting getitem/iter definitions - mappings for instance. It's just another case where magic methods have unusual behaviour.

@matangover
Copy link
Contributor Author

And all that for an edge case so rare that it's only been reported a handful of times in mypy's years of history.

I ran into it with torch.utils.data.Dataset, which implements __getitem__ but not __iter__. It works great with stdlib functions at runtime but generates lots of false type errors. A lot of people use this class.

@matangover matangover changed the title Many functions that accept an Iterable should actually accept Iterable SupportsGetItem Many functions that accept Iterable should actually accept Iterable | SupportsGetItem May 9, 2022
@matangover
Copy link
Contributor Author

I really wouldn't mind if the fix is in type checkers instead of in typeshed. But I do think there should be a fix :) Personally I think that having a special case for Iterable in all type checkers is an "even worse" solution.

@erictraut maybe you could have an opinion or suggestion here

@erictraut
Copy link
Contributor

The fix here cannot (or rather should not) be provided by type checkers. If this problem needs to be fixed, it should be done so in the stubs.

I also question whether this is a problem that needs fixing. As Jelle indicated, this seems like an extreme edge case that rarely comes up. Perhaps we'd be better off documenting a workaround?

In addition to the downsides that Jelle mentioned, I'll add one more: replacing Iterable with a union of two generic protocol types will likely have a measurable impact on type checking performance for many code bases. This is such a core type that is used in so many common stdlib methods, and forcing type checkers to evaluate two protocols instead of one will likely add to analysis times.

@matangover
Copy link
Contributor Author

Thanks for weighing in, good points

Perhaps we'd be better off documenting a workaround?

Do you mean implementing __iter__? (i.e. in this case, the developers of PyTorch adding Dataset.__iter__)

@erictraut
Copy link
Contributor

Do you mean implementing __iter__?

Or wrapping it in a call to the list constructor. I realize this has a bit of runtime overhead, but in most cases where the number of elements is small, that's a reasonable workaround.

@matangover
Copy link
Contributor Author

Or wrapping it in a call to the list constructor

Definitely not an option for Dataset

@hauntsaninja
Copy link
Collaborator

I opened a PR against torch that might help :-) pytorch/pytorch#77116

@matangover
Copy link
Contributor Author

matangover commented May 10, 2022

Here's another possible workaround, assuming I can't modify original library's code (as @hauntsaninja did), and I want to have a function accepting any iterable (e.g. pytorch Dataset and other "normal" iterables), with only a thin wrapper without coercing to list:

from typing import Generic, Iterable, Iterator, Protocol, TypeVar, Union
from typing_extensions import TypeAlias


T_co = TypeVar("T_co", covariant=True)
class GetItemIterable(Protocol[T_co]):
    def __getitem__(self, __idx: int) -> T_co: ...
AnyIterable: TypeAlias = Union[Iterable[T_co], GetItemIterable[T_co]]

class GetItemIterableWrapper(Generic[T_co]):
    def __init__(self, iterable: GetItemIterable[T_co]):
        self.iterable = iterable
    def __iter__(self) -> Iterator[T_co]:
        return iter(self.iterable) # type: ignore

def to_iterable(x: AnyIterable[T_co]) -> Iterable[T_co]:
    return x if isinstance(x, Iterable) else GetItemIterableWrapper(x)

# Example use:

class MyIterable:
    def __getitem__(self, idx: int) -> str:
        if idx >= 3:
            raise IndexError()
        return "blah"

def strjoin1(sep: str, iterable: AnyIterable[str]):
    # The following line gives type error (but works at runtime)
    return sep.join(iterable)

def strjoin2(sep: str, iterable: AnyIterable[str]):
    # The following line gives no error
    return sep.join(to_iterable(iterable))

print("Testing __iter__ iterable:")
print(strjoin1(" ", ["1", "2", "3"]))
print(strjoin2(" ", ["1", "2", "3"]))

print("\nTesting __getitem__ iterable:")
print(strjoin1(" ", MyIterable()))
print(strjoin2(" ", MyIterable()))

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented May 10, 2022

Hmm, I guess I would be in favour of a single typeshed change: allow iter to accept a GetItemIterable. Then you could use iter(MyIterable()) in places that expect an Iterator. Hopefully in most situations this is most of what you need and you can get away without the friction of defining your own protocols and helpers as above.

@matangover
Copy link
Contributor Author

Yes, that would remove the need for type: ignore in my example. The rest would still be needed.

@matangover
Copy link
Contributor Author

matangover commented May 10, 2022

Oh, you are totally right, it isn't! Because an iterator is also iterable...

@hauntsaninja
Copy link
Collaborator

Hopefully #7817 will go some way towards making things more ergonomic :-)

@AlexWaygood
Copy link
Member

Thanks for the report, and I agree that this is one of 9857 things that are slightly annoying about Python's typing system. I think I speak for my fellow maintainers, though, when I say that we're not going to make the change you're asking for here. While it's true that objects with __getitem__ but not __iter__ do support iteration, I think that should properly be viewed as an accident of history rather than something we should bend over backwards to account for in typeshed. If a class is meant to be iterable, it should generally define __iter__.

I'm glad @hauntsaninja's ingenious #7817 was merged, and I hope his pytorch PR is also merged :)

@matangover
Copy link
Contributor Author

Thanks all for your insightful comments, I totally agree with what you said @AlexWaygood. I think the change applied by @hauntsaninja is an excellent workaround. One thing that could be done is to document this somewhere, but seems like there is no real "Python typing docs" place, only the type checkers document this individually

@JelleZijlstra
Copy link
Member

We could put up an essay on typing.readthedocs.io, similar to https://typing.readthedocs.io/en/latest/source/unreachable.html about exhaustiveness checking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants