Proposal: Make generic types non-classes. #468

Closed
ilevkivskyi opened this issue Sep 3, 2017 · 20 comments

Comments

@ilevkivskyi
Member

It is proposed to add two special methods, __subclass_base__ and __class_getitem__, to CPython. These will allow generics to stop being classes, which simplifies them and significantly improves their performance.
@gvanrossum, the main question now is should this be a PEP?

Motivation:

There are three main points of motivation: performance of the typing module, metaclass conflicts, and the large number of hacks currently used in typing.

Performance:

The typing module is one of the heaviest and slowest modules in the stdlib, even with all the optimizations made so far, mainly because subscripted generics are classes. See also #432. Performance will improve in three main ways:

  • Creation of generic classes is slow because GenericMeta.__new__ is very slow; we will not need it anymore.

  • The very long MROs of generic classes will be half as long; they are long because we duplicate the collections.abc inheritance chain in typing.

  • Instantiation of generic classes will be faster (this is minor, however).

Metaclass conflicts:

All generic types are instances of GenericMeta, so if a user uses a custom metaclass, it is hard to make the corresponding class generic. This is particularly hard for library classes that the user doesn't control. A workaround is to always mix in GenericMeta:

class AdHocMeta(GenericMeta, LibraryMeta):
    pass

class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
    ...

but this is not always practical or even possible.
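The conflict can be reproduced without any typing internals. The following is a minimal sketch: MetaA and MetaB are hypothetical stand-ins for GenericMeta and LibraryMeta, and any two unrelated metaclasses behave the same way.

```python
# Hypothetical stand-ins: MetaA plays the role of GenericMeta,
# MetaB the role of a library's own metaclass.
class MetaA(type):
    pass


class MetaB(type):
    pass


class GenericLike(metaclass=MetaA):
    pass


class LibraryBase(metaclass=MetaB):
    pass


# Combining the two bases directly fails: Python cannot pick a metaclass
# that is a subclass of both MetaA and MetaB.
try:
    class Broken(LibraryBase, GenericLike):
        pass
    conflict = None
except TypeError as exc:
    conflict = exc

# The workaround from the text: derive an ad hoc metaclass from both.
class AdHocMeta(MetaA, MetaB):
    pass


class UserClass(LibraryBase, GenericLike, metaclass=AdHocMeta):
    pass
```

The workaround only helps when the two metaclasses can actually be combined, which is why it is not always practical.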

Hacks and bugs that will be removed by this proposal:

  • The _generic_new hack, which exists because __init__ is not called on instances whose type differs from the type whose __new__ was called: C[int]().__class__ is C.

  • The _next_in_mro speed hack will no longer be necessary, since subscription will not create new classes.

  • The ugly sys._getframe hack. This one is particularly nasty, since it looks like we can't remove it without changes outside typing.

  • Currently generics do "dangerous" things with private ABC caches to fix large memory consumption that grows at least as O(N**2), see Optimize ABC caches #383. This point is also important because I would like to re-implement ABCMeta in C. This will reduce Python start-up time and also the start-up time of many programs that use ABCs extensively. My implementation passes all tests except test_typing, because I want to make _abc_cache etc. read-only, so that one can't do something like MyABC._abc_cache = "Surprise when updating caches!".

  • Problems with sharing attributes between subscripted generics, see Subscripted generic classes should not have independent class variables #392. The current solution already uses __getattr__ and __setattr__, but it is still incomplete, and solving this without the current proposal will be hard and will need __getattribute__.

  • The _no_slots_copy hack, where we clean up the class dictionary on every subscription, thus allowing generics with __slots__.

  • General complexity of the typing module. The new proposal will not only remove the hacks and bugs mentioned above, but also simplify the implementation, so that it will be easier to maintain.

Details of the proposal:

New methods API:

  • The idea of __class_getitem__ is very simple: it is an exact analog of __getitem__, except that it is called on the class that defines it, not on its instances. This allows us to avoid GenericMeta.__getitem__.

  • If an object that is not a class object appears in the bases of a class definition, __subclass_base__ is looked up on it. If found, it is called with the original tuple of bases as an argument. If the result of the call is not None, it is substituted for this object; otherwise, the base is simply removed. This is necessary to avoid inconsistent MRO errors, which are currently prevented by manipulations in GenericMeta.__new__. After the class is created, the original bases are saved in __orig_bases__ (currently this is also done by the metaclass).
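Both hooks can be tried out in a short sketch. It assumes Python 3.7 or later, where the second hook is spelled __mro_entries__ rather than __subclass_base__ and returns a tuple of replacement bases (an empty tuple removes the base) instead of a single class or None. Registry, Alias, Base and Child are hypothetical names used only for illustration.

```python
class Registry:
    # Implicitly a classmethod: Registry[item] calls
    # Registry.__class_getitem__(item) with cls bound to Registry.
    def __class_getitem__(cls, item):
        return (cls, item)


class Alias:
    """A non-class object that may still appear in a list of bases."""

    def __init__(self, origin):
        self.origin = origin

    def __mro_entries__(self, bases):
        # `bases` is the original tuple of bases from the class statement.
        # The returned tuple replaces this object in the real bases.
        return (self.origin,)


class Base:
    pass


alias = Alias(Base)


class Child(alias):  # `alias` is an instance, not a class
    pass


subscripted = Registry[int]
```

After this runs, Child.__bases__ is (Base,), while the original, unsubstituted tuple (alias,) is kept in Child.__orig_bases__.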

Changes necessary in typing module:

The key point is that instead of the GenericMeta metaclass, we will have a GenericAlias class.

Generic will have:

  • a __class_getitem__ that will return instances of GenericAlias which keep track of the original class and type arguments.
  • __init_subclass__ that will properly initialize the subclasses, and perform necessary bookkeeping.

GenericAlias will have:

  • a normal __getitem__ so that it can be further subscripted thus preserving the current API.
  • __call__, __getattr__, and __setattr__ that will simply pass everything to the original class object.
  • __subclass_base__ that will return the original class (or None in some special cases).
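The division of labour between the two classes can be sketched as follows. This is a toy model under stated assumptions, not the typing implementation: ToyGeneric and ToyAlias are hypothetical names, the hook uses the __mro_entries__ spelling available in Python 3.7 and later, and re-subscription simply replaces the arguments instead of substituting type variables.

```python
class ToyAlias:
    """Hypothetical stand-in for the GenericAlias described above."""

    def __init__(self, origin, args):
        # Bypass our own __setattr__, which forwards to the origin class.
        object.__setattr__(self, "_origin", origin)
        object.__setattr__(self, "_args", args)

    def __getitem__(self, item):
        # Further subscription yields another alias (no TypeVar substitution here).
        args = item if isinstance(item, tuple) else (item,)
        return ToyAlias(self._origin, args)

    def __call__(self, *args, **kwargs):
        # Instantiation is passed straight to the original class.
        return self._origin(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup fails; never forward private names.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._origin, name)

    def __setattr__(self, name, value):
        setattr(self._origin, name, value)

    def __mro_entries__(self, bases):
        # Subclassing an alias really subclasses the original class.
        return (self._origin,)


class ToyGeneric:
    def __class_getitem__(cls, item):
        args = item if isinstance(item, tuple) else (item,)
        return ToyAlias(cls, args)


class Stack(ToyGeneric):
    default_capacity = 8

    def __init__(self):
        self.items = []


alias = Stack[int]
instance = alias()            # a plain Stack instance
alias.default_capacity = 16   # lands on Stack itself, so it is shared


class IntStack(Stack[int]):
    pass
```

Because the alias is an ordinary object, subscripting creates no new class, and attributes set through one alias are visible through every other alias of the same class.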

The generic versions of collections.abc classes will be simple subclasses like this:

class Sequence(collections.abc.Sequence, Generic[T_co]):
    pass

(typeshed of course will track that Sequence[T_co] inherits from Iterable[T_co] etc.)

Transition plan:

  • Merge the changes into CPython (ideally before the end of September).
  • Branch a separate version of typing for Python 3.7 and simplify it by removing backward compatibility hacks.
  • Update the 3.7 version to use the dedicated CPython API (this might be done in a few separate PRs).

Backwards compatibility and impact on users who don't use typing:

This proposal will allow practically 100% backwards compatibility with the current public typing API. In fact, the whole idea of introducing two special methods came from the desire to preserve backwards compatibility while solving the problems listed above.
The only two exceptions that I see now are these: currently issubclass(List[int], List) returns True, whereas with this proposal it will raise TypeError. Also, issubclass(collections.abc.Iterable, typing.Iterable) will return False, which I think is actually good, since currently we have a (virtual) inheritance cycle between them.
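The first exception can be checked with a short sketch, assuming a Python version in which this proposal has been implemented (3.7 or later). Only the List[int] case is exercised here; the second check is left out because its result depends on the typing version.

```python
import typing

alias = typing.List[int]

# A subscripted generic is no longer a class object.
alias_is_class = isinstance(alias, type)

# Using it in issubclass() is rejected instead of returning True.
try:
    issubclass(alias, typing.List)
    raised_type_error = False
except TypeError:
    raised_type_error = True
```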

With my implementation, see https://github.com/ilevkivskyi/cpython/pull/2/files, I measured negligible effects (under 1%) for regular (non-generic) classes.

@ethanhs
Contributor

ethanhs commented Sep 3, 2017

Could you explain how __class_getitem__ differs from this?

@classmethod
def __getitem__(cls, b): 
   ...

I'm probably missing something, but based on your description they seem to be the same?

Also, the first alpha for 3.7 is in two weeks, so while I think it would be great to have this in 3.7, I'm concerned about timing.

Otherwise, I am all for removing hacks and making things faster, and these changes look like cleaner solutions to the Generics complexity.

@gvanrossum
Member

gvanrossum commented Sep 4, 2017 via email

@ilevkivskyi
Member Author

@ethanhs

Could you explain how __class_getitem__ differs from this?

This will not work. Special methods such as __iter__, __len__, and __getitem__ are never looked up on instances (class objects in this case), but directly on their classes (this is a deliberate decision, not a bug in CPython). For example:

>>> class C:
...     @classmethod
...     def __getitem__(cls, item):
...         return item
>>> C[int]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable

This is why GenericMeta.__getitem__ was necessary.
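The lookup rule can be seen side by side in a small sketch. SubscriptableMeta is a hypothetical metaclass standing in for GenericMeta; the other names are likewise illustrative.

```python
class ViaClassmethod:
    @classmethod
    def __getitem__(cls, item):
        return item


class SubscriptableMeta(type):
    # Defined on the metaclass, so it is found when a *class* is subscripted.
    def __getitem__(cls, item):
        return (cls, item)


class ViaMetaclass(metaclass=SubscriptableMeta):
    pass


# Subscripting the class looks for __getitem__ on type(ViaClassmethod),
# which is `type`, so the classmethod is never consulted.
try:
    ViaClassmethod[int]
    classmethod_works = True
except TypeError:
    classmethod_works = False

# Instances are fine: here the lookup goes to ViaClassmethod itself.
from_instance = ViaClassmethod()[int]

# With the metaclass, subscripting the class works.
from_metaclass = ViaMetaclass[int]
```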

@ethanhs
Contributor

ethanhs commented Sep 4, 2017

Ah, right, because the type object does not allow getitem access, I forgot about that. Then I suppose adding a new dunder method makes a lot of sense.

I built your patched Python and am experimenting with it. I like the idea of the changes, but I want to try to test some real world class usages on performance.

@markshannon
Member

Overall, a big 👍 from me.
I think this ought to be a PEP as there are backwards compatibility issues. It doesn't have to be a long PEP; I think your summary above is more or less sufficient.

@JukkaL
Contributor

JukkaL commented Sep 5, 2017

Another +1 -- I'd love to have this in Python 3.7. I agree with @markshannon -- this should probably be a PEP. In addition to the backward compatibility issues, this also adds two language features (__class_getitem__ and __subclass_base__).

@gvanrossum
Member

gvanrossum commented Sep 6, 2017 via email

@JukkaL
Contributor

JukkaL commented Sep 7, 2017

I have a follow-up idea. What if we made it possible to define generic types without importing typing? We could perhaps define GenericAlias (the type of __class_getitem__ return values) outside typing in the stdlib (implemented in C). This way we could plausibly make stdlib classes such as list, tuple and Queue support indexing by just adding suitable __class_getitem__ methods:

# No typing import needed!

def first_and_last(x: list[int]) -> tuple[int, int]:
    return x[0], x[-1]

I haven't thought about this carefully yet. GenericAlias would need to support at least generic type aliases:

from typing import TypeVar

T = TypeVar('T')

Namespace = dict[str, T]
def f() -> Namespace[int]: ...

I believe that this could be implemented without bringing in the rest of typing, as that would add too much bloat to the core stdlib.
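For reference, the two snippets above can be run together as a sketch, assuming Python 3.9 or later, where built-in containers accept subscripts and the alias type is exposed as types.GenericAlias. IntNamespace is a hypothetical name added for the check.

```python
import types
from typing import TypeVar


def first_and_last(x: list[int]) -> tuple[int, int]:
    return x[0], x[-1]


T = TypeVar("T")

# A generic type alias built directly on the built-in dict.
Namespace = dict[str, T]

# Subscripting the alias substitutes the type variable.
IntNamespace = Namespace[int]

# The subscripted built-in is an alias object, not a new class.
alias_type = type(list[int])
```

TypeVar still comes from typing here; only the container types themselves need no typing import.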

@ilevkivskyi
Member Author

@JukkaL :-) You are reading my thoughts. I was thinking about using __class_getitem__ to allow list[int] etc., but this might require quite a lot of C coding. In general this is a very reasonable idea; however, it will probably also need a simple __init_subclass__ to make subclasses generic, i.e.

class MyList(list[T]):
    ...
lst: MyList[int] # ideally this should work as it works with 'List[T]'

I am not sure I will have time to do all this in time for 3.7 beta 1. My plan is like this:

  • Get the mini-PEP written, accepted and merged (I am finishing writing tests, so PR for CPython will be ready soon).
  • Implement the __class_getitem__ logic in typing in Python.
  • If everything works well and I (or someone else) have time, then I can add a C version of the same code to list, dict etc.

@ethanhs
Contributor

ethanhs commented Sep 7, 2017

I really like the idea of adding __class_getitem__ to list, dict, etc, and I'd be happy to help with patches. Though those changes would need to be approved through some PEP, no?

@gvanrossum
Member

gvanrossum commented Sep 8, 2017 via email

@ilevkivskyi
Member Author

@gvanrossum

Yeah, allowing list[int] should also be in a PEP (whether the same one or a different one I'm not sure, but I suggest a different one since it feels less straightforward). Also I kind of regret that we had to go through all the effort of disallowing it first...

To me this is a bit similar to UserList. Also, at the time PEP 484 was written it was not clear whether typing would get traction, so it was probably wise to have this intermediate step. Yes, I remember it was non-trivial to prohibit list[int] in mypy, but note that the prohibition is still incomplete, as collections.abc.Iterable[int] is still allowed.

I am also leaning now towards two separate PEPs. The first one as outlined above, and second one implementing list[int], Queue[int], collections.abc.Iterable[int] etc. (this will not require any additional API but would probably require moving GenericAlias to types as @JukkaL said to avoid typing imports, then typing can just define List = list, Iterable = collections.abc.Iterable, etc.)

@ilevkivskyi
Member Author

It has been two weeks since I posted this idea (PEP 560) to the python-ideas list, but I got no responses (apart from a few typos noticed by Eric).
@gvanrossum is it a good sign or a bad sign, and what should be the next step?

@gvanrossum
Member

I think it's neither good nor bad -- you merely posted a PEP when a dozen other PEPs were also being debated (spurred on by the core dev sprint).

I would recommend trying again now that things have quieted down, and this time I would post the full text of the PEP to python-ideas -- that's what most people do, and it makes it easier for a certain category of readers to provide comments.

I should mention that @JukkaL is quite eager to get this done (since it promises to be a startup performance booster for heavily-annotated modules and possibly also an instantiation speedup). Myself I am +0; I like the perf boosts, but I fear that it's going to be a lot of work, both for typing.py and for mypy, and I worry that not enough people are familiar with the inner workings of typing.py in particular (even if the net result is that typing.py becomes simpler, there's still the need to support the old way for years until Python 3.6's end of life, and the risk of breaking 3rd party code that does anything at runtime with annotations).

@ilevkivskyi
Member Author

OK, I posted PEP 560 to python-ideas once more.

@gvanrossum

but I fear that it's going to be a lot of work, both for typing.py and for mypy

I don't think anything needs to be changed in mypy now (only if we go with another PEP allowing list[int]). Concerning typing.py, I spent quite some time planning this update, so I think everything should go smoothly.

@gvanrossum
Member

OK, on second thought I realize this may not require changes to mypy, as how everything's spelled stays the same.

Do you really want to allow list[int]? Won't that require lots of changes to core CPython? And if you allow list[int] you should also be allowed to use the ABCs from collections, for example, and the generic non-abstract types defined today in typing (e.g. re.Match). That seems to be an endless set of stdlib changes (also all generic types defined in the stubs would need to be handled in the stdlib).

@ilevkivskyi
Member Author

@gvanrossum

Do you really want to allow list[int]?

No (mostly because I am too lazy for this). This is rather a small bonus for this proposal, maybe someday someone will really want this, then __class_getitem__ and friends would simplify life.

@gvanrossum
Member

OK then let's leave that out. Maybe for Python 3.8.

@ilevkivskyi
Member Author

ilevkivskyi commented Sep 27, 2017

OK then let's leave that out. Maybe for Python 3.8.

Yes, list[int] is not mentioned in the current version of PEP 560, we can focus on performance improvements and avoiding metaclass conflicts.

@ilevkivskyi
Member Author

OK, this now has landed as PEP 560 so can be closed.
