
Type annotation erasure at compile time #400

Closed
ambv opened this issue Mar 10, 2017 · 27 comments

@ambv
Contributor

ambv commented Mar 10, 2017

Problem definition

Current usage of type hints shows the following patterns:

  • users often use "Dict", "List", "Set" without adding the required imports from typing; this is especially possible in case of type comments which don't fail at runtime;
  • users try to put generics on built-in collections, which again mostly happens in type comments which don't fail at runtime;
  • users get confused as to why there is List but there's no Str.

On top of this, I noticed the following tricky typing situation (generalization of actual code at FB):

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from expensive_unrelated_module import SomeType  # noqa
    from dependency_cycle import CycleType  # noqa

def function(arg: 'SomeType') -> 'CycleType':
    from dependency_cycle import CycleType  # noqa
    return CycleType(arg)

You cannot simply write SomeType or CycleType because it would fail at runtime. So you wrap it in strings. But when you wrap, linters start reporting the imports as unused, or shadowing previous unused imports. So you need to additionally add silencing comments. The resulting code is pretty hideous.

Solution

I'm proposing revisiting the PEP 484 suggestion to make all annotations evaluate to strings at runtime with Python 3.7.

Rationale:

  • this lets us seamlessly integrate generics on builtin collections without the need to modify C code to support indexing;
  • this provides seamless forward reference support;
  • this gives the user seamless access to imports made in the if TYPE_CHECKING: block;
  • this accelerates import time and decreases memory usage at runtime.

Details:

  • access to an evaluated expression is still possible via typing.get_type_hints();
  • the expression is still expected to be syntactically valid and using known names;
  • for 3.7 a future import would be required, tentatively called static_annotations (3 characters longer than absolute_import, 4 characters longer than with_statement and print_function);
  • no other behavioral changes are expected.
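For illustration, a minimal sketch of the behaviour described in the Details list. It uses the spelling that PEP 563 eventually shipped in Python 3.7, from __future__ import annotations, rather than the tentative static_annotations; Node and make_node are illustrative names.

```python
from __future__ import annotations  # PEP 563; must precede every other statement

import typing


def make_node(value: int) -> Node:  # forward reference, no quotes needed
    return Node(value)


class Node:
    def __init__(self, value: int) -> None:
        self.value = value


# The stored annotations are plain strings; nothing was evaluated at def time.
raw_hints = make_node.__annotations__

# typing.get_type_hints() evaluates them on demand in the function's globals.
resolved_hints = typing.get_type_hints(make_node)
```

Here raw_hints holds the strings "int" and "Node", while resolved_hints holds the actual int and Node objects.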

Summing up, I think this greatly improves the user experience.

@ambv ambv changed the title Type annotation erasure at runtime Type annotation erasure at parse time Mar 10, 2017
@ambv
Contributor Author

ambv commented Mar 10, 2017

If we get an agreement, I'm going to PEPify it and implement it.

@ilevkivskyi
Member

I think I am +1 on this, but I would prefer to not make this a default behaviour in 3.7 (at least for one release cycle or as it was done for PEP 479) but rather use a flag or __future__ import. I think there are many people who still rely on annotations being evaluated.

Also, I noticed you changed the title to "... at parse time", but note that some syntax errors are detected in ast.c, i.e. after actual parsing (this is related to Python having a very simple parser), so I would rather "stringify" annotations at a later stage in compile.c (this has the bonus of keeping them in the AST as expressions, not strings).

@ambv
Contributor Author

ambv commented Mar 10, 2017

I agree it should be in compile.c.

@ambv ambv changed the title Type annotation erasure at parse time Type annotation erasure at compile time Mar 10, 2017
@JukkaL
Contributor

JukkaL commented Mar 10, 2017

I agree that something like this would be better than the current situation.

There are a bunch of cases where this wouldn't help because the type is used outside an annotation. Examples (there may be others):

  • Alias = <type>
  • cast(<type>, e)
  • T = TypeVar('T', <type>, ...)
  • T = TypeVar('T', bound=<type>)
  • T = NewType('T', <type>)
  • Non-class-based syntax for named tuples

For these we could continue to use the existing workarounds, such as List[x] or string literal escaping. However, it may be best to deprecate List[x] and friends and recommend exclusively using string literal escaping with the new behavior so that there would no longer be any list[x] vs. List[x] inconsistency.
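A sketch of this point, assuming the future import behaves as PEP 563 later specified: annotations are stringified, but cast() arguments and alias assignments are ordinary runtime expressions and are still evaluated eagerly. Item, first, ItemList and BrokenAlias are hypothetical names.

```python
from __future__ import annotations

from typing import List, cast


def first(items: List[Item]) -> Item:
    # The annotations above are stored as strings, so the forward reference is fine.
    # cast() is an ordinary call: its type argument is evaluated whenever this
    # line runs, and the future import does not stringify it.
    return cast("Item", items[0])


# A type alias is a plain assignment, evaluated immediately, so a forward
# reference inside it must still be written as a string.
ItemList = List["Item"]

try:
    BrokenAlias = List[Item]  # evaluated now, before Item exists
except NameError:
    alias_failed = True
else:
    alias_failed = False


class Item:
    pass
```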

@ambv
Contributor Author

ambv commented Mar 10, 2017

Yes, for some cases we would need to keep using the current syntax but for most things it could go away. As for aliases, you'd put them in if TYPE_CHECKING:. And note what's possible now:

T: TypeVar
T: TypeVar[T1, T2, ...]
T: BoundTypeVar[SomeType]
T: NewType[<type>]

Solves our problem and looks more elegant.

@JukkaL
Contributor

JukkaL commented Mar 10, 2017

As discussed in python/mypy#2869, generic base classes would also not support this syntax, and there we also can't use string literal escaping. We could continue to write class C(List[int]):, though. String literal escaping is also tricky for type aliases -- it's unclear how we'd know whether x = 'list[int]' is supposed to define a type alias or not. So deprecating typing.List and others would perhaps not be possible.
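A small sketch of the base-class case: base-class expressions are evaluated when the class statement runs, so the future import does not change them, and a string literal cannot take their place. IntList and Bad are illustrative names.

```python
from __future__ import annotations  # has no effect on base-class expressions

from typing import List


class IntList(List[int]):
    """A subscripted typing generic as a base, evaluated at class creation."""


try:
    class Bad("List[int]"):  # a string cannot stand in for a base class
        pass
except TypeError:
    string_base_rejected = True
else:
    string_base_rejected = False
```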

@ilevkivskyi
Member

ilevkivskyi commented Mar 10, 2017

@JukkaL

However, it may be best to deprecate List[x] and friends and recommend exclusively using string literal escaping with the new behavior so that there would no longer be any list[x] vs. List[x] inconsistency.

But what about using them in base classes? I think some people still might want to write:

from typing import List, Tuple, TypeVar

X = TypeVar('X')

class UserList(List[X]):
    ...
class MyTuple(Tuple[int, str]):
    ...

etc.


@ambv
T: TypeVar is interesting, but note that this leaves name T uninitialized, so that class Node(Generic[T]): ... will raise NameError. Maybe we can do something like:

T: TypeVar = (str, int) # for variable with type restrictions
T: TypeVar = Bound[SomeType]
T: NewType = OldType

etc.
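To illustrate the objection above: an annotation without a value records the annotation but never binds the name, so Generic[T] fails until T is assigned. Broken and Node are illustrative names, and the bare T: TypeVar line is the proposed spelling, not a supported way to declare a type variable.

```python
from typing import Generic, TypeVar

T: TypeVar  # annotation only: nothing is assigned, so the name T stays unbound

try:
    class Broken(Generic[T]):  # looks up T when the class is created
        pass
except NameError:
    unbound_t_fails = True
else:
    unbound_t_fails = False

# Binding T with an ordinary assignment is what makes Generic[T] work today.
T = TypeVar("T")


class Node(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value
```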

@ilevkivskyi
Member

@JukkaL
Looks like we have a race condition here :-)

@ambv
Contributor Author

ambv commented Mar 10, 2017

I didn't propose to downright deprecate List, and I also just wanted to signal that there are ways in which we can solve TypeVars. Defining new collection types is the outlier situation that would need to exercise the current style. Otherwise we can put most things either in if TYPE_CHECKING or as function/variable annotations.

But before we dive deep into followups from this idea, I'd like to make sure I get the "go ahead" from @gvanrossum first :)

@drhagen

drhagen commented Mar 10, 2017

My understanding was that access to the types at runtime (like reflection) was a large motivator for resolving the annotations rather than leaving them as strings. But the need to wrap some types in strings comes almost entirely from these names being resolved at definition time rather than at runtime. Python could eliminate the need to string wrap forward references while increasing the types available at runtime if it parsed the annotations, but kept them as expressions (probably behind some kind of getter) that could be evaluated at runtime.

@ilevkivskyi
Member

Just another minor (and probably obvious) idea: we could add a decorator @eval_type_hints with the same optional arguments (globals and locals) as get_type_hints. It would just evaluate the hints and put them back in __annotations__ (for both classes and functions). This would allow using runtime effects with reasonable effort.
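A rough sketch of the suggested decorator. eval_type_hints is hypothetical (it is not part of the typing module); it simply wraps typing.get_type_hints(), whose globalns and localns parameters do exist, and writes the evaluated results back to __annotations__.

```python
from __future__ import annotations

import typing
from typing import Any, Callable, Dict, Optional, TypeVar

Obj = TypeVar("Obj")


def eval_type_hints(globalns: Optional[Dict[str, Any]] = None,
                    localns: Optional[Dict[str, Any]] = None
                    ) -> Callable[[Obj], Obj]:
    """Hypothetical decorator: swap string annotations for evaluated ones."""
    def decorate(obj: Obj) -> Obj:
        hints = typing.get_type_hints(obj, globalns, localns)
        own = getattr(obj, "__annotations__", {})
        # For classes, get_type_hints() also merges annotations inherited from
        # base classes, so only write back the names the object annotates itself.
        obj.__annotations__ = {name: hints[name] for name in own}
        return obj
    return decorate


@eval_type_hints()
def scale(x: int, factor: float = 2.0) -> float:
    return x * factor


@eval_type_hints()
class Point:
    x: int
    y: int
```

Note that such a decorator runs at definition time, so it can only resolve names that already exist at that point.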

@gvanrossum
Member

gvanrossum commented Mar 10, 2017

I need to think more about this but right now I really am not keen on this. It really is a big hack and not very principled.

EDIT: I am pondering it but still have a visceral reaction against the idea. Let's not panic just because some users didn't read the docs. And mypy will now complain if you forget to import List.

@ambv
Contributor Author

ambv commented Mar 18, 2017

@gvanrossum, it sounds like most of all you dislike the idea to support fake generics on builtins. Since mypy already removed support for them, that cat is out of the bag. But another important point of this __future__ is, and always was, forward reference support. Both for out-of-order definitions as well as for accessing names imported in if TYPE_CHECKING: blocks without having to stringify them.

Do you disagree with this as well?

@gvanrossum
Member

No, I don't think that's it. What gets all my hackles up is that there's no precedent for a syntactic construct (apart from the identifiers in import, def or class) that at runtime gets turned back into a string equivalent to the original expression's source code. (Compare lambda and def -- those get turned into bytecode, which is very different.)

It would be a complicated change to a Python parser to recover the source code for a specific expression (the heroics that inspect.getsource() goes through are inappropriate in this context).

I don't recall if you are proposing that this should eventually (e.g. in Python 3.8) become the default and only behavior. If you're not, a __future__ would be inappropriate. Regardless this would definitely have to be its own PEP.

PS. I think you got your "cat out of the bag" idiom backwards -- https://en.wikipedia.org/wiki/Letting_the_cat_out_of_the_bag

@ambv
Contributor Author

ambv commented Mar 18, 2017

a string equivalent to the original expression's source code

It would be enough to essentially "unparse" the AST. It would be semantically equivalent (compiles the same) but doesn't have to be syntactically equivalent (whitespace/parentheses/commas not necessarily the same). I can't come up with any use for verbatim preservation of the original string.
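To illustrate "semantically but not syntactically equivalent", ast.unparse() (available since Python 3.9) is used here as a stand-in for whatever unparser the compiler would use. The reformatted text differs from the input but parses back to the same AST.

```python
import ast

original = "Dict[ str,List[ int ] ]"  # odd spacing, as a user might type it
tree = ast.parse(original, mode="eval")
unparsed = ast.unparse(tree.body)  # ast.unparse() needs Python 3.9+

# Not the same text, but it parses back to an identical AST.
same_ast = ast.dump(ast.parse(unparsed, mode="eval")) == ast.dump(tree)
```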

Yes, the intent would be to make this the default in 3.8. The other option (a new kind of -O) is not applicable in this case because it's global and would create code which might not work at runtime depending on the args given to the interpreter.

PS. Oh, TIL about the idiom. I have been using it wrong all along. You're the first to point it out, or to notice!

@ambv
Contributor Author

ambv commented Mar 18, 2017

Oh, and I noticed you saying: "at runtime". I'd like this to happen at compile time so the .pyc would preserve the transformation. The unparsing cost would be paid once, and likely offset by cheaper instantiation of strings in subsequent runs.
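A sketch showing that the choice is made when the code is compiled: the PEP 563 behaviour that eventually shipped in 3.7 can be switched on per compile() call through __future__.annotations.compiler_flag. SOURCE and annotations_of are illustrative names.

```python
import __future__

SOURCE = (
    "def f(x: UndefinedName) -> AlsoUndefined:\n"
    "    return x\n"
)


def annotations_of(source: str, stringify: bool) -> dict:
    """Compile source with or without PEP 563 and return f.__annotations__."""
    flags = __future__.annotations.compiler_flag if stringify else 0
    # dont_inherit=True: ignore any future imports active in this module.
    code = compile(source, "<example>", "exec", flags=flags, dont_inherit=True)
    namespace: dict = {}
    exec(code, namespace)
    return namespace["f"].__annotations__


stringified = annotations_of(SOURCE, stringify=True)

try:
    # Without the flag the annotation expressions are evaluated, so the
    # undefined names are looked up and raise NameError.
    annotations_of(SOURCE, stringify=False)
except NameError:
    eager_failed = True
else:
    eager_failed = False
```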

@drhagen

drhagen commented Mar 18, 2017

It would be enough to essentially "unparse" the AST.

What are the advantages of the unparsed AST rather than the AST itself? You still get forward references, and then the type checker doesn't have to reparse.

@gvanrossum
Member

Oh, and I noticed you saying: "at runtime". I'd like this to happen at compile time so the .pyc would preserve the transformation.

If you're saying that the generated byte code should just contain the string object, yes, that's what I imagined you wanted.

However, nothing brought up so far has managed to soften me up about this. On the contrary, I am now envisioning people abusing this proposal for all sorts of nefarious purposes (like defaults that are evaluated at call time rather than at run time, using some clever decorator).

@ambv
Contributor Author

ambv commented Mar 18, 2017

Nothing brought up so far has managed to soften me up about this.

You likely have the best calibrated intuition in the matter. It would be crazy not to trust that.

I'd like to understand the risk involved here, e.g. how this addition would make the language worse. Let me try again.

I get your argument about shoving strings into __annotations__ being inelegant. So is making people use string literals in the code to work around forward references. My proposal moves this pain point from the typical usage of type annotations to a less commonly used one (runtime preservation). And it doesn't sacrifice functionality, as correct code today is already using get_type_hints() to read annotations at runtime.

The argument that "nefarious purposes" invalidate the idea is new to me. What happened to "consenting adults"? Most Python features can be abused and yet we don't limit people from overloading operators, import hooks, source file codecs, accessing "private" members (even across stack frames), monkey patching, etc. etc. The community has consistently kept the insanity at bay by avoiding abuse of the features available. Am I missing something?

Is there anything else I'm not seeing?

@gvanrossum
Member

I'll have to do more reflection to explain my reaction. Here is a partial response.

One thing that comes to mind is that it's a very random change to the language. It might be useful to have a more compact way to indicate deferred execution of expressions (using less syntax than lambda:). But why would the use case of type annotations be so all-important to change the language to do it there first (rather than proposing a more general solution), given that there's already a solution for this particular use case that requires very minimal syntax?

Also, I notice that the first two bullets of your original motivation are no longer valid, at least not with mypy master -- they're solved by python/mypy#2863 and python/mypy#2869 respectively.

So we're left with:

  • plain forward references
  • cyclic references between classes (or perhaps the current class)
  • references to things imported inside if TYPE_CHECKING

My response to these:

  • For forward references that aren't needed to work around reference cycles, I think in general these violate the spirit of "import/define before use".
  • If you're using if TYPE_CHECKING (a hack, albeit sometimes a useful one) it's not so terrible that you have to mark the usages explicitly as "deferred".
  • For cycles I have no pat answer, except to point out that these should be relatively uncommon (I'm no great fan of recursion as you may have realized).

But I think my reasoning is more related to the nature of the proposed change than to its use cases. There are no other places in Python where an expression is stringified like this -- you'd have to add a significant amount of new logic to the AST to implement it. (Come to think of it, this part of my discomfort would go away if you were to change the proposal to just turn all annotations into code objects or lambdas, though I'd still be unhappy with the implicitness.)
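A hand-rolled sketch of the lambda alternative mentioned in the parenthetical above. Nothing here is a CPython feature: deferred_hints and resolve are hypothetical names showing how zero-argument lambdas look up names only when called, with no string parsing involved.

```python
from typing import Any, Callable, Dict

Thunk = Callable[[], Any]


def resolve(deferred: Dict[str, Thunk]) -> Dict[str, Any]:
    """Evaluate each deferred annotation by calling its zero-argument lambda."""
    return {name: thunk() for name, thunk in deferred.items()}


# Hypothetical: what a function's annotations might look like if the compiler
# wrapped each one in a lambda instead of turning it into a string.
deferred_hints: Dict[str, Thunk] = {
    "arg": lambda: Later,     # Later is looked up only when the lambda runs
    "return": lambda: Later,
}


class Later:
    pass


resolved = resolve(deferred_hints)
```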

@ethanhs
Contributor

ethanhs commented May 27, 2017

I know I am jumping on this late, and am probably the least qualified here to boot, but I wanted to add that in many discussions at PyCon, I have heard that people do not like the current state of forward declarations.

I think that not requiring the string syntax is needed. I like the idea of compile time type erasure, but on the other hand, I understand Guido's discomfort in changing the language like this.

But why would the use case of type annotations be so all-important to change the language to do it there first (rather than proposing a more general solution), given that there's already a solution for this particular use case that requires very minimal syntax?

I don't think that annotations are "important" enough to change the language; rather, I think the argument could be made that they are different enough to merit a change, but I suppose that depends on what type annotations truly are. In my mind, an annotation is a comment, so making a comment a string is a completely valid decision. However, currently they are not just comments.

On the other hand it would really bother me to make them not objects, since essentially everything evaluates to be an object in Python. I suppose with a method of deferred resolution this might work, however, I am not sure that would simplify anything, as it too would complicate Python internally. I suppose at this point I am asking myself is generalized deferred resolution needed elsewhere?

Apologies if this is off point or confusing.

@schollii

schollii commented May 27, 2017

in many discussions at PyCon, I have heard that people do not like the current state of forward declarations... I think that not requiring the string syntax is needed... the argument could be made that (annotations) are different enough to merit a change (to the language)...

Totally agree. Even more so, annotations should be, to the Python application/library developer, first-class citizens that can express not only expected types but relationships between types (as seems to be the use case for TypeVar) in such a way that they remain easy to understand by people and can be used by IDEs to provide better code completion, refactoring, and symbol usage validation. Type hints should appear in source code as fully integrated into the language, not as strings (which makes them look like an afterthought, a patch). The application/library programmer should not have to resort to TypeVar (its use should be for those who implement the type hinting system). Type hinting will be big in Python; it deserves new syntax that remains in the spirit of Python (clean, simple and expressive).

However, many of these thoughts may not be relevant to this thread, so I have created a gist as suggested by EthanHS. I will edit it so it stands on its own and post a link on gitter/mypy.

@ethanhs
Contributor

ethanhs commented May 27, 2017

I think parts of this are related to this discussion, but a fair amount is not. The suggestion to change the syntax will likely not happen, as the function annotation syntax was part of Python 3000 and has been tested well. I also think that discussing PEP 484's naming is out of scope for this issue, as Łukasz is not intending that change. So TypeVar and the current naming scheme are here to stay.
Also, while TypeOf seems interesting, I don't think it is in the scope of this issue either. If you moved this to a gist and shared it on the mypy gitter, it would be a better fit for discussing your ideas and would avoid cluttering this issue.

I think the main thing you raise in your comment is that you think annotations should be a special type. You don't really talk much about that, however. This would likely mean more syntax to specify that a forward declaration is made, which is an entirely valid decision. I want to make it clear that when I talk about object vs string, I don't mean requiring people to write def foo(a: 'int'): .... I mean that the entirety of the annotation would become a string, as in Łukasz's proposal. So at runtime type(foo.__annotations__['a']) is str.

@gvanrossum
Member

I'm going to ignore the long slightly off-topic proposal for now.

Responding to Ethan and Łukasz, maybe we could add a "future import" like this to Python 3.7:

from __future__ import stringify_annotations  # Name can be bikeshedded

after which all annotations (argument, return and variable) are turned into strings that are stored in __annotations__ (except where PEP 526 says the annotation is ignored, i.e. when it occurs in a local scope).

Annotations must still be syntactically valid, and at the end of the containing scope, they should be semantically valid, i.e. evaluating the string as an expression in that scope should not raise an exception. This latter requirement is to prevent abuse -- unquoted annotations may be used in place of forward references, but the resulting string must still abide by the rules for forward references.

Annotations cannot be ignored entirely -- they must still end up (as strings) in __annotations__ in those cases where PEP 484 or PEP 526 mandate this. This rule exists to support runtime introspection of annotations, so that typing.get_type_hints(x) continues to work.

I am happy for the stringification to happen at compile time (when the bytecode is generated), but I don't insist on it. Stringification may alter whitespace as long as the AST resulting from parsing the string is the same. It may not add/remove redundant parentheses.
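A sketch of these rules under the future import as it eventually shipped (from __future__ import annotations, Python 3.7+). Config, User and load are illustrative names: class and function annotations are stored as strings, the local-variable annotation is not stored at all, and typing.get_type_hints() resolves the strings once the module has finished executing.

```python
from __future__ import annotations

import typing


class Config:
    owner: User       # class-level annotation: stored as the string "User"
    retries: int = 3


def load(path: str) -> Config:
    attempts: int = 5  # local-variable annotation: not stored anywhere (PEP 526)
    config = Config()
    config.retries = attempts
    return config


class User:
    pass


class_raw = Config.__annotations__
func_raw = load.__annotations__
class_hints = typing.get_type_hints(Config)  # valid now that User exists
```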

The "future" referenced by the magical import won't happen until Python 4.0 (and even then maybe we'll end up doing something else).

This proposal has to be a separate PEP (it can't be an update to PEP 484). The PEP doesn't have to be much longer than what I wrote in this comment.

Generalized deferred resolution sounds like a topic for a totally different PEP and out of scope for this tracker.

@ilevkivskyi
Member

at the end of the containing scope, they should be semantically valid, i.e. evaluating the string as an expression in that scope should not raise an exception.

IIUC this is exactly the same requirement as for current forward references: they should be valid when evaluated by get_type_hints.

I suppose with a method of deferred resolution this might work, however, I am not sure that would simplify anything, as it too would complicate Python internally.

I think that "stringification" is a much better solution than any kind of deferred resolution. First, the former is very easy to implement in CPython, second, typing will continue to work without any modifications. Finally, it follows the dynamic spirit of Python (everyone is familiar with eval and string-code relationship, while having undefined names to work only in some places will be too magical).
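A small sketch of that equivalence with today's quoted forward references: resolving a string annotation amounts to eval() in the defining module's namespace, and a name that never gets defined makes typing.get_type_hints() raise NameError. ok, broken and Later are illustrative names.

```python
import typing


def ok(x: "Later") -> None:  # forward reference, resolvable later
    pass


def broken(x: "NeverDefined") -> None:  # forward reference that never resolves
    pass


class Later:
    pass


# Resolving a string annotation is essentially eval() in the defining module.
manual = eval(ok.__annotations__["x"], ok.__globals__)
resolved_ok = typing.get_type_hints(ok)

try:
    typing.get_type_hints(broken)
except NameError:
    broken_fails = True
else:
    broken_fails = False
```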

@ambv
Contributor Author

ambv commented Sep 11, 2017

This is now PEP 563: python/peps@454c889

@ilevkivskyi
Member

PEP 563 is now accepted and implemented, so I think this can be closed. We can open separate issues for any new usability improvements.
