Proposal: syntax for variable and attribute annotations (PEP 526) #258

Closed
gvanrossum opened this issue Aug 3, 2016 · 210 comments

@gvanrossum
Member

@gvanrossum gvanrossum commented Aug 3, 2016

Introduction

This issue is reserved for substantive work on PEP 526, "Syntax for Variable and Attribute Annotations". For textual nits please comment directly on the latest PR for this PEP in the peps repo.

I sent a strawman proposal to python-ideas. The feedback was mixed but useful -- people tried to poke holes in it from many angles.

In this issue I want to arrive at a more solid specification. I'm out of time right now, but here are some notes:

  • Class variables vs. instance variables
  • Specify instance variables in class body vs. in __init__ or __new__
  • Thinking with your runtime hat on vs. your type checking hat
  • Importance of a: <type> vs. how it strikes people the wrong way
  • Tuple unpacking is a mess, let's avoid it entirely
  • Collecting the types in something similar to __annotations__
  • Cost of doing that for locals
  • Cost of introducing a new keyword

Work in progress here!

I'm updating the issue description to avoid spamming subscribers to this tracker. I'll keep doing this until we have reasonable discussion.

Basic proposal

My basic observation is that introducing a new keyword has two downsides: (a) choice of a good keyword is hard (e.g. it can't be 'var' because that is way too common a variable name, and it can't be 'local' if we want to use it for class variables or globals), and (b) no matter what we choose, we'll still need a __future__ import.

So I'm proposing something keyword-free:

a: List[int] = []
b: Optional[int] = None

The idea is that this is pretty easy to explain to someone who's already familiar with function annotations.

Multiple types/variables

An obvious question is whether to allow combining type declarations with tuple unpacking (e.g. a, b, c = x). This leads to (real or perceived) ambiguity, and I propose not to support this. If there's a type annotation there can only be one variable to its left, and one value to its right. This still allows tuple packing (just put the tuple in parentheses) but it disallows tuple unpacking. (It's been proposed to allow multiple parenthesized variable names, or types inside parentheses, but none of these look attractive to me.)

There's a similar question about what to do about the type of a = b = c = x. My answer to this is the same: Let's not go there; if you want to add a type you have to split it up.
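For reference, here is a minimal sketch of how this restriction behaves in CPython 3.6 and later, where the syntax eventually shipped. The helper name is_valid is made up for illustration, and the released rules may differ in detail from the strawman at this point in the thread:

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Packing a parenthesized tuple into one annotated name is accepted.
packing_ok = is_valid("t: tuple = (1, 2)")

# Unpacking into several names under one annotation is rejected.
unpacking_ok = is_valid("a, b: int = 1, 2")

# A chained assignment cannot carry an annotation; it has to be split up.
chained_ok = is_valid("a: int = b = 0")
split_ok = is_valid("a: int = 0\nb: int = a")
```

Only the packing form and the split-up form compile; the other two raise SyntaxError before any code runs.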

Omitting the initial value

My next step is to observe that sometimes it's convenient to decouple the type declaration from the initialization. One example is a variable that is initialized in each branch of a big sequence of if/elif/etc. blocks, where you want to declare its type before entering the first if, and there's no convenient initial value (e.g. None is not valid because the type is not Optional[...]). So I propose to allow leaving out the assignment:

log: Logger
if develop_mode():
    log = heavy_logger()
elif production_mode():
    log = fatal_only_logger()
else:
    log = default_logger()
log.info("Server starting up...")

The line log: Logger looks a little odd at first but I believe you can get used to it easily. Also, it is again similar to what you can do in function annotations. (However, don't hyper-generalize. A line containing just log by itself means something different -- it's probably a NameError.)

Note that this is something that you currently can't do with # type comments -- you currently have to put the type on the (lexically) first assignment, like this:

if develop_mode():
    log = heavy_logger()  # type: Logger
elif production_mode():
    log = fatal_only_logger()  # (No type declaration here!)
# etc.

(In this particular example, a type declaration may be needed because heavy_logger() returns a subclass of Logger, while other branches produce different subclasses; in general the type checker shouldn't just compute the common superclass, because then a genuine type error would go unreported and the checker would simply infer the type object.)
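A runnable variant of the logger example, as a sketch only: the function name pick_logger, the mode strings, and the logger names are hypothetical stand-ins for develop_mode(), heavy_logger() and friends, and the standard logging module is assumed.

```python
import logging


def pick_logger(mode: str) -> logging.Logger:
    # Declaration only: states the intended type for the type checker
    # and binds nothing at runtime.
    log: logging.Logger
    if mode == "develop":
        log = logging.getLogger("heavy")
    elif mode == "production":
        log = logging.getLogger("fatal_only")
    else:
        log = logging.getLogger("default")
    return log


logger = pick_logger("develop")
```

The declaration sits once, before the first branch, instead of being attached to whichever assignment happens to come first lexically.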

What about runtime

Suppose we have a: int -- what should this do at runtime? Is it ignored, or does it initialize a to None, or should we perhaps introduce something new like JavaScript's undefined? I feel quite strongly that it should leave a uninitialized, just as if the line was not there at all.
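The "leave it uninitialized" behavior can be observed directly in Python 3.6 and later, which adopted it. A small sketch, using exec on a throwaway dict (called namespace here purely for illustration) so the result is easy to inspect:

```python
namespace = {}

# A bare annotation does not bind the name.
exec("a: int", namespace)
a_is_bound = "a" in namespace

# An annotation with a value behaves like an ordinary assignment.
exec("b: int = 0", namespace)
b_is_bound = "b" in namespace
```

After this runs, a_is_bound is False and b_is_bound is True.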

Instance variables and class variables

Based on working with mypy since last December I feel strongly that it's very useful to be able to declare the types of instance variables in class bodies. In fact this is one place where I find the value-less notation (a: int) particularly useful, to declare instance variables that should always be initialized by __init__ (or __new__), e.g. variables whose type is mutable or cannot be None.

We still need a way to declare class variables, and here I propose some new syntax, prefixing the type with a class keyword:

class Starship:
    captain: str                      # instance variable without default
    damage: int = 0                   # instance variable with default (stored in class)
    stats: class Dict[str, int] = {}  # class variable with initialization

I do have to admit that this is entirely unproven. PEP 484 and mypy currently don't have a way to distinguish between instance and class variables, and it hasn't been a big problem (though I think I've seen a few mypy bug reports related to mypy's inability to tell the difference).

Capturing the declared types at runtime

For function annotations, the types are captured in the function's __annotations__ object. It would be an obvious extension of this idea to do the same thing for variable declarations. But where exactly would we store this info? A strawman proposal is to introduce __annotations__ dictionaries at various levels. At each level, the types would go into the __annotations__ dict at that same level. Examples:

Global variables

players: Dict[str, Player]
print(__annotations__)

This would print {'players': Dict[str, Player]} (where the value is the runtime representation of the type Dict[str, Player]).

Class and instance variables:

class Starship:
    # Class variables
    hitpoints: class int = 50
    stats: class Dict[str, int] = {}
    # Instance variables
    damage: int = 0
    shield: int = 100
    captain: str  # no initial value
print(Starship.__annotations__)

This would print a dict with five keys, and corresponding values:

{'hitpoints': ClassVar[int],  # I'm making this up as a runtime representation of "class int"
 'stats': ClassVar[Dict[str, int]],
 'damage': int,
 'shield': int,
 'captain': str
}
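The same class written with the ClassVar spelling, which is what the typing module eventually shipped (the "class int" prefix syntax above never did). This is a sketch against Python 3.6+, not the semantics proposed at this point in the thread:

```python
from typing import ClassVar, Dict


class Starship:
    # Class variables
    hitpoints: ClassVar[int] = 50
    stats: ClassVar[Dict[str, int]] = {}
    # Instance variables
    damage: int = 0
    shield: int = 100
    captain: str  # no initial value


annotations = Starship.__annotations__
captain_is_set = hasattr(Starship, "captain")
```

The dict has the five keys in declaration order, while captain itself is never created on the class because it has no initial value.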

Finally, locals. Here I think we should not store the types -- the value of having the annotations available locally is just not enough to offset the cost of creating and populating the dictionary on each function call.

In fact, I don't even think that the type expression should be evaluated during the function execution. So for example:

def side_effect():
    print("Hello world")
def foo():
    a: side_effect()
    a = 12
    return a
foo()

should not print anything. (A type checker would also complain that side_effect() is not a valid type.)

This is inconsistent with the behavior of

def foo(a: side_effect()):
    a = 12
    return a

which does print something (at function definition time). But there's a limit to how much consistency I am prepared to propose. (OTOH for globals and class/instance variables I think that there would be some cool use cases for having the information available.)
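The local-variable half of this can be checked in Python 3.6 and later, where annotations on simple names inside a function are neither evaluated nor stored. The sketch below records calls in a list (calls, a name introduced here) instead of printing, so the effect is easy to assert on:

```python
calls = []


def side_effect():
    calls.append("evaluated")
    return int


def foo():
    a: side_effect()  # never evaluated inside a function body
    a = 12
    return a


result = foo()
```

After the call, result is 12 and calls is still empty.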

Effect of presence of a: <type>

The presence of a local variable declaration without initialization still has an effect: it ensures that the variable is considered to be a local variable, and it is given a "slot" as if it was assigned to. So, for example:

def foo():
    a: int
    print(a)
a = 42
foo()

will raise UnboundLocalError, not NameError. It's the same as if the code had read

def foo():
    if False: a = 0
    print(a)
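A runnable version of the first snippet, as a sketch for Python 3.6+; the variable outcome is added only to capture which exception is raised:

```python
a = 42  # a global with the same name


def foo():
    a: int    # makes `a` local to foo, but assigns nothing
    return a  # reads the unassigned local


try:
    foo()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

The global a is never consulted, because the bare annotation alone is enough to make the name local.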

Instance variables inside methods

Mypy currently supports # type comments on assignments to instance variables (and other things). At least for __init__ (and __new__, and functions called from either) this seems useful, in case you prefer a style where instance variables are declared in __init__ (etc.) rather than in the class body.

I'd like to support this, at least for cases that obviously refer to instance variables of self. In this case we should probably not update __annotations__.

What about global or nonlocal?

We should not change global and nonlocal. The reason is that those don't declare new variables, they declare that an existing variable is write-accessible in the current scope. Their type belongs in the scope where they are defined.
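In released CPython (3.6+) this rule is enforced by the compiler: annotating a name that the same function declares global is a SyntaxError, and the annotation goes in the defining scope instead. A sketch, with the helper is_valid and the name counter made up for illustration:

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


annotated_inside_function = """
def bump():
    global counter
    counter: int = 1
"""

annotated_where_defined = """
counter: int = 0

def bump():
    global counter
    counter += 1
"""

inside_ok = is_valid(annotated_inside_function)
defining_scope_ok = is_valid(annotated_where_defined)
```

Only the second form compiles.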

Redundant declarations

I propose that the Python compiler should ignore duplicate declarations of the same variable in the same scope. It should also not bother to validate the type expression (other than evaluating it when not in a local scope). It's up to the type checker to complain about this. The following nonsensical snippet should be allowed at runtime:

a: 2+2
b: int = 'hello'
if b:
    b: str
    a: str
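
What this looks like at runtime in Python 3.6+ can be sketched by moving the same snippet into a class body (named Nonsense here, a made-up name), so that the recorded annotations can be read back from the class:

```python
class Nonsense:
    a: 2 + 2
    b: int = 'hello'
    if b:
        b: str
        a: str


recorded = Nonsense.__annotations__
```

Nothing is validated: the later annotations simply replace the earlier ones, and b keeps the string value it was given despite its first annotation.
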
@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 4, 2016

Very good proposal! I was thinking some time ago about how to introduce __annotations__ dictionaries at various levels using type declarations for variables and came to exactly the same conclusions.

There is something that could probably be discussed here: should it be possible to add annotations to variables in for and with? Examples:

for i: int in my_iter():
    print(i+42)

with open('/folder/file', 'rb') as f: IO[bytes]:
    print(f.read()[-1]) 

"Practical" questions:

  • In the proposal __attributes__ are used sometimes in place of __annotations__, is it a typo, or am I missing something?
  • Will part of discussion happen here, or should all discussions go to python-ideas?
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 4, 2016

  • I haven't thought about for and with. It can be done for for easily, but with would be ambiguous to the parser (requiring too much look-ahead).
  • __attributes__ was a typo; fixed.
  • Much of the discussion is still happening in python-ideas. I am open to doing more of it here.
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 4, 2016

This comment is going to list things brought up in python-ideas that need more thought...

  • When the class defines speed: int = 0 (meaning an instance var with a class default) can we change the class variable (effectively changing the default for those instances that haven't overridden it yet), as long as the type matches? (I.e. is C.speed += 1 valid?)
  • In a class, should a: class int (i.e. without initializer) be allowed?
  • Explain why this is better than continuing with type comments.

[UPDATES:]

  • Do we enforce that the class <type> syntax is only allowed in a class? (A: Yes, same as return or break.)
  • Could we start evaluating annotations for locals in the future?
  • The class <type> syntax is a bit awkward. Maybe it should be ClassVar[<type>] instead?
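
For the first question in the list above, the runtime side is already well defined; whether a type checker should accept it is the open part. A sketch of the runtime behavior, with the class name Car and both instance names invented for illustration:

```python
class Car:
    speed: int = 0  # instance variable with a class-level default


parked, moving = Car(), Car()
moving.speed = 5  # this instance overrides the default
Car.speed += 1    # changes the default for everyone else
```

parked.speed now reads 1 through the class, while moving.speed stays 5.
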
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 5, 2016

Here are some thoughts about for and with. tl;dr: let's not do it.

PEP 484 supports using type comments on a for-statement, to indicate the type of the control variables, e.g.

for x, y in points:  # type: float, float
    # Here x and y are floats
    ...

This isn't implemented by mypy and doesn't strike me as super useful (typically type inference from points should suffice) but we could perhaps support this with the proposal under discussion:

for x: float, y: float in points:
    ...

However, this makes it a bit hard to find the iterable (points). It also seems to violate my desire to avoid having to create syntax for tuple unpacking (which the for-statement supports). So maybe it's better to just write this as follows, in the rare case where it comes up:

x: float
y: float
for x, y in points:
    ...

It's even worse for a with-statement -- we'd have to somehow support

with foo() as tt: Tuple[str, int]:
    ...

which would be unreasonable to ask our parser to handle -- the expression after the first : could well be the body of the with-statement, and by the time the parser sees the second : it's too late to backtrack (at least for CPython's LL(1) parser).

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 5, 2016

OK, I agree, also the syntax:

x: float
y: float
for x, y in points:
    ...

emphasizes the fact that for does not create its own scope and x and y are visible outside for.
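
A runnable sketch of the pre-declaration style in Python 3.6+, with made-up sample data in points and a running total added so there is something to check:

```python
from typing import List, Tuple

points: List[Tuple[float, float]] = [(0.0, 1.0), (2.5, 3.5)]

x: float
y: float
total = 0.0
for x, y in points:
    total += x * y

# The loop variables are ordinary names in the enclosing scope,
# so they are still bound to the last point after the loop ends.
last_point = (x, y)
```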

Concerning why it is better than comments I have two typecheck-unrelated "practical" points to add (not sure if this was already mentioned):

  • In many text editors (and also here on GitHub), comments are shown in light grey or a similar dim color. Maybe it is only me, but I think this is very inconvenient.
  • Type annotations are somewhat similar to docstrings (I frequently use them that way). For functions we have __doc__ and __annotations__, but for modules and classes we have only __doc__; it would be helpful to have __annotations__ for classes and modules too. This would give a systematic way to generate documentation.
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 6, 2016

Nick just proposed a: ClassAttr[X] instead of a: class X, which simplifies the grammar and is close to what I proposed to put in annotations anyway. So let's run with this.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 8, 2016

OK, I want to make this into a PEP in time for 3.6. And I want to hack on the implementation at the Core Devs Sprint Sept. 6-9 at Facebook, and get it into 3.6 beta 1. That should just about be possible.

@srkunze

@srkunze srkunze commented Aug 8, 2016

It appears to me as if a: ClassAttr could signify a class attribute without any type, right?

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 8, 2016

@srkunze There is a convention for generics: if a type variable is missing it is assumed to be Any, so that a: ClassAttr would be equivalent to a: ClassAttr[Any]

@srkunze

@srkunze srkunze commented Aug 8, 2016

  • many text editors (and also here on GitHub) comments are shown in light grey or similar dim color. Maybe it is only me, but I think this is very inconvenient.

I actually think a dim color is a good thing here.

Moreover, it's already true for many editors with docstrings and annotations as those are barely relevant for code execution. I find this very convenient as it helps me focus on the important pieces.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 8, 2016

@srkunze I understand your point, but nevertheless I would like to differentiate two levels of "relevance": docstrings and type annotations are in some sense parts of "public API" while comments are not such parts.
Anyway, I already mentioned that this comment is quite personal.

@srkunze

@srkunze srkunze commented Aug 8, 2016

@ilevkivskyi

There is a convention for generics: if a type variable is missing it is assumed to be Any [...]

Got it.

@elazarg

@elazarg elazarg commented Aug 8, 2016

How about allowing the use of parenthesized (var: type) anywhere slightly ambiguous? (This is the way it's done in Coq.)

If I'm repeating old proposals, I still think it will be helpful to document the reasons to reject it.

@srkunze

@srkunze srkunze commented Aug 8, 2016

@ilevkivskyi

docstrings and type annotations are in some sense parts of "public API" while comments are not such parts.

That could make sense and can be another motivation of why local variables don't have annotations: they simply don't represent a "public API".

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 8, 2016

That could make sense and can be another motivation why local variables don't have annotations: they simply don't represent a "public API".

Good point.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 8, 2016

@elazarg

... parenthesized (var: type) anywhere slightly ambiguous ...

It was brought up on python-ideas and I briefly mention it in the section "Multiple types/variables" in the original post above, but I expect that the syntax will be hairy, the benefits slight, and the readability poor.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 9, 2016

@gvanrossum I have a few short questions about the new PEP:

  • What workflow do you propose for the new PEP, will it be developed in this repo or elsewhere?
  • If I submit a PR with simply more formal and structured formulation of your draft modified taking into account comments from python-ideas, will it be a good first step?
  • Should any implementation details be listed in the PEP (for example, should we introduce a new AST node or modify Assign or Name, or add another context apart from Load and Store, etc)?
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 9, 2016

Oops, @kirbyfan64 and @phouse512 are the lucky lottery winners, and they are writing a draft in a repo cloned from the peps repo by the latter. If you have text to submit best post it here so they can cherry-pick it. I don't think the PEP should go into details about the AST nodes, though if you want to work on a reference implementation separately just go ahead!

@markshannon

@markshannon markshannon commented Aug 10, 2016

What is the point of variable declaration? You haven't given any justification, and I don't see any.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 10, 2016

It's less verbose, and makes the meaning clear. The existing notation is easily confused with cast(), because of its placement after the value. (I know because this happened to me long ago. :-)


--Guido (mobile)

@markshannon

@markshannon markshannon commented Aug 10, 2016

Which existing notation?
Declaring the type of parameters, return values and generic types is covered by PEP 484.
The type of local variables can be perfectly inferred from the assigned values.
For attributes of namespaces (classes and modules) inference may fail, but should be generally good enough.
For those values where the inferred type would be either too specific or general, then a type comment should suffice.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 10, 2016

The type comments. Even when types can be inferred, sometimes adding them
explicitly helps the reader. And there is currently no good solution for
declaring an instance variable without initializer in a class.


--Guido (mobile)

@NeilGirdhar

@NeilGirdhar NeilGirdhar commented Aug 10, 2016

Regarding typed instance variables with initializers, it might be worth comparing this proposal with the ipython traitlets project, which is a heavier version of that.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 10, 2016

@gvanrossum There is one important question: Currently, there is a statement in the draft

If there's a type annotation there can only be one variable to its left, and one value to its right

I think that the limitation for the right side is not necessary, and these could be allowed (both assignments currently work, if you omit the types):

a: Tuple[int, int] = 1, 2
b: Tuple[int, int, int, int] = *a, *a

If we are going to stick only to the left side limitation (one variable per type annotation), then it looks like there is a nice and simple way to change the grammar by adding a "declaration statement" (you choose the name). This requires only two small changes to Grammar/Grammar:

  • small_stmt: (expr_stmt | decl_stmt | del_stmt | ... etc.)
  • decl_stmt: NAME ':' test ['=' (yield_expr|testlist_star_expr)] (right hand side is the same as for normal assignment in expr_stmt, type annotation is the same as for functions). Note that this prohibits adding annotations for attributes after class creation, i.e., a.x: int = 1 is prohibited.

The above change requires adding an AST node:

stmt = 
    ...
    | Delete(expr* targets)
    | Declare(identifier name, expr annotation, expr? value)
    | Assign(expr* targets, expr value)
    ...

If you agree with this, then I will proceed this way.
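
For comparison, the node that eventually shipped in the standard ast module (Python 3.6+) is called AnnAssign rather than Declare, and it carries an extra simple flag. A short sketch of inspecting it; the variable names with_value and bare are chosen here for illustration:

```python
import ast

# Annotated assignment to a plain name.
with_value = ast.parse("x: int = 5").body[0]

# Bare annotation on an attribute target: no value, and not a "simple" name.
bare = ast.parse("self.x: int").body[0]

node_type = type(with_value).__name__  # 'AnnAssign'
```

with_value.simple is 1 and bare.simple is 0; bare.value is None because there is no initializer.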

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 10, 2016

@gvanrossum I tried my idea above and it turns out it requires a bit more work on the grammar to avoid ambiguity, so the actual implementation is a bit different from what is described above. I implemented the parse step (source to AST) with the rules described in the previous post and nice syntax errors; here is a snippet:

>>> x: int = 5; y: float = 10
>>> a: Tuple[int, int] = 1, 2
>>> b: Tuple[int, int, int, int] = *a, *a
>>> x, y: int
  File "<stdin>", line 1
SyntaxError: only single variable can be type annotated
>>> 2+2: int 
  File "<stdin>", line 1
SyntaxError: can't type annotate operator
>>> True: bool = 0
  File "<stdin>", line 1
SyntaxError: can't type annotate keyword
>>> x, *y, z: int = range(5)
  File "<stdin>", line 1
SyntaxError: only single variable can be type annotated
>>> x:
  File "<stdin>", line 1
    x:
     ^
SyntaxError: invalid syntax

(note: set of allowed expressions for type annotations is exactly the same as for function annotations)

If you think this is OK, I will continue with the actual compilation.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 10, 2016

@NeilGirdhar Regarding traitlets, that seems focused on runtime enforcement -- here we're interested in signalling types to the type checker. We're also only interested in a notation that reuses PEP 484 as the way to "spell" types (e.g. List[int]).

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Aug 10, 2016

@ilevkivskyi I am not super committed to allowing only a single expression on the right, but I feel allowing it would just reopen the debate on why we are not allowing multiple variables on the left.

Regarding your syntax: I'm glad you've got something working, but IIRC the parser looks ahead only one token, which means that if you have a rule

decl_stmt: NAME ':' test # etc.

it will prevent the parser from also recognizing regular assignments

expr_stmt: testlist_star_expr # etc.

that start with a NAME. That part of the grammar would become ambiguous.

In addition I think we need to allow more than just a single name on the left -- I think we must allow at least self.NAME which would make the grammar even more ambiguous.

My own thoughts on how to update the grammar are more along the lines of changing the existing

expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)

to

expr_stmt: testlist_star_expr (declaration | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
declaration: ':' test ['=' test]

There will then have to be some additional manual checks to ensure that when a declaration is detected, the testlist_star_expr is actually a simple lvalue -- such tests are already necessary for assignments, because the grammar actually allows things like

len(a) = 42

I guess we can quibble about whether the initializer should be test, testlist or more, and that's not ambiguous.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 10, 2016

@gvanrossum

That part of the grammar would become ambiguous

Exactly, I implemented something very similar to the version you propose, and indeed I added some checks to ast.c. I could allow arbitrary dotted names (so that a type checker can decide whether it is OK or not).

Ah, you just didn't write it up that way in your previous comment.

I'd accept anything that's acceptable as a single lvalue, so foo().bar, foo[x], foo[x].bar but not foo() and not (x, y) (the latter so people won't get too clever and try to declare multiple variables in one declaration).
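
This list of acceptable targets matches what CPython 3.6 and later accept. A sketch that checks each form with compile(); the helper compiles is a made-up name, and nothing is executed, so foo and x need not exist:

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


single_lvalues = ["foo().bar: int", "foo[x]: int", "foo[x].bar: int"]
not_lvalues = ["foo(): int", "(x, y): int"]

accepted = [src for src in single_lvalues if compiles(src)]
rejected = [src for src in not_lvalues if not compiles(src)]
```

All three single lvalues end up in accepted, and both of the other forms end up in rejected.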

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Aug 10, 2016

@gvanrossum
OK, I allowed the dotted names on the left.

Actually, I have just checked that the grammar in my implementation is equivalent to what you propose, except with declaration: ':' test ['=' (yield_expr|testlist_star_expr)]. The logic here is that the allowed type annotations are the same as for functions, and allowed right hand sides are the same as for normal assignment.

It looks like it does not lead to ambiguity, at least all the test suite runs OK and I tried to play with new syntax a bit without generating any errors in parser.
Do you think it is OK or do you prefer test for the right hand side?

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 1, 2016

@kirbyfan64 Yes that's right. And sorry, I agree that mega-threads stink no matter which medium you use. :-(

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 1, 2016

So who wants to write the PEP update?

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 1, 2016

@gvanrossum @kirbyfan64

So who wants to write the PEP update?

I could do this in two-three hours, if someone is available sooner, please go ahead.

@refi64
Contributor

@refi64 refi64 commented Sep 1, 2016

I'll try to do it now, one sec.

@refi64
Contributor

@refi64 refi64 commented Sep 1, 2016

Done. Note that I'm assuming (x): int is now valid...right?

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 1, 2016

@kirbyfan64 Yes, it is valid, but it will not store an annotation in __annotations__ since it is not a simple name.
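
A small sketch of that distinction as it behaves in Python 3.6+; the class name Demo is invented for illustration:

```python
class Demo:
    x: int = 1    # simple name: recorded in __annotations__
    (y): int = 2  # parenthesized name: assigned, but not recorded


recorded = Demo.__annotations__
```

Both attributes exist on the class, but only x shows up in recorded.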

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 1, 2016

@kirbyfan64 Your commit looks good. I have left three small comments on the commit.

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 1, 2016

@vedgar

@vedgar vedgar commented Sep 3, 2016

Can my idea (of treating bare annotations the same way as global or nonlocal declarations) be added to the PEP as rejected? The closest thing it currently says is

the placement of the annotation strongly suggests that it's in the same scope as the surrounding code.

but the same thing can be said for global and nonlocal, and that didn't stop you then. I'm sure you have valid reasons why this is fundamentally different, but I'd like to see them in the PEP.

At some moment Guido said

Python is a dynamic language, and statements are executed (or not) in a specific order.
This is not negotiable. Type annotations are not comments with a funny syntax.

and while this is a strong stance, I don't feel it's adequately explained - again, global and nonlocal statements are also "not comments with a funny syntax", but they are evaluated statically, no matter where they are put in the scope.


I'm very worried about what semantics I should assign to re-annotations and conditional annotations.

for tp in int, float, complex:
    x: Optional[tp] = None

if input():
    y: bool
else:
    y: int
y = True

z: str
z = 'Hi'
z: Sequence[str]

t: int = 8
t: str = 'What?'

In each of these cases, I'm extremely confused about what actually happens. And reading the PEP doesn't help very much. Are these illegal? What tool detects them (if any)? If the behavior is specified, where is the specification?

(Note that all (or most) of these would be trivially flagged as errors if annotations were treated statically.)

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 3, 2016

@vedgar

I'm very worried about what semantics I should assign to re-annotations and conditional annotations.

Please don't worry. The title of the PEP contains "Syntax", not "Semantics" because we don't want to impose any new type semantics (apart from addition of ClassVar) beyond PEP 484. Moreover, PEP 484 is something like PEP 333, i.e. just a convention for tools that deal with type metadata. Your examples could be treated differently by different type checkers depending on what they see as safe/unsafe (or even a single type checker can have different modes like --sloppy-mode or --paranoid-mode).

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 3, 2016

@vedgar I've added text to the PEP to explain (hopefully) your rejected proposal:

Treating bare annotations the same as global or nonlocal:
The rejected proposal would prefer that the presence of an
annotation without assignment in a function body should not involve
any evaluation. In contrast, the PEP implies that if the target
is more complex than a single name, its "left-hand part" should be
evaluated at the point where it occurs in the function body, just to
enforce that it is defined. For example, in this example::

    def foo(self):
        slef.name: str

the name slef should be evaluated, just so that if it is not
defined (as is likely in this example :-), the error will be caught
at runtime. This is more in line with what happens when there is
an initial value, and thus is expected to lead to fewer surprises.
(Also note that if the target was self.name (this time correctly
spelled :-), an optimizing compiler has no obligation to evaluate
self as long as it can prove that it will definitely be
defined.)

I have nothing to add to what @ilevkivskyi said about the semantics of redefinition -- that is to be worked out between type checkers. (Much like PEP 484 doesn't specify how type checkers should behave -- while it gives examples of suggested behavior, those examples are not normative, and there are huge gray areas where the PEP doesn't give any guidance. It's the same for PEP 526.)
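
The slef example from the quoted PEP text can be run as-is in Python 3.6+. A sketch, with the class name Widget and both method names invented for illustration:

```python
class Widget:
    def declare(self):
        # `self` is evaluated, but nothing is assigned to self.name.
        self.name: str
        return hasattr(self, "name")

    def typo(self):
        # Deliberate misspelling: evaluating `slef` fails at runtime.
        slef.name: str


declared_creates_attribute = Widget().declare()

try:
    Widget().typo()
    typo_outcome = "no error"
except NameError:
    typo_outcome = "NameError"
```

declare() returns False because the attribute is never set, while typo() raises NameError even though the statement has no initial value.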

@vedgar

@vedgar vedgar commented Sep 4, 2016

Reading your explanation, it just occurred to me that you could detect misnamings and treat bare annotations statically. At least, Python already does this when nonlocal is used (not for global, of course, since the name can be inserted dynamically afterwards).

def f():
    def g():
        nonlocal x
        return x

If I end the input here with a blank line, I get "SyntaxError: no binding for nonlocal 'x' found". (But I can introduce x later, and get no error.) Python obviously doesn't evaluate x, but it somehow knows, statically, whether there is a variable it could refer to, even if it's assigned to later. A similar thing can be done here, I presume: "no binding for 'slef' found" could be reported without evaluating slef. But yeah, it gets hairy with more complicated names.

Ok, I'll stop here. I'm happy with the current solution. At least I think so. :-)
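The static check on nonlocal described above can be reproduced without a REPL by compiling source strings. This is a minimal sketch; `compile_error` is a helper invented for this example.

```python
BROKEN = """
def f():
    def g():
        nonlocal x
        return x
"""

# Same code, but with x bound in f after g is defined.
FIXED = BROKEN + "    x = 1\n"


def compile_error(source):
    """Return the SyntaxError message from compiling source, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


broken_msg = compile_error(BROKEN)
fixed_msg = compile_error(FIXED)
```

Nothing is executed in either case: the first string is rejected at compile time, and the second compiles because x is bound somewhere in the enclosing function.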

@SylvainCorlay

@SylvainCorlay SylvainCorlay commented Sep 7, 2016

@fperez @Carreau I am 👍 on introspect-ability.

  • Jupyter and Sage use introspection to do some simple automatic GUI generation to interact with a function with sliders / dropdowns / toggle buttons etc. At the moment, default values of function parameters are used to determine the type of widget that should be used for each parameter.
    It would be very nice to specify the widgets to use based on the type annotations.
  • Encoding more information (like bounds) in an annotation would be very useful. I don't know what would be the preferred / recommended way to pass richer annotations.
  • I can't help thinking of how this would play with traitlets and PEP 487. Would it be possible to have a metaclass which uses the type annotations of a class to create traitlets-like features?
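As a rough illustration of the first bullet, a GUI layer could pick widgets from annotations instead of default values. The mapping and widget names below are hypothetical placeholders, not the API of any existing library; only `typing.get_type_hints` is a real call.

```python
import typing

# Hypothetical mapping from annotation to a widget name; a real GUI
# layer would construct widget objects instead of returning strings.
WIDGET_FOR_TYPE = {
    int: "IntSlider",
    float: "FloatSlider",
    bool: "Checkbox",
    str: "Text",
}


def widgets_for(func):
    """Choose a widget name for each annotated parameter of func."""
    hints = typing.get_type_hints(func)
    hints.pop("return", None)  # the return annotation is not a parameter
    return {name: WIDGET_FOR_TYPE.get(tp, "Text") for name, tp in hints.items()}


def plot(freq: float, cycles: int, grid: bool, title: str = "wave") -> None:
    pass


chosen = widgets_for(plot)
```

Richer information such as bounds would need a convention beyond plain types, which this sketch does not attempt.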
@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 7, 2016

@SylvainCorlay @fperez @Carreau
Yes, there are 1000s of new uses for annotations in classes! You can indeed write a metaclass or a class decorator that uses the annotations as a better way to define traits. Let your imagination go wild!

Mypy already doesn't understand what's going on with libraries that use traitlets so I'm not particularly worried about how mypy should deal with this -- we'll cross that bridge when we get to it.
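A class decorator of the kind suggested above might look like the following sketch. `checked` is a hypothetical name, and the check is deliberately simplistic: it only handles annotations that are plain classes, which is far less than traitlets offers.

```python
def checked(cls):
    """Hypothetical decorator: type-check assignments to annotated attributes."""
    hints = dict(cls.__annotations__)
    original_setattr = cls.__setattr__

    def __setattr__(self, name, value):
        expected = hints.get(name)
        # Only plain classes are checked; anything fancier is ignored here.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        original_setattr(self, name, value)

    cls.__setattr__ = __setattr__
    return cls


@checked
class Point:
    x: int
    y: int


p = Point()
p.x = 3

message = None
try:
    p.y = "four"
except TypeError as exc:
    message = str(exc)
```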

Personally I hope to be able to define named tuples like this:

class User(NamedTuple):
    username: str
    userid: int
    first_name: Optional[str]
    last_name: Optional[str]
@SylvainCorlay

@SylvainCorlay SylvainCorlay commented Sep 7, 2016

@SylvainCorlay @fperez @Carreau
Yes, there are 1000s of new uses for annotations in classes! You can indeed write a metaclass or a class decorator that uses the annotations as a better way to define traits. Let your imagination go wild!

Mypy already doesn't understand what's going on with libraries that use traitlets so I'm not particularly worried about how mypy should deal with this -- we'll cross that bridge when we get to it.

Personally I hope to be able to define named tuples like this:

class User(NamedTuple):
    username: str
    userid: int
    first_name: Optional[str]
    last_name: Optional[str]

^ @ellisonbg @minrk.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 7, 2016

@gvanrossum

Personally I hope to be able to define named tuples like this:

class User(NamedTuple):
    username: str
    userid: int
    first_name: Optional[str]
    last_name: Optional[str]

This looks cool! I already want to implement this. Maybe this can even go into typing or future_typing instead of the current implementation of typed NamedTuple?
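For reference, `typing.NamedTuple` in Python 3.6 and later accepts this class form. A short usage sketch, with made-up field values:

```python
from typing import NamedTuple, Optional


class User(NamedTuple):
    username: str
    userid: int
    first_name: Optional[str]
    last_name: Optional[str]


guido = User("guido", 1, "Guido", None)
username, userid, first_name, last_name = guido  # still an ordinary tuple
```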

@NeilGirdhar

@NeilGirdhar NeilGirdhar commented Sep 7, 2016

@ilevkivskyi Can't you just add an __init_subclass__ method to NamedTuple that walks __annotations__?

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented Sep 7, 2016

@NeilGirdhar As I understand it, __init_subclass__ is like __init__, not like __new__, so if you want your subclass to be a real collections.namedtuple, a bit of metaclass magic is probably needed.
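The "metaclass magic" could be sketched roughly as below. This is an illustration under assumptions, not the implementation used by typing: `AnnotatedTupleMeta` and `AnnotatedTuple` are invented names, and the field types are only used for their names and order.

```python
import collections
from typing import Optional


class AnnotatedTupleMeta(type):
    """Hypothetical metaclass: replace the class with a real namedtuple."""

    def __new__(mcls, name, bases, namespace):
        if not bases:
            # Creating the marker base class itself: behave like type.
            return super().__new__(mcls, name, bases, namespace)
        # Build a throwaway class so the annotations can be read the
        # same way on every Python version that supports them.
        probe = super().__new__(mcls, name, (), dict(namespace))
        fields = list(getattr(probe, "__annotations__", {}))
        # The returned object is a plain collections.namedtuple class,
        # which is what __init_subclass__ alone could not produce.
        return collections.namedtuple(name, fields)


class AnnotatedTuple(metaclass=AnnotatedTupleMeta):
    pass


class User(AnnotatedTuple):
    username: str
    userid: int
    first_name: Optional[str]
    last_name: Optional[str]


user = User("guido", 1, None, None)
```

Because `__new__` returns a different class, `User` here is a direct tuple subclass and no longer inherits from `AnnotatedTuple`.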

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 7, 2016

@vedgar

@vedgar vedgar commented Sep 8, 2016

Yes, there are 1000s of new uses for annotations in classes. ... Let your imagination go wild!

Since you're in the mood to allow more expressivity in the language, I have let my imagination go a bit wild: what about Enums? Currently, there is a big discussion about how to declare an Enum (and Flags and similar classes) whose member values "don't matter". I think it would be really cool if we could say

class Color(Enum):
    blue: 'Color'
    green: 'Color'
    red: 'Color'

and be done with it. Of course, the lack of unpacking and the unavailability of the class name are technical problems, and probably the best way would be just

class Color(Enum):
    blue, green, red: Color

but that's probably too big a change. However, the first variant is already possible to implement, and requires just a bit of philosophical adaptation, which I think is justified in this case. After all, blue, green and red are of "type" (Enum, to be precise) Color. If we really want to strengthen the bond between annotation and assignment, it's not too great a stretch to envision a metaclass treating bare annotations as "I don't care about assigned values" rather than just the ordinary "missing default assigned values". What do you say? For me, it reads much better than the currently proposed abomination of assigning None, (), or some "magical object" like _auto_ to those members.
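One way the first variant could be approximated today is with a class decorator rather than a metaclass. In this sketch `declarative_enum` is a hypothetical helper, and the class deliberately does not inherit from Enum, since the stock Enum metaclass does not create members from bare annotations.

```python
from enum import Enum


def declarative_enum(cls):
    """Hypothetical decorator: build an Enum from bare annotations."""
    names = list(cls.__annotations__)
    # The functional Enum API assigns the values 1, 2, 3, ... in order.
    return Enum(cls.__name__, names)


@declarative_enum
class Color:
    blue: 'Color'
    green: 'Color'
    red: 'Color'


member_names = [member.name for member in Color]
```

The annotation values are ignored entirely; only the annotated names and their order matter.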

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 8, 2016

@vedgar

@vedgar vedgar commented Sep 8, 2016

I'm happy with explicit values too. But there are people who really want to have the syntax for "don't care" values. I just think that having a class body where you have

class X(...):
    ...
    a = b

(a and b are simple identifiers) and after the class definition you have X.a != X.b (if b was previously assigned a "magical object") is infinitely worse, since it breaks a lot of assumptions of how Python assignments work. I think the "annotations way" is surely better, and of course allowing the unpacking semantics would be the best Christmas present to get after quite a few years;-), but let's first make clear whether you're ok with such a usage of bare annotations (semantically - syntax can always be tweaked if there's a need, as you say).

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 8, 2016

@vedgar

@vedgar vedgar commented Sep 8, 2016

Hmm... by syntactic support, do you mean the "relenting" you talked about above (allowing unpacking on bare annotations, and maybe even a forward ("outward"?:) declaration without quotes), or do you mean full-fledged syntactic support like an enum keyword? If we're going to go that route, I'm sure the general "make" keyword would be the better choice, and IIRC, you were against it.

Or maybe you mean something in between? Like

class Color(Enum, style='declarative'):
    red, green, blue

This we can do right now, but it requires philosophical adjustment too.
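The usual trick behind this kind of "declarative" class body is a metaclass whose `__prepare__` returns a namespace that records unknown names instead of failing. The sketch below shows only that mechanism, without Enum and without the style keyword; `_Recorder` and `DeclarativeMeta` are invented names.

```python
class _Recorder(dict):
    """Class-body namespace that remembers names looked up but not defined."""

    def __init__(self):
        super().__init__()
        self.names = []

    def __missing__(self, key):
        if key.startswith("__"):
            # Let dunder lookups such as __name__ fall through to globals.
            raise KeyError(key)
        self.names.append(key)
        return key


class DeclarativeMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        return _Recorder()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls.members = tuple(namespace.names)
        return cls


class Color(metaclass=DeclarativeMeta):
    red, green, blue
```

The cost is the "philosophical adjustment" mentioned above: any misspelled name inside such a class body silently becomes a member instead of raising NameError.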

[The true solution is probably giving up the idea of the current type being "the root metaclass", and having different "grand metaclasses" for different purposes. They can be descended from one protometaclass having truly common behavior, but surely things like "__call__(T,...) calls T.__new__(...) and then maybe .__init__ on the result", or even descriptor semantics, shouldn't be there. We already had our share of problems implementing typing.py, ABCs, and other things (I think Enums are there too, they just feel it less because they are not that different from normal classes) just because type does too many things. But that's probably too big a change even for Py4.:]

@gvanrossum
Member Author

@gvanrossum gvanrossum commented Sep 8, 2016

@gvanrossum
Member Author

@gvanrossum gvanrossum commented May 10, 2017

Shouldn't we close this issue? PEP 526 is accepted and marked Final.

@ilevkivskyi
Collaborator

@ilevkivskyi ilevkivskyi commented May 10, 2017

Shouldn't we close this issue? PEP 526 is accepted and marked Final.

I am glad to close this megathread. IIRC PEP 526 is accepted provisionally, but if there are new ideas, it is better to open a new issue.
