The use of comments to communicate with the type checker #35

Closed
gvanrossum opened this Issue Jan 8, 2015 · 16 comments

6 participants
@gvanrossum
Member

gvanrossum commented Jan 8, 2015

mypy supports comments of the form # type: <some_type> and there are proposals for other uses of comments (e.g. # typing: off). Comments are a convenient solution for things that are hard or inefficient to express in existing Python syntax (i.e. code that works at runtime). But they have the downside that they are inaccessible to runtime machinery that might want to use type annotations, and the Python AST module doesn't preserve comments, so code that needs access to the comments must implement its own parser (like mypy does).
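The point about the AST module can be checked with the standard library. A minimal sketch, assuming Python 3 and the default ast.parse options: the parsed tree is identical with and without the comment, while the lower-level tokenize module keeps comments as COMMENT tokens, so a tool can recover "# type:" comments itself. The helper names ast_ignores_comment and type_comments are illustrative, not part of any library.

```python
import ast
import io
import tokenize


def ast_ignores_comment() -> bool:
    """True if the AST is the same whether or not the type comment is present."""
    with_comment = ast.parse("x = []  # type: List[int]\n")
    without_comment = ast.parse("x = []\n")
    return ast.dump(with_comment) == ast.dump(without_comment)


def type_comments(src: str) -> dict:
    """Map line number -> type expression for each '# type:' comment."""
    found = {}
    readline = io.StringIO(src).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            text = tok.string[1:].strip()  # drop the leading '#'
            if text.startswith("type:"):
                found[tok.start[0]] = text[len("type:"):].strip()
    return found


SOURCE = "x = []  # type: List[int]\ny = {}  # type: Dict[str, int]\n"
print(ast_ignores_comment())
print(type_comments(SOURCE))
```

The sketch takes everything after "type:" verbatim, so a trailing remark such as "# type: int  # note" would be captured as part of the type; a real tool would have to handle that case.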

I ask: should we define such comments in the PEP or not?

@JukkaL

Contributor

JukkaL commented Jan 8, 2015

One of the most common cases in mypy code where annotations are needed is when creating an empty collection. I like it better with '# type:' comments (and mypy contributors seem to agree):

x = []  # type: List[int]

# vs

x = Undefined(List[int])
x = []

Using cast() is another option, but it's semantically different, since it can hide type errors (see #15).

If we absolutely decide that comments are a no-no, another option would be to define an additional helper that is similar to cast() but that can't be used to hide type errors. It could be called coerce, for example:

x = coerce(List[int], [])

All of the non-comment alternatives share a problem: complex types in particular can introduce non-trivial runtime overhead, which would go against the goals of mypy.
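A minimal sketch of the runtime side of these options, assuming the typing module as released. coerce is the hypothetical helper proposed above, not a typing function: at runtime it would be an identity function just like typing.cast, and the whole difference would live in the type checker.

```python
from typing import Any, List, TypeVar, cast

T = TypeVar("T")


def coerce(tp: Any, value: T) -> T:
    """Hypothetical helper (not in typing): returns value unchanged.
    Unlike cast(), a checker with special support for it would still
    verify that value is compatible with tp."""
    return value


# Comment form: the type is never evaluated, so there is no runtime cost.
a = []  # type: List[int]

# cast(): List[int] is evaluated every time this line runs, and a checker
# trusts the claim blindly, so this wrong cast would go unreported.
b = cast(List[int], ["not", "ints"])

# coerce(): same runtime cost as cast(), but a checker supporting it
# would reject a mismatch like the one above.
c = coerce(List[int], [])
```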

@gvanrossum

Member

gvanrossum commented Jan 8, 2015

I have to agree, and I don't see much value in forcing the type expression to be evaluated at run time. So at least # type: comments should probably be defined by the PEP.
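To make the runtime difference concrete: a type comment is never evaluated, whereas the type argument of cast() is an ordinary expression. The sketch uses a deliberately undefined name, NoSuchType, which is hypothetical; a type checker would reject both functions, but only the second one fails at runtime.

```python
from typing import cast


def comment_form():
    # The comment is invisible to the interpreter, so the unknown name
    # on the next line is never looked up.
    x = []  # type: NoSuchType[int]
    return x


def cast_form():
    # cast()'s first argument is evaluated on every call, so the unknown
    # name raises NameError before cast() itself even runs.
    return cast(NoSuchType[int], [])  # NoSuchType is deliberately undefined
```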

@ambv ambv self-assigned this Jan 8, 2015

@ambv

Contributor

ambv commented Jan 8, 2015

OK, will define # type comments in the document.

@ambv

Contributor

ambv commented Jan 14, 2015

Fixed in e2e6fc4.

@ambv ambv closed this Jan 14, 2015

@flying-sheep

flying-sheep Jan 17, 2015

As said on the mailing list, please consider reopening this.

Type annotations being available at runtime would be extremely useful for optimization purposes, and people will use them that way.

I like the coerce idea, since Guido mentioned the problems with assert isinstance(…) after the declaration.

Or let’s at least use a string literal on the line above, which would appear in the AST!

'''type: List[int]'''
x = []
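The claim that a string literal survives parsing, unlike a comment, can be checked with the ast module. The sketch below pairs each top-level string statement starting with "type:" with the assignment that follows it; string_literal_types is a hypothetical helper illustrating this proposal, not a convention any checker adopted. It assumes Python 3.8+, where string literals parse to ast.Constant.

```python
import ast


def string_literal_types(src: str) -> dict:
    """Map variable name -> type string for 'type: ...' literals that sit
    on the line before a simple top-level assignment."""
    body = ast.parse(src).body
    found = {}
    for stmt, nxt in zip(body, body[1:]):
        is_type_literal = (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and isinstance(stmt.value.value, str)
            and stmt.value.value.startswith("type:")
        )
        if is_type_literal and isinstance(nxt, ast.Assign):
            type_expr = stmt.value.value[len("type:"):].strip()
            for target in nxt.targets:
                if isinstance(target, ast.Name):
                    found[target.id] = type_expr
    return found


SOURCE = "'''type: List[int]'''\nx = []\n"
print(string_literal_types(SOURCE))
```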

flying-sheep added a commit to flying-sheep/typehinting that referenced this issue Jan 17, 2015

@gvanrossum

Member

gvanrossum commented Jan 17, 2015

I'm happy to reopen the discussion.

First, the use of annotations for optimization or code generation is explicitly out of scope for this PEP. (Optimizers like PyPy have shown that they don't need annotations.)

Now, when you use argument annotations, they will be available at run time (just inspect the __annotations__ attribute -- see PEP 3107). In this repo I have a partially completed set of classes that implement the various generic and abstract constructs from PEP 483 and PEP 484, and you will find instances of those classes in the annotations.
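For reference, this is what PEP 3107 argument annotations look like at runtime. A minimal sketch using the typing module as eventually released rather than the prototype classes in this repo; it assumes the module does not use "from __future__ import annotations", which would store the annotations as strings instead.

```python
from typing import List


def total(xs: List[int]) -> int:
    # The annotations are stored on the function object, so nothing
    # extra is paid on each call.
    return sum(xs)


print(total.__annotations__)
```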

In many cases you can actually write things like assert isinstance(x, Tuple[int, str, Callable[[int, str], Any]]). But it's pretty slow, and I have no intention of making that code even more complex than it already is by trying to optimize it. Also, once implemented, assert isinstance(range(1000000), Sequence[int]) would instantiate a million integers and check them all. By comparison, the # type: ... comments don't affect runtime performance at all. Function annotations affect runtime only at function definition time, which is usually during module load time, so their effect is much more tolerable.
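A rough sketch of why such a check is expensive: verifying Sequence[int] means visiting every element, so the cost grows with the length of the sequence, while a # type: comment costs nothing. is_sequence_of_int is a hypothetical helper. Note that current versions of the typing module refuse isinstance() checks against subscripted generics such as Sequence[int] and raise TypeError instead, which the second function demonstrates.

```python
import typing
from collections.abc import Sequence


def is_sequence_of_int(obj) -> bool:
    """Hypothetical deep check: what isinstance(obj, Sequence[int]) would
    have to do. It touches every element, so it is O(len(obj))."""
    return isinstance(obj, Sequence) and all(isinstance(v, int) for v in obj)


def subscripted_check_allowed() -> bool:
    """Does isinstance() accept a subscripted generic like Sequence[int]?"""
    try:
        isinstance([], typing.Sequence[int])
    except TypeError:
        return False
    return True


print(is_sequence_of_int(range(1_000_000)))  # walks a million integers
print(subscripted_check_allowed())
```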

@gvanrossum gvanrossum reopened this Jan 17, 2015

@JukkaL

Contributor

JukkaL commented Jan 17, 2015

Also, most variable types (other than function arguments) will be inferred, so they won't be directly in the AST either. Implementing a type inference algorithm for Python is much harder than writing a parser that retains comments.

@flying-sheep

flying-sheep Jan 17, 2015

> Now, when you use argument annotations, they will be available at run time

I know, but thanks for bringing this to my attention. Runtime availability is of course even more convenient than literals that only appear in the AST. But this also makes the gap between significant comments and annotations even more apparent: one has language support and is available at runtime, while the other isn’t even available to the interpreter after parsing.

About runtime costs: yes, you’re right, I’ll stop championing assert isinstance(…)

Let’s discuss other non-comment ideas instead!

One thing PyCharm uses is string literals:

'''type: List[int]'''
foo = []

or:

'type: List[int]'
foo = []

The disadvantage is that this takes up more space than a comment.

This could be fixed using this style:

foo = [];  'type: List[int]'

Or we could do something like this:

foo, *_ = [], List[int]

Or introduce explicit language support:

foo: List[int] = []

Any other ideas?
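The last option is the one Python eventually adopted: PEP 526 added variable annotations in Python 3.6, after this discussion. A minimal sketch contrasting it with the comment form, using classes so the annotations can be read back with typing.get_type_hints; the class names are illustrative.

```python
from typing import List, get_type_hints


class WithAnnotation:
    # PEP 526 syntax: recorded in the class's __annotations__.
    foo: List[int] = []


class WithComment:
    # Type comment: only a static checker ever sees it.
    foo = []  # type: List[int]


print(get_type_hints(WithAnnotation))
print(get_type_hints(WithComment))
```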

@JukkaL

Contributor

JukkaL commented Jan 17, 2015

Undefined is also possible and already included in the PEP (it's admittedly verbose, but it seems pretty clear to me):

x = Undefined(List[int])
x = []

As I mentioned elsewhere, passing the type as a string is more efficient:

x = Undefined('List[int]')
x = []
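Why the string form is cheaper can be sketched with a stand-in for Undefined. This Undefined class is hypothetical (the marker from the draft PEP is not part of the typing module as released); it only records what it is given. With List[int], the subscription is evaluated at runtime; with 'List[int]', only a string constant is passed along.

```python
from typing import List


class Undefined:
    """Hypothetical stand-in for the draft PEP's Undefined marker."""

    def __init__(self, tp):
        self.tp = tp  # just remembered; nothing else happens


x = Undefined(List[int])    # evaluates List[int] (a typing object) at runtime
x = []
y = Undefined('List[int]')  # passes a string constant; nothing is evaluated
y = []
```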

Though clever, @flying-sheep's proposals above (except for the new syntax, which I like, but which will not happen in this PEP) look pretty difficult to read to me and inconsistent with the rest of the language (even more than the # type: comments).

@flying-sheep

flying-sheep Jan 17, 2015

Those are all ideas. Feel free to add your own.

@gvanrossum

Member

gvanrossum commented Jan 18, 2015

After considering it some more, I still don't like the string literal proposal. So let's not waste more time discussing it (there are plenty of other things more worthy of our attention if we want this to land in 3.5).

PS. If you want a parser that preserves comments, there's one right in the Python standard library, in the lib2to3 package.

@ryepesg

ryepesg commented Jan 21, 2015

I really like flying-sheep's proposal (for another PEP, I know):
foo: List[int] = []

It is consistent with many modern programming languages (Swift, Scala, Go, TypeScript, Rust, Objeck, Julia, Nim/Nimrod...); see also "Rationale behind the ordering of Scala's declaration". You can compare some of them at: http://rosettacode.org/wiki/Variables

@gvanrossum

Member

gvanrossum commented Jan 21, 2015

Yeah, if this PEP is successful, we'll probably introduce some syntax close to that. However, for the current proposal the requirement is that we don't change the language's grammar; that way it will be easy for the typing module to be back-ported to Python 3.4 or earlier.

@ceridwen

ceridwen Jan 26, 2015

I'd like to argue against using comments to communicate with the type checker from another direction: conventionally comments are used by humans to communicate with other humans, not with machines. If someone reads

x = [] # type: List[int]

without context, it's not going to be clear that this is intended at least as much for the type-checker as it is for the reader, and it's going to be harder to find the associated syntax. Something like,

x = Undefined(List[int])

is clearly intended for the interpreter and easier to discover since it's obvious what to grep for. (x = typing.Undefined(List[int]) is better, but that's up to the user.)

The flip side of this problem is that if the type-hinting syntax lives in comments, any program that wants to use the type hints is going to need its own tool chain to do so, starting with the parser and working up. These days there are lots of uses for type information that don't involve optimization, and making it available programmatically from the start will ensure there's a unified API for typing-based extensions.

@JukkaL

Contributor

JukkaL commented Jan 28, 2015

As discussed above, the # type: comment syntax has benefits over Undefined(...): it's more concise in the common case of initializing a variable with an empty list or dict, and it doesn't add any runtime overhead.

If type annotations really take off, it's likely that somebody will create a generic AST module that knows about the comment syntax. This is not very hard to do. Existing tools can migrate to the new module to access the type comments; hopefully it would have an API mostly compatible with existing parsers.
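Something close to this did arrive in the standard library: since Python 3.8, ast.parse() accepts type_comments=True, which attaches # type: comments to the nodes they annotate and records # type: ignore comments in the module's type_ignores list. A minimal sketch, assuming Python 3.8 or later; the helper name collect and the module name frobnicate are hypothetical (the source is only parsed, never imported).

```python
import ast

SOURCE = (
    "x = []  # type: List[int]\n"
    "import frobnicate  # type: ignore\n"
)


def collect(src: str):
    """Return (type comments by target name, line numbers of type: ignore)."""
    tree = ast.parse(src, type_comments=True)
    comments = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and node.type_comment:
            for target in node.targets:
                if isinstance(target, ast.Name):
                    comments[target.id] = node.type_comment
    ignores = [ti.lineno for ti in tree.type_ignores]
    return comments, ignores


print(collect(SOURCE))
```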

Also, IDEs and editors can implement special syntax highlighting rules for type comments (and string literal types) to make them stand out better.

@gvanrossum

Member

gvanrossum commented Mar 26, 2015

Closing. Type comments are now in the PEP, and so is # type: ignore.
