Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionaries as structs #985

Closed
JukkaL opened this issue Nov 22, 2015 · 15 comments
Closed

Dictionaries as structs #985

JukkaL opened this issue Nov 22, 2015 · 15 comments

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented Nov 22, 2015

We should probably support precise types for heterogeneous dictionaries used like structures, such as {'name': 'Mary', 'age': 5}. Potential example:

person = {'name': 'Mary', 'age': 5}
person['name'] + ''  # ok
person['age'] + 1  # ok
person['x']  # error: no key x

This is just a placeholder issue for now and doesn't actually propose how to implement this.

@rowillia
Copy link
Contributor

Here's an example from Hack - http://docs.hhvm.com/manual/en/hack.shapes.php

@JukkaL
Copy link
Collaborator Author

JukkaL commented Sep 11, 2016

See also python/typing#28.

@davidfstr
Copy link
Contributor

I would be very interested in the ability to recognize this kind of "typed dictionary". Significant portions of my current codebase use such dictionaries. Currently I have to approximate with Mapping[str, Any] which nullifies any type checks on the keys.

Would this Github issue be the appropriate venue for hashing out the proposed semantics and spellings for an initial implementation in mypy?

@gvanrossum
Copy link
Member

I recommend first trying to rally support for a specific syntax, and for that python/typing#28 seems more appropriate.

@davidfstr
Copy link
Contributor

I've figured out most of what I need to know to start on an initial implementation. What I haven't been able to determine is what the "dict" type is in mypy. Via static analysis I found many other types, including the TupleType that both regular tuples and NamedTuples use.

Two questions:

(1) Since you probably know offhand, what is the type used for dict? Perhaps a generic Instance?

(2) Is there an easy way for me to do a dynamic analysis to see what internal type variables mypy sees? Perhaps there's a way to print out some kind of annotated AST with type information? Knowing this would make answering questions similar to (1) easier in the future.

@gvanrossum
Copy link
Member

The Dict type is currently an alias for builtins.dict (though it should really be the other way around). It has some hand-coded properties but mostly the definition just comes from builtins.pyi in typeshed. Grepping for builtins.dict should reveal the many places where it is used.

One way to see what types mypy infers for a given piece of code is to use reveal_type(x) -- this is a pseudo builtin that's always available. E.g.

from types import Union

def f(a: Union[int, str]) -> None:
  reveal_type(a)
  if isinstance(a, int):
    reveal_type(a)

produces:

__tmp__.py: note: In function "f":
__tmp__.py:4: error: Revealed type is 'Union[builtins.int, builtins.str]'
__tmp__.py:6: error: Revealed type is 'builtins.int'

@gvanrossum gvanrossum added this to the Future milestone Sep 28, 2016
@davidfstr
Copy link
Contributor

Here's my initial implementation plan:

(A) It must be possible to define a named TypedDict type.

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import TypedDict

    Point = TypedDict('Point', x=int, y=int)

It is necessary to guard from typing import TypingDict and everything derived from it until TypingDict becomes part of the official typing module. This is somewhat awkward, so maybe there's a convention such as from mypy.typing import TypedDict that would work better in the interim?

Details:

  • TypedDict will be defined in typing.pyi.
  • The call TypedDict(...) will be recognized as a type in the same way that check_namedtuple recognizes NamedTuple(...) as a type.
  • In the mypy type system I will introduce a new TypedDictType that extends Instance. It will continue to be treated as Instance[builtins.dict] by most existing mypy code.
    • I'm suspect that defining a new top-level Type (rather than an Instance) might be a better approach but I'm not sufficiently familiar with the mypy codebase (and visitor system) to take that approach confidently.

(B) It must be possible to declare a variable as having a named TypedDict as its type.

point = dict(x=42, y=1337)  # type: Point
  • The dict expression must type-evaluate to an anonymous TypedDict(x=int, y=int) type. I haven't looked for this part of mypy codebase yet.
  • Type-compatibility checking code must recognize that TypedDict(x=int, y=int) and TypedDict('Point', x=int, y=int) are structurally compatible. Probably this is implemented in some kind of "meet" operation:
    • (anonymous TypedDictType) meet (anonymous TypedDictType) -> (anonymous TypedDictType)
    • (named TypedDictType A) meet (anonymous TypedDictType) -> (named TypedDictType A)
    • (named TypedDictType A) meet (named TypedDictType A) -> (named TypedDictType A)
    • (named TypedDictType A) meet (named TypedDictType B) -> (anonymous TypedDictType), where A != B

In the initial iteration the "meet" operation will only recognize exact matches for simplicity (and consider inexact matches to be errors). Future iterations will likely refine it to intersect the keys of TypedDictType in some fancier way.

(C) It must be possible to declare a variable as having an anonymous TypedDict as its type.

def create_point(x: int, y: int) -> 'TypedDict(x=int, y=int)':
    return dict(x=x, y=y)
  • Code related to (A) must recognize the type-constructor 'TypedDict' when it lacks a name.

(D) Index expressions involving TypedDicts must be recognized when used as an rvalue:

def get_x(point: 'Point') -> int:
    return point['x']
  • visit_index_expr_helper (which knows about val[member] expressions used as rvalues) will be updated to recognize TypedDictType specially and check members. The member checking will use the same strategy that NamedTuple does. In particular TypedDictType will have a fallback Instance (created in the same way as expr_to_analyzed_type) that is checked with analyze_member_access.
  • An index expression using a non-string-literal as a member name will behave in the same way as if you had a NamedTuple and you did a getattr() on it using a non-string-literal. (I don't happen to know what that behavior is offhand.)

(E) Index expressions involving TypedDicts must be recognized when used as an lvalue:

def set_x(point: 'Point', x: int) -> None:
    point['x'] = x
  • check_indexed_assignment (which knows about val[member] expressions used as lvalues) will be updated to recognize TypedDictType specially and check members.

(F) Support for printing TypedDictType will be added.

point = dict(x=42, y=1337)  # type: Point
reveal_type(point)

When checking the above program mypy should emit something like:

input.py:1: error: Revealed type is 'TypedDict("Point", dict(x=int, y=int))'

The above implies that the canonical form of TypedDict uses the dict syntax rather than the {} syntax in (G) below.

(G) Support for the alternate dict-literal syntax for the TypedDict type constructor will be added to (A) and (C):

Point = TypedDict('Point', {'x': int, 'y': int})

anon_point = {'x': 42, 'y': 1337}  # type: TypedDict({'x': int, 'y': int})

Comments?

@JukkaL
Copy link
Collaborator Author

JukkaL commented Sep 29, 2016

Thanks for the detailed write-up -- sounds like a good plan! (But I have some suggestions below.)

Comments:

  • I'd suggest starting with a minimal PR that only supports a very small subset of functionality, such as only parsing TypedDict definitions and maybe declaring a variable with a TypedDict type. Small PRs are faster to review and marge and you'll get feedback sooner. It's okay to have half-finished functionality in master (we just don't advertise it until it works well enough).

  • Please don't extend Instance -- the Type hierarchy is intended to have only one level of inheritance. An approach similar to TupleType should work -- add the corresponding Instance type (for example, Dict[str, Any]) to a fallback attribute. Then look for all visit_tuple_type methods. We'll likely eventually need a new visit_typed_dict method for each such method. It's okay to start with mostly unimplemented visit methods as otherwise the first PR could be impractically big.

  • You'll need to special case type checking of {'x': ...}. See visit_dict_expr in checkexpr.py.

  • We may eventually need to special case some forms of dict(x=y, ...). Simple ones with just keyword arguments are translated to corresponding dict expressions/literals, I think, so these would be covered by the above item. See type_object_type_from_function in checkmember.py and translate_dict_call in semanal.py. I'd leave this out from the initial PR, as {'x': ...} is easier to support and it should cover the most common cases.

  • Use type context in type checker to determine whether {'x': ...} should be treated as a typed dict. The simplest approach would require typed dicts usually to be annotated. By default {'x': ...} expressions would still generate regular Dict[...] types. This is something we can perhaps revisit later once typed dicts are more fully implemented. Example:

    TD = TypedDict('TD', x=int, y=str)
    a = dict(x=1, y='x')              # a has type Dict[str, object]
    b = dict(x=1, y='x')  # type: TD  # b has typed dict type TD
    
  • Type compatibility checking happens in the confusingly named is_subtype function (which is actually for compatibility checking). I think that this is easy to implement directly without relying on a meet operation: a is a subtype of b (assuming both are typed dicts) if a has all the keys of b and all shared keys have equivalent types (compatible both ways).

  • I'd suggest leaving out anonymous typed dicts and the alternative, dict-literal-based type definition syntax initially, as they are just syntactic sugar and not essential.

Feel free to ask about any things that are unclear, and it can be useful to send a half-finished PR for feedback (just mark it as so).

@davidfstr
Copy link
Contributor

Understood. I will begin.

@davidfstr
Copy link
Contributor

davidfstr commented Sep 30, 2016

Implementation Checklist

(will be revised over time)

Prerequisites - Understanding

  • Wrap brain around mypy.semanal.SemanticAnalyzer
    • 60% of the class's lines of code could be moved into the following supergroups:
    • Significantly easier for me to read after performing this extraction locally.
  • Wrap brain around mypy.checker.TypeChecker

PR - Declare TypedDict type --> #2206 (merged), #2492 (on deck)

Tests:

  • Tests: Create Type

PR - Create TypedDict instance --> #2342 (merged)

Literal constructor:

  • Declare var as TypedDict: point = Point(x=42, y=1337)
  • Declare var as TypedDict: point = Point(dict(x=42, y=1337))
  • Declare var as TypedDict: point = Point({'x': 42, 'y': 1337})

Copy constructor:

Reveal Type:

  • Can print TypedDictType: reveal_type(point)

Tests:

  • Tests: Create Instance
  • Tests: Subtype
  • Tests: Join
  • Tests: Meet
  • Tests: Constraint Solve == (not filed)

PR - Manipulate named TypedDict --> #2526 (merged)

PR - Declare TypedDict type with class-based syntax (for Python 3.6+) --> #2740

Hold until core implementation is proven useful in a pre-3.6 context.

  • Support basic declaration:
class Point2D(TypedDict):
    x: int
    y: int
  • Support class-based inheritance as syntactic sugar for copying fields
class Point3D(Point2D):
    z: int

(HOLD) PR - Create and manipulate anonymous TypedDict --> (not filed)

Hold until demand is proven.

...

(HOLD) PR - Dictionary literals (via {} and dict(...)) are typed as anonymous TypedDict if possible --> #2487

Hold until demand is proven.

  • Declare var as TypedDict: point = {'x': 42, 'y': 1337} # type: Point
  • Declare var as TypedDict: point = dict(x=42, y=1337) # type: Point

Related

Issues:

@gvanrossum
Copy link
Member

Is it still worth keeping this open? There's now a label "topic-typed-dict" whose issues track remaining work. Or is the TODO list above still more detailed?

@davidfstr
Copy link
Contributor

There are a few line items above that haven't been converted to subissues. I can do that, which will allow this TODO-issue to be closed.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 16, 2017

Ok, let's close this issue after all still relevant TODO items discussed above are covered by other, smaller issues.

@davidfstr
Copy link
Contributor

I've filed smaller issues for the items in the TODO list above that I'm pretty sure we want to tackle. Thus suggest closing this tracking issue.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Jan 23, 2017

Thanks! Closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants