Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-43224: Draft implementation of PEP 646 #30398

Closed
wants to merge 18 commits into from
Closed

Conversation

mrahtz
Copy link
Contributor

@mrahtz mrahtz commented Jan 4, 2022

PEP 646 (Variadic Generics) hasn't yet been completely approved by the steering council, but here's a draft of its implementation so we can get started on code review.

Update 2021-01-19: the SC has approved the PEP, so full steam ahead!

TODO:

A couple of notes below.

@pradeep90 @gvanrossum @Fidget-Spinner

Starred tuple types

I realised there's a tricky issue with starred tuple types.

If we have e.g. tuple[int, *tuple[str, bool]], the obvious thing to do would be to unpack the inner tuple at runtime, resulting in tuple[int, str, bool]. This would imply that iter(tuple[str, bool]) creates an iterator returning str then bool.

But this would cause problems when using starred tuple types as the type of *args, e.g. def foo(*args: *tuple[str, bool]):

  1. The type of foo.__annotations__['args'] should definitely not be tuple[str, bool]. That would instead imply def foo(*args: tuple[str, bool])
  2. The current PEP draft states that when a starred object is used as the annotation of *args, its iterator should return only a single value. (Or at least, this is the behaviour specified for *args: *Ts; @pradeep90, I've just realised, should we update that section of the PEP to make it explicit that that starred tuples can be also used for *args?)

I think the simplest solution to both of these problems is to not unpack tuple types as runtime, having *tuple[stuff] instead creating an iterator that returns only a single item, an instance of a new 'starred tuple'. (Note that such a type does have to exist anyway, in order to represent e.g. *tuple[int, ...].)

For the native tuple type, I've tentatively implemented this by adding a new starred flag to gaobject in genericaliasobject.c rather than creating a whole new type. This makes implementation easier, but maybe it's suboptimal considering that gaobject is used for lots of things other than tuple - feedback appreciated. For typing.Tuple, I've implemented a new class in typing.py called StarredTuple, which should behave in the same way. (This does mean that *tuple[int] != *Tuple[int], but given that tuple[int] != Tuple[int], this seems fine.)

Implementation of *tuple

I'm pretty uncertain about my implementation in genericaliasobject.c. In particular:

  • I don't know what I'm doing with garbage collection; scrutiny on whether I'm doing it right appreciated.
  • I wonder whether there's a way to make it simpler. In typing.py, the implementation is as simple as def __iter__(self): yield StarredTuple(self). Is there a way to do something similar in ga_iter, without the need to also implement ga_iternext?

https://bugs.python.org/issue43224

Formatted, the generated AST is as follows.

`def f(*args: *Ts): pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 23),
            "f",
            (
                "arguments",
                [],
                [],
                (
                    "arg",
                    (1, 7, 1, 16),
                    "args",
                    (
                        "Starred",
                        (1, 13, 1, 16),
                        ("Name", (1, 14, 1, 16), "Ts", ("Load",)),
                        ("Load",),
                    ),
                    None,
                ),
                [],
                [],
                None,
                [],
            ),
            [("Pass", (1, 19, 1, 23))],
            [],
            None,
            None,
        )
    ],
    [],
)
```

`def f(*args: *tuple[int, ...]): pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 36),
            "f",
            (
                "arguments",
                [],
                [],
                (
                    "arg",
                    (1, 7, 1, 29),
                    "args",
                    (
                        "Starred",
                        (1, 13, 1, 29),
                        (
                            "Subscript",
                            (1, 14, 1, 29),
                            ("Name", (1, 14, 1, 19), "tuple", ("Load",)),
                            (
                                "Tuple",
                                (1, 20, 1, 28),
                                [
                                    ("Name", (1, 20, 1, 23), "int", ("Load",)),
                                    ("Constant", (1, 25, 1, 28), Ellipsis, None),
                                ],
                                ("Load",),
                            ),
                            ("Load",),
                        ),
                        ("Load",),
                    ),
                    None,
                ),
                [],
                [],
                None,
                [],
            ),
            [("Pass", (1, 32, 1, 36))],
            [],
            None,
            None,
        )
    ],
    [],
)
```

`def f(*args: *tuple[int, *Ts]): pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 36),
            "f",
            (
                "arguments",
                [],
                [],
                (
                    "arg",
                    (1, 7, 1, 29),
                    "args",
                    (
                        "Starred",
                        (1, 13, 1, 29),
                        (
                            "Subscript",
                            (1, 14, 1, 29),
                            ("Name", (1, 14, 1, 19), "tuple", ("Load",)),
                            (
                                "Tuple",
                                (1, 20, 1, 28),
                                [
                                    ("Name", (1, 20, 1, 23), "int", ("Load",)),
                                    (
                                        "Starred",
                                        (1, 25, 1, 28),
                                        ("Name", (1, 26, 1, 28), "Ts", ("Load",)),
                                        ("Load",),
                                    ),
                                ],
                                ("Load",),
                            ),
                            ("Load",),
                        ),
                        ("Load",),
                    ),
                    None,
                ),
                [],
                [],
                None,
                [],
            ),
            [("Pass", (1, 32, 1, 36))],
            [],
            None,
            None,
        )
    ],
    [],
)
```
Formatted, generated AST is as follows.

`def f() -> tuple[*Ts]: pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 27),
            "f",
            ("arguments", [], [], None, [], [], None, []),
            [("Pass", (1, 23, 1, 27))],
            [],
            (
                "Subscript",
                (1, 11, 1, 21),
                ("Name", (1, 11, 1, 16), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 17, 1, 20),
                    [
                        (
                            "Starred",
                            (1, 17, 1, 20),
                            ("Name", (1, 18, 1, 20), "Ts", ("Load",)),
                            ("Load",),
                        )
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
        )
    ],
    [],
)
```

`def f() -> tuple[int, *Ts]: pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 32),
            "f",
            ("arguments", [], [], None, [], [], None, []),
            [("Pass", (1, 28, 1, 32))],
            [],
            (
                "Subscript",
                (1, 11, 1, 26),
                ("Name", (1, 11, 1, 16), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 17, 1, 25),
                    [
                        ("Name", (1, 17, 1, 20), "int", ("Load",)),
                        (
                            "Starred",
                            (1, 22, 1, 25),
                            ("Name", (1, 23, 1, 25), "Ts", ("Load",)),
                            ("Load",),
                        ),
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
        )
    ],
    [],
)
```

`def f() -> tuple[int, *tuple[int, ...]]: pass`:

```
(
    "Module",
    [
        (
            "FunctionDef",
            (1, 0, 1, 45),
            "f",
            ("arguments", [], [], None, [], [], None, []),
            [("Pass", (1, 41, 1, 45))],
            [],
            (
                "Subscript",
                (1, 11, 1, 39),
                ("Name", (1, 11, 1, 16), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 17, 1, 38),
                    [
                        ("Name", (1, 17, 1, 20), "int", ("Load",)),
                        (
                            "Starred",
                            (1, 22, 1, 38),
                            (
                                "Subscript",
                                (1, 23, 1, 38),
                                ("Name", (1, 23, 1, 28), "tuple", ("Load",)),
                                (
                                    "Tuple",
                                    (1, 29, 1, 37),
                                    [
                                        ("Name", (1, 29, 1, 32), "int", ("Load",)),
                                        ("Constant", (1, 34, 1, 37), Ellipsis, None),
                                    ],
                                    ("Load",),
                                ),
                                ("Load",),
                            ),
                            ("Load",),
                        ),
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
        )
    ],
    [],
)
```
Formatted, generated AST is as follows:

`x: tuple[*Ts]`:

```
(
    "Module",
    [
        (
            "AnnAssign",
            (1, 0, 1, 13),
            ("Name", (1, 0, 1, 1), "x", ("Store",)),
            (
                "Subscript",
                (1, 3, 1, 13),
                ("Name", (1, 3, 1, 8), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 9, 1, 12),
                    [
                        (
                            "Starred",
                            (1, 9, 1, 12),
                            ("Name", (1, 10, 1, 12), "Ts", ("Load",)),
                            ("Load",),
                        )
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
            1,
        )
    ],
    [],
)
```

`x: tuple[int, *Ts]`:

```
(
    "Module",
    [
        (
            "AnnAssign",
            (1, 0, 1, 18),
            ("Name", (1, 0, 1, 1), "x", ("Store",)),
            (
                "Subscript",
                (1, 3, 1, 18),
                ("Name", (1, 3, 1, 8), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 9, 1, 17),
                    [
                        ("Name", (1, 9, 1, 12), "int", ("Load",)),
                        (
                            "Starred",
                            (1, 14, 1, 17),
                            ("Name", (1, 15, 1, 17), "Ts", ("Load",)),
                            ("Load",),
                        ),
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
            1,
        )
    ],
    [],
)
```

`x: tuple[int, *tuple[str, ...]]`:

```
(
    "Module",
    [
        (
            "AnnAssign",
            (1, 0, 1, 31),
            ("Name", (1, 0, 1, 1), "x", ("Store",)),
            (
                "Subscript",
                (1, 3, 1, 31),
                ("Name", (1, 3, 1, 8), "tuple", ("Load",)),
                (
                    "Tuple",
                    (1, 9, 1, 30),
                    [
                        ("Name", (1, 9, 1, 12), "int", ("Load",)),
                        (
                            "Starred",
                            (1, 14, 1, 30),
                            (
                                "Subscript",
                                (1, 15, 1, 30),
                                ("Name", (1, 15, 1, 20), "tuple", ("Load",)),
                                (
                                    "Tuple",
                                    (1, 21, 1, 29),
                                    [
                                        ("Name", (1, 21, 1, 24), "str", ("Load",)),
                                        ("Constant", (1, 26, 1, 29), Ellipsis, None),
                                    ],
                                    ("Load",),
                                ),
                                ("Load",),
                            ),
                            ("Load",),
                        ),
                    ],
                    ("Load",),
                ),
                ("Load",),
            ),
            None,
            1,
        )
    ],
    [],
)
```
E.g. tuple[*tuple[int, str]].
and fix a couple of bugs the tests revealed
Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is seriously impressive work and a very strong start! I don't have time to review everything right now, so far I've covered genericaliasobject.c and compile.c. I still have typing.py and friends left. I'm hopeless at the parser and grammar so I'll leave those to someone else.

Lib/test/test_pep646_syntax.py Outdated Show resolved Hide resolved
Objects/genericaliasobject.c Outdated Show resolved Hide resolved

if (gi->exhausted) {
gi->obj = NULL;
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like you could potentially be missing a decref here for gi->obj since I see it being increfed in gi_iter. If so,

Suggested change
return NULL;
Py_SETREF(gi->obj, NULL);
return NULL;

If I'm just shooting my mouth off, then just please ignore what I said :).

One way to test for refleaks is to run the tests with the refleak hunter (requires a debug build):

python_d.exe -m test test_typing -R 3:3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, super nice about the refleak hunter - I had no idea that was a thing!

And good catch about the decref too. One thing I'm unsure about is where we should do it, though. Reading https://docs.python.org/3/c-api/gcsupport.html, am I right in thinking that the iterator object gi counts as a container type, because it contains references to another object? If so, that same page says "Before fields which refer to other containers are invalidated, PyObject_GC_UnTrack() must be called". That would imply we should be setting gi->obj to NULL in the deallocator, in which case we'd also call the setref there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly poke :)

Objects/genericaliasobject.c Outdated Show resolved Hide resolved
Objects/genericaliasobject.c Outdated Show resolved Hide resolved
@@ -780,7 +787,7 @@ primary[expr_ty]:

slices[expr_ty]:
| a=slice !',' { a }
| a[asdl_expr_seq*]=','.slice+ [','] { _PyAST_Tuple(a, Load, EXTRA) }
| a[asdl_expr_seq*]=','.(slice | starred_expression)+ [','] { _PyAST_Tuple(a, Load, EXTRA) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to move starred_expression as another option of slice as we already have named_expression there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried this, but then I get SyntaxError: can't use starred expression here for e.g. a[*b].

I think what happens is, if we move starred_expression to slice (as | a=starred_expression { a }), then when we do a[*b], it triggers the first option of slices, where the starred_expression doesn't get processed by _PyAST_Tuple.

I can't see any way to make sure _PyAST_Tuple definitely gets called for starred_expression without doing something to slices, so I think the original attempt with (slice | starred_expression) is cleanest. Wdyt?

@@ -331,14 +333,19 @@ kwds[arg_ty]: '**' a=param_no_default { a }
param_no_default[arg_ty]:
| a=param ',' tc=TYPE_COMMENT? { _PyPegen_add_type_comment_to_arg(p, a, tc) }
| a=param tc=TYPE_COMMENT? &')' { _PyPegen_add_type_comment_to_arg(p, a, tc) }
param_no_default_star_annotation[arg_ty]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not join these to param_no_default? That way we can remove the extra rule in star_etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wouldn't that mess with other things that use param_no_default though? E.g. kwds uses param_no_default, so if we joined these to param_no_default, wouldn't we accidentally allow def foo(***args)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly poke :)

@mrahtz mrahtz marked this pull request as ready for review January 14, 2022 19:33
@mrahtz mrahtz marked this pull request as draft January 14, 2022 19:33
Based on Fidget-Spinner's feedback:
1. Remove the explicit flag for 'exhausted', instead using 'gi->obj == NULL' to tell whether the iterator is exhausted.
2. Construct `starred_tuple` using the `Py_GenericAlias` constructor instead of constructing it manually.
3. Fix reference counting.
4. Make `Py_GenericAliasIterType` static.
Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of things I noticed while working on the typing-extensions implementation.

I'm not going to be the one to merge this, but I think it might make sense to split this PR into separate PRs for the grammar changes and the typing.py changes, since they touch very different parts of the codebase and need review from different people.

class StarredTuple(_Final, _Immutable, _root=True):
"""Implementation of starred tuple for older versions of Python.

A starred tuple is e.g. tuple[*tuple[int, str]]. In newer versions of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this implementation is only used in new versions of Python, so why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, thanks for the reminder on this too - I was unsure about this.

The two things I had in mind were:

  • It was easier to just put everything in one PR so that when doing code review we could compare between the 'new' and the 'backwards compatible' implementations.
  • We probably want some of the 'backwards compatible' implementations even in modern typing.py for the sake of code (or maybe even pickle?) compatibility. Unpack is probably the most important one, so we can read backwards-compatible code written as class C(Generic[Unpack[Ts]]). I'm admittedly less sure about StarredTuple; I don't think there's ever a reason to instantiate it explicitly (and in fact maybe it should be private?). In any case, not sure what best practice is in cases like this - @gvanrossum can you comment?

Lib/typing.py Show resolved Hide resolved
def __class_getitem__(cls, item):
if isinstance(item, types.GenericAlias) and item.__origin__ is tuple:
# tuple[...]
return StarredTuple(item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this return an instance of Unpack instead so we don't need to add an additional class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely an option, but I'd prefer to keep Unpack as only implementing the operator, rather implementing both the operator and the wrapper class for unpacked items - in line with the principle of "Each component should only do one thing".

Also, I think Unpack could get really messy if we have to merge in the guts of both StarredTuple and UnpackedTypeVarTuple. And there are a bunch of places where we want to check for an unpacked TypeVarTuple but not an unpacked StarredTuple, so they would get more complicated too.

Maybe there's context I'm missing though. Is there a big downside (e.g. maintenance burden) to adding another class?

elif isinstance(item, TypeVarTuple):
return item._unpacked
else:
raise TypeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not raise an error here, so we don't hobble potential future extensions. For example, I anticipate that we may want to support Unpack[TypedDict] to type **kwargs in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) If the concern is future extensions, couldn't we just amend this function when that need arises in the future?

b) Since the result of unpacking depends on what's being unpacked, what would we do instead of raising an error here?

@mrahtz mrahtz marked this pull request as ready for review January 19, 2022 11:23
@mrahtz
Copy link
Contributor Author

mrahtz commented Jan 30, 2022

This PR has grown pretty big, and Pradeep agrees with Jelle that it might be better to split this up into separate parts - so I'm going to close this PR, and open a few new ones instead.

return self._name


class UnpackedTypeVarTuple(_Final, _Immutable, _root=True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradeep90 In the previous review, you mentioned you'd prefer to avoid this middleman class, instead for example making it so that starring a TypeVarTuple results in an instance of Unpack.

That example solution seems suboptimal to me for two reasons:

  1. In a few places in the rest of typing.py, we want to check whether an object is an unpacked TypeVarTuple, and instance(obj, UnpackedTypeVarTuple) seems more readable to me than isinstance(obj, Unpacked).
  2. It seems potentially confusing to have Unpack both be an operator and an object.

Another option would be something like:

class TypeVarTuple:

    def __init__(self, name, _starred=False, _id=None):
        self.name = name
        self._starred = starred
        self._id = id

    def __iter__(self):
        yield TypeVarTuple(self.name, _starred=True, _id=id(self))

    def __eq__(self, other):
        if self._id is not None and other._id is not None:
            return self._id == other._id
        else:
            return object.__eq__(self, other)  # Not sure whether this is right, but you get the gist

    def __repr__(self):
         return ('*' if self._starred else '') + self.name

I lean away from this solution too because I feel like manually managing IDs could lead to subtle bugs, and isinstance(obj, UnpackedTypeVarTuple) is still simpler than isinstance(obj, TypeVarTuple) and obj._starred, but WDYT?

@@ -823,6 +874,95 @@ def __init__(self, name, *constraints, bound=None,
self.__module__ = def_mod


class TypeVarTuple(_Final, _Immutable, _root=True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradeep90 You mentioned in the previous review you'd prefer to have this inherit from _TypeVarLike. I'm not sure I understood your reasoning - could you explain what you meant by "The flags are things that the typechecker has to deal with"?

Lib/test/test_pep646_syntax.py Outdated Show resolved Hide resolved
Objects/genericaliasobject.c Outdated Show resolved Hide resolved

if (gi->exhausted) {
gi->obj = NULL;
return NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, super nice about the refleak hunter - I had no idea that was a thing!

And good catch about the decref too. One thing I'm unsure about is where we should do it, though. Reading https://docs.python.org/3/c-api/gcsupport.html, am I right in thinking that the iterator object gi counts as a container type, because it contains references to another object? If so, that same page says "Before fields which refer to other containers are invalidated, PyObject_GC_UnTrack() must be called". That would imply we should be setting gi->obj to NULL in the deallocator, in which case we'd also call the setref there?

Objects/genericaliasobject.c Outdated Show resolved Hide resolved
Lib/test/test_genericalias.py Show resolved Hide resolved
Objects/genericaliasobject.c Outdated Show resolved Hide resolved

if (gi->exhausted) {
gi->obj = NULL;
return NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly poke :)

@@ -331,14 +333,19 @@ kwds[arg_ty]: '**' a=param_no_default { a }
param_no_default[arg_ty]:
| a=param ',' tc=TYPE_COMMENT? { _PyPegen_add_type_comment_to_arg(p, a, tc) }
| a=param tc=TYPE_COMMENT? &')' { _PyPegen_add_type_comment_to_arg(p, a, tc) }
param_no_default_star_annotation[arg_ty]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Friendly poke :)

@mrahtz
Copy link
Contributor Author

mrahtz commented Jan 30, 2022

@JelleZijlstra @pablogsal @Fidget-Spinner @pradeep90 Oh, gosh, I'm so sorry - I've just realised I'd misunderstood a critical part of GitHub's code review interface! I'd assumed all these comments had been posted weeks ago - I'd missed the part where you actually have to click the 'Submit code review' button to post them! Uff. I'll copy over these threads to the new PRs.

@pablogsal
Copy link
Member

@mrahtz I can review and help with the GC part. Feel free to add me as a reviewer in that PR

@mrahtz mrahtz mannequin mentioned this pull request Apr 11, 2022
@mrahtz mrahtz deleted the pep646 branch May 1, 2022 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants