Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for dataclasses #5010

Merged
merged 23 commits into from Jun 5, 2018
Merged

add support for dataclasses #5010

merged 23 commits into from Jun 5, 2018

Conversation

Bogdanp
Copy link
Contributor

@Bogdanp Bogdanp commented May 6, 2018

This is a work in progress for adding support for dataclasses as a plugin (heavily inspired by and reusing some parts of the attrs plugin). It's pretty close to complete, but I wanted to get some feedback before continuing as I'm new to the codebase. I'll take this over the finish line next weekend if it's deemed desirable, but here are my open questions:

  • Is the way I'm hooking the plugin in correct? fullname in get_class_decorator_hook doesn't seem to have the value I expect (i.e. the name is qualified to the module dataclass is imported in, so its value is some_package.some_module.dataclass rather than dataclasses.dataclass).
  • Should I be attempting to re-use more code between the two plugins? I only re-used the things that seemed to be fully generic between the two.

I tried it out on a real-world Python 3.6 project that makes heavy use of dataclasses and it seems to work great so far.

This would close #4792.

@ilevkivskyi ilevkivskyi self-assigned this May 7, 2018
mypy/plugin.py Outdated
auto_attribs_default=True
)
# TODO: It's unclear if fullname should be a fully qualified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this happens if the module is not in the typeshed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Should I add the module to the typeshed first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this solution. We can check later if this is fixed when the stub is in typeshed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an open typeshed PR to add a dataclasses stub: python/typeshed#1944. I found some issues in it, but the original author hasn't responded for a while. @Bogdanp if you're interested, you could take over that PR and I can help you get it merged into typeshed.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! The PR looks good, I have made a first round of review. Please let me know if you are still interested in working on this.

mypy/plugin.py Outdated
@@ -2,20 +2,19 @@

from abc import abstractmethod
from functools import partial
from typing import Callable, List, Tuple, Optional, NamedTuple, TypeVar, Dict
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple, TypeVar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do automatic sort of imports. This only increases the size of the diff and makes it harder to review.

if fullname in mypy.plugins.attrs.attr_class_makers:
return mypy.plugins.attrs.attr_class_maker_callback
elif fullname in mypy.plugins.attrs.attr_dataclass_makers:
from mypy.plugins import attrs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the local import? If there is an import cycle, you should try breaking it. At mypy we try hard to not introduce import cycles, because they complicate the (already complex) code logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There had already been a circular dependency that was being resolved by this line. When I extracted mypy.plugins.common the same thing continued to work fine for the test suite, but the dynamically-generated test to ensure import mypy.plugins.common could be imported failed. I think the circular dependency can be fixed by extracting ClassDefContext into a separate module (like mypy.plugin_context). Let me know if you'd like me to do that!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the circular dependency can be fixed by extracting ClassDefContext into a separate module (like mypy.plugin_context). Let me know if you'd like me to do that!

I think moving out all the interfaces to a separate file mypy.api is a more permanent solution. But it is a big refactoring, so it is better to do this in a separate PR.

from mypy.types import CallableType, Overloaded


def _get_decorator_bool_argument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I need to review these three functions or you just copied these from attrs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are the same functions from the attrs module, unchanged.

@@ -0,0 +1,27 @@
# Builtins stub used to support @dataclass tests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need a separate fixture? You can try tweaking an existing one.


def dataclass_class_maker_callback(ctx: ClassDefContext) -> None:
"""Hooks into the class typechecking process to add support for dataclasses.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that you added docstrings to all functions. We like the well documented code 👍


app = Application.parse('')

[builtins fixtures/list.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like several tests are missing. Could you please add at least these:

  • Incremental tests, these are tests where you update something in the code and check that (de-)serialisation works correctly (you can look at the initial and subsequent attrs PR to get some ideas).
  • Test all error messages that you add.
  • Test all special feature in dataclasses, such as ClassVar and InitVar and special methods (you can look for some examples in the PEP)
  • Check situations where user defines some dunder methods manually

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you didn't yet added incremental tests. They live in check-incremental.test (you can look for example in attrs PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some tests for generic dataclasses, in particular type inference (again, you can look for examples in the original attrs PR)

frozen: bool = ...) -> Callable[[_C], _C]: ...


def field(*,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that you added this to lib-stub. Be sure to make updates here, if python/typeshed#1944 gets updated.

@dataclass
class Person:
name: str
age: int = field(default=0, init=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add more combinations of different values for init and present/absent defaults.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you added these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now


reveal_type(Person) # E: Revealed type is 'def (name: builtins.str) -> __main__.Person'
john = Person('John')
john.age = 24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this below the next line to check the type is inferred from class, not from this assignment.

@dataclass
class Application:
name: str = 'Unnamed'
rating: int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a test that an error is given if a field without a default comes after the one with a default.

@ilevkivskyi
Copy link
Member

@Bogdanp are you still interested in working on this? If you don't have time or desire, then we can pick up the PR.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented May 20, 2018

@ilevkivskyi yup. I plan on working on it later today.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented May 20, 2018

@ilevkivskyi I believe I have addressed all the points you raised except for

  • the circular import (see this comment) and
  • adding support for InitVars.

I'm a bit out of my depth when it comes to adding support for InitVars. I could not find how to add a new generic alias type (which I think InitVar counts as). Help would be appreciated here or I would be happy to let someone more experienced implement that part.

@Bogdanp Bogdanp changed the title [wip] add basic support for dataclasses add support for dataclasses May 20, 2018
self._freeze(attributes)

info.metadata['dataclass'] = {
'attributes': {attr.name: attr.serialize() for attr in attributes},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should be an OrderedDict in order to preserve order in pre-3.6 Python versions. I think that's what's causing the test failure on 3.4 and 3.5 in Travis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's right! I'll make the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change made. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For the future though, it's generally easier on the reviewer to push new commits instead of amending your previous commit and force-pushing as you're doing now. That way, it's easier for the reviewer to see what changes were made in response to the review.

We squash the commits anyway before merging, so it's OK to have a lot of commits in your PR branch.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to work on this!

I have several more comments and suggestions. Fox the next time, please don't rebase: I need to see your full commit history, otherwise it makes reviewing hard (especially for large PRs like this one).

if fullname in mypy.plugins.attrs.attr_class_makers:
return mypy.plugins.attrs.attr_class_maker_callback
elif fullname in mypy.plugins.attrs.attr_dataclass_makers:
from mypy.plugins import attrs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the circular dependency can be fixed by extracting ClassDefContext into a separate module (like mypy.plugin_context). Let me know if you'd like me to do that!

I think moving out all the interfaces to a separate file mypy.api is a more permanent solution. But it is a big refactoring, so it is better to do this in a separate PR.

mypy/plugin.py Outdated
auto_attribs_default=True
)
# TODO: Drop the or clause once dataclasses lands in typeshed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can actually overtake the typeshed PR python/typeshed#1944, the original author seems to not have time to finish it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don't think I'll have the time to carry that through at the moment.

from mypy.types import CallableType, NoneTyp, Type, TypeVarDef, TypeVarType
from mypy.typevars import fill_typevars

#: The set of decorators that generate dataclasses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a colon after #?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Habit. :)

Sphinx picks these up as docstrings for the module vars, but I'll remove this one.

def __init__(self, ctx: ClassDefContext) -> None:
self._ctx = ctx

def transform(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docstring for this function.

args=[attr.to_argument(info) for attr in attributes if attr.is_in_init],
return_type=NoneTyp(),
)
for stmt in self._ctx.cls.defs.body:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this block? Add a comment about this.

class A:
a: int


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use at most one empty line in tests (even between tests).


reveal_type(SpecializedApplication) # E: Revealed type is 'def (id: Union[builtins.int, None], name: builtins.str, rating: builtins.int =) -> __main__.SpecializedApplication'

[builtins fixtures/list.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline is missing at the end of file.


app = Application.parse('')

[builtins fixtures/list.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you didn't yet added incremental tests. They live in check-incremental.test (you can look for example in attrs PR).


app = Application.parse('')

[builtins fixtures/list.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some tests for generic dataclasses, in particular type inference (again, you can look for examples in the original attrs PR)



@overload
def dataclass(_cls: _C,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to add InitVar (here and on typeshed). You can define it as a normal generic class:

class InitVar(Generic[_T]):
    pass

Then in the plugin you can test for it e.g. like this if isinstance(some_type, Instance) and some_type.info.fullname() == 'dataclasses.InitVar'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried that and couldn't get it to work, but I must've messed something up last time because it works just fine now.

@Bogdanp
Copy link
Contributor Author

Bogdanp commented May 27, 2018

@ilevkivskyi do you have any advice with regards to how to best remove InitVar attributes from a class? I've tried removing them from the SymbolTable, but that breaks inheritance. The best I could come up with was to remove them from the class' symbol table and add the types to the DataclassAttributes but then I wasn't sure how to best serialize/deserialize the types.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks almost ready, but there are few more comments.

Most importantly it looks like you still didn't add incremental tests. They live in check-incremental.test (you can look for examples in attrs tests). Please add at least these tests:

  • Basic functionality (creating boilerplate methods) still works after loading from cache
  • Base classes work after loading from cache (the base class and main class should be in different files).
  • Forward references work correctly. For example:
@dataclass
class C:
    x: 'Other'
class Other:
    ...

IIRC all these kinds of tests should be there for attrs.

'has_default': self.has_default,
'line': self.line,
'column': self.column,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best I could come up with was to remove them from the class' symbol table and add the types to the DataclassAttributes but then I wasn't sure how to best serialize/deserialize the types.

You can add list of InitVars here. Most types in mypy support (de-)serialization (those that are not supported like partial types, or forward reference types, should not appear in at the point where serialization happens). You can find signatures in mypy.types.Type.serialize and mypy.types.Type.deserialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried that, but deserialization failed with De-serialization failure: TypeInfo not fixed. Reading the docstring for FakeInfo, I understand this is supposed to be done in two passes, but it looks like the fixup step requires a lot of information I don't necessarily have/that it would be ugly to pass around. Am I missing something obvious?

info = self._ctx.cls.info
for attr in attributes:
try:
node = info.names[attr.name].node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will look cleaner if refactored via info.names.get(attr.name) which cannot raise.

setup.cfg Outdated
@@ -48,3 +48,6 @@ parallel = true

[coverage:report]
show_missing = true

[isort]
multi_line_output = 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want this addition? Can it be made in a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was accidental, sorry! I'll remove it.

try:
# parse_bool returns an optional bool, so we corece it
# to a bool here in order to appease the type checker.
is_in_init = bool(ctx.api.parse_bool(field_args['init']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be definitely cleaner if refactored via filed_args.get('init') that can't raise. (Also the comment/coercion will be unnecessary after refactoring.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something but I don't think that's true. field_args is a dict from str -> Expression so I'll still need the parse_bool call, which means I'll also need the bool coercion since parse_bool returns an Optional[bool]. I'll refactor to drop the try-catch though (although I do think this approach is cleaner)!


# Treat the assignment as an instance-level assignment
# even though it looks like a class-level assignment.
node.is_initialized_in_class = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a test that fails without this change? Or if you already added, could you please tell me the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that wasn't necessary after all so I removed it.

def problem(self) -> T:
return self.z # E: Incompatible return value type (got "List[T]", expected "T")

reveal_type(A) # E: Revealed type is 'def [T] (x: T`1, y: T`1, z: builtins.list[T`1]) -> __main__.A[T`1]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not very useful, I would rather called it with incompatible args to see an error, e.g. A('no', 'way', [1, 2]).

reveal_type(a) # E: Revealed type is '__main__.A[builtins.int*]'
reveal_type(a.x) # E: Revealed type is 'builtins.int*'
reveal_type(a.y) # E: Revealed type is 'builtins.int*'
reveal_type(a.z) # E: Revealed type is 'builtins.list[builtins.int*]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a test that a.foo() returns an int.


[builtins fixtures/list.pyi]

[case testDataclassGenerics]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding some tests!

@Bogdanp
Copy link
Contributor Author

Bogdanp commented May 28, 2018

Thanks for the review @ilevkivskyi. I believe I've made all the requested changes apart from InitVar support which I'll try to tackle again towards the end of the week.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to tackle again towards the end of the week.

We are close to the end of the week, so here is another (hopefully last) round of review.

Note that we will still need to add function plugin hooks to make asdict, astuple, replace etc. more precise (they can't be typed precisely in typeshed) in a separate PR. Are you interested in working on the second PR?

mypy/plugin.py Outdated
auto_attribs_default=True
)
# TODO: Drop the or clause once dataclasses lands in typeshed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typeshed PR is landed. Is this still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it still is, not sure why, but after updating the typeshed submodule to the current master, if I remove the or clause and run mypy against my project, the issue still happens.

'has_default': self.has_default,
'line': self.line,
'column': self.column,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something obvious?

No, you are right, this way it will not work. Here is another idea about InitVars that I think should work. You can just store names of InitVars in a special filed in .metadata, then later you can find the types in the type of __init__ method, since in mypy both arg types and and arg names are stored. This is a bit hacky, but should be robust and is much simpler than manually traversing types in fixup phase.

frozen: bool = ...) -> Callable[[_C], _C]: ...


def field(*,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like current definition in typeshed is a bit different, could you please sync this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition diverges from the one in typeshed in that it returns _T rather than Field[_T]. Returning Field[_T] breaks things like:

@dataclass
class Foo:
  x: int = field(default=42)

Because int and Field[int] are incompatible.


[builtins fixtures/list.pyi]
[out1]
[out2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is not testing anything. If all files are unchanged like here, then mypy will at most load caches, but will not make any checks. What you should do is to put first part of the test in another module (e.g. [file b.py]). And then on the second run (i.e. in [file b.py.2]) just make a cosmetic change, like comment somewhere, so that mypy rechecks b.py while using cache for (unchanged) a.py.


[builtins fixtures/list.pyi]
[out1]
main:7: error: Revealed type is 'def (x: builtins.int) -> __main__.B'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any error (including a reveal) prevents writing the cache, so you should use reveal_type only on second run.


[case testIncrementalDataclassesDunder]
from a import A
reveal_type(A) # E: Revealed type is 'def (a: builtins.int) -> a.A'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, please don't add errors on the first run.


[builtins fixtures/list.pyi]
[out1]
[out2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this test case, simple and to the point.


[builtins fixtures/list.pyi]
[out1]
main:2: error: Argument 2 to "B" has incompatible type "str"; expected "int"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this one too. Although it has an error on first run, we should keep it, just to be sure, the error is cleared when error is fixed.


[builtins fixtures/list.pyi]
[out1]
[out2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would introduce a simple change in one of a or b on the second run.

A(b=B(42))
A(b=42) # E: Argument "b" to "A" has incompatible type "int"; expected "B"

[builtins fixtures/list.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at the end of file. We really should make this a lint error :-)

@Bogdanp
Copy link
Contributor Author

Bogdanp commented Jun 4, 2018

Note that we will still need to add function plugin hooks to make asdict, astuple, replace etc. more precise (they can't be typed precisely in typeshed) in a separate PR. Are you interested in working on the second PR?

I don't think I'll have the free time to tackle that. In fact, I won't have time to work on this PR either in the next two or three weeks. Feel free to take over if you feel there are any other changes that need to be made.

@ilevkivskyi ilevkivskyi mentioned this pull request Jun 5, 2018
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I added some polish, this is now ready. I will merge once tests pass. Thanks for all your work!

I will continue working on this in subsequent PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for dataclasses
4 participants