@dataclass defaults #82939

anthony · 2019-11-09T20:29:38Z

BPO	38758
Nosy	@gvanrossum, @rhettinger, @ericvsmith, @vedgar

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-11-14.06:29:44.726>
created_at = <Date 2019-11-09.20:29:38.368>
labels = ['interpreter-core', 'invalid', 'type-bug', '3.7']
title = '@dataclass defaults'
updated_at = <Date 2019-11-14.06:29:44.722>
user = 'https://bugs.python.org/anthony'

bugs.python.org fields:

activity = <Date 2019-11-14.06:29:44.722>
actor = 'gvanrossum'
assignee = 'none'
closed = True
closed_date = <Date 2019-11-14.06:29:44.726>
closer = 'gvanrossum'
components = ['Interpreter Core']
creation = <Date 2019-11-09.20:29:38.368>
creator = 'anthony'
dependencies = []
files = []
hgrepos = []
issue_num = 38758
keywords = []
message_count = 11.0
messages = ['356306', '356545', '356552', '356553', '356554', '356555', '356562', '356572', '356577', '356578', '356579']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'rhettinger', 'eric.smith', 'veky', 'anthony']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38758'
versions = ['Python 3.7']

anthony · 2019-11-09T20:29:38Z

Given one of the motivations of @DataClass is to reduce boilerplate code then, in the context of @DataClass,

x: list = [] should be equal to x: list = field(default_factory=lambda: [])

The example in PEP-557 is not reasonable. It should be:

class D:
    def __init__(self, x=[]):
        self.x = x

That x = None works (without specifying a default factory, and is different from plain "x") makes the whole "factory" argument even more bizarre. Why would a None Type work, but a List Type not?

I think either the behavior of this should be different or the docs should at address this more clearly.

vedgar · 2019-11-13T18:11:11Z

Have you read ericvsmith/dataclasses#3?

anthony · 2019-11-13T21:04:39Z

Thanks for adding it, I read it now.

And sorry to back track a moment - I love the idea of @DataClass and I can
only imagine how must work it was to implement as I am only a beginner.

I'm looking at this primarily from the narrow view point of a user - not so
much understanding the backdrop of how it can actually work.
But I still think this is worth considering as I think this is common
pattern based on this logic:

When using the default construction method it seems reasonable to pass a
default, such as a List.
Because
def __init__(x=[]):
self.x = x
Is the cleanest way I can think of to write that.
In this new context, if x: list = [] is the cleanest way to write it.
Therefore it should do that, however it is actually implemented.

Let's imagine for a moment that @DataClass becomes the default way of
constructing classes in python.
Do we really want such a common case to require such verbose language?

Looking at Guido's first comment on this, I think that detection mechanism
is what I would expect to happen. To illustrate verbosely:
https://gist.github.com/swirlingsand/d59d01ef79c5ee93f430ed324199bc65
I clearly misunderstood the way classes get passed by default.
I continue to believe that defining it in the argument should be equal to
defining it after. (if there are no other context items present).

From a beginners perspective here, It would appear that in a sense the
default init is actually "silently failing" in that the expected (because
of the two instances not being equal to each other as shown in L10 above)
isolation is not happening.
In a sense then, @DataClass is turning that silent "failure" into a
ValueError which is actually better.

What about something like this:
https://gist.github.com/swirlingsand/2494bc482902fada2248698f7b8af61e
If the common case is for it to be None, then this doesn't fire so there's
no cost.
If it's a common default then it handles it as expected.
If it's not found it raises a ValueError like before, so there's no loss or
harm.
A handful of defaults here may cover 80% of cases and the 20% can continue
to be handled by ValueError
In a project with 40 classes with 10 defaults per class that's 400 lines of
code that look like:
[]
instead of
field(default_factory=lambda: [])
(or {} etc.)

Issue #3 has many comments around copying, but that's not my concern, I'm
just talking about the defaults where the attribute is not provided at all
(ie is None).

I did the above example in regular python since I don't know enough about
how @DataClass is implemented, but it seems reasonable that if it can work
in normal python there should be a way to say augment operation() on line
21 with field()

On Wed, Nov 13, 2019 at 10:11 AM Vedran Čačić <report@bugs.python.org>
wrote:

Vedran Čačić <vedgar@gmail.com> added the comment:

Have you read ericvsmith/dataclasses#3?

----------
nosy: +veky

Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue38758\>

anthony · 2019-11-13T21:13:31Z

To clarify,
A major assumption I'm making is that the default is empty, or the
"copying" is handled as some separately coupled concept.
A motivation here is wanting to do operations like .append() that depend on
the object existing.

On Wed, Nov 13, 2019 at 1:04 PM Anthony Sarkis <anthonysarkis@gmail.com>
wrote:

Thanks for adding it, I read it now.

And sorry to back track a moment - I love the idea of @DataClass and I can
only imagine how must work it was to implement as I am only a beginner.

I'm looking at this primarily from the narrow view point of a user - not
so much understanding the backdrop of how it can actually work.
But I still think this is worth considering as I think this is common
pattern based on this logic:

When using the default construction method it seems reasonable to pass a
default, such as a List.
Because
def __init__(x=[]):
self.x = x
Is the cleanest way I can think of to write that.
In this new context, if x: list = [] is the cleanest way to write it.
Therefore it should do that, however it is actually implemented.

Let's imagine for a moment that @DataClass becomes the default way of
constructing classes in python.
Do we really want such a common case to require such verbose language?

Looking at Guido's first comment on this, I think that detection mechanism
is what I would expect to happen. To illustrate verbosely:
https://gist.github.com/swirlingsand/d59d01ef79c5ee93f430ed324199bc65
I clearly misunderstood the way classes get passed by default.
I continue to believe that defining it in the argument should be equal to
defining it after. (if there are no other context items present).

From a beginners perspective here, It would appear that in a sense the
default init is actually "silently failing" in that the expected (because
of the two instances not being equal to each other as shown in L10 above)
isolation is not happening.
In a sense then, @DataClass is turning that silent "failure" into a
ValueError which is actually better.

What about something like this:
https://gist.github.com/swirlingsand/2494bc482902fada2248698f7b8af61e
If the common case is for it to be None, then this doesn't fire so there's
no cost.
If it's a common default then it handles it as expected.
If it's not found it raises a ValueError like before, so there's no loss
or harm.
A handful of defaults here may cover 80% of cases and the 20% can continue
to be handled by ValueError
In a project with 40 classes with 10 defaults per class that's 400 lines
of code that look like:
[]
instead of
field(default_factory=lambda: [])
(or {} etc.)

Issue #3 has many comments around copying, but that's not my concern, I'm
just talking about the defaults where the attribute is not provided at all
(ie is None).

I did the above example in regular python since I don't know enough about
how @DataClass is implemented, but it seems reasonable that if it can work
in normal python there should be a way to say augment operation() on line
21 with field()

On Wed, Nov 13, 2019 at 10:11 AM Vedran Čačić <report@bugs.python.org>
wrote:

>
> Vedran Čačić <vedgar@gmail.com> added the comment:
>
> Have you read ericvsmith/dataclasses#3?
>
> ----------
> nosy: +veky
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue38758\>
> _______________________________________
>

ericvsmith · 2019-11-13T21:14:33Z

The problem is that what you wrote isn't what most people want. Here's your example without dataclasses. I've added an "append_to_x" method, which does the obvious thing:

>>> class C:
...   def __init__(self, x=[]):
...      self.x = x
...  
...   def append_to_x(self, val):
...     self.x.append(val)
... 

Now create two objects, and inspect their "x" properties:
>>> a = C()
>>> b = C()
>>> a.x
[]
>>> b.x
[]

So far so good. Now append something to "a.x":
>>> a.append_to_x(10)
>>> a.x
[10]

And notice that "b.x" changes, too:
>>> b.x
[10]

So the naive behavior isn't what you want. dataclasses is trying to prevent you from doing this.

You should look at "mutable defaults", perhaps starting here (from a random Google search): https://blog.florimond.dev/python-mutable-defaults-are-the-source-of-all-evil

anthony · 2019-11-13T21:21:14Z

Hey Eric, I think our emails crossed in the wind, please see my comment that includes (as a sub component) a similar idea to what's proposed in the article you linked.

ericvsmith · 2019-11-13T22:19:07Z

I'm not sure what you're proposing.

vedgar · 2019-11-14T04:24:51Z

It seems to me that what you're missing is that "class declarations" are still perfectly normal executable statements (in most other superficially similar programming languages, they are not). So, when you say

class A:
    b = []

it is actually executed, a new empty list is made, and named A.b. If you then construct a = A(), then a.b is that same object. It must be, you never made any other list in the process. So if you really want a.b to be a _different_ empty list, you have to make it at some point. The most obvious way to do it is just to copy the A.b --- that's why people usually talk about copying.

Your approach is different: it seems to me that you say, if A.b is a list, then make a new empty list, if it is a set, then make a new empty set, and if it is a dict, then make a new empty dict. Not to mention that it feels very weird when having e.g.

class A:
    b = [4]

(-:, it doesn't take into account any other types. Which leads to another your problem, the one of perspective. Lists, sets and dicts are not that "common case" as you seem to think. Yes, they are in the beginners' code -- and that's why current dataclass implementation flags them as errors, since it's quite possible that people who write such things don't actually understand how Python is executed. But specialcasing them to actually do something useful would be the wrong thing to do, since it would incentivize people who should know better, back into using those simple types. I think it is shown in the discussion mentioned.

rhettinger · 2019-11-14T05:12:08Z

I concur with Vedran and recommend this proposal be rejected.

During training and code review, we spend a lot of time trying to get people to not write code like this. By making dataclasses behave differently than the rest of language, we would would lose the clear, bright line we have now. IOW, this would be bug bait.

anthony · 2019-11-14T06:19:11Z

Vedran thank you for the detailed comments.

I want to separate the idea side and implementation:
The idea is having defaults expressed in a way of least surprise and in the least amount of code.

Specifically that
[1]
makes more sense then
field(default_factory=lambda: [1])

or more generally that
default
makes more sense then
field(default_factory=lambda: default)

I agree there's a lot lacking in my example starting point implementation.
Let's not let what's lacking in that implementation distract from the spirit of the idea.

The scope of that example is meat to illustrate creating a default that is of least surprising, specifically that a new instance of a class is isolated.
But I think the point of whether or not the default is isolated feels like a distraction.

If a motivation for using type annotations is to restrict to reduce errors. To me, "prototyping" defaults by having them declared as soon as possible in the class is a further restriction. I believe that's a reasonable coding practice, and actually a good thing.

To me this is the current situation, in order to setup a class using @DataClass a user must either:

a) Have a relatively in depth understanding of python. I think proofs of that include both PEP having to have a section justifying that. And that there was such discussion in the issue.
A point of higher level languages is to nicely abstract lower level APIs, if I have to learn the lower level API in order to use the language it reduces the benefit, and that's really what the points made sound like.

b) Calling two "special" / additional functions and adding a (relatively) large amount of code. I can't think of anything else in python where the default method requires knowing two separate additional functions! Python is all about short, semantically meaningful expressions yet this feels like the total opposite!

If setting up defaults is a common point of error or "bad code" then why not setup the standard library in such a way that it does it right?
ie that a user can specify the default they want and the library sets up the default in the right way.
If that's as a separate object, or a mutable object, either way. The point being that setting up a default should be straightforward.

This is my first attempt at a contribution, even if all it is is highlighting a problem and the "solution" I'm suggesting may be the wrong one. But so far all of comments feel like they are defending a cascade of "not ideal" situations. Let's try actually looking at the root from the user's perspective and how can we improve this!

gvanrossum · 2019-11-14T06:29:45Z

This should not have been an issue but a discussion on python-list or perhaps python-ideas.

anthony mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 9, 2019

gvanrossum closed this as completed Nov 14, 2019

gvanrossum added the invalid label Nov 14, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

@dataclass defaults #82939

@dataclass defaults #82939

anthony mannequin commented Nov 9, 2019

anthony mannequin commented Nov 9, 2019

vedgar mannequin commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

ericvsmith commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

ericvsmith commented Nov 13, 2019

vedgar mannequin commented Nov 14, 2019

rhettinger commented Nov 14, 2019

anthony mannequin commented Nov 14, 2019

gvanrossum commented Nov 14, 2019

@dataclass defaults #82939

@dataclass defaults #82939

Comments

anthony mannequin commented Nov 9, 2019

anthony mannequin commented Nov 9, 2019

vedgar mannequin commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

ericvsmith commented Nov 13, 2019

anthony mannequin commented Nov 13, 2019

ericvsmith commented Nov 13, 2019

vedgar mannequin commented Nov 14, 2019

rhettinger commented Nov 14, 2019

anthony mannequin commented Nov 14, 2019

gvanrossum commented Nov 14, 2019