Copying mutable defaults #3
I should note that attrs (and others) use a wrapper around a factory function to solve this:
|
How do you generate the If the former, maybe

    def __init__(self, x=None):
        if x is None:
            self.x = []
        else:
            self.x = x

You'd have to special-case at least list, dict and set. I'd like to avoid actually importing |
I'm generating the code as text and then compiling it.
currently produces something like:
That is, if there's a default value, copy whatever is passed in. I could change this to use a sentinel value so I can detect if they supplied something (don't copy) or are using the default (do the copy).
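A minimal sketch of the sentinel idea, assuming a hypothetical module-level marker `_MISSING` and a hand-written `__init__` standing in for the generated one (neither name comes from dataclass.py). The default is copied only when the caller did not supply a value:

```python
import copy

# Hypothetical sentinel object; not a name taken from dataclass.py.
_MISSING = object()

class C:
    _default_x = [1, 2]          # stands in for the default declared in the class body

    def __init__(self, x=_MISSING):
        if x is _MISSING:
            # Caller relied on the default: give this instance its own copy.
            self.x = copy.copy(C._default_x)
        else:
            # Caller supplied a value: store it as-is, without copying.
            self.x = x

a = C()
b = C()
a.x.append(3)            # leaves b.x and the class-level default untouched

shared = [10]
c = C(shared)            # c.x is the very same list object as `shared`
```

With this shape, sharing a mutable value between instances remains possible, but only by passing it explicitly.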
My concern with not using
Having a I'm thinking I should also add type annotations to the generated |
Would it be too confusing to special case exact dicts, sets, and lists to call default.copy(); and for all other cases provide the I suspect this would mean that 95% of cases would just work, but would it be a trap for the cases that don't work? |
That might work, or it might not. We'll have to ask more opinions. |
Sometimes people might actually want the mutable default, so that we can support both

    @dataclass
    class C:
        x: int
        y: list = []

and something like

    @dataclass
    class C:
        x: int
        y: List[int] = make(list)

or in some other way make the decision explicit. An important question here is what should be the default behavior. |
The default should be safe -- it's a common bug that people make by mistake. (And actually I'm not even getting the use case for wanting a mutable instance variable that's shared between instances except when overridden as an explicit constructor argument.) |
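To make the bug concrete, here is a plain-Python sketch (no decorator involved) of what happens when a mutable value is used directly as an `__init__` default. The class is only an illustration:

```python
class C:
    def __init__(self, y=[]):    # the default list is created once, when `def` runs
        self.y = y

a = C()
b = C()
a.y.append(1)                    # mutates the single list shared by a and b
```

After this runs, `a.y` and `b.y` are the same object, so the item appended through `a` is also visible through `b`.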
The version of dataclass.py tagged with 'v0' implements the behavior of calling default.copy() if the default is list, set, or dict. That lets you write code like: from dataclass import dataclass, field
Which I think satisfies the "default should be safe", at least for by far the most common cases. I think if we want to override this, we could add a "non-copy factory" sort of function, like:
We could also add a factory function marker to force a new instance for types we're not aware of. That is, something other than set, list, or dict:
I'm thinking we could also add a way to specify a custom copy function, but I haven't thought it through yet. So I guess the question is: if the common cases are safe and we have ways of dealing with non-common cases, does that make describing this too complicated, or too hard to reason about? |
I agree that the default should be safe. Although, on second thought, I have very rarely seen people who want a mutable default. Maybe we need to support only:

    @auto
    class C:
        x: int
        y: list = []

that will provide a safe behavior, but also allow manually override |
I'd launch the PEP without support for mutable defaults. Let people argue for them if they really need them. |
Agreed. |
I read this to mean: don't treat mutable default specially. If a default is mutable, then it's effectively shared among instances. But that seems to fail the "default should be safe" argument, so maybe that's not what you mean. There's no way to detect if a default value is mutable, so unless we always copy, we're effectively supporting mutable defaults. |
I think Guido means exactly the opposite - we should not have any special support for creating mutable defaults. |
I'm not sure what that means. I think no special support means we get this behavior:
Did you have something else in mind? |
Yes, sorry, I was not very clear. The idea is to always copy, and by "no special support for mutable default" I meant that it will not be possible to create a "shared" mutable default with the always-copy strategy (as I mentioned in #3 (comment), some people may want this, and as Guido proposed, if someone actually asks for this, then this can be allowed by e.g. manually overriding |
I definitely do not want that behavior. I think @dataclass ought to special-case at least list, dict and set defaults, and add copying code for these to the generated |
Are you worried that the special case rules will be hard to remember, either when writing new code or reading existing code? Safety's a concern and I agree that @ericvsmith's example above is a natural way to write it, but it's also important that someone reading existing dataclass code will be able to immediately and unambiguously reason about the semantics. Maybe the answer is that the (I suppose you could also specify an |
But only list, dict and set have .copy() methods -- what about int, float, str etc.? I think I meant for the rule to be simple: we always use copy.copy() except when we can prove that that always returns the original object (as a slight optimization and perhaps to avoid having to import the copy module). The special-casing already exists in copy.copy() and copy._copy_dispatch, so we won't have to list it again. |
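A short check of the behavior this relies on, using only the public `copy.copy()` function: in CPython, the built-in immutable types come back as the original object, while a list yields a new, equal object. The variable names are illustrative.

```python
import copy

n = 12345
s = "text"
t = (1, 2)
lst = [1, 2]
nested = [[1]]

same_int = copy.copy(n) is n          # immutable: the original object is returned
same_str = copy.copy(s) is s
same_tuple = copy.copy(t) is t

lst_copy = copy.copy(lst)             # mutable: a new list with equal contents
nested_copy = copy.copy(nested)       # shallow: the inner list is still shared
```

Note that the copy is shallow, so containers nested inside a default would still be shared between instances.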
Right, I meant specifically that because int, float, str, etc. didn't have a That seems reasonable, if the emphasis in the docs is that |
How about something like this for the generated code (but less verbose):
So the logic is: "if you're using the default value, and it's not a known builtin immutable type, we'll copy it the best way we know how". Which from an external viewpoint is: "if you're using the default value, we'll copy it". Which means if you really wanted to share state, you could do:
Then Or, using the default not share state, with the same class definition:
I'm not sure the check for specific types calling i.copy() instead of copy.copy(i) is worthwhile. This can also be optimized some: for example, we know that if |
@ericvsmith Yes, this looks right. |
I would hope that the tests for type(i) could be done at "compile time" (i.e. in the class decorator) since the type is declared. I don't find the "if i is default_i" test very nice, maybe we can use None for the default in

    class C:
        i: int = 42
        j: list = [1, 2]

the generated

    def __init__(self, i=42, j=None):
        self.i = i
        if j is None:
            self.j = [1, 2]
        else:
            self.j = j
|
Agreed on the "compile time" aspect. That's what I meant when I said we could optimize it. Agreed on the default value tests, and changing the Given:
Then the
As you show, we could probably do something smarter with the defaults for types we know are immutable like int, str, etc. |
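One way to move the type check to decoration time, sketched under assumptions: a hypothetical helper `make_init` builds the `__init__` source as text (as described earlier in the thread) and emits a `copy.copy()` call only for defaults whose type is not in a hand-picked tuple of immutable built-ins. None of these names come from dataclass.py.

```python
import copy

# Assumed set of types whose instances never need copying.
_IMMUTABLE = (int, float, complex, bool, str, bytes, tuple, frozenset, type(None))

def make_init(defaults):
    """Return (source, function) for an __init__ built from {name: default}."""
    params = ", ".join(f"{name}=_MISSING" for name in defaults)
    lines = [f"def __init__(self, {params}):"]
    for name, default in defaults.items():
        # The copy-or-not decision is made once, here, not on every call.
        if type(default) in _IMMUTABLE:
            fallback = f"_defaults[{name!r}]"
        else:
            fallback = f"_copy(_defaults[{name!r}])"
        lines.append(f"    self.{name} = {fallback} if {name} is _MISSING else {name}")
    source = "\n".join(lines)
    # Names the generated code refers to; _MISSING marks "argument not supplied".
    namespace = {"_MISSING": object(), "_defaults": defaults, "_copy": copy.copy}
    exec(source, namespace)
    return source, namespace["__init__"]

source, init = make_init({"i": 42, "j": [1, 2]})
C = type("C", (), {"__init__": init})

a, b = C(), C()
a.j.append(3)            # b.j keeps the original default contents
```

Printing `source` shows that only the line for `j` contains a copy call; the line for `i` assigns the default directly.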
You could check if the dispatch table for the indicated type maps to "immutable" to tell whether a type can be treated as such. |
Removed unused Factory class. The way we're handling mutable defaults doesn't require it (see #3).
This special-casing seems like adding complexity to our code and potentially confusing to the users. IME we need a way to define factories anyway so there’s that. I agree that a core API of the stdlib should be as unsurprising as possible so my ideal special case (in accordance with explicit > implicit) would be to raise a helpful error, if e.g. In the best case that might educate the user on the perils of mutable default arguments. |
What special-casing? I think the documentation should emphasize that the default specified in the class will be used to initialize the instance variable via The only true semantic special case is default values produced by |
While I don’t have a conceptual complaint about that, it does sound really slow? Is good performance a goal at all? Not being slower than handwritten code is one of the tenets of attrs so this is a bit of a paradigm shift for me I’ll have to get used to. |
This seems like it could go very badly wrong with examples like this:

    @dataclass
    class ObjectRelationalMappedRow:
        database_connection: DatabaseConnection = global_database_connection

i.e. there are many cases where a shared, mutable default is what is intended, and |
My comment from before was under the impression that there will also be a way to define factories. But the more I think about it, the more I agree with glyph that it’s not a good idea and a landmine. The average Python programmer does not have a full grasp of the implications of [redacted, Tin’s examples are better] is so nice and explicit, I strongly believe that’s the correct route here. |
The shallow copying approach smells to me. That's a lot of contortions to get
to work verbatim. What about:
? The dicts will be shared, but I don't think this is what's implied at first glance. And as @glyph mentioned, the situation with arbitrary user defined classes is even worse. The current
|
Here's a possible alternative. We don't ever copy default values, but we do raise an exception when the default value's top-level type is list, dict or set. To create a value with a default that's an empty list, you have to write something like

    x: list = field(default_factory=list)

The error message (as well as the documentation) should guide the user in this direction. The docs should also show that to create a non-empty list you'd have to use a lambda, e.g.

    x: list = field(default_factory=lambda: [0, 1, 2])

I'm not particular about the exact idiom used for

    x: list = field(default=Factory(list, [0, 1, 2]))  # Assumes Factory is imported

(One thing I don't want is

If we adopt this proposal, I think we should not do anything about other types of mutable values (@glyph's example of a global db connection should just work) nor about e.g. a tuple containing a list. (Users are still expected to learn that defaults are shared between instances.) An alternative solution for Glyph's example would be for the db connection class to define a

    connection: DbConnection = field(default_factory=lambda: global_db_connection)

All in all I'm not sure I like this better than the shallow-copy-everything proposal, but at least it's easy to explain the first approximation (without the exceptions for list/dict/set) and guides users around the most common mistakes (surely an empty list or dict as a default must be a very common beginner's mistake). I'm also not sure that it is more important to support Glyph's global-db-connection example than it is to have a concise notation for defaults with type list/dict/set -- it's hard to come up with a spec that does the right thing for both without using a factory for at least one of them. I don't worry about the performance, the code generator can omit copy calls for common immutable types and we can hint at that in the docs. |
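For reference, the `dataclasses` module that later shipped in the standard library (Python 3.7+) behaves along these lines. The sketch below uses that module rather than the prototype discussed in this thread, and the class names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class C:
    x: list = field(default_factory=list)                # fresh empty list per instance
    y: list = field(default_factory=lambda: [0, 1, 2])   # fresh non-empty list per instance

a, b = C(), C()
a.x.append(1)            # does not affect b.x

# A bare list default is rejected when the class is created.
try:
    @dataclass
    class Bad:
        x: list = []
    error = None
except ValueError as exc:
    error = exc          # the message points the user at default_factory
```

The factory is called once per instance, so no copying is needed and non-list mutable defaults (such as a shared connection object) are left alone.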
I am inclined more towards the factory idiom, it does not actually look too verbose:

    class Example:
        shared_names: List[str] = []
        keys: List[int] = field(factory=list)

However I am not sure that we need to raise an exception for the first field above. Potentially, we can still allow plain lists, sets, dicts for class variables:

    shared: ClassVar[List[str]] = []  # This can be OK (if we do not include this in __init__)

however this will require importing |
You need to import typing anyways to write |
As someone who thought about this stuff before, I prefer having Would it be technically possible to allow for |
It is possible but I think it may create a wrong impression of creating something else than a normal field. I like the syntax proposed by Guido more:

    class C:
        x: int = field(default=0, repr=False)
        y: List[int] = field(factory=list)
|
@ilevkivskyi That actually doesn't look too bad to me either. As a beginner, I understand that factories are things that create things, so |
Why? default=Factory(list) looks uglier than default_factory=list (it has more punctuation) and requires an extra import. Is your worry that people might provide both default= and default_factory=? That's easily rejected with a friendly error message. Or are there use cases that I'm missing? |
Yes, and I prefer consistency but this is absolutely nothing I would waste anyone’s energy with. I’m fine with either version (although something more inline like |
On Jun 06, 2017, at 08:41 AM, Guido van Rossum wrote:
Why? default=Factory(list) looks uglier than default_factory=list
Can we please call that `factory` instead of `default_factory`?
|
I suppose, though then it would lack the connection with this being how the default value is constructed. I don't feel strongly about it, I agree that default_factory is a lot to type. |
It sounds like we're moving away from doing something automatically, and instead using a factory. Do we want to force you to specify a
? I do like the idea of raising an exception if using a list, dict, or set as the default value. I think it should be possible to make a factory that works around this, although I haven't thought it through. We could always relax the restriction later, in the unlikely event that people are clamoring for mutable defaults. |
This sounds reasonable. Although I still like my original proposal of using |
Here's an example where copy by default does something surprising. I ran into this when testing default values:
This fails because c.y is a copy of sentinel, not sentinel itself. |
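The failure can be reproduced without any dataclass machinery. A sketch, assuming the generated `__init__` effectively applies `copy.copy()` to the default before storing it:

```python
import copy

sentinel = object()

# What copy-by-default would store on the instance.
y = copy.copy(sentinel)

# An identity check against the sentinel no longer succeeds.
is_same = y is sentinel
```

A plain `object()` is not one of the types `copy.copy()` treats as immutable, so a new object is created and any `is sentinel` test on the attribute breaks.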
My goal is to get this resolved during the core sprint so I can make some progress on other dataclass issues. I think the options we've been discussing for non-ClassVar mutable defaults are:
Any others? |
I would go for one that doesn't preclude picking one of the others later.
For that reason (2) wins in my book. I'd just prohibit the three common
mutable types (list, dict, set) and their subclasses.
|
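A minimal sketch of option (2) as described here, assuming a hypothetical `check_default` helper that the decorator would call for each field. Using `isinstance` covers subclasses of the three types as well:

```python
from collections import OrderedDict

def check_default(name, default):
    """Reject list, dict and set defaults (and subclasses); hypothetical helper."""
    if isinstance(default, (list, dict, set)):
        raise ValueError(
            f"mutable default {type(default).__name__} for field {name!r} "
            "is not allowed"
        )
    return default

check_default("x", 0)                 # accepted
check_default("t", (1, 2))            # tuples are not rejected

rejected = []
for bad in ([], {}, set(), OrderedDict()):
    try:
        check_default("y", bad)
    except ValueError:
        rejected.append(type(bad).__name__)
```

Because nothing else is special-cased, relaxing the check later (to copy, or to allow the value) would not break code that was accepted under this rule.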
Option (2) seems reasonable as Guido formulated it, as I understand this will be accompanied by a way to make a factory like

    @data
    class C:
        bad_attr: List[int] = []  # Error
        attr: List[int] = field(factory=list)  # OK

and a clear error message like |
Yes, exactly. I don't currently have the factory option implemented, so the initial checkin will just disallow the first line above. I'll make sure the factory option works at a later time. |
disallow default values of this type (since they're mutable and a source of bugs). In the future, we might do something like allow them, or automatically copy them.
Guido and I discussed this yesterday, and we decided we'd just copy.copy() the default values when creating a new instance.

The __init__ code I'm currently generating for:

Looks something like:

But I don't think this is what we really want. I don't think we want to call copy.copy() if passed in an unrelated list, like C(x=mylist). Maybe __init__ should check and only call copy.copy() if x is C.x?

So:

?

(I haven't checked that this actually works as written, but the idea is to only copy the argument if it's the same object as the default that was assigned in the class creation statement.)