-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dictionary union. (PEP 584) #80325
Comments
...as discussed in python-ideas. Semantically: d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3 Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new __add__/radd/iadd methods. |
I believe that Guido rejected this when it was proposed a few years ago. |
Python ideas discussion in 2015 : https://mail.python.org/pipermail/python-ideas/2015-February/031748.html |
I believe it was proposed and rejected multiple times. |
For the record, I'm opposed to the idea.
options.update(user_selections) That reads more like self explanatory English than:
The latter takes more effort to correctly parse and
|
In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative. But I have other arguments against it:
|
For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand. Also with sequences the + operator actually means "add to", but with dictionaries it means "add/or replace" which is contrary to the normal meaning of plus. I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.
This is a good point. |
Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.
The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".
That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)
Which is a different use case that is unlikely to go away with this proposal.
This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists. Let's wait for the PEP, I'd say. |
scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP-448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention. You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP-448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years. In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:
Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too. Basically, I think the PEP-448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly. The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation). |
I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread! |
Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP-448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2) A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best: "I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator. "Unless someone is seriously championing PEP-448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP-448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another." As it happens, PEP-448 was put in 3.5, and we got the one obvious way to do it. Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP-572: (d3 := d1.copy()).update(d2) I think I'll stick with d3 = {**d1, **d2} though. :-) |
Current python-ideas thread for the issue : https://mail.python.org/pipermail/python-ideas/2019-February/055509.html |
If we're going to forget about commutativity of +, should we also implement +/+= for sets? |
The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets. In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket. |
Is this issue directly or indirectly related to the PEP-584 "Add + and - operators to the built-in dict class"? |
Ah yes, it's written in the title of the PR. I add it to the bug title as well. |
Another obvious way to do it, but I'm +1 on it. A small side point however - PEP-584 reads:
...
The references cited does not back this assertion up. Perhaps the intent is to reference the "cool/weird hack" dict(d1, **d2) (see https://mail.python.org/pipermail/python-dev/2010-April/099485.html and https://mail.python.org/pipermail/python-dev/2010-April/099459.html), which allowed any hashable keys in Python 2 but only strings in Python 3. If I see {**d1, **d2}, my expectations are that this is the new generalized unpacking and I currently expect any keys to be allowed, and the PEP should be updated to accurately reflect this to prevent future misunderstandings. |
PEP-584 has been approved by the Steering Council (at my recommendation). We will shortly begin landing PRs related to this. |
While the main code has been merged now, I propose to keep this issue open until some other things have happened:
|
My current PR plans are:
I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated. That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed. Guido, okay to tag you on these for review? |
Yup, great plan. On Mon, Feb 24, 2020 at 22:29 Brandt Bucher <report@bugs.python.org> wrote:
-- |
Not sure if this is a big deal or not, and it seems likely that the preexisting behaviour of .update() and ** unpacking have already decided it, but is it intentional that you end up with the first-seen key and the last-seen value in the case of collisions? class C:
def __init__(self, *a): self.a = a
def __hash__(self): return hash(self.a[0])
def __eq__(self, o): return self.a[0] == o.a[0]
def __repr__(self): return f"C{self.a}" >>> c1 = C(1, 1); c1
C(1, 1)
>>> c2 = C(1, 2); c2
C(1, 2)
For set union we get the first seen value:
>>> {c1} | {c2}
{C(1, 1)}
For dict union we get the first seen key and the last seen value:
>>> {c1: 'a'} | {c2: 'b'}
{C(1, 1): 'b'}
But similarly for dict unpack (and .update(); code left as an exercise to the reader):
>>> {**{c1: 'a'}, **{c2: 'b'}}
{C(1, 1): 'b'} So the union of two dicts may contain .items() elements that were not in either of the inputs. Honestly, I've never noticed this before, as the only time I create equivalent objects with meaningfully-distinct identities is to use with sets. I just figured I'd try it out after seeing suggestions that the dict union operands were transposed from set union. |
As a somewhat simpler example: >>> f = {False: False}
>>> z = {0: 0}
>>> f | z
{False: 0}
>>> {**f, **z}
{False: 0}
>>> f.update(z); f
{False: 0} Though these hairier cases aren't explicitly addressed, the conflict behavior is covered in the Rationale and Reference Implementation sections of the PEP. All of the above examples share code ( I find it makes more sense if you see a set as valueless keys (rather than keyless values). |
OK, that makes sense, it works similar to ChainMap.copy(), which copies maps[0] and keeps links to the rest. So in particular Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work? def __or__(self, other):
new = self.copy()
new |= other # OR new.update(other) ???
return new
def __ior__(self, other):
self.update(other)
return self Note that there is no ChainMap.update() definition -- it relies on MutableMapping.update(). I guess we need a __ror__ as well, in case there's some other mapping that doesn't implement __or__: def __ror__(self, other):
new = other.copy()
new.update(self)
return new Note that this doesn't return a ChainMap but an instance of type(other). If other doesn't have a copy() method it'll fail. As a refinement, __or__ and __ror__ should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.) |
I didn't see your second reply, with I'm not so keen on that, because its special behavior can't be mimicked by |
Yeah, I was imagining something like that... I used the cast for brevity in my reply but that probably wasn't helpful. Note that for __or__, we probably want to check the type of the argument (for either dict or ChainMap, or maybe just Mapping), to keep it from working on an iterable of key-value pairs.
Agreed. Again, we can check for Mapping here to assure success for the copy() move.
Yup, see above. I think a check for Mapping should be fine. |
Just to clarify: If we decide to check isinstance(other, (ChainMap, dict)), '|' should probably be used. If we decide to check isinstance(other, Mapping), I think the copy/update methods should be used. |
def __or__(self, other):
return self.__class__(self.maps[0] | other, *self.maps[1:])
def __ror__(self, other):
return other | dict(self) def __or__(self, other):
return self.__class__(other, *self.maps)
def __ror__(self, other):
return self.__class__(*self.maps, other) There are problems with both variants, so I think it may be better to not add this operator to ChainMap. |
I think we're only seriously considering the first variant (although implemented slightly differently, see my last two messages). And __ror__ would probably change, returning the type of self. What are the "problems" with it, exactly? We seem to be in agreement that the update behavior is reasonable, even for ChainMaps. |
We already have somewhat different semantics of Let's stop the debate and put up a PR. |
Sounds good, I'll have these up soon. |
Still waiting for ChainMap -- what else? |
My brother will have a ChainMap PR up soon. I'm just finishing up MappingProxyType, myself. Probably both this weekend. Then I'll move on to OrderedDict, which looks like it could be tricky. I'll need to familiarize myself with the implementation better (unless there's somebody who is already familiar with it who wants to take over). It looks well-commented, though. I think we can pass on the http.cookies subclasses since there don't appear to be any experts/maintainers for that module. |
bpo-39857 just reminded me that we should update os._Environ as well (the type of os.environ and os.environb). I have another first-timer who will probably want to take it. |
Once this issue will be done, would you mind to update PEP-584? Currently, it only says "This PEP proposes adding merge (|) and update (|=) operators to the built-in dict class." It doesn't mention that other types like OrderedDict, MappingProxy or ChainMap are updated as well. |
Yes, I can add a section explaining that after the PEP was accepted, we decided to add the operators to several non-dict mappings as well. I'm also going to add some explanation as to why Mapping/MutableMapping didn't grow them, too. |
Three other MutableMappings we might want to update:
Shelf is up in the air, since it doesn't look like it defines a copy() equivalent... I also have no experience with it. Since it's a MutableMapping subclass, (not a dict subclass), we could in theory hold off on updating this until someone asks for it, without backward compatibility issues. I think the other two should be updated, though. I can coordinate PRs for them this week. |
I definitely think we should leave Shelf alone, it's a toy class from a different era. It makes sense to update the weak dicts; hopefully the | and |= operators can be implemented in terms of other, more primitive operations, so we will have assurance that these classes' essential behavior is preserved. |
And... that's it! Big thanks to everybody who had a part in making this happen. |
I'm guessing this can be closed? |
I guess we should keep this open until Raymond Hettinger has given feedback on #18832 (where we have the option of changing to Brandt's proposal from #18832 (comment)). |
First off, thanks for adding the feature, it's much appreciated. But it'd be great if you guys can enable list merge for the dict having list as its value, in current form I believe it's handling only "key: value" merge. for e.g.:
>>> d1 = {'spam': [1, 2, 3]}
>>> d2 = {'spam': [2, 3, 4]}
>>> d1 | d2
>>> {'spam': [1, 2, 3, 4]} |
Similar behavior was considered and ultimately rejected by the PEP as being too specialized: https://www.python.org/dev/peps/pep-0584/#concatenate-values-in-a-list What you're asking for it subtly different, and even *more* specialized than that. I'd recommend just subclassing dict and overriding the operator, as the PEP suggests. |
Taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this: >>> c = dict(a.items() | b.items())
This example demonstrates what happens when values are unhashable:
>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:
>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2} |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: