Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError #68601

Closed
canjo mannequin opened this issue Jun 8, 2015 · 5 comments
Assignees
Labels
3.9 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@canjo
Copy link
Mannequin

canjo mannequin commented Jun 8, 2015

BPO 24413
Nosy @rhettinger, @serhiy-storchaka, @MojoVampire

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2019-08-22.16:39:25.892>
created_at = <Date 2015-06-08.21:04:25.582>
labels = ['type-bug', '3.9']
title = 'Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError'
updated_at = <Date 2019-08-22.16:39:25.891>
user = 'https://bugs.python.org/canjo'

bugs.python.org fields:

activity = <Date 2019-08-22.16:39:25.891>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2019-08-22.16:39:25.892>
closer = 'rhettinger'
components = []
creation = <Date 2015-06-08.21:04:25.582>
creator = 'canjo'
dependencies = []
files = []
hgrepos = []
issue_num = 24413
keywords = []
message_count = 5.0
messages = ['245044', '245046', '349152', '349195', '350212']
nosy_count = 4.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'josh.r', 'canjo']
pr_nums = []
priority = 'low'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24413'
versions = ['Python 3.9']

@canjo canjo mannequin added the type-bug An unexpected behavior, bug, or error label Jun 8, 2015
@rhettinger
Copy link
Contributor

The dictviews_or() function in Objects/dictobject.c is converting the keys to a set and calling the set.update() method with the given argument. The set.update() method doesn't return NotImplemented because it has no reflected operation.

It looks like dictviews_or() should instead call set.__or__() to allow it a chance to return NotImplemented. The other dictview set operations are similarly afflicted.

@rhettinger
Copy link
Contributor

One other thought: Returning NotImplemented may be an easy change but it isn't clear that it would provide a sensible capability for mapping views. The non-iterable *other* argument would need to have a __ror__() method than could do something useful with a KeysView or ItemsView argument. That seems like an improbable use case (which is why I presume this has never come up before).

@serhiy-storchaka
Copy link
Member

set.update() accepts arbitrary iterable, but set.__or__() requires a set or frozenset.

@MojoVampire
Copy link
Mannequin

MojoVampire mannequin commented Aug 7, 2019

To be clear, set().__or__(x) returns NotImplemented, it doesn't raise NotImplementedError. I've edited the title to match.

One major problem that gets in the way of a fix is that the interface of set and dict views doesn't match, because dict views only provide the interface of collections.abc.Set, and collections.abc.Set *only* requires the operator overloads, not the named equivalent methods. And because the named equivalents don't exist, the operator overloads were made more liberal, to accept any iterable, not just set or set-like things.

set's rules are:

  1. Named methods take varargs, and accept any iterable(s)
  2. Operator overloads take only one argument, which must be a set/frozenset

dict view's rules are:

  1. There are no named methods (aside from isdisjoint, which has no operator equivalent)
  2. Operator overloads take only one argument, which can be any iterable

So fixing this is problematic so long as we want to keep a simple implementation; right now, all the methods simplify to:

  1. Convert to set
  2. Call named, in-place method on resulting set to get liberal behavior

Fixing this would involve either:

  1. Rewriting all the dict views operator overloads to do the work themselves, including the logic to reject non-iterables by returning NotImplemented like set's operators do.

    Pro: Not a hack. Resulting code would be more efficient in many cases (e.g. dict view's and could make a new empty set [currently makes set with a copy of the view's values], and iterate the right hand side while performing membership tests in the underlying dict before inserting into the result set, avoiding the need to make a large set that will almost certainly shrink, possibly requiring rehashes)

    Con: Lots of new code to maintain. Would not benefit from any algorithmic improvements to equivalent set code without mirroring them manually.

  2. Checking for TypeError from the set method and explicitly returning NotImplemented.

    Pro: Easy, code remains maintainable, benefits from any future optimizations to set

    Cons: Can't distinguish fundamental problems that should be TypeErrors (right hand side contains unhashable objects) from simple case that should return NotImplemented (right hand side isn't iterable). Potentially fixable by explicitly performing the conversion to iterator on the dict view's side (though you'd only do it to confirm it's possible; you'd want to pass the original object unmodified for the common case where it's a set/frozenset and could be processed more efficiently). Also makes failure case even slower than it already is (exception is raised, then ignored, then NotImplemented is returned, then right-hand side is checked, and odds are, the right-hand side won't know what to do either, so it'll return NotImplemented anyway, and a TypeError gets thrown after more work).

  3. Refactor the set API to expose (private, C level) helpers for the operator overloads (specifically, the *in-place* versions like __ior__, since we're always making a new true set to call it on anyway, and it seems silly to use __or__ which would make yet another set, and force us to throw away the first set we made) that are less strict about their arguments.

    Pros: Roughly the same as Rename README to README.rst and enhance formatting #2, plus a minor per-call (not per item) performance boost: C helpers could be called directly, avoiding overhead of calling Python level methods via _PyObject_CallMethodIdOneArg(result, &PyId_update, other); invoking dict lookups and the like when we know we just made a true set object, so there is no possibility of update being overridden, so a direct C level call would save time. Continues to benefit from any optimizations to set's code (since 95% of the code remains the same).

    Con: More code churn than Rename README to README.rst and enhance formatting #2; unfortunately, unlike list (which has type loose in-place operator overloads), set's in-place operator overloads are still type strict; alist += {1,2,3} works, but aset |= [1,2,3] does not, so we can't just call set's ixor unmodified, we'd need a _PySet_IXorIterable separate from the existing set_ior, where both of them might be implemented in terms of set_update_internal, but _PySet_IXorIterable only checks that the argument is iterable to return NotImplemented.

#1 and #3 seem to be the only fully correct options; #1 could squeeze out more performance, but requires a *lot* more new code, #3 requires a lot less new code, and gets a tiny performance boost.

I don't see explicit calls to or being all that common though, and I have a hard time envisioning a class that would hit this case without explicitly calling or; for {}.keys() | x the only time you'd see a behavioral difference from set() | x is when:

  1. x is not iterable, and
  2. x defines a __ror__ that can do something meaningful with a set

I'd personally assume anything that meets criterion #2 would be iterable; anyone have a counter-example? If this is only a theoretical problem, might it make sense to just leave it as is?

@MojoVampire MojoVampire mannequin changed the title Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) raises NotImplementedError, but {}.keys().__or__(x) raises TypeError Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError Aug 7, 2019
@rhettinger rhettinger added the 3.9 only security fixes label Aug 7, 2019
@rhettinger rhettinger self-assigned this Aug 7, 2019
@rhettinger
Copy link
Contributor

If this is only a theoretical problem, might it make sense
to just leave it as is?

I'm fine with that. In the last four years, I'm the only one who ever noticed, so it doesn't appear to be a problem in practice.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants