New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

17.3.0 cmp/default/hash/WeakKeyDictionary interaction #289

Closed
glyph opened this Issue Nov 9, 2017 · 14 comments

Comments

Projects
None yet
4 participants
@glyph
Contributor

glyph commented Nov 9, 2017

On 17.3.0, an @attr.s(hash=False) class which has an attr.ib with a @default that looks itself up in a WeakKeyDictionary into which it has previously been placed by an earlier @default raises an AttributeError on construction, unless @attr.s(cmp=False) is also set.

This worked properly on 17.2.0.

import attr
import weakref
bugs = weakref.WeakKeyDictionary()

@attr.s(hash=False)
class Bug(object):
    a = attr.ib()
    @a.default
    def make_a(self):
        bugs[self] = 1
        return 1

    b = attr.ib()
    @b.default
    def make_b(self):
        bugs[self]
        return 2

Bug()
@glyph

This comment has been minimized.

Contributor

glyph commented Nov 9, 2017

Traceback:

Traceback (most recent call last):
  File "attrbug.py", line 19, in <module>
    Bug()
  File "<attrs generated init 658a71b6629a7d170a675afc1ad532e94ad99934>", line 9, in __init__
  File "attrbug.py", line 16, in make_b
    bugs[self]
  File ".../weakref.py", line 394, in __getitem__
    return self.data[ref(key)]
  File ".../site-packages/attr/_make.py", line 720, in eq
    return attrs_to_tuple(self) == attrs_to_tuple(other)
  File ".../site-packages/attr/_make.py", line 713, in attrs_to_tuple
    return _attrs_to_tuple(obj, attrs)
  File ".../site-packages/attr/_make.py", line 679, in _attrs_to_tuple
    return tuple(getattr(obj, a.name) for a in attrs)
  File ".../site-packages/attr/_make.py", line 679, in <genexpr>
    return tuple(getattr(obj, a.name) for a in attrs)
AttributeError: 'Bug' object has no attribute 'b'
@glyph

This comment has been minimized.

Contributor

glyph commented Nov 9, 2017

I think the proper solution to this is to eliminate the nonsense combination of cmp=True, hash=False. An identity-hashable object must be identity-comparable, since dict, WeakKeyDictionary et. al. will use the __eq__ protocol to have a very specifically defined meaning in this context.

One possible solution is to make cmp=None the default.

@markrwilliams

This comment has been minimized.

markrwilliams commented Nov 9, 2017

The bad commit courtesy of git bisect: 8b490b8

@wsanchez

This comment has been minimized.

wsanchez commented Nov 9, 2017

relating #170

@hynek

This comment has been minimized.

Member

hynek commented Nov 10, 2017

I just woke up but looking at this it seems like this unearthed former buggy behavior? Wasn’t it comparing the attribute definition before?

@hynek

This comment has been minimized.

Member

hynek commented Nov 10, 2017

relating #170

How? All I see is eq?

I think the proper solution to this is to eliminate the nonsense combination of cmp=True, hash=False.

I don’t quite follow, isn’t that the only way how you can compare mutable objects by their values?

I’m rather surprised that Weakref uses eq, but I’ll have to look at it later when I’m at a computer.

@hynek

This comment has been minimized.

Member

hynek commented Nov 10, 2017

So it seems like the answer to the question what’s happening is buried somewhere in C code and as far as I can see, this is absolutely not a regression in attrs, but instead #253 did what it was supposed to do: oust subtle bugs.

What @glyph’s code was doing is adding a half-initialized instance into a weakref.WeakKeyDictionary() and then accessing it before it’s fully initialized. That’s apparently a bug by weakref.WeakKeyDictionary() semantics and it worked by accident, because previously it compared the Attribute() definitions of b. The correct approach would have been to add the finished instance in __attrs_post_init__.


This is btw why I’ve fought so long and so hard against passing half-initialized instances around. I was bound to get my morning ruined like this eventually.


If you’d like to follow along, this is equivalent to the reproducer above:

>>> import weakref
... d = weakref.WeakKeyDictionary()
... class C:
...     __hash__ = object.__hash__
...     def __eq__(self, other):
...         return self.x == other.x and self.y == other.y
...     def __init__(self):
...         d[self] = 1
...         self.x = 1
...         d[self]
...         self.y = 2

>>> C()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in __init__
  File "/Users/hynek/.virtualenvs/attrs/lib/python3.6/weakref.py", line 394, in __getitem__
    return self.data[ref(key)]
  File "<stdin>", line 6, in __eq__
AttributeError: 'C' object has no attribute 'y'
'C' object has no attribute 'y'

Now the question is: why does return self.data[ref(key)] ever call __eq__?

What I believe it comes down is some internal stuff in weaker.ref()’s callback argument, because this works:

>>> class C:
...     __hash__ = object.__hash__
...     def __eq__(self, other):
...         raise Exception("WTF")

>>> d = {}

>>> wr1 = weakref.ref(c)

>>> wr2 = weakref.ref(c)

>>> d[wr1] = 1

>>> d[wr2]
1

but this doesn’t:

>>> d = {}

>>> wr1 = weakref.ref(c, lambda *a, **kw: None)

>>> wr2 = weakref.ref(c, lambda *a, **kw: None)

>>> d[wr1] = 1

>>> d[wr2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __eq__
Exception: WTF
WTF

(it’s enough for one of the refs to have a callback).

If someone feels like diving into the C implementation of ref(), be my guest. :) But I figure my responsibilities end here.


All that to say: what do you want me to do about it? Add a stern warning about half-initialized instances and that they may behave weirdly?

@hynek hynek added Documentation and removed Bug Regression labels Nov 11, 2017

@glyph

This comment has been minimized.

Contributor

glyph commented Nov 12, 2017

Upon further consideration, you're right; the code was previously buggy, and this makes it a little easier to tell.

However, hash=False, cmp=True is a nonsense state that attrs should have prevented in the first place, and that state is still possible (and in fact the default if you ask to hash by identity without realizing this nuance); this just makes it blow up in a really weird edge case.

@glyph glyph changed the title from 17.3.0 cmp/default/hash/WeakKeyDictionary interaction regression to 17.3.0 cmp/default/hash/WeakKeyDictionary interaction Nov 12, 2017

@glyph

This comment has been minimized.

Contributor

glyph commented Nov 12, 2017

I think that when I filed this issue originally I was under the misguided impression that hash=False used to imply cmp=False if cmp was unspecified; if that had changed, it would indeed have been a regression. (But by the time I had written my second comment I'd realized that was not the case.)

@glyph

This comment has been minimized.

Contributor

glyph commented Nov 12, 2017

(Also: maybe Klein should stop trying to hash instances with a Klein attribute on them and should try to find some other way to squirrel away its state?)

@hynek

This comment has been minimized.

Member

hynek commented Nov 13, 2017

I still can’t wrap my head around the fact that it’s supposed to be a mistake to make an object hashable by ID but comparable by value – to me that’s like comparing pointers/references vs comparing values.

If I knew what unpractical tire fire this is when I started writing attrs, hash and cmp would have been off by default. It’s so upsetting someone ever thought this might be a good idea and I bet 99.999% of Python programmers don’t realize this.

I think I’ll have to write a blog about all of this to order my thoughts and look forward to the reactions.

@glyph

This comment has been minimized.

Contributor

glyph commented Nov 13, 2017

I still can’t wrap my head around the fact that it’s supposed to be a mistake to make an object hashable by ID but comparable by value – to me that’s like comparing pointers/references vs comparing values.

I'm happy to explain why __eq__ is part of the hash interface - but I'm not sure if that's what you're saying you don't understand here :)

@glyph

This comment has been minimized.

Contributor

glyph commented Nov 20, 2017

@hynek

This comment has been minimized.

Member

hynek commented Nov 21, 2017

The question is what next. I want to move to semantic enums medium term but even then people will probably be confused. As an immediate solution I guess I could rewrite the post for attrs and make it the first narrative doc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment