Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It doesn't make sense that cmp and hash are set independently, does it? #136

Closed
njsmith opened this issue Jan 18, 2017 · 22 comments
Closed
Labels
Milestone

Comments

@njsmith
Copy link

njsmith commented Jan 18, 2017

Right now, cmp= and hash= are treated as independent options, both in attr.s() and attr.ib(). This does't make much sense to me.

If you do attr.s(cmp=True, hash=False) then you get a broken class that violates Python's invariant that objects which compare equal must hash equal (if they are hashable at all).

If you do attr.s(cmp=False, hash=True) then you get a class with a really weird hash that has lots of collisions (Foo(x=1) and Foo(x=1) are different objects so they compare non-equal, but their hashes are the same.) Also, it is broken if the object is mutable, because the hash can change over time. ...and now that I think about it, objects that have cmp=True also have the same problem with mutability. Actually I guess all of my attrs classes are currently illegal, because the defaults are to generate illegal classes. I guess I should go fix that.

Anyway, I think the only plausible configurations are:

  • cmp=True, frozen=True, hash=True
  • cmp=True, frozen=True, hash undefined (attrs doesn't have a nice way to do this right now I think?)
  • cmp=False, hash=False (use object.__hash__)
  • cmp=False, hash undefined

And then for attr.ib, why are cmp and hash separate arguments? Wouldn't it make more sense to drop the hash argument, and just use the logic that if attr.ib(cmp=True) and we're generating a hash method at all, then we should include this attrib?

@hynek
Copy link
Member

hynek commented Jan 18, 2017

There’s exactly one reason: unhashable attributes. Sometimes you want an attribute contain a dictionary and still be comparable.

@hynek
Copy link
Member

hynek commented Jan 18, 2017

Oh and sometimes you want to implement __hash__ yourself.

@njsmith
Copy link
Author

njsmith commented Jan 18, 2017

Sure, the "contain a dictionary and still be comparable" case is cmp=True, hash raises an error, and the implement-__hash__-yourself case is cmp=False. Once __eq__ is defined, though, there are exactly two correct ways to implement __hash__: either it hashes exactly the same things that __eq__ cares about and those things are immutable, or it else raises an exception.

Currently there are 2^3 = 8 possible settings for cmp/hash/frozen when calling attr.s, but:

  • if frozen=False, then hash=True is just broken (I guess modulo some edge cases around classes where some attributes are immutable in practice, and those are the only ones that participate in comparison; because of missing Simple way to make individual attributes immutable? #133 then the only way to express this to attrs is by telling it that the whole class is mutable and then just carefully not touching the immutable subset)
  • if cmp=False, then hash=True is just broken
  • if cmp=True, then hash=False is just broken, unless the user also adds __hash__ = None to the body of their class

And when calling attr.ib, there's literally never any options about how to set hash, right? Either it doesn't matter (if you used attr.s(hash=False), or else it must be set to match cmp.

My main concern is that plain @attr.s by default generates broken classes, and the obvious fix of using @attr.s(cmp=False) doesn't fix it. So the minimal correct ways of calling @attr.s are either @attr.s(frozen=True) or @attr.s(cmp=False, hash=False), right?

I think in a perfect world the API I'd prefer is: attr.s takes two arguments, cmp= and frozen= (well and other ones that aren't relevant here, but not hash=). The defaults are frozen=True, and cmp=<whatever frozen is set to>. (Rationale: 99% of the time you want frozen == cmp, and immutable is a better default than mutable.) If cmp=True and frozen=True, then the default is to generate a __hash__. If cmp=True and frozen=False, then the default is to set __hash__ = None. If cmp=False, then the default is not to generate a __hash__ at all. In all cases if the user defined __hash__ then we leave it alone. (This covers the weird cases like if you to explicitly mark an immutable comparable object as being non-hashable, it's written:

@attr.s(cmp=True, frozen=True)
class Foo:
    __hash__ = None

)

Of course backwards-compatibility is a problem...

njsmith added a commit to python-trio/trio that referenced this issue Jan 19, 2017
@Tinche
Copy link
Member

Tinche commented Jan 19, 2017

Ugh, this is a complicated issue.

I think #3 is relevant here. Some of the ideas from #3 were having attr.value and attr.object as higher-level, more opinionated wrappers for attr.s that made sure the affected classes made sense according to certain internal rules. Maybe defining these could be used to get around backwards compatibility problems.

I agree we're maybe too trigger happy with defining hashes for classes. Making your class hashable is a complex thing. Basically the only reason to make your class hashable is to use it as a key in a dict, or to put it in a set, which introduces additional constraints, which makes this even more complex.

@attr.s(hash=False)
class A:
    pass

hash(A())
-9223372036575954421

This is probably a bug. (Interestingly, it will work properly and raise an exception if slots=True, because that will recreate the class.)

I think an easy case could be made for the following change: if the class is slots=False, cmp=True, hash=False and Cls.__hash__ is object.__hash__, set Cls.__hash__ to None (to make the class unhashable).

@hynek
Copy link
Member

hynek commented Feb 4, 2017

There’s a lot of issues raised here and honestly I can’t follow all of them. Just as a preamble/excuse: the reason you can set all these things is that attrs started out with a different scope. More of a class construction kit so the users was supposed to know what they did and invariants weren’t caught. The reality changed though (which makes me happy of course) and this dark little corner is just something nobody really cared about (which is also why I’m OK with being a bit more aggressive about fixing it).

What I understand:

  • If hash=False is set for the whole class, we shouldn’t just not set a hash, we should set __hash__ to None (instead of falling though to object’s). I agree this is a bug and I’d be willing allow this change. If anyone was relying on this behavior it's most likely a bug and it will explode loudly as it should. Maybe it’s necessary to allow the use to signal they want to inherit the __hash__ because they do (god forbid) subclass something else than object)? That makes me think that it would be best to use hash=False for current behavior and hash=None for the correct one (and document it properly).
  • We have a related issue for cmp=False because Foo(x=1) == Foo(x=1) is False but hash(Foo(x=1)) == hash(Foo(x=1)) is True. hash=None would take care of it.
  • I was coming up with examples for attr.ib(hash=False) but the answer is always “yep, but then they can also carry cmp=False”). :)
  • I never really thought about it, but yeah mutable objects shouldn’t have a __hash__ at all. → If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

What I don’t understand

What’s wrong with cmp=True, hash=None, frozen=False because I’d tend to make that the new default. I know y’all love immutable data, but let’s think of the 99,999% of developers out there that don’t want to slow down their software. :)

This is in line with what the Python docs say, and if people want to have it hashed, they have to put extra effort into it. But it’s still comparable.

What Should Be Done

  1. Add the case of hash=None and promote it.
  2. Maybe even make it default because it might be masking bugs and is currently in violation with the Python spec.
  3. Let’s assume hash, cmp, slots, and frozen are more of internal knobs. You list four plausible configurations, I think there’s a fifth. We should give them names and add them as high-level, opinionated templates. Which is basically clean division between "value" and "object" #3. It just makes me sad, that it breaks the naming. @attr.s should not become an implementation detail. Going that way is even more fodder for [RFC] Add attrs package #131.
  4. I’m not sure what to do about attr.ib(). With the new thinking, unhashable is the default. It seems weird to add something like hashable=True to all attributes. But maybe this is the lowest priority of them all.

I think 1+2 is something we should tackle before the new release.


In any case thank you for being patient with me. It took me two coffees, to write this down and honestly I don’t think anyone really thought this completely though.

I’d also welcome input from @glyph as usual :)

@njsmith
Copy link
Author

njsmith commented Feb 4, 2017

AFAICT the cases that make sense are:

  • mutable object, inherit __eq__ from superclass, inherit __hash__ from superclass (= the default behavior if defining a class without using attrs at all)
  • mutable object, attrs-generated __eq__, __hash__ = None
  • immutable object, inherit __eq__ from superclass, inherit __hash__ from superclass
  • immutable object, attrs-generated __eq__, attrs-generated __hash__

My proposal above was that we map these 4 cases to the 4 argument patterns, respectively:

  • frozen=False, cmp=False
  • frozen=False, cmp=True
  • frozen=True, cmp=False
  • frozen=True, cmp=True

So I don't see any use for a hash= argument at all really... it's always implied by the other arguments. And this nicely avoids the issue you mention about having a split between high-level "templates" and the actual attr.s arguments.

The one additional wrinkle is that in principle someone might want something really odd, like, "this is an immutable class with an attrs-defined __eq__, so it totally could be hashable, but for some reason I have decided that I never want to use it as a dictionary key, and I want to get an error if that happens". My suggestion to cover that case is to say that if the user explicitly defines a __hash__ in the class body, then that we always leave it alone. (So this would be written like:

@attr.s(frozen=True, cmp=True)
class Foo:
    __hash__ = None

)

Does this cover all of your 5 cases? I'm not quite clear on what they are, since the 5th one you added is already on my list of 4 :-).

And yeah of these four cases probably the best default is frozen=False, cmp=True.

@hynek
Copy link
Member

hynek commented Feb 4, 2017

My fifth is frozen=False, cmp=True, hash=None which isn’t in your original list. The use of the hash argument might be debatable but it’s here to stay so we have to take it into account when modeling the cases. :)

My understanding is the following:

hash=False doesn’t make sense, except when subclassing:

class Base:
    def __hash__(self):
        return something_something_very_smart_that_we_cannot_anticipate

@attr.s(hash=False)
class C(Base):
    ...

Main cases:

  • frozen=False, cmp=True, hash=None: normal Python class.
  • frozen=False, cmp=False, hash=None: immutable container that may contain mutable/unhashable attributes
  • frozen=True, cmp=True, hash=True: immutable container with only hashable contents

Valid but kind of pointless:

  • frozen=False, cmp=False, hash=None

After we’ve added hash=None (and made it default), the only deficiency seems to be that there should be a nicer/more idiomatic way to define immutable containers (→ #3). People can still manually break their classes but that’s their problem if they really want to, I’m not their nanny.

Am I missing something?


One final thought: in attrs, True/False for hash etc always meant “attrs won’t attach one” not “class won’t have one”. Which seems fine except for hash. Again, when attrs started, I had a different scope in mind…

@hynek hynek added the Bug label Feb 4, 2017
@hynek hynek added this to the 17.1 milestone Feb 4, 2017
@njsmith
Copy link
Author

njsmith commented Feb 4, 2017

hash=False doesn’t make sense, except when subclassing:

I use frozen=False, cmp=False, hash=False all the time, for when I want object-style compare-and-hash-by-identity but still want to take advantage of attrs' other conveniences. (I know this is kind of an abuse of attrs compared to how you were imagining it would be used, but those conveniences are pretty convenient! :-)) I just checked and in my current code-base I have 26 uses of @attr.s, and they're split exactly 50/50 between ones that use frozen=True, cmp=True, hash=True and ones that use frozen=False, cmp=False, hash=False.

but it’s here to stay so we have to take it into account when modeling the cases. :)

Why does it have to stay? The only case you've come up with where it's meaningful is hash=None, which would be a new feature anyway -- all the existing cases are either broken or redundant, no?

@hynek
Copy link
Member

hynek commented Feb 4, 2017

Why does it have to stay? The only case you've come up with where it's meaningful is hash=None, which would be a new feature anyway -- all the existing cases are either broken or redundant, no?

The subclassing use-case is still existent. I want to discourage it’s use but I’m not comfortable to take away the knob altogether.

The only thing you’re arguing against is hash=False, not hash itself, isn’t it? You said yourself, that you use it sometimes?

Sorry if I’m too dumb to follow. :|


Anyhow: can we agree, that adding hash=None, making it default, (and attending PyCon under a pseudonym) would fix the actual bug in attrs?

There’s two additional feature requests that should be handled in separate issues:

  1. better way to define a class, where hash=True makes sense.
  2. do something about attr.ib(cmp/hash).

Or is there something I’m missing?

@njsmith
Copy link
Author

njsmith commented Feb 4, 2017

The subclassing use-case is still existent. I want to discourage it’s use but I’m not comfortable to take away the knob altogether.

subclassing is cmp=False, hash=False in your proposal, and cmp=False in my proposal, no?

Anyhow: can we agree, that adding hash=None, making it default, (and attending PyCon under a pseudonym) would fix the actual bug in attrs?

Yes, this would make it so that attrs never generates classes that break Python's rules without being explicitly asked to, because it's always legal to have __hash__ = None. The reason I prefer my proposal is that mine does that and also avoids boilerplate in the common cases, and also removes a bunch of options that don't make any sense (e.g. cmp=False, hash=True).

Concretely, for the project I mentioned that has 26 calls to @attr.s:

Right now, I have:

  • 13 frozen=True
  • 13 cmp=False, hash=False

With the hash=None-by-default approach, I'd have to change it to:

  • 13 frozen=True, hash=True
  • 13 cmp=False, hash=False

With my proposal, it becomes:

  • 13 frozen=True
  • 13 cmp=False

I.e. on this codebase then the hash=None-by-default approach actually increases the necessary boilerplate, while my proposal removes it.

@hynek
Copy link
Member

hynek commented Feb 4, 2017

Your proposal is pretty much my point 1 from above.

To my maintainer eyes things look like this:

  1. attrs does something wrong by default. Nobody really cares about this stuff so I’m gonna take the risk and fix it to do the right thing.
  2. It’s annoying to create certain types of classes. I’m very much opposed to add implicit effects by setting one or another. But I understand where you‘re coming from (and @glyph basically complained about the same thing).

Now, I really want to tackle those things separately because they’re separate to me. One bug and one feature. I’ll open a PR later for 1 unless you convince me it’s wrong but let’s talk about 2 to give your mind some peace. :)


I’m thinking to talk about “archetypes” (code name?)?

How does this look to you:

from attr.archetypes import hashable

@hashable
class Item:
    """Implies frozen=True, cmp=True, hash=True."""

The intent would be apparent from the name, not by squinting at the combination of parameters and figuring out side-effects that would vary by attrs release.

I’m open to other ways to achieve that. Adding more arguments to @attr.s seams like the wrong way tho? It’s pretty crowded already.

@ambv
Copy link
Contributor

ambv commented Feb 5, 2017

Hynek, my 3 eurocents.

Please don't introduce a new concept with archetypes, keep it simple. Nathaniel is right that your great anti-boilerplate library should stay an anti-boilerplate library.

cmp=

I also agree with Nathaniel's idea to effectively deprecate use of hash=. It's really simple to reason about this:

  • cmp=False means inherit both __eq__ and __hash__
    • with frozen=False the class is mutable
    • with frozen=True the class is immutable
  • cmp=True means generate both __eq__ and __hash__
    • with frozen=False the class is mutable so the generated __hash__ is None
    • with frozen=True the class is immutable so it's safe to generate an actual __hash__ method

The new default would be cmp=True, frozen=False. Note that with cmp=False there is no implicit magic at all.

hash=

As for hash=, it would have three possible values:

  • hash=None - automatic consideration based on rules above
  • hash=False - don't ever generate a __hash__ regardless of the rules above; that covers your sub-classing case
  • hash=True - generates a DeprecationWarning, as it is either redundant or can lead to invalid classes.

I've been bitten by hashes defined for mutable classes many times (Thrift classes for Python have this problem and badly defined attrs-based classes, too).

@njsmith
Copy link
Author

njsmith commented Feb 5, 2017

Another thing that might be worth pointing out explicitly: in Python 3, it's actually a feature of the language that defining an __eq__ method implicitly blocks inheritance of __hash__. If you define __eq__ but don't mention __hash__, then it's like you wrote __hash__ = None:

class Foo:
    def __eq__(self, other):
        pass
assert Foo.__hash__ is None

You might expect @attr.s(cmp=True, hash=False) to be equivalent to defining-__eq__-but-not-defining-__hash__, but unfortunately this actually isn't true right now (I guess because the interpreter magic happens before the attrs magic):

@attr.s(cmp=True, hash=False)
class Foo:
    pass
assert Foo.__hash__ is object.__hash__

@glyph
Copy link
Contributor

glyph commented Feb 7, 2017

This issue is sprawling and huge and I'm not sure I have fully worked through all the implications, but at this point I have seen the PR and I think I'm mostly comfortable with the consequences...

@hynek
Copy link
Member

hynek commented Feb 7, 2017

Im at the point of begrudgingly admitting that ?ukasz might be right.

Let True/False be and None does the right thing.

hynek added a commit that referenced this issue Feb 12, 2017
Our previous default behavior was incorrect.  Adding __hash__ has to be
deliberate action and only makes sense for immutable classes.

By default, hash now mirrors eq which the correct behavior.

Fixes #136.
hynek added a commit that referenced this issue Feb 12, 2017
Our previous default behavior was incorrect.  Adding __hash__ has to be
deliberate action and only makes sense for immutable classes.

By default, `hash` now mirrors `cmp` which the correct behavior per spec.

Fixes #136.
@hynek
Copy link
Member

hynek commented Feb 20, 2017

Fixed in #142

@Julian
Copy link
Member

Julian commented May 17, 2017

This issue is a bit too long for me to follow too, but was there consideration for deprecating the current behavior and emitting a warning rather than just changing it?

There were valid uses here, despite the ones that were broken.

@njsmith
Copy link
Author

njsmith commented May 17, 2017

@Julian: I won't debate the release procedures because I didn't follow the details, but I'm curious what you mean by "valid use cases". Obviously it's unfortunate that builds got broken no matter whose "fault" it is, but do you mean you have examples of cases where hashing mutable classes is the Right Thing To Do?

@Julian
Copy link
Member

Julian commented May 17, 2017

@njsmith no -- they're immutable classes (for whatever that means in Python), so their hashing behavior was correct and working semantically, but now that the default has changed, all of them need to specify hash=True.

Example working code:

from pyrsistent import m
import attr


@attr.s
class F(object):
    oo = attr.ib()


@attr.s
class G(object):
    oo = attr.ib(default=m())


print hash(F(oo=G()))

and:

⊙  virtualenv --quiet v && v/bin/pip install --quiet 'attrs<17.1.0' pyrsistent && v/bin/python test.py; rm -rf v                                                  julian@yoga ●
2579775048696017352

~/Desktop 
⊙  virtualenv --quiet v && v/bin/pip install --quiet 'attrs' pyrsistent && v/bin/python test.py; rm -rf v                                                         julian@yoga ●
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    print hash(F(oo=G()))
TypeError: 'F' objects are unhashable

And yeah I care less about fault certainly :) than to just report "hey this change broke lots of our builds" in case it was unintentional or unconsidered and might be helpful. I know most of y'all, and that you've got pretty good intentions generally :)

I'm imagining that the thinking was something like that immutable is defined by using frozen=True, but we don't do that, and I've never even bothered reading what it does -- whatever it does, we don't need, our classes are immutable because they just are, and people don't mutate them.

@hynek
Copy link
Member

hynek commented May 17, 2017

I can only say that I thought long and hard about how to do this without breakage and found nothing. Doing a pro-forma deprecation (w/o warnings) might have saved a few but realistically speaking most breakage would happen anyway except that we'd have a buggy behavior for another year. I'm honestly sorry but nobody came up with anything better than ripping off the bandaid.

@glyph
Copy link
Contributor

glyph commented May 17, 2017

IMHO the thing to do "next time" (I mean, hopefully, for attrs, we never need to do this again) is to figure out a way to do the testing / fixing in downstream libs in advance, not to try to do something like this with a deprecation. The problem with a deprecation is that you don't want to have a long period where you have unnecessarily-wordy-and-explicit options be necessary folk wisdom.

@Julian
Copy link
Member

Julian commented May 18, 2017

@hynek totally understood (and appreciated as usual!).

Gonna follow up one more time just to make sure I understand here --

The way I would have expected this to go, given that there are both valid and invalid reasons for the default (which I want to double check is agreed upon by everyone?) would have been:

  • Make attr.s(cmp=True, hash=False) raise an Exception immediately -- there's no reason to do this that I can think of either, it's a broken class, and I'd treat it as a bug
  • attr.s(cmp=False, hash=True) is a tricky case -- for me the issue with this one which I think I've brought up before (back in characteristic days, but I assume attrs is the same) is that == and <= are conflated here -- personally I have lots of classes where I want to define equality but not partial order, so if I'd use that combination, it'd be to satisfy that case, and I'd define __eq__ and __ne__ myself. I don't remember if I have any cases like this already where I've done so, but I suspect I do somewhere
  • Make attr.s by itself warn with a DeprecationWarning (or some other warning if we care about Python's terrible defaults) within the defined __hash__ function, so that users with valid use cases have a chance to move -- this use case isn't broken, so with a warning, it'd have been possible to fix this before the next release say, without needing to rush and fix or rush and pin away from the new release

(I've probably missed a case or two, and probably I still am missing some important stuff given that I still haven't fully gone through this ticket, so apologies if I have)

@glyph I didn't fully follow your comment -- I know #191 affects twisted, but I and whoever else uses attrs have got lots of internal code which obviously wouldn't have been tested and fixed before a backwards incompatible release so those users need some form of notification right?

Anyways, yeah, again, deeeeefinitely not slinging any shade here, just trying to make sure I follow the thought process here. Thanks as usual all, in case it hasn't been said enough, all of you are great :).

Julian added a commit to Julian/txdatadog that referenced this issue May 18, 2017
Refs: python-attrs/attrs#136

* origin/attrs-17:
  hash=True everywhere
ricklupton added a commit to ricklupton/floweaver that referenced this issue Dec 6, 2017
They already should have been, as they're internally used as dictionary keys.
Changed in attrs v17.1.0 (see python-attrs/attrs#136)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants