subclasses of BaseModel can be hashable #1303

Closed
seansfkelley opened this issue Mar 11, 2020 · 7 comments

@seansfkelley

Feature Request

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.4
            pydantic compiled: False
                 install path: /Users/username/Library/Caches/pypoetry/virtualenvs/virtualenvname/lib/python3.6/site-packages/pydantic
               python version: 3.6.4 (default, Mar  1 2018, 18:36:50)  [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
                     platform: Darwin-18.7.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

I use Pydantic extensively in place of dataclasses throughout my projects. It would be nice to be able to use some of the simpler types as dict keys, or to put into sets:

import pydantic

class Foo(pydantic.BaseModel):
  foo: str = "foo"

d = { Foo(): "bar" }

I tried writing a superclass/mixin to selectively add this behavior to existing models:

class HashableMixin:
    def __hash__(self):
        return hash(
            (type(self),) + tuple(getattr(self, f) for f in self.__fields__.keys())
        )

though this particular implementation has a problem: it doesn't work unless it comes first in the list of inherited classes. I think this has something to do with Pydantic's initialization and maybe metaclasses, but I didn't dig too deep. So I wrote it as a decorator instead:

def hashable(cls):
    def h(self):
        return hash(
            (type(self),) + tuple(getattr(self, f) for f in self.__fields__.keys())
        )
    setattr(cls, "__hash__", h)
    return cls

which seems to work more or less alright, though I haven't really run it through its paces so I don't know if I've missed anything.
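A possible explanation for the mixin ordering problem, sketched with plain Python classes rather than Pydantic: a class that defines `__eq__` without `__hash__` gets `__hash__ = None`, and that `None` wins if the class precedes the mixin in the MRO. `Base` here is a hypothetical stand-in; that `BaseModel` behaves the same way is an assumption, not something confirmed in this thread.

```python
class Base:
    """Hypothetical stand-in for a base class that defines __eq__ but not __hash__."""

    def __init__(self, foo="foo"):
        self.foo = foo

    def __eq__(self, other):
        return type(self) is type(other) and self.__dict__ == other.__dict__


class HashableMixin:
    def __hash__(self):
        return hash((type(self),) + tuple(self.__dict__.values()))


class MixinFirst(HashableMixin, Base):
    pass


class MixinLast(Base, HashableMixin):
    pass


# Defining __eq__ alone makes Python set __hash__ to None on Base.
base_hash_attr = Base.__hash__

# HashableMixin.__hash__ is found first in the MRO, so this works.
d = {MixinFirst(): "bar"}

try:
    hash(MixinLast())
    mixin_last_hashable = True
except TypeError:
    # Base.__hash__ (None) precedes the mixin in the MRO.
    mixin_last_hashable = False
```

Assigning `__hash__` directly on the final class, as the decorator does, sidesteps the lookup order entirely.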

Anyway, it would be great to have this baked in, even if it were default off. Maybe with something on Config?

seansfkelley changed the title from "Support hashable types" to "subclasses of BaseModel can be hashable" on Mar 11, 2020
@samuelcolvin
Member

I would create your own MyBaseModel and use that in place of BaseModel to accomplish this.

from pydantic import BaseModel

class MyBaseModel(BaseModel):
    def __hash__(self):
        return hash((type(self),) + tuple(self.__dict__.values()))

class Foo(MyBaseModel):
    foo: str = 'foo'

f = Foo()
d = {f: 'bar'}
print(d)

Anyway, it would be great to have this baked in, even if it were default off. Maybe with something on Config?

I would argue that this is:

  1. Very easy to achieve, as demonstrated above
  2. Not that common a requirement
  3. Often would require custom implementations for the tradeoff of performance vs. completeness (e.g. accepting more complex field values, non-hashable fields, sub models, __fields_set__)

You'd basically need to implement an entire hashable subset of python e.g. for lists, dicts, custom types etc.

So let's stick with the above of people implementing their own solution for now.

@seansfkelley
Author

Yeah, I understand the desire to not bloat the API surface area. I was just missing this feature from dataclasses.

FWIW I don't think you'd need to implement a hashable subset of the standard library: I don't consider (or want) models with lists to be hashable, which is a nice side effect of the implementation above that just forwards to the tuple hash function.
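The side effect described here can be sketched without Pydantic: hashing a tuple delegates to each element, so a single list-valued field makes the whole object unhashable. `Model` below is a hypothetical plain-Python stand-in, not a Pydantic class.

```python
class Model:
    """Hypothetical stand-in for a model; not a pydantic class."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __hash__(self):
        return hash((type(self),) + tuple(self.__dict__.values()))


simple = Model(foo="foo", n=1)
with_list = Model(foo="foo", tags=["a", "b"])

# Fine: every field value is itself hashable.
simple_hash = hash(simple)

try:
    hash(with_list)
    list_model_hashable = True
except TypeError:
    # The tuple hash calls hash() on each element, and lists are unhashable.
    list_model_hashable = False
```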

Anyway, for anybody who decides to do this on their own, note that you should also look into the "immutability" flag that Pydantic offers. I'm only trying to keep myself from shooting myself in the foot, so it doesn't need to be watertight (and indeed, it's a pain to try to make it so when subclasses are involved), but it's good hygiene to require the flag to be set.
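To illustrate why immutability matters for hashable objects, here is a minimal sketch using a plain mutable class (`Point` is hypothetical and not a Pydantic model). Once a key is mutated after insertion, the dict entry can no longer be reached through either its old or its new value.

```python
class Point:
    """Hypothetical mutable class with a value-based hash."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return type(self) is type(other) and self.x == other.x

    def __hash__(self):
        return hash((type(self), self.x))


p = Point(1)
d = {p: "bar"}
hash_before = hash(p)

p.x = 2  # mutating a key after insertion changes its hash
hash_after = hash(p)

# The entry was stored under the old hash, so a fresh key equal to the
# mutated one looks in the wrong place...
found_by_new_value = Point(2) in d
# ...and the old value no longer compares equal to the stored, mutated key.
found_by_old_value = Point(1) in d
```

The entry is still counted in `len(d)`; it is just unreachable by value, which is the kind of bug a frozen model rules out.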

@Congee

Congee commented Jun 3, 2020

  1. Often would require custom implementations for the tradeoff of performance vs. completeness (e.g. accepting more complex field values, non-hashable fields, sub models, __fields_set__)

This makes a lot of sense. It's just counterintuitive to me, because I treat pydantic.BaseModel as a drop-in replacement for dataclass.

Can you please document that __hash__ is unimplemented for pydantic.BaseModel?

@samuelcolvin
Member

PR welcome to add it to the documentation.

@antonl

antonl commented Jun 11, 2020

By the way, you can use the pydantic dataclass support for this, maybe it's sufficient?

from pydantic.dataclasses import dataclass as pyd_dataclass

@pyd_dataclass(eq=True, frozen=True)
class Foo:
  foo: str = "foo"

d = { Foo(): "bar" }
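For comparison, this is the standard-library dataclass behaviour that the `eq=True, frozen=True` combination relies on. The sketch uses only `dataclasses`; that the Pydantic wrapper passes these flags through with the same semantics is an assumption here. `MutableFoo` is a hypothetical name added for contrast.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(eq=True, frozen=True)
class Foo:
    foo: str = "foo"


@dataclass(eq=True)  # not frozen: the generated class gets __hash__ = None
class MutableFoo:
    foo: str = "foo"


d = {Foo(): "bar"}
# Equal instances hash alike, so a fresh Foo() finds the entry.
value = d[Foo()]

try:
    Foo().foo = "baz"
    mutated = True
except FrozenInstanceError:
    # frozen=True rejects attribute assignment after construction.
    mutated = False
```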

@cslominski

Building on antonl's response, here is a custom decorator in pydantic's style:

import typing
import pydantic

def hashable_dataclass(
    _cls: typing.Optional[typing.Type[typing.Any]] = None,
    *,
    init: bool = True,
    repr: bool = True,
    order: bool = False,
    unsafe_hash: bool = False,
    config: typing.Type[typing.Any] = None,
) -> typing.Union[typing.Callable[[typing.Type[typing.Any]], typing.Type['Dataclass']], typing.Type['Dataclass']]:
    def wrap(cls: typing.Type[typing.Any]) -> typing.Type['Dataclass']:
        return pydantic.dataclasses.dataclass(cls, init=init, repr=repr, eq=True, order=order, unsafe_hash=unsafe_hash,
                                              frozen=True, config=config)

    if _cls is None:
        return wrap

    return wrap(_cls)

@4thel00z

4thel00z commented Apr 12, 2024

Good stuff, but don't use __dict__. Use .dict() instead:

class HashableModel(BaseModel):
    def __hash__(self):
        return hash((type(self),) + tuple(self.dict().items()))

If you need a stable hash, use this:

from hashlib import sha512

from pydantic import BaseModel


class HashableModel(BaseModel):
    def __hash__(self):
        # byteorder is passed explicitly; it is a required argument before Python 3.11
        return int.from_bytes(sha512(f"{self.__class__.__qualname__}::{self.json()}".encode('utf-8', errors='ignore')).digest(), 'big')
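A minimal standard-library sketch of the same digest idea, for anyone who wants to try it without Pydantic. `stable_digest` and the `Foo` class are hypothetical stand-ins, and `json.dumps(..., sort_keys=True)` replaces the model's own JSON serialisation. Two points it tries to show: `int.from_bytes` needs an explicit byte order before Python 3.11, and the built-in `hash()` reduces whatever `__hash__` returns to a machine-sized integer, so the full 512-bit value is only available from the helper itself.

```python
import json
from hashlib import sha512


def stable_digest(qualname: str, fields: dict) -> int:
    """Hypothetical helper: digest a class name plus JSON-serialised fields."""
    # sort_keys makes the serialisation independent of field insertion order.
    payload = f"{qualname}::{json.dumps(fields, sort_keys=True)}"
    # The byte order is given explicitly; it only has a default from Python 3.11.
    return int.from_bytes(sha512(payload.encode("utf-8")).digest(), "big")


class Foo:
    """Plain-Python stand-in for a model; not a pydantic class."""

    def __init__(self, foo="foo"):
        self.foo = foo

    def __hash__(self):
        return stable_digest(type(self).__qualname__, self.__dict__)


# Full-width digest: the same input gives the same value in every process.
digest = stable_digest("Foo", {"foo": "foo"})

# hash() returns a reduced, machine-sized value derived from the digest.
reduced = hash(Foo())
```

Unlike the built-in hash of a `str`, which is randomised per process unless `PYTHONHASHSEED` is fixed, the digest depends only on the serialised content.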
