Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstable hash for different data input order #1049

Closed
Pedrexus opened this issue Dec 27, 2022 · 4 comments
Closed

Unstable hash for different data input order #1049

Pedrexus opened this issue Dec 27, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@Pedrexus
Copy link

Pedrexus commented Dec 27, 2022

Describe the bug
The hash of a DictConfig is not stable. hash(cfg) changes although python dict comparison returns true.

The problem for me is that when reading from a yaml file, different runs of the same script produce different hashes.

To Reproduce

from omegaconf import DictConfig

d1 = DictConfig({"a": 1, "b": 2})
d2 = DictConfig({"b": 2, "a": 1})

d1 == d2, hash(d1) == hash(d2)
# (True, False)

It seems, the hash magic method is defined in a way that is not stable for different dictionary input orders:

def __hash__(self) -> int:
    return hash(str(self))

Expected behavior
I think hash should be the same if dict data is the same, regardless of input order.

Not sure actually if this behavior is expected by the project authors, but it was unexpected to me.

Additional context
The solution I'm using now is:

from omegaconf import OmegaConf
import hashlib
import json

d1 = DictConfig({"a": 1, "b": 2})
d1_hash = hashlib.sha1(json.dumps(OmegaConf.to_container(d1), sort_keys=True).encode()).hexdigest()
@Pedrexus Pedrexus added the bug Something isn't working label Dec 27, 2022
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 28, 2022

I think hash should be the same if dict data is the same, regardless of input order.

To push back on this, The order of keys in a python dict is part of the dict's api.
For example, calling list({"a": 1, "b": 2}.items()) gives a different result from calling list({"b": 2, "a": 1}.items()). For this reason, I think it makes sense for the hash of objects to be different if the order of keys is different.

@Pedrexus
Copy link
Author

I understand. Thanks for the reply. It's fine to leave it as a is, but I'd just point out that this only seems to be stable for python>=3.7, so in older python versions you could actually see the same object have different hashes upon a read event or such.

@Jasha10 Jasha10 closed this as not planned Won't fix, can't repro, duplicate, stale Dec 30, 2022
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 30, 2022

OmegaConf is soon dropping support for python3.6 anyway. (#791).

@omry
Copy link
Owner

omry commented Jan 10, 2023

It seems correct that hash of equal DictConfigs is the same if the objects are considered equal by ==.
This is very low pri because honestly, I am not sure what reason you have to hash those in the first place :).
I would not object to a pull request fixing it (maybe by using a variant of str that is sorting the keys from hash, or alternatively just make a proper faster implementation of hash).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants