Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change dataclasses hashing to use unsafe_hash boolean (default to False) #77110

Closed
ericvsmith opened this issue Feb 24, 2018 · 14 comments
Closed
Assignees
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ericvsmith
Copy link
Member

BPO 32929
Nosy @gvanrossum, @ericvsmith, @ned-deily, @stevendaprano
PRs
  • bpo-32929: Dataclasses: Change the tri-state hash parameter to the boolean unsafe_hash. #5891
  • [3.7] bpo-32929: Dataclasses: Change the tri-state hash parameter to the boolean unsafe_hash. (GH-5891) #5902
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2018-02-26.02:33:23.282>
    created_at = <Date 2018-02-24.01:44:53.790>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'Change dataclasses hashing to use unsafe_hash boolean (default to False)'
    updated_at = <Date 2018-02-26.09:43:44.328>
    user = 'https://github.com/ericvsmith'

    bugs.python.org fields:

    activity = <Date 2018-02-26.09:43:44.328>
    actor = 'eric.smith'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2018-02-26.02:33:23.282>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2018-02-24.01:44:53.790>
    creator = 'eric.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32929
    keywords = ['patch']
    message_count = 14.0
    messages = ['312686', '312688', '312702', '312729', '312738', '312763', '312825', '312827', '312829', '312834', '312871', '312872', '312873', '312905']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'eric.smith', 'ned.deily', 'steven.daprano']
    pr_nums = ['5891', '5902']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32929'
    versions = ['Python 3.7', 'Python 3.8']

    @ericvsmith
    Copy link
    Member Author

    See https://mail.python.org/pipermail/python-dev/2018-February/152150.html for a discussion.

    This will remove the @DataClass hash= tri-state parameter, and replace it with an unsafe_hash= boolean-only parameter.

    From that thread:

    """
    I propose that with @DataClass(unsafe_hash=False) (the default), a __hash__
    method is added when the following conditions are true:

    • frozen=True (not the default)
    • compare=True (the default)
    • no __hash__ method is defined in the class

    If we instead use @DataClass(unsafe_hash=True), a __hash__ will be added
    regardless of the other flags, but if a __hash__ method is present, an
    exception is raised.

    Other values (e.g. unsafe_hash=None) are illegal for this flag.
    """

    @ericvsmith ericvsmith self-assigned this Feb 24, 2018
    @ericvsmith ericvsmith added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 24, 2018
    @stevendaprano
    Copy link
    Member

    This is a truly awful name, there's nothing unsafe about the hash
    parameter. We rightly hesitate to label *actual* unsafe functions, like
    eval and exec, with that name. Are we committed to this name?

    @ericvsmith
    Copy link
    Member Author

    That's a quote from Guido. There weren't any better ideas on the python-dev thread.

    @gvanrossum
    Copy link
    Member

    It is important to convey the idea that if you are thinking of using this you are probably doing something fishy.

    An alternative could be _hash, which at least indicates it is not for general use, like sys._getframe().

    @ericvsmith
    Copy link
    Member Author

    Note that this class (from the test suite) will now raise an exception:

            @dataclass(unsafe_hash=True)
            class C:
                i: int
                def __eq__(self, other):
                    return self.i == other.i

    That's because it has a __hash__, added when __eq__ is defined. I think we're better off with the rule being:

    If unsafe_hash=True, raise an exception if __hash__ exists and is not None. Otherwise add __hash__.

    That's how it used to be with hash=True (in 3.70b1).

    Or is that what you meant by "but if a __hash__ method is present, an exception is raised"? Notice the word "method", instead of attribute.

    @gvanrossum
    Copy link
    Member

    Sorry, I used imprecise language. What you propose is fine.

    On Feb 24, 2018 09:54, "Eric V. Smith" <report@bugs.python.org> wrote:

    Eric V. Smith <eric@trueblade.com> added the comment:

    Note that this class (from the test suite) will now raise an exception:

        @dataclass(unsafe_hash=True)
        class C:
            i: int
            def \_\_eq__(self, other):
                return self.i == other.i
    

    That's because it has a __hash__, added when __eq__ is defined. I think
    we're better off with the rule being:

    If unsafe_hash=True, raise an exception if __hash__ exists and is not
    None. Otherwise add __hash__.

    That's how it used to be with hash=True (in 3.70b1).

    Or is that what you meant by "but if a __hash__ method is present, an
    exception is raised"? Notice the word "method", instead of attribute.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue32929\>


    @ericvsmith
    Copy link
    Member Author

    Here's the simplest way I can describe this, and it's also the table used inside the code. The column "has-explicit-hash?" is trying to answer the question "is there a __hash__ function defined in this class?". It is set to False if either __hash__ is missing, or if __hash__ is None and there's an __eq__ function. I'm assuming that the __hash__ is implicit in the latter case.

    # Decide if/how we're going to create a hash function.  Key is
    #  (unsafe_hash, eq, frozen, does-hash-exist).  Value is the action to
    #  take.
    # Actions:
    #  '':          Do nothing.
    #  'none':      Set __hash__ to None.
    #  'add':       Always add a generated __hash__function.
    #  'exception': Raise an exception.
    #
    #                +-------------------------------------- unsafe_hash?
    #                |      +------------------------------- eq?
    #                |      |      +------------------------ frozen?
    #                |      |      |      +----------------  has-explicit-hash?
    #                |      |      |      |
    #                |      |      |      |        +-------  action
    #                |      |      |      |        |
    #                v      v      v      v        v
    _hash_action = {(False, False, False, False): (''),
                    (False, False, False, True ): (''),
                    (False, False, True,  False): (''),
                    (False, False, True,  True ): (''),
                    (False, True,  False, False): ('none'),
                    (False, True,  False, True ): (''),
                    (False, True,  True,  False): ('add'),
                    (False, True,  True,  True ): (''),
                    (True,  False, False, False): ('add'),
                    (True,  False, False, True ): ('exception'),
                    (True,  False, True,  False): ('add'),
                    (True,  False, True,  True ): ('exception'),
                    (True,  True,  False, False): ('add'),
                    (True,  True,  False, True ): ('exception'),
                    (True,  True,  True,  False): ('add'),
                    (True,  True,  True,  True ): ('exception'),
                    }

    PR will be ready as soon as I clean a few things up.

    @gvanrossum
    Copy link
    Member

    I don't know if I'm unique, but I find such a table with 4 Boolean keys
    hard to understand. I'd rather see it written up as a series of nested 'if'
    statements.

    @ericvsmith
    Copy link
    Member Author

    if unsafe_hash:
            # If there's already a __hash__, raise TypeError, otherwise add __hash__.
            if has_explicit_hash:
                hash_action = 'exception'
            else:
                hash_action = 'add'
        else:
            # unsafe_hash is False (the default).
            if has_explicit_hash:
                # There's already a __hash__, don't overwrite it.
                hash_action = ''
            else:
                if eq and frozen:
                    # It's frozen and we added __eq__, generate __hash__.
                    hash_action = 'add'
                elif eq and not frozen:
                    # It's not frozen but has __eq__, make it unhashable.
                    #  This is the default if no params to @dataclass.
                    hash_action = 'none'
                else:
                    # There's no __eq__, use the base class __hash__.
                    hash_action = ''

    @ericvsmith
    Copy link
    Member Author

    I've been focused on getting the code fix in for the next beta release on 2018-02-26, and leaving the possible parameter name change for later. I don't feel strongly about the parameter name. Right now it's unsafe_hash, per Guido's original proposal.

    @gvanrossum
    Copy link
    Member

    @steven.daprano the name was +1'ed by many people in the python-dev discussion as sending the right message. So I'm reluctant to change it. If you can cause a landslide of support for _hash there we can change it in b3.

    @ericvsmith
    Copy link
    Member Author

    New changeset dbf9cff by Eric V. Smith in branch 'master':
    bpo-32929: Dataclasses: Change the tri-state hash parameter to the boolean unsafe_hash. (bpo-5891)
    dbf9cff

    @ericvsmith
    Copy link
    Member Author

    I'm going to close this. Steven: if you gain support for a parameter name change, please open another issue.

    @ericvsmith
    Copy link
    Member Author

    New changeset 4cffe2f by Eric V. Smith (Miss Islington (bot)) in branch '3.7':
    bpo-32929: Dataclasses: Change the tri-state hash parameter to the boolean unsafe_hash. (GH-5891) (GH-5902)
    4cffe2f

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants