Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save the hash value of tuples #43144

Closed
noamr mannequin opened this issue Apr 1, 2006 · 7 comments
Closed

Save the hash value of tuples #43144

noamr mannequin opened this issue Apr 1, 2006 · 7 comments

Comments

@noamr
Copy link
Mannequin

noamr mannequin commented Apr 1, 2006

BPO 1462796
Nosy @tim-one, @birkenfeld, @josiahcarlson
Files
  • savetuplehash.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2006-04-02.22:00:06.000>
    created_at = <Date 2006-04-01.20:59:25.000>
    labels = []
    title = 'Save the hash value of tuples'
    updated_at = <Date 2006-04-02.22:00:06.000>
    user = 'https://bugs.python.org/noamr'

    bugs.python.org fields:

    activity = <Date 2006-04-02.22:00:06.000>
    actor = 'noamr'
    assignee = 'none'
    closed = True
    closed_date = None
    closer = None
    components = ['None']
    creation = <Date 2006-04-01.20:59:25.000>
    creator = 'noamr'
    dependencies = []
    files = ['7133']
    hgrepos = []
    issue_num = 1462796
    keywords = ['patch']
    message_count = 7.0
    messages = ['49935', '49936', '49937', '49938', '49939', '49940', '49941']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'georg.brandl', 'josiahcarlson', 'noamr']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1462796'
    versions = []

    @noamr
    Copy link
    Mannequin Author

    noamr mannequin commented Apr 1, 2006

    (Copied from a post to python-dev):

    I've found out that the hash value of tuples isn't
    saved after it's calculated. With strings it's
    different: the hash value of a string is calculated
    only on the first call to hash(string), and saved in
    the structure for future use. Saving the value makes
    dict lookup of tuples an operation with an amortized
    cost of O(1).

    Saving the hash value means that if an item of the
    tuple changes its hash value, the hash value of the
    tuple won't be changed. I think it's ok, since:

    1. Hash value of things shouldn't change.
    2. Dicts assume that too.

    I tried the change, and it turned out that I had to
    change cPickle a tiny bit: it uses a 2-tuple which is
    allocated when the module initializes to lookup tuples
    in a dict. I changed it to properly use PyTuple_New and
    Py_DECREF, and now the complete test suite passes. I
    run test_cpickle before the change and after it, and it
    took the same time (0.89 seconds on my computer).

    @noamr noamr mannequin closed this as completed Apr 1, 2006
    @tim-one
    Copy link
    Member

    tim-one commented Apr 1, 2006

    Logged In: YES
    user_id=31435

    I don't want this: it increases the size of all tuple
    objects without obvious (let alone compelling) benefit. All
    strings are hashable, but not all tuples are. Python
    internals arrange to force string interning in many
    speed-sensitive cases, but only the empty tuple is interned
    and hashing the empty tuple goes fast anyway; that makes it
    less likely that a cached tuple hash will ever do any good.
    Finally, len(tuple) is typically small so the time needed
    for hash(tuple) is also typically small (for tuples and
    strings, the hash time is proportional to the number of
    elements, and strings are typically "bigger" this way).

    @noamr
    Copy link
    Mannequin Author

    noamr mannequin commented Apr 1, 2006

    Logged In: YES
    user_id=679426

    What about changing cPickle to avoid its dependency on
    tuples not saving their hash value?

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Apr 2, 2006

    Logged In: YES
    user_id=341410

    "Saving the value makes dict lookup of tuples an operation
    with an amortized cost of O(1)."

    This is an incorrect statement in regards to caching. The
    initial hashing costs whatever it takes to hash the tuple (
    Omega(len(tuple)) time at least), but subsequent hashes cost
    O(1). Ammortized over 2 hashes is still Omega(len(tuple))
    per hash, not O(1) as you stated. In order for
    ammortization to help the bounds, you would need to hash the
    tuple Omega(len(tuple)) times (assuming the contents of the
    tuple are hasable in O(1) time without ammortization), at
    which point you would get O(1) ammortization.

    @josiahcarlson
    Copy link
    Mannequin

    josiahcarlson mannequin commented Apr 2, 2006

    Logged In: YES
    user_id=341410

    I should also mention that the point of ammortized analysis
    is to properly bound time based on an expected number of
    operations. You are not doing this, which is why your
    ammortization is incorrect. You are right that it would be
    constant after you cache the hash, but it is not constant
    ammortized in general.

    @birkenfeld
    Copy link
    Member

    Logged In: YES
    user_id=849994

    As per Guido's -1, rejecting this patch.

    @noamr
    Copy link
    Mannequin Author

    noamr mannequin commented Apr 2, 2006

    Logged In: YES
    user_id=679426

    I just wanted to say that I believe that my analysis is
    correct: you can count the O(n) of the firsh hash with the
    creation of the tuple, which is obviously O(n). So if you
    count calculating the hash value as O(1), you'll end up with
    the correct O() for the complete program, even though the
    first hash isn't really O(1).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants